Adoption of Automatic Distributed Analysis Environment in MetaCentrum (Czech National Grid Initiative)
CESNET technical report 13/2010
J. Kmuníček, L. Hejtmánek, J. Brezovský, V. Kaplan, T. Hnízdil, L. Matyska
Received 29. 11. 2010
Abstract
This study is devoted to deployment of DIANE/Ganga framework for solving computational demands through utilization of pilot jobs approach. The DIANE/Ganga framework has been modified to fully support Czech NGI environment. Two distinct application areas – virtual molecular screening and radiative transfer analysis – have been selected as the use cases for demonstration of the applicability and usability of the implemented system. Here we describe the current MetaCentrum scheduling and jobs planning system and especially its latest modifications required to support user communities with their specific computational jobs demands that do not optimally fit into present MetaCentrum utilization. Adoption of automatic distributed analysis environment DIANE has been selected as the way how to support these new jobs types within MetaCentrum NGI through MetaCentrum.
Keywords: Grid, pilot jobs, automatic job submission, DIANE, Ganga
1 Introduction
Historically, MetaCentrum [1] has been oriented towards support and processing of batch jobs (either one or many–processors). However, there are application domains capable to generate huge number of relatively short computational jobs (i.e., thousands or tens of thousands jobs with several hours of individual job length). Submitting such amounts of jobs into regular batch system causes their non-proportional waiting in a queue. In MetaCentrum, we have recently started to deploy new approach that is able to handle such enormous amounts of jobs. Requests for such a support are fully consistent with current advances in virtualisation [2] allowing to support new types of computational jobs such an interactive or even virtual clusters [3] (in case of user requests for environments fully customised according to researcher needs).
Based on the current MetaCentrum utilization statistics, we identified two areas producing vast amounts of jobs submitted into a standard batch system in non-optimal way. The first area is virtual screening of enzymatic molecules, haloalkane dehalogenases, that is able to produce tens/hundreds thousands of jobs based on the volume of ligands database used in the study. The second use case is related to radiative transfer studies – application domain studying energy transfer – through generation of many short lifetime jobs. In the following, we introduce shortly both the use cases.
1.1 Virtual Molecular Screening
Haloalkane dehalogenases (HLD) are bacterial enzymes that cleave the carbon-halogen bond of the halogenated aliphatic compounds by a hydrolytic mechanism. Haloalkane dehalogenases require a water molecule as the only co-factor for the reaction. These enzymes became an important model system for investigation of fundamental principles of enzymatic catalysis [4].
However their catalytic rates determined up to now with various substrates are much lower than usual rates of many other enzymes [5]. This and other evidences let us speculate, that “natural” substrates for some of the haloalkane dehalogenases are yet to be discovered. In practical terms, haloalkane dehalogenases have a potential application in chemical synthesis of compounds for pharmaceutical use, detoxification and biosensing of subsurface pollutants and recovery of industrial side products [7].
![[Image]](figure1.png)
Figure 1. Distribution of corrected binding energies of 3413 potential substrates of DhaA. Control substrates of DhaA are indicated by red bars.
Virtual screening offers a very efficient way to study a substrate specificity of enzymes and to complement experimental screening [11]. The virtual screening was employed for the first time to investigate substrate specificity of halogenalkane dehalogenase DhaA aiming to find new classes of potential substrates and inhibitors. 44,963 ligands were extracted from EDULISS database of small molecules. These molecules were docked with AutoDock4 [13] into the crystal structure of DhaA together with its 18 known substrates as positive controls. 9818 docked ligands have their halogen atom within 4.5 Å from halogen-stabilizing residues and their binding energies corrected for molecular weight ranged from -3.63 to 25.70 kcal/mol. 3413 ligands were selected as potential substrates of DhaA enzyme on the basis of their favorable binding energies, their position towards functionally important amino acid residues of the enzyme active site and their fulfillment of geometric criteria necessary for initiation of enzymatic reaction (Figure 1).
![[Image]](figure2.png)
Figure 2. Set of 54 potential substrates (cyan sticks) docked into the active site of DhaA enzyme (black wires) with catalytic pentade shown in green sticks.
Of these, 54 unique ligands ranked better than the best known substrate of DhaA indicating that these molecules belong to the new substrate classes currently unidentified for haloalkane dehalogenases (Figure 2). There are seven other HLDs with known crystal structures which are currently being processed by virtual screening work-flow described above.
1.2 Radiative Transfer Study
Chlorophyll is a green foliar pigment, which plays an important role in photosynthesis. It absorbs and transports the energy of incoming sunlight. Five different chlorophyll types have been identified, the highest chlorophyll concentration in tissues is of Ca and Cb types. Concentration of Ca+b varies in different plant species during the year. Chlorophyll concentration also responds to leaf physiological stress and overall concentration of Ca+b can be used as a proxy of vegetation health status. Indirect estimation of chlorophyll content by the remote sensing techniques became possible with the introduction of hyperspectral remote sensors. Imagery acquired by hyperspectral instruments offers possibility of narrow band evaluation, which made possible more precise evaluation of chlorophyll absorption peaks (430 and 662 nm for Ca; 465 and 642 nm for Cb).
Remote Sensing workgroup on ISBE (Institute of Systems Biology and Ecology of Academy of Science Czech Republic) uses MetaCentrum computing resources mainly to prepare supplemental data for estimation of chlorophyll content of Norway spruce (Picea abies Karst.) tissues. The retrieval of chlorophyll content is done from hyperspectral image data using radiative transfer inversion techniques with DART radiative transfer model. Discrete Anisotropic Radiative Transfer model (DART) has been developed in CESBIO laboratory (Center for the Study of the BIOsphère) for simulation of radiative transfer in the “Earth-Atmosphere” system. The input of the model is a 3D model of landscape including optical, geometric and biophysical parameters of model elements – i.e. optical properties of the ground, positions and dimensions of trees on the landscape, distribution of the tree biomass, tree leaves types and many other properties.
The DART model [14] can be applied in these two modes:
in forward mode, the DART model simulated transfer of energy in the simulated scene. Results of the model are images which would be (theoretically) acquired by airborne/satellite imaging sensors flown over the simulated scene.
in indirect inverse mode, the images generated by DART forward runs are compared to images acquired by imaging sensor. The matching of acquired images to simulated images with known parameters allows estimation of parameters from set-up of the simulated scene (e.g. chlorophyll content).
![[Image]](figure3.png)
Figure 3. Aggregated simulation results, arranged for further processing. Each simulation run results in one tile.
In order to perform a radiative transfer model based retrieval, configuration of trees and parameters which are expected to appear in a studied area have to be prepared. The more precise parametrization of input scenes means the more precise retrieval results, so appropriate care should be taken to achieve better results.
Preparation of simulation parameters consists of field-work measuring of tree parameters (as mean trunk dimensions, mean live and dead crown height, …), optical properties (needle reflectance and transmittance) and some other supplemental measurements (vegetation under-story optical properties, …) All measured properties are summarized, evaluated and appropriate simulation mock-ups are built. Unknown properties (chlorophyll content, leaf area index, water content, …) are used in several expected values.
![[Image]](figure4.png)
Figure 4. Image acquired by AISA/Eagle, CIR colors. Černá Hora, Šumava National Park.
![[Image]](figure5.png)
Figure 5. Map of chlorophyll content.
Depends on amount of known and unknown properties, number of simulations varies from several hundreds to several thousands. Prepared simulations are submitted to (super-)computing facilities. Each of the simulation runs are independent on each other, making this type of computation load suitable for computing grid environment. Each simulation runs for a relatively short time – from one to three hours. Results of all simulations are stored to the database (sometimes called lookup tables) with respect of their input properties. The prepared lookup tables are then used to train neural network or some other form of optimization tool. Trained neural network can be used to perform the retrieval.
2 Pilot jobs
Currently, MetaCentrum uses a batch system to run user jobs. We
are migrating from PBSPro
(portable batch system) to Torque [16]
(which is a free clone of PBSPro). Both these systems have common
interface to the users. The user submits jobs into work
queues. The batch system picks up the jobs from the queues and
assigns them to the workers according to actual schedule, policy
and queue priorities. During the submit, the user can specify a
job queue and some other parameters (usually specifying properties
of the worker the job can run on, memory limit, number of required
processors and so on). The user submits a job using qsub
command.
The batch system has several shortcomings. First, a job submitted into job queue can be delayed for a very long time. This is the case when all the worker nodes are busy. Second, a job submission takes some amount of time (e.g., several seconds). If the user is about to submit 10,000 jobs, it would take several hours which is not user friendly. Last but not least, the batch system is unable to automatically resubmit failed jobs (i.e., jobs that did not finish correctly).
Consequently, the used batch system is not appropriate for use cases which spawn hundreds or thousands jobs, the jobs that have short running time and jobs which are not trivial to manually resubmit in case of job failures.
Our use cases fit into this area. In both cases, the user spawns a number of relatively short jobs and it is not feasible that the user checks all results and resubmits the failed jobs manually.
The so-called “pilot jobs” bring solution for this problem. Instead of submitting user jobs directly into the batch system, the user submits “pilots”. The pilot is a worker that sequentially executes several user jobs until the pilot is terminated either by the batch system (exceeding work queue time limits) or because all user jobs finished. Pilot job containers are just standard jobs for the batch system, including limits for numbers of running or submitted jobs and their maximum wall clock time. Per-job and per-user limits apply to all jobs in a pilot as a whole, the pilot therefore doesn’t loosen the limits except – but on purpose – the number of user jobs (jobs executed by the pilot) that can be submitted.
Beside pilots, there exists the so-called “master” that manages the pilots. The master is a process that runs on any node that is reachable by the pilots over the network. The pilots queries the master for jobs to execute or the master pushes jobs to pilots (depending on particular approach). The master is able to resubmit failed jobs (e.g., several times before the job is marked as a recidivist).
The mechanism of pilot jobs is described in the literature by many different names. Apart from term pilot jobs one can encounter terms as remote agent control, pilot agent, leasing jobs slots, infiltration system, just-in-time model, dispatcher-executor model or master-worker architecture.
3 DIANE Framework
DIANE is a lightweight job execution control framework used for distributed scientific applications. The development of the framework started in 2002 in R&D section of CERN IT department. The primary aim of DIANE usage is to improve the efficiency of computational jobs execution by provision of automatic load balancing, fine-grained scheduling and failure recovery.
DIANE framework itself is implemented in object-oriented Python utilizing the basic advantages of pilot jobs. The core functionality is based on master–worker approach where a master is a server executed by a user for one single job and a worker is a pilot job executed at a computing (worker) node. The interaction between master server and pilots is ensured by omniORB application that is the implementation of CORBA standard. Generic architecture can be seen in Figure 6.
![[Image]](figure6.png)
Figure 6. Generic architecture of DIANE/Ganga model of distributed analysis tool. WN – Worker Node, CE – Computing Element.
DIANE framework consists of two parts: (i) job scheduler and manager (DIANE), (ii) pilot submitter (Ganga). This approach strictly separates resource allocation and application execution. The user prepares his job database for the DIANE (job scheduler and manager) and sets up Ganga (pilot submitter). The pilot submitter is further responsible for pilot spawning (e.g., maintaining a specific amount of pilots always active), the job scheduler is responsible for job to pilot assignment, re-execution of failed jobs and so on.
The Ganga submitter supports a number of backends to various
batch system such as PBS, LSF, SGE, Condor, LCG/EGEE/EGI Grid. The
backend responsible for the particular batch system is called
Agent Factory. We are focused on PBS. Interaction between the
DIANE framework and PBS can be seen in Figure 7. The user sets up the pilot server (Ganga) that
uses PBS commands (qsub) to instruct the PBS server to
run the pilots. The main feature of this concept is the user does
not need to understand details and internals of the PBS
system. The user is given only a generic pilot interface.
![[Image]](figure7.png)
Figure 7. Illustrative scheme of DIANE/Ganga interaction with PBS.
4 DIANE Usage in MetaCentrum
The DIANE framework is not intended to be used as is. The user adopting the DIANE for actual production infrastructure has to provide: a job scheduler (for DIANE) and agent factory (for Ganga). Both user groups can share the same Agent Factory but there has to be different job schedulers.
4.1 Job scheduler
DIANE provides sample job submitters (Python object classes) that are able to parse a job database, extract particular jobs and maintain sets of submitted, running, failed and succeeded jobs. Both groups use different databases of their jobs and we can hardly provide interface common enough so that we can have only a single job submitter. We developed two DIANE job submitters, one for each group.
4.2 Agent factory
Similarly to DIANE, the Ganga provides sample Agent Factories (Python object classes) that are able to submit pilots through a batch system. In MetaCentrum environment, we use PBS which is supported by Ganga. Ganga provides interface to gLite as well but we did not examine it so far (MetaCentrum does not currently run gLite). The sample agent factory is able to do simple submits only. The user cannot specify a job queue nor properties of worker nodes. Ganga operates with a single job queue by default. Number of running pilots is limited by the user, i.e., the user can specify that up to N pilots should run.
We had to create a new Agent Factory that is able to provide
similar interface as the qsub command. The user can
specify job queue and worker properties when starting Ganga (and
its Agent Factory). We needed to support multiple queues for a
single set of jobs (as MetaCentrum uses production and testing
environment distinguished by queues, our user groups use both
production and testing environments concurrently, thus they are
submitting jobs into different queues concurrently). To provide
maximum throughput (to complete as many jobs as possible in as
short time as possible), we changed pilot submission logic as
follows. A user can specify how many pilots should be submitted
(enqueued, not running). It means that there are as many pilots
running as possible – if all submitted pilots are running, new pilots are submitted into job queues.
4.3 Application module
We deployed an application module for the use cases. Typical usage
is as follows, more details can be found on Diane and DIANE_MetaCentrum
pages. The source code can be found in /software/diane-2.0 on
all MetaCentrum nodes.
Loading the module:
module add diane-2.0
Starting the master:
diane-autodock-run autodockShare.run \ -f /home/login/mydatabase.dat \ -s /home/login/bin/executable
where /home/login/mydatabase.dat contains user job items that are passed as arguments to the /home/login/bin/executable script.
Starting pilot submitter:
ganga SimpleAgentFactory.py --enable=True \ --location=/afs/domain/myhome/benchmark.tar.gz \ --name=benchmarkExecutable --cycles=10000 \ --diane-worker-number=2
where
--enablespecifies whether to run or not benchmark script,--locationspecifies location of benchmark tarball,–namespecifies name of a script inside the benchmark tarball to run,--cyclesspecifies expected benchmark score,--diane-worker-numberspecifies number of submitted (enqueued, not running) pilots.
4.4 Implementation Details
Beside generic enhancements described above, we had to solve a problem related to the first user group – virtual-molecular screening. Individual jobs (dockings) are not homogeneous, similarly, the worker nodes are not homogeneous (CPU power differs). Running times of particular jobs are from several minutes to many hours. Jobs should be assigned to pilots effectively, i.e., a pilot should be given such a job that is able to finish before the pilot terminates.
The user developed a heuristics to estimate running time of each job. He has a benchmark that evaluates CPU power of a worker node (the node that runs the pilot). Using benchmark evaluation and job database, we are able to estimate which jobs can be finished on a pilot before the pilot terminates.
As job assignment to pilots is fully managed by the DIANE and pilots submission is fully managed by the Ganga’s Agent factory. It is obvious that we need to benchmark the pilot only once and not before each execution of user job. The DIANE nor the Ganga are not prepared for such benchmarking.
We changed both the DIANE and Ganga so that the running pilot executes the benchmark, sends results to the DIANE which selects appropriate jobs for the pilot. This modus operandi is optional thus the Agent Factory is usable for both the user groups.
We further enhanced DIANE by adding two features. The first feature adds an option to specify an epilogue script that is executed after all jobs are successfully finished. The script can notify the user via mail for example. The second feature adds an option to dump job database to a file after DIANE terminates. In the file, the user can find which jobs are completed, which are pending, and which are failed. Using the file, job processing can be resumed.
4.5 Implementation Issues
During testing and pilot deployment, we had to solve several problems with DIANE and Ganga framework.
The first problem is related to Ganga. Ganga monitors pilots via a shared file on a global file system. This file system needs to be accessible from the node running the Ganga submitter and all nodes running pilots. We chose NFS version 4 for these control files. However, we discovered a bug in NFSv4 code on MetaCentrum worker nodes that led to node crashes. We fixed the bug in Linux kernel but in mean time, we used AFS (the kernel updates are not immediate across whole MetaCentrum). With AFS, the Ganga submitter saw all the pilots in submitted state, none was running. We had to change the Ganga code to respect AFS open-close semantics of shared files. Content of an opened file is not visible to other clients until the file is closed, the pilot kept the file opened.
The second problem is related to both Ganga and DIANE. Both parts fork processes or threads for each running pilot. In the case of processes, it means about 20 MB RAM per running pilot. If we had 200 running pilots, we would need 4 GB RAM only for management which is not sane. In the case of threads, only about 400 MB RAM is consumed for 200 running pilots. Choosing processes or threads seems to be depending on architecture. IA32 architecture uses processes, x86_64 architecture uses threads. Further investigation is needed to either eliminate threads/processes dependency on running pilots or to use always the threads.
The third problem is related to Python. Due to some incompatibilities among Python 2.4 and the others, it is required that whole framework strictly uses Python 2.4. While on the control server (running DIANE submitter and Ganga submitter) it is easily achievable, the pilot itself is a Python script that is interpreted by default Python interpreter (which is Python 2.5 in MetaCentrum). We needed to force the pilot script to use particular Python interpreter.
The fourth problem is related to interface to PBS. If qsub refuses to submit the pilot (e.g., because of wrong queue specification, wrong CPU requirements or because of failure of PBS), Ganga tries to spin very fast submitting the pilots each failing. The original Ganga did not even display any error returned from the qsub which led to user confusion. We are working on this issue.
5 Future Work
Our future work is focused on fixing problems mentioned in the previous section. We need to find a solution to processes/threads problem otherwise we need to dedicate some powerful worker nodes just to DIANE/Ganga management. We also need improve interface to PBS and react appropriately on any qsub error, we need to add qsub throttling if Ganga submitter is refused because of queue or any other limits.
6 Conclusions
Here we present a implementation of automatic analysis tools setup supporting two different application use cases. The generic DIANE/Ganga framework has been successfully adopted to local requirements for run in Czech NGI environment and simultaneously we plan serious modification to allow its smooth transition for run in the worldwide EGI environment supporting gLite middleware. Currently, our prototype is being seriously tested by primarily targeted researchers and in near future we expect its transition form prototype solution to fully production type of utilization.
Moreover, as we provide a generic solution this can be advantageously provided to other user communities and scientific domains with similar application patterns as various parametric studies or data-mining tasks.
7 Acknowledgements
The presented work is supported by EGI-InSPIRE project funded by European Commission (contract number RI-261323).
References
| [1] | KŘENKOVÁ, I.; ANTOŠ, D.; MATYSKA, L. MetaCentrum Yearbook 2009. Praha: CESNET, 2010. x,160 p. ISBN 978-80-904173-7-3. |
| [2] | DENEMARK, J.; RUDA, M.; MATYSKA L. Virtualizing METACenter Resources Using Magrathea. Technical report 25/2007. Praha: CESNET, 2007. |
| [3] | RUDA, M.; ŠUSTR, Z.; SITERA, J.; ANTOŠ, D.; HEJTMÁNEK, L.; HOLUB, P.; MULAČ, M. Virtual Clusters as a New Service of MetaCentrum, the Czech NGI. Technical report 17/2009. Praha: CESNET, 2009. |
| [4] | JANSSEN, D. B. Discovery of Stereoselective Haloalkane Dehalogenase: New Tool for Asymmetric Synthesis. Current Opinion in Chemical Biology. 2004, vol. 8, p. 150–159. |
| [5] | DAMBORSKÝ, J.; RORIJE, E.; JESENSKÁ, A.; NAGATA, Y.; KLOPMAN, G.; PEIJNENBURG, W. J. G. M. Structure-specificity relationships for haloalkane dehalogenases. Environmental Toxicology and Chemistry. 2001, vol. 20, p. 2681–2689. |
| [6] | PROKOP, Z.; MONINCOVÁ, M.; CHALOUPKOVÁ, R.; KLVAŇA, M.; NAGATA, Y.; JANSSEN, D. B.; DAMBORSKÝ, J. Catalytic Mechanism of the Haloalkane Dehalogenase LinB from Sphingomonas paucimobilis UT26. Journal of Biological Chemistry. 2003, vol. 278, p. 45094–45100. |
| [7] | SWANSON, P. E. Dehalogenases Applied to Industrial-scale Biocatalysis. Current Opinion in Biotechnology. 1999, vol. 10, p. 365–369. |
| [8] | STUCKI, G.; THUER, M. Experiences of a Large-scale Application of 1,2-dichloroethane Degrading Microorganisms for Groundwater Treatment. Environmental Science and Technology. 1995, vol. 29, p. 2339–2345. |
| [9] | PROKOP, Z.; DAMBORSKÝ, J.; NAGATA, Y.; JANSSEN, D. B. Patent PCT/CZ2005/000099. 2004. |
| [10] | PROKOP, Z.; DAMBORSKÝ, J.; OPLUŠTIL, F.; JESENSKÁ, A.; NAGATA, Y. In CZ Patent 298287. 2005. |
| [11] | GUIDO, R. V. C.; OLIVA, G.; ANDRICOPULO, A. D. Virtual Screening and Its Integration with Modern Drug Design Technologies. Current Medical Chemistry. 2008, vol. 15, p. 37–46. |
| [12] | STAHURA, F. L.; BAJORATH, J.; Novel Methodologies for Virtual Screening. Current Pharmaceutical Design. 2005, vol. 11, p. 1189–1202. |
| [13] | MORRIS, G. M.; GOODSELL, D. S.; HALLIDAY, R. S.; HUEY, R.; HART, W. E.; BELEW, R. K.; OLSON, A. J. Automated Docking using a Lamarckian Genetic Algorithm and Empirical Binding Free Energy Function. Journal of Computational Chemistry. 1998, vol. 19, p. 1639–1662. |
| [14] | GASTELLU-ETCHEGORRY, J. P.; MARTIN, E.; GASCON, F.; BELOT, A.; LEFEVRE, M. J.; BOYAT, P.; GENTINE, P.; ADER, G.; DESCHARD, J.; TORRUELLA, P.; CHOURAK, K. DART: 3-D Model of Optical Satellite Images and Radiation Budget. In Proc. Geoscience and Remote Sensing Symposium, 2003. IGARSS ’03. IEEE International, 2003, p. 3242–3244. |
| [15] | GASTELLU-ETCHEGORRY, J. P.; MARTIN, E.; GASCON, F.; DART: A 3D Model for Simulating Satellite Images and Studying Surface Radiation Budget. International Journal of Remote Sensing. 2004, p. 73–96. |
| [16] | RUDA, M.; TÓTH, Š. Transition to Inter-Cluster Scheduling Architecture in MetaCentrum. Technical report 21/2009. Praha: CESNET, 2009. |