COSMOS is a Python library for managing large-scale workflows. It allows pipelines to be described formally and their jobs to be partitioned, and it includes a user interface for tracking job progress, an abstraction of the job queuing system (so that the same workflow can be submitted to multiple queuing systems), and fine-grained control over the workflow. COSMOS runs on cloud-based services such as Amazon Web Services and Google Cloud, as well as on traditional high-performance computing (HPC) clusters. This is a collaborative project with the Tonellato lab at Harvard Medical School.
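The ideas above (a formal pipeline description, partitioned jobs, and a queuing-system abstraction) can be illustrated with a minimal, generic sketch. This is not the COSMOS API: every name here (`run_workflow`, `LocalBackend`, `RecordingBackend`, the task names and the placeholder commands) is hypothetical and chosen only for illustration. The sketch uses only the Python standard library (`graphlib` requires Python 3.9 or later).

```python
import subprocess
from graphlib import TopologicalSorter


class LocalBackend:
    """Hypothetical backend: runs each command as a local shell subprocess."""

    def submit(self, command):
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


class RecordingBackend:
    """Hypothetical stand-in for a cluster scheduler: records commands only."""

    def __init__(self):
        self.submitted = []

    def submit(self, command):
        self.submitted.append(command)
        return ""


def run_workflow(tasks, dependencies, backend):
    """Run tasks in dependency order through the given backend.

    tasks:        dict mapping task name -> command string
    dependencies: dict mapping task name -> set of prerequisite task names
    backend:      any object with a submit(command) method
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    outputs = {name: backend.submit(tasks[name]) for name in order}
    return order, outputs


# A pipeline partitioned by chromosome: two independent jobs, then a merge.
# The commands are placeholders, not real tool invocations.
tasks = {
    "call_chr1": "echo calling variants on chr1",
    "call_chr2": "echo calling variants on chr2",
    "merge": "echo merging per-chromosome results",
}
dependencies = {"merge": {"call_chr1", "call_chr2"}}

backend = RecordingBackend()
order, outputs = run_workflow(tasks, dependencies, backend)
```

Because the workflow only ever calls `backend.submit`, swapping `RecordingBackend` for `LocalBackend` (or for a class that wraps a real scheduler) changes where the jobs run without changing the pipeline description. This sketch runs jobs one at a time; a real workflow manager would dispatch independent jobs such as the two per-chromosome tasks in parallel.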
COSMOS was developed specifically to support our clinical-time, next-generation sequencing (NGS) variant calling pipeline, but it can be used to develop pipelines for any large-scale scientific workflow. It runs our parallelized workflow for optimal, scalable processing of NGS data (currently whole genome and whole exome) and supports projects such as the 6000 exome and genome analysis.
COSMOS is available through our website: http://cosmos.hms.harvard.edu/
- Gafni, Luquette, Lancaster, Hawkins, Jung, Souilmi, Wall, Tonellato. “COSMOS: Python library for massively parallel workflows.” Bioinformatics, 2014 (open access).