Snakemake pipeline to map DNA datasets to a given reference genome using BWA
MEM and Samtools.
Input datasets are gzipped FASTQ files. They can be organized in multiple
units per sample or as a single unit per sample; in the latter case, the file
will be copied.
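As a rough illustration of the unit layout described above, the sketch below groups FASTQ files by sample and decides what to do with each sample. The sample names, unit names, paths, and the `UNITS`, `group_units` and `plan` identifiers are all hypothetical and do not reflect the pipeline's actual configuration schema. Labelling multi-unit samples as "combine" is also an assumption, since only the single-unit copy is documented here.

```python
from collections import defaultdict

# Hypothetical unit table: (sample, unit, path). Illustrative names only;
# this is not the pipeline's real configuration format.
UNITS = [
    ("sampleA", "lane1", "reads/sampleA_L001.fastq.gz"),
    ("sampleA", "lane2", "reads/sampleA_L002.fastq.gz"),
    ("sampleB", "lane1", "reads/sampleB_L001.fastq.gz"),
]


def group_units(units):
    """Group FASTQ paths by sample, preserving the input order."""
    by_sample = defaultdict(list)
    for sample, _unit, path in units:
        by_sample[sample].append(path)
    return dict(by_sample)


def plan(units):
    """Map each sample to 'copy' (one unit) or 'combine' (several units).

    'combine' is an assumed label for the multi-unit case.
    """
    return {
        sample: ("copy" if len(paths) == 1 else "combine")
        for sample, paths in group_units(units).items()
    }


if __name__ == "__main__":
    for sample, action in plan(UNITS).items():
        print(sample, action)
```

Keeping the grouping separate from the decision makes it easy to inspect which files belong to a sample before anything is run.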
Output can be produced in BAM or CRAM (default) format.
Workflow
Requirements
The pipeline’s requirements are specified in the environment.yaml file, and package dependencies are resolved using Conda.
Usage
Manual deployment (a.k.a. hard way)
Clone the repository and cd into it
git clone https://github.com/solida-core/dima.git
cd dima
Edit the configuration file and the Snakefile to match your environment
nano config.yaml
nano Snakefile
Create conda environment
conda env create -n dima --file environment.yaml
then activate it (with Conda 4.4 or later, conda activate dima is the recommended equivalent)
source activate dima
Launch Snakemake
snakemake --use-conda --configfile config.yaml
Automatic deployment (a.k.a. easy way)
Use Solida.
Output
By default the pipeline produces output in CRAM format. This can easily be changed to BAM by editing the Snakefile (look for the OUTPUT_FORMAT variable).
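To make the effect of such a switch concrete, here is a minimal Python sketch of how a format variable could select the file extension and the samtools view flags. Only the OUTPUT_FORMAT name comes from this README; the `output_settings` and `view_command` helpers, the `reference.fa` path and the `.sam` input name are hypothetical, and the real Snakefile may be organized quite differently. The flags themselves (`-b` for BAM, `-C` with `-T` for CRAM, `-o` for the output file) are standard samtools view options.

```python
import shlex

# Variable named in the README; the accepted values "bam" and "cram"
# are an assumption made for this sketch.
OUTPUT_FORMAT = "cram"


def output_settings(fmt, reference="reference.fa"):
    """Return (extension, samtools view flags) for the requested format."""
    fmt = fmt.lower()
    if fmt == "cram":
        # CRAM output needs the reference sequence (-T).
        return ".cram", ["-C", "-T", reference]
    if fmt == "bam":
        return ".bam", ["-b"]
    raise ValueError(f"unsupported output format: {fmt!r}")


def view_command(sample, fmt=OUTPUT_FORMAT, reference="reference.fa"):
    """Build a samtools view command line as a list of arguments."""
    ext, flags = output_settings(fmt, reference)
    return ["samtools", "view", *flags, "-o", sample + ext, sample + ".sam"]


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    print(shlex.join(view_command("sampleA")))
    print(shlex.join(view_command("sampleA", fmt="bam")))
```

Building the command as a list and printing it, rather than running it, lets you check the chosen flags before touching any data.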
Contributing
Contributions from everyone and anyone are welcome.
Fork this repository, make your changes and create a Pull Request.
Then one of the maintainers will review your changes.
When all comments have been addressed and all tests pass, your changes will
be merged.