I'm implementing a GATK pipeline in Galaxy, following the recommendations from http://www.broadinstitute.org/gsa/wi...th_the_GATK_v3. All the tools are already on the test server (http://test.g2.bx.psu.edu/), and they can be installed locally using the galaxy-central branch. The Picard and GATK tools are labeled "BETA", but in my experience almost everything works.
I'm now trying to configure each tool as described in this post: http://seqanswers.com/forums/showthread.php?t=14038. Thanks to raonyguimaraes for the suggestion, and thanks to ulz_peter for a great document with detailed instructions.
This is what I have so far:
This workflow is for Human (hg_g1k_v37), which is the only genome available on the test server, but it can easily be adapted to any other genome. I couldn't leave the reference genome to be set at runtime because of a bug in 'Workflows'; the Galaxy authors have commented that they are working on it.
These are the steps so far, and I would love to receive comments on them. You can take a detailed look at the link above, and you can even import it into your own history:
Code:
Step 1: Map with BWA for Illumina
Step 2: Filter SAM - filtering by
    Read is paired: Yes
    Read is mapped in a proper pair: Yes
    The read is unmapped: No
Step 3: Replace SAM/BAM Header - because the header is lost during the filtering
Step 4: SAM-to-BAM - this also sorts the BAM file
Step 5: Mark Duplicate reads - I was impressed by how many dupes are being marked
Step 6: Count Covariates - I'm using the option to select the standard covariates, as I don't know which ones I should use for better results. Is there a place where I could find documentation about this?
Step 7: Table Recalibration
Step 8: Analyze Covariates
Step 9: Realigner Target Creator
Step 10: Count Covariates - For the moment I count and analyze covariates before and after to see the differences.
Step 11: Indel Realigner
Step 12: Analyze Covariates
Step 13: Paired Read Mate Fixer
Step 14: Unified Genotyper
Step 15: Variant Annotator
Step 16: Variant Recalibrator
Step 17: Apply Variant Recalibration
Step 18: Variant Filtration
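For anyone unsure what the Step 2 settings actually select, here is a minimal Python sketch of the same test on the SAM FLAG field. It assumes the standard FLAG bits from the SAM specification (0x1 paired, 0x2 proper pair, 0x4 unmapped). The function names are made up for illustration, and this is not the code of Galaxy's Filter SAM tool. Unlike the Galaxy step described above, this sketch passes header lines through unchanged.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1        # read is paired
FLAG_PROPER_PAIR = 0x2   # read is mapped in a proper pair
FLAG_UNMAPPED = 0x4      # the read itself is unmapped


def passes_step2_filter(flag: int) -> bool:
    """Hypothetical helper: True if a read would survive the Step 2 settings
    (paired: Yes, proper pair: Yes, unmapped: No)."""
    is_paired = bool(flag & FLAG_PAIRED)
    is_proper_pair = bool(flag & FLAG_PROPER_PAIR)
    is_unmapped = bool(flag & FLAG_UNMAPPED)
    return is_paired and is_proper_pair and not is_unmapped


def filter_sam_lines(lines):
    """Yield header lines unchanged, plus alignment lines that pass the filter.

    Assumes tab-delimited SAM text with the FLAG in the second column.
    """
    for line in lines:
        if line.startswith("@"):
            # Header line: kept here by choice in this sketch.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if passes_step2_filter(int(fields[1])):
            yield line
```

As a quick check, a FLAG of 99 (paired, proper pair, mate reverse, first in pair) passes, while 77 (paired, unmapped, mate unmapped, first in pair) does not.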
Any help or comments will be highly appreciated.
Thanks,
Carlos
Edits:
Nov 30, 2011
- added steps for tools "Variant Annotator", "Variant Recalibrator", "Apply Variant Recalibration" and "Variant Filtration"
Nov 21, 2011
- added 'Paired Read Mate Fixer' step
- added 'ROD file' binding option for steps 'Count Covariates' and 'Indel Realigner'. For example, I'll be using 'Get Data/UCSC Main' with:
clade: Mammal
genome: Mouse
assembly: July 2007 (NCBI37/mm9)
group: Variations and Repeats
track: SNP (128)
table: snp128
output format: BED - browser extensible data
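Before binding a BED file like this as a ROD, it can help to sanity-check it. Below is a minimal Python sketch for reading such a file, assuming tab-delimited BED with at least three columns (chrom, 0-based start, exclusive end) and optional 'track'/'browser' header lines. The names BedRecord and parse_bed are made up for illustration; this is not part of Galaxy or GATK.

```python
from typing import Iterable, Iterator, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str   # e.g. an rs identifier; empty if the column is absent


def parse_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"BED line has fewer than 3 columns: {line!r}")
        name = fields[3] if len(fields) > 3 else ""
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), name)
```

For instance, counting records per chromosome with collections.Counter(r.chrom for r in parse_bed(open(path))) is a quick way to confirm that the chromosome names match those in the reference genome (path here is whatever file you downloaded).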