SEQanswers

Old 11-16-2011, 03:13 PM   #1
Carlos Borroto
Member
 
Location: Baltimore, MD

Join Date: Mar 2011
Posts: 19
Galaxy workflow for GATK pipeline [Work in progress]

I'm implementing the GATK pipeline in Galaxy, following the recommendations from http://www.broadinstitute.org/gsa/wi...th_the_GATK_v3. All the tools are already on the test server (http://test.g2.bx.psu.edu/), and they can be installed locally using the galaxy-central branch. The Picard and GATK tools are labeled "BETA", but in my experience almost everything works.

This is what I have so far:
http://test.g2.bx.psu.edu/u/cjav/w/gatk

This workflow is for Human (hg_g1k_v37), but it can easily be adapted to any other genome, although that is the only genome available on the test server. I couldn't leave the reference genome to be set at runtime because of a bug in 'Workflows'; the Galaxy authors have commented that they are working on it.

These are the steps I have so far, and I would love to receive comments on them. You can take a detailed look at the link above and even import the workflow into your own history:
Code:
Step 1: Map with BWA for Illumina

Step 2: Filter SAM
- filtering by
Read is paired: Yes
Read is mapped in a proper pair: Yes
The read is unmapped: No

Step 3: Replace SAM/BAM Header
- because the header is lost during the filtering

Step 4: SAM-to-BAM
- this also orders the BAM file

Step 5: Mark Duplicate reads
- I was impressed by how many dupes are being marked

Step 6: Count Covariates
- I'm using the option to select the standard covariates, as I don't know which ones I should use for better results. Is there a place where I could find documentation about this?

Step 7: Table Recalibration

Step 8: Analyze Covariates

Step 9: Realigner Target Creator

Step 10: Count Covariates
- For the moment I count and analyze covariates before and after to see the differences.

Step 11: Indel Realigner

Step 12: Analyze Covariates

Step 13: Paired Read Mate Fixer

Step 14: Unified Genotyper

Step 15: Variant Annotator

Step 16: Variant Recalibrator

Step 17: Apply Variant Recalibration

Step 18: Variant Filtration
I'm now trying to configure each tool as described in this post (http://seqanswers.com/forums/showthread.php?t=14038). Thanks to raonyguimaraes for the suggestion, and thanks to ulz_peter for a great document with detailed instructions.
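
For reference, the three settings in Step 2 map onto bits of the SAM FLAG field (0x1 paired, 0x2 proper pair, 0x4 unmapped, as defined in the SAM specification). The sketch below is only an illustration of that logic in plain Python, not the code Galaxy's Filter SAM tool runs; the function names are made up for this example. Unlike the tool behaviour noted in Step 3, this sketch passes header lines through unchanged.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1        # read is paired
PROPER_PAIR = 0x2   # read is mapped in a proper pair
UNMAPPED = 0x4      # the read is unmapped


def passes_step2_filter(flag):
    """Return True when FLAG matches the Step 2 settings:
    paired = Yes, proper pair = Yes, unmapped = No."""
    required = PAIRED | PROPER_PAIR
    return (flag & required) == required and not (flag & UNMAPPED)


def filter_sam_lines(lines):
    """Yield header lines unchanged plus the alignments that pass.

    Hypothetical helper for illustration; expects SAM text lines.
    """
    for line in lines:
        if line.startswith("@"):
            # Header line: keep as-is.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Not a usable alignment line; skip it.
            continue
        if passes_step2_filter(int(fields[1])):
            yield line
```

On the command line, the same selection should correspond to `samtools view -f 3 -F 4`, where `-f` lists bits that must be set and `-F` lists bits that must be clear.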

Any help or comments will be highly appreciated.
Thanks,
Carlos

Edits:
Nov 30, 2011
- added steps for tools "Variant Annotator", "Variant Recalibrator", "Apply Variant Recalibration" and "Variant Filtration"
Nov 21, 2011
- added 'Paired Read Mate Fixer' step
- added the 'ROD file' binding option for the 'Count Covariates' and 'Indel Realigner' steps. For example, I'll be using 'Get Data/UCSC Main' with:
clade: Mammal
genome: Mouse
assembly: July 2007 (NCBI37/mm9)
group: Variations and Repeats
track: SNP (128)
table: snp128
output format: BED - browser extensible data

Last edited by Carlos Borroto; 11-30-2011 at 07:33 AM.
Old 11-16-2011, 03:41 PM   #2
raonyguimaraes
Member
 
Location: Belo Horizonte - Brazil

Join Date: Jun 2010
Posts: 38

Thanks a lot Carlos, I was planning to do something similar with this thread http://seqanswers.com/forums/showthread.php?t=14038

Now I can use your workflow as a starting point!

For the dbSNP ROD I usually use the VCF file provided by dbSNP. Since you are working with mouse, you have two options: create a VCF file with the SNPs of your organism, or leave this file out of your analysis.
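
One detail to watch if a BED export is used as the starting point for such a VCF: BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below shows only that coordinate conversion for single-base records; the function name is made up for this example. A BED line carries coordinates and a name but no REF/ALT columns, so this alone is not enough to write a valid VCF, and the alleles would have to come from another source.

```python
def bed_snps_to_positions(lines):
    """Yield (chrom, pos, name) with a 1-based pos for single-base BED records.

    Hypothetical helper for illustration. Records that do not span exactly
    one base (for example zero-length insertions) are skipped.
    """
    for line in lines:
        # Skip blank lines and BED track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        if end - start != 1:
            continue
        # BED start is 0-based, so the 1-based position is start + 1.
        yield chrom, start + 1, name
```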
Old 11-21-2011, 08:53 AM   #3
Carlos Borroto
Member
 
Location: Baltimore, MD

Join Date: Mar 2011
Posts: 19

Quote:
Originally Posted by raonyguimaraes
Thanks a lot Carlos, I was planning to do something similar with this thread http://seqanswers.com/forums/showthread.php?t=14038
Great document! Thanks for pointing me to it. I'll be adding some modifications to the workflow based on what I'm reading there.

If you can, please share the modifications you make to this workflow, either here or, better yet, on Galaxy as your own workflow. I would love to keep improving it based on comments from others.
Old 11-30-2011, 07:41 AM   #4
Carlos Borroto
Member
 
Location: Baltimore, MD

Join Date: Mar 2011
Posts: 19

I added a few more steps and also ran into some trouble with the string names used for annotations:
http://getsatisfaction.com/gsa/topic...nd_short_names
https://bitbucket.org/galaxy/galaxy-...tor-error-with

You will have to either edit the tool's XML file in Galaxy so that it lets you select the right annotation, or edit your VCF files to replace the annotation names before continuing with the pipeline. I haven't received a response from the Galaxy devs yet, so I can't tell which approach they think is best for solving this issue.
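
A minimal sketch of the second option (renaming annotations inside the VCF), assuming the annotations in question are INFO keys. The function name and the `OldName`/`NewName` pair in the usage note are placeholders; the real names are the ones discussed in the linked reports.

```python
import re

# Matches the ID of an INFO declaration in the VCF header.
_INFO_ID = re.compile(r"^(##INFO=<ID=)([^,>]+)")


def rename_info_keys(lines, mapping):
    """Yield VCF lines with INFO keys renamed according to `mapping`.

    Hypothetical helper for illustration. `mapping` is a dict of
    {old_key: new_key}; keys not in the dict are left unchanged.
    """
    for line in lines:
        if line.startswith("##INFO="):
            # Rename the ID in the header declaration.
            yield _INFO_ID.sub(
                lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
                line,
            )
        elif line.startswith("#"):
            # Other header lines pass through untouched.
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            # INFO is the 8th column; "." means no annotations.
            if len(fields) >= 8 and fields[7] != ".":
                entries = []
                for entry in fields[7].split(";"):
                    key, sep, value = entry.partition("=")
                    entries.append(mapping.get(key, key) + sep + value)
                fields[7] = ";".join(entries)
            yield "\t".join(fields) + "\n"
```

Usage would look like `rename_info_keys(open("in.vcf"), {"OldName": "NewName"})`, writing the yielded lines to a new file and leaving the original VCF untouched.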
Tags
galaxy, gatk, snp

All times are GMT -8.