  • Galaxy workflow for GATK pipeline [Work in progress]

    I'm implementing a GATK pipeline in Galaxy following the recommendations from http://www.broadinstitute.org/gsa/wi...th_the_GATK_v3. All the tools are already on the test server (http://test.g2.bx.psu.edu/), and they can be installed locally using the galaxy-central branch. The Picard and GATK tools are labeled "BETA", but in my experience almost everything is working.

    This is what I have so far:
    Galaxy is a community-driven web-based analysis platform for life science research.


    This workflow is for Human (hg_g1k_v37), but it can easily be adapted to any other genome, although on the test server that's the only genome available. I couldn't leave the reference genome to be set at runtime because of a bug in 'Workflows'; Galaxy's authors commented that they are working on it.

    These are the steps I have so far, and I would love to receive comments on them. You can take a detailed look at the link above and even import the workflow into your own history:
    Code:
    Step 1: Map with BWA for Illumina
    
    Step 2: Filter SAM
    - filtering by
    Read is paired: Yes
    Read is mapped in a proper pair: Yes
    The read is unmapped: No
    
    Step 3: Replace SAM/BAM Header
    - because the header is lost during the filtering
    
    Step 4: SAM-to-BAM
    - this also sorts the BAM file
    
    Step 5: Mark Duplicate reads
    - I was impressed by how many dupes are being marked
    
    Step 6: Count Covariates
    - I'm using the option to select the standard covariates, as I don't know which ones I should use for better results. Is there a place where I could find documentation about this?
    
    Step 7: Table Recalibration
    
    Step 8: Analyze Covariates
    
    Step 9: Realigner Target Creator
    
    Step 10: Count Covariates
    - For the moment I count and analyze covariates before and after to see the differences.
    
    Step 11: Indel Realigner
    
    Step 12: Analyze Covariates
    
    Step 13: Paired Read Mate Fixer
    
    Step 14: Unified Genotyper
    
    Step 15: Variant Annotator
    
    Step 16: Variant Recalibrator
    
    Step 17: Apply Variant Recalibration
    
    Step 18: Variant Filtration

    I'm now trying to configure each tool as described in this post: http://seqanswers.com/forums/showthread.php?t=14038. Thanks to raonyguimaraes for the suggestion and to ulz_peter for a great document with detailed instructions.
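
    For readers who want to see what the Step 2 filter amounts to outside Galaxy, here is a minimal Python sketch. It assumes plain-text SAM input and uses the standard SAM FLAG bits (0x1 paired, 0x2 proper pair, 0x4 unmapped). The function names are made up for this illustration and are not part of Galaxy or any tool in the workflow. Passing header lines (those starting with "@") through unchanged is a choice of this sketch, not a description of how the Galaxy tool behaves.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4


def keep_read(flag: int) -> bool:
    """Return True for reads that are paired, in a proper pair, and mapped."""
    return (
        bool(flag & FLAG_PAIRED)
        and bool(flag & FLAG_PROPER_PAIR)
        and not (flag & FLAG_UNMAPPED)
    )


def filter_sam_lines(lines):
    """Yield header lines unchanged plus alignment lines that pass keep_read.

    Hypothetical helper for illustration only; it works on an iterable of
    text lines from a SAM file.
    """
    for line in lines:
        if line.startswith("@"):
            # Header line: keep it so the output still has a header.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Blank or malformed line: skip it.
            continue
        if keep_read(int(fields[1])):
            yield line
```

    It could be applied to an open file with something like `kept = list(filter_sam_lines(open("input.sam")))`, where `input.sam` is a placeholder file name.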

    Any help or comments will be highly appreciated.
    Thanks,
    Carlos

    Edits:
    Nov 30, 2011
    - added steps for tools "Variant Annotator", "Variant Recalibrator", "Apply Variant Recalibration" and "Variant Filtration"
    Nov 21, 2011
    - added 'Paired Read Mate Fixer' step
    - added 'ROD file' binding option for steps 'Count Covariates' and 'Indel Realigner'. For example, I'll be using 'Get Data/UCSC Main':
    clade: Mammal
    genome: Mouse
    assembly: July 2007 (NCBI37/mm9)
    group: Variations and Repeats
    track: SNP (128)
    table: snp128
    output format: BED - browser extensible data
    Last edited by Carlos Borroto; 11-30-2011, 08:33 AM.

  • #2
    Thanks a lot Carlos, I was planning to do something similar with this thread http://seqanswers.com/forums/showthread.php?t=14038

    Now I can use your workflow as a starting point!

    For the dbSNP ROD I usually use the VCF file provided by dbSNP. Since you are working with mouse, you have two options: create a VCF file with the SNPs of your organism, or leave this file out of your analysis.



    • #3
      Originally posted by raonyguimaraes
      Thanks a lot Carlos, I was planning to do something similar with this thread http://seqanswers.com/forums/showthread.php?t=14038
      Great document! Thanks for pointing me to it. I'll be adding some modifications to the workflow based on what I'm reading there.

      If you can, please share the modifications you make to this workflow, either here or, better yet, at Galaxy as your own workflow. I would love to keep improving it based on comments from others.



      • #4
        I added a few more steps and also ran into some trouble with the string names used for annotations:



        You will have to either edit the tool XML file in Galaxy so that it lets you select the right annotation, or edit your VCF files to replace the annotation names before continuing with the pipeline. I haven't received a response from the Galaxy devs, so I can't tell what they think is the best approach to solve this issue.
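
        As a rough illustration of the second option (editing the VCF), here is a minimal Python sketch that renames one annotation key. It assumes the annotation lives in the INFO column and is declared in a `##INFO=<ID=...>` header line. The function name and the keys `OLD_KEY` and `NEW_KEY` are placeholders, since the actual annotation names depend on your files and on what the tool expects.

```python
def rename_info_key(line: str, old: str, new: str) -> str:
    """Rename one INFO key in a single VCF line (header or record).

    Hypothetical helper for illustration; `old` and `new` are whatever
    annotation names apply in your case.
    """
    header_prefix = "##INFO=<ID="
    if line.startswith(header_prefix):
        # INFO declaration in the header: rename only an exact ID match.
        declared = header_prefix + old + ","
        if line.startswith(declared):
            return header_prefix + new + "," + line[len(declared):]
        return line
    if line.startswith("#"):
        # Any other header line is left untouched.
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Not a complete VCF record: leave it as is.
        return line
    entries = []
    for entry in fields[7].split(";"):
        # Entries are KEY=VALUE, or a bare KEY for flag annotations.
        key, sep, value = entry.partition("=")
        if key == old:
            key = new
        entries.append(key + sep + value)
    fields[7] = ";".join(entries)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

        Running every line of a VCF through `rename_info_key(line, "OLD_KEY", "NEW_KEY")` and writing the results to a new file would give a copy with the renamed annotation; keep the original file until the downstream tools accept the new one.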
