
  • going from RNA seq TopHat output to variant calls

    I have RNA seq data aligned to a reference genome using TopHat. I would now like to take these SAM/BAM files as an input and get as an output information for where sequence variants are - chromosome, base pair coordinates, where the variants are (gene names, introns, non-genic regions, etc.), what type of mutations they are (SNPs, substitutions, deletions), what effect they have on amino acid sequences (frame shift, nonsense, missense, silent, etc.) and ideally also whether the variant has been reported as a SNP. DNAnexus will do these things but it's pretty expensive and (I believe) not ideal for RNA seq with splice junctions. Does anyone have suggestions for useful tools?

    Thank you.

    Eric

  • #2
    Any progress on this? I would love to see how other people do mRNA-seq variant calling with TopHat output.



    • #3
      I use GATK to call variants from TopHat-aligned BAM files. First, you'll need to add @RG information and sort using Picard tools so that the BAM files are configured for GATK; otherwise, it will fail. Depending on what species you are interrogating, you can then realign around indels and recalibrate the quality scores, or go straight to the Unified Genotyper. Because you would be expecting differentially expressed genes with very low and variable coverage across exons, set --stand_emit_conf and --stand_call_conf to something really low, like 2, and then use the annotation option (-A, I think) in the Unified Genotyper to add the ReadPosRankSumTest annotation. Take the VCF file generated by GATK and run it through SnpEff (if human, submit the GATK VCF file to SeattleSNP), and then give the resulting VCF file to GATK's VariantAnnotator to annotate the raw GATK VCF file. Now filter for what you are interested in. Enjoy.
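For anyone who wants to script the first two steps described above, here is a rough Python sketch that only builds the command lines (it does not run anything). The jar paths, file names and read-group values are hypothetical placeholders. The option spellings are the ones I know from Picard's legacy KEY=VALUE syntax and from GATK 2.x/3.x-era UnifiedGenotyper, so please check them against the versions you actually have installed.

```python
from typing import Dict, List


def picard_add_read_groups_cmd(picard_jar: str, in_bam: str, out_bam: str,
                               read_group: Dict[str, str]) -> List[str]:
    """Build a Picard AddOrReplaceReadGroups command that also coordinate-sorts.

    Assumes a single picard.jar with the tool name as first argument; older
    Picard releases shipped one jar per tool instead.
    """
    cmd = ["java", "-jar", picard_jar, "AddOrReplaceReadGroups",
           f"INPUT={in_bam}", f"OUTPUT={out_bam}", "SORT_ORDER=coordinate"]
    # These five keys fill in the @RG header line that GATK insists on.
    for key in ("RGID", "RGLB", "RGPL", "RGPU", "RGSM"):
        cmd.append(f"{key}={read_group[key]}")
    return cmd


def unified_genotyper_cmd(gatk_jar: str, reference: str, in_bam: str,
                          out_vcf: str, min_conf: int = 2) -> List[str]:
    """Build a UnifiedGenotyper command with low confidence thresholds."""
    return ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper",
            "-R", reference, "-I", in_bam, "-o", out_vcf,
            # Low thresholds, as suggested above for low-coverage exons.
            "-stand_call_conf", str(min_conf),
            "-stand_emit_conf", str(min_conf),
            # Request the read-position rank-sum annotation for later filtering.
            "-A", "ReadPosRankSumTest"]


if __name__ == "__main__":
    # Hypothetical sample metadata and file names, for illustration only.
    rg = {"RGID": "lane1", "RGLB": "lib1", "RGPL": "illumina",
          "RGPU": "unit1", "RGSM": "sample1"}
    print(" ".join(picard_add_read_groups_cmd(
        "picard.jar", "accepted_hits.bam", "accepted_hits.rg.bam", rg)))
    print(" ".join(unified_genotyper_cmd(
        "GenomeAnalysisTK.jar", "ref.fasta", "accepted_hits.rg.bam", "raw.vcf")))
```

Each function returns an argument list, so it can be passed to subprocess.run(cmd, check=True) once the paths are real. Note that GATK also expects an index for the sorted BAM, which this sketch does not create.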



      • #4
        Hi Dameon,

        Thanks very much. I've run GATK with DNA but not RNA. Do you see any problem with using GATK with RNA seq? The Broad Institute people are kind of ambiguous about whether it works with RNA seq. Anyway, I'll give it a try. Thanks for the detailed instructions.

        Best,

        Eric



        • #5
          The only problem I foresee with using GATK to call variants from RNA-seq data is the filtering. You want to set the Unified Genotyper to be as sensitive as possible. Don't worry too much about this, as GATK is very aggressive in calling SNPs by default. Then use as many annotations as possible from VariantAnnotator to whittle the variants down to what you believe to be true SNP calls. It would probably help to use --glm SNP so that you only have to worry about filtering false-positive SNP calls for now. Let me know how everything turns out.
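As an illustration of the "call sensitively, then whittle down" idea above, here is a small Python sketch that filters an annotated VCF on QUAL, DP and ReadPosRankSum. The function names and all three thresholds are made up for the example and are not recommended values; tune them on your own data. It only keeps simple biallelic SNPs, and it treats a missing ReadPosRankSum as "not testable" rather than as a failure, since as far as I know that annotation is only emitted when both reference and alternate reads are present.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a dict; flag-only entries map to ''."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_snps(lines: Iterable[str], min_qual: float = 30.0,
                min_depth: int = 8,
                min_read_pos_rank_sum: float = -8.0) -> Iterator[str]:
    """Yield VCF data lines for biallelic SNPs passing the example thresholds."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        ref, alt, qual = cols[3], cols[4], cols[5]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # drop indels, multi-allelic sites and non-variant rows
        if qual == "." or float(qual) < min_qual:
            continue
        info = parse_info(cols[7])
        if int(info.get("DP", "0")) < min_depth:
            continue
        rank_sum = info.get("ReadPosRankSum")
        if rank_sum is not None and float(rank_sum) < min_read_pos_rank_sum:
            continue  # alternate allele is biased toward read ends
        yield line


if __name__ == "__main__":
    # Tiny invented records, only to show the behaviour.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50.0\t.\tDP=20;ReadPosRankSum=0.5",
        "chr1\t200\t.\tA\tG\t5.0\t.\tDP=20;ReadPosRankSum=0.5",
        "chr1\t300\t.\tC\tT\t60.0\t.\tDP=3",
        "chr1\t400\t.\tG\tA\t60.0\t.\tDP=30;ReadPosRankSum=-12.1",
        "chr1\t500\t.\tAT\tA\t60.0\t.\tDP=30",
        "chr1\t600\t.\tT\tC\t45.0\t.\tDP=12",
    ]
    for record in filter_snps(example):
        print(record)
```

With the invented records above, only the sites at positions 100 and 600 survive. GATK's own VariantFiltration can do the same kind of thing inside the toolkit; the script is just a transparent way to experiment with cutoffs.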



          • #6
            Normalization of the aligned data

            Do we not need to apply some normalization method before calling variants on mRNA-seq data?



            • #7
              This is a tricky problem, and simply using TopHat with GATK will give you an incredible number of false positives.
              Read the comments on this paper to get an idea of the issues as well as some methods to deal with it:


              Here are several other papers that deal with this issue:


              A monthly journal publishing high-quality, peer-reviewed research on all topics related to RNA and its metabolism in all organisms


              There are more out there too, but the basic idea is that if you want to call variants from RNA Seq data, you have to be very careful.



              • #8
                One question: how will the difference between single-end and paired-end sequencing affect SNP calling in mRNA-seq?



                • #9
                  You may want to have a look at this: http://allaboutbioinfo.blogspot.com/...53107057687822



                  • #10
                    That's fantastic!
                    Do you have any more good things to read like this one?

                    Thanks a lot!



                    • #12
                      It's too bad I'm so tired from courses today; this was really inspiring! I will read it all tomorrow.

                      Thank you!!



                      • #13
                        Why SeattleSNP instead of SnpEff for humans? And which software from SeattleSNP are you referring to?
