SEQanswers

07-20-2011, 08:33 AM   #1
efoss
Member
Location: Seattle
Join Date: Jul 2011
Posts: 98
Going from RNA-seq TopHat output to variant calls

I have RNA-seq data aligned to a reference genome using TopHat. I would now like to take these SAM/BAM files as input and get as output information about the sequence variants: chromosome and base-pair coordinates, where each variant falls (gene name, intron, non-genic region, etc.), what type of mutation it is (SNP, substitution, deletion), what effect it has on the amino acid sequence (frameshift, nonsense, missense, silent, etc.), and ideally also whether the variant has been reported as a known SNP. DNAnexus will do these things, but it's pretty expensive and (I believe) not ideal for RNA-seq with splice junctions. Does anyone have suggestions for useful tools?

Thank you.

Eric

12-09-2011, 12:56 PM   #2
liux
Member
Location: Midwest
Join Date: Mar 2009
Posts: 30

Any progress on this? I would love to see how other people do mRNA-seq variant calling with TopHat output.

12-12-2011, 07:30 AM   #3
Dameon
Member
Location: St. Louis, MO - USA
Join Date: Dec 2011
Posts: 14

I use GATK to call variants from TopHat-aligned BAM files. First, you'll need to add @RG information and sort using Picard tools to prepare the BAM files for GATK; otherwise, it will fail. Depending on what species you are interrogating, you can then realign around indels and recalibrate the quality scores, or go straight to the Unified Genotyper. Because you would be expecting differentially expressed genes with very low and variable coverage across exons, set -stand_emit_conf and -stand_call_conf to something really low, like 2, and then use the variant annotator option (-A, I think) in the Unified Genotyper to add the ReadPosRankSumTest annotation. Take the VCF file generated by GATK and run it through SnpEff (if human, submit the GATK VCF file to SeattleSNP), and then take that VCF file as input to GATK's VariantAnnotator to annotate the raw GATK VCF file. Now filter for what you are interested in. Enjoy.
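
For anyone who wants to script the first two steps above, here is a rough sketch in Python that only builds the command lines and does not run anything. It assumes a GATK release that still ships the Unified Genotyper (3.x or earlier), a single picard.jar, and TopHat's usual accepted_hits.bam output. The jar paths, the read-group values (lib1, unit1, illumina) and the output file names are placeholders I made up for illustration. GATK flags are written in their single-dash short form.

```python
import os
import shlex
from typing import Dict, List


def build_pipeline(bam: str, reference: str, sample: str) -> Dict[str, List[str]]:
    """Return the two command lines as argument lists (nothing is executed)."""
    stem, _ = os.path.splitext(bam)
    rg_bam = stem + ".rg.sorted.bam"   # hypothetical output name
    vcf = stem + ".raw.vcf"            # hypothetical output name

    # Step 1: Picard AddOrReplaceReadGroups writes the @RG header line and
    # coordinate-sorts the reads in the same pass.
    add_read_groups = [
        "java", "-jar", "picard.jar", "AddOrReplaceReadGroups",
        f"INPUT={bam}",
        f"OUTPUT={rg_bam}",
        "SORT_ORDER=coordinate",
        f"RGID={sample}",
        "RGLB=lib1",        # placeholder library name
        "RGPL=illumina",    # placeholder platform
        "RGPU=unit1",       # placeholder platform unit
        f"RGSM={sample}",
    ]

    # Step 2: Unified Genotyper with very low confidence thresholds and the
    # ReadPosRankSumTest annotation requested via -A.
    unified_genotyper = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "UnifiedGenotyper",
        "-R", reference,
        "-I", rg_bam,
        "-o", vcf,
        "-stand_call_conf", "2",
        "-stand_emit_conf", "2",
        "-A", "ReadPosRankSumTest",
    ]

    return {"add_read_groups": add_read_groups,
            "unified_genotyper": unified_genotyper}


if __name__ == "__main__":
    commands = build_pipeline("accepted_hits.bam", "genome.fa", "sample1")
    for name, cmd in commands.items():
        print(f"{name}: {shlex.join(cmd)}")
```

Printing the commands first makes it easy to check them by eye before handing each list to subprocess.run. The optional indel realignment and base quality recalibration steps are left out of this sketch.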

12-12-2011, 07:44 AM   #4
efoss
Member
Location: Seattle
Join Date: Jul 2011
Posts: 98

Hi Dameon,

Thanks very much. I've run GATK with DNA but not RNA. Do you see any problem with using GATK with RNA seq? The Broad Institute people are kind of ambiguous about whether it works with RNA seq. Anyway, I'll give it a try. Thanks for the detailed instructions.

Best,

Eric

12-12-2011, 09:29 AM   #5
Dameon
Member
Location: St. Louis, MO - USA
Join Date: Dec 2011
Posts: 14

The only problem I foresee with using GATK to call variants from RNA-seq data is the filtering. You want to set the Unified Genotyper to be as sensitive as possible (don't worry too much about this, as GATK is very aggressive in calling SNPs by default), and then use as many annotations as possible from VariantAnnotator to whittle the variants down to what you believe to be true SNP calls. It would probably help to use -glm SNP so that you only have to worry about filtering out false-positive SNP calls for now. Let me know how everything turns out.
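
To make the "call sensitively, then whittle down" idea concrete, here is a small stand-alone sketch that filters the lines of a VCF in plain Python. It keeps simple single-base SNPs that pass cutoffs on QUAL, on the DP depth annotation, and on ReadPosRankSum (the INFO key under which GATK reports ReadPosRankSumTest). The three thresholds are made-up starting values for illustration, not recommendations; tune them against your own data.

```python
from typing import Dict, Iterable, Iterator

BASES = {"A", "C", "G", "T"}


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=12;ReadPosRankSum=-0.5' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-style entries get an empty string
    return fields


def filter_snps(lines: Iterable[str],
                min_qual: float = 30.0,          # hypothetical cutoff
                min_depth: int = 8,              # hypothetical cutoff
                max_abs_read_pos: float = 3.0    # hypothetical cutoff
                ) -> Iterator[str]:
    """Yield header lines plus the SNP records that pass all three cutoffs."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header lines through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        ref, alt, qual = cols[3], cols[4], cols[5]
        info = parse_info(cols[7])
        if ref not in BASES or alt not in BASES:
            continue  # indel, multi-allelic site, or missing allele
        if qual == "." or float(qual) < min_qual:
            continue
        if int(info.get("DP", "0")) < min_depth:
            continue
        read_pos = info.get("ReadPosRankSum")
        # Records without the annotation are kept rather than dropped.
        if read_pos is not None and abs(float(read_pos)) > max_abs_read_pos:
            continue
        yield line


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t50.0\t.\tDP=20;ReadPosRankSum=0.4\n",
        "chr1\t400\t.\tT\tC\t60.0\t.\tDP=30;ReadPosRankSum=-5.2\n",
    ]
    for record in filter_snps(example):
        print(record, end="")
```

A large absolute ReadPosRankSum means the alternate allele sits mostly near one end of the reads, which is why it is used here as a rough proxy for alignment artifacts.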

08-07-2012, 10:57 PM   #6
bharati
Member
Location: USA
Join Date: Mar 2012
Posts: 38
Normalization of the aligned data

Do we not need to apply some normalization method before calling variants on mRNA-seq data?

08-08-2012, 05:20 AM   #7
pbluescript
Senior Member
Location: Boston
Join Date: Nov 2009
Posts: 224

This is a tricky problem, and simply using TopHat with GATK will give you an incredible number of false positives.
Read the comments on this paper to get an idea of the issues, as well as some methods for dealing with them:
http://www.sciencemag.org/content/333/6038/53.abstract

Here are several other papers that deal with this issue:
http://genomebiology.com/2012/13/4/r26
http://www.nature.com/nmeth/journal/...meth.1982.html
http://rnajournal.cshlp.org/content/...3.112.abstract

There are more out there too, but the basic idea is that if you want to call variants from RNA Seq data, you have to be very careful.

10-28-2013, 05:04 AM   #8
sindrle
Senior Member
Location: Norway
Join Date: Aug 2013
Posts: 266

One question: how will the difference between single-end and paired-end sequencing affect SNP calls in mRNA-seq?

10-28-2013, 06:13 AM   #9
crazyhottommy
Senior Member
Location: Gainesville
Join Date: Apr 2012
Posts: 140

You may want to have a look at this: http://allaboutbioinfo.blogspot.com/...53107057687822

10-28-2013, 07:20 AM   #10
sindrle
Senior Member
Location: Norway
Join Date: Aug 2013
Posts: 266

That's fantastic!
Do you have any more good things to read like this one?

Thanks a lot!

10-28-2013, 09:55 AM   #11
crazyhottommy
Senior Member
Location: Gainesville
Join Date: Apr 2012
Posts: 140

Well, several more here:
http://www.rna-seqblog.com/technolog...-rna-seq-data/

http://www.rna-seqblog.com/technolog...ity-filtering/

http://www.rna-seqblog.com/technolog...tide-variants/

http://www.rna-seqblog.com/news/comm...s-in-rna-data/


10-28-2013, 11:33 AM   #12
sindrle
Senior Member
Location: Norway
Join Date: Aug 2013
Posts: 266

It's too bad I'm so tired from courses today; this was really inspiring! Will read it all tomorrow.

Thank you!!

11-11-2013, 01:15 AM   #13
sindrle
Senior Member
Location: Norway
Join Date: Aug 2013
Posts: 266

Quote:
Originally Posted by Dameon
I use GATK to call variants from TopHat-aligned BAM files. First, you'll need to add @RG information and sort using Picard tools to prepare the BAM files for GATK; otherwise, it will fail. Depending on what species you are interrogating, you can then realign around indels and recalibrate the quality scores, or go straight to the Unified Genotyper. Because you would be expecting differentially expressed genes with very low and variable coverage across exons, set -stand_emit_conf and -stand_call_conf to something really low, like 2, and then use the variant annotator option (-A, I think) in the Unified Genotyper to add the ReadPosRankSumTest annotation. Take the VCF file generated by GATK and run it through SnpEff (if human, submit the GATK VCF file to SeattleSNP), and then take that VCF file as input to GATK's VariantAnnotator to annotate the raw GATK VCF file. Now filter for what you are interested in. Enjoy.
Why SeattleSNP instead of SnpEff for humans? And which software from SeattleSNP are you referring to?