**liux** · 12-09-2011, 01:56 PM

Any progress on this? I would love to see how other people do mRNA-seq variant calling with TopHat output.

**Dameon** · 12-12-2011, 08:30 AM

I use GATK to call variants from TopHat-aligned BAM files. First, you'll need to add @RG (read group) information and sort the BAM files with Picard tools so that they are configured for GATK; otherwise, it will fail. Depending on what species you are interrogating (base quality recalibration needs a set of known variant sites, which not every species has), you can then realign around indels and recalibrate the quality scores, or go straight to the UnifiedGenotyper. Because expression differs widely between genes, coverage will be very low and variable across many exons, so set -stand_emit_conf and -stand_call_conf to something really low, like 2, and then use the annotation option (-A, I think) in the UnifiedGenotyper to add the ReadPosRankSumTest annotation. Take the VCF file generated by GATK and run it through SnpEff (if human, submit the GATK VCF file to SeattleSNP), and then use that VCF file as input to GATK's VariantAnnotator to annotate the raw GATK VCF file. Now filter for what you are interested in. Enjoy.
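A minimal sketch of the first steps described above, written in Python only to assemble the command lines. It assumes GATK 2/3-era syntax (the UnifiedGenotyper is not part of newer GATK releases), a `picard.jar` that accepts the `AddOrReplaceReadGroups` tool name, and placeholder jar paths, read-group values, and file names; none of these come from the post. The `phred_to_error` helper shows why a threshold of 2 is so permissive: GATK's calling and emitting thresholds are Phred-scaled confidences, so Q = 2 corresponds to roughly a 63% chance that a call is wrong.

```python
import shlex

# Hypothetical jar locations; adjust to your installation.
PICARD_JAR = "picard.jar"
GATK_JAR = "GenomeAnalysisTK.jar"


def phred_to_error(q):
    """Convert a Phred-scaled confidence Q into the probability that the call is wrong."""
    return 10 ** (-q / 10)


def build_pipeline(bam, ref, sample, prefix, conf=2):
    """Return argv lists: add @RG and sort with Picard, then call with GATK 2/3 UnifiedGenotyper."""
    rg_bam = f"{prefix}.rg.sorted.bam"
    add_rg = [
        "java", "-jar", PICARD_JAR, "AddOrReplaceReadGroups",
        f"I={bam}", f"O={rg_bam}",
        "SORT_ORDER=coordinate", "CREATE_INDEX=true",
        # Placeholder read-group values; RGSM should be the real sample name.
        "RGID=1", "RGLB=lib1", "RGPL=illumina", "RGPU=unit1", f"RGSM={sample}",
    ]
    call = [
        "java", "-jar", GATK_JAR, "-T", "UnifiedGenotyper",
        "-R", ref, "-I", rg_bam,
        # Very permissive thresholds for low, uneven RNA-seq coverage.
        "-stand_call_conf", str(conf), "-stand_emit_conf", str(conf),
        # Add the read-position rank-sum annotation for later filtering.
        "-A", "ReadPosRankSumTest",
        "-o", f"{prefix}.raw.vcf",
    ]
    return [add_rg, call]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for argv in build_pipeline("accepted_hits.bam", "ref.fa", "sample1", "sample1"):
        print(shlex.join(argv))
    print(f"Q=2 -> P(wrong call) = {phred_to_error(2):.2f}")
```

Printing the commands instead of calling `subprocess.run` keeps the sketch safe to try before the jars and reference files are in place.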

**efoss** · 12-12-2011, 08:44 AM

Hi Dameon,

Thanks very much. I've run GATK on DNA but not RNA. Do you see any problems with using GATK on RNA-seq data? The Broad Institute people are somewhat ambiguous about whether it works with RNA-seq. Anyway, I'll give it a try. Thanks for the detailed instructions.

Best,

Eric

**Dameon** · 12-12-2011, 10:29 AM

The only problem I foresee with using GATK to call variants from RNA-seq data is the filtering. Set the UnifiedGenotyper to be as sensitive as possible (don't worry about this, as GATK is very aggressive in calling SNPs by default), and then use as many VariantAnnotator options as possible to whittle the variants down to what you believe are true SNP calls. It would probably help to use -glm SNP (--genotype_likelihoods_model SNP) so that, for now, you only have to worry about filtering false-positive SNP calls. Let me know how everything turns out.
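A minimal sketch of the kind of post-calling filtering suggested above, applied to plain VCF text: keep only single-base SNPs, require a minimum QUAL, and drop sites whose ReadPosRankSum annotation (added with `-A ReadPosRankSumTest`) is strongly negative, meaning the alternate allele shows up near read ends more often than the reference allele. The cutoffs (QUAL 30, ReadPosRankSum below -8.0) are assumptions for illustration, not values from the thread; in a real pipeline GATK's own VariantFiltration would normally do this job.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag fields (no '=') map to True."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def is_snp(ref, alt):
    """True when REF and every ALT allele are single bases."""
    if alt == ".":
        return False
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def filter_vcf(lines, min_qual=30.0, min_read_pos=-8.0):
    """Keep header lines plus SNP records that pass QUAL and ReadPosRankSum cutoffs."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        ref, alt, qual, info = cols[3], cols[4], cols[5], cols[7]
        if not is_snp(ref, alt):
            continue  # drop indels and records without an ALT allele
        if qual == "." or float(qual) < min_qual:
            continue
        rprs = parse_info(info).get("ReadPosRankSum")
        # The annotation is absent when no reads carry the reference allele; keep those sites.
        if isinstance(rprs, str) and float(rprs) < min_read_pos:
            continue
        kept.append(line)
    return kept
```

For example, with one record at QUAL 50 and ReadPosRankSum=-1.2 and another at QUAL 60 and ReadPosRankSum=-9.5, only the first is kept.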

**bharati** · 08-07-2012, 10:57 PM

normalization of the aligned data

Don't we need to apply some normalization method before calling variants on mRNA-seq data?

**pbluescript** · 08-08-2012, 05:20 AM

This is a tricky problem, and simply running TopHat followed by GATK will give you a very large number of false positives.
Read the comments on this paper to get an idea of the issues, as well as some methods for dealing with them:

http://www.sciencemag.org/content/333/6038/53.abstract

Here are several other papers that deal with this issue:

http://genomebiology.com/2012/13/4/r26

http://www.nature.com/nmeth/journal/v9/n6/full/nmeth.1982.html

RNA editing of protein sequences: A rare event in human transcriptomes

http://rnajournal.cshlp.org/content/early/2012/07/25/rna.033233.112.abstract

There are more out there too, but the basic idea is that if you want to call variants from RNA Seq data, you have to be very careful.
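The papers above discuss RNA-seq-specific sources of false positives. As one illustrative example of a sequence-context filter, the sketch below flags candidate sites that fall in or directly next to a homopolymer run in the reference sequence. The minimum run length (5) and flank (1 base) are assumptions chosen for the example, not values taken from these papers; real pipelines typically combine several filters of this kind.

```python
def homopolymer_runs(seq, min_run=5):
    """Return 0-based half-open (start, end) spans of single-base runs of length >= min_run."""
    runs = []
    seq = seq.upper()
    i = 0
    while i < len(seq):
        j = i
        # Extend j to the end of the run of identical bases starting at i.
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if j - i >= min_run:
            runs.append((i, j))
        i = j
    return runs


def near_homopolymer(seq, pos, min_run=5, flank=1):
    """True if 1-based position `pos` lies within `flank` bases of a qualifying run."""
    p = pos - 1
    lo, hi = p - flank, p + flank + 1  # half-open window around the site
    return any(start < hi and end > lo for start, end in homopolymer_runs(seq, min_run))
```

With the reference `ACGTAAAAAGCTAGCT`, position 10 (the G right after the AAAAA run) is flagged because it borders the run, while position 14 is not.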

**sindrle** · 10-28-2013, 05:04 AM

One question: how does single-end versus paired-end sequencing affect SNP calling in mRNA-seq?

**crazyhottommy** · 10-28-2013, 06:13 AM

You may want to have a look at this: http://allaboutbioinfo.blogspot.com/...53107057687822

**sindrle** · 10-28-2013, 07:20 AM

That's fantastic!
Do you have any more good things to read like this one?

Thanks a lot!

**crazyhottommy** · 10-28-2013, 09:55 AM

Well, here are several more:

http://www.rna-seqblog.com/technology/methods/data-analysis/snp-detection/snpir-identification-of-genomic-variants-from-rna-seq-data/

http://www.rna-seqblog.com/technology/methods/data-analysis/unspliced-mapping-tools/blackops-increasing-confidence-in-variant-detection-through-mappability-filtering/

http://www.rna-seqblog.com/technology/publications/analysis-and-design-of-rna-sequencing-experiments-for-identifying-rna-editing-and-other-single-nucleotide-variants/

http://www.rna-seqblog.com/news/commentary/seqing-snps-in-rna-data/

**sindrle** · 10-28-2013, 11:33 AM

It's too bad I'm so tired from classes today; this was really inspiring! I'll read it all tomorrow.

Thank you!!

**sindrle** · 11-11-2013, 02:15 AM

Originally posted by Dameon

... Take the VCF file generated by GATK and run it through SNPeff (if human, submit the GATK vcf file to SeattleSNP)and then take that vcf file as raw input to GATK's VariantAnnotator to annotate the raw GATK vcf file. ...

Why SeattleSNP instead of SnpEff for humans? And which SeattleSNP software are you referring to?

going from RNA seq TopHat output to variant calls