**dpryan** · 10-06-2013, 11:52 AM

To be fair, did you give tophat a reference GTF/GFF to use? If so, you're asking it to align against the transcriptome first and then convert those coordinates back to the genome. You're rather likely to get results like this by doing that (yes, it's probably better to simply soft-clip that one base, but you didn't use local-alignment and, anyway, it then matched the transcriptome).

**drdna** · 10-06-2013, 12:37 PM

Tophat2 is stupid

Originally posted by dpryan:

> To be fair, did you give tophat a reference GTF/GFF to use? If so, you're asking it to align against the transcriptome first and then convert those coordinates back to the genome. You're rather likely to get results like this by doing that (yes, it's probably better to simply soft-clip that one base, but you didn't use local-alignment and, anyway, it then matched the transcriptome).

Nope, no reference provided. These were reads mapped to a viral genome. When I imported the .bam file into IGV, it showed a fraction of reads that were predicted to be spliced. When I followed up on these, they turned out to be bogus because all of the reads contained a single nucleotide on one side of the predicted splice site. It turns out that Tophat2 looks for putative intron splice sites and automatically assumes that the introns are valid as long as it can align at least ONE nucleotide on the other side of the splice junction. How stupid is that?
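For anyone who wants to screen their own alignments for this, here is a minimal sketch of the check described above: it reads a SAM CIGAR string and reports how many aligned bases sit on each side of every `N` (skipped region) operation. The function names and the `min_anchor=8` cutoff are my own illustrative choices, not anything taken from Tophat2, and the sketch counts only `M`, `=` and `X` operations as anchor bases.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junction_anchors(cigar):
    """Return (left_anchor, right_anchor) for each N operation in a CIGAR.

    An anchor is the number of aligned read bases (M, = or X) between a
    junction and the neighbouring junction or read end. Insertions and
    clipped bases are deliberately not counted (a simplifying assumption).
    """
    segments = [0]  # aligned bases in each exon-like block
    for length, op in CIGAR_RE.findall(cigar):
        if op == "N":
            segments.append(0)  # a skipped region starts a new block
        elif op in "M=X":
            segments[-1] += int(length)
    return [(segments[i], segments[i + 1]) for i in range(len(segments) - 1)]


def has_short_anchor(cigar, min_anchor=8):
    """True if any junction has fewer than min_anchor bases on either side.

    min_anchor=8 is an arbitrary illustrative threshold.
    """
    return any(min(left, right) < min_anchor
               for left, right in junction_anchors(cigar))


if __name__ == "__main__":
    for cigar in ("99M500N1M", "50M500N50M", "100M"):
        print(cigar, junction_anchors(cigar), has_short_anchor(cigar))
```

A read with a CIGAR like `99M500N1M` is exactly the case above: one base on the far side of the predicted junction. In practice the CIGAR would come from column 6 of the SAM records (for example via `samtools view`), and the flagged reads could be dropped or inspected by hand.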

**dpryan** · 10-06-2013, 12:44 PM

Ah, well, those really are junk then. Predicting splicing based on a single base without a reference annotation seems like a bad idea! Thanks for the heads-up, and you might consider filing a bug report for tophat.

**gringer** · 10-06-2013, 09:33 PM

Using tophat for a viral genome seems a little odd -- I wasn't aware that viruses had introns.

**drdna** · 10-07-2013, 12:38 AM

Originally posted by gringer:

> Using tophat for a viral genome seems a little odd -- I wasn't aware that viruses had introns.

Viruses undergo RNA recombination which produces molecules that are identical in structure to spliced transcripts. The only difference is that the border sequences don't match consensus splice junctions. Therefore, I was looking for reads that mapped to different regions of the viral genome in the same manner as would an intron-spanning read.

**CompBio** · 10-07-2013, 02:37 AM

You might try a tool such as MapSplice instead. It uses statistics based on the alignments themselves (such as distribution of reads across a junction) rather than any sequence-related measures to validate spliced alignments. It's not hard to use, but can be finicky and can consume its fair share of disk space.
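To give a feel for what an alignment-based measure can look like, here is a rough sketch that scores a junction by how varied the supporting reads' positions are. This is not MapSplice's code or its exact statistic; the function name and the choice of Shannon entropy over left-anchor lengths are assumptions made purely for illustration.

```python
import math
from collections import Counter


def position_entropy(left_anchors):
    """Shannon entropy (in bits) of the left-anchor lengths of the reads
    supporting one junction.

    Many reads that all cross the junction at the same offset give 0.0;
    reads spread over many different offsets give a higher value.
    """
    counts = Counter(left_anchors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Twenty reads, every one with 99 bases before the junction.
    print(position_entropy([99] * 20))
    # Four reads crossing the junction at four different offsets.
    print(position_entropy([10, 20, 30, 40]))
```

A junction supported only by reads with the same one-base overhang would score zero here no matter how many reads pile up on it, which is the kind of signal that read counts alone miss.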

Tophat2 produces thousands of invalid alignments