SEQanswers

05-05-2010, 12:10 PM #1
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Aberrant junctions by TopHat

I have tried to align my paired-end RNA-Seq reads to the genome using TopHat. When I ran a sample dataset from the SRA (SR018268_1 and _2), the data looked fine. However, when I run my own datasets, I get a lot of spurious junctions. The attached example shows the junctions and coverage for one sample. All the exons map beautifully and have coverage > 200X, but almost none of the junctions between these exons were called, so the exons are not joined. Meanwhile, the majority of "junctions" in the dataset (>80%) are intergenic (or intragenic), even with low coverage. For example, the far-left junction is supported by 92 reads, the middle one by 83, and the right one by 2.
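One way to put numbers on how much of the junction set is weakly supported is to tally the score column of TopHat's junctions.bed, which the TopHat documentation describes as the number of alignments spanning each junction. The sketch below assumes a tab-separated BED12 file with an optional track line; the 10-read cutoff (`MIN_SUPPORT`), the helper names, and the example records are all hypothetical, and only the read counts 92, 83 and 2 come from the post.

```python
from typing import Iterable, List, Tuple

# Hypothetical cutoff for illustration only; not a TopHat default.
MIN_SUPPORT = 10

Junction = Tuple[str, int, int, str, int]  # chrom, start, end, name, score


def read_junctions(lines: Iterable[str]) -> List[Junction]:
    """Parse TopHat-style junctions.bed lines, skipping track/comment lines."""
    junctions: List[Junction] = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (BED score) is assumed to hold the number of spanning alignments.
        junctions.append(
            (fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]))
        )
    return junctions


def split_by_support(
    junctions: List[Junction], min_support: int = MIN_SUPPORT
) -> Tuple[List[Junction], List[Junction]]:
    """Split junctions into (well supported, weakly supported) by read count."""
    strong = [j for j in junctions if j[4] >= min_support]
    weak = [j for j in junctions if j[4] < min_support]
    return strong, weak


# Made-up BED12 records reusing the read counts quoted above (92, 83, 2);
# chromosome names and coordinates are invented for illustration.
EXAMPLE_BED = (
    "track name=junctions description=\"TopHat junctions\"\n"
    "chr1\t1000\t2100\tJUNC00000001\t92\t+\t1000\t2100\t255,0,0\t2\t35,35\t0,1065\n"
    "chr1\t2500\t3600\tJUNC00000002\t83\t+\t2500\t3600\t255,0,0\t2\t35,35\t0,1065\n"
    "chr1\t4000\t9100\tJUNC00000003\t2\t+\t4000\t9100\t255,0,0\t2\t30,20\t0,5080\n"
)

if __name__ == "__main__":
    strong, weak = split_by_support(read_junctions(EXAMPLE_BED.splitlines()))
    print(f"{len(strong)} junctions with >= {MIN_SUPPORT} reads, {len(weak)} below")
```

On a real run, `read_junctions(open("tophat_out/junctions.bed"))` would replace the toy string; a simple read-count cutoff only hides weak calls and does not explain why the exon-exon junctions are missing.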

I have tried changing alignment parameters, for example setting -r (the mate inner distance) to either 165 or 41. These values come from the 230 bp fragment size measured on the Bioanalyzer: minus the two read lengths alone (230-35-35, which is actually 160, close to the 165 I used), or minus the read lengths and the primer sequences (230-35-35-119=41). Neither setting changed things much.
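The -r arithmetic above can be written out as a small helper, which makes it easy to check: the first case works out to 160 bp. This is a minimal sketch; the function name and the `adapter_bp` parameter are hypothetical, and it assumes the Bioanalyzer size is the full fragment including any primer/adapter sequence passed in.

```python
def mate_inner_distance(fragment_bp: int, read1_bp: int, read2_bp: int,
                        adapter_bp: int = 0) -> int:
    """Expected gap between mates: fragment length minus both reads and any adapters.

    A negative result would mean the two reads overlap.
    """
    return fragment_bp - read1_bp - read2_bp - adapter_bp


# Values from the post: 230 bp Bioanalyzer size, 2 x 35 bp reads, 119 bp of primer sequence.
print(mate_inner_distance(230, 35, 35))                  # 160
print(mate_inner_distance(230, 35, 35, adapter_bp=119))  # 41
```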

So my questions are:
1) Why aren't these junctions being called by tophat?
2) Why would the junction on the right show up?
3) How do I get past this?
Attached file: hgt_genome_2a53_1cd3d0.pdf (PDF, 6.0 KB)
05-05-2010, 01:30 PM #2
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

How long are these reads?
05-05-2010, 01:31 PM #3
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

These are 2 x 35 bp reads. Here are the mapping-quality (MAPQ) counts from my SAM file (count, then MAPQ value):
632465 0
368741 1
38907221 255
1170126 3


Does this matter?
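A tally like the one above can be produced by counting column 5 (MAPQ) of the SAM file. The sketch below is a minimal version assuming plain-text SAM input; each alignment line is counted once, so a multi-mapped read contributes one count per reported location. It also assumes, as is commonly described for TopHat 1.x, that MAPQ 255 marks uniquely aligned reads; that convention should be checked against the manual for the TopHat version in use. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_mapq(sam_lines: Iterable[str]) -> Counter:
    """Count alignments per MAPQ value (SAM column 5), skipping @ header lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment line
        counts[int(fields[4])] += 1
    return counts


def fraction_with_mapq(counts: Counter, mapq: int = 255) -> float:
    """Fraction of all counted alignments that carry the given MAPQ."""
    total = sum(counts.values())
    return counts[mapq] / total if total else 0.0


# The counts reported above (MAPQ -> number of alignments).
REPORTED = Counter({0: 632465, 1: 368741, 255: 38907221, 3: 1170126})

if __name__ == "__main__":
    print(f"MAPQ 255: {fraction_with_mapq(REPORTED):.3f} of alignments")
```

With the reported numbers, about 94.7% of alignments carry MAPQ 255, so under the assumption above the multi-mapping fraction is small.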

05-05-2010, 03:13 PM #4
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Also, while trying to figure all this out, I made a FASTQ file only 100K long for troubleshooting, and I ran into another problem. Some junctions (true ones) appear only when the small dataset is used and not when the full dataset is used, even though the two runs are otherwise exactly the same.
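A small paired test set like this can be built by taking the first N records from both mate files in lockstep, so that the pairs stay matched. This is a minimal sketch assuming 4-line FASTQ records (no wrapped sequence lines) with mates in the same order in both files; the function names are hypothetical, and the post does not say how the 100K subset was actually made.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    it = iter(lines)  # a file object is already its own iterator
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def head_pairs(mate1: Iterable[str], mate2: Iterable[str],
               n_pairs: int) -> Tuple[List[str], List[str]]:
    """Take the first n_pairs records from both mate files, keeping pairs in sync."""
    out1: List[str] = []
    out2: List[str] = []
    pairs = zip(fastq_records(mate1), fastq_records(mate2))
    for rec1, rec2 in islice(pairs, n_pairs):
        out1.extend(rec1)
        out2.extend(rec2)
    return out1, out2
```

The returned line lists can be written out with `writelines` to two new FASTQ files. Taking the head is the simplest option, but a random subsample would usually be more representative, since FASTQ order tends to follow flow-cell position.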

For instance, the attached figure shows no junctions for this gene at 33,333x coverage, but when I subsample and map only enough reads for 45x coverage, most of the exons are joined together.

Where did they go in the full analysis?

I am using the following command:
Code:
tophat -r 41 -p 6 --solexa1.3-quals hg19 sequence_1 sequence_2
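For reference, the same invocation can be assembled as an argument list, which documents what each flag does and makes it easy to script several -r values as in the first post. This is a sketch: the helper name is hypothetical, and the flag meanings are those of the TopHat options used in the command (-r is --mate-inner-dist, -p is --num-threads, and --solexa1.3-quals marks Illumina 1.3+ quality encoding).

```python
import shlex
from typing import List


def tophat_command(index: str, reads1: str, reads2: str,
                   inner_dist: int, threads: int) -> List[str]:
    """Build the TopHat invocation from the post as an argument list."""
    return [
        "tophat",
        "-r", str(inner_dist),   # --mate-inner-dist: expected gap between mates
        "-p", str(threads),      # --num-threads
        "--solexa1.3-quals",     # qualities use the Illumina 1.3+ encoding
        index,                   # Bowtie index basename, e.g. hg19
        reads1, reads2,          # mate 1 and mate 2 read files
    ]


cmd = tophat_command("hg19", "sequence_1", "sequence_2", inner_dist=41, threads=6)
print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)`; when comparing several -r values, each run would also need its own output directory (TopHat's -o option), otherwise every run writes to the default tophat_out.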
Attached file: hgt_genome_4b6a_1fa0b0.pdf (PDF, 19.5 KB)