SEQanswers > Bioinformatics > Bioinformatics

10-19-2011, 01:14 PM   #1
Location: San Diego

Join Date: Apr 2009
Posts: 12
counting junction reads in TopHat

I am running TopHat and using the accepted_hits.bam file to generate counts for genes.
How does TopHat represent junctions in this BAM file? Are they reported as exons?
How are people generating read counts for genes that include both the junction reads and the exon reads?
schaffer
10-20-2011, 02:54 PM   #2
Senior Member
Location: Germany

Join Date: Feb 2011
Posts: 108

If you just want to use the junctions directly, then in addition to the BAM file you should also have junctions, insertions and deletions BED files (junctions.bed, insertions.bed and deletions.bed in the TopHat output directory).
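As a minimal sketch of reading junctions.bed directly: as I understand the TopHat manual, each line is a BED12 record with two blocks (the maximal read overhangs on either side of the junction), the gap between the blocks is the intron, and the score column holds the number of alignments spanning the junction. The code assumes exactly that layout, and the names `Junction`, `parse_junction_line` and `read_junctions` are hypothetical; check it against the output of your own TopHat version.

```python
from typing import Iterator, NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # 0-based, first intronic base
    intron_end: int    # 0-based, exclusive (= first base of the downstream exon)
    strand: str
    reads: int         # BED score column: alignments spanning the junction (assumed)


def parse_junction_line(line: str) -> Junction:
    """Parse one BED12 line from TopHat's junctions.bed (assumed layout)."""
    f = line.rstrip("\n").split("\t")
    chrom_start = int(f[1])
    block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # The two blocks are the read overhangs; the gap between them is the intron.
    intron_start = chrom_start + block_starts[0] + block_sizes[0]
    intron_end = chrom_start + block_starts[1]
    return Junction(f[0], intron_start, intron_end, f[5], int(f[4]))


def read_junctions(path: str) -> Iterator[Junction]:
    """Yield every junction in a junctions.bed file, skipping the track header."""
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("track"):
                yield parse_junction_line(line)
```

For example, a line with chromStart 1000, blockSizes 30,50 and blockStarts 0,530 gives intron_start 1030 and intron_end 1530 (a 500 bp intron in 0-based, half-open coordinates) supported by as many reads as the score column says.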

If you want to understand how junctions are represented, you should look at the SAM format specification. You can view your BAM file with samtools (samtools view). If necessary, you can convert it to SAM with Picard tools.
In RNA-Seq data, when TopHat finds a read that splices across previously identified exon regions, it encodes the skipped intron in the alignment itself. For example, take one read R (80 bp, say) from gene G (with 5 exons, say) that spans the junction between E2 and E3, where the intron between E2 and E3 is 500 bp. If the first part, R1 = 30 bp, maps to E2 and the second part, R2 = 50 bp, maps to E3, TopHat writes this as 30M500N50M. This is a CIGAR string (from the SAM format specification): 30M means 30 aligned bases, 500N means 500 skipped reference bases (the intron), and 50M means another 50 aligned bases. Each SAM record also has a start position (the POS field), and combining it with the CIGAR string lets you find the junction coordinates (check out the other possible CIGAR operations as well). Is this what you were asking?
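Here is a small sketch of that last step, recovering intron coordinates from POS plus the CIGAR string. It follows the SAM specification's rule that M, D, N, = and X advance along the reference while I, S, H and P do not; the function name `junctions_from_cigar` is just an illustration, not part of any library.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return the (first, last) reference base of every intron (N operation).

    pos is the 1-based leftmost mapping position (SAM POS, column 4);
    the returned coordinates are 1-based and inclusive.
    """
    if cigar == "*":  # no alignment information in this record
        return []
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    junctions = []
    ref = pos  # reference coordinate of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return junctions
```

With the example above, a read at POS 1001 with CIGAR 30M500N50M gives [(1031, 1530)]: bases 1001 to 1030 are the end of E2, the 500 bp intron runs from 1031 to 1530, and E3 resumes at 1531. Insertions (I) and soft clips (S) do not shift the reference coordinate, which is why they are left out of REF_CONSUMING.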
cedance
10-20-2011, 08:03 PM   #3
Junior Member
Location: Australia

Join Date: Oct 2011
Posts: 4
exon join frequencies from RNAseq data


I don't know whether this will help, or whether I went about it the correct way, but here is my related experience.

I was faced with a similar problem: determining the read evidence for particular exon joins within a certain genomic region.

Basically, I took the RNA-seq reads and used TopHat to align them to a particular genomic region for which I knew the exon positions.

After getting a SAM/BAM file, I wrote a Python script to parse the TopHat alignments, i.e. the CIGAR string info (e.g. 30M500N50M), to determine the exon join read frequencies.
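The script itself isn't shown, so here is a hypothetical sketch of the same idea: walk each CIGAR string, and when an N gap starts right after a known exon end and finishes right before a known exon start, count one read for that exon join. It assumes all alignments are on the region of interest, that exon coordinates are 1-based inclusive, and that (POS, CIGAR) pairs have already been pulled out of the SAM file; `introns` and `count_exon_joins` are illustrative names.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns(pos: int, cigar: str) -> Iterator[Tuple[int, int]]:
    """Yield the 1-based, inclusive (start, end) of each N gap in one alignment."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            yield ref, ref + n - 1
        if op in ("M", "D", "N", "=", "X"):  # reference-consuming operations
            ref += n


def count_exon_joins(
    alignments: Iterable[Tuple[int, str]],
    exons: Dict[str, Tuple[int, int]],
) -> Counter:
    """Count spliced reads supporting each exon-exon join.

    alignments: (POS, CIGAR) pairs from SAM columns 4 and 6.
    exons: name -> (start, end), 1-based inclusive.
    A read with two N gaps (spanning three exons) counts toward two joins.
    Flanks that do not hit a known exon boundary are keyed as "?<coord>".
    """
    exon_ending_at = {end: name for name, (start, end) in exons.items()}
    exon_starting_at = {start: name for name, (start, end) in exons.items()}
    joins: Counter = Counter()
    for pos, cigar in alignments:
        for i_start, i_end in introns(pos, cigar):
            left = exon_ending_at.get(i_start - 1, f"?{i_start - 1}")
            right = exon_starting_at.get(i_end + 1, f"?{i_end + 1}")
            joins[(left, right)] += 1
    return joins
```

For instance, with E2 = (901, 1030) and E3 = (1531, 1700), the alignments (1001, "30M500N50M") and (1011, "20M500N60M") both count toward ("E2", "E3"), while an unspliced read such as (950, "80M") adds nothing. Keeping unmatched joins under "?<coord>" keys rather than dropping them makes novel junctions visible in the same table.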

Shoot me any questions you like.

Good luck,

agout
10-21-2011, 01:05 AM   #4
Senior Member
Location: Germany

Join Date: Feb 2011
Posts: 108

After going through your question again, here is the simple picture as far as I understand it: TopHat first uses Bowtie to align reads against the reference genome. Bowtie aligns reads without identifying splice sites; it tries to map each entire read directly to a matching region of the genome (and it is fast). From these alignments TopHat "knows" the potential exons, and it then tries to map the remaining unmapped reads by splitting them into smaller parts and finding where those parts map.
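To make the "splitting them into smaller parts" step concrete, here is a toy sketch. It is loosely modelled on TopHat's --segment-length option (25 bp by default, if I remember the manual correctly, with each segment at least that long); it is not TopHat's actual code, and `split_read` is a hypothetical name.

```python
from typing import List


def split_read(seq: str, segment_length: int = 25) -> List[str]:
    """Cut a read into consecutive segments, each at least segment_length long.

    Leftover bases (fewer than segment_length) are appended to the last segment.
    Reads too short to give two segments are returned whole, since splitting
    them would not help locate a junction.
    """
    n_segments = len(seq) // segment_length
    if n_segments < 2:
        return [seq]
    cuts = [i * segment_length for i in range(n_segments)] + [len(seq)]
    return [seq[cuts[i]:cuts[i + 1]] for i in range(n_segments)]
```

An 80 bp read comes out as segments of 25, 25 and 30 bp, each of which can be aligned on its own. Note that in the 80 bp example from earlier in the thread, the junction after read base 30 falls inside the second segment, so that segment would not align end to end; the real tool has to handle segments that straddle a junction, which this sketch ignores.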
cedance

All times are GMT -8.