SEQanswers > Bioinformatics > Bioinformatics
04-10-2013, 01:26 PM   #1
Julien Roux
Location: Chicago

Join Date: Dec 2011
Posts: 24
Tophat: generate junctions.bed file from BAM file

Dear all,
I have previously mapped RNA-seq samples to the human genome using TopHat 2.0.6. This generated an "accepted_hits.bam" alignment file and a "junctions.bed" file, which I am interested in for details on reads mapping across exon-exon junctions.
To compare my samples fairly, I want to downsample the BAM files so that each sample has the same number of mapped reads. This is pretty straightforward using samtools or Picard.
Now I would like to regenerate "junctions.bed" files from the downsampled BAM files, but I couldn't find in the log files which program of the TopHat package generates that file.
Has anyone dealt with a similar issue? Any hints?
Thanks for your help
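For illustration, the downsampling step can be sketched in plain Python on SAM text (for example the output of `samtools view -h`). This is a minimal sketch, not what samtools or Picard do internally: the function name `downsample_sam` is hypothetical, it holds the whole file in memory, and it treats a "mapped read" as a distinct read name (QNAME), keeping every line for each chosen name so that mates and multi-mapped alignments stay together.

```python
import random
from typing import Iterable, List


def downsample_sam(lines: Iterable[str], n_reads: int, seed: int = 42) -> List[str]:
    """Keep all SAM lines of n_reads randomly chosen mapped read names (hypothetical helper)."""
    lines = list(lines)
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]
    # FLAG bit 0x4 marks an unmapped segment; a name counts as mapped if any of its lines is mapped
    mapped = sorted({r.split("\t")[0] for r in records
                     if not (int(r.split("\t")[1]) & 0x4)})
    rng = random.Random(seed)  # fixed seed gives a reproducible subsample
    keep = set(rng.sample(mapped, min(n_reads, len(mapped))))
    # Keep every line of each chosen name (both mates, all alignments of multi-mapped reads)
    return header + [r for r in records if r.split("\t")[0] in keep]
```

Using the same `n_reads` for every sample gives equal numbers of mapped read names; the result could then be converted back to BAM with `samtools view -b`.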
04-12-2013, 12:07 PM   #2
Julien Roux
Location: Chicago

Join Date: Dec 2011
Posts: 24

I am in the process of writing a Perl script that generates the junctions.bed file from a given TopHat alignment BAM file, and I discovered a small bug in TopHat that sometimes affects the length of the flanking regions covered by mapped reads next to an exon-exon junction. When there is an insertion or a deletion on one side of the junction, the number of covered flanking nucleotides is not counted correctly.
For example, for the CIGAR string 14M1I90M664N2M, there are 14+90 nucleotides covered on the left of the junction, but TopHat erroneously counts only 14. The same thing happens when there is a deletion.

I thought this would be of interest to people working on related matters.
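A minimal Python sketch of the counting rule described above, assuming (our assumption, not TopHat's documented behaviour) that a flank is the number of read bases aligned to the reference (CIGAR M, = or X) between the junction and the previous junction or the end of the read. Insertions (I) and clips (S, H) add nothing, and deleted bases (D) advance the reference position without being counted as covered; counting them as well would give the reference span instead. `junctions_from_cigar` is a hypothetical helper; `pos` is the 1-based SAM POS field and the returned intron coordinates are 1-based and inclusive.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int, int, int]]:
    """Return (intron_start, intron_end, left_flank, right_flank) for each N operation."""
    ref = pos             # 1-based reference position of the next aligned base
    introns = []          # (start, end), 1-based inclusive
    flanks = [0]          # aligned bases in each exonic segment between introns
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op in "M=X":   # aligned bases: consume reference and count as covered
            ref += n
            flanks[-1] += n
        elif op == "D":   # deletion: consumes reference but is not covered by the read
            ref += n
        elif op == "N":   # skipped region (intron): close the current segment
            introns.append((ref, ref + n - 1))
            ref += n
            flanks.append(0)
        # I, S, H and P do not consume reference
    return [(s, e, flanks[i], flanks[i + 1]) for i, (s, e) in enumerate(introns)]


print(junctions_from_cigar(1000, "14M1I90M664N2M"))  # prints: [(1104, 1767, 104, 2)]
```

With POS 1000, the example CIGAR gives one junction with a 104-nt left flank (the 14+90 above) and a 2-nt right flank.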
04-12-2013, 12:18 PM   #3
huma Asif
Location: Japan

Join Date: Oct 2010
Posts: 53
VCF file from lifescope

Hi everyone,
I need help understanding the VCF file output:
0/1:13:22:13:2,9:2,6:31,21
chr1 762273 . G A . PASS DP=22
At this site GT is 0/1 (I guess it is heterozygous). The confidence is not good (GQ=13), largely because there are only 22 reads at this site (DP=22). Why is AD 2,9? My understanding is that it should add up to DP.
I read somewhere that this sum can be smaller than DP because low-quality bases are not counted. Is that correct?
How should FDP, AST and AMQV be interpreted?
Another question:
how can I correlate the SNPs with specific diseases by comparing allele frequencies?
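The arithmetic behind the AD/DP question can be made explicit with a short Python sketch. It assumes a biallelic site with AD listed as reference count then alternate count; `allele_depth_summary` is a hypothetical helper. It only quantifies the gap (here 11 of the 22 reads are not represented in AD); whether that gap comes from filtered low-quality bases, as suggested in the post, depends on how LifeScope computes AD and DP.

```python
from typing import Tuple


def allele_depth_summary(ad: str, dp: int) -> Tuple[int, int, float]:
    """Return (sum of AD, reads in DP not counted in AD, alternate-allele fraction within AD)."""
    counts = [int(x) for x in ad.split(",")]
    total = sum(counts)
    # Assumes a biallelic site: counts[0] = reference allele, counts[1] = alternate allele
    alt_fraction = counts[1] / total if total else float("nan")
    return total, dp - total, alt_fraction


total, missing, frac = allele_depth_summary("2,9", 22)
print(total, missing, round(frac, 3))  # prints: 11 11 0.818
```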
07-09-2013, 06:47 PM   #4
Senior Member
Location: China

Join Date: Feb 2009
Posts: 116

Hi Julien, very important information. Thank you.

I have the same need, and found that sam_juncs, a component of TopHat, can output the junction positions, but without depth and overhang.
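Adding depth and overhang on top of junction positions could look roughly like the Python sketch below. It reads SAM text (e.g. from `samtools view -h accepted_hits.bam`), groups spliced alignments by intron and strand, and writes BED12 lines modelled on the layout of TopHat's junctions.bed as we understand it (score = number of supporting alignments, two blocks whose sizes are the largest left and right overhangs). Assumptions: strand comes from the XS:A tag ("." if absent), every alignment line counts once (so a multi-mapped read contributes once per alignment), and overhangs count only M/=/X bases, as discussed earlier in the thread. Function names and the `JUNC` numbering are hypothetical, and the output has not been checked against real TopHat output.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def _junctions(pos: int, cigar: str) -> List[Tuple[int, int, int, int]]:
    """(intron_start, intron_end, left_flank, right_flank); introns are 1-based inclusive."""
    ref, introns, flanks = pos, [], [0]
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op in "M=X":      # aligned bases: consume reference, count as covered
            ref += n
            flanks[-1] += n
        elif op == "D":      # deletion: consumes reference only
            ref += n
        elif op == "N":      # intron: close the current flank and start a new one
            introns.append((ref, ref + n - 1))
            ref += n
            flanks.append(0)
    return [(s, e, flanks[i], flanks[i + 1]) for i, (s, e) in enumerate(introns)]


def junctions_bed(sam_lines: Iterable[str]) -> List[str]:
    # (chrom, intron_start, intron_end, strand) -> [count, max_left, max_right]
    stats: Dict[Tuple[str, int, int, str], List[int]] = defaultdict(lambda: [0, 0, 0])
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & 0x4 or "N" not in cigar:
            continue  # unmapped or unspliced alignment
        strand = next((t[5:] for t in f[11:] if t.startswith("XS:A:")), ".")
        for s, e, left, right in _junctions(pos, cigar):
            entry = stats[(chrom, s, e, strand)]
            entry[0] += 1
            entry[1] = max(entry[1], left)
            entry[2] = max(entry[2], right)
    bed = []
    for i, ((chrom, s, e, strand), (count, ml, mr)) in enumerate(sorted(stats.items()), 1):
        start = s - 1 - ml  # BED coordinates are 0-based, half-open
        end = e + mr
        bed.append("\t".join(map(str, [
            chrom, start, end, f"JUNC{i:08d}", count, strand, start, end,
            "255,0,0", 2, f"{ml},{mr}", f"0,{e - start}",
        ])))
    return bed
```

Run on the SAM text of a downsampled or filtered BAM, this gives one BED line per junction without re-running the alignment.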

I need a script that produces junctions.bed from a TopHat BAM file because I want to subset the BAM file by several different criteria and check how each subset affects alternative splicing detection. Such a script would make re-running the alignment unnecessary.
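Subsetting by criteria before rebuilding the junction file can also be sketched in Python on SAM text. The predicates shown (a minimum MAPQ, and uniquely mapped reads via the standard SAM `NH` tag) are only examples of possible criteria; `subset_sam`, `tag_value`, `min_mapq` and `unique_hits` are hypothetical names, and header lines are always kept.

```python
from typing import Callable, Iterable, List, Optional

Predicate = Callable[[List[str]], bool]


def tag_value(fields: List[str], prefix: str) -> Optional[str]:
    """Value of an optional SAM field such as 'NH:i:', or None if the tag is absent."""
    for field in fields[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def min_mapq(threshold: int) -> Predicate:
    # MAPQ is the 5th mandatory SAM column
    return lambda fields: int(fields[4]) >= threshold


def unique_hits(fields: List[str]) -> bool:
    # NH = number of reported alignments for this read; records without NH are rejected
    return tag_value(fields, "NH:i:") == "1"


def subset_sam(lines: Iterable[str], *predicates: Predicate) -> List[str]:
    """Keep header lines plus records that satisfy every predicate."""
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if all(pred(fields) for pred in predicates):
            kept.append(line)
    return kept
```

Each subset could then be passed through the same junction-counting step, so only the filtering changes between runs.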

Thank you.

Last edited by pengchy; 07-09-2013 at 06:58 PM. Reason: add information
08-18-2013, 07:24 PM   #5
Senior Member
Location: China

Join Date: Feb 2009
Posts: 116

There is a bug in the sam_juncs script: the left coordinate it reports is 1 smaller than in the output of the bed_to_juncs script.
Here is an example.
Output of sam_juncs:
Scaffold184317  456     11046   +
Scaffold184317  11239   21151   +
Scaffold184317  21385   29565   +
Scaffold184317  34619   60733   +
Scaffold184317  61209   68778   +
Scaffold184317  69066   75556   +
Scaffold184317  75631   84372   +
Scaffold184317  75631   102072  +
Scaffold184317  104376  112269  -
Scaffold184317  112465  115516  -
Scaffold184317  115669  115888  -
Scaffold184317  116034  116975  -
Scaffold184317  117130  117392  -
Scaffold184317  117631  123099  -
Scaffold184317  123260  134557  -
Output of bed_to_juncs:
Scaffold184317  457     11046   +
Scaffold184317  11240   21151   +
Scaffold184317  21386   29565   +
Scaffold184317  34620   60733   +
Scaffold184317  61210   68778   +
Scaffold184317  69067   75556   +
Scaffold184317  75632   84372   +
Scaffold184317  75632   102072  +
Scaffold184317  104377  112269  -
Scaffold184317  112466  115516  -
Scaffold184317  115670  115888  -
Scaffold184317  116035  116975  -
Scaffold184317  117131  117392  -
Scaffold184317  117632  123099  -
Scaffold184317  123261  134557  -
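Whatever the cause of the difference, the two listings above can be compared programmatically. This small Python sketch (hypothetical helper names) parses the four-column output of each tool, checks that the left coordinates differ by a constant offset, and shifts sam_juncs rows by +1 so they match bed_to_juncs. The offset of 1 is taken from the example in this post, not from either tool's documentation.

```python
from typing import List, Set, Tuple

Junction = Tuple[str, int, int, str]  # chrom, left, right, strand


def parse_juncs(text: str) -> List[Junction]:
    rows = []
    for line in text.strip().splitlines():
        chrom, left, right, strand = line.split()
        rows.append((chrom, int(left), int(right), strand))
    return rows


def left_offsets(a: List[Junction], b: List[Junction]) -> Set[int]:
    """Differences b.left - a.left for paired rows that agree on chrom, right and strand."""
    return {rb[1] - ra[1] for ra, rb in zip(a, b)
            if (ra[0], ra[2], ra[3]) == (rb[0], rb[2], rb[3])}


def to_bed_to_juncs_convention(rows: List[Junction]) -> List[Junction]:
    # Offset observed in this example: sam_juncs left = bed_to_juncs left - 1
    return [(chrom, left + 1, right, strand) for chrom, left, right, strand in rows]


sam_juncs = parse_juncs("""
Scaffold184317  456     11046   +
Scaffold184317  104376  112269  -
""")
bed_to_juncs = parse_juncs("""
Scaffold184317  457     11046   +
Scaffold184317  104377  112269  -
""")
print(left_offsets(sam_juncs, bed_to_juncs))  # prints: {1}
```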
pengchy
01-14-2016, 03:19 PM   #6
Location: St Louis, USA

Join Date: Nov 2011
Posts: 14

Try regtools; its `junctions extract` command will do this.
trackavinash

bedfile, junctions, rna-seq, tophat

All times are GMT -8.