![]() |
Tophat: generate junctions.bed file from BAM file
Dear all,
I have previously mapped RNA-seq samples to the human genome using Tophat 2.0.6. This generated an "accepted_hits.bam" result file, and a "junctions.bed" file I am interested in for details on exon-exon junctions mapping. To be sure to compare fairly my different samples, I want to downsample the BAM files to have the same number of mapped reads for each sample. This is pretty straighforward using samtools or picard. Now I would like to regenerate new "junctions.bed" files from the downsampled BAM files. I couldn't find in the log files which program of the tophat package does that. Anyone ever dealt with a similar issue? Any hint? Thanks for your help Julien |
I am in the process of writing a perl script that generates the file junctions.bed from a given tophat alignment BAM file and I discovered a small bug that sometimes affects the definition of the flanking regions covered by mapped reads next to the exon-exon junction. When there is an insertion or a deletion on one side of the junction, the number of flanking covered nucleotides is not counted correctly.
For example in the following cigar line 14M1I90M664N2M, there are 14+90 nucleotides covered on the left of the junction, but tophat erroneously counts only 14. The same thing happens when there is a deletion. I thought this would be of interest for people working on related matters. Julien |
VCF file from lifescope
hi everyone
i need help in understanding the VCF file output GT: GQ : DP: FDP: AD: AST: AMQV 0/1: 13: 22: 13: 2,9: 2,6: 31,21 CHROM POS ID REF ALT QUAL FILTER INFO chr1 762273 . G A . PASS DP=22 Result: At this site GT is 0/1 (i guess it is heterozygous) the confidence is not good GQ=13 largely because there are only 22 reads at this site DP= 22, why AD is 2,9 what i understand is it should be equal to DP i read somewhere that this Sum can be smaller than DP because low-quality bases are not counted is it correct. how will this FDP, AST and AMQV be interpreted? Another question is how i can correlate the SNPse to specific diseases by comparing allelic frequencies |
Hi Julien, very important information. Thank you.
I have the same needness, and find sam_juncs, component of tophat, can output the junction position information without depth and overhang. The needness of a script to get junctions.bed from bam file produced by tophat is, from my point: I want to subset the bam file by several different criteria and check the influence on the alternative splicing detection. This script will make the alignment unnecessary. Thank you. |
There is a bug for sam_juncs script, which is 1 smaller than the output of bed_to_juncs script.
here is the example: output of sam_juncs Code:
Scaffold184317 456 11046 + Code:
Scaffold184317 457 11046 + |
Try regtools - https://github.com/griffithlab/regtools, the `junctions extract` command will do this.
|
All times are GMT -8. The time now is 02:31 AM. |
Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.