![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Tophat: generate junctions.bed file from BAM file | Julien Roux | Bioinformatics | 5 | 01-14-2016 03:19 PM |
How to convert sam/bam or bed file into wig file | roll | Bioinformatics | 5 | 02-05-2014 04:43 AM |
Missing accepted_hits.bam and junctions.bed files | eriklq | Bioinformatics | 1 | 09-26-2013 12:16 PM |
tophat junctions.bed file | upendra_35 | RNA Sequencing | 2 | 10-23-2012 07:17 AM |
How to make sense of Tophat's output file 'junctions.bed' | gsinghal | RNA Sequencing | 4 | 09-03-2012 06:49 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: baltimore Join Date: Oct 2009
Posts: 89
|
![]()
Hi:
I received aligned BAM file and do not have a raw sequence file. My aim is to count how many reads skip exon 7 of a gene and how many reads do not skip, in addition to reads that span exons 6 and 7 ; 7 and 8. Ex 6-------- Ex 7 -------- Ex 8 ___________ __________ => Condition1: reads that span 6-7 and exs 7-8 ____//////////////////////___ => Condition 2:reads skipping exon 7 ___________________ => Condition 3: reads that span exon 6,7 and end in 8. Is there a way to get numbers from my BAM file for above 3 conditions. Yes, if I have raw reads, i would use TopHat and get junctions.bed to deduce these. However, the BAM file was not generated using TopHat and I don't have access to raw sequences. Is there a way to get junctions.bed - perhaps convert bam to FASTA and then realign using Tophat. This potentially 'would' corrupt the paired end structure..leading to loss of read numbers..( I am not so sure about this though) Or Is there any other smart way to just count reads that jump exon 7. Appreciate any response. Thanks a lot. Adrian |
![]() |
![]() |
![]() |
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,478
|
![]()
The simplest method would be to just script this in pysam (or whatever language you prefer).
BTW, you can convert the BAM file to fastq and realign that, but it's faster to just write a little python script. |
![]() |
![]() |
![]() |
#3 |
Member
Location: baltimore Join Date: Oct 2009
Posts: 89
|
![]()
Thank you.
Is there a particular function that I could use? If not what would be the logic to get those read stats. thanks Adrian |
![]() |
![]() |
![]() |
#4 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,478
|
![]()
The general idea is to:
That's the general idea. If your BAM file is coordinate sorted and indexed, then you can simply request the reads covering the regions of interest, which will make things a bit quicker. |
![]() |
![]() |
![]() |
#5 |
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 812
|
![]()
Here's a rough idea of how to do bam2fastq:
Code:
samtools view file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file.fastq Code:
samtools view -f 0x40 file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file_R1.fastq samtools view -f 0x80 file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file_R2.fastq The process of BAM -> FASTQ -> Tophat is slower in terms of computer time, but from your description it sounds like it will be quicker in terms of bum-on-seat time. Last edited by gringer; 06-08-2014 at 03:08 AM. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|