SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Tophat: generate junctions.bed file from BAM file Julien Roux Bioinformatics 5 01-14-2016 04:19 PM
How to convert sam/bam or bed file into wig file roll Bioinformatics 5 02-05-2014 05:43 AM
Missing accepted_hits.bam and junctions.bed files eriklq Bioinformatics 1 09-26-2013 01:16 PM
tophat junctions.bed file upendra_35 RNA Sequencing 2 10-23-2012 08:17 AM
How to make sense of Tophat's output file 'junctions.bed' gsinghal RNA Sequencing 4 09-03-2012 07:49 AM

Reply
 
Thread Tools
Old 06-06-2014, 01:43 PM   #1
adrian
Member
 
Location: baltimore

Join Date: Oct 2009
Posts: 89
Default Bam file to junctions.bed

Hi:

I received aligned BAM file and do not have a raw sequence file.

My aim is to count how many reads skip exon 7 of a gene and how many reads do not skip, in addition to reads that span exons 6 and 7 ; 7 and 8.

Ex 6-------- Ex 7 -------- Ex 8
___________ __________ => Condition1: reads that span 6-7 and exs 7-8
____//////////////////////___ => Condition 2:reads skipping exon 7
___________________ => Condition 3: reads that span exon 6,7 and end in 8.



Is there a way to get numbers from my BAM file for above 3 conditions.

Yes, if I have raw reads, i would use TopHat and get junctions.bed to deduce these. However, the BAM file was not generated using TopHat and I don't have access to raw sequences.

Is there a way to get junctions.bed - perhaps convert bam to FASTA and then realign using Tophat. This potentially 'would' corrupt the paired end structure..leading to loss of read numbers..( I am not so sure about this though)

Or

Is there any other smart way to just count reads that jump exon 7.

Appreciate any response. Thanks a lot.

Adrian
adrian is offline   Reply With Quote
Old 06-07-2014, 02:53 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476
Default

The simplest method would be to just script this in pysam (or whatever language you prefer).

BTW, you can convert the BAM file to fastq and realign that, but it's faster to just write a little python script.
dpryan is offline   Reply With Quote
Old 06-07-2014, 09:22 AM   #3
adrian
Member
 
Location: baltimore

Join Date: Oct 2009
Posts: 89
Default

Thank you.
Is there a particular function that I could use? If not what would be the logic to get those read stats.

thanks
Adrian
adrian is offline   Reply With Quote
Old 06-07-2014, 01:58 PM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,476
Default

The general idea is to:
  1. Iterate over the reads
  2. For each read, get its start and end position.
  3. If at least one of the exons could be between those coordinates then get the CIGAR
  4. Parse the CIGAR string into a sequence of aligned regions
  5. For each region, note if it overlaps one of your exons. Add that to a vector or a data structure of your choice (you could even just use an integer as a bitmap).
  6. Once you've iterated through the aligned regions for a read of interest, look at the structure from the previous step and proceed as desired.

That's the general idea. If your BAM file is coordinate sorted and indexed, then you can simply request the reads covering the regions of interest, which will make things a bit quicker.
dpryan is offline   Reply With Quote
Old 06-08-2014, 04:06 AM   #5
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799
Default

Here's a rough idea of how to do bam2fastq:
Code:
samtools view file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file.fastq
Unfortunately this will give you a fastq file with interleaved reads, which can be a little bit of a pain to use. You can use the filter function (-f / -F) of samtools view to get around that, reading through the BAM file twice:

Code:
samtools view -f 0x40 file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file_R1.fastq
samtools view -f 0x80 file.bam | awk -F '\t' '{print ">"$1"\n"$10"\n+\n"$11}' > file_R2.fastq
The SAM File format specification is your friend, see section 1.4.

The process of BAM -> FASTQ -> Tophat is slower in terms of computer time, but from your description it sounds like it will be quicker in terms of bum-on-seat time.

Last edited by gringer; 06-08-2014 at 04:08 AM.
gringer is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:41 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO