
**dpryan** · 06-07-2014, 01:53 AM

The simplest method would be to just script this with pysam (or the equivalent library in whatever language you prefer).

BTW, you can convert the BAM file to fastq and realign that, but it's faster to just write a little Python script.

**adrian** · 06-07-2014, 08:22 AM

Thank you.
Is there a particular function that I could use? If not, what would be the logic to get those read stats?

thanks
Adrian

**dpryan** · 06-07-2014, 12:58 PM

The general idea is to:

1. Iterate over the reads.
2. For each read, get its start and end position.
3. If at least one of your exons could lie between those coordinates, get the read's CIGAR string.
4. Parse the CIGAR string into a sequence of aligned regions.
5. For each region, note whether it overlaps one of your exons. Add that to a vector or a data structure of your choice (you could even just use an integer as a bitmap).
6. Once you've iterated through the aligned regions for a read of interest, look at the structure from the previous step and proceed as desired.

If your BAM file is coordinate sorted and indexed, you can simply request the reads covering the regions of interest, which will make things a bit quicker.
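A minimal sketch of the CIGAR-parsing and overlap steps in plain Python, so it runs without a BAM file. The function names (`aligned_blocks`, `exon_bitmap`) and the exon coordinates are made up for illustration, and coordinates are assumed to be 0-based and half-open (SAM's POS column is 1-based, so subtract 1 if you read it from text).

```python
import re

# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")
# Operations where read bases are actually aligned to the reference.
ALIGNED = set("M=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return a list of (start, end) reference blocks covered by a read.

    pos is the 0-based leftmost aligned position, cigar the CIGAR string.
    Introns (N) and deletions (D) advance the position without adding a block;
    insertions and clipping (I, S, H, P) do not touch the reference at all.
    """
    blocks = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in ALIGNED:
            blocks.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return blocks


def exon_bitmap(blocks, exons):
    """Return an integer in which bit i is set if any block overlaps exons[i]."""
    bitmap = 0
    for i, (ex_start, ex_end) in enumerate(exons):
        if any(b_start < ex_end and ex_start < b_end for b_start, b_end in blocks):
            bitmap |= 1 << i
    return bitmap


# Hypothetical exons of interest on one chromosome.
exons = [(100, 200), (300, 400), (500, 600)]

# A read spliced from exon 0 into exon 1.
blocks = aligned_blocks(150, "50M100N50M")
print(blocks, bin(exon_bitmap(blocks, exons)))

# A read that joins exon 0 directly to exon 2, i.e. it skips exon 1.
skip = aligned_blocks(180, "20M300N20M")
print(skip, bin(exon_bitmap(skip, exons)))
```

A bitmap with bits 0 and 2 set but bit 1 clear is then a read that skips the middle exon. With pysam, the read's 0-based start and its CIGAR string can be passed straight into `aligned_blocks`, and I believe its `get_blocks()` method returns much the same list, but check the documentation for the version you have installed.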

**gringer** · 06-08-2014, 03:06 AM

Here's a rough idea of how to do bam2fastq:

Code:

samtools view file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file.fastq

Unfortunately this will give you a single fastq file with both mates of each pair mixed together, which can be a little bit of a pain to use. You can use the flag filter options (-f / -F) of samtools view to get around that, reading through the BAM file twice:

Code:

samtools view -f 0x40 file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file_R1.fastq
samtools view -f 0x80 file.bam | awk -F '\t' '{print "@"$1"\n"$10"\n+\n"$11}' > file_R2.fastq

The SAM file format specification is your friend; see section 1.4.
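For what it's worth, here is a rough Python sketch of the same conversion that also looks at a few more of the FLAG bits from that section. It assumes tab-separated SAM text lines as printed by samtools view; the function name `sam_line_to_fastq` is made up for illustration. Reads aligned to the reverse strand (0x10) are stored reverse-complemented in SAM/BAM, so the sketch flips them back, and it drops secondary (0x100) and supplementary (0x800) records so that each read is written only once.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to (mate, fastq_record).

    mate is 1 for first-in-pair (0x40), 2 for second-in-pair (0x80) and 0 for
    unpaired reads. Returns None for header lines and for secondary or
    supplementary alignments.
    """
    if line.startswith("@"):  # header line; read names cannot start with '@'
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:  # 0x100 secondary or 0x800 supplementary
        return None
    if flag & 0x10:  # reverse strand: restore the original read orientation
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
    return mate, "@{}\n{}\n+\n{}\n".format(name, seq, qual)


# Two made-up records: a reverse-strand first mate and a forward second mate.
r1 = "read1\t83\tchr1\t100\t60\t4M\t=\t50\t-54\tACGG\tIIHG"
r2 = "read1\t163\tchr1\t50\t60\t4M\t=\t100\t54\tTTGA\tGGII"
for record in (r1, r2):
    mate, fastq = sam_line_to_fastq(record)
    print(mate, repr(fastq))
```

In practice you would feed it the lines from `samtools view file.bam` on standard input and write mate 1 and mate 2 to separate files. Note that with a coordinate-sorted BAM the two files will not list the pairs in the same order, so sort by read name first if the aligner expects matching order.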

The process of BAM -> FASTQ -> Tophat is slower in terms of computer time, but from your description it sounds like it will be quicker in terms of bum-on-seat time.


Bam file to junctions.bed
