06-07-2014, 12:58 PM   #4
Devon Ryan
Location: Freiburg, Germany

The general idea is to:
  1. Iterate over the reads
  2. For each read, get its start and end positions on the reference.
  3. If at least one of your exons could lie between those coordinates, get the read's CIGAR string.
  4. Parse the CIGAR string into a sequence of aligned regions (M, = and X operations are aligned bases; D and N skip over reference positions, e.g. introns in spliced reads).
  5. For each region, note whether it overlaps one of your exons and record that in a vector or another data structure of your choice (you could even just use an integer as a bitmap).
  6. Once you've iterated through the aligned regions for a read of interest, look at the structure from the previous step and proceed as desired.
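A minimal Python sketch of steps 1 to 6, assuming the reads are already available as (name, 0-based start, CIGAR string) tuples on the same chromosome as the exons (SAM's POS column is 1-based, so subtract 1), and that exons are 0-based, half-open intervals. The function names and example coordinates are hypothetical. It uses only the standard library and stores each read's result as an integer bitmap where bit i set means exon i is hit by an aligned block.

```python
import re

# CIGAR ops (SAM spec): M, = and X are aligned bases; D and N consume
# reference positions without aligning; I, S, H and P don't touch the reference.
ALIGNED_OPS = set("M=X")
REF_SKIP_OPS = set("DN")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(start, cigar):
    """Step 2: 0-based, exclusive end of the read's span on the reference."""
    return start + sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")


def aligned_blocks(start, cigar):
    """Step 4: list of (start, end) reference blocks covered by aligned bases."""
    blocks, pos = [], start
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in ALIGNED_OPS:
            blocks.append((pos, pos + n))
            pos += n
        elif op in REF_SKIP_OPS:
            pos += n  # deletion or intron: move along the reference only
    return blocks


def exon_bitmap(start, cigar, exons):
    """Steps 3 and 5: bit i is set if an aligned block overlaps exons[i]."""
    end = reference_end(start, cigar)
    # Step 3: cheap span test before parsing the CIGAR into blocks
    if not any(s < end and start < e for s, e in exons):
        return 0
    bits = 0
    for b_start, b_end in aligned_blocks(start, cigar):
        for i, (e_start, e_end) in enumerate(exons):
            if b_start < e_end and e_start < b_end:
                bits |= 1 << i
    return bits


def classify_reads(reads, exons):
    """Steps 1 and 6: map read name -> bitmap, keeping reads that hit an exon."""
    result = {}
    for name, start, cigar in reads:
        if cigar == "*":  # no CIGAR (e.g. unmapped read)
            continue
        bits = exon_bitmap(start, cigar, exons)
        if bits:
            result[name] = bits
    return result


# Hypothetical example data (0-based, half-open)
EXONS = [(100, 200), (300, 400), (500, 600)]
READS = [
    ("r1", 150, "50M300N50M"),   # spliced from exon 0 straight to exon 2
    ("r2", 180, "20M100N40M"),   # exon 0 -> exon 1
    ("r3", 700, "50M"),          # outside all exons
    ("r4", 150, "10S40M5I10M"),  # soft clip and insertion, inside exon 0
]
print(classify_reads(READS, EXONS))
```

Here r1 hits exons 0 and 2 but not exon 1 (bitmap 0b101), which is the kind of exon-skipping pattern you might then act on in step 6.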

If your BAM file is coordinate-sorted and indexed, you can instead request only the reads that overlap your regions of interest, which avoids iterating over the whole file and makes things quicker.
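For the indexed case, a sketch assuming pysam: `AlignmentFile.fetch(contig, start, stop)` uses the index to return only reads overlapping the region, and each read exposes `reference_start` (0-based), `cigartuples` (a list of (operation code, length) pairs) and `query_name`. The helper names, the single-contig exon list and the choice to OR together the bits of mates that share a query name are assumptions for illustration. The function takes the opened file as an argument, so pysam itself is only needed by the caller.

```python
# Assumes `bam` is a pysam.AlignmentFile opened on a coordinate-sorted, indexed
# BAM, e.g. pysam.AlignmentFile("sample.bam", "rb") (file name hypothetical).

# CIGAR operation codes in SAM/BAM order "MIDNSHP=X": M=0, D=2, N=3, '='=7, X=8
ALIGNED = {0, 7, 8}   # M, =, X
SKIPPED = {2, 3}      # D, N


def blocks_from_cigartuples(start, cigartuples):
    """Aligned (start, end) reference blocks, 0-based and half-open."""
    blocks, pos = [], start
    for op, length in cigartuples:
        if op in ALIGNED:
            blocks.append((pos, pos + length))
            pos += length
        elif op in SKIPPED:
            pos += length
    return blocks


def exon_bitmaps(bam, contig, exons):
    """Map read name -> bitmap of exons (indices into `exons`) hit by aligned blocks."""
    region_start = min(s for s, _ in exons)
    region_end = max(e for _, e in exons)
    result = {}
    # One indexed query over the whole exon span, so each read is seen once
    for read in bam.fetch(contig, region_start, region_end):
        if read.is_unmapped or read.cigartuples is None:
            continue
        bits = 0
        for b_start, b_end in blocks_from_cigartuples(read.reference_start, read.cigartuples):
            for i, (e_start, e_end) in enumerate(exons):
                if b_start < e_end and e_start < b_end:
                    bits |= 1 << i
        if bits:
            # Mates of a pair share a query_name, so their hits are combined
            result[read.query_name] = result.get(read.query_name, 0) | bits
    return result
```

Fetching one span from the first exon start to the last exon end, rather than once per exon, avoids seeing a read several times. `fetch()` still returns reads whose span only covers an exon via an intron, which is why the aligned-block check is kept.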