#1
Member
Location: Netherlands
Join Date: Dec 2009
Posts: 13
Hi,
What would be the fastest way (I have to do this hundreds of millions of times) to extract, for each aligned read in a BAM file:
1) the positions where the read bases differ from the reference sequence;
2) the PHRED base quality values of those bases. If the difference is an indel, the quality value will, of course, be skipped.
As far as I know, I cannot use mpileup or any other tool I know of because of memory limitations: this is a very custom amplicon reference analysis, with >500 million-fold coverage per base position on the reference amplicon.
In short, I need an efficient approach to extract all differing positions for each aligned read. Thanks.
Last edited by CHRYSES; 12-14-2011 at 06:52 AM.
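Here is a minimal sketch of the per-read comparison in Python. It assumes the amplicon reference fits in memory as one uppercase string, and that POS, CIGAR, SEQ and QUAL have already been pulled out of each record (from SAM text or a BAM API). `read_differences` is a hypothetical helper, not part of samtools or bamtools. It walks the CIGAR string once, compares aligned bases against the reference, and reports indels without a quality, as asked above. Memory use per read is constant, so coverage depth does not matter.

```python
import re

# One (length, operation) pair per CIGAR element, e.g. "3M2I3M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_differences(ref, pos, cigar, seq, qual, phred_offset=33):
    """Compare one aligned read against an in-memory reference string.

    Hypothetical helper. Assumptions:
      ref:  uppercase reference sequence, index 0 == reference position 1
      pos:  1-based leftmost mapping position (the SAM POS field)
      qual: SAM QUAL string (Phred+33), or "*" if qualities are absent
    Returns (kind, ref_pos_1based, bases, phred) tuples; phred is None for
    indels (and for mismatches when QUAL is "*").
    """
    diffs = []
    r = pos - 1  # 0-based index of the next reference base
    q = 0        # 0-based index of the next read base
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: compare one by one
            for k in range(n):
                base = seq[q + k].upper()
                if base != ref[r + k]:
                    phred = None if qual == "*" else ord(qual[q + k]) - phred_offset
                    diffs.append(("mismatch", r + k + 1, base, phred))
            r += n
            q += n
        elif op == "I":  # read-only bases, inserted after 1-based ref position r
            diffs.append(("insertion", r, seq[q:q + n], None))
            q += n
        elif op == "D":  # reference-only bases, starting at 1-based position r + 1
            diffs.append(("deletion", r + 1, ref[r:r + n], None))
            r += n
        elif op == "N":  # skipped reference region, not reported as a difference
            r += n
        elif op == "S":  # soft clip: present in SEQ but not aligned
            q += n
        # H and P consume neither the read nor the reference
    return diffs
```

A pure-Python per-base loop over hundreds of millions of reads will be slow, so read this as a specification of the logic rather than a production tool. The same CIGAR walk maps directly onto C using htslib's `bam_get_cigar`, `bam_get_seq`/`bam_seqi` and `bam_get_qual` accessors. Note that `bam_get_qual` returns raw Phred values without the +33 offset.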
#2
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437
samtools and bamtools both provide very fast APIs you can use; however, this requires at least minimal experience with a programming or scripting language...
#3
Member
Location: Netherlands
Join Date: Dec 2009
Posts: 13
I could go directly to text format (i.e. SAM) and parse it with Perl, but that's really very slooooooow.
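If you do parse SAM text, two standard optional tags can cut the work per read. The sketch below is in Python and assumes the aligner wrote NM and MD tags. Both are optional in the SAM spec; `samtools calmd` can add them afterwards.

* A read with `NM:i:0` has edit distance zero and can be skipped immediately.
* The MD string lists the mismatched reference bases, so no reference lookup is needed.

`process_sam_line` and `mismatches_from_md` are hypothetical names. MD does not describe insertions, so indels still have to be read from the CIGAR.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, a deleted stretch (^ACG), or one mismatched reference base.
MD_RE = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def mismatches_from_md(pos, cigar, seq, qual, md, phred_offset=33):
    """Hypothetical helper: (ref_pos_1based, ref_base, read_base, phred) per mismatch."""
    # (read index, 1-based ref position) of every aligned M/=/X base, in order;
    # these are exactly the read bases the MD string walks over.
    aligned = []
    r, q = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned.extend((q + k, r + k) for k in range(n))
            q += n
            r += n
        elif op in "IS":
            q += n
        elif op in "DN":
            r += n
    out = []
    i = 0  # position within `aligned`
    for matches, _deleted, ref_base in MD_RE.findall(md):
        if matches:
            i += int(matches)
        elif ref_base:
            qi, rpos = aligned[i]
            phred = None if qual == "*" else ord(qual[qi]) - phred_offset
            out.append((rpos, ref_base, seq[qi], phred))
            i += 1
        # a ^deletion consumes reference only, so `i` does not move
    return out


def process_sam_line(line):
    """Hypothetical helper: None for headers/unmapped reads, else the mismatch list."""
    if line.startswith("@"):
        return None
    f = line.rstrip("\n").split("\t")
    flag, pos, cigar, seq, qual = int(f[1]), int(f[3]), f[5], f[9], f[10]
    if flag & 4 or cigar == "*":
        return None
    tags = {t[:2]: t[5:] for t in f[11:]}  # "NM:i:0" -> {"NM": "0"}
    if tags.get("NM") == "0":
        return []  # edit distance 0: nothing to report, skip the CIGAR/MD walk
    if "MD" not in tags:
        raise ValueError("read has no MD tag; add one with samtools calmd")
    return mismatches_from_md(pos, cigar, seq, qual, tags["MD"])
```

The NM check is a cheap string comparison, so reads without any differences never reach the CIGAR/MD walk.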
#4
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437
If you are going to examine every read in order, then I suppose you could parse a giant text file. This sort of sequential analysis is hard to speed up unless you parallelize it. If you are parsing a text file, you could split it into many parts and run your Perl script on the parts in parallel (if you have access to that kind of hardware).
I am not aware of an off-the-shelf tool. Sorry! Personally, I opt for pthreads and C/C++...
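The "split it into many parts" step can be done without physically cutting the file. Pick byte offsets, move each one forward to the next line start, and let each worker process read its own range. Here is a minimal Python sketch. It assumes an uncompressed SAM text file; BAM is BGZF-compressed and can't be split at arbitrary byte offsets this way. `line_aligned_ranges`, `count_records` and `run_parallel` are hypothetical names, and counting records stands in for the real per-read work. Separate processes are used instead of threads because CPython threads do not execute Python code in parallel.

```python
import os
from multiprocessing import Pool


def line_aligned_ranges(path, n_parts):
    """Split a text file into up to n_parts byte ranges that start on line boundaries."""
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as fh:
        for i in range(1, n_parts):
            fh.seek(size * i // n_parts)
            fh.readline()  # move forward to the start of the next complete line
            starts.append(fh.tell())
    ends = starts[1:] + [size]
    # Long lines can make neighbouring boundaries coincide; drop empty ranges.
    return [(s, e) for s, e in zip(starts, ends) if s < e]


def count_records(path, start, end):
    """Hypothetical worker: handle every line that starts inside [start, end).

    Counting non-header lines is a placeholder for the real per-read analysis.
    """
    n = 0
    pos = start
    with open(path, "rb") as fh:
        fh.seek(start)
        while pos < end:
            line = fh.readline()
            if not line:
                break
            pos += len(line)
            if not line.startswith(b"@"):
                n += 1
    return n


def run_parallel(path, n_parts):
    """Run the worker on each range in its own process and combine the results.

    Call this from under `if __name__ == "__main__":` so it also works on
    platforms that start worker processes with "spawn".
    """
    ranges = line_aligned_ranges(path, n_parts)
    with Pool(processes=n_parts) as pool:
        counts = pool.starmap(count_records, [(path, s, e) for s, e in ranges])
    return sum(counts)
```

For example, call `print(run_parallel("reads.sam", 8))` inside an `if __name__ == "__main__":` block (the file name is hypothetical). Every line is handled by exactly one worker because each range ends where the next one starts.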
#5
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
Illumina reads are error-prone. If you pull every single read with a discrepancy from the reference, you are going to pull a lot of noise.
I don't think a pileup can be generated with only variant positions, but you could grep the pileup to keep only the lines with alternate letters. Each pileup line gives the position, all the bases called by the reads that cross that position, and the base qualities of all those reads.
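Grepping a pileup for any letter is slightly riskier than it looks, for two reasons:

* The character after `^` is a mapping quality that can itself be a letter.
* Indel entries such as `+2AG` or `-1C` contain bases that have no quality value of their own.

Here is a minimal Python sketch of a per-line parser. It assumes standard samtools mpileup text output generated with a reference (`-f`), so matches appear as `.`/`,`. `pileup_mismatches` and its `min_qual` cutoff are hypothetical; the cutoff is one way to act on the noise point above. Also note that samtools mpileup caps the number of reads per position with `-d`/`--max-depth`, so at this coverage that limit would need raising; check the default for your version.

```python
def pileup_mismatches(line, min_qual=0, phred_offset=33):
    """Hypothetical helper: (chrom, pos, ref_base, read_base, phred) for each
    mismatching base in one mpileup text line made with a reference (-f).

    Only the first sample's bases/qualities columns are read.
    """
    chrom, pos, ref, _depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    out = []
    i = 0  # index into the read-bases column
    j = 0  # index into the base-quality column (one character per read)
    while i < len(bases):
        c = bases[i]
        if c == "^":            # read start; the next character is its mapping quality
            i += 2
        elif c == "$":          # read end
            i += 1
        elif c in "+-":         # indel: +<len><seq> or -<len><seq>, no quality of its own
            k = i + 1
            while bases[k].isdigit():
                k += 1
            i = k + int(bases[i + 1:k])
        else:                   # one symbol per read: . , ACGTN acgtn * > < (# with --reverse-del)
            if c.upper() in "ACGTN":
                phred = ord(quals[j]) - phred_offset
                if phred >= min_qual:
                    out.append((chrom, int(pos), ref.upper(), c.upper(), phred))
            i += 1
            j += 1
    return out
```

This only handles the filtering step; it does not address the memory cost of generating the pileup itself.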
#6
Member
Location: Netherlands
Join Date: Dec 2009
Posts: 13
Tags: bam, parse, samtools