I just became a bioinformatician (is that what we're called?) a few weeks ago, and I'm surprised that I can't find anything in samtools that will output a subset of fields from a SAM/BAM file. It seems to be geared entirely toward filtering "horizontally", that is, selecting subsets of alignment records. I know that's a major need, but it's not what I need at the moment!
What I'm looking for is a derivative file with, for example, just one long list of CIGAR strings. I don't care what reads or alignments they're from; I just want to do some statistics on them in the aggregate.
What I'll do in the meantime is parse the SAM with Python to make my derivative file. But surely samtools would be better at it than I am. Does anyone know how to make it do what I want? Thanks in advance!
(I'm excited to see the new Python wrapper for samtools, which I'll probably check out soon.)
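ANSWER (from below, thanks to nilshomer): Just operate directly on the SAM-formatted stream that comes out of samtools view. That stream is tab-delimited text with one alignment per line, so any column-oriented tool can pull out the fields. E.g., samtools view <bamname> | awk <whatever field you need> | <further processing>.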
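For the "further processing" step, here is a minimal Python sketch of the same idea: pick one tab-separated column out of SAM text and tally it. It assumes the standard SAM layout, in which CIGAR is the sixth mandatory field. The names `extract_field`, `tally`, and `SAMPLE` are made up for illustration, and the sample records are invented, not real alignments.

```python
from collections import Counter

CIGAR_COLUMN = 5  # zero-based index; CIGAR is the sixth mandatory SAM field


def extract_field(lines, column=CIGAR_COLUMN):
    """Yield one tab-separated column from each SAM alignment line."""
    for line in lines:
        if line.startswith("@"):  # header lines, present with `samtools view -h`
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:  # skip blank or truncated lines
            yield fields[column]


def tally(lines, column=CIGAR_COLUMN):
    """Count how often each value of the chosen column occurs."""
    return Counter(extract_field(lines, column))


# Invented records for illustration only (one header line, three alignments).
SAMPLE = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr1\t150\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t16\tchr1\t200\t60\t2M1I1M\t*\t0\t0\tACGT\tIIII\n",
]

if __name__ == "__main__":
    # Print each distinct CIGAR string with its count, most common first.
    for cigar, n in tally(SAMPLE).most_common():
        print(f"{cigar}\t{n}")
```

To run this on real data, pass `sys.stdin` to `tally` instead of `SAMPLE` and pipe `samtools view <bamname>` into the script. If all you need is the raw list of CIGAR strings, the awk step in the pipeline above can simply be `awk '{print $6}'`.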