Hi all,
I got some short (~220bp) pair-ended DNA sequences targeted in one region. The raw data are in fastq and I aligned them using bwa, which give me sam file. Now I want to extract some positions from each read. For example,
The original aligned read sequence is:
TTGAATGGGGGATGTTTTTGGGATATAGATTATGTTTTTATATC
I want to extract position 3, 6, 8, so I want:
GTG
on each line.
The reason that I can't simply pick up the position from sam file is that sometimes there are indels which will shift my positions.
I googled this problem but most answers only give me the count summary of each position like using pileup, bamtobed... But in my case, the sequence of GTG is important and can not be split. Namely, I'm interested in how many GTGs rather than how many G, how many T and how many G.
Any help is very appreciated!
I got some short (~220bp) pair-ended DNA sequences targeted in one region. The raw data are in fastq and I aligned them using bwa, which give me sam file. Now I want to extract some positions from each read. For example,
The original aligned read sequence is:
TTGAATGGGGGATGTTTTTGGGATATAGATTATGTTTTTATATC
I want to extract position 3, 6, 8, so I want:
GTG
on each line.
The reason that I can't simply pick up the position from sam file is that sometimes there are indels which will shift my positions.
I googled this problem but most answers only give me the count summary of each position like using pileup, bamtobed... But in my case, the sequence of GTG is important and can not be split. Namely, I'm interested in how many GTGs rather than how many G, how many T and how many G.
Any help is very appreciated!
Comment