Hello everyone, I,ve been looking for an answer since a few day for a question that not seem so difficult...
How can I extract subsequences from a BAM/SAM alignment?
I have assembled Fastq paired end data from an Illumina run with BWA mem vs. a reference BaL_V3_extend (506 bp long).
Fastq reads are 250 bp long overlapping on 70 bp.
The SAM file was filtered and converted to BAM to retrieve only the mapped sequence on the reference :
I'm interested in recovering parts of this alignment (I mean the sequences) that cover region 197-301 (105 bp) of my reference.
I tried bedtools :
V3.bed contains :
which is my region of interest, but V3_mapped.bam contains the ~430 bp long , not the 105 bp alignment :
I also tried in BED format :
The V3_mapped.bed contains :
But how could I extract the subalignment sequences (105bp) from my mapped.bam file?
Thanks for your advices.
How can I extract subsequences from a BAM/SAM alignment?
I have assembled Fastq paired end data from an Illumina run with BWA mem vs. a reference BaL_V3_extend (506 bp long).
Fastq reads are 250 bp long overlapping on 70 bp.
The SAM file was filtered and converted to BAM to retrieve only the mapped sequence on the reference :
Code:
samtools view -SbhF 4 raw.sam > mapped.bam
I tried bedtools :
Code:
intersectBed -abam mapped.bam -b V3.bed > V3_mapped.bam
Code:
BaL_V3_extend 196 301 V3_loop
Code:
HWI-M00185:92:000000000-AAE9F:1:1101:18230:1606 0 BaL_V3_extend 46 60 219M3D36M164S * 0 0 ACAATGCACACATGGAATTAGGCCAGTGGTATCAACTCAATTGCTGTTAAATGGCAGTCTAGCAGAAGAAGATGTAATAATCAGATCTGACAATTTCACAGACAATGTTAAAACCATAATAGTACAGCTAAACGAAACTGTAAAAATTAATTGCACAAGACCCAACAACAATACAAGGAGAAGTATACATATGGGACCAGGGAGAGCATTTTATGGAACAGATATAATAGGAGATATAAGACAAGCTCATTGTAATGTTAGTGAAAAAGATTGGAAGAAAACTCTACGGAGGGTAGCTATAAAATTAAAGGAACAATTTAATAAAACAATAATTTTTAATCACTCCTCAGGAGGGGATCCAGAAATTGTAATGCACACCTTTAATTGCAGAGGGGAATTTTTCTACTGTAATACATCAC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGG@DGGGGG>FGGFGFEGGGDGGGGGGGFFGDDGGGEGGGFFGFGFFEFGFDFDGEFGFGDDGFGGGFFFFFFGGFGE9CFGGGGGGGGGGGGF?FAFGFBAGGGFFAGGFFFGGGGC?CEDFGGGGGGGGGGGGGGGGGGGGGGGGFFE9AGGGGGFGGGGGFFBGFGGGGGGGGGGCGGGGGGGGGGDGGGGGGGGGGGGGGGGGFFFFGGGGFFGGGFGGGGGGGGGGGGGGGGGGGGGGGGCCCBC NM:i:29 MD:Z:6T8G11A12C31G3G4T5C2A6G1G7C21G2T3T5G10T23A1A12A8C8A3A0C3^AGG3A23A8 AS:i:116 XS:i:0 SA:Z:BaL_V3_extend,346,+,297S22M3D100M,29,18; HWI-M00185:92:000000000-AAE9F:1:1101:12377:1640 0 BaL_V3_extend 46 60 219M3D43M3D4M3I14M4D6M4I115M14S * 0 0 ACAATGTACACATGGAATTAAGCCAGTAGTATCAACTCAATTGCTGTTAAATGGCAGTCTAGCAGAAGAAGACATAGTAATCAGATCTGAAAATTTCACAGACAATGCTAAAACCATAATAGTACAGCTGAATGAAACTGTACAAATTAAGTGCACAAGGCCCAACAACAATACAAGGAGAAGTATACATATGGGACCGGGGAGAGCATTTTATGGAACAGATATAATAGGAAATATAAGACAAGCTCATTGTAATGTCAGTGAAAAGGAATGGAAGAAAACTCTACGACGGGTAGCTATAAAATTAAGGGAACAATTTGAGAATAAAACAATAGTCTTTACTCACTCCTCAGGAGGGGATCCAGAAATTGTAATGCACACCTTTAATTGTAGAGGGGAATTTTTCTATTGCAATACATCAC CCCCCFFFFGGGGGGGGGGGGGGGGGGGGGGGGGGGEFGGGFFGDFEGGGGFFCFFGFGGGGGGGGGGGGGGGGAFGFFGG<FFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCFGFFFFCFGGCFFFGGGGGGFEFGGGGGGFGFGF9FGGGGDFC8FF?FEGGGGCEGGGGG:FGGGG,@DFGGFDDFG8BEGGGGDGCGGF;=(ADFGGG9;ECFGGFFEGGGDFFCFFFFGFGDGA@@6A9DGGGGGCFGGFFE8AFDFCFFEEECGFGGGGGCGFC@,BB,FGGDFFFF9FEBE?8FGF@A<?F<FGDGGEFFAFGGF<GGGGCGGGFGGFGGGFF8GGFGGGGGFFGFGGGGF9FCAFGGFGGGGGGGGFGGGGGCGGGGFFFF?FGGGGGGGFGGGGGGGGFGGFCCCCC NM:i:63 MD:Z:15G4G19C31G0G7T5C9G1G36T5G7T2T5A17A1A12A5A2C8A3A0C3^AGG3A9G13A8C0C1T3^AGA1C9T0G1C3^TTAA0A2A5T12A10G20A0G2T14C11G1C5G0T9G16 AS:i:124 XS:i:0 HWI-M00185:92:000000000-AAE9F:1:1101:8886:2382 0 BaL_V3_extend 46 60 219M3D43M3D4M3I14M4D6M4I129M * 0 0 ACAATGCACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGGAGAGGTAGTAATCAGATCTGACAATTTCACAGACAATGTTAAAACCATAATAGTACAGCTAAACGAAACTGTAAAAATTAATTGCACAAGACCCAACAACAATACAAGGAGAAGTATACATATGGGACCAGGGAGAGCATTTTATGGAACAGATATAATAGGAAATATAAGACAAGCTCATTGTAATGTTAGTGAAAAGGAATGGAAGAAAACTCTACGACGGGTAGCTATAAAATTAAGGGAACAATTTGAGAATAAAACAATAGTCTTTACTCACTCCTCAGGAGGGGATCCAGAAATTGTAATGCACACCTTTAATTGTAGAGGGGAACTTTTCTATTGTAATACATCAC CCCCCGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGFGGGFGGFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGFGGGGGEGGCCGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGFGGGGGG=FGGGGGGGGDFGGFFEDEEGGGG=CFEGFGFGFFFFGGGFGGGGFFFFGGGGFGGGFFFFFFGGFDFFGFE==+FCD9,FDGGFGGGGGGGGDGFFGGGFEE6F@EECCGFFGGGFEGGGGFGFFFF9FAGGGGGGGFEFEEECFFGFCFGGGGFEECGGGGGFGGGGGFEGGGGGGGGGFCGFGGGEFGGGGGGGGGGGGGGGGGFFGGGGGGGGGFEFGGGGFEFGGGGGGGGGGCCCCC NM:i:65 MD:Z:6T8G52A12T5C2A6G1G7C21G2T3T5G10T23A1A12A8C8A3A0C3^AGG3A9G13A8C0C5^AGA1C9T0G1C3^TTAA0A2A5T12A10G20A0G2T14C11G1C5G0T9G8T7C6T2A3 AS:i:131 XS:i:0 ...
Code:
intersectBed -abam mapped.bam -b V3.bed -bed > V3_mapped.bed
Code:
BaL_V3_extend 196 301 HWI-M00185:92:000000000-AAE9F:1:1101:18230:1606 60 + 45 303 0,0,0 1 258, 0, BaL_V3_extend 196 301 HWI-M00185:92:000000000-AAE9F:1:1101:12377:1640 60 + 45 456 0,0,0 1 411, 0, BaL_V3_extend 196 301 HWI-M00185:92:000000000-AAE9F:1:1101:8886:2382 60 + 45 470 0,0,0 1 425, 0, ...
Thanks for your advices.
Comment