![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Removing pairs that align to almost the same positions from bam | ShellfishGene | Bioinformatics | 2 | 07-16-2013 03:00 AM |
Bam Output Mismatch By Contig | crkessen | Bioinformatics | 2 | 06-10-2013 01:28 PM |
Mismatch between BAM and GFF file | Siva | Illumina/Solexa | 4 | 01-13-2013 03:45 PM |
BWA change mismatch penalty between N-mismatch and base-mismatch | andreac | Bioinformatics | 0 | 12-19-2012 09:05 AM |
How to get the exact mismatch positions in Tophat output | maplesword | Bioinformatics | 1 | 09-07-2010 01:22 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: California Join Date: Dec 2010
Posts: 21
|
![]()
Does anyone know of a tool that can output the positions of mismatches for each alignment in a BAM file? I need the coordinates for the mismatch positions within the read as well as the coordinate in the reference sequence to which it aligns. I have found tools that can do one or the other (SAMStat, samtools mpileup, NGSUtils basecall), but they seem to give inconsistent results. Or will I have to extract the information from the CIGAR string or MD tag manually?
Thanks! |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Illumina reads are error-prone; most of the discrepancies you see will be artifacts. Are you trying to count artifacts?
|
![]() |
![]() |
![]() |
#3 |
Member
Location: California Join Date: Dec 2010
Posts: 21
|
![]()
Yes, exactly. Both artifacts/errors as well as mismatches caused by anything else (e.g. true polymorphism, mapping error, etc.).
|
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: France Join Date: Apr 2010
Posts: 143
|
![]()
I wrote a small java program for Biostars ( http://www.biostars.org/p/59647/ ) : see : https://github.com/lindenb/jvarkit/wiki/Biostar59647
Code:
$ java -jar dist/biostar59647.jar -r samtools-0.1.18/examples/toy.fa samtools-0.1.18/examples/toy.bam |\ xmllint --format - <?xml version="1.0" encoding="UTF-8"?> <sam ref="/home/lindenb/samtools-0.1.18/examples/toy.bam" bam="/home/lindenb/samtools-0.1.18/examples/toy.fa"> <!---r /home/lindenb/samtools-0.1.18/examples/toy.fa /home/lindenb/samtools-0.1.18/examples/toy.bam--> <read> <name>r001</name> <sequence>TTAGATAAAGAGGATACTG</sequence> <flags READ_PAIRED="true" READ_MAPPED_IN_PROPER_PAIR="true" READ_UNMAPPED="false" MATE_UNMAPPED="false" READ _REVERSE_STRAND="false" MATE_REVERSE_STRAND="true" FIRST_IN_PAIR="false" SECOND_IN_PAIR="true" NOT_PRIMARY_ALIGN MENT="false" READ_FAILS_VENDOR_QUALITY_CHECK="false" READ_IS_DUPLICATE="false" SUPPLEMENTARY_ALIGNMENT="false">1 63</flags> <qual>30</qual> <chrom index="0">ref</chrom> <pos>7</pos> <cigar>8M4I4M1D3M</cigar> <mate-chrom index="0">ref</mate-chrom> <mate-pos>37</mate-pos> <align> <M read-index="1" read-base="T" ref-index="7" ref-base="T"/> <M read-index="2" read-base="T" ref-index="8" ref-base="T"/> <M read-index="3" read-base="A" ref-index="9" ref-base="A"/> <M read-index="4" read-base="G" ref-index="10" ref-base="G"/> <M read-index="5" read-base="A" ref-index="11" ref-base="A"/> <M read-index="6" read-base="T" ref-index="12" ref-base="T"/> <M read-index="7" read-base="A" ref-index="13" ref-base="A"/> <M read-index="8" read-base="A" ref-index="14" ref-base="A"/> <I read-index="9" read-base="A"/> <I read-index="10" read-base="G"/> <I read-index="11" read-base="A"/> <I read-index="12" read-base="G"/> <M read-index="13" read-base="G" ref-index="15" ref-base="G"/> <M read-index="14" read-base="A" ref-index="16" ref-base="A"/> <M read-index="15" read-base="T" ref-index="17" ref-base="T"/> <M read-index="16" read-base="A" ref-index="18" ref-base="A"/> <D ref-index="19" ref-base="G"/> <M read-index="17" read-base="C" ref-index="20" ref-base="C"/> <M read-index="18" read-base="T" ref-index="21" ref-base="T"/> <M read-index="19" read-base="G" ref-index="22" ref-base="G"/> </align> </read> (...) |
![]() |
![]() |
![]() |
#5 |
Member
Location: California Join Date: Dec 2010
Posts: 21
|
![]()
This is perfect. Thanks so much!
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: France Join Date: Apr 2010
Posts: 143
|
![]()
if you don't want to parse the XML , you might also like that tool: https://github.com/lindenb/jvarkit/wiki/SAM2Tsv
Code:
$ java -jar dist/sam2tsv.jar -A \ -r samtools-0.1.18/examples/toy.fa samtools-0.1.18/examples/toy.sam r001 ref 0 T 7 T M r001 ref 1 T 8 T M r001 ref 2 A 9 A M r001 ref 3 G 10 G M r001 ref 4 A 11 A M r001 ref 5 T 12 T M r001 ref 6 A 13 A M r001 ref 7 A 14 A M r001 ref 8 A . . I r001 ref 9 G . . I r001 ref 10 A . . I r001 ref 11 G . . I r001 ref 12 G 15 G M r001 ref 13 A 16 A M r001 ref 14 T 17 T M r001 ref 15 A 18 A M r001 ref . . 19 G D r001 ref 16 C 20 C M r001 ref 17 T 21 T M r001 ref 18 G 22 G M : ref 7 TTAGATAAAGAGGATA-CTG 22 : |||||||| |||| ||| : r001 1 TTAGATAA----GATAGCTG 19 r002 ref 0 A . . S r002 ref 1 A . . I r002 ref 2 A . . I r002 ref 3 A 9 A M r002 ref 4 G 10 G M |
![]() |
![]() |
![]() |
#7 |
Member
Location: California Join Date: Dec 2010
Posts: 21
|
![]()
Great. I was wondering if it is possible for this tool to also extract the location of deletions in the reads. I see that for insertions, it reports the location of the insertion in the read but not in the reference. I realize that if the sequence is not present in the read, then there can't really be an index of that position, but the position after which the deletion occurs would be really helpful. Maybe there is another way to get this information from the CIGAR string?
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|