Seqanswers Leaderboard Ad

**swbarnes2** · 12-13-2013, 01:53 PM

Illumina reads are error-prone; most of the discrepancies you see will be artifacts. Are you trying to count artifacts?

**shoegame2001** · 12-13-2013, 01:54 PM

Yes, exactly. Both artifacts/errors as well as mismatches caused by anything else (e.g. true polymorphism, mapping error, etc.).

**lindenb** · 12-13-2013, 03:44 PM

I wrote a small java program for Biostars ( http://www.biostars.org/p/59647/ ) : see : https://github.com/lindenb/jvarkit/wiki/Biostar59647

Code:

$ java -jar dist/biostar59647.jar -r samtools-0.1.18/examples/toy.fa  samtools-0.1.18/examples/toy.bam |\
xmllint --format - 

<?xml version="1.0" encoding="UTF-8"?>
<sam ref="/home/lindenb/samtools-0.1.18/examples/toy.bam" bam="/home/lindenb/samtools-0.1.18/examples/toy.fa">
  <!---r /home/lindenb/samtools-0.1.18/examples/toy.fa /home/lindenb/samtools-0.1.18/examples/toy.bam-->
  <read>
    <name>r001</name>
    <sequence>TTAGATAAAGAGGATACTG</sequence>
    <flags READ_PAIRED="true" READ_MAPPED_IN_PROPER_PAIR="true" READ_UNMAPPED="false" MATE_UNMAPPED="false" READ
_REVERSE_STRAND="false" MATE_REVERSE_STRAND="true" FIRST_IN_PAIR="false" SECOND_IN_PAIR="true" NOT_PRIMARY_ALIGN
MENT="false" READ_FAILS_VENDOR_QUALITY_CHECK="false" READ_IS_DUPLICATE="false" SUPPLEMENTARY_ALIGNMENT="false">1
63</flags>
    <qual>30</qual>
    <chrom index="0">ref</chrom>
    <pos>7</pos>
    <cigar>8M4I4M1D3M</cigar>
    <mate-chrom index="0">ref</mate-chrom>
    <mate-pos>37</mate-pos>
    <align>
      <M read-index="1" read-base="T" ref-index="7" ref-base="T"/>
      <M read-index="2" read-base="T" ref-index="8" ref-base="T"/>
      <M read-index="3" read-base="A" ref-index="9" ref-base="A"/>
      <M read-index="4" read-base="G" ref-index="10" ref-base="G"/>
      <M read-index="5" read-base="A" ref-index="11" ref-base="A"/>
      <M read-index="6" read-base="T" ref-index="12" ref-base="T"/>
      <M read-index="7" read-base="A" ref-index="13" ref-base="A"/>
      <M read-index="8" read-base="A" ref-index="14" ref-base="A"/>
      <I read-index="9" read-base="A"/>
      <I read-index="10" read-base="G"/>
      <I read-index="11" read-base="A"/>
      <I read-index="12" read-base="G"/>
      <M read-index="13" read-base="G" ref-index="15" ref-base="G"/>
      <M read-index="14" read-base="A" ref-index="16" ref-base="A"/>
      <M read-index="15" read-base="T" ref-index="17" ref-base="T"/>
      <M read-index="16" read-base="A" ref-index="18" ref-base="A"/>
      <D ref-index="19" ref-base="G"/>
      <M read-index="17" read-base="C" ref-index="20" ref-base="C"/>
      <M read-index="18" read-base="T" ref-index="21" ref-base="T"/>
      <M read-index="19" read-base="G" ref-index="22" ref-base="G"/>
    </align>
  </read>
(...)

**shoegame2001** · 12-13-2013, 04:17 PM

This is perfect. Thanks so much!

**lindenb** · 12-14-2013, 07:13 AM

if you don't want to parse the XML , you might also like that tool: https://github.com/lindenb/jvarkit/wiki/SAM2Tsv

Code:

$ java -jar dist/sam2tsv.jar -A  \
    -r samtools-0.1.18/examples/toy.fa 
      samtools-0.1.18/examples/toy.sam

r001	ref	0	T	7	T	M
r001	ref	1	T	8	T	M
r001	ref	2	A	9	A	M
r001	ref	3	G	10	G	M
r001	ref	4	A	11	A	M
r001	ref	5	T	12	T	M
r001	ref	6	A	13	A	M
r001	ref	7	A	14	A	M
r001	ref	8	A	.	.	I
r001	ref	9	G	.	.	I
r001	ref	10	A	.	.	I
r001	ref	11	G	.	.	I
r001	ref	12	G	15	G	M
r001	ref	13	A	16	A	M
r001	ref	14	T	17	T	M
r001	ref	15	A	18	A	M
r001	ref	.	.	19	G	D
r001	ref	16	C	20	C	M
r001	ref	17	T	21	T	M
r001	ref	18	G	22	G	M
:   ref        7 TTAGATAAAGAGGATA-CTG 22      
:                ||||||||    |||| |||
:  r001        1 TTAGATAA----GATAGCTG 19      
r002	ref	0	A	.	.	S
r002	ref	1	A	.	.	I
r002	ref	2	A	.	.	I
r002	ref	3	A	9	A	M
r002	ref	4	G	10	G	M

**shoegame2001** · 03-27-2014, 01:24 PM

Great. I was wondering if it is possible for this tool to also extract the location of deletions in the reads. I see that for insertions, it reports the location of the insertion in the read but not in the reference. I realize that if the sequence is not present in the read, then there can't really be an index of that position, but the position after which the deletion occurs would be really helpful. Maybe there is another way to get this information from the CIGAR string?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 58 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 53 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 45 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 55 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Mismatch positions in BAM

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News