  • Jonathan
    Member
    • Jun 2009
    • 36

    MAQ - colorspace alignment troubles

    Hi all,
    before I move on to trying BFAST and BowTie for my colorspace-alignment problem,
    I thought I'd ask around here, because this looks too much like a bug to have been missed by many people.

    So I have this nice little colorspace dataset (70 million 50 nt reads, single-end)
    and feed it to maq; the reference is a heavily reduced subset of hg18.

    Steps 1-5 (from http://maq.sourceforge.net/color.shtml ) work fine, as does step 6 (the mapping),
    and an intermediate maq merge is fine too.

    Step 7 has a nifty little requirement that had me debugging MAQ for a day
    Code:
    maq csmap2nt aln.nt.map ref.bfa aln.cs.map
    (csmap2nt uses a hash keyed on the sequence name, so the reference names need to be unique: with multiple identical FASTA tags, it discards most matches.)
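    A quick way to check a reference for this problem before running csmap2nt is to count repeated FASTA header names. This is only a minimal sketch: the function name is made up for illustration, and it assumes the whole header text after '>' is the sequence name (whether maq truncates names at whitespace is not checked here).

```python
from collections import Counter
import io


def duplicate_fasta_names(handle):
    """Return {name: count} for FASTA header names that occur more than once."""
    names = Counter()
    for line in handle:
        if line.startswith(">"):
            # Assumption: the full header text after '>' is the sequence name.
            names[line[1:].strip()] += 1
    return {name: n for name, n in names.items() if n > 1}


# Toy reference with one repeated tag (made-up data, not from the real hg18 set).
example = io.StringIO(">chrA\nACGT\n>chrB\nGGCC\n>chrA\nTTTT\n")
print(duplicate_fasta_names(example))
```

    For a real file, pass an open file handle for the FASTA that was converted to ref.bfa instead of the StringIO toy input.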

    On to the usual SNP calling, but surprise:
    I'm getting tons of SNPs. Here is the pileup view:
    Code:
    entg|EIF1AY:ccds|CCDS14795.1_1  1       A       0       @       
    entg|EIF1AY:ccds|CCDS14795.1_1  2       T       0       @
    entg|EIF1AY:ccds|CCDS14795.1_1  3       A       0       @ 
    entg|EIF1AY:ccds|CCDS14795.1_1  4       G       0       @  
    entg|EIF1AY:ccds|CCDS14795.1_1  5       C       1       @a 
    entg|EIF1AY:ccds|CCDS14795.1_1  6       A       2       @.,  
    entg|EIF1AY:ccds|CCDS14795.1_1  7       A       2       @., 
    entg|EIF1AY:ccds|CCDS14795.1_1  8       A       2       @gG
    entg|EIF1AY:ccds|CCDS14795.1_1  9       G       3       @.,,   
    entg|EIF1AY:ccds|CCDS14795.1_1  10      A       4       @.CCC
    entg|EIF1AY:ccds|CCDS14795.1_1  11      C       4       @gGGG 
    entg|EIF1AY:ccds|CCDS14795.1_1  12      T       4       @aAAA   
    entg|EIF1AY:ccds|CCDS14795.1_1  13      T       7       @cCCCCCC 
    entg|EIF1AY:ccds|CCDS14795.1_1  14      G       8       @aAAAAAAA 
    entg|EIF1AY:ccds|CCDS14795.1_1  15      G       9       @.,,,,,,,, 
    entg|EIF1AY:ccds|CCDS14795.1_1  16      A       9       @.,,,,,,,, 
    entg|EIF1AY:ccds|CCDS14795.1_1  17      A       9       @cCCCCCCCC 
    entg|EIF1AY:ccds|CCDS14795.1_1  18      C       9       @aAAAAAAAA 
    entg|EIF1AY:ccds|CCDS14795.1_1  19      C       10      @.,,,,,,,,,
    entg|EIF1AY:ccds|CCDS14795.1_1  20      A       10      @.,,,,,,,,, 
    entg|EIF1AY:ccds|CCDS14795.1_1  21      A       10      @cCCCCCCCCC 
    entg|EIF1AY:ccds|CCDS14795.1_1  22      C       10      @aAAAAAAAAA  
    entg|EIF1AY:ccds|CCDS14795.1_1  23      C       10      @tAAAAAAAAA 
    entg|EIF1AY:ccds|CCDS14795.1_1  24      C       11      @g,,,,,,,,,. 
    entg|EIF1AY:ccds|CCDS14795.1_1  25      A       12      @.,,,,,,,,,.,   
    entg|EIF1AY:ccds|CCDS14795.1_1  26      A       12      @.,,,,,,,,,.,   
    entg|EIF1AY:ccds|CCDS14795.1_1  27      A       12      @tTTTTTTTTTtT   
    entg|EIF1AY:ccds|CCDS14795.1_1  28      T       13      @cCCCCCCCCC.CC  
    entg|EIF1AY:ccds|CCDS14795.1_1  29      G       14      @cCCCCCCCCCcCCt 
    entg|EIF1AY:ccds|CCDS14795.1_1  30      T       14      @gGGGGGGGGGgGGg 
    entg|EIF1AY:ccds|CCDS14795.1_1  31      C       15      @aAAAAAAAAAaAAaA 
    entg|EIF1AY:ccds|CCDS14795.1_1  32      C       16      @.,,,,,,,,,.,,.,,
    It goes on like this, with every second or third position showing a close to 100% pure mismatch, something that on other occasions I'd tend to call a SNP.
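    To put a number on "every second or third position", the per-position mismatch fraction can be pulled out of the pileup text. A minimal sketch, assuming the layout shown above (name, position, reference base, depth, then a base string that starts with '@', where '.' and ',' are reads matching the reference and letters are mismatching read bases); the function name is hypothetical.

```python
def mismatch_fraction(pileup_line):
    """Return (position, fraction of covering reads that disagree with the reference)."""
    fields = pileup_line.split()
    pos = int(fields[1])
    # Fifth column: leading '@', then one character per covering read (assumed layout).
    bases = fields[4][1:] if len(fields) > 4 else ""
    if not bases:
        return pos, 0.0  # no coverage at this position
    mismatches = sum(1 for b in bases if b not in ".,")
    return pos, mismatches / len(bases)


# Three lines copied from the pileup in this post.
lines = [
    "entg|EIF1AY:ccds|CCDS14795.1_1  13      T       7       @cCCCCCC",
    "entg|EIF1AY:ccds|CCDS14795.1_1  15      G       9       @.,,,,,,,,",
    "entg|EIF1AY:ccds|CCDS14795.1_1  28      T       13      @cCCCCCCCCC.CC",
]
for line in lines:
    print(mismatch_fraction(line))
```

    Positions with a fraction near 1.0 across the whole region, as here, point at a systematic conversion problem rather than real variation.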

    So I went ahead and picked one of the reads contributing to the above pileup,
    extracted it from the csfasta file, decoded it, and manually lined up the two sequence strings
    (one supplied by maq mapview, one by the conversion of csfasta to fasta):
    Code:
    ttggaaccaacccaaatgtccaacaatgatagactggattaagaaaatgcggcacatatacaccatgg
      TGAACCAACCCAAATGTCCAACAATGATAGACTGGATTAAGAAAATGTGAT
       GACACACAACAATCCGACacCATCgTTGgCGCAgtATAggaaatcccgt
    Line 2 begins at pileup pos. 14; or rather, what we see listed in the pileup is the sequence from line 3, which does *NOT* match at all and generates enormous numbers of SNPs.
    The manually csfasta-to-fasta converted sequence (line 2), by contrast, matches close to perfectly, just what I'd expect.
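    For reference, the manual csfasta-to-fasta decoding can be sketched as below. It assumes the standard SOLiD two-base code, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3), and a csfasta read consisting of one primer base followed by color digits. The function name and the example read are made up for illustration; the read was chosen so that it decodes to the first bases of line 2 above.

```python
BASES = "ACGT"  # index doubles as the 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(cs_read):
    """Decode a csfasta read (primer base + color digits) into nucleotides.

    The primer base itself is not part of the output. A missing color call
    ('.') makes int() raise ValueError, since everything after it is undefined.
    """
    current = BASES.index(cs_read[0])
    decoded = []
    for color in cs_read[1:]:
        # Next base = previous base XOR color in the two-base code.
        current ^= int(color)
        decoded.append(BASES[current])
    return "".join(decoded)


# Hypothetical read: primer 'T' followed by 11 colors.
print(decode_colorspace("T01201010100"))  # TGAACCAACCC
```

    Note that this naive decoding propagates a single miscalled color to every base after it, which is one reason aligners work in colorspace and convert only afterwards.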

    So, does anyone have an idea what's going on? Someone ought to have seen something like this, since I did nothing more than follow the instructions, using an unmodified maq-0.7.1, self-compiled on Fedora 12 x86_64.

    Ah, btw. bwa segfaults on the same dataset when trying to do the 'samse' step.

    Best
    -Jonathan
  • Jonathan
    Member
    • Jun 2009
    • 36

    #2
    Originally posted by Jonathan
    So I went ahead and picked one of the reads contributing to the above pileup,
    extracted it from the csfasta file, decoded it, and manually lined up the two sequence strings
    (one supplied by maq mapview, one by the conversion of csfasta to fasta):
    Code:
    ttggaaccaacccaaatgtccaacaatgatagactggattaagaaaatgcggcacatatacaccatgg
      TGAACCAACCCAAATGTCCAACAATGATAGACTGGATTAAGAAAATGTGAT
       GACACACAACAATCCGACacCATCgTTGgCGCAgtATAggaaatcccgt
    Line 2 begins at pileup pos. 14; or rather, what we see listed in the pileup is the sequence from line 3, which does *NOT* match at all and generates enormous numbers of SNPs.
    The manually csfasta-to-fasta converted sequence (line 2), by contrast, matches close to perfectly, just what I'd expect.
    A note on the above: I just found out that the mismatching line (the 3rd) is the 'double encoded' (DE) colorspace line. This encoding takes place when merging the csfasta and _qv.qual files using the solid2fastq.pl script supplied with MAQ.
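    The double encoding can be reproduced by hand. A minimal sketch, assuming the usual mapping of color digits to letters (0 to A, 1 to C, 2 to G, 3 to T) and the standard two-base code where a color is the XOR of adjacent base codes; the function names are hypothetical. Applied to the decoded read from line 2 (starting at its second base), it gives exactly the beginning of line 3.

```python
BASES = "ACGT"  # index doubles as the 2-bit code: A=0, C=1, G=2, T=3
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")  # assumed double-encoding table


def nt_to_colors(nt):
    """Return the color digits for each pair of adjacent nucleotides."""
    codes = [BASES.index(b) for b in nt.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def double_encode(colors):
    """Write color digits as pseudo-nucleotide letters (still colorspace!)."""
    return colors.translate(COLOR_TO_LETTER)


nt = "GAACCAACCCA"             # line 2 of the listing, from its second base on
colors = nt_to_colors(nt)      # 2010101001
print(double_encode(colors))   # GACACACAAC, the start of line 3
```

    The letters in a DE read are colors in disguise, so comparing them directly to the nucleotide reference produces mismatches wherever the color letter happens to differ from the reference base.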

    The csmap2nt step *should* do some conversion, though I have yet to understand that code, and have yet to see why it fails to convert the DE colorspace reads to nt reads.

    Best
    -Jonathan
