Hi all,
Before I go on to try BFAST and Bowtie for my colorspace alignment problem,
I thought I'd ask around here, because this looks too much like a bug to have been missed by a lot of people.
So I have this nice little colorspace dataset (70 million 50 nt reads, SE)
and feed it to maq; the reference is a heavily reduced hg18 set.
Steps 1-5 (from http://maq.sourceforge.net/color.shtml ) work fine, and so does step 6, the mapping;
an intermediate maq merge is fine too.
Step 7 has a nifty little requirement that had me debugging MAQ for a day:
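Code:
maq csmap2nt aln.nt.map ref.bfa aln.cs.map
(It uses a hash based on the sequence name, and with multiple identical FASTA tags it discards most matches.)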
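The requirement here is that read names be unique. A minimal sketch for checking that before running step 7 might look like the following; it only assumes that name lines start with `>` and that `#` lines are header comments, and the sample reads and the file name in the usage note are made up for illustration.

```python
from collections import Counter


def duplicate_read_names(lines):
    """Return {name: count} for every read name that occurs more than once."""
    names = Counter(
        line[1:].strip()            # drop the leading '>' and the newline
        for line in lines
        if line.startswith(">")     # skips '#' comments and colour strings
    )
    return {name: n for name, n in names.items() if n > 1}


# Tiny made-up csfasta sample (hypothetical read names).
sample = """\
# Title: example
>1_1_1_F3
T0120
>1_1_2_F3
T3310
>1_1_1_F3
T2201
""".splitlines()

print(duplicate_read_names(sample))
```

On a real file one would pass the open file handle instead, e.g. `duplicate_read_names(open("reads.csfasta"))` (hypothetical file name); an empty result means no name is repeated.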
On to the usual SNP calling, but, oh wonder,
I'm getting tons of SNPs. Below is the pileup view:
Code:
entg|EIF1AY:ccds|CCDS14795.1_1 1 A 0 @
entg|EIF1AY:ccds|CCDS14795.1_1 2 T 0 @
entg|EIF1AY:ccds|CCDS14795.1_1 3 A 0 @
entg|EIF1AY:ccds|CCDS14795.1_1 4 G 0 @
entg|EIF1AY:ccds|CCDS14795.1_1 5 C 1 @a
entg|EIF1AY:ccds|CCDS14795.1_1 6 A 2 @.,
entg|EIF1AY:ccds|CCDS14795.1_1 7 A 2 @.,
entg|EIF1AY:ccds|CCDS14795.1_1 8 A 2 @gG
entg|EIF1AY:ccds|CCDS14795.1_1 9 G 3 @.,,
entg|EIF1AY:ccds|CCDS14795.1_1 10 A 4 @.CCC
entg|EIF1AY:ccds|CCDS14795.1_1 11 C 4 @gGGG
entg|EIF1AY:ccds|CCDS14795.1_1 12 T 4 @aAAA
entg|EIF1AY:ccds|CCDS14795.1_1 13 T 7 @cCCCCCC
entg|EIF1AY:ccds|CCDS14795.1_1 14 G 8 @aAAAAAAA
entg|EIF1AY:ccds|CCDS14795.1_1 15 G 9 @.,,,,,,,,
entg|EIF1AY:ccds|CCDS14795.1_1 16 A 9 @.,,,,,,,,
entg|EIF1AY:ccds|CCDS14795.1_1 17 A 9 @cCCCCCCCC
entg|EIF1AY:ccds|CCDS14795.1_1 18 C 9 @aAAAAAAAA
entg|EIF1AY:ccds|CCDS14795.1_1 19 C 10 @.,,,,,,,,,
entg|EIF1AY:ccds|CCDS14795.1_1 20 A 10 @.,,,,,,,,,
entg|EIF1AY:ccds|CCDS14795.1_1 21 A 10 @cCCCCCCCCC
entg|EIF1AY:ccds|CCDS14795.1_1 22 C 10 @aAAAAAAAAA
entg|EIF1AY:ccds|CCDS14795.1_1 23 C 10 @tAAAAAAAAA
entg|EIF1AY:ccds|CCDS14795.1_1 24 C 11 @g,,,,,,,,,.
entg|EIF1AY:ccds|CCDS14795.1_1 25 A 12 @.,,,,,,,,,.,
entg|EIF1AY:ccds|CCDS14795.1_1 26 A 12 @.,,,,,,,,,.,
entg|EIF1AY:ccds|CCDS14795.1_1 27 A 12 @tTTTTTTTTTtT
entg|EIF1AY:ccds|CCDS14795.1_1 28 T 13 @cCCCCCCCCC.CC
entg|EIF1AY:ccds|CCDS14795.1_1 29 G 14 @cCCCCCCCCCcCCt
entg|EIF1AY:ccds|CCDS14795.1_1 30 T 14 @gGGGGGGGGGgGGg
entg|EIF1AY:ccds|CCDS14795.1_1 31 C 15 @aAAAAAAAAAaAAaA
entg|EIF1AY:ccds|CCDS14795.1_1 32 C 16 @.,,,,,,,,,.,,.,,
It goes on like this, with every second or third position being close to 100% pure... hm, on other occasions I'd tend to call it a SNP... something.
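To put a number on "close to 100% pure", the non-reference fraction per position can be computed straight from the pileup text. This is only a sketch under the assumption that the columns are as they appear above (name, position, reference base, depth, then `@` followed by one character per read, with `.` or `,` meaning the read agrees with the reference); the function name is made up.

```python
def nonref_fraction(pileup_line):
    """Return (position, ref_base, depth, fraction of reads differing from ref).

    The fraction is None when no read covers the position.
    """
    fields = pileup_line.split()
    pos, ref, depth = int(fields[1]), fields[2], int(fields[3])
    calls = fields[4][1:]                 # strip the leading '@'
    if not calls:
        return pos, ref, depth, None
    mismatches = sum(1 for c in calls if c not in ".,")
    return pos, ref, depth, mismatches / len(calls)


# Three lines copied from the pileup in this post.
lines = [
    "entg|EIF1AY:ccds|CCDS14795.1_1 14 G 8 @aAAAAAAA",
    "entg|EIF1AY:ccds|CCDS14795.1_1 15 G 9 @.,,,,,,,,",
    "entg|EIF1AY:ccds|CCDS14795.1_1 28 T 13 @cCCCCCCCCC.CC",
]
for line in lines:
    print(nonref_fraction(line))
```

Positions where nearly every read disagrees with the reference in the same way, alternating with clean positions, point at a systematic decoding or offset problem rather than real variation.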
So I went ahead and picked one of the reads contributing to the above pileup,
extracted it from the csfasta file, decoded it, and manually compared the two sequence strings
(one supplied by maq mapview, one from my csfasta-to-fasta conversion):
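Code:
ttggaaccaacccaaatgtccaacaatgatagactggattaagaaaatgcggcacatatacaccatgg
TGAACCAACCCAAATGTCCAACAATGATAGACTGGATTAAGAAAATGTGAT
GACACACAACAATCCGACacCATCgTTGgCGCAgtATAggaaatcccgt
Line 2 begins at pileup pos. 14; or rather, what we see listed in the pileup is the sequence from line 3. Totally *NOT* matching, and generating enormous amounts of SNPs.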
Meanwhile, the sequence I converted manually from csfasta to fasta matches the reference almost perfectly, which is just what I'd expect.
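For anyone wanting to repeat the manual conversion, here is a minimal sketch of a naive colorspace decoder. It is not what maq does internally; it only relies on the standard SOLiD two-base encoding, where with A=0, C=1, G=2, T=3 each colour is the XOR of two neighbouring base codes. The function name and the example reads are made up.

```python
BASES = "ACGT"   # index = 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Decode a csfasta read (primer base + colour digits) to nucleotides.

    The primer base seeds the decoding but is not part of the output.
    """
    code = BASES.index(read[0].upper())
    decoded = []
    for colour in read[1:]:
        if colour not in "0123":
            # '.' (missing call) or anything else cannot be decoded naively.
            raise ValueError(f"cannot decode colour {colour!r}")
        code ^= int(colour)          # next base code = previous code XOR colour
        decoded.append(BASES[code])
    return "".join(decoded)


print(decode_colorspace("T0120"))   # TGAA
```

Note that a single wrong colour call shifts every base after it, so this naive decoding is only useful for spot checks on clean reads like the one above.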
So, does anyone have an idea what's going on? Someone ought to have seen something like this, since I did nothing more than follow the instructions, using an unmodified maq-0.7.1, self-compiled on Fedora 12 x86_64.
Ah, btw: bwa segfaults on the same dataset when trying to do the 'samse' step.
Best
-Jonathan