Hi, I have finally got my hands on some ECC SOLiD data. Encouragingly, it mapped really well with Lifescope (75 bp reads mapped to hg19, 91.1% of reads mapped). But with other mappers it is a very different story...
Using AB's convertFromXSQ.sh script it's possible to extract both the ECC-generated basespace reads and the underlying colourspace reads. However, when the two are mapped separately using alternative third-party mappers, I am consistently finding that the basespace reads map much WORSE than their respective colourspace sequences.
For example, comparing Lifescope (v2.5) with BWA (v0.5.9) and Bowtie1 (v0.12.7), I get the following results:
Outcome of mapping 142,102,059 ECC generated 75 bp reads:
--------------------------------------------------------------
Lifescope --- 129,530,059 reads mapped (91.1%)
BWA CS --- 77,511,121 reads mapped (54.5%)
BWA BS --- 55,231,727 reads mapped (38.9%)
Bowtie CS --- 105,120,949 reads mapped (74.0%)
Bowtie BS --- 65,657,080 reads mapped (46.2%)
--------------------------------------------------------------
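For anyone wanting to check the arithmetic, here is a minimal Python sketch that recomputes the mapped fraction for each row and the colourspace-to-basespace drop per mapper. The counts are copied from the table above; the names (`mapped`, `percent_mapped`, `bs_deficit`) are just illustrative. Recomputed values may differ from the table in the last digit, depending on rounding.

```python
# Read counts copied from the table above.
TOTAL_READS = 142_102_059

mapped = {
    "Lifescope": 129_530_059,
    "BWA CS": 77_511_121,
    "BWA BS": 55_231_727,
    "Bowtie CS": 105_120_949,
    "Bowtie BS": 65_657_080,
}


def percent_mapped(n_mapped, total=TOTAL_READS):
    """Percentage of input reads that were mapped."""
    return 100.0 * n_mapped / total


def bs_deficit(mapper):
    """Percentage-point drop from colourspace (CS) to basespace (BS) for one mapper."""
    return percent_mapped(mapped[f"{mapper} CS"]) - percent_mapped(mapped[f"{mapper} BS"])


if __name__ == "__main__":
    for name, n in mapped.items():
        print(f"{name:<10} {n:>12,} reads mapped ({percent_mapped(n):.2f}%)")
    for mapper in ("BWA", "Bowtie"):
        print(f"{mapper}: CS minus BS = {bs_deficit(mapper):.2f} percentage points")
```

The drop works out at roughly 16 percentage points for BWA and roughly 28 for Bowtie.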
i.e., both BWA and Bowtie map the ECC basespace version of the reads considerably worse than the equivalent colourspace reads.
These differences can also be seen in the figure below, which summarises the coverage distributions for the various mappings. Note how, for BWA, mean coverage varies whilst the median stays largely the same, suggesting that there are fewer extremes in coverage over the genome. For Bowtie, the distributions are quite different: colourspace is clearly better than its ECC basespace equivalent, i.e., more reads are mapping throughout the chromosome. In fact the Bowtie colourspace distribution, at least for chr 1, is not markedly different from that of Lifescope.
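To illustrate the mean-versus-median reasoning, here is a small Python sketch. The depth values are hypothetical and are NOT taken from my data. It only shows that a handful of extreme pile-up positions can shift the mean a lot while leaving the median where it was.

```python
from statistics import mean, median

# Hypothetical per-position depths (illustrative only, not from the real mappings).
typical = [28, 30, 31, 29, 30, 32, 30, 29]

# Same positions plus two extreme pile-ups, e.g. in a repeat region.
with_pileups = typical + [400, 650]


def summarise(depths):
    """Return (mean, median) coverage for a list of per-position depths."""
    return mean(depths), median(depths)


if __name__ == "__main__":
    for label, depths in (("typical", typical), ("with pile-ups", with_pileups)):
        m, med = summarise(depths)
        print(f"{label:<14} mean = {m:.1f}  median = {med}")
```

So two mappings with a similar median but different means most likely differ in how many reads land in a small number of very deep positions, not in their typical coverage.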
Has anyone else been handling ECC-generated SOLiD data and seen the same phenomenon? Does anyone have any suggestions as to why this is happening? Naively, I had assumed the ECC basespace output would, if anything, be more accurate than its corresponding colourspace reads. Certainly that improvement IS seen with Lifescope (v2.5), but not with alternative mappers (or at least not with the two considered so far).