Hi everyone,
after a day of intensively searching the web for tools that simulate next-generation sequencing reads, my conclusion is this: there is a lot of software, but, as always, none of it generates the data I need. What I am looking for is a tool that can
- produce Illumina (single end) reads from a reference sequence,
- give the user the ability to adjust the lengths of the reads and the read coverage,
- estimate the base quality scores (this is very important), and
- output the data in an appropriate format, such as FASTQ.
The first tool I tested was RMAP (http://rulai.cshl.edu/rmap/), which can produce single- and paired-end reads at variable lengths, with scores. I tried it, but in the end it turned out that the scores are constant.
Next I had a look at ART (http://bioinformatics.joyhz.com/ART/). At first glance it looks very promising: it simulates single- and paired-end reads at variable lengths and also calculates base scores. Perfect! But again there is a drawback: the reads cannot exceed a maximum length of 36 bp (I need 76 bp).
Afterwards I saw that MAQ (http://maq.sourceforge.net/maq-man.shtml) can simulate Illumina reads and also estimate base qualities, but unfortunately only for paired-end reads.
The last tool I tried was MetaSim (http://www-ab.informatik.uni-tuebing...ftware/metasim) and guess what! This tool cannot estimate base scores, and its reads cannot exceed a total length of 36 bp either.
So, before I start writing a new tool that fulfills my needs: does anyone out there know a program for simulating single-end Illumina reads at variable lengths?
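To make the requirements above concrete, here is a minimal Python sketch of the kind of simulator I have in mind. It is only an illustration under stated assumptions, not a validated tool: the function names (simulate_reads, write_fastq) are made up for this post, the quality model is a toy linear decay along the read rather than a real Illumina error profile, qualities are written as Phred+33, and only forward-strand reads with substitution errors are produced. The number of reads follows from coverage * reference length / read length.

```python
import random


def simulate_reads(reference, read_length=76, coverage=10.0, seed=0,
                   q_start=38, q_end=15):
    """Yield (name, sequence, quality_string) tuples for single-end reads.

    Hypothetical helper: qualities follow a toy linear decay from q_start
    to q_end along the read (an assumption, not a measured error profile).
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    # Coverage = n_reads * read_length / reference_length
    n_reads = int(coverage * len(reference) / read_length)
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        fragment = reference[start:start + read_length]
        bases, quals = [], []
        for pos, base in enumerate(fragment):
            frac = pos / max(read_length - 1, 1)
            # Toy quality: linear decay plus a little jitter, clamped to 2..40
            q = round(q_start + (q_end - q_start) * frac) + rng.randint(-2, 2)
            q = max(2, min(40, q))
            # A Phred score q corresponds to error probability 10^(-q/10)
            if rng.random() < 10 ** (-q / 10):
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
            quals.append(chr(q + 33))  # Phred+33 encoding (assumption)
        yield f"read_{i}_pos_{start}", "".join(bases), "".join(quals)


def write_fastq(reads, handle):
    """Write reads as four-line FASTQ records to an open text handle."""
    for name, seq, qual in reads:
        handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
```

Usage would be something like write_fastq(simulate_reads(ref_seq, read_length=76, coverage=20), open("reads.fastq", "w")), where ref_seq is the reference as a plain string. A real tool would need an empirical quality profile instead of the linear decay used here.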