SEQanswers

Old 04-26-2011, 01:16 AM   #1
shuhongck
Junior Member
 
Location: Taiwan

Join Date: Mar 2011
Posts: 4
The problem of wgsim simulator

Dear all

I used wgsim to simulate Illumina reads, and I got read1 and read2 FASTQ files.

But the quality score of every base is "2":

222222222222222222222222222222222222222222222222222222222222222222222222222
@gi|49175990|ref|NC_000913.2|_26305_26624_0:2:0_2:0:0_b/1
GTTTTTGTGCCGGTGTAGACCGCGCTATCAGCATTGTTGAAAACGCGCTGGCCATTTGCGGCGCACCGATATATG
+
222222222222222222222222222222222222222222222222222222222222222222222222222
@gi|49175990|ref|NC_000913.2|_967853_968177_1:7:3_2:3:0_c/1
TGACGATTACCGCATAAACCGACTTTAAGCACCCCGCTCGCTAACGCATACGCCCCGCCGGCAACCACCAGCCAT


Is something wrong with my process?
~/Simulation/samtools-wgsim$ wgsim -d 350 -s 30 -N 70000 -1 75 -2 75 /Simulation out.read1.fq out.read2.fq

Or is this output normal?
Old 05-04-2011, 07:55 PM   #2
MBekritsky
Member
 
Location: CSHL

Join Date: Nov 2009
Posts: 15

Hi,

I ran into a similar problem a few months back. My memory is a bit fuzzy, but if I recall correctly, wgsim only simulates the read sequences; it doesn't do anything to simulate quality scores, so every base gets the same placeholder quality character.
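A quick way to confirm that a quality line is just a constant placeholder, as a minimal sketch. It assumes the FASTQ uses the Phred+33 (Sanger) encoding, which is an assumption and not something stated in this thread; the function names are made up for illustration.

```python
def phred33_scores(quality_line):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_line]


def is_constant_quality(quality_line):
    """Return True when every base carries the same quality character."""
    return len(set(quality_line)) == 1


# Quality line as it appears in the wgsim output posted above (75 bases).
qual = "2" * 75
scores = phred33_scores(qual)
print(scores[0], is_constant_quality(qual))
```

Under that encoding, the character "2" decodes to a Phred score of 17 at every position, so the line carries no per-base information.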

In order to get simulated quality scores as well, I switched to maq simulate. If you give it a sequence file from some NGS data (e.g. a run of paired-end sequence), it will create synthetic quality scores based on the quality scores from your NGS data. I think it uses a Markov process to generate the quality lines.
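To make the "Markov process" idea concrete, here is a rough sketch of a first-order Markov chain over quality characters. This is not Maq's actual implementation (I have not checked its source); it only illustrates the general technique, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_markov(quality_lines):
    """Count start characters and transitions between consecutive characters."""
    starts = Counter()
    transitions = defaultdict(Counter)
    for line in quality_lines:
        if not line:
            continue
        starts[line[0]] += 1
        for prev, cur in zip(line, line[1:]):
            transitions[prev][cur] += 1
    return starts, transitions


def _draw(counter, rng):
    """Draw one character with probability proportional to its count."""
    chars = list(counter)
    weights = [counter[c] for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


def simulate_quality(starts, transitions, length, rng):
    """Generate one synthetic quality line of the requested length."""
    if length <= 0:
        return ""
    qual = [_draw(starts, rng)]
    while len(qual) < length:
        nxt = transitions.get(qual[-1])
        # Fall back to the start distribution if this character was only
        # ever seen at the end of a training read.
        qual.append(_draw(nxt if nxt else starts, rng))
    return "".join(qual)


# Toy training data standing in for quality lines from a real run.
starts, transitions = train_markov(["IIIH", "IIHH"])
print(simulate_quality(starts, transitions, 10, random.Random(0)))
```

Each simulated character depends only on the one before it, which is enough to reproduce runs of similar quality values along a read.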

Hope this helps!
Old 05-09-2011, 08:23 AM   #3
shuhongck
Junior Member
 
Location: Taiwan

Join Date: Mar 2011
Posts: 4

Thank you very much. This information is very helpful to me.

I tried using Maq to simulate E. coli K12 Illumina sequencing reads, and I noticed that I need a simupars.dat file to simulate the data. According to the help manual, I can get simupars.dat by running "simutrain" or from the Maq website, but I didn't find any related files on the Maq download page. I also don't have real E. coli K12 Illumina data from which to generate simupars.dat.

Can anyone help me figure out this issue? Thanks in advance!
Old 05-10-2011, 04:15 AM   #4
MBekritsky
Member
 
Location: CSHL

Join Date: Nov 2009
Posts: 15

Hi again,

In my experience, the data file doesn't need to be a run from the same species, since all simutrain does (I think) is calculate quality score frequencies by read position. My suggestion is to find some recent sequencing data from the same machine, with the same read length as the samples you'll be submitting, and use that for simutrain. This way, if there are any quirks or quality score phenomena intrinsic to the machine you'll be using, you may be able to catch them at the simulation stage.
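To show why the species shouldn't matter, here is a rough sketch of "quality score frequencies by read position". It only looks at the quality lines and never at the bases. It is an illustration of the idea as I understand it, not Maq's code; it assumes plain 4-line FASTQ records, and the function names are made up.

```python
import random
from collections import Counter


def read_quality_lines(fastq_lines):
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def position_frequencies(quality_lines):
    """Build one Counter of quality characters per read position."""
    tables = []
    for line in quality_lines:
        while len(tables) < len(line):
            tables.append(Counter())
        for pos, ch in enumerate(line):
            tables[pos][ch] += 1
    return tables


def sample_quality(tables, rng):
    """Draw one synthetic quality line, position by position."""
    out = []
    for table in tables:
        chars = list(table)
        weights = [table[c] for c in chars]
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Two toy records standing in for real reads from the target machine.
fastq = ["@r1\n", "ACGT\n", "+\n", "IIH#\n",
         "@r2\n", "ACGT\n", "+\n", "IHH#\n"]
tables = position_frequencies(read_quality_lines(fastq))
print(sample_quality(tables, random.Random(1)))
```

Because only the quality strings are tabulated, training data from any organism sequenced on the same machine at the same read length should give a comparable profile.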

In my opinion, you would be better served by using any real data from the machine you'll be using for sequencing than by trying to find E. coli Illumina sequence from a different machine. As a case in point, I've used cancer sequencing data to train MAQ simulate for simulations on "normal" data. As far as I can tell, there was nothing in simutrain that biased my results.
Old 05-12-2011, 11:37 PM   #5
shuhongck
Junior Member
 
Location: Taiwan

Join Date: Mar 2011
Posts: 4

Thanks for your advice. I will try it.