
**MBekritsky** · 05-04-2011, 07:55 PM

Hi,

I ran into a similar problem a few months back. My memory is a bit fuzzy, but if I recall correctly, wgsim only simulates the read sequences; it doesn't do anything to simulate quality scores.

In order to get simulated quality scores as well, I switched to maq simulate. If you give it a sequence file from some NGS data (e.g. a run of paired-end sequence), it will create synthetic quality scores based on the quality scores from your NGS data. I think it uses a Markov process to generate the quality lines.
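
To make the Markov idea concrete, here is a rough Python sketch. It is not Maq's actual code, and I am assuming a plain first-order chain in which each quality character depends only on the previous one. The function names `train_markov` and `simulate_quality` are made up for illustration.

```python
import random
from collections import defaultdict

def train_markov(quality_strings):
    """Count start symbols and first-order transitions in real quality strings."""
    starts = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for q in quality_strings:
        if not q:
            continue
        starts[q[0]] += 1
        for prev, cur in zip(q, q[1:]):
            trans[prev][cur] += 1
    return starts, trans

def _draw(counts, rng):
    """Draw one symbol with probability proportional to its count."""
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return rng.choices(symbols, weights=weights, k=1)[0]

def simulate_quality(starts, trans, length, rng=None):
    """Generate a synthetic quality string of the requested length."""
    if length <= 0:
        return ""
    rng = rng or random.Random()
    q = [_draw(starts, rng)]
    while len(q) < length:
        nxt = trans.get(q[-1])
        # Fall back to the start distribution if this symbol was never followed by another
        q.append(_draw(nxt if nxt else starts, rng))
    return "".join(q)
```

You would train on the quality lines of a real FASTQ file and then call `simulate_quality` once per simulated read. A real simulator would probably also condition on read position, which this sketch ignores.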

Hope this helps!

**shuhongck** · 05-09-2011, 08:23 AM

Thank you very much. This information is very helpful for me.

I tried using Maq to simulate E. coli K12 Illumina sequencing reads, and I noticed that I need a simupars.dat file to simulate the data. According to the help manual, I can get *simupars.dat* by running "simutrain" or from the Maq website, but I didn't find any such file on the Maq download page. I also don't have any real E. coli K12 Illumina data from which to generate simupars.dat.

Can anyone help me figure out this issue? Thanks in advance!

**MBekritsky** · 05-10-2011, 04:15 AM

Hi again,

In my experience, the data file doesn't need to be a run from the same species, since all simutrain does (I think) is calculate quality score frequencies by read position. My suggestion is to find some recent sequencing data from the same machine, with the same read length as the samples you'll be submitting, and use that for simutrain. That way, if there are any quirks or quality score phenomena intrinsic to the machine you'll be using, you may be able to catch them at the simulation stage.
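
For what it's worth, here is a small Python sketch of what I mean by "quality score frequencies by read position". This reflects my understanding of the idea, not simutrain's real implementation or file format. It assumes a plain four-line-per-record FASTQ, and the function names are made up for illustration.

```python
import io
import random
from collections import Counter

def position_quality_counts(fastq_handle):
    """Return one Counter per read position, tallying quality characters."""
    counts = []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\r\n")
        while len(counts) < len(qual):
            counts.append(Counter())
        for pos, ch in enumerate(qual):
            counts[pos][ch] += 1
    return counts

def sample_quality(counts, rng=None):
    """Draw one quality character per position from the observed frequencies."""
    rng = rng or random.Random()
    return "".join(
        rng.choices(list(c), weights=list(c.values()), k=1)[0] for c in counts
    )

# Tiny in-memory FASTQ with made-up reads, just to show the calls
example = io.StringIO("@r1\nACGT\n+\nIIHG\n@r2\nACGT\n+\nIHHG\n")
profile = position_quality_counts(example)
print(sample_quality(profile))
```

Nothing in the tally looks at the bases themselves, only at the quality line, which is why training data from a different species should work as long as the machine and read length match.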

In my opinion, you would be better served by using any real data from the machine you'll be using for sequencing than by trying to find E. coli Illumina sequence from a different machine. As a case in point, I've used cancer sequencing data to train MAQ simulate for simulations on "normal" data. As far as I can tell, there was nothing in simutrain that biased my results.

**shuhongck** · 05-12-2011, 11:37 PM

Thanks for your advice. I will try it.


The problem of wgsim simulator
