06-03-2011, 01:43 PM   #27
Questions regarding synthetic data generation

Hi all,

I found this thread about generating synthetic reads for the Illumina platform, and since I need to generate such synthetic data, I am posting my questions here (rather than creating a new thread).

1) Is it possible to generate single-end (SE) reads rather than paired-end (PE) reads?

2) Does anyone know the advantages/disadvantages of “wgsim” from SAMtools vs. “dwgsim” from the DNAA package? What has been modified in dwgsim? This is not clear to me, since the README file of the DNAA package only says:
“This is a fork of the SAMtools wgsim, since certain assumptions are made that we do not agree with.”
What are these assumptions, and what exactly has been changed? Is there any publication that elaborates on these issues?

3) Are there any statistical considerations involved in generating the reads? E.g., do larger genes on the genome get more reads? Is any particular distribution assumed when shearing the reference genome? Are the errors distributed uniformly in both programs?

4) Any other recommendations for synthetic data generation?
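
To make question 3 more concrete, here is a minimal toy sketch of what I mean by "uniform" sampling and "uniform" errors. This is purely my own illustration and is not taken from wgsim or dwgsim; the function name `simulate_se_reads` and its parameters are hypothetical. It draws single-end read start positions uniformly along the reference and applies the same substitution probability at every base.

```python
import random

def simulate_se_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Toy single-end simulator (hypothetical, NOT how wgsim/dwgsim work).

    Assumptions: uniform start positions, substitution errors only,
    and the same error probability at every position of the read.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    reads = []
    for _ in range(n_reads):
        # Every valid start position is equally likely, so a longer region
        # collects proportionally more reads simply because of its length.
        start = rng.randrange(len(reference) - read_len + 1)
        read = list(reference[start:start + read_len])
        for i, base in enumerate(read):
            # Position-independent substitution error.
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in bases if b != base])
        reads.append((start, "".join(read)))
    return reads

if __name__ == "__main__":
    ref = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCC" * 10
    for start, seq in simulate_se_reads(ref, n_reads=5, read_len=20, error_rate=0.02):
        print(start, seq)
```

What I would like to know is whether the two tools behave like this toy model, or whether they use something more realistic (e.g. error rates that change along the read, or non-uniform coverage).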

Thank you in advance for any help.
tldgID