SEQanswers

Old 03-10-2010, 12:47 AM   #21
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
Thanks Nils!
Actually, there's only one shell script, so it's quite evident (my bad).
Anyway, I ran that:

Code:
sh autogen.sh 
Preparing the dnaa build system...please wait

ERROR:  Unable to locate GNU Autoconf.

ERROR:  To prepare the dnaa build system from scratch,
        at least version 2.52 of GNU Autoconf must be installed.


autogen.sh does not need to be run on the same machine that will
run configure or make.  Either the GNU Autotools will need to be installed
or upgraded on this system, or autogen.sh must be run on the source
code on another system and then transferred to here. -- Cheers!

Is it possible for you to include the autoconf files?
I do not have that installed on my system.

It's probably not the best idea to include autoconf with the source code. If you mean the ./configure script, it will be included in the releases (there is no release yet). For now, you will have to either install the appropriate autoconf version, or you can PM/email me and I would be happy to send you a tarball.
Old 10-04-2010, 06:08 AM   #22
plichel
Junior Member
 
Location: Germany

Join Date: Mar 2010
Posts: 9
Default

So as not to waste a thread: does anybody know whether there is a read simulator that can also sample known SNPs from, say, a dbSNP file or similar?
Thanks !
Old 04-03-2011, 09:42 PM   #23
srividya
Junior Member
 
Location: CHENNAI

Join Date: Sep 2010
Posts: 6
Default wgsim

Hello,

I am using wgsim to generate simulated reads of 76 bp length (Solexa).

Is the generated FASTQ in Solexa or Sanger format? Since there is no option to specify the FASTQ type, I assumed it is Sanger. Is that correct?

Thanks,
Srividya
Old 04-04-2011, 08:14 AM   #24
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65
Default

Hello srividya,

I don't know the answer, but you can find out using the ASCII table: http://es.wikipedia.org/wiki/ASCII

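Solexa/Illumina FASTQ (pipeline version >= 1.3) won't have any quality characters with ASCII values below 64, which means that digits (48 to 57 in decimal ASCII) shouldn't appear in the quality lines of your FASTQ file. If you do see digits there, the file is Sanger-encoded.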
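
A minimal sketch of that check in Python, assuming a plain FASTQ with exactly four lines per record. The function names and the decision rule are illustrative only and are not part of wgsim or any library. The idea is to look at the smallest ASCII code on the quality lines: anything below 59 can only be Sanger (offset 33), while a minimum of 64 or more is consistent with Illumina 1.3+ but could also be high-quality Sanger data.

```python
import io

def quality_ascii_range(handle):
    """Return (min, max) ASCII codes seen on the quality lines of a FASTQ stream.

    Assumes each record is exactly four lines (no wrapped sequences).
    Returns (None, None) if no quality characters were found.
    """
    lo, hi = None, None
    for i, line in enumerate(handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        codes = [ord(c) for c in line.rstrip("\r\n")]
        if not codes:
            continue
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    return lo, hi

def guess_fastq_offset(handle):
    """Very rough guess of the quality encoding from the lowest ASCII code."""
    lo, _ = quality_ascii_range(handle)
    if lo is None:
        return "unknown"
    if lo < 59:
        return "sanger"  # digits and other low characters only occur with offset 33
    if lo < 64:
        return "solexa"  # pre-1.3 Solexa scores can go down to ';' (59)
    return "illumina-1.3+ or ambiguous"

# Example with a single made-up record whose quality line is all '2'
record = io.StringIO("@read1\nACGT\n+\n2222\n")
print(guess_fastq_offset(record))
```

On a real file you would pass an open file handle instead of the `io.StringIO` example.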

Greetings,
Leonardo
__________________
L. Collado Torres, Ph.D. student in Biostatistics.
Old 04-04-2011, 06:16 PM   #25
srividya
Junior Member
 
Location: CHENNAI

Join Date: Sep 2010
Posts: 6
Default

Hello,

Thanks for the reply. All the reads have the quality character 2 at every position, so I can safely consider them to be Sanger.

Thanks,
Srividya
Old 04-06-2011, 09:14 AM   #26
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65
Default

No problem, and I'm glad you were able to answer your question.

Leo
__________________
L. Collado Torres, Ph.D. student in Biostatistics.
Old 06-03-2011, 12:43 PM   #27
tldgID
Member
 
Location: USA

Join Date: May 2011
Posts: 18
Question Questions regarding synthetic data generation

Hi all,

I found this thread about generating synthetic reads for the Illumina platform, and since I need to generate such synthetic data, I am posting my questions here (as opposed to creating a new thread!).

1) Is it possible to generate SE reads rather than PE?

2) Does anyone know the advantages/disadvantages of “wgsim” from SAMtools vs. “dwgsim” from the DNAA package? What has been modified in dwgsim? It is not very clear to me, since the README file of the DNAA package says:
“This is a fork of the SAMtools wgsim, since certain assumptions are made that we do not agree with.”
What are these assumptions? What has been modified? Is there any publication that elaborates on these issues?

3) Is there any statistical consideration involved in the generation of the reads? E.g., do larger genes on the genome get more reads? Or is there any distribution-related consideration when shearing the reference genome? Are the errors distributed uniformly in both programs?

4) any other recommendations for synthetic data generation?

Thank you for any help in advance
Old 06-03-2011, 04:28 PM   #28
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by tldgID View Post
Hi all,

I found this thread about generating synthetic reads for the Illumina platform, and since I need to generate such synthetic data, I am posting my questions here (as opposed to creating a new thread!).

1) Is it possible to generate SE reads rather than PE?

2) Does anyone know the advantages/disadvantages of “wgsim” from SAMtools vs. “dwgsim” from the DNAA package? What has been modified in dwgsim? It is not very clear to me, since the README file of the DNAA package says:
“This is a fork of the SAMtools wgsim, since certain assumptions are made that we do not agree with.”
What are these assumptions? What has been modified? Is there any publication that elaborates on these issues?

3) Is there any statistical consideration involved in the generation of the reads? E.g., do larger genes on the genome get more reads? Or is there any distribution-related consideration when shearing the reference genome? Are the errors distributed uniformly in both programs?

4) any other recommendations for synthetic data generation?

Thank you for any help in advance
1) Yes, specify "-2 0".
2) The fork was done to provide better color space (SOLiD) support, in particular to include the first color and adapter.
3) Random read placement, errors distributed according to the error rate.
Old 06-06-2011, 08:25 AM   #29
tldgID
Member
 
Location: USA

Join Date: May 2011
Posts: 18
Default

Quote:
Originally Posted by nilshomer View Post
1) Yes, specify "-2 0".
2) The fork was done to provide better color space (SOLiD) support, in particular to include the first color and adapter.
3) Random read placement, errors distributed according to the error rate.

Thank you Nils!

About Q2: so, if I need Illumina-like synthetic data, it won't make a difference whether I use “wgsim” or “dwgsim”?

About Q3: can you elaborate on “Random read placement”? My understanding is that the error rate is pre-specified, and then, as the reads are generated, the nucleotide at each position can be changed according to that error rate. Is this what you mean by “Random read placement”, or did you mean something else?

Thanks again
Old 06-06-2011, 09:33 AM   #30
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Q2: There are a number of differences, including left-justification of indels and small bug fixes. You will notice differences, and I encourage you to test both, as I cannot predict all of them.

Q3: a read's start position is randomly drawn from all possible start positions. Random errors are then introduced according to the per-base error rate.
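
As a rough illustration of that description (this is not dwgsim's actual code), here is a minimal Python sketch. The start position is drawn uniformly from all valid start positions, and then each base is substituted independently with probability `error_rate`. The function name, its parameters, and the substitution-only error model are assumptions made for the example; the real tools also handle indels, mutations, and paired ends.

```python
import random

def simulate_read(reference, read_length, error_rate, rng=None):
    """Draw one read from `reference` and add random substitution errors.

    Returns (start, read). Only substitutions are modeled in this sketch.
    """
    rng = rng or random.Random()
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    # Every start position that leaves room for a full-length read is equally likely.
    start = rng.randrange(len(reference) - read_length + 1)
    bases = list(reference[start:start + read_length])
    for i, base in enumerate(bases):
        # Each position is hit by an error independently with probability error_rate.
        if rng.random() < error_rate:
            # Replace the base with one of the other nucleotides.
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return start, "".join(bases)

# Example on a short made-up reference, seeded so the run is repeatable
rng = random.Random(0)
start, read = simulate_read("ACGTACGGTCATTGCAACGTAGCT", read_length=8, error_rate=0.02, rng=rng)
print(start, read)
```

Because the start is uniform over positions, a longer region simply receives more reads in proportion to its length; nothing in this sketch favors particular genes.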
Old 06-06-2011, 10:11 AM   #31
tldgID
Member
 
Location: USA

Join Date: May 2011
Posts: 18
Default

Thanks Nils!