SEQanswers

11-18-2008, 02:24 AM   #1
foolishbrat
Member
Location: South East Asia
Join Date: Nov 2008
Posts: 44
Simulated Dataset of Solexa

Dear all,

Is there any resource from which we can download synthetically
generated Solexa datasets, e.g. with "known" tags?

The aim is to test our algorithm for mapping tags to the genome.
We also want to evaluate our error correction model for tag counts
with this simulated dataset.

11-18-2008, 05:45 AM   #2
colindaven
Senior Member
Location: Germany
Join Date: Oct 2008
Posts: 415

Hi

You can try this program

http://www-ab.informatik.uni-tuebing...ftware/metasim

cheers
Colin

11-19-2008, 02:36 AM   #3
foolishbrat
Member
Location: South East Asia
Join Date: Nov 2008
Posts: 44

Thanks so much. I owe you one, Colin.

04-16-2009, 07:37 AM   #4
francesco.vezzi
Member
Location: Udine (Italy)
Join Date: Jan 2009
Posts: 50

Hi,
I downloaded MetaSim, but I discovered that by default I can only simulate unpaired Illumina reads of length 36. Do you know which parameters to use to generate acceptable coverage from paired-end Illumina reads longer than 50 bases?

Thanks
Francesco
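
For a rough idea of how many read pairs a target coverage needs, the standard relation coverage = (number of reads × read length) / genome size can be inverted. The sketch below is only an illustration: the function name and the example numbers (a 5 Mb genome, 30x coverage, 75 bp reads) are assumptions, not MetaSim parameters.

```python
import math


def read_pairs_for_coverage(genome_size, coverage, read_length):
    """Return the number of read pairs needed to reach the target coverage.

    Each pair contributes two reads of `read_length` bases, so
    coverage = 2 * pairs * read_length / genome_size.
    """
    if genome_size <= 0 or read_length <= 0 or coverage < 0:
        raise ValueError("genome_size and read_length must be positive, coverage >= 0")
    return math.ceil(coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    print(read_pairs_for_coverage(genome_size=5_000_000, coverage=30, read_length=75))
```

The result is the pair count to request from whichever simulator is used; it says nothing about how evenly the reads will be distributed.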

04-17-2009, 04:29 AM   #5
colindaven
Senior Member
Location: Germany
Join Date: Oct 2008
Posts: 415

Hi Francesco

In short, no. This may stretch the capabilities of the program. Why not write to the authors? I am sure this kind of feature would be useful to a lot of people, with all sorts of next-gen sequencing read lengths coming out now.
Maybe they would be prepared to add a specific Illumina option.

cheers
Colin

04-17-2009, 06:27 AM   #6
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

Many other groups have written short-read simulators. For example, maq and samtools both include read simulators that share the same code base. Maq's can learn an error profile from a known FASTQ file, but the read length may then be limited by the training data. The wgsim in samtools only generates uniform errors, but is not limited by training data.

I know people from BGI and Gabor's group have also implemented good short read simulators.
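
To make "uniform errors" concrete, here is a minimal sketch of a single-end simulator in which every base has the same substitution probability. It is not the maq or wgsim code: the function name, the read-naming scheme (`read<i>_<1-based start>_<number of errors>`) and the parameters are all assumptions made for illustration.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, n_reads, read_length, error_rate, seed=0):
    """Yield (name, sequence) tuples sampled uniformly from `reference`.

    Every base is substituted with probability `error_rate`, independent of
    its position in the read (a uniform error model).
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        n_errors = 0
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the other bases.
                bases[j] = rng.choice([b for b in BASES if b != base])
                n_errors += 1
        # Hypothetical naming scheme: keep the true 1-based origin in the name.
        yield f"read{i}_{start + 1}_{n_errors}", "".join(bases)
```

Because the true start position is stored in each read name, an aligner's output can later be checked against it.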

09-17-2009, 04:46 AM   #7
maria.b
Member
Location: Paris
Join Date: Sep 2009
Posts: 14

Dear all,

I just downloaded wgsim from the SAMtools package, but I couldn't find any manual. I managed to generate FASTQ read files, but I don't understand how to control the read length, or why there are two output files. Are the reads paired-end or single-end?

Thanks

Maria

09-17-2009, 10:49 AM   #8
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

The output files are meant for BWA or MAQ, where each end of a pair goes in a separate file. To get a list of options, including the options that control read length (-1 and -2), use ./wgsim -h. Let me know if you need a version of this simulator for other aligners (I forked this simulator into a similar package: DNAA).

Nils
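
To see how the two output files relate, the sketch below walks two FASTQ streams in lockstep and checks that the n-th record of each file belongs to the same pair. It assumes four-line FASTQ records and mate names that differ only by a trailing `/1` or `/2`; both the helper names and that naming convention are assumptions, so check them against your own files.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 (assumed mate-naming convention)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def iter_pairs(handle1, handle2):
    """Yield (mate1, mate2) record tuples, checking that the names agree."""
    for r1, r2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if r1 is None or r2 is None:
            raise ValueError("files contain different numbers of reads")
        if strip_mate_suffix(r1[0]) != strip_mate_suffix(r2[0]):
            raise ValueError(f"mate names differ: {r1[0]} vs {r2[0]}")
        yield r1, r2


if __name__ == "__main__":
    # Tiny in-memory example standing in for the two output files.
    fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n")
    fq2 = io.StringIO("@r1/2\nTTAA\n+\nIIII\n")
    for mate1, mate2 in iter_pairs(fq1, fq2):
        print(mate1[0], mate2[0])
```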

09-23-2009, 09:00 AM   #9
polivares
Member
Location: Manchester, UK
Join Date: Jan 2009
Posts: 29

Can someone please help me understand the parameters of wgsim? I am struggling to understand how changing the standard deviation [-s parameter; default value = 50] and the "outer distance between the two ends" [-d parameter; default value = 500] affects the output of the simulation.

I am generating a synthetic library of genomic loci that vary in size. For instance, I get coordinates a,b,c...z and for each position in the genome I generate a set of subsequences centred on the respective coordinate but with varying length.

The problem is that some of the subsequences retrieved by my script are ignored when I use them as input for wgsim.

I have played around a bit and found that the minimum length of an input sequence for wgsim must be (s x 3) + d. I can trick my script into generating sequences longer than that value, but I want to understand better what the simulator is doing.

Thanks in advance
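
The empirical rule reported above (minimum input length = d + 3 × s) can be applied before running the simulator, to see which sequences would be skipped. With the defaults quoted in the post (d = 500, s = 50) the threshold is 650 bases. The sketch below only encodes that observation; it does not call wgsim, and the function names are hypothetical.

```python
def min_sequence_length(outer_distance=500, std_dev=50):
    """Shortest input sequence accepted, per the rule d + 3 * s observed above."""
    return outer_distance + 3 * std_dev


def split_by_length(records, outer_distance=500, std_dev=50):
    """Split (name, sequence) records into names kept and names skipped."""
    threshold = min_sequence_length(outer_distance, std_dev)
    kept, skipped = [], []
    for name, seq in records:
        (kept if len(seq) >= threshold else skipped).append(name)
    return kept, skipped


if __name__ == "__main__":
    print(min_sequence_length())  # threshold for the default -d and -s values
```

Lowering -d or -s lowers the threshold accordingly, which may be simpler than padding the input sequences.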

09-28-2009, 12:16 PM   #10
Auction
Member
Location: california
Join Date: Jul 2009
Posts: 24

I tried the DNAA download page at SF, http://sourceforge.net/projects/dnaa/files/,
but there is no file available.

09-28-2009, 01:00 PM   #11
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Ah, get the source code via git, as there is no release yet:
Code:
git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
Nils

09-28-2009, 05:37 PM   #12
Auction
Member
Location: california
Join Date: Jul 2009
Posts: 24

I've got it. Thanks

02-23-2010, 11:35 AM   #13
bioinfosm
Senior Member
Location: USA
Join Date: Jan 2008
Posts: 482

Hi,

Can someone share their experience using DNAA? How does it work?

We are working to generate a set of reads from a given reference sequence (the mitochondrial genome), map those reads with an aligner, and verify whether the artificial SNPs were identified. Essentially, it is important to have in each read header the position in the genome the read was generated from, and to know the positions of the artificially inserted SNPs.
__________________
--
bioinfosm
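
As a minimal sketch of the "known SNP positions" part of this workflow, the function below inserts random substitutions into a reference string and returns a truth table of what was changed. It is not DNAA or wgsim code; the function name, the 1-based position convention and the example sequence are assumptions for illustration.

```python
import random


def insert_snps(reference, n_snps, seed=0):
    """Return (mutated_sequence, truth).

    `truth` maps each 1-based SNP position to (reference_base, alternate_base).
    Only A/C/G/T positions are eligible, so N bases are left untouched.
    """
    rng = random.Random(seed)
    candidates = [i for i, base in enumerate(reference) if base in "ACGT"]
    if n_snps > len(candidates):
        raise ValueError("more SNPs requested than mutable positions")
    seq = list(reference)
    truth = {}
    for pos in sorted(rng.sample(candidates, n_snps)):
        ref_base = seq[pos]
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        seq[pos] = alt_base
        truth[pos + 1] = (ref_base, alt_base)
    return "".join(seq), truth


if __name__ == "__main__":
    # Hypothetical toy reference, only to show the shape of the output.
    mutated, truth = insert_snps("ACGTACGTACGTACGTACGT", n_snps=3, seed=1)
    print(mutated)
    print(truth)
```

Reads simulated from the mutated sequence and aligned to the original reference should then reveal the positions listed in `truth`.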

02-23-2010, 03:23 PM   #14
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

As the main developer, I'd say it works great (ha)! Seriously though, there are many tools that I use frequently (some gems) that are released as is. I would be happy to add anyone as a developer. The tools include simulation code, SAM/BAM manipulation, SV detection tools, and pre- and post-alignment QC tools.

For generating reads from a simulated genome, it works quite well. The code is taken from Heng Li's "wgsim" found in samtools. I modified it to handle SOLiD data faithfully as well as to model error rates by cycle/ligation (non-uniform error rates). Once the reads have been generated, you can run your favorite aligner. I then have a fast C program to evaluate your SAM/BAM file. Furthermore, if you call SNPs with samtools, there is a Perl script to evaluate your pileup calls given the simulated variants.

Please leave your feedback on its usefulness.

Nils
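
The idea of evaluating calls against the simulated variants can be sketched in a few lines. This is not the C program or the Perl script mentioned above, just a hypothetical illustration that compares two sets of SNP positions and reports true positives, false positives, false negatives, precision and recall.

```python
def evaluate_snp_calls(truth_positions, called_positions):
    """Compare called SNP positions with the simulated (true) positions."""
    truth = set(truth_positions)
    called = set(called_positions)
    tp = len(truth & called)   # simulated SNPs that were called
    fp = len(called - truth)   # calls with no simulated SNP behind them
    fn = len(truth - called)   # simulated SNPs that were missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / (tp + fp) if called else 0.0,
        "recall": tp / (tp + fn) if truth else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    print(evaluate_snp_calls({100, 200, 300}, {100, 300, 450}))
```

A stricter version would also compare the called allele with the simulated one, not just the position.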

02-24-2010, 02:49 PM   #15
lcollado
Member
Location: Baltimore, MD
Join Date: Jun 2009
Posts: 65

Hello,

Does the git command above still work? I've tried it a few times today with no luck:

Code:
$ git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
Initialized empty Git repository in /[my local path]/dnaa/.git/
dnaa.git.sourceforge.net[0: 216.34.181.91]: errno=Connection timed out
fatal: unable to connect a socket (Connection timed out)
Thanks,
Leonardo

PS I'll try later from home as I guess that it could be a local network issue.

Edit: It worked perfectly at home, so I guess that the port git uses is blocked at my workplace.
__________________
L. Collado Torres, Ph.D. student in Biostatistics.

Last edited by lcollado; 03-01-2010 at 08:24 PM.
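
The git:// protocol uses TCP port 9418 by default, so a blocked port would be consistent with the timeout shown above. One way to test that guess is to try opening a TCP connection to the port. The sketch below is a generic check with a hypothetical function name; it is not part of git, and a successful connection only shows the port is reachable, not that the repository exists.

```python
import socket

GIT_PROTOCOL_PORT = 9418  # default port for git:// URLs


def port_is_reachable(host, port=GIT_PROTOCOL_PORT, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False


# Example (not run here because it needs network access):
# port_is_reachable("dnaa.git.sourceforge.net")
```

If the check fails at work but succeeds at home, the firewall explanation is the likely one.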

02-24-2010, 02:57 PM   #16
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Just tested, it works.

03-01-2010, 02:58 PM   #17
kbushley
Member
Location: Oregon
Join Date: Jan 2010
Posts: 22

Simulating longer PE Illumina reads

Hi. I finally found this after searching a bit: on the MetaSim website there are two configuration files, for 60 and 80 bp PE Illumina reads. Basically, these contain all the parameters for the Illumina PE error models, and you can load either one as a configuration file. Hope that helps.


Kathryn

03-10-2010, 01:00 AM   #18
KevinLam
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197

Quote:
Originally Posted by nilshomer
Ah, get the source code via git, as there is no release yet:
Code:
git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
Nils

Is it missing some files?
I just did a git clone, but I can't configure / make:

$ ./configure
bash: ./configure: No such file or directory

03-10-2010, 01:36 AM   #19
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Try this before configure:
Code:
sh autogen.sh
I have updated the INSTALL to include this step. Thanks for spotting the poor documentation.

03-10-2010, 01:40 AM   #20
KevinLam
Senior Member
Location: SEA
Join Date: Nov 2009
Posts: 197

Thanks Nils!
Actually, there's only one shell script, so it's quite evident (my bad).
Anyway, I ran it:

Code:
sh autogen.sh 
Preparing the dnaa build system...please wait

ERROR:  Unable to locate GNU Autoconf.

ERROR:  To prepare the dnaa build system from scratch,
        at least version 2.52 of GNU Autoconf must be installed.


autogen.sh does not need to be run on the same machine that will
run configure or make.  Either the GNU Autotools will need to be installed
or upgraded on this system, or autogen.sh must be run on the source
code on another system and then transferred to here. -- Cheers!

Is it possible for you to include the autoconf files?
I do not have Autoconf installed on my system.
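
Before running autogen.sh it can help to check whether a suitable Autoconf is on the PATH. The sketch below looks for the `autoconf` executable and compares its reported version with the 2.52 minimum named in the error message above. The helper names are hypothetical, and the parser simply takes the first "major.minor" number on the first line of `autoconf --version`, which is an assumption about the output format.

```python
import re
import shutil
import subprocess

MIN_AUTOCONF = (2, 52)  # minimum version named in the autogen.sh error message


def parse_autoconf_version(version_output):
    """Extract (major, minor) from the first line of `autoconf --version` output."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.search(r"(\d+)\.(\d+)", first_line)
    return (int(match.group(1)), int(match.group(2))) if match else None


def autoconf_is_sufficient(minimum=MIN_AUTOCONF):
    """Return True if autoconf is installed and at least the `minimum` version."""
    exe = shutil.which("autoconf")
    if exe is None:
        return False  # not installed, or not on the PATH
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    version = parse_autoconf_version(result.stdout)
    return version is not None and version >= minimum
```

If the check fails, the options are the ones the error message lists: install or upgrade the GNU Autotools, or run autogen.sh on another machine and copy the result over.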
All times are GMT -8.

