SEQanswers

09-06-2011, 05:57 PM   #1
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285
DWGSIM 0.1.4: whole genome NGS simulator

We are pleased to release DWGSIM version 0.1.4.

New features include:
- support for Ion Torrent data
- more metrics to focus on within dwgsim_eval
- set the random seed to generate deterministic variants
- can input a mutations.txt file or a bed-like file to specify the variants to simulate (the -m or -b option)
- new mutations.txt format to specify mutation strand, not just ploidy

This release also includes bug fixes:
- some insertions were not always left-justified
- command line option checking
- better usage

Please see the Documentation.

Last edited by nilshomer; 09-06-2011 at 06:12 PM.

11-16-2011, 02:19 AM   #2
TiborNagy
Senior Member
Location: Budapest
Join Date: Mar 2010
Posts: 329

Hi,

Could you show me an example of how to generate Ion Torrent reads? I have tried the following commands (version 0.1.8):

dwgsim -c 2 -B -f auto reference.fasta iontorrent/testdata
With these settings, dwgsim waits forever.

dwgsim -c 2 -B -f force brca1.fasta iontorrent/masodik
With these settings, dwgsim gives me an error message:
[dwgsim_core] Updating error rate for end 1
[dwgsim_core] 0dwgsim: src/dwgsim.c:271: generate_errors_flows: Assertion `opt->flow_order_len != i' failed.
Aborted

Thanks

11-16-2011, 04:56 AM   #3
nilshomer

Quote:
Originally Posted by TiborNagy
Could you show me an example of how to generate Ion Torrent reads? I have tried the following commands (version 0.1.8):
dwgsim -c 2 -B -f auto reference.fasta iontorrent/testdata
With these settings, dwgsim waits forever.
dwgsim -c 2 -B -f force brca1.fasta iontorrent/masodik
With these settings, dwgsim gives me an error message:
[dwgsim_core] Updating error rate for end 1
[dwgsim_core] 0dwgsim: src/dwgsim.c:271: generate_errors_flows: Assertion `opt->flow_order_len != i' failed.
Aborted

I think you forgot to include the flow order with the -f option. I am going to add some better command line checking. Also, it looks like you are generating paired reads (use -2 0 to turn that off).

So try (assuming a TACG flow order):
dwgsim -2 0 -c 2 -B -f TACG brca1.fasta iontorrent/masodi

11-16-2011, 05:05 AM   #4
TiborNagy

Thank you. The flow order string was the main problem.

04-17-2012, 09:27 PM   #5
chintanspy
Member
Location: India
Join Date: Sep 2010
Posts: 16
no output for ion torrent data

I am trying to generate data for ion torrent.

I tried the following command:
dwgsim -2 0 -c 2 -B -f ATGC genome/e_coli_K12_DH10B.fasta sim_SE_ion

Output:
[dwgsim_core] Updating error rate for end 1
[dwgsim_core] 1000000
[dwgsim_core] Updated with scaling factor 0.45297!
[dwgsim_core] Escherichia_coli_K-12_DH10B length: 4686137
[dwgsim_core] 1 sequences, total length: 4686137
[dwgsim_core] Currently on:
[dwgsim_core] 7046823
[dwgsim_core] Complete!

Files:
-rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.bfast.fastq
-rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.bwa.read1.fastq
-rw-r--r-- 1 root root 0 2012-04-17 21:41 sim_SE_ion.bwa.read2.fastq
-rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.mutations.txt

My output files are empty. Can you explain why this is happening?

04-18-2012, 01:02 AM   #6
chintanspy
version 1.11 vs 1.10

I forgot to mention the version of dwgsim that I used: it was 1.11, which I had downloaded from git.

Now I have downloaded version 1.10 from sourceforge.net and, to my surprise, I am getting data for Ion Torrent but not for Illumina and SOLiD.

Can someone explain what the problem is?
__________________
Regards,
Chintan Vora

04-18-2012, 01:52 AM   #7
chintanspy
question solved

I finally found the answer: it was a hard disk space issue.
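
For anyone hitting the same symptom (a run that reports "Complete!" but leaves zero-byte files), a small check like the following can help. It is only a sketch in plain Python using the standard library; the helper names are made up for illustration, and the file names are simply the ones listed earlier in this thread.

```python
import os
import shutil


def empty_files(paths):
    """Return the paths that exist but contain zero bytes."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) == 0]


def free_gib(directory="."):
    """Free space, in GiB, on the filesystem holding `directory`."""
    return shutil.disk_usage(directory).free / 2**30


# Output names taken from the listing earlier in the thread.
outputs = ["sim_SE_ion.bfast.fastq", "sim_SE_ion.bwa.read1.fastq"]
print("empty outputs:", empty_files(outputs))
print(f"free space: {free_gib():.1f} GiB")
```

If the free space is close to zero and the outputs are empty, a full disk is the likely culprit rather than the simulator options.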

But I still have a question: how is Illumina data different from Ion Torrent data?
Both outputs are in FASTQ and in base space.

How is the flow order (for Ion Torrent data) used to generate Ion Torrent data?

04-18-2012, 10:41 AM   #8
nilshomer

Check out the differences in the technology to understand why Ion Torrent is different from Illumina. You can even download public datasets that each company provides. Understanding the technology will help you understand flow order and the like.

04-19-2012, 12:56 AM   #9
chintanspy

I guess I was not clear about my question.

I know the difference between the two technologies.

I wanted to know how the 'flow order' information is used to generate the simulated data.

Sorry if I am still unclear.

04-19-2012, 06:30 AM   #10
nilshomer

Quote:
Originally Posted by chintanspy
I guess I was not clear about my question.
I know the difference between the two technologies.
I wanted to know how the 'flow order' information is used to generate the simulated data.
Sorry if I am still unclear.

454 and Ion Torrent produce an estimate of homopolymer length for each flow. Errors occur when these homopolymer lengths are misestimated. So for Ion Torrent data, dwgsim introduces errors by misestimating the homopolymer length.
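
To make "one estimate per flow" concrete, here is a minimal Python sketch. It is not dwgsim's code, and the function name to_flowgram is made up for illustration. It cycles through the flow order and records how many bases of the read are incorporated at each flow (zero when the next base does not match the nucleotide being flowed).

```python
def to_flowgram(seq, flow_order):
    """Convert a base sequence into per-flow homopolymer lengths.

    Hypothetical helper for illustration; not part of dwgsim.
    """
    missing = set(seq) - set(flow_order)
    if missing:
        # A base that is never flowed could never be incorporated,
        # so the loop below would not terminate.
        raise ValueError(f"flow order lacks: {sorted(missing)}")
    flows = []
    pos = 0  # next base of seq to incorporate
    k = 0    # index of the current flow
    while pos < len(seq):
        nuc = flow_order[k % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == nuc:
            run += 1
            pos += 1
        flows.append(run)
        k += 1
    return flows


print(to_flowgram("AAAATGC", "ATGC"))  # [4, 1, 1, 1]
print(to_flowgram("AAAATGC", "TACG"))  # [0, 4, 0, 0, 1, 0, 0, 1, 0, 0, 1]
```

The same read gives a different flowgram under a different flow order, which is why the flow order has to be supplied for flow-space error simulation.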

04-19-2012, 10:56 PM   #11
chintanspy

Quote:
Originally Posted by nilshomer
454 and Ion Torrent produce an estimate of homopolymer length for each flow. Errors occur when these homopolymer lengths are misestimated. So for Ion Torrent data, dwgsim introduces errors by misestimating the homopolymer length.

So you mean that Carry Forward and Incomplete Extension (CAFIE) errors are introduced, right?
It also depends on the length of the read, right?

Also, does the -f option take a pattern for the flow order?

04-19-2012, 11:26 PM   #12
nilshomer

Quote:
Originally Posted by chintanspy
So you mean that Carry Forward and Incomplete Extension (CAFIE) errors are introduced, right?
It also depends on the length of the read, right?
Also, does the -f option take a pattern for the flow order?

No, CAFIE is not explicitly modeled; a simpler error model (overcall/undercall) is used instead. The error rate can be adjusted to increase or decrease across the read. Finally, any flow order can be used as long as it contains at least one instance of each nucleotide.
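
As a rough illustration of an overcall/undercall model and of the flow order requirement, here is a Python sketch. It is an assumption-laden toy, not dwgsim's implementation: the function names are invented, every flow gets the same error probability (no increase or decrease across the read), and a miscall is always exactly one base.

```python
import random


def valid_flow_order(flow_order):
    """True if every nucleotide appears at least once in the flow order."""
    return set("ACGT") <= set(flow_order.upper())


def miscall(flows, error_rate, rng):
    """Perturb per-flow homopolymer lengths by +1 (overcall) or -1 (undercall).

    Illustrative model only; dwgsim's real error model may differ.
    """
    noisy = []
    for run in flows:
        if rng.random() < error_rate:
            # An empty flow can only be overcalled.
            delta = 1 if run == 0 else rng.choice((-1, 1))
            run += delta
        noisy.append(run)
    return noisy


def from_flowgram(flows, flow_order):
    """Decode per-flow lengths back into bases."""
    return "".join(flow_order[k % len(flow_order)] * run
                   for k, run in enumerate(flows))


rng = random.Random(42)  # fixed seed so the run is repeatable
clean = [4, 1, 1, 1]     # AAAATGC under flow order ATGC
print(valid_flow_order("ATGC"), valid_flow_order("auto"))  # True False
print(from_flowgram(clean, "ATGC"))                        # AAAATGC
print(from_flowgram(miscall(clean, 0.5, rng), "ATGC"))
```

Because errors are applied in flow space, they show up in the decoded read as homopolymer length changes (insertions or deletions) rather than substitutions.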

04-20-2012, 01:55 AM   #13
chintanspy

Thank you for the quick replies. Things are much clearer now.

There is an option to generate strand specific reads:

-S INT generate reads [0]:
0: default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)
1: same strand (mate pair)
2: opposite strand (paired end)

The default is '0', which means it should generate reads from the same strand for Ion Torrent. However, I am getting reads from the opposite strand as well.

I tried the following command:
dwgsim -1 30 -e 0 -E 0 -r 0 -X 0 -y 0 -R 0 -2 0 -c 2 -B -f ATGC test.fa ionData/test

My test.fa had the following sequence:
AAAATGCAAAATCTGAAAAAACGTTTTGGGAAAAAAAAAA

####Output reads
count Reads
8 AAAATCTGAAAAAACGTTTTGGGAAAAAAA
9 AAAATGCAAAATCTGAAAAAACGTTTTGGG
2 AAATCTGAAAAAACGTTTTGGGAAAAAAAA
7 AAATGCAAAATCTGAAAAAACGTTTTGGGA
8 AATCTGAAAAAACGTTTTGGGAAAAAAAAA
8 AATGCAAAATCTGAAAAAACGTTTTGGGAA
7 ATCTGAAAAAACGTTTTGGGAAAAAAAAAA
4 ATGCAAAATCTGAAAAAACGTTTTGGGAAA
6 CAAAATCTGAAAAAACGTTTTGGGAAAAAA
2 CCCAAAACGTTTTTTCAGATTTTGCATTTT
9 GCAAAATCTGAAAAAACGTTTTGGGAAAAA
7 TCCCAAAACGTTTTTTCAGATTTTGCATTT
9 TGCAAAATCTGAAAAAACGTTTTGGGAAAA
5 TTCCCAAAACGTTTTTTCAGATTTTGCATT
5 TTTCCCAAAACGTTTTTTCAGATTTTGCAT
7 TTTTCCCAAAACGTTTTTTCAGATTTTGCA
7 TTTTTCCCAAAACGTTTTTTCAGATTTTGC
3 TTTTTTCCCAAAACGTTTTTTCAGATTTTG
2 TTTTTTTCCCAAAACGTTTTTTCAGATTTT
5 TTTTTTTTCCCAAAACGTTTTTTCAGATTT
8 TTTTTTTTTCCCAAAACGTTTTTTCAGATT
5 TTTTTTTTTTCCCAAAACGTTTTTTCAGAT

Can you please explain why this is happening?
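
The strand of each read above can be checked with a short Python sketch. It assumes exact matches only, which is reasonable here because errors and mutations were switched off; the helper names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_strand(read, reference):
    """Return '+', '-', or None (no exact match on either strand)."""
    if read in reference:
        return "+"
    if reverse_complement(read) in reference:
        return "-"
    return None


reference = "AAAATGCAAAATCTGAAAAAACGTTTTGGGAAAAAAAAAA"
print(read_strand("AAAATGCAAAATCTGAAAAAACGTTTTGGG", reference))  # +
print(read_strand("CCCAAAACGTTTTTTCAGATTTTGCATTTT", reference))  # -
```

Reads that start with C or T in the list are reverse complements of the 40 bp test sequence, so they come from the negative strand.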

04-20-2012, 03:44 PM   #14
nilshomer

Two things:
1. Could you try the latest code here: https://github.com/nh13/dwgsim. See commit: 6c33e0e5c64c816a5b95c7eac155eef2b4b8155c
2. Can you post two reads in your FASTQ that map on opposite strands?

04-20-2012, 11:18 PM   #15
chintanspy

Quote:
Originally Posted by nilshomer
1. Could you try the latest code here: https://github.com/nh13/dwgsim. See commit: 6c33e0e5c64c816a5b95c7eac155eef2b4b8155c

I will try it and let you know the output.

Quote:
Originally Posted by nilshomer
2. Can you post two reads in your FASTQ that map on opposite strands?

The following reads map to the opposite strand:
CCCAAAACGTTTTTTCAGATTTTGCATTTT
TCCCAAAACGTTTTTTCAGATTTTGCATTT
TTCCCAAAACGTTTTTTCAGATTTTGCATT
TTTCCCAAAACGTTTTTTCAGATTTTGCAT
TTTTCCCAAAACGTTTTTTCAGATTTTGCA
TTTTTCCCAAAACGTTTTTTCAGATTTTGC
TTTTTTCCCAAAACGTTTTTTCAGATTTTG
TTTTTTTCCCAAAACGTTTTTTCAGATTTT
TTTTTTTTCCCAAAACGTTTTTTCAGATTT
TTTTTTTTTCCCAAAACGTTTTTTCAGATT
TTTTTTTTTTCCCAAAACGTTTTTTCAGAT

04-21-2012, 08:23 AM   #16
nilshomer

Quote:
Originally Posted by chintanspy
I will try it and let you know the output.
The following reads map to the opposite strand:
CCCAAAACGTTTTTTCAGATTTTGCATTTT
TCCCAAAACGTTTTTTCAGATTTTGCATTT
TTCCCAAAACGTTTTTTCAGATTTTGCATT
TTTCCCAAAACGTTTTTTCAGATTTTGCAT
TTTTCCCAAAACGTTTTTTCAGATTTTGCA
TTTTTCCCAAAACGTTTTTTCAGATTTTGC
TTTTTTCCCAAAACGTTTTTTCAGATTTTG
TTTTTTTCCCAAAACGTTTTTTCAGATTTT
TTTTTTTTCCCAAAACGTTTTTTCAGATTT
TTTTTTTTTCCCAAAACGTTTTTTCAGATT
TTTTTTTTTTCCCAAAACGTTTTTTCAGAT

Can you show the pairs in FASTQ format? I need to see the read names as well.

04-22-2012, 08:43 PM   #17
chintanspy

Hi nils,

I tried the latest code with commit "6c33e0e5c64c816a5b95c7eac155eef2b4b8155c".
I have attached the outputs of the old and latest code

Each zip file contains two files:
1. *.bwa.read1.fastq
2. *.sam

I am still getting the same output as the previous code.
Attached Files
File Type: zip old_ver_dwgsim_output.zip (2.3 KB, 1 views)
File Type: zip new_ver_dwgsim_output.zip (2.3 KB, 2 views)

04-23-2012, 10:43 PM   #18
nilshomer

You attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".

The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude individual reads from mapping to the reverse strand. Perhaps this is where the confusion lies?

04-23-2012, 11:25 PM   #19
chintanspy

Quote:
Originally Posted by nilshomer
You attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".
The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude individual reads from mapping to the reverse strand. Perhaps this is where the confusion lies?

Yes, you are right: reads map to both strands.

My *.bwa.read2.fastq file is empty because I used "-2 0".

The "-S" option says "default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)". So for SOLiD/Ion Torrent, reads have to come from the same strand, right?

The command that I used is:
dwgsim -1 30 -e 0 -E 0 -r 0 -X 0 -y 0 -R 0 -2 0 -c 2 -B -f ATGC test.fa ionData/test

From the above command I expect:
1. no errors
2. no mutations (no indels)
3. no reads from opposite strand

I get reads without errors and mutations, but there are reads from the opposite strand.
My question is: why am I getting reads from the negative strand?

04-23-2012, 11:42 PM   #20
nilshomer

You are getting reads mapping to the negative strand because the simulator draws reads from either strand. The "-S" option does not affect this behavior. Again, there is no way to specify that individual templates map only to one strand or the other. The strand is drawn randomly (0.5 probability).

The "-S" option specifies the strand relationship between the first end and the second end. In this case you have no second end ("-2 0"), so this option has no effect. If you had "-2 100" or the like, the "-S" option would determine whether the first end and second end are always on the same strand. For example, if the first end maps to the reverse strand, then with "-S 1" the second end would also map to the reverse strand. Perhaps you should search the forum for the meaning of paired ends and mate pairs, since they are used in the descriptions of "-S 1" and "-S 2".
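
The distinction can be sketched in a few lines of Python. This only mirrors the behaviour described in this thread and is not dwgsim's code; the function name and the platform labels are invented for illustration.

```python
import random


def draw_strands(s_option, platform, rng):
    """Return (first_end, second_end) strands as '+' or '-'.

    Sketch of the behaviour described in this thread, not dwgsim's code.
    """
    first = rng.choice("+-")  # each template: either strand, p = 0.5
    if s_option == 0:
        # Default: same strand for SOLiD/Ion Torrent, opposite for Illumina.
        same = platform in ("solid", "iontorrent")
    else:
        same = s_option == 1  # 1: same strand, 2: opposite strand
    second = first if same else ("-" if first == "+" else "+")
    return first, second


rng = random.Random(0)
pairs = [draw_strands(1, "illumina", rng) for _ in range(1000)]
print(all(a == b for a, b in pairs))  # True: both ends always agree
print(sorted({a for a, _ in pairs}))  # the first end still lands on both strands
```

With "-2 0" there is no second end, so only the first value matters, and it is drawn from either strand regardless of the "-S" setting.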

Tags
dwgsim, fastq, simulation

All times are GMT -8.

