
  • DWGSIM 0.1.4: whole genome NGS simulator

    We are pleased to release DWGSIM version 0.1.4.

    New features include:
    - support for Ion Torrent data
    - more metrics to focus on within dwgsim_eval
    - set the random seed to generate deterministic variants
    - can input a mutations.txt file or a bed-like file to specify the variants to simulate (the -m or -b option)
    - new mutations.txt format to specify mutation strand, not just ploidy

    This release also includes bug fixes:
    - some insertions were not always left-justified
    - command line option checking
    - better usage

    Please see the Documentation.
    Last edited by nilshomer; 09-06-2011, 06:12 PM.

  • #2
    Hi,

    Could you show me an example of how to generate Ion Torrent reads? I have tried the following commands (version 0.1.8):

    dwgsim -c 2 -B -f auto reference.fasta iontorrent/testdata
    With these settings, dwgsim waits forever.

    dwgsim -c 2 -B -f force brca1.fasta iontorrent/masodik
    With these settings, dwgsim gives me an error message:
    [dwgsim_core] Updating error rate for end 1
    [dwgsim_core] 0dwgsim: src/dwgsim.c:271: generate_errors_flows: Assertion `opt->flow_order_len != i' failed.
    Aborted

    Thanks

    • #3
      Originally posted by TiborNagy
      Hi,

      Could you show me an example of how to generate Ion Torrent reads? I have tried the following commands (version 0.1.8):

      dwgsim -c 2 -B -f auto reference.fasta iontorrent/testdata
      With these settings, dwgsim waits forever.

      dwgsim -c 2 -B -f force brca1.fasta iontorrent/masodik
      With these settings, dwgsim gives me an error message:
      [dwgsim_core] Updating error rate for end 1
      [dwgsim_core] 0dwgsim: src/dwgsim.c:271: generate_errors_flows: Assertion `opt->flow_order_len != i' failed.
      Aborted

      Thanks

      I think you forgot to give the flow order (-f) option an actual flow order. I am going to add some better command line checking. Also, it looks like you are generating paired reads (use -2 0 to turn that off).

      So try (assuming a TACG flow order):
      dwgsim -2 0 -c 2 -B -f TACG brca1.fasta iontorrent/masodi

      • #4
        Thank you. The flow order string was the main problem.

        • #5
          No output for Ion Torrent data

          I am trying to generate data for Ion Torrent.

          I tried the following command:
          dwgsim -2 0 -c 2 -B -f ATGC genome/e_coli_K12_DH10B.fasta sim_SE_ion

          Output:
          [dwgsim_core] Updating error rate for end 1
          [dwgsim_core] 1000000
          [dwgsim_core] Updated with scaling factor 0.45297!
          [dwgsim_core] Escherichia_coli_K-12_DH10B length: 4686137
          [dwgsim_core] 1 sequences, total length: 4686137
          [dwgsim_core] Currently on:
          [dwgsim_core] 7046823
          [dwgsim_core] Complete!

          Files:
          -rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.bfast.fastq
          -rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.bwa.read1.fastq
          -rw-r--r-- 1 root root 0 2012-04-17 21:41 sim_SE_ion.bwa.read2.fastq
          -rw-r--r-- 1 root root 0 2012-04-17 21:43 sim_SE_ion.mutations.txt

          My output files are empty. Can you explain why this is happening?
          Regards,
          Chintan Vora

          • #6
            Version 1.11 vs. 1.10

            I forgot to mention the version of dwgsim that I used: it was 1.11, which I had downloaded from git.

            Now I have downloaded version 1.10 from sourceforge.net and, to my surprise, I am getting data for Ion Torrent but not for Illumina and SOLiD.

            Can someone explain what the problem is?
            Regards,
            Chintan Vora

            • #7
              Question solved

              I finally found the answer: it was a hard disk space issue.

              But I still have a question: how is Illumina data different from Ion Torrent data?
              Both outputs are FASTQ files in base space.

              How is the flow order used to generate Ion Torrent data?
              Regards,
              Chintan Vora
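As a side note on the empty-output problem above, which turned out to be a full disk: a quick check like the following can flag zero-byte outputs and low free space before anyone starts debugging the simulator. This is only a minimal Python sketch. The suffix list is copied from the file listing in post #5, and the helper names `empty_outputs` and `free_gib` are made up for illustration; they are not part of dwgsim.

```python
import shutil
from pathlib import Path

# Output suffixes as they appear in the file listing in post #5.
DWGSIM_SUFFIXES = (
    ".bfast.fastq",
    ".bwa.read1.fastq",
    ".bwa.read2.fastq",
    ".mutations.txt",
)


def empty_outputs(prefix):
    """Return names of expected output files that are missing or zero bytes."""
    problems = []
    for suffix in DWGSIM_SUFFIXES:
        path = Path(str(prefix) + suffix)
        if not path.exists() or path.stat().st_size == 0:
            problems.append(path.name)
    return problems


def free_gib(directory):
    """Free space, in GiB, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free / 2**30
```

For example, `empty_outputs("sim_SE_ion")` run in the output directory would list every file from post #5, and `free_gib(".")` would show how much room is left. An empty read2 file on its own may be harmless for a single-end run (`-2 0`); all outputs being empty is the signal worth chasing.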

              • #8
                Check out the differences in the technology to understand why Ion Torrent is different from Illumina. You can even download public datasets that each company provides. Understanding the technology will help you understand flow order and the like.

                • #9
                  I guess I was not clear about my question.

                  I know the difference between the two technologies.

                  I wanted to know how the 'flow order' information is used to generate simulated data.

                  Sorry if I am still unclear.
                  Regards,
                  Chintan Vora

                  • #10
                    Originally posted by chintanspy
                    I guess I was not clear about my question.

                    I know the difference between the two technologies.

                    I wanted to know how the 'flow order' information is used to generate simulated data.

                    Sorry if I am still unclear.

                    454 and Ion Torrent produce estimates of homopolymer lengths, one for each flow. The errors occur when these homopolymers are misestimated. So for Ion Torrent data, dwgsim introduces errors by misestimating the homopolymer length.
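To make the "one estimate per flow" idea concrete, here is a rough Python sketch of flow space. It is not dwgsim's code: the function names and the simple plus-or-minus-one perturbation are assumptions chosen for illustration. A read is converted into one homopolymer length per flow, some of those lengths are misestimated, and the result is converted back into bases.

```python
import random


def to_flowgram(seq, flow_order):
    """Homopolymer length seen at each flow while reading `seq`."""
    if set(seq) - set(flow_order):
        raise ValueError("flow order must contain every base in the read")
    flows = []
    pos = 0   # next unread base in seq
    flow = 0  # index of the current flow
    while pos < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)  # 0 means this flow incorporated nothing
        flow += 1
    return flows


def from_flowgram(flows, flow_order):
    """Turn per-flow homopolymer lengths back into a base sequence."""
    n = len(flow_order)
    return "".join(flow_order[i % n] * k for i, k in enumerate(flows))


def misestimate(flows, error_rate, rng):
    """Overcall or undercall each flow by one base with probability error_rate."""
    noisy = []
    for k in flows:
        if rng.random() < error_rate:
            k = max(0, k + rng.choice((-1, 1)))  # illustrative model only
        noisy.append(k)
    return noisy
```

With flow order TACG, the read `AAT` becomes the flowgram `[0, 2, 0, 0, 1]`. Overcalling the second flow gives `[0, 3, 0, 0, 1]`, which decodes to `AAAT`, so in base space the error shows up as a homopolymer-length indel rather than a substitution.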

                    • #11
                      Originally posted by nilshomer
                      454 and Ion Torrent produce estimates of homopolymer lengths, one for each flow. The errors occur when these homopolymers are misestimated. So for Ion Torrent data, dwgsim introduces errors by misestimating the homopolymer length.

                      So you mean that carry-forward and incomplete-extension errors are introduced, right?
                      And the error rate also depends on the length of the read, right?

                      Also, does the -f option take a pattern for the flow order?
                      Regards,
                      Chintan Vora

                      • #12
                        Originally posted by chintanspy
                        So you mean that carry-forward and incomplete-extension errors are introduced, right?
                        And the error rate also depends on the length of the read, right?

                        Also, does the -f option take a pattern for the flow order?

                        CAFIE (carry-forward and incomplete extension) is not explicitly modeled; a simpler overcall/undercall error model is used instead. The error rate can be set to increase or decrease across the read. Finally, any flow order can be used as long as it has at least one instance of each nucleotide.
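The two rules in this reply can be sketched in a few lines of Python. The flow-order rule (at least one instance of each nucleotide) comes straight from the post. Rejecting any non-ACGT character and using a linear ramp for the error rate are assumptions made for this illustration, and both function names are hypothetical rather than dwgsim's.

```python
def is_valid_flow_order(flow_order):
    """True if the string has at least one each of A, C, G and T.

    Rejecting other characters is an assumption of this sketch.
    """
    return set(flow_order.upper()) == set("ACGT")


def error_rate_ramp(start, end, read_length):
    """Per-position error rates that change linearly from start to end."""
    if read_length <= 0:
        return []
    if read_length == 1:
        return [start]
    step = (end - start) / (read_length - 1)
    return [start + i * step for i in range(read_length)]
```

Under this check, `TACG` and `ATGC` pass, while the values `auto` and `force` from post #2 do not, which is consistent with the assertion failure reported there.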

                        • #13
                          Thank you for the quick replies. Things are much clearer now.

                          There is an option to generate strand-specific reads:

                          -S INT generate reads [0]:
                          0: default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)
                          1: same strand (mate pair)
                          2: opposite strand (paired end)

                          The default is 0, which means reads should be generated from the same strand for Ion Torrent. However, I am getting reads from the opposite strand as well.

                          I tried the following command:
                          dwgsim -1 30 -e 0 -E 0 -r 0 -X 0 -y 0 -R 0 -2 0 -c 2 -B -f ATGC test.fa ionData/test

                          My test.fa had the following sequence:
                          AAAATGCAAAATCTGAAAAAACGTTTTGGGAAAAAAAAAA

                          Output reads:
                          Count Read
                          8 AAAATCTGAAAAAACGTTTTGGGAAAAAAA
                          9 AAAATGCAAAATCTGAAAAAACGTTTTGGG
                          2 AAATCTGAAAAAACGTTTTGGGAAAAAAAA
                          7 AAATGCAAAATCTGAAAAAACGTTTTGGGA
                          8 AATCTGAAAAAACGTTTTGGGAAAAAAAAA
                          8 AATGCAAAATCTGAAAAAACGTTTTGGGAA
                          7 ATCTGAAAAAACGTTTTGGGAAAAAAAAAA
                          4 ATGCAAAATCTGAAAAAACGTTTTGGGAAA
                          6 CAAAATCTGAAAAAACGTTTTGGGAAAAAA
                          2 CCCAAAACGTTTTTTCAGATTTTGCATTTT
                          9 GCAAAATCTGAAAAAACGTTTTGGGAAAAA
                          7 TCCCAAAACGTTTTTTCAGATTTTGCATTT
                          9 TGCAAAATCTGAAAAAACGTTTTGGGAAAA
                          5 TTCCCAAAACGTTTTTTCAGATTTTGCATT
                          5 TTTCCCAAAACGTTTTTTCAGATTTTGCAT
                          7 TTTTCCCAAAACGTTTTTTCAGATTTTGCA
                          7 TTTTTCCCAAAACGTTTTTTCAGATTTTGC
                          3 TTTTTTCCCAAAACGTTTTTTCAGATTTTG
                          2 TTTTTTTCCCAAAACGTTTTTTCAGATTTT
                          5 TTTTTTTTCCCAAAACGTTTTTTCAGATTT
                          8 TTTTTTTTTCCCAAAACGTTTTTTCAGATT
                          5 TTTTTTTTTTCCCAAAACGTTTTTTCAGAT

                          Can you please explain why this is happening?
                          Regards,
                          Chintan Vora

                          • #14
                            Two things:
                            1. Could you try the latest code here: https://github.com/nh13/dwgsim. See commit: 6c33e0e5c64c816a5b95c7eac155eef2b4b8155c
                            2. Can you post two reads in your FASTQ that map on opposite strands?

                            • #15
                              Originally posted by nilshomer
                              1. Could you try the latest code here: https://github.com/nh13/dwgsim. See commit: 6c33e0e5c64c816a5b95c7eac155eef2b4b8155c

                              I will try it and let you know the output.

                              Originally posted by nilshomer
                              2. Can you post two reads in your FASTQ that map on opposite strands?

                              The following reads map to the opposite strand:
                              CCCAAAACGTTTTTTCAGATTTTGCATTTT
                              TCCCAAAACGTTTTTTCAGATTTTGCATTT
                              TTCCCAAAACGTTTTTTCAGATTTTGCATT
                              TTTCCCAAAACGTTTTTTCAGATTTTGCAT
                              TTTTCCCAAAACGTTTTTTCAGATTTTGCA
                              TTTTTCCCAAAACGTTTTTTCAGATTTTGC
                              TTTTTTCCCAAAACGTTTTTTCAGATTTTG
                              TTTTTTTCCCAAAACGTTTTTTCAGATTTT
                              TTTTTTTTCCCAAAACGTTTTTTCAGATTT
                              TTTTTTTTTCCCAAAACGTTTTTTCAGATT
                              TTTTTTTTTTCCCAAAACGTTTTTTCAGAT
                              Regards,
                              Chintan Vora
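For anyone who wants to repeat the strand check, here is a small Python sketch. It relies on the fact that the command in post #13 sets the error and mutation rates to 0, so every read should match the reference exactly on one strand or the other. The function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_of(read, reference):
    """'+' for a forward match, '-' for a reverse-complement match, else None."""
    if read in reference:
        return "+"
    if reverse_complement(read) in reference:
        return "-"
    return None


# Reference sequence and two reads copied from post #13.
reference = "AAAATGCAAAATCTGAAAAAACGTTTTGGGAAAAAAAAAA"
forward_read = "AAAATGCAAAATCTGAAAAAACGTTTTGGG"
reverse_read = "CCCAAAACGTTTTTTCAGATTTTGCATTTT"

print(strand_of(forward_read, reference))  # +
print(strand_of(reverse_read, reference))  # -
```

The second read is the reverse complement of the first 30 bases of the reference, which confirms that it comes from the opposite strand.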
