  • dwgsim usage

    I want to use dwgsim to generate CNVs and indels.
    I searched for documentation but found nothing useful except "http://sourceforge.net/apps/mediawiki/dnaa/index.php?title=Whole_Genome_Simulation".

    I tried it myself and it works; I am getting output, which raises some questions.

    1) "chr1 763 A C -": what does this line of output mean?
    As far as I understand, the program takes nucleotides from reference chr1, changes them according to some parameters, and then generates reads.
    In short, can you please explain the algorithm behind this?

    In the output format, what is meant by "start end 1" and "start end 2"?

    2) Why does the program produce three output files (two for BWA, one for BFAST)?

    Any help will be highly appreciated!

  • #2
    Originally posted by gprakhar
    I want to use dwgsim to generate CNVs and indels.
    I searched for documentation but found nothing useful except "http://sourceforge.net/apps/mediawiki/dnaa/index.php?title=Whole_Genome_Simulation".

    I tried it myself and it works; I am getting output, which raises some questions.

    The source code is the best documentation. This simulator is a fork of the SAMtools simulator, which was itself modified from the MAQ simulator. It does not simulate CNVs, only small indels, SNPs, and a single type of sequencing error (base changes).

    Originally posted by gprakhar
    1) "chr1 763 A C -": what does this line of output mean?
    As far as I understand, the program takes nucleotides from reference chr1, changes them according to some parameters, and then generates reads.
    In short, can you please explain the algorithm behind this?

    It randomly adds SNPs and inserts or deletes bases in the given reference at the given rates (-r/-R). Indel lengths are drawn from an exponential distribution (-X). Reads are then simulated from this mutated genome with the given error rate (-e). Insert sizes for paired-end reads are also drawn from a distribution. See the source code for more details.
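
The two stages described above (mutate the reference, then draw error-prone reads from it) can be sketched roughly as follows. This is only an illustration and not dwgsim's actual code: the function names, the event tuples, the 50/50 split between insertions and deletions, and the use of a geometric length (a discrete stand-in for the exponential distribution mentioned above) are all assumptions made for the example.

```python
import random


def mutate_reference(ref, mut_rate, indel_frac, indel_ext_prob, rng):
    """Return (mutated sequence, events). Hypothetical sketch, not dwgsim's code.

    mut_rate       : per-base probability of any mutation
    indel_frac     : fraction of mutations that are indels (the rest are SNPs)
    indel_ext_prob : probability of extending an indel by one more base (< 1)
    """
    out, events = [], []
    i = 0
    while i < len(ref):
        base = ref[i]
        if rng.random() >= mut_rate:
            out.append(base)                      # no mutation at this position
        elif rng.random() >= indel_frac:
            new = rng.choice([b for b in "ACGT" if b != base])
            out.append(new)                       # SNP
            events.append((i + 1, base, new))
        else:
            length = 1                            # geometric indel length
            while rng.random() < indel_ext_prob:
                length += 1
            if rng.random() < 0.5:                # insertion after this base
                ins = "".join(rng.choice("ACGT") for _ in range(length))
                out.append(base + ins)
                events.append((i + 1, "-", ins))
            else:                                 # deletion starting here
                deleted = ref[i:i + length]
                events.append((i + 1, deleted, "-"))
                i += len(deleted)
                continue
        i += 1
    return "".join(out), events


def simulate_read(genome, read_len, error_rate, rng):
    """Draw one read from genome and add base-change errors at error_rate."""
    start = rng.randrange(len(genome) - read_len + 1)
    read = []
    for base in genome[start:start + read_len]:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
    return start + 1, "".join(read)               # 1-based start position
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when the simulated variants are later used as a truth set.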

    Originally posted by gprakhar
    In the output format, what is meant by "start end 1" and "start end 2"?

    These are the start positions in the reference from which the two paired-end reads were drawn.
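
As a rough illustration, the two start positions can be pulled out of a read name like the ones quoted later in this thread. The parser below is a sketch under assumptions: it takes the layout to be contig, start of end 1, start of end 2, then five further underscore-separated fields, simply because that is what the example names in this thread look like. Other dwgsim versions may use a different number of fields, and the remaining fields are deliberately left uninterpreted. `ReadName` and `parse_read_name` are hypothetical names.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    contig: str
    start_end1: int
    start_end2: int
    other_fields: tuple  # left uninterpreted in this sketch


def parse_read_name(name):
    """Split a simulated read name into contig, the two starts, and the rest."""
    name = name.lstrip("@")
    if name.endswith(("/1", "/2")):       # BWA-style mate suffix
        name = name[:-2]
    # Split from the right, because contig names may contain underscores.
    parts = name.rsplit("_", 7)
    if len(parts) != 8:
        raise ValueError("unexpected read name layout: " + name)
    contig, start1, start2, *rest = parts
    return ReadName(contig, int(start1), int(start2), tuple(rest))
```

For example, `parse_read_name("@gi|12057207|gb|AE001439.1|_637443_636958_1_1_1:0:0_0:0:0_0")` yields the contig `gi|12057207|gb|AE001439.1|` with starts 637443 and 636958.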

    Originally posted by gprakhar
    2) Why does the program produce three output files (two for BWA, one for BFAST)?

    Any help will be highly appreciated!

    So that we can test BWA and BFAST on the same datasets. BWA requires two files, one for each end of the pair, while BFAST requires a single aggregated file.


    • #3
      Thank you for the help.
      I do agree that the source code is the best documentation; still, I will try to write a manual as I learn the tool.
      We have a CNV detection tool that we want to test, so I wanted simulated data with CNVs at known locations.

      Are there any simulators for this particular job?

      The SV simulation from the PEMer package looks good, but I am having trouble compiling it.


      • #4
        Hello, Nils,
        I have a problem obtaining simulated color-space reads.
        The BFAST output looks OK:

        @gi|12057207|gb|AE001439.1|_637443_636958_1_1_1:0:0_0:0:0_0
        A21322003132201120203003022220220001033010212331013
        +
        11111111111111111111111111111111111111111111111111
        ...

        But the two BWA files contain nucleotide reads:

        @gi|12057207|gb|AE001439.1|_637444_636959_1_1_1:0:0_0:0:0_0/2
        CTGGAATCTGGACCGAGATAATAGGGGAGGAAACATTACAGCGTTCACT
        +
        1111111111111111111111111111111111111111111111111
        ...

        Is this intended? How do I get the BWA files in color space?

        Regards,
        Alexander


        • #5
          Originally posted by Alex8
          Hello, Nils,
          I have a problem obtaining simulated color-space reads.
          The BFAST output looks OK:

          @gi|12057207|gb|AE001439.1|_637443_636958_1_1_1:0:0_0:0:0_0
          A21322003132201120203003022220220001033010212331013
          +
          11111111111111111111111111111111111111111111111111
          ...

          But the two BWA files contain nucleotide reads:

          @gi|12057207|gb|AE001439.1|_637444_636959_1_1_1:0:0_0:0:0_0/2
          CTGGAATCTGGACCGAGATAATAGGGGAGGAAACATTACAGCGTTCACT
          +
          1111111111111111111111111111111111111111111111111
          ...

          Is this intended? How do I get the BWA files in color space?

          Regards,
          Alexander

          This is intended. The BWA reads are "double-encoded", which is the input BWA expects, as if you had run BWA's solid2fastq.pl.
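
For readers unfamiliar with double encoding, here is a minimal sketch of the conversion as I understand it: drop the primer base and the first color, then rewrite the remaining colors 0/1/2/3 as A/C/G/T (and "." as N). The function name is hypothetical, and the assumption that the first quality value is dropped along with the first color is taken from the lengths of the example reads above, not from the dwgsim or solid2fastq.pl source.

```python
# Colors are rewritten as letters so that a nucleotide-oriented pipeline
# can carry them; the result is NOT a real nucleotide sequence.
_COLOR_TO_LETTER = str.maketrans("0123.", "ACGTN")


def double_encode(cs_read, cs_qual):
    """Convert a color-space read (primer base + colors) to double-encoded form.

    Assumes cs_read starts with the primer base and cs_qual holds one quality
    character per color.
    """
    encoded = cs_read[2:].translate(_COLOR_TO_LETTER)  # drop primer + 1st color
    return encoded, cs_qual[1:]                        # drop 1st quality value
```

Applied to the BFAST read quoted above (`A2132200...`), this gives `CTGGAATC...`, the same string as the BWA read shown in the same post, which is consistent with the two outputs carrying the same colors in different encodings.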


          • #6
            I wonder what the easiest way is to convert double-encoded reads to csfasta? The SOLiD de novo tools seem to contain a similar function, but it is not clearly available.


            • #7
              Originally posted by Alex8
              I wonder what the easiest way is to convert double-encoded reads to csfasta? The SOLiD de novo tools seem to contain a similar function, but it is not clearly available.

              You should convert the BFAST output (CSFASTQ) to CSFASTA/QUAL, not the BWA output. The dwgsim outputs were designed for BFAST and BWA.
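
A minimal sketch of that conversion, assuming four-line FASTQ records and Phred+33 quality characters (neither is confirmed in this thread for every dwgsim version). The function name is hypothetical; it returns the lines of the CSFASTA and QUAL files rather than writing them, so the caller decides on file names.

```python
def csfastq_to_csfasta_qual(lines, phred_offset=33):
    """Convert color-space FASTQ lines to (csfasta_lines, qual_lines).

    Assumes strict 4-line records: @name, read, +, qualities.
    """
    records = [line.rstrip("\n") for line in lines if line.strip()]
    if len(records) % 4 != 0:
        raise ValueError("input is not a whole number of 4-line records")
    csfasta, qual = [], []
    for i in range(0, len(records), 4):
        header, read, plus, quals = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        name = header[1:]
        csfasta += [">" + name, read]
        # QUAL files hold space-separated integer scores, one per color.
        scores = " ".join(str(ord(c) - phred_offset) for c in quals)
        qual += [">" + name, scores]
    return csfasta, qual
```

With an open file handle `fh` on the BFAST output, `csfastq_to_csfasta_qual(fh)` gives the two line lists, which can then be joined with newlines and written out.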


              • #8
                Thank you for the information!


                • #9
                  Nils, please let me know the format of the base/color error rate file, the one that goes with the -E option.


                  • #10
                    Originally posted by Alex8
                    Nils, please let me know the format of the base/color error rate file, the one that goes with the -E option.

                    In the 0.1.1 release, you should have one value per line, with the ith value representing the error rate of the ith base.

                    The latest Git version, which is the one I would like to support, does not use an error file. Instead it takes two command-line options, "-e/-E", each giving either a uniform error rate (e.g. "-e 0.01") or an increasing error rate (e.g. "-e 0.01-0.1" for a rate rising from 0.01 to 0.1).

                    The source code is your best documentation for this software.
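
To make the two formats concrete, here is a small sketch that expands an "-e"-style value into one error rate per base and renders it in the one-value-per-line layout described for 0.1.1. It assumes the increasing rate is interpolated linearly along the read, which this thread does not confirm, and it only handles plain decimal values (not scientific notation). Both function names are hypothetical.

```python
def per_base_error_rates(spec, read_len):
    """Expand "0.01" (uniform) or "0.01-0.1" (increasing) to per-base rates.

    Linear interpolation between the two ends is an assumption of this sketch.
    """
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = float(start_text), float(end_text)
    else:
        start = end = float(spec)
    if read_len == 1:
        return [start]
    step = (end - start) / (read_len - 1)
    return [start + i * step for i in range(read_len)]


def format_error_file(rates):
    """Render rates as one value per line (ith line = ith base)."""
    return "\n".join("%.6f" % r for r in rates) + "\n"
```

For instance, `per_base_error_rates("0.01-0.1", 10)` starts at 0.01 for the first base and ends at 0.1 for the tenth.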


                    • #11
                      Hey, I'm using dwgsim to generate paired-end reads for my data. I also need to generate a few single-end read files. Is there any way I can use dwgsim for that as well? I don't want to use two different simulators for my studies.

                      -Arjun


                      • #12
                        Sure, specify zero length reads for the second end ("-2 0").


                        • #13
                          Originally posted by nilshomer
                          Sure, specify zero length reads for the second end ("-2 0").

                          I wanted to clarify that! Thanks a ton!

                          -A


                          • #14
                            Hey, I just got another question regarding my single-end problem... if I'm setting "-2 0", do I double the value of -N, since N is the number of read pairs?

                            -A


                            • #15
                              Originally posted by arkal
                              Hey, I just got another question regarding my single-end problem... if I'm setting "-2 0", do I double the value of -N, since N is the number of read pairs?

                              -A

                              Just to clarify what I mean: suppose I need 1,000,000 read pairs for 15x coverage.
                              PE:
                              -N 1000000 -1 76 -2 76

                              Is SE then

                              -N 1000000 -1 76 -2 0

                              or

                              -N 2000000 -1 76 -2 0

                              ?

                              Also, if I take file 1 of a 30x PE run, is it equivalent to 15x SE?
                              -A
                              Last edited by arkal; 07-11-2011, 11:12 PM.
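
The arithmetic behind this question can be sketched as follows. Depth of coverage is total sequenced bases divided by genome length, so the sketch below assumes that -N counts fragments and that "-2 0" yields one read per fragment; how dwgsim actually counts is not confirmed in this thread. The genome length is a hypothetical value chosen so that the paired-end case comes out at about 15x.

```python
def coverage(n_fragments, len1, len2, genome_len):
    """Mean depth = total sequenced bases / genome length."""
    return n_fragments * (len1 + len2) / genome_len


GENOME_LEN = 10_133_333  # hypothetical, chosen so the PE case is ~15x

pe = coverage(1_000_000, 76, 76, GENOME_LEN)        # paired end, 2 x 76
se_same_n = coverage(1_000_000, 76, 0, GENOME_LEN)  # single end, same -N
se_double_n = coverage(2_000_000, 76, 0, GENOME_LEN)  # single end, doubled -N

print(round(pe, 2), round(se_same_n, 2), round(se_double_n, 2))
```

Under these assumptions, keeping -N fixed halves the depth when the second end is dropped, and doubling -N restores it. By the same arithmetic, one file of a 30x paired-end run contains as many bases as a 15x single-end run, although the reads still come from paired fragments.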
