
  • Simulate Illumina read-pairs

    Hello,

    I want to simulate read-pairs using a read-length greater than 35 (up to 75). If I run MAQ, this works:

    maq simulate -N 1000 -1 35 -2 35 out.read.1.fastq out.read.2.fastq human_b37_chr22.fasta calib-36.dat

    But this does not:

    maq simulate -N 1000 -1 70 -2 70 out.read.1.fastq out.read.2.fastq human_b37_chr22.fasta calib-36.dat

    calib-36.dat was downloaded from http://sourceforge.net/projects/maq/...data/20080929/. The "-1" and "-2" flags do not work when the requested read length is greater than the one specified in the .dat file.

    If anybody knows of other simulators, including ones for other sequencers, that would be helpful too. I want to simulate sequencing a genome, with random errors introduced by the sequencing process.

  • #2
    Hi.

    I think the up-to-date simulator is wgsim (lh3/wgsim on GitHub). Its README gives this history:


    History
    =======

    Wgsim was modified from MAQ's read simulator by dropping dependencies to other
    source codes in the MAQ package and incorporating patches from Colin Hercus
    which allow to simulate INDELs longer than 1bp. Wgsim was originally released
    in the SAMtools software package. I forked it out in 2011 as a standalone
    project. A few improvements were also added in this course.



    • #3
      Software list

      Hey,
      I am currently reviewing software for this purpose, so I know of quite a few options. For most of these, you can just google "[prog name] simulation genome" or something similar and you will find them in the top few hits. For the Illumina one, you need to write to them and ask for it, and as far as I know it is not official.

      * wgsim -> PE only, uniform error
      * dwgsim -> Position specific error. PE only
      * metasim -> PE only, specialized for simulating from a population
      * in-house Illumina C++ -> doesn't model mate-pair chimeras. Uses sampled Illumina error strings as the error for the output. Doesn't model base-specific error, though: the error, when it occurs, is the same for each underlying base.
      * in-house Illumina Perl -> adds proper handling of mate-pair simulation, but uses the same base-level error strategy as the C++ version, which is the main reason we chose to write our own. Doesn't model PE contamination in MP libraries, but the developer notes it would be easy to generate PE reads separately and mix them into the output file. Although ours ended up being backwards, we still successfully modeled different error rates depending on the underlying base.
      * PEMer -> no mate-pair chimeras
      * reseqsim -> focuses on SV analysis, doesn't do MP modeling
      * simnext -> flat error rate like wgsim
      * mason -> doesn't model mate-pair chimeras
      * flux-capacitor -> models RNA-seq reads


      And of course there is the one I wrote which we used in the first Assemblathon:
      SimSeq: https://github.com/jstjohn/SimSeq?locale=en
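The list above distinguishes "uniform" (flat) error from "position specific" error. A minimal Python sketch of that distinction follows. It is not the code of any of the listed tools: the function names (`add_errors`, `uniform_rates`, `ramped_rates`) are hypothetical, only substitution errors are modeled, and the linear ramp is just one assumed shape for a position-specific profile.

```python
import random

BASES = "ACGT"

def add_errors(read, error_rates, rng):
    """Substitute base i with probability error_rates[i] (illustrative model)."""
    out = []
    for base, p in zip(read, error_rates):
        if rng.random() < p:
            # Replace with a different base, chosen uniformly at random.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def uniform_rates(length, rate):
    """Flat profile: the same error probability at every read position."""
    return [rate] * length

def ramped_rates(length, start, end):
    """Position-specific profile: error probability ramps linearly along the read."""
    if length == 1:
        return [start]
    step = (end - start) / (length - 1)
    return [start + i * step for i in range(length)]

if __name__ == "__main__":
    rng = random.Random(42)
    read = "ACGTACGTACGTACGTACGT"
    print(add_errors(read, uniform_rates(len(read), 0.02), rng))
    print(add_errors(read, ramped_rates(len(read), 0.01, 0.10), rng))
```

With the ramped profile, errors concentrate toward the 3' end of the read, which is the kind of behaviour a flat-rate simulator cannot reproduce.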



      • #4
        Thanks everyone for your replies.

        I want a sequencing-error simulator that matches the Illumina data in the 1000 Genomes Project. That is where I am getting my data from. (Illumina-specific is not a die-hard requirement, but it helps a bit. The type of error should not depend on the read length.)

        I need read-pairs. Read length should be specifiable by the user. The insert size should follow a random distribution - Normal or whatever - that can be specified. SimSeq seems to satisfy those criteria at the moment but I have not tried it yet.

        I have my own tailored donor genome for a particular kind of mutation that needs sequencing errors.
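The requirements in this post (read pairs, a user-specified read length, a normally distributed insert size) can be sketched in a few lines of Python. This is only an illustration, not how SimSeq or any other tool works: `simulate_pair` and `reverse_complement` are hypothetical names, "insert size" is assumed to mean the full fragment length (outer distance), and the reads are error-free.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse-complement a DNA string (other characters are left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]

def simulate_pair(genome, read_len, mean_insert, sd_insert, rng, max_tries=1000):
    """Return (read1, read2, start, insert) for one error-free read pair.

    The fragment length is drawn from Normal(mean_insert, sd_insert) and
    redrawn until it fits between read_len and the genome length.
    """
    for _ in range(max_tries):
        insert = int(round(rng.gauss(mean_insert, sd_insert)))
        if read_len <= insert <= len(genome):
            break
    else:
        raise ValueError("could not draw a usable insert size")
    start = rng.randrange(len(genome) - insert + 1)
    fragment = genome[start:start + insert]
    read1 = fragment[:read_len]                        # forward strand, 5' end
    read2 = reverse_complement(fragment[-read_len:])   # reverse strand, other end
    return read1, read2, start, insert

if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    r1, r2, start, insert = simulate_pair(genome, 70, 300, 30, rng)
    print(start, insert)
    print(r1)
    print(r2)
```

Sequencing errors would then be applied to `read1` and `read2` as a separate step, so the donor genome's own mutations stay distinct from the simulated errors.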



        • #5
          If I want to use dwgsim for simulating read-pairs, can anyone explain the flags for me (http://sourceforge.net/apps/mediawik...ome_Simulation)?

          What do -e and -E mean technically? What are the error rates relative to?

          I think that -r is the mutation rate per base pair. Can that be confirmed?

          What does -R, the fraction of indels, mean? Fraction of what?

          -X and -y are also confusing. What are those probabilities relative to?
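One possible reading of these parameters, based on the wgsim usage text (where -r is the "rate of mutations", -R the "fraction of indels" and -X the "probability an indel is extended"), is sketched below in Python. It assumes dwgsim keeps the same meanings, which should be checked against dwgsim's own help output. The function `mutate` is hypothetical and is not the code of either tool; -y is not covered.

```python
import random

def mutate(seq, mut_rate, indel_frac, indel_extend, rng):
    """Apply mutations to seq under one assumed reading of -r, -R and -X.

    mut_rate     (-r): probability that a given reference base is mutated.
    indel_frac   (-R): fraction of those mutations that are indels
                       (the remainder are substitutions).
    indel_extend (-X): probability that an indel grows by one more base,
                       which makes indel lengths geometrically distributed.
    """
    if not 0.0 <= indel_extend < 1.0:
        raise ValueError("indel_extend must be in [0, 1)")
    out = []
    i = 0
    while i < len(seq):
        if rng.random() >= mut_rate:
            out.append(seq[i])          # no mutation at this base
            i += 1
        elif rng.random() >= indel_frac:
            # Substitution: swap in a different base.
            out.append(rng.choice([b for b in "ACGT" if b != seq[i]]))
            i += 1
        else:
            length = 1
            while rng.random() < indel_extend:
                length += 1
            if rng.random() < 0.5:
                i += length             # deletion: skip reference bases
            else:
                out.append(seq[i])      # insertion: keep base, add random bases
                out.extend(rng.choice("ACGT") for _ in range(length))
                i += 1
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(3)
    ref = "".join(rng.choice("ACGT") for _ in range(60))
    print(ref)
    print(mutate(ref, 0.05, 0.15, 0.3, rng))
```

Under this reading, -r is relative to reference bases, -R is relative to the mutations that occur, and -X is relative to each indel event.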
