  • Simulate Illumina read-pairs

    Hello,

    I want to simulate read-pairs using a read-length greater than 35 (up to 75). If I run MAQ, this works:

    maq simulate -N 1000 -1 35 -2 35 out.read.1.fastq out.read.2.fastq human_b37_chr22.fasta calib-36.dat

    But this does not:

    maq simulate -N 1000 -1 70 -2 70 out.read.1.fastq out.read.2.fastq human_b37_chr22.fasta calib-36.dat

    calib-36.dat was downloaded from http://sourceforge.net/projects/maq/...data/20080929/. The flags "-1" and "-2" do not work if the user wants read lengths greater than those specified in the .dat file.

    If anybody knows about other sequencers and other simulators, that would be helpful too. I want to simulate sequencing a genome, with random errors introduced by the sequencing process.

  • #2
    Hi.

    I think the most up-to-date simulator is wgsim (lh3/wgsim on GitHub). From its README:


    History
    =======

    Wgsim was modified from MAQ's read simulator by dropping dependencies to other
    source codes in the MAQ package and incorporating patches from Colin Hercus
    which allow to simulate INDELs longer than 1bp. Wgsim was originally released
    in the SAMtools software package. I forked it out in 2011 as a standalone
    project. A few improvements were also added in this course.


    • #3
      Software list

      Hey,
      I am currently reviewing software for this purpose, so I know of quite a few options. For most of these you can just google "[prog name] simulation genome" or something similar and you will find them in the top few hits. For the Illumina one, you need to write to them and ask for it, and as far as I know it is not official.

      * wgsim -> PE only, uniform error
      * dwgsim -> Position specific error. PE only
      * metasim -> PE only, specialized for simulating from a population
      * in-house Illumina C++ -> doesn't model mate-pair chimeras; samples Illumina error strings to produce the error in the output. Doesn't model base-specific error though: when an error occurs, it is the same regardless of the underlying base.
      * in-house Illumina Perl -> adds proper handling of mate-pair simulation, but uses the same base-level error strategy as the C++ version, which is the main reason we chose to write our own. Doesn't model PE contamination in MP libraries, but the developer notes it would be easy to generate PE reads separately and mix them into the output file. Although ours ended up being backwards, we still successfully modeled different error rates depending on the underlying base.
      * PEMer -> no mate-pair chimeras
      * reseqsim -> focuses on SV analysis, doesn't do MP modeling
      * simnext -> flat error rate like wgsim
      * mason -> doesn't model mate-pair chimeras
      * flux-capacitor -> models RNA-seq reads


      And of course there is the one I wrote which we used in the first Assemblathon:
      SimSeq: https://github.com/jstjohn/SimSeq?locale=en
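
To illustrate the difference between the "uniform error" and "position specific error" entries in the list above, here is a minimal Python sketch. It is not taken from wgsim, dwgsim, or any other tool listed; the function names and the linear ramp used for the position-specific case are assumptions made only for illustration.

```python
import random

def uniform_error_rates(read_len, rate):
    """Same substitution probability at every position of the read."""
    return [rate] * read_len

def position_specific_error_rates(read_len, start_rate, end_rate):
    """Hypothetical position-specific model: the error probability changes
    linearly from start_rate at the first base to end_rate at the last."""
    if read_len == 1:
        return [start_rate]
    step = (end_rate - start_rate) / (read_len - 1)
    return [start_rate + i * step for i in range(read_len)]

def apply_errors(read, rates, rng):
    """Substitute each base with a different base with its per-position probability."""
    if len(read) != len(rates):
        raise ValueError("need exactly one error rate per base")
    out = []
    for base, rate in zip(read, rates):
        if rng.random() < rate:
            # Pick any base other than the original one.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(0)
    read = "ACGT" * 10
    print(apply_errors(read, uniform_error_rates(len(read), 0.02), rng))
    print(apply_errors(read, position_specific_error_rates(len(read), 0.001, 0.05), rng))
```

Real Illumina error profiles are more complicated than a linear ramp, so treat the second model only as a placeholder for whatever per-position rates you estimate from your own data.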


      • #4
        Thanks everyone for your replies.

        I want a sequencing-error simulator that matches the Illumina data in the 1000 Genomes Project. That is where I am getting my data from. (Illumina-specific is not a die-hard requirement, but it helps a bit. The type of error should not depend on the read length.)

        I need read-pairs. Read length should be specifiable by the user. The insert size should follow a random distribution - Normal or whatever - that can be specified. SimSeq seems to satisfy those criteria at the moment but I have not tried it yet.

        I have my own tailored donor genome for a particular kind of mutation that needs sequencing errors.
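
The requirements above (read pairs, user-specified read length, insert sizes drawn from a Normal distribution) can be sketched in a few lines of Python. This is a rough illustration only, not how SimSeq, wgsim, or dwgsim work internally. It assumes "insert size" means the full fragment length (outer distance), a flat substitution-only error rate, and a reference held as a plain string; all function and parameter names are made up for the example.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def add_errors(read, error_rate, rng):
    """Flat substitution error: each base is replaced with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_pairs(reference, n_pairs, read_len, insert_mean, insert_sd,
                   error_rate=0.01, seed=0):
    """Return a list of (read1, read2) tuples sampled from reference.

    Fragment lengths are drawn from Normal(insert_mean, insert_sd); draws
    shorter than read_len or longer than the reference are rejected.
    """
    if read_len > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        frag_len = int(round(rng.gauss(insert_mean, insert_sd)))
        if frag_len < read_len or frag_len > len(reference):
            continue  # rejected draw; choose insert_mean so this is rare
        start = rng.randint(0, len(reference) - frag_len)
        fragment = reference[start:start + frag_len]
        # Read 1 comes from the forward strand, read 2 from the opposite end
        # of the fragment on the reverse strand.
        read1 = add_errors(fragment[:read_len], error_rate, rng)
        read2 = add_errors(reverse_complement(fragment)[:read_len], error_rate, rng)
        pairs.append((read1, read2))
    return pairs

if __name__ == "__main__":
    rng = random.Random(1)
    toy_reference = "".join(rng.choice("ACGT") for _ in range(5000))
    for r1, r2 in simulate_pairs(toy_reference, 3, 70, 300, 30):
        print(r1, r2)
```

For a tailored donor genome you would load the FASTA sequence into `reference` and write the pairs out as FASTQ with whatever quality values suit your downstream tools; both steps are left out here.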


        • #5
          If I want to use dwgsim for simulating read-pairs, can anyone explain the flags for me (http://sourceforge.net/apps/mediawik...ome_Simulation)?

          What do -e and -E mean technically? What are the error rates relative to?

          I think that -r is the mutation rate per base pair. Can that be confirmed?

          What does -R, the fraction of indels, mean? Fraction of what?

          -X and -y are also confusing. What are those probabilities relative to?
