  • Error using Masurca 3.2.6 assembler

    Hi all, I'm currently using MaSuRCA-3.2.6 to assemble my genome and I ran the following script:

    ```
    #PBS -S /bin/bash
    #PBS -l nodes=1:ppn=8:bigmem,mem=100gb
    #PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
    #PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
    #PBS -N ACG-006
    #PBS -q q1week


    DATA
    PE= pe 150 22 /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq

    END

    PARAMETERS
    #set this to 1 if your Illumina jumping library reads are shorter than 100bp
    EXTEND_JUMP_READS=0
    #this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
    GRAPH_KMER_SIZE = auto
    #set this to 1 for all Illumina-only assemblies
    #set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
    #otherwise keep at 0
    USE_LINKING_MATES = 0
    #specifies whether to run mega-reads correction on the grid
    USE_GRID=0
    #specifies queue to use when running on the grid MANDATORY
    GRID_QUEUE=all.q
    #batch size in the amount of long read sequence for each batch on the grid
    GRID_BATCH_SIZE=300000000
    #coverage by the longest Long reads to use
    LHE_COVERAGE=30
    #this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
    LIMIT_JUMP_COVERAGE = 300
    #these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
    #set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
    CA_PARAMETERS = cgwErrorRate=0.15
    #minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if Illumina coverage >100
    KMER_COUNT_THRESHOLD = 1
    #whether to attempt to close gaps in scaffolds with Illumina data
    CLOSE_GAPS=1
    #auto-detected number of cpus to use
    NUM_THREADS = 16
    #this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
    JF_SIZE = 200000000
    #set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
    SOAP_ASSEMBLY=0
    END
    ```
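
    The job header appears to request 8 cores per node, while the config sets `NUM_THREADS = 16`. A minimal sketch for reading the core count actually granted to the job follows. It assumes a Torque-style scheduler that writes one line per allocated core to `$PBS_NODEFILE`, and `allocated_cores` is a hypothetical helper name, not part of MaSuRCA.

    ```shell
    #!/bin/bash
    # Hypothetical helper: number of cores granted to the current job.
    # Assumes a Torque-style $PBS_NODEFILE with one line per allocated core;
    # falls back to nproc when not running under PBS.
    allocated_cores() {
        if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
            echo $(( $(wc -l < "$PBS_NODEFILE") ))
        else
            nproc
        fi
    }

    # Print a line that could be copied into the PARAMETERS section.
    echo "NUM_THREADS = $(allocated_cores)"
    ```

    Whether the thread mismatch is related to the abort is not known; the snippet only helps keep the two settings consistent.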

    Then I got the assemble.sh file, ran it as well, and got the following .out:

    ```
    [Sat Jun 16 22:32:45 CEST 2018] Processing pe library reads
    [Sat Jun 16 22:49:04 CEST 2018] Average PE read length 150
    [Sat Jun 16 22:49:05 CEST 2018] Using kmer size of 49 for the graph
    [Sat Jun 16 22:49:06 CEST 2018] MIN_Q_CHAR: 33
    WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1115876884, this automatic increase may be not enough!
    [Sat Jun 16 22:49:06 CEST 2018] Creating mer database for Quorum
    [Sat Jun 16 23:09:23 CEST 2018] Error correct PE.
    [Sat Jun 16 23:11:49 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.

    ```

    and .error:

    ```
    /panhome/TOOLS/MaSuRCA-3.2.6/assemble.sh: line 102: 46750 Aborted quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/panhome/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
    ```

    Does someone have an idea of what is going on here? Thanks for your help.

    The two FASTQ files come from an Illumina HiSeq 3000 (150 bp reads), and the genome size of my species is around 1.5 Gb.
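
    For reference, the config comment gives the rule of thumb `JF_SIZE = estimated_genome_size * estimated_coverage`, and the value in the config above (200000000) is below the 1115876884 minimum reported in the warning. A minimal sketch of that arithmetic for a 1.5 Gb genome follows. The 50x coverage is a placeholder assumption, not a figure from this run, and `estimate_jf_size` is a hypothetical helper name.

    ```shell
    #!/bin/bash
    # Hypothetical helper: applies the rule of thumb from the MaSuRCA config comment,
    # JF_SIZE = estimated_genome_size * estimated_coverage.
    estimate_jf_size() {
        local genome_size_bp=$1   # estimated genome size in base pairs
        local coverage=$2         # estimated Illumina coverage (assumed value)
        echo $(( genome_size_bp * coverage ))
    }

    # 1.5 Gb genome; 50x is an assumed coverage, replace it with your own estimate.
    estimate_jf_size 1500000000 50
    ```

    With these inputs the script prints 75000000000; a larger hash needs more memory, so the value should be weighed against what the node offers.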

  • #2
    I searched online and tried changing JF_SIZE to 25500000000, but got this error:

    Code:
    line 102: 25712 Aborted                 quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/panhome/bguinet/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
    and the .out:
    Code:
    [Sun Jun 17 11:40:30 CEST 2018] Processing pe library reads
    [Sun Jun 17 11:50:47 CEST 2018] Average PE read length 150
    [Sun Jun 17 11:50:47 CEST 2018] Using kmer size of 49 for the graph
    [Sun Jun 17 11:50:48 CEST 2018] MIN_Q_CHAR: 33
    [Sun Jun 17 11:50:48 CEST 2018] Creating mer database for Quorum
    [Sun Jun 17 12:19:01 CEST 2018] Error correct PE.
    [Sun Jun 17 12:35:01 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
    and the frag_*.fastq files appear to be fine:


    Code:
    /pandata/LEPIWASP/ACG-0006_0027$ file -b -i frag_1.fastq
    text/plain; charset=us-ascii
    /pandata/LEPIWASP/ACG-0006_0027$ file -b -i frag_2.fastq
    text/plain; charset=us-ascii
    and I cannot check the pe.cor.log file because it does not exist.
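
    Two checks that may help narrow this down, offered as a sketch rather than a confirmed fix. `file -b -i` only reports the encoding, so it does not show whether both FASTQ files hold the same number of reads. Also, the aborted command in the error above redirects its output to `quorum.err`, so that file may hold the actual message even though `pe.cor.log` is missing. The helper names `count_fastq_records` and `show_log_tail` are hypothetical, and the record count assumes plain four-line FASTQ records.

    ```shell
    #!/bin/bash
    # Hypothetical helper: count reads, assuming each FASTQ record is exactly 4 lines.
    count_fastq_records() {
        local lines
        lines=$(wc -l < "$1")
        echo $(( lines / 4 ))
    }

    # Hypothetical helper: print the last 20 lines of a log file if it has content.
    show_log_tail() {
        local log=$1
        if [ -s "$log" ]; then
            tail -n 20 "$log"
        else
            echo "missing or empty: $log"
        fi
    }

    # Usage (paths taken from the post; run show_log_tail from the assembly directory):
    #   count_fastq_records /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq
    #   count_fastq_records /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq
    #   show_log_tail quorum.err
    ```

    If the two counts differ, the pair files are out of sync, which is worth ruling out before changing more parameters.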


    • #3
      Masurca, failed to create mega-reads frg file

      Hi guys,
      I need your help.
      I tried to solve this on my own by changing or leaving out some parameters, but I still get the same error.

      I am running MaSuRCA with the config file below.

      The assembly of Illumina PE reads with Nanopore reads stopped at the "Generating assembly input files step".

      Error type:

      error reading mega-reads file at /bioappl/src/MaSuRCA/MaSuRCA-3.3.3/bin/find_contained_reads.pl line 33, <FILE> line 23780.
      [Mon Jun 10 18:27:32 CEST 2019] failed to create mega-reads frg file
      [Mon Jun 10 18:27:32 CEST 2019] mega-reads exited before assembly

      Could someone tell me what to do now? Where is the problem?

      Thanks a lot in advance!
      D

      DATA
      #Illumina paired end reads supplied as <two-character prefix> <fragment mean> <fragment stdev> <forward_reads> <reverse_reads>
      #if single-end, do not specify <reverse_reads>
      #MUST HAVE Illumina paired end reads to use MaSuRCA
      PE= il 75 11 /bioinf/proj_data_chestnut/dorota_b/Illumina/R1.fastq /bioinf/proj_data_chestnut/dorota_b/Illumina/R2.fastq
      #pacbio OR nanopore reads must be in a single fasta or fastq file with absolute path, can be gzipped
      NANOPORE=/bioinf/proj_data_chestnut/dorota_b/Nanopore/nanopore.fastq
      END

      PARAMETERS
      #PLEASE READ all comments to essential parameters below, and set the parameters according to your project
      #set this to 1 if your Illumina jumping library reads are shorter than 100bp
      EXTEND_JUMP_READS=0
      #this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
      GRAPH_KMER_SIZE = auto
      #set this to 1 for all Illumina-only assemblies
      #set this to 0 if you have more than 15x coverage by long reads (Pacbio or Nanopore) or any other long reads/mate pairs (Illumina MP, Sanger, 454, etc)
      USE_LINKING_MATES = 0
      #specifies whether to run the assembly on the grid
      USE_GRID=0
      #specifies grid engine to use SGE or SLURM
      GRID_ENGINE=SGE
      #specifies queue (for SGE) or partition (for SLURM) to use when running on the grid MANDATORY
      GRID_QUEUE=all.q
      #batch size in the amount of long read sequence for each batch on the grid
      GRID_BATCH_SIZE=500000000
      #use at most this much coverage by the longest Pacbio or Nanopore reads, discard the rest of the reads
      #can increase this to 30 or 35 if your reads are short (N50<7000bp)
      LHE_COVERAGE=25
      #set to 0 (default) to do two passes of mega-reads for slower, but higher quality assembly, otherwise set to 1
      MEGA_READS_ONE_PASS=1
      #this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
      LIMIT_JUMP_COVERAGE = 60
      #these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
      #CABOG ASSEMBLY ONLY: set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
      CA_PARAMETERS = cgwErrorRate=0.15
      #CABOG ASSEMBLY ONLY: whether to attempt to close gaps in scaffolds with Illumina or long read data
      CLOSE_GAPS=1
      #auto-detected number of cpus to use, set this to the number of CPUs/threads per node you will be using
      NUM_THREADS = 20
      #this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*20
      JF_SIZE = 160000000
      #ILLUMINA ONLY. Set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>=8Gbp) genomes from Illumina-only data
      SOAP_ASSEMBLY=0
      #Hybrid Illumina paired end + Nanopore/PacBio assembly ONLY. Set this to 1 to use Flye assembler for final assembly of corrected mega-reads. A lot faster than CABOG, at the expense of some contiguity. Works well even when MEGA_READS_ONE_PASS is set to 1. DO NOT use if you have less than 15x coverage by long reads.
      FLYE_ASSEMBLY=0
      END
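
      As a quick sanity check on the config above (a sketch, not a diagnosis of the mega-reads error): the comment suggests a `JF_SIZE` of about `estimated_genome_size*20`, so the configured value can be divided by 20 to see what genome size it implies. `implied_genome_size` is a hypothetical helper, and it assumes the `KEY = value` layout shown in the config.

      ```shell
      #!/bin/bash
      # Hypothetical helper: read JF_SIZE from a MaSuRCA config file and divide by 20,
      # following the "estimated_genome_size*20" rule of thumb in the config comment.
      implied_genome_size() {
          local config=$1 jf_size
          # Take the first uncommented JF_SIZE line and keep only its digits.
          jf_size=$(grep -E '^[[:space:]]*JF_SIZE[[:space:]]*=' "$config" | head -n 1 | tr -cd '0-9')
          [ -n "$jf_size" ] || { echo "JF_SIZE not found in $config" >&2; return 1; }
          # 10# forces base 10 in case the value has leading zeros.
          echo $(( 10#$jf_size / 20 ))
      }

      # Usage (the path is a placeholder for your own config file):
      #   implied_genome_size /path/to/config.txt
      ```

      For `JF_SIZE = 160000000` this gives 8000000, i.e. about 8 Mbp; if the genome is much larger than that, raising `JF_SIZE` may be worth trying.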
