  • ronaldrcutler
    Member
    • May 2016
    • 80

    Newbler (GS De Novo Assembler) paired-end input

    Hello all.

    I am new to genome assembly and am using Newbler to assemble two paired-end FASTQ files that are 4.09 GB each. When you input these files into the program, you can mark them as paired-end. Does this mean it will recognize the two input files as a pair?

    I have also gone ahead and merged the two paired-end files into a single file using FLASH. When I input this merged file, should I still choose the paired-end option in Newbler?

    Also, I am getting an error that I have run out of memory, so I am splitting up my files using fastq-splitter.pl. How would I split the two separate paired-end files for input into Newbler?

    Thanks in advance
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    You will not get an optimal assembly if you split reads into multiple subsets and assemble them independently. In fact, you'll get a mess. If you run out of memory, you need to use a computer that has more memory, or a different algorithm.

    • ronaldrcutler
      Member
      • May 2016
      • 80

      #3
      Okay thanks for the important info.

      Do you know whether Newbler will recognize two paired-end files as a pair if I use the paired-end option on both of them? Or is using a merged file of the two paired-end reads a better approach?

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Sorry, I have never used Newbler, so I don't know its idiosyncrasies... but hopefully someone else does!

        Typically, if you have overlapping reads, an OLC assembler will perform best with merged reads. Flash does not perform well in my tests, though. Bearing in mind that I am biased, being the developer, I recommend BBMerge for joining paired reads prior to assembly.

        What was your merge rate? The best procedure depends on that... if the insert size was too long to merge a substantial fraction of the reads, it's better to skip merging.

        • ronaldrcutler
          Member
          • May 2016
          • 80

          #5
          The max read length is 250 bp, which I used as the maxOverlap parameter in flash. The results of this merge:

          Code:
          [FLASH] Read combination statistics:
          [FLASH]  Total pairs: 8576138
          [FLASH]  Combined pairs:  6056207
          [FLASH]  Uncombined pairs: 2519931
          [FLASH]  Percent combined: 70.62%
          Note that when I set maxOverlap to 225, I got a warning that a high proportion of pairs overlapped by more than 225 bp, which is why I stuck with 250. I am not sure this is the best option, though, since my maximum read length is also 250 bp.
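
As a quick check, the merge rate follows directly from the pair counts in the FLASH statistics above. This is a minimal sketch; the function name `merge_rate` is made up for illustration, and the two counts are copied from the log:

```shell
# Hypothetical helper: percentage of read pairs that were merged.
# $1 = combined pairs, $2 = total pairs (both taken from the FLASH log).
merge_rate() {
  awk -v combined="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * combined / total }'
}

merge_rate 6056207 8576138
```

This reproduces the 70.62% that FLASH reports.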

          I determined the maximum read length by running this command and looking through the lengths of all the reads:

          Code:
          awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' <read> > <output.txt>
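
If only the maximum is needed, awk can track it directly instead of printing every read for inspection. This is a minimal sketch, assuming strict four-line FASTQ records; the function name `max_read_length` and the sample reads are made up for illustration:

```shell
# Hypothetical helper: print the longest sequence length in a FASTQ stream.
# Assumes the sequence is always the 2nd line of each 4-line record.
max_read_length() {
  awk 'NR % 4 == 2 { n = length($0); if (n > max) max = n } END { print max + 0 }'
}

# Two made-up reads of 4 and 6 bases, piped in for demonstration.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | max_read_length
```

On real data the file would be piped in instead, for example `max_read_length < reads.fastq`, where `reads.fastq` is a placeholder file name.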
          Last edited by ronaldrcutler; 07-14-2016, 03:32 AM.

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Is this 454 data? Is that the reason for using newbler?

            • ronaldrcutler
              Member
              • May 2016
              • 80

              #7
              No, this is FASTQ data.

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                From which platform? How big is the genome expected to be? What is the read length?

                • ronaldrcutler
                  Member
                  • May 2016
                  • 80

                  #9
                  Illumina, I believe.

                  The merged paired-end file (using flash) has 6056207 sequences, 1451352720 bp
                  The mate1 paired-end file has 8576138 sequences, 1798377920 bp

                  The read lengths are 250 bp
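
The sequence and base counts above also give the mean read length in each file. This is a minimal sketch; the function name `mean_len` is made up for illustration, and the counts are copied from this post:

```shell
# Hypothetical helper: mean read length = total bases / number of sequences.
# $1 = total bases, $2 = number of sequences.
mean_len() {
  awk -v bases="$1" -v seqs="$2" 'BEGIN { printf "%.1f\n", bases / seqs }'
}

mean_len 1451352720 6056207   # merged file
mean_len 1798377920 8576138   # mate1 file
```

This gives roughly 239.6 bp for the merged file and 209.7 bp for mate1. A mate1 mean below 250 bp would suggest that 250 bp is the maximum read length rather than a uniform one, for example after trimming; that is an inference from these figures, not something stated in the thread.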
