  • Questions for Newbler users?

    I'm new to Newbler and have some naive questions about using the GS De Novo Assembler. I hope someone can help me.

    1) I have 454 sequencing reads in FASTA and qual format. From the manual, it seems that Newbler only reads FASTA files as input. Can Newbler also read a qual input file?

    2) If I have multiple FASTA files, is there a command option that lets Newbler read them all at once, rather than running an additional command such as 'addRun' for each file?


    Thanks in advance...

  • #2
    1) You cannot add quality score information if your sequences are input as FASTA files.

    2) If you have multiple fasta files you could supply all of the names as a list on one line:

    addRun file1.fasta file2.fasta file3.fasta

    or you could use standard shell globbing to specify multiple files:

    addRun *.fasta
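
    Where the file list has to be assembled by a script rather than by the shell, the same idea can be sketched in Python. This is only an illustration: the helper name build_add_run_command is hypothetical, it merely builds the command string shown above and does not run Newbler, and any project directory or extra options your installation needs are left out.

```python
import glob
import shlex


def build_add_run_command(pattern="*.fasta"):
    """Return an addRun command line listing every file that matches pattern."""
    # sorted() keeps the order stable between runs; glob gives no ordering guarantee.
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # shlex.join quotes names that contain spaces or shell metacharacters.
    return shlex.join(["addRun", *files])
```

    Calling build_add_run_command("*.fasta") in a directory that holds file1.fasta and file2.fasta returns the string "addRun file1.fasta file2.fasta", which can be inspected before it is run.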

    The newbler assembler, which is now officially called the gsAssembler or "GS De novo Assembler", prefers that the data be supplied in the 454 native format, Standard Flowgram Format (.sff) files. These files, in addition to the flowgram information, contain the base calls and quality scores. Contact the facility which performed the sequencing for you and ask them to provide the SFF files for your runs.


    • #3
      Newbler will take quality information for FASTA files

      Newbler will take quality information for FASTA-formatted sequences. For example, if the FASTA file is "myfasta.fna", the quality file should be named "myfasta.qual" or "myfasta.fna.qual" and placed in the same directory.
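
      To check ahead of time whether a quality file with one of those names is present, something like the following Python sketch could be used. The function names are hypothetical, and the order in which the two names are tried is an assumption; this is not Newbler's own lookup code.

```python
from pathlib import Path


def candidate_qual_paths(fasta_path):
    """List the two quality-file names described above for a FASTA file."""
    fasta = Path(fasta_path)
    return [
        fasta.with_name(fasta.name + ".qual"),  # myfasta.fna -> myfasta.fna.qual
        fasta.with_suffix(".qual"),             # myfasta.fna -> myfasta.qual
    ]


def find_qual_file(fasta_path):
    """Return the first candidate that exists on disk, or None if neither does."""
    for candidate in candidate_qual_paths(fasta_path):
        if candidate.is_file():
            return candidate
    return None
```

      If find_qual_file("myfasta.fna") returns None, neither "myfasta.fna.qual" nor "myfasta.qual" sits next to the FASTA file.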


      • #4
        I stand corrected on my response (#2) above. NSTbioinformatics is correct: you can include the quality score information provided you use the naming conventions described. If you have access to the manual, this is described in section 5.5, "Assembling with Reads Obtained Using the Sanger Sequencing Method", specifically 5.5.2, "Sanger Reads with available Quality Scores":

        If there are quality score files associated with the FASTA files, the GS De Novo Assembler will automatically read the quality scores for the reads, and use them in the assembly and consensus calling. The assembler looks for a file that begins with the same file name or file prefix as the FASTA file but ending with “.qual”. So, for a FASTA file whose name is “myreads.fna”, the assembler will test both “myreads.fna.qual” and “myreads.qual” (replacing any suffix with “.qual”) in its search for a quality score file. If such a file exists, it must contain sequence entries that match (in order) the sequences of the FASTA file, and the number of quality scores for each sequence must match the number of bases of each corresponding sequence in the FASTA file.
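
        The two requirements in that passage (entries in the same order, and one quality score per base) can be checked before running the assembler. The Python sketch below is one way to do it, under stated assumptions: the function names are hypothetical, a read's name is taken to be the first whitespace-delimited token of its header line, and quality scores are assumed to be whitespace-separated integers. It is not the assembler's own validation.

```python
from itertools import zip_longest


def read_records(path, is_qual=False):
    """Yield (name, length) per record: bases for FASTA, scores for .qual."""
    name, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Assumption: the read name is the first token of the header.
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                # Scores are counted per token, bases per character.
                length += len(line.split()) if is_qual else len(line)
    if name is not None:
        yield name, length


def check_fasta_qual(fasta_path, qual_path):
    """Return a list of problems; an empty list means the files agree."""
    problems = []
    pairs = zip_longest(read_records(fasta_path),
                        read_records(qual_path, is_qual=True))
    for index, (fa, qu) in enumerate(pairs, start=1):
        if fa is None or qu is None:
            problems.append(f"record {index}: present in only one file")
        elif fa[0] != qu[0]:
            problems.append(f"record {index}: name mismatch {fa[0]!r} vs {qu[0]!r}")
        elif fa[1] != qu[1]:
            problems.append(f"record {index} ({fa[0]}): {fa[1]} bases but {qu[1]} scores")
    return problems
```

        An empty list from check_fasta_qual("myreads.fna", "myreads.fna.qual") means the names line up in order and every read has as many scores as bases.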

        From the October 2008 version of the GS Data Analysis Software Manual. Refers to v 2.0.00.22 of newbler.
        I have tested running newbler using only FASTA data, with and without the .qual file, and it does make a difference in the assembly, but newbler makes no note of finding or loading the .qual file in its progress logs or output files.

        I would still urge you to contact your sequencing facility to try to obtain the .sff files. I tested newbler using the same data set (Titanium shotgun data for a bacterial genome) supplied as .sff, as .fna only, and as .fna plus .fna.qual. Newbler generated the best assembly* when it had the .sff file. (*"Best" as determined by the level of contiguity of the assembly.)
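
        The post does not say which contiguity measure was used for that comparison. One common summary is N50, the contig length at which contigs of that size or longer hold at least half of the assembled bases. A minimal Python sketch, assuming the contig lengths of each assembly have already been collected into a list (the function name is hypothetical):

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths, or 0 for an empty list."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0
```

        Comparing n50() across the .sff, .fna-only and .fna plus .qual assemblies of the same data gives a single number per assembly; a higher value indicates a more contiguous result.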


        • #5
          I thank kmcarr and NSTbioinformatics for your help. I will try to obtain the .sff files.
