
  • Questions for Newbler users?

    I'm new to Newbler and have some naive questions about using the GS De Novo Assembler. I hope someone can help me.

    1) I have 454 sequencing reads in FASTA and qual format. From the manual, it seems that Newbler only reads FASTA files as input. Can Newbler also read a qual file as input?

    2) If I have multiple FASTA files, is there a command option that lets Newbler read them all at once, rather than issuing an additional command such as 'addRun' for each file?


    Thanks in advance...

  • #2
    1) You cannot add quality score information if your sequences are input as FASTA files.

    2) If you have multiple fasta files you could supply all of the names as a list on one line:

    addRun file1.fasta file2.fasta file3.fasta

    or you could use standard shell globbing to specify multiple files:

    addRun *.fasta

    The newbler assembler, which is now officially called the gsAssembler or "GS De novo Assembler", prefers that the data be supplied in the 454 native format, Standard Flowgram Format (.sff) files. These files, in addition to the flowgram information, contain the base calls and quality scores. Contact the facility which performed the sequencing for you and ask them to provide the SFF files for your runs.



    • #3
      Newbler will take quality information for FASTA files

      Newbler will take quality information for FASTA-formatted sequences. For example, if the FASTA file is "myfasta.fna", the quality file should be named "myfasta.qual" or "myfasta.fna.qual" and be in the same directory.



      • #4
        I stand corrected on my response (#2) above. NSTbioinformatics is correct: you can include the quality score information, provided you use the naming conventions described. If you have access to the manual, this is described in section 5.5 "Assembling with Reads Obtained Using the Sanger Sequencing Method", specifically 5.5.2 "Sanger Reads with available Quality Scores":

        If there are quality score files associated with the FASTA files, the GS De Novo Assembler will automatically read the quality scores for the reads, and use them in the assembly and consensus calling. The assembler looks for a file that begins with the same file name or file prefix as the FASTA file but ending with “.qual”. So, for a FASTA file whose name is “myreads.fna”, the assembler will test both “myreads.fna.qual” and “myreads.qual” (replacing any suffix with “.qual”) in its search for a quality score file. If such a file exists, it must contain sequence entries that match (in order) the sequences of the FASTA file, and the number of quality scores for each sequence must match the number of bases of each corresponding sequence in the FASTA file.

        This is from the October 2008 version of the GS Data Analysis Software Manual, which refers to v 2.0.00.22 of newbler.
        I have tested running newbler using only FASTA data, with and without the .qual file, and it does make a difference in the assembly. However, newbler makes no note of finding or loading the .qual file in its progress logs or output files.
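
Since newbler does not report whether it picked up the .qual file, it may be worth checking the files yourself before assembling. Below is a minimal Python sketch of such a pre-check, based only on the manual text quoted above. It is not part of newbler; the function names are hypothetical, the order in which the two candidate names are tried is an assumption, and comparing the first word of each header line is my own reading of "match (in order)".

```python
from pathlib import Path


def candidate_qual_paths(fasta_path):
    """Return the two .qual names the manual says are tested for a FASTA file."""
    p = Path(fasta_path)
    # "myreads.fna" -> "myreads.fna.qual" and "myreads.qual"
    # (assumption: the order in which the assembler tries them is not documented here)
    return [Path(str(p) + ".qual"), p.with_suffix(".qual")]


def record_sizes(path, count):
    """Yield (read_id, size) per record; count() measures one data line."""
    name, size = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, size
                fields = line[1:].split()
                name, size = (fields[0] if fields else ""), 0
            elif line and name is not None:
                size += count(line)
    if name is not None:
        yield name, size


def check_qual(fasta_path):
    """Return (qual_path, problems); qual_path is None if no candidate file exists."""
    qual_path = next((q for q in candidate_qual_paths(fasta_path) if q.exists()), None)
    if qual_path is None:
        return None, []
    # Bases are counted per character, quality scores per whitespace-separated token.
    bases = list(record_sizes(fasta_path, len))
    quals = list(record_sizes(qual_path, lambda line: len(line.split())))
    problems = []
    if len(bases) != len(quals):
        problems.append(f"{len(bases)} FASTA records but {len(quals)} quality records")
    for (fid, n_bases), (qid, n_quals) in zip(bases, quals):
        if fid != qid:
            problems.append(f"order mismatch: {fid} vs {qid}")
        elif n_bases != n_quals:
            problems.append(f"{fid}: {n_bases} bases but {n_quals} scores")
    return qual_path, problems
```

Calling `check_qual("myreads.fna")` returns the quality file that was found (or `None`) together with a list of mismatches. An empty list only means the files look consistent under the assumptions above, not that newbler actually loaded the scores.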

        I would still urge you to contact your sequencing facility and try to obtain the .sff files. I tested newbler using the same data set (Titanium shotgun data for a bacterial genome) supplied as .sff, as .fna only, and as .fna plus .fna.qual. Newbler generated the best assembly* when it had the .sff file. (*"Best" as judged by the contiguity of the assembly.)



        • #5
          I thank kmcarr and NSTbioinformatics for your help. I will try to obtain the .sff files.
