
**kmcarr** · 07-01-2009, 11:19 AM

1) You cannot add quality score information if your sequences are input as FASTA files.

2) If you have multiple FASTA files, you could supply all of the names as a list on one line:

addRun file1.fasta file2.fasta file3.fasta

or you could use standard shell globbing to specify multiple files:

addRun *.fasta

The newbler assembler, which is now officially called the gsAssembler or "GS De novo Assembler", prefers that the data be supplied in the 454 native format, Standard Flowgram Format (.sff) files. These files, in addition to the flowgram information, contain the base calls and quality scores. Contact the facility which performed the sequencing for you and ask them to provide the SFF files for your runs.

**NSTbioinformatics** · 07-03-2009, 06:10 AM

Newbler will take quality information for FASTA files

Newbler will take quality information for FASTA-formatted sequences. For example, if the FASTA file is "myfasta.fna", the quality file should be "myfasta.qual" or "myfasta.fna.qual" in the same directory.
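To check quickly whether a FASTA file has a quality file with one of these names next to it, a minimal Python sketch could look like the following. The function names are made up for illustration, and the order in which the two names are tried is an assumption; this only mirrors the naming convention described above and is not Newbler's own lookup code.

```python
from pathlib import Path


def candidate_qual_paths(fasta_path):
    """Return the two quality-file names described above for a FASTA file.

    Hypothetical helper: the order of the two candidates is an assumption.
    """
    fasta = Path(fasta_path)
    return [
        fasta.with_name(fasta.name + ".qual"),  # myfasta.fna -> myfasta.fna.qual
        fasta.with_suffix(".qual"),             # myfasta.fna -> myfasta.qual
    ]


def find_qual_file(fasta_path):
    """Return the first candidate that exists on disk, or None if neither does."""
    for candidate in candidate_qual_paths(fasta_path):
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_qual_file("myfasta.fna")` before starting an assembly tells you whether a matching quality file is in place, which helps because Newbler itself may not report it.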

**kmcarr** · 07-04-2009, 06:06 AM

I stand corrected in my response #1 above. NSTbioinformatics is correct: you can include the quality score information, provided you use the naming conventions described. If you have access to the manual, this is described in section 5.5, "Assembling with Reads Obtained Using the Sanger Sequencing Method", specifically 5.5.2, "Sanger Reads with available Quality Scores":

If there are quality score files associated with the FASTA files, the GS De Novo Assembler will automatically read the quality scores for the reads, and use them in the assembly and consensus calling. The assembler looks for a file that begins with the same file name or file prefix as the FASTA file but ending with “.qual”. So, for a FASTA file whose name is “myreads.fna”, the assembler will test both “myreads.fna.qual” and “myreads.qual” (replacing any suffix with “.qual”) in its search for a quality score file. If such a file exists, it must contain sequence entries that match (in order) the sequences of the FASTA file, and the number of quality scores for each sequence must match the number of bases of each corresponding sequence in the FASTA file.

(Quoted from the October 2008 version of the GS Data Analysis Software Manual, which refers to v 2.0.00.22 of newbler.)
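The manual passage quoted above sets two conditions on the .qual file: the entries must match the FASTA sequences in order, and each entry must have as many quality scores as the sequence has bases. A rough Python sketch of a pre-flight check for those two conditions follows. The function names are hypothetical, and it assumes the usual .qual layout (FASTA-style ">" headers followed by whitespace-separated integer scores); it is not how the assembler itself validates its input.

```python
def read_records(lines, is_qual=False):
    """Parse FASTA-style lines into a list of (read_id, values).

    For sequence files, values is a list of bases; for .qual files
    (is_qual=True), it is a list of whitespace-separated score tokens.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            records.append((header[0] if header else "", []))
        elif records:
            records[-1][1].extend(line.split() if is_qual else line)
    return records


def check_fasta_qual(fasta_lines, qual_lines):
    """Return a list of problems; an empty list means the files are consistent."""
    seqs = read_records(fasta_lines)
    quals = read_records(qual_lines, is_qual=True)
    problems = []
    if len(seqs) != len(quals):
        problems.append(f"{len(seqs)} sequences but {len(quals)} quality entries")
    # Compare entries pairwise, in file order, as the manual requires.
    for (seq_id, bases), (qual_id, scores) in zip(seqs, quals):
        if seq_id != qual_id:
            problems.append(f"order mismatch: {seq_id} vs {qual_id}")
        elif len(bases) != len(scores):
            problems.append(f"{seq_id}: {len(bases)} bases but {len(scores)} scores")
    return problems
```

Both arguments can be open file handles, for example `with open("myreads.fna") as f, open("myreads.fna.qual") as q: print(check_fasta_qual(f, q))`.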

I have tested running newbler using only FASTA data, with and without the .qual file, and it does make a difference in the assembly, but newbler makes no note of finding or loading the .qual file in its progress logs or output files.

I would still urge you to contact your sequencing facility to try to obtain the .sff files. I tested newbler using the same data set (Titanium shotgun data for a bacterial genome) supplied as either .sff, .fna only, or .fna+.fna.qual. Newbler generated the best assembly* when it had the .sff file. (*Best as determined by the level of contiguity of the assembly.)
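The post does not say which contiguity measure was used for this comparison. One common choice is N50, the contig length at which the contigs of that length or longer cover at least half of the total assembled bases. A minimal Python sketch, offered as an assumption about how such a comparison could be made and not as the method used above:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0
```

Computing this for the contig lengths of each assembly (.sff, .fna only, .fna plus .qual) gives one number per run to compare; a higher N50 on the same data generally indicates a more contiguous assembly.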

**Coffeebean** · 07-09-2009, 08:56 AM

I thank kmcarr and NSTbioinformatics for your help. I will try to obtain the .sff files.


Questions for Newbler users ?
