**srasdk** · 10-25-2011, 03:36 PM

Some 454 data is submitted to SRA as FASTQ. You will not be able to create SFF files from this data.

**BaCh** · 10-26-2011, 12:25 AM

Is there any special reason you want to go the sff_extract way instead of directly using the FASTQ files from the SRA?

**seenstevo** · 10-26-2011, 12:54 AM

@srasdk. I have heard that some 454 data is submitted as FASTQ. However, since I was able to use the fastq-dump tool to apparently create FASTQ files that I could then view (which I had not been able to do previously), I assumed the downloaded files were not in FASTQ format. Running the file command on them tells me they are simply data files. Not really sure what format they are in...?

**seenstevo** · 10-26-2011, 01:04 AM

@BaCh. The MIRA3 guide to preparing 454 data asks for the FASTQ files for the sequence and quality and the XML file for clipping info. If my SRA data still contains lowercase base calls and N's, do I still need to do the clipping as suggested by MIRA3? It seems to be a necessary file.

If there is another approach I could take using just the FASTQ files I've got, I'd welcome other suggestions. Bear in mind that when I view the FASTQ files, all lowercase base calls have been converted to uppercase. I guess this would not be a problem if the quality info was retained in the FASTQ format.
Thanks, seenstevo

**srasdk** · 10-26-2011, 09:42 AM

The format you get from NCBI is SRA format. When an SRA file is created from FASTQ data, it lacks information needed to generate SFF, such as "454 signal" and "right quality clip".

I am not familiar with MIRA3, but it sounds like you lack ready-to-use scripts to generate the required format.
If you are handy with perl/python/awk/etc., you may be able to use the generic vdb-dump from the Toolkit and post-process the output.

Example:

./vdb-dump -C NAME,READ,'(INSDC:quality:text:phred_33)QUALITY',READ_LEN SRR000001 -f tab | head -1
EM7LVYS01C1LWG TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACTAGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAGTGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTTTTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN =;8GC91*#==<C=EA.EA/<B=(<<:=HC90'FB5&;B:<GC6(=D=<<==C=C==B<=<<<=;<<GC8.#<<9=FB4%<8EA4%87:<<8=B;C<@8>5=C?*A<&A<&<=49/2A='@;#A<&<A9C=@9B::B:<;=C?+<<;<===<=;C<==<FB0=<=<<<D=9=;;=<=<=<;=FB2FB2C<C<;=FB0<C==;C<D@-<=B:<=C=C;<C=GD7*=;:=HD90'==<<=<=:FB0<<C<;C=C=<! 4, 88, 44, 119

You get tab-separated output, one line per record.
The first 3 columns are name, basecalls, and phred quality in ASCII format with offset=33.
The last column is read layout:
4 bases - tcag primer
88 bases - first mate
44 bases - 454 mate linker
119 bases - second mate

In the case of 454 fragments you will get 2 lengths: one for the primer and one for the fragment.
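
For anyone post-processing this output in Python, here is a minimal sketch of splitting one record into its segments. It assumes the four-column, tab-separated layout shown above (name, basecalls, quality, comma-separated read lengths). The names `SraRecord`, `parse_vdb_dump_line`, and `split_segments` are made up for illustration and are not part of the Toolkit.

```python
from typing import List, NamedTuple, Tuple


class SraRecord(NamedTuple):
    """One line of the tab-separated dump (hypothetical container)."""
    name: str
    bases: str
    quality: str
    read_lens: List[int]


def parse_vdb_dump_line(line: str) -> SraRecord:
    """Parse NAME, READ, QUALITY, READ_LEN from one tab-separated line.

    Assumes exactly four columns, as requested with -C in the example.
    """
    name, bases, quality, lens = line.rstrip("\n").split("\t")
    # READ_LEN looks like "4, 88, 44, 119"; int() tolerates the spaces.
    read_lens = [int(x) for x in lens.split(",")]
    if len(bases) != len(quality):
        raise ValueError(f"{name}: basecalls and quality differ in length")
    return SraRecord(name, bases, quality, read_lens)


def split_segments(rec: SraRecord) -> List[Tuple[str, str]]:
    """Cut bases and quality into consecutive pieces given by READ_LEN."""
    segments = []
    start = 0
    for n in rec.read_lens:
        segments.append((rec.bases[start:start + n],
                         rec.quality[start:start + n]))
        start += n
    return segments
```

For a mate-pair record like the one above, the pieces would come back in the order primer, first mate, linker, second mate. For a fragment record there would be just two: primer and fragment.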

**BaCh** · 10-26-2011, 11:28 AM

Originally posted by seenstevo:

@BaCh. The MIRA3 guide to preparing 454 data asks for the FASTQ files for the sequence and quality and the XML file for clipping info. If my SRA data still contains lowercase base calls and N's, do I still need to do the clipping as suggested by MIRA3? It seems to be a necessary file.

Having the XML is the best thing, but as long as the sequence data has "clippings" via lowercase/uppercase, MIRA will understand that. Just turn off the MIRA warning that it wants the XML.
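
If the case information has been lost but clip positions are known from somewhere else, the lowercase/uppercase marking can be rebuilt with a few lines of Python. This is only a sketch under assumptions: that lowercase marks clipped bases (as with the "tcag" key), and that the clip positions are given as a 0-based, half-open range. The function name `apply_case_clipping` is hypothetical and is not part of MIRA or the SRA Toolkit.

```python
def apply_case_clipping(bases: str, keep_start: int, keep_end: int) -> str:
    """Return bases with the kept region uppercase and the rest lowercase.

    keep_start/keep_end are assumed to be a 0-based, half-open range
    [keep_start, keep_end) of bases that survive clipping.
    """
    if not 0 <= keep_start <= keep_end <= len(bases):
        raise ValueError("clip range lies outside the read")
    return (bases[:keep_start].lower()
            + bases[keep_start:keep_end].upper()
            + bases[keep_end:].lower())
```

For example, `apply_case_clipping("TCAGAAACCC", 4, 8)` lowercases the 4-base key and the last two bases and leaves the middle in uppercase.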

B.

**seenstevo** · 10-27-2011, 05:14 AM

That vdb-dump method looks a bit complicated for me, but I will collar someone who might know how to use it. Thanks.

@BaCh. When I viewed the FASTQ files I converted from the SRA files, they lacked the lowercase/uppercase info, as everything was simply put in uppercase. Does this mean that the info is lost, and is there any way to keep it in case I can't get the SRA files into SFF format?
Cheers

SRA files for use in MIRA3 assembler