#1 |
Member
Location: Eesti Join Date: Jan 2009
Posts: 37
Are SFF files for 454 projects available anywhere in the SRA? For recent submissions I can find only FASTQ, but I am also looking for the traceinfo XML that belongs to particular short reads. I seem to remember that XML files were available earlier?
v.
#2 |
Member
Location: Eesti Join Date: Jan 2009
Posts: 37
OK, I found TraceDB again (it has been a while since I last tried to retrieve such data):
ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB
But I cannot find any organisms in TraceDB that correspond to the SRR numbers.
v.
#3 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
V.
The NCBI Trace Archive (TA) and the Short Read Archive (now renamed the Sequence Read Archive, or SRA) are two separate databases with separate missions. The TA was designed to store traces, sequences and metadata generated by Sanger sequencing, primarily from WGS projects. When next-gen sequencing came on the scene, the NCBI recognized that the TA design was not a good fit for this new type of massively parallel sequencing, so they designed the SRA. The SRA does not use or have traceinfo.xml files. And while data from 454 experiments is uploaded to the SRA as SFF files, you cannot download those SFF files. The SRA only makes the sequence and q-scores available for download, in the form of FASTQ files.
#4 |
Member
Location: Eesti Join Date: Jan 2009
Posts: 37
Right, now I remember that the TA was down for a while because of next-generation data (?) and it was not possible to get data, but I did not follow the developments there... Are these FASTQ reads cleaned of adaptor sequences (454 reads)? It should be a known issue that the Roche software does not clip properly...
I think I have found some scripts that do adaptor clipping; I'll try them soon. Anyway, it seems it would be much easier to run the clipping on SFF files, though that is not a problem with your own data.
v.
#5 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped to remove the key sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one that identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.
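To make the clipping points concrete, here is a minimal sketch of how they could be applied to a single read. It assumes the four per-read clip fields from the SFF format description (`clip_qual_left`, `clip_qual_right`, `clip_adapter_left`, `clip_adapter_right`), taken as 1-based positions where 0 means "not set". The function name `apply_sff_clips` and the example read are made up for illustration; this is not the Roche or NCBI implementation.

```python
def apply_sff_clips(bases, quals, clip_qual_left=0, clip_qual_right=0,
                    clip_adapter_left=0, clip_adapter_right=0):
    """Return the trimmed (bases, quals) for one read.

    Assumption: clip values are 1-based inclusive positions and
    0 means "no clip recorded" for that field.
    """
    n = len(bases)
    # First kept base: the larger of the two left clips, but at least 1.
    first = max(1, clip_qual_left, clip_adapter_left)
    # Last kept base: the smaller of the two right clips, reading 0 as n.
    last = min(clip_qual_right or n, clip_adapter_right or n)
    # Convert the 1-based inclusive range [first, last] to a Python slice.
    return bases[first - 1:last], quals[first - 1:last]


# Hypothetical read: key TCAG, 8 insert bases, then 4 bases standing in
# for adapter sequence at the 3' end.
read = "TCAG" + "ACGTACGT" + "CTGA"
quals = [30] * len(read)
trimmed, trimmed_q = apply_sff_clips(read, quals,
                                     clip_qual_left=5,
                                     clip_adapter_right=12)
print(trimmed)  # ACGTACGT
```

With only the trimmed FASTQ available, the clip positions themselves are gone, so a check like this can only be run on your own SFF data.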
#6 | |
Member
Location: Eesti Join Date: Jan 2009
Posts: 37
It seems Roche's software is not the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion at: http://www.freelists.org/post/mira_t...aptor-clipping
#7 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
#8 | |
Member
Location: Eesti Join Date: Jan 2009
Posts: 37
http://chevreux.org/uploads/media/mi...tml#section_27 ? Maybe this TCTCCGTC is a custom adaptor. Maybe I am wrong and the Roche processing pipeline should not be expected to take care of it, but then it is the sequence provider's problem and data in NCBI may contain adaptors, right? I started this discussion because I downloaded the quite recent SRR029264 for testing various assemblers, as these data should be quite similar to the data I will get soon, and I see CCGGCCAC in it. Should the SFF file contain information about such adaptors? Anyway, getting rid of these 8 bp is not a big problem, but as I am not too much into the topic yet: can NCBI short reads contain more of this type of thing? Do uploaded data need to be cleaned, or is it OK for the database to hold them without auxiliary information (i.e. traceinfo)?
v.
Last edited by v_kisand; 12-28-2009 at 01:16 AM.
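One way to see how widespread a suspected adaptor such as CCGGCCAC is in a downloaded FASTQ file is to tally where it first occurs in each read. The following is a rough sketch only: it assumes plain four-line FASTQ records, the helper names `read_fastq` and `motif_positions` are hypothetical, and the three records are invented for illustration (they are not taken from SRR029264).

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file (or a blank line) stops the iteration
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def motif_positions(handle, motif):
    """Count, per 0-based start position, the first hit of motif in each read."""
    counts = Counter()
    for _name, seq, _qual in read_fastq(handle):
        pos = seq.find(motif)
        if pos != -1:
            counts[pos] += 1
    return counts


# Made-up records for illustration only.
example = io.StringIO(
    "@read1\nCCGGCCACACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    "@read3\nCCGGCCACTTTTGGGG\n+\nIIIIIIIIIIIIIIII\n"
)
hits = motif_positions(example, "CCGGCCAC")
print(hits)  # Counter({0: 2})
```

If nearly all hits pile up at one position (for example the very start of the read), that points to a leftover adaptor or tag rather than genuine sequence; hits scattered across positions are more likely to be real occurrences of the 8-mer.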