  • 454 /NCBI SRA & traceinfo

    Are there SFF files for 454 projects somewhere in the SRA? For recent submissions I find only FASTQ, but I am also looking for the traceinfo XML belonging to particular short reads. I seem to remember that XML files were available earlier?

    v.

  • #2
    OK, I found TraceDB again (it has been some time since I last tried to retrieve such data):
    ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB

    But I cannot find any similar organisms in TraceDB that correspond to the SRR numbers.

    v.


    • #3
      V.

      The NCBI Trace Archive (TA) and the Short Read Archive (now renamed the Sequence Read Archive, or SRA) are two separate databases with separate missions. The TA was designed to store traces, sequences and metadata generated by Sanger sequencing, primarily from WGS projects. When next-gen sequencing came on the scene, the NCBI recognized that the TA design was not a good fit for this new type of massively parallel sequencing, so they designed the SRA. The SRA does not use or have traceinfo.xml files. And while data from 454 experiments is uploaded to the SRA as SFF files, you cannot download those SFF files. The SRA only provides the sequence and q-scores for download, in the form of FASTQ files.
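
To make "sequence and q-scores" concrete, here is a minimal sketch of reading a FASTQ record and decoding its quality string. It assumes the plain four-lines-per-record layout and the Phred+33 ("Sanger") quality encoding; the function name `read_fastq` and the toy record are made up for illustration.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, phred_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()       # the '+' separator line
        qual = handle.readline().strip()
        # Assumption: qualities are Phred+33 encoded.
        scores = [ord(c) - 33 for c in qual]
        yield header.strip()[1:], seq, scores


# Hypothetical toy record, not real SRA data.
example = StringIO("@read1\nTCAGACGT\n+\nIIII5555\n")
records = list(read_fastq(example))
for name, seq, scores in records:
    print(name, seq, scores)
```

Note that nothing in such a record carries flowgram values or clipping points; those exist only in the SFF.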


      • #4
        Right, now I remember that the TA was down for a while because of next-generation data (?) and it was not possible to get data, but I did not follow the developments there... Are these FASTQ reads cleaned of adaptor sequences (454 reads)? It should be a known issue that the Roche software does not clean properly ...

        I think I found some scripts to do adaptor clipping; I'll try them soon. Anyway, it seems it would be much easier to run the clipping on SFF, which is not a problem with your own data.

        v.


        • #5
          The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the key sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.

          Should be known issue that Roche-software does not clean properly ...
          I'm not sure what you mean by this. I've never seen the 454 filter failing to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality based filters would trim off that portion of the read.
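
A minimal sketch of how the clipping points in an SFF read header turn into the trimmed sequence, as I understand the SFF definition: each read carries `clip_qual_left`, `clip_qual_right`, `clip_adapter_left` and `clip_adapter_right`, all 1-based, with 0 meaning "not set". The helper names and the toy read below are hypothetical; check the result against your own SFF tools before relying on it.

```python
def trimmed_range(n_bases, clip_qual_left, clip_qual_right,
                  clip_adapter_left, clip_adapter_right):
    """Return the 0-based, half-open (start, end) slice of the trimmed read.

    SFF clip values are 1-based positions; 0 means the clip is not set.
    """
    # The left clip is the larger of the two left values, and at least base 1.
    first = max(1, clip_qual_left, clip_adapter_left)
    # The right clip is the smaller of the two right values; unset means read end.
    last = min(clip_qual_right or n_bases, clip_adapter_right or n_bases)
    return first - 1, last


def apply_clips(bases, **clips):
    """Slice `bases` down to the region between the clipping points."""
    start, end = trimmed_range(len(bases), **clips)
    return bases[start:end]


# Hypothetical read: 4-base key, 12 good bases, 4 low-quality bases.
raw = "tcagACGTACGTACGTnnnn"
trimmed = apply_clips(raw, clip_qual_left=5, clip_qual_right=16,
                      clip_adapter_left=0, clip_adapter_right=0)
print(trimmed)
```

The untrimmed bases stay in the SFF, so with the SFF in hand the clipping can be redone with different settings; a trimmed FASTQ does not allow that.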


          • #6
            Originally posted by kmcarr
            The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the key sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.



            I'm not sure what you mean by this. I've never seen the 454 filter failing to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality based filters would trim off that portion of the read.
            Yes, that's why I am looking for SFF files.
            It seems Roche's software is not the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:
            http://www.freelists.org/post/mira_t...aptor-clipping


            • #7
              Originally posted by v_kisand
              Yes, that's why I am looking for SFF files.
              It seems Roche's software is not the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:
              http://www.freelists.org/post/mira_t...aptor-clipping
              The thread you linked to is discussing clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third party software (like MIRA) then of course you will have to trim any non-Roche adapters yourself.
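
For the "trim any non-Roche adapters yourself" case, a minimal sketch of removing a user-supplied adapter from the 5' end of each read while tolerating a sequencing error or two. The adapter sequence, the reads and the function names are hypothetical placeholders, and real trimmers (or MIRA's own clipping) handle indels and partial adapters, which this does not.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime_adapter(read, adapter, max_mismatches=1):
    """Strip `adapter` from the start of `read` if it matches closely enough."""
    k = len(adapter)
    if len(read) >= k and hamming(read[:k], adapter) <= max_mismatches:
        return read[k:]
    return read


# Hypothetical adapter; substitute the one from your own library prep.
ADAPTER = "ACGTTGCAAC"
reads = [
    "ACGTTGCAACCCCGGGTTT",  # exact adapter at the 5' end
    "ACGTAGCAACCCCGGGTTT",  # adapter with one mismatch
    "GGGTTTCCCAAAGGGTTTA",  # no adapter
]
trimmed = [trim_5prime_adapter(r, ADAPTER) for r in reads]
print(trimmed)
```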


              • #8
                Originally posted by kmcarr
                The thread you linked to is discussing clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third party software (like MIRA) then of course you will have to trim any non-Roche adapters yourself.
                Thanks for clarifying but what about
                http://chevreux.org/uploads/media/mi...tml#section_27 ?

                Maybe this TCTCCGTC is a custom adapter.

                Maybe I am wrong to expect the Roche processing pipeline to take care of it, but then it is the sequence provider's problem, and data in NCBI may contain adaptors, right?

                I started this discussion because I downloaded the fairly recent SRR029264 to test various assemblers, as these data should be quite similar to the data I will get soon, and I see CCGGCCAC in it. Should the SFF file contain information about such adaptors? Anyway, getting rid of these 8 bp is not a big problem, but as I am not very familiar with the topic yet: can NCBI short reads contain more of this type of thing? Do uploaded data need to be cleaned, or is it OK for the database to hold them without auxiliary information (i.e. traceinfo)?

                v.
                Last edited by v_kisand; 12-28-2009, 02:16 AM.
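
One rough way to judge whether a motif such as CCGGCCAC is an adaptor is to tabulate where it first occurs in each read: a motif sitting at the same offset in most reads is more likely library-derived than genomic. A minimal sketch follows; the reads are invented toy data, not taken from SRR029264, and `motif_positions` is a made-up helper name.

```python
from collections import Counter


def motif_positions(reads, motif):
    """Count, per 0-based offset, where `motif` first occurs in each read."""
    counts = Counter()
    for seq in reads:
        pos = seq.find(motif)
        if pos != -1:  # reads without the motif are not counted
            counts[pos] += 1
    return counts


MOTIF = "CCGGCCAC"
# Hypothetical toy reads for illustration only.
reads = [
    "CCGGCCACTTGACCA",
    "CCGGCCACGGATTAC",
    "TTGACCGGCCACAAT",
    "ACGTACGTACGTACG",
]
positions = motif_positions(reads, MOTIF)
for offset, n in sorted(positions.items()):
    print(f"offset {offset}: {n} read(s)")
```

If nearly all hits pile up at offset 0, clipping the first 8 bp of those reads is probably all that is needed.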
