  • 454 /NCBI SRA & traceinfo

    Are SFF files for 454 projects available somewhere in the SRA? For recent submissions I can find only FASTQ, but I am also looking for the traceinfo XML belonging to particular short reads. I seem to remember that XML files were available earlier?!

    v.

  • #2
    OK, I found TraceDB again (it has been some time since I last tried to retrieve such data):
    ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB

    BUT

    I do not find any organisms in TraceDB that correspond to the SRR numbers.

    v.

    • #3
      V.

      The NCBI Trace Archive (TA) and the Short Read Archive (now renamed the Sequence Read Archive, or SRA) are two separate databases with separate missions. The TA was designed to store traces, sequences and metadata generated by Sanger sequencing, primarily from WGS projects. When next-gen sequencing came on the scene, the NCBI recognized that the TA design was not a good fit for this new type of massively parallel sequencing, so they designed the SRA. The SRA does not use or have traceinfo.xml files. And while data from 454 experiments is uploaded to the SRA as SFF files, you cannot download those SFF files. The SRA only provides the sequence and quality scores for download, in the form of FASTQ files.

      • #4
        Right, now I remember that the TA was down for a while because of next-generation data (?) and it was not possible to get data, but I did not follow the developments there... Are these FASTQ reads already cleaned of adaptor sequences (454 reads)? It should be a known issue that the Roche software does not clean properly ...

        I think I found some scripts to do adaptor clipping; I'll try them soon. Anyway, it seems it would be much easier to run the clipping on the SFF, which is not a problem with your own data.

        v.

        • #5
          The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the keytag sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.
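
A minimal sketch of how left and right clipping points could be applied to a read, assuming the convention in the SFF format description as I understand it: positions are 1-based and inclusive, and a value of 0 means "no clip point set". The field names `clip_qual_left`, `clip_qual_right`, `clip_adapter_left` and `clip_adapter_right` follow that description; the helper functions and the toy read are hypothetical.

```python
def trimmed_range(n_bases, clip_qual_left=0, clip_qual_right=0,
                  clip_adapter_left=0, clip_adapter_right=0):
    """Return (first, last): 1-based inclusive positions of the bases to keep.

    Assumption: a clip value of 0 means that clip point is not set.
    """
    # Left side: keep from the innermost (largest) left clip point.
    first = max(1, clip_qual_left, clip_adapter_left)
    # Right side: keep up to the innermost (smallest) right clip point,
    # treating an unset (0) value as "end of read".
    last = min(clip_qual_right or n_bases, clip_adapter_right or n_bases)
    return first, last


def apply_clips(bases, quals, **clips):
    """Return the trimmed (bases, quals) pair for one read."""
    first, last = trimmed_range(len(bases), **clips)
    return bases[first - 1:last], quals[first - 1:last]


if __name__ == "__main__":
    # Toy read: 4-base key, 10 bases of insert, 4 low-quality trailing bases.
    bases = "TCAG" + "ACGTACGTAC" + "NNNN"
    quals = [40] * 14 + [2] * 4
    print(apply_clips(bases, quals, clip_qual_left=5, clip_qual_right=14))
```

Working from the untrimmed record plus clip points, rather than from pre-trimmed FASTQ, is what makes it possible to re-evaluate the trimming later.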

          "It should be a known issue that the Roche software does not clean properly ..."
          I'm not sure what you mean by this. I've never seen the 454 filter fail to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality-based filters would trim off that portion of the read.

          • #6
            Originally posted by kmcarr
            The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the keytag sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.

            I'm not sure what you mean by this. I've never seen the 454 filter fail to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality-based filters would trim off that portion of the read.
            Yes, that's why I am looking for SFF files.
            It seems Roche's software is not the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:
            http://www.freelists.org/post/mira_t...aptor-clipping

            • #7
              Originally posted by v_kisand
              Yes, that's why I am looking for SFF files.
              It seems Roche's software is not the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:
              http://www.freelists.org/post/mira_t...aptor-clipping
              The thread you linked to discusses clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction, which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third-party software (like MIRA), then of course you will have to trim any non-Roche adapters yourself.
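
For readers who need to trim a user-introduced adapter themselves, here is a minimal sketch, not how gsAssembler, gsMapper or MIRA do it. It removes an adapter from the 5' end of a read when the first bases match it with at most a given number of mismatches. The function names, the `max_mismatches` parameter and the adapter sequence in the demo are hypothetical placeholders.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clip_5prime_adapter(seq, qual, adapter, max_mismatches=1):
    """Remove `adapter` from the start of the read if it matches there.

    Returns the (possibly shortened) sequence and quality string.
    Reads shorter than the adapter are returned unchanged.
    """
    n = len(adapter)
    head = seq[:n]
    if len(head) == n and hamming(head.upper(), adapter.upper()) <= max_mismatches:
        return seq[n:], qual[n:]
    return seq, qual


if __name__ == "__main__":
    adapter = "ACGTTGCA"  # placeholder, not a real cDNA adapter sequence
    read = "ACGTTGGAGGGTTTCC"  # one mismatch relative to the placeholder
    print(clip_5prime_adapter(read, "I" * len(read), adapter))
```

Real data usually also needs 3' adapters, partial adapters and indels handled (homopolymer errors are common in 454 reads), so a dedicated trimming tool is the safer choice for production use.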

              • #8
                Originally posted by kmcarr
                The thread you linked to discusses clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction, which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third-party software (like MIRA), then of course you will have to trim any non-Roche adapters yourself.
                Thanks for clarifying, but what about
                http://chevreux.org/uploads/media/mi...tml#section_27 ?

                Maybe this TCTCCGTC is a custom adapter.

                Maybe I am wrong to expect the Roche processing pipeline to take care of it, but then it is the sequence provider's problem, and data in NCBI may contain adaptors, right?

                The reason I started this discussion is that I downloaded the quite recent SRR029264 for testing various assemblers, as these data should be quite similar to the data I will get soon, and I see CCGGCCAC in it. Should the SFF file contain information about such adaptors? Anyway, getting rid of these 8 bp is not a big problem, but as I am not too deep into the topic yet: can NCBI short reads contain more of this type of stuff? Do uploaded data need to be cleaned, or is it OK for the database to hold them without auxiliary information (i.e. traceinfo)?

                v.
                Last edited by v_kisand; 12-28-2009, 02:16 AM.
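
One quick way to judge whether a short sequence such as CCGGCCAC behaves like an adapter or tag is to tally where it first occurs in each read: a motif that sits at the same position in most reads is unlikely to be biological. The sketch below assumes a plain four-lines-per-record FASTQ file; the function names and the toy records are hypothetical and are not taken from SRR029264.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def motif_start_positions(handle, motif):
    """Count, per 0-based position, where `motif` first occurs in each read.

    Reads that do not contain the motif are counted under -1.
    """
    counts = Counter()
    for _name, seq, _qual in read_fastq(handle):
        counts[seq.find(motif)] += 1
    return counts


if __name__ == "__main__":
    toy = io.StringIO(
        "@r1\nCCGGCCACACGT\n+\nIIIIIIIIIIII\n"
        "@r2\nCCGGCCACTTTT\n+\nIIIIIIIIIIII\n"
        "@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    )
    print(motif_start_positions(toy, "CCGGCCAC").most_common())
```

For a real file, pass `open("reads.fastq")` (a hypothetical file name) instead of the `io.StringIO` object.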
