  • SRA format

    Does anybody know of a specification for the SRA format? Does one exist?
    I have found an API on the NCBI site that can read SRA files, but I haven't found any information about the format specification itself.

  • #2
    ask them: [email protected]

    Comment


    • #3
      You don't mean the SRA XML Specification, which is documented?
      This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


      Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
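A quick way to tell the binary files apart from the XML metadata is to check that signature. The following is only a minimal sketch: it assumes nothing about the format beyond the 8-byte "NCBI.sra" prefix mentioned above, and the function names are made up for illustration (they are not part of any NCBI API).

```python
import io

# Signature reported in this thread: the first 8 bytes of a binary SRA file.
SRA_MAGIC = b"NCBI.sra"


def looks_like_sra(stream):
    """Return True if a binary stream starts with the 8-byte SRA signature."""
    header = stream.read(len(SRA_MAGIC))
    return header == SRA_MAGIC


def file_looks_like_sra(path):
    """Open a file in binary mode and check its first 8 bytes."""
    with open(path, "rb") as handle:
        return looks_like_sra(handle)


if __name__ == "__main__":
    # In-memory examples, so no real SRA file is needed to try this out.
    print(looks_like_sra(io.BytesIO(b"NCBI.sra" + b"\x00" * 8)))
    print(looks_like_sra(io.BytesIO(b"@read1\nACGT\n")))
```

A matching prefix only says the file claims to be SRA; it says nothing about the layout of what follows.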

      Comment


      • #4
        Originally posted by maubp View Post
        You don't mean the SRA XML Specification, which is documented?
        This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


        Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
        Yes, I mean the binary file format. OK, I will write to them and post back here if I find anything.

        Comment


        • #5
          You could also check the source code of, say, fastq-dump.c or one of the other dumping tools. It worked for us.

          Comment


          • #6
            Originally posted by vadim View Post
            You could also check the source code of, say, fastq-dump.c or one of the other dumping tools. It worked for us.
            It is very difficult to work out a format specification from 12 MB of code, so I want to try the simplest way first. If that doesn't work, I will try analysing the source code.

            Comment


            • #7
              Originally posted by vadim View Post
              Could you please tell me how you got this address?

              Comment


              • #8
                Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
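If the dump route is enough, reading the resulting FASTQ takes very little code. This is a rough sketch rather than a robust parser: it assumes simple four-line records (no wrapped sequence lines), and `read_fastq` is a hypothetical helper name, not something from the SRA SDK.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        sequence = handle.readline().strip()
        separator = handle.readline()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record: %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        yield header[1:].strip(), sequence, quality


if __name__ == "__main__":
    # Tiny in-memory example standing in for a dumped file.
    example = io.StringIO("@r1 desc\nACGT\n+\nIIII\n")
    for name, seq, qual in read_fastq(example):
        print(name, seq, qual)
```

For real files, pass an open text-mode file handle in place of the `io.StringIO` object.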

                Comment


                • #9
                  Originally posted by vadim View Post
                  Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
                  I'm working on the UGENE project, and my next task is to add SRA format support to our tool. We can't simply include the SRA SDK in UGENE, because our tool is cross-platform while the SDK only supports UNIX.

                  Comment


                  • #10
                    I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                    Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.

                    Comment


                    • #11
                      Originally posted by vadim View Post
                      I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                      Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.
                      Yes, it is. C++ with Qt4.

                      Comment


                      • #12
                        Originally posted by maubp View Post
                        If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
                        The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.

                        Comment


                        • #13
                          Originally posted by Deutsche View Post
                          The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.
                          Well, at least they are clear about it.

                          Hurrah for the principles of openness and sharing in science! </sarcasm>

                          Comment


                          • #14
                            It's perhaps a reasonable stance to take, as it gives them the flexibility to change the format without having to keep notifying people, as long as the API remains constant. However, it does rather block others from writing interfaces in alternative languages.

                            The format is almost certainly quite complex though. I remember lots of discussions and to-ing and fro-ing on the best algorithms for compressing traces, qualities and sequences, with different methods for each type. As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                            I'm not sure what licence they use though and whether that would be a hindrance.

                            Comment


                            • #15
                              Originally posted by jkbonfield View Post
                              As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                              I'm not sure what licence they use though and whether that would be a hindrance.
                              It should work under windows, see here:
                              SRA Tools. Contribute to ncbi/sra-tools development by creating an account on GitHub.


                              It is not licensed; I asked them about it recently and they said "no restrictions", whatever that means.

                              Comment
