  • SRA format

    Does anybody know about the SRA format specification? Does one exist?
    So far I have only found an API on the NCBI site that can read SRA files, but I haven't found any information about the format specification.

  • #2
    ask them: [email protected]



    • #3
      You don't mean the SRA XML Specification, which is documented?
      This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


      Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
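
As a minimal sketch of the check described above, the following Python snippet tests whether a file begins with the 8-byte "NCBI.sra" signature. The function name `looks_like_sra` is hypothetical, and the check says nothing about the rest of the binary layout, for which no public specification turns up in this thread.

```python
import os
import tempfile

SRA_MAGIC = b"NCBI.sra"  # the 8-byte signature mentioned in the post above


def looks_like_sra(path):
    """Return True if the file at `path` starts with the 8-byte "NCBI.sra" signature.

    Hypothetical helper: it only inspects the signature, not the rest of the file.
    """
    with open(path, "rb") as handle:
        return handle.read(len(SRA_MAGIC)) == SRA_MAGIC


if __name__ == "__main__":
    # Demonstrate on a throwaway file; the bytes after the signature are made up.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(SRA_MAGIC + b"\x00" * 16)
    print(looks_like_sra(tmp.name))
    os.remove(tmp.name)
```

A check like this can help distinguish a binary SRA file from a FASTQ or XML file before handing it to a reader, but it is not a substitute for the NCBI API.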



      • #4
        Originally posted by maubp
        You don't mean the SRA XML Specification, which is documented?
        This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


        Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
        Yes, I mean the binary file format. OK, I will write to them and post here if I find anything.



        • #5
          You could also check the source code of, say, fastq-dump.c or some other dumping tool. It worked for us.



          • #6
            Originally posted by vadim
            You could also check the source code of, say, fastq-dump.c or some other dumping tool. It worked for us.
            It is very difficult to work out a format specification from 12 MB of code, so I want to try the simplest way first. If that fails, I will try to analyse the source code.



            • #7
              Originally posted by vadim
              Could you please tell me how you got this address?



              • #8
                Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
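
For readers who take the dump route, here is a minimal Python sketch of reading FASTQ text such as a dump tool produces. It assumes the simple layout of exactly four lines per record (header, sequence, "+" separator, qualities); the function name `read_fastq` and the example reads are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream.

    Assumes unwrapped records: header, sequence, "+" line, quality string.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line beginning with "+"
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield header[1:].strip(), sequence, quality


if __name__ == "__main__":
    # Invented two-record example standing in for a dump tool's output.
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
    for name, sequence, quality in read_fastq(example):
        print(name, sequence, quality)
```

This covers only plain text dumps; anything that needs direct access to the binary files would still have to go through the SDK.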



                • #9
                  Originally posted by vadim
                  Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
                  I'm working on the UGENE project, and my next task is to integrate SRA format support into our tool. Simply including the SRA SDK in UGENE is not straightforward, because our tool is cross-platform but the SDK only supports UNIX.



                  • #10
                    I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                    Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.



                    • #11
                      Originally posted by vadim
                      I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                      Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.
                      Yes, it is. C++ with Qt4.



                      • #12
                        Originally posted by maubp
                        If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
                        The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.



                        • #13
                          Originally posted by Deutsche
                          The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.
                          Well, at least they are clear about it.

                          Hurrah for the principles of openness and sharing in science! </sarcasm>



                          • #14
                            It's perhaps a reasonable stance to take, as it gives them the flexibility to change the format without having to keep notifying people, as long as the API remains constant. However, it does rather block others from writing interfaces in alternative languages.

                            The format is almost certainly quite complex though. I remember lots of discussions and to-ing and fro-ing on the best algorithms for compressing traces, qualities and sequences, with different methods for each type. As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                            I'm not sure what licence they use though and whether that would be a hindrance.



                            • #15
                              Originally posted by jkbonfield
                              As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                              I'm not sure what licence they use though and whether that would be a hindrance.
                              It should work under Windows; see the ncbi/sra-tools repository on GitHub.

                              It is not licensed; I asked them recently about it and they said "no restrictions", whatever that means.
