  • SRA format

    Does anybody know about the SRA format specification? Does one exist?
    So far I have only found an API on the NCBI site that can help to read SRA files, but I haven't found any information about the format specification itself.

  • #2
    ask them: [email protected]



    • #3
      You don't mean the SRA XML Specification, which is documented?
      This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


      Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
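
The 8-byte signature mentioned above can be checked with a few lines of Python. This is only a sketch: the value "NCBI.sra" comes from this post, not from any published specification, and the function name `looks_like_sra` and the demo file name are made up for illustration.

```python
import tempfile
from pathlib import Path

# Signature quoted in the post above; an assumption, not taken from an official spec.
SRA_MAGIC = b"NCBI.sra"


def looks_like_sra(path):
    """Return True if the file starts with the 8 bytes b"NCBI.sra"."""
    with open(path, "rb") as handle:
        return handle.read(len(SRA_MAGIC)) == SRA_MAGIC


if __name__ == "__main__":
    # Demo on a fabricated file: only the first 8 bytes are meaningful here.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "example.sra"
        fake.write_bytes(SRA_MAGIC + b"\x00" * 16)
        print(looks_like_sra(fake))
```

A check like this only tells you that a file claims to be a binary SRA file; it says nothing about the layout of the remaining bytes.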



      • #4
        Originally posted by maubp
        You don't mean the SRA XML Specification, which is documented?
        This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


        Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
        Yes, I mean the binary file format. OK, I will write to them and report back here if I find anything.



        • #5
          You could also check their source code, say fastq-dump.c or one of the other dumping tools. It worked for us.



          • #6
            Originally posted by vadim
            You could also check their source code, say fastq-dump.c or one of the other dumping tools. It worked for us.
            It is very difficult to work out a format specification from 12 MB of code, so I want to try the simplest way first. If that fails, I will try to analyse the source code.



            • #7
              Originally posted by vadim
              Could you please tell me how you got this address?



              • #8
                Could you please explain what you are planning to do with SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
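
To illustrate why a fastq dump is often enough, here is a minimal sketch of reading such a dump in Python. It assumes the simple four-line FASTQ layout (header, sequence, "+" line, qualities) with no line wrapping; the function name `read_fastq` and the sample record are made up for illustration and are not part of any NCBI tool.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records.

    Assumes unwrapped records: one line each for the header, the sequence,
    the "+" separator and the quality string.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        yield header[1:], sequence, quality


if __name__ == "__main__":
    # Fabricated record, standing in for the output of a dumping tool.
    sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for name, sequence, quality in read_fastq(sample):
        print(name, sequence, quality)
```

Anything beyond this, such as access to the original signal data, would need the SDK API rather than a text dump.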



                • #9
                  Originally posted by vadim
                  Could you please explain what you are planning to do with SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For something more complicated you could use the API from the SDK, which in my opinion is much easier than understanding the format specs.
                  I'm working on the UGENE project, and my next task is to integrate SRA format support into our tool. Simply including the SRA SDK in UGENE is not straightforward, because our tool is a cross-platform program but the SDK supports only UNIX.



                  • #10
                    I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                    Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.



                    • #11
                      Originally posted by vadim
                      I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                      Is UGENE written in C++? If so, I would definitely consider re-using NCBI's code.
                      Yes, it is. C++ with Qt4.



                      • #12
                        Originally posted by maubp
                        If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
                        The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.



                        • #13
                          Originally posted by Deutsche
                          The people at NCBI told me that they don't give this documentation to anybody, and that if you want to use the SRA format you need to use their API.
                          Well, at least they are clear about it.

                          Hurrah for the principles of openness and sharing in science! </sarcasm>



                          • #14
                            It's perhaps a reasonable stance to take, as it gives them the flexibility to change the format without having to keep notifying people, as long as the API remains constant. However, it does rather block interfaces being written by others in alternative languages.

                            The format is almost certainly quite complex though. I remember lots of discussions and to-ing and fro-ing on the best algorithms for compressing traces, qualities and sequences, with different methods for each type. As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                            I'm not sure what licence they use though and whether that would be a hindrance.



                            • #15
                              Originally posted by jkbonfield
                              As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                              I'm not sure what licence they use though and whether that would be a hindrance.
                              It should work under Windows; see here:
                              SRA Tools. Contribute to ncbi/sra-tools development by creating an account on GitHub.


                              It is not licensed. I asked them about it recently and they said "no restrictions", whatever that means.
