Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to convert sra-lite format to fastq?

    I am trying to dump sra-lite (sequence read archive) files to fastq format. On the NCBI Sequence Read Archive site it states:

    ...users are asked download runs of interest and execute dumps into the desired format using the SRA SDK toolkit available at http://www.ncbi.nlm.nih.gov/Traces/s...are&s=software

    I downloaded the precompiled toolkit for 64-bit architecture onto my macbookpro running snow leopard and tried to run the fastq-dump executable from the terminal, and get the error message "cannot execute binary file".

    Any guidance would be much appreciated!

  • #2
    Although I can get their CentOS 64bit running, it's realy slow, take about 10hrs to unpack one file. I am also interested to know more about this new SRA-tools.

    Comment


    • #3
      I just noticed they released a new MacOSX beta package.

      I downloaded that one and entered in the terminal $./fastq-dump -A SRP000910 -D SRR070499.lite.sra

      Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"

      Comment


      • #4
        Originally posted by tbusch0000 View Post
        I downloaded the precompiled toolkit for 64-bit architecture onto my macbookpro running snow leopard and tried to run the fastq-dump executable from the terminal, and get the error message "cannot execute binary file".
        My guess is you download a 64bit Linux binary, which won't work on the Mac.

        Comment


        • #5
          Originally posted by maubp View Post
          My guess is you download a 64bit Linux binary, which won't work on the Mac.
          Thanks, they've only just released the mac binaries. It will execute now, but gives the error message above.

          Comment


          • #6
            Originally posted by tbusch0000 View Post
            Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"
            How much RAM do you have, and how big is SRR070499.lite.sra?

            Comment


            • #7
              Originally posted by maubp View Post
              How much RAM do you have, and how big is SRR070499.lite.sra?
              I have 6GB RAM and the file is 3.5 GB

              Comment


              • #8
                I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                It is slow, but it works. My guess is that data are compressed, using something like LIBBZ2 (it is just a guess). That explains the compression ratio as well as the slowness.

                [boiseb01@ls30 MyShortReadArchive]$ ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump
                linux-vdso.so.1 => (0x00007fff361ff000)
                libdl.so.2 => /lib64/libdl.so.2 (0x00000033f5a00000)
                libz.so.1 => /lib64/libz.so.1 (0x00000033f6600000)
                libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003403e00000)
                libm.so.6 => /lib64/libm.so.6 (0x00000033f5600000)
                libc.so.6 => /lib64/libc.so.6 (0x00000033f5200000)
                /lib64/ld-linux-x86-64.so.2 (0x00000033f4e00000)
                Binaries are linked against libz and libbz2, but the slowness indicates that they probably rely on libbz2.

                Comment


                • #9
                  I'm not 100% sure how memmap works on the Mac, but it sounds like you should have enough RAM to load the whole file into memory (assuming no other memory hungry applications are running at the same time). Can you find a smaller example to test?

                  Comment


                  • #10
                    Hi seb567,

                    How slow are you experiencing with fasta-dump?

                    My experiene is this: my computer is Xeon 2.4G 4core, 12G RAM, fasta-dump takes 600 minutes to finish one sra file.

                    I have tried the newest release and also different sra files. fastq-dump is always very slow.

                    Thanks,

                    Originally posted by seb567 View Post
                    I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                    My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                    It is slow, but it works. My guess is that data are compressed, using something like LIBBZ2 (it is just a guess). That explains the compression ratio as well as the slowness.



                    Binaries are linked against libz and libbz2, but the slowness indicates that they probably rely on libbz2.

                    Comment


                    • #11
                      About 1-2 hours for a 2 GB sra file, though it is very approximated.

                      I downloaded all sra files for SRA010766, converted them from sra to fastq, then to fastq.gz. The script started yesterday 6 PM (EST).

                      So yours is slower, way slower.

                      [boiseb01@ls30 Illumina-SRX015621]$ ls
                      batch-3 SRR033559_1.fastq.gz SRR033570_1.fastq.gz SRR033581_1.fastq.gz SRR033592_1.fastq.gz SRR033603_1.fastq.gz SRR033614_1.fastq.gz SRR033625_1.fastq.gz
                      download.log SRR033559_2.fastq.gz SRR033570_2.fastq.gz SRR033581_2.fastq.gz SRR033592_2.fastq.gz SRR033603_2.fastq.gz SRR033614_2.fastq.gz SRR033625_2.fastq.gz
                      files.txt SRR033560_1.fastq.gz SRR033571_1.fastq.gz SRR033582_1.fastq.gz SRR033593_1.fastq.gz SRR033604_1.fastq.gz SRR033615_1.fastq.gz SRR033626_1.fastq.gz
                      list-sra.sh SRR033560_2.fastq.gz SRR033571_2.fastq.gz SRR033582_2.fastq.gz SRR033593_2.fastq.gz SRR033604_2.fastq.gz SRR033615_2.fastq.gz SRR033626_2.fastq.gz
                      newFiles SRR033561_1.fastq.gz SRR033572_1.fastq.gz SRR033583_1.fastq.gz SRR033594_1.fastq.gz SRR033605_1.fastq.gz SRR033616_1.fastq.gz SRR033627_1.fastq.gz
                      nohup.out SRR033561_2.fastq.gz SRR033572_2.fastq.gz SRR033583_2.fastq.gz SRR033594_2.fastq.gz SRR033605_2.fastq.gz SRR033616_2.fastq.gz SRR033627_2.fastq.gz
                      README SRR033562_1.fastq.gz SRR033573_1.fastq.gz SRR033584_1.fastq.gz SRR033595_1.fastq.gz SRR033606_1.fastq.gz SRR033617_1.fastq.gz SRR033628_1.fastq
                      SRA010766 SRR033562_2.fastq.gz SRR033573_2.fastq.gz SRR033584_2.fastq.gz SRR033595_2.fastq.gz SRR033606_2.fastq.gz SRR033617_2.fastq.gz SRR033628_2.fastq
                      SRR033552_1.fastq.gz SRR033563_1.fastq.gz SRR033574_1.fastq.gz SRR033585_1.fastq.gz SRR033596_1.fastq.gz SRR033607_1.fastq.gz SRR033618_1.fastq.gz SRR033629_1.fastq
                      SRR033552_2.fastq.gz SRR033563_2.fastq.gz SRR033574_2.fastq.gz SRR033585_2.fastq.gz SRR033596_2.fastq.gz SRR033607_2.fastq.gz SRR033618_2.fastq.gz SRR033629_2.fastq
                      SRR033553_1.fastq.gz SRR033564_1.fastq.gz SRR033575_1.fastq.gz SRR033586_1.fastq.gz SRR033597_1.fastq.gz SRR033608_1.fastq.gz SRR033619_1.fastq.gz SRR033630_1.fastq
                      SRR033553_2.fastq.gz SRR033564_2.fastq.gz SRR033575_2.fastq.gz SRR033586_2.fastq.gz SRR033597_2.fastq.gz SRR033608_2.fastq.gz SRR033619_2.fastq.gz SRR033630_2.fastq
                      SRR033554_1.fastq.gz SRR033565_1.fastq.gz SRR033576_1.fastq.gz SRR033587_1.fastq.gz SRR033598_1.fastq.gz SRR033609_1.fastq.gz SRR033620_1.fastq.gz SRR033631_1.fastq
                      SRR033554_2.fastq.gz SRR033565_2.fastq.gz SRR033576_2.fastq.gz SRR033587_2.fastq.gz SRR033598_2.fastq.gz SRR033609_2.fastq.gz SRR033620_2.fastq.gz SRR033631_2.fastq
                      SRR033555_1.fastq.gz SRR033566_1.fastq.gz SRR033577_1.fastq.gz SRR033588_1.fastq.gz SRR033599_1.fastq.gz SRR033610_1.fastq.gz SRR033621_1.fastq.gz SRR033632_1.fastq
                      SRR033555_2.fastq.gz SRR033566_2.fastq.gz SRR033577_2.fastq.gz SRR033588_2.fastq.gz SRR033599_2.fastq.gz SRR033610_2.fastq.gz SRR033621_2.fastq.gz SRR033632_2.fastq
                      SRR033556_1.fastq.gz SRR033567_1.fastq.gz SRR033578_1.fastq.gz SRR033589_1.fastq.gz SRR033600_1.fastq.gz SRR033611_1.fastq.gz SRR033622_1.fastq.gz SRR033633_1.fastq
                      SRR033556_2.fastq.gz SRR033567_2.fastq.gz SRR033578_2.fastq.gz SRR033589_2.fastq.gz SRR033600_2.fastq.gz SRR033611_2.fastq.gz SRR033622_2.fastq.gz SRR033633_2.fastq
                      SRR033557_1.fastq.gz SRR033568_1.fastq.gz SRR033579_1.fastq.gz SRR033590_1.fastq.gz SRR033601_1.fastq.gz SRR033612_1.fastq.gz SRR033623_1.fastq.gz
                      SRR033557_2.fastq.gz SRR033568_2.fastq.gz SRR033579_2.fastq.gz SRR033590_2.fastq.gz SRR033601_2.fastq.gz SRR033612_2.fastq.gz SRR033623_2.fastq.gz
                      SRR033558_1.fastq.gz SRR033569_1.fastq.gz SRR033580_1.fastq.gz SRR033591_1.fastq.gz SRR033602_1.fastq.gz SRR033613_1.fastq.gz SRR033624_1.fastq.gz
                      SRR033558_2.fastq.gz SRR033569_2.fastq.gz SRR033580_2.fastq.gz SRR033591_2.fastq.gz SRR033602_2.fastq.gz SRR033613_2.fastq.gz SRR033624_2.fastq.gz

                      Comment


                      • #12
                        Thanks for the tips.

                        I got the fastq-dump working on an x-large amazon cloud instance running cent os ami.

                        Comment


                        • #13
                          How to convert fastq format to sra files? is there any perl script for this conversion?

                          Comment


                          • #14
                            I want the table, that converts a byte from the sra file
                            into a sequence of nucleotides



                            SRA toolkit sourcecode has "4na" and "2na"

                            Comment


                            • #15
                              Why don't you either use fastq-dump or just download the gzipped fastq files from ENA (such as this one)?
                              Last edited by dpryan; 08-21-2013, 03:40 AM. Reason: forgot a word

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM
                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                02-26-2024, 02:07 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-14-2024, 06:13 AM
                              0 responses
                              34 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-08-2024, 08:03 AM
                              0 responses
                              72 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-07-2024, 08:13 AM
                              0 responses
                              81 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-06-2024, 09:51 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X