  • How to convert sra-lite format to fastq?

    I am trying to dump sra-lite (sequence read archive) files to fastq format. On the NCBI Sequence Read Archive site it states:

    ...users are asked [to] download runs of interest and execute dumps into the desired format using the SRA SDK toolkit available at http://www.ncbi.nlm.nih.gov/Traces/s...are&s=software

    I downloaded the precompiled 64-bit toolkit onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".

    Any guidance would be much appreciated!

  • #2
    Although I can get their CentOS 64-bit build running, it's really slow: it takes about 10 hours to unpack one file. I am also interested to know more about these new SRA tools.

    • #3
      I just noticed they released a new MacOSX beta package.

      I downloaded that one and entered the following in the terminal:

      $ ./fastq-dump -A SRP000910 -D SRR070499.lite.sra

      I received the error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"

      • #4
        Originally posted by tbusch0000:
        I downloaded the precompiled 64-bit toolkit onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".

        My guess is you downloaded a 64-bit Linux binary, which won't work on the Mac.
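A quick way to confirm which platform a downloaded binary was built for is to look at its first few bytes, which is essentially what the file command does. The sketch below is a hypothetical Python helper, not part of the SRA toolkit; it only recognizes the common ELF (Linux/Unix) and Mach-O (Mac) magic numbers and reports anything else as unknown.

```python
import sys

# Magic numbers as they appear at the start of the file on disk.
# Mach-O values are for little-endian (Intel) binaries; the universal
# ("fat") header is always big-endian. Java .class files also start with
# CAFEBABE, so that particular match is only a hint.
MACHO_MAGIC = {
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (Mac)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (Mac)",
    b"\xca\xfe\xba\xbe": "Mach-O universal (Mac)",
}


def binary_format(header: bytes) -> str:
    """Classify an executable from its first five bytes."""
    if header[:4] == b"\x7fELF":
        # Byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit ELF files.
        width = {1: "32-bit", 2: "64-bit"}.get(header[4] if len(header) > 4 else 0)
        return f"ELF {width or 'unknown-width'} (Linux/Unix)"
    return MACHO_MAGIC.get(header[:4], "unknown")


def identify(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as fh:
        return binary_format(fh.read(5))


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {identify(p)}")
```

Run against the downloaded fastq-dump, a Linux build would be reported as ELF, which would explain the "cannot execute binary file" error on a Mac.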

        • #5
          Originally posted by maubp:
          My guess is you downloaded a 64-bit Linux binary, which won't work on the Mac.

          Thanks; they've only just released the Mac binaries. It executes now, but gives the error message above.

          • #6
            Originally posted by tbusch0000:
            Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"

            How much RAM do you have, and how big is SRR070499.lite.sra?

            • #7
              Originally posted by maubp:
              How much RAM do you have, and how big is SRR070499.lite.sra?

              I have 6 GB of RAM and the file is 3.5 GB.

              • #8
                I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                It is slow, but it works. My guess (and it is only a guess) is that the data are compressed with something like libbz2. That would explain both the compression ratio and the slowness.

                [boiseb01@ls30 MyShortReadArchive]$ ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump
                linux-vdso.so.1 => (0x00007fff361ff000)
                libdl.so.2 => /lib64/libdl.so.2 (0x00000033f5a00000)
                libz.so.1 => /lib64/libz.so.1 (0x00000033f6600000)
                libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003403e00000)
                libm.so.6 => /lib64/libm.so.6 (0x00000033f5600000)
                libc.so.6 => /lib64/libc.so.6 (0x00000033f5200000)
                /lib64/ld-linux-x86-64.so.2 (0x00000033f4e00000)
                The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.
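The guess above rests on bzip2 being considerably slower to decompress than zlib's DEFLATE (bzip2 generally trades extra CPU time for a better compression ratio). That trade-off can be checked with Python's standard zlib and bz2 modules. The sketch below uses synthetic, hypothetical read data; it says nothing about how SRA files are actually encoded, and absolute timings will vary by machine.

```python
import bz2
import random
import time
import zlib


def synthetic_reads(n_reads: int, read_len: int = 36, seed: int = 0) -> bytes:
    """Build hypothetical FASTQ-like text: random bases, constant qualities."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{'I' * read_len}\n")
    return "".join(records).encode()


def time_decompress(name, compress, decompress, data):
    """Compress data, then time only the decompression step."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == data  # the round trip must be lossless
    return name, len(packed), elapsed


if __name__ == "__main__":
    data = synthetic_reads(20_000)
    for name, size, secs in (
        time_decompress("zlib", zlib.compress, zlib.decompress, data),
        time_decompress("bz2", bz2.compress, bz2.decompress, data),
    ):
        print(f"{name}: {len(data)} -> {size} bytes, decompressed in {secs:.4f} s")
```

If bz2 consistently shows the smaller output and the longer decompression time, that is at least consistent with the libbz2 explanation, though it does not prove it.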

                • #9
                  I'm not 100% sure how memory mapping works on the Mac, but it sounds like you should have enough RAM to load the whole file into memory (assuming no other memory-hungry applications are running at the same time). Can you find a smaller example to test?
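To check whether a given file can be memory-mapped at all on a particular machine, independently of fastq-dump, a few lines of Python with the standard mmap module are enough. This is a hypothetical diagnostic, not how the SRA toolkit maps files internally; it can be run on the large .sra file and on a smaller one for comparison.

```python
import mmap
import sys


def peek_mapped(path: str, n: int = 16) -> tuple[bytes, int]:
    """Memory-map a file read-only; return its first n bytes and its size.

    The OS reads pages from disk only as they are touched, so this does not
    load the whole file into RAM. An empty file cannot be mapped (error).
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:n], len(mm)


if __name__ == "__main__":
    for p in sys.argv[1:]:
        head, size = peek_mapped(p)
        print(f"{p}: mapped {size} bytes, starts with {head!r}")
```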

                  • #10
                    Hi seb567,

                    How slow is fastq-dump for you?

                    My experience: on a 4-core 2.4 GHz Xeon with 12 GB of RAM, fastq-dump takes 600 minutes to finish one SRA file.

                    I have tried the newest release and also different SRA files. fastq-dump is always very slow.

                    Thanks,

                    Originally posted by seb567:
                    I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                    My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                    It is slow, but it works. My guess (and it is only a guess) is that the data are compressed with something like libbz2. That would explain both the compression ratio and the slowness.

                    The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.

                    • #11
                      About 1-2 hours for a 2 GB SRA file, though that is a very rough estimate.

                      I downloaded all SRA files for SRA010766 and converted them from SRA to FASTQ, then to fastq.gz. The script started yesterday at 6 PM (EST).

                      So yours is slower, much slower.

                      [boiseb01@ls30 Illumina-SRX015621]$ ls
                      batch-3 SRR033559_1.fastq.gz SRR033570_1.fastq.gz SRR033581_1.fastq.gz SRR033592_1.fastq.gz SRR033603_1.fastq.gz SRR033614_1.fastq.gz SRR033625_1.fastq.gz
                      download.log SRR033559_2.fastq.gz SRR033570_2.fastq.gz SRR033581_2.fastq.gz SRR033592_2.fastq.gz SRR033603_2.fastq.gz SRR033614_2.fastq.gz SRR033625_2.fastq.gz
                      files.txt SRR033560_1.fastq.gz SRR033571_1.fastq.gz SRR033582_1.fastq.gz SRR033593_1.fastq.gz SRR033604_1.fastq.gz SRR033615_1.fastq.gz SRR033626_1.fastq.gz
                      list-sra.sh SRR033560_2.fastq.gz SRR033571_2.fastq.gz SRR033582_2.fastq.gz SRR033593_2.fastq.gz SRR033604_2.fastq.gz SRR033615_2.fastq.gz SRR033626_2.fastq.gz
                      newFiles SRR033561_1.fastq.gz SRR033572_1.fastq.gz SRR033583_1.fastq.gz SRR033594_1.fastq.gz SRR033605_1.fastq.gz SRR033616_1.fastq.gz SRR033627_1.fastq.gz
                      nohup.out SRR033561_2.fastq.gz SRR033572_2.fastq.gz SRR033583_2.fastq.gz SRR033594_2.fastq.gz SRR033605_2.fastq.gz SRR033616_2.fastq.gz SRR033627_2.fastq.gz
                      README SRR033562_1.fastq.gz SRR033573_1.fastq.gz SRR033584_1.fastq.gz SRR033595_1.fastq.gz SRR033606_1.fastq.gz SRR033617_1.fastq.gz SRR033628_1.fastq
                      SRA010766 SRR033562_2.fastq.gz SRR033573_2.fastq.gz SRR033584_2.fastq.gz SRR033595_2.fastq.gz SRR033606_2.fastq.gz SRR033617_2.fastq.gz SRR033628_2.fastq
                      SRR033552_1.fastq.gz SRR033563_1.fastq.gz SRR033574_1.fastq.gz SRR033585_1.fastq.gz SRR033596_1.fastq.gz SRR033607_1.fastq.gz SRR033618_1.fastq.gz SRR033629_1.fastq
                      SRR033552_2.fastq.gz SRR033563_2.fastq.gz SRR033574_2.fastq.gz SRR033585_2.fastq.gz SRR033596_2.fastq.gz SRR033607_2.fastq.gz SRR033618_2.fastq.gz SRR033629_2.fastq
                      SRR033553_1.fastq.gz SRR033564_1.fastq.gz SRR033575_1.fastq.gz SRR033586_1.fastq.gz SRR033597_1.fastq.gz SRR033608_1.fastq.gz SRR033619_1.fastq.gz SRR033630_1.fastq
                      SRR033553_2.fastq.gz SRR033564_2.fastq.gz SRR033575_2.fastq.gz SRR033586_2.fastq.gz SRR033597_2.fastq.gz SRR033608_2.fastq.gz SRR033619_2.fastq.gz SRR033630_2.fastq
                      SRR033554_1.fastq.gz SRR033565_1.fastq.gz SRR033576_1.fastq.gz SRR033587_1.fastq.gz SRR033598_1.fastq.gz SRR033609_1.fastq.gz SRR033620_1.fastq.gz SRR033631_1.fastq
                      SRR033554_2.fastq.gz SRR033565_2.fastq.gz SRR033576_2.fastq.gz SRR033587_2.fastq.gz SRR033598_2.fastq.gz SRR033609_2.fastq.gz SRR033620_2.fastq.gz SRR033631_2.fastq
                      SRR033555_1.fastq.gz SRR033566_1.fastq.gz SRR033577_1.fastq.gz SRR033588_1.fastq.gz SRR033599_1.fastq.gz SRR033610_1.fastq.gz SRR033621_1.fastq.gz SRR033632_1.fastq
                      SRR033555_2.fastq.gz SRR033566_2.fastq.gz SRR033577_2.fastq.gz SRR033588_2.fastq.gz SRR033599_2.fastq.gz SRR033610_2.fastq.gz SRR033621_2.fastq.gz SRR033632_2.fastq
                      SRR033556_1.fastq.gz SRR033567_1.fastq.gz SRR033578_1.fastq.gz SRR033589_1.fastq.gz SRR033600_1.fastq.gz SRR033611_1.fastq.gz SRR033622_1.fastq.gz SRR033633_1.fastq
                      SRR033556_2.fastq.gz SRR033567_2.fastq.gz SRR033578_2.fastq.gz SRR033589_2.fastq.gz SRR033600_2.fastq.gz SRR033611_2.fastq.gz SRR033622_2.fastq.gz SRR033633_2.fastq
                      SRR033557_1.fastq.gz SRR033568_1.fastq.gz SRR033579_1.fastq.gz SRR033590_1.fastq.gz SRR033601_1.fastq.gz SRR033612_1.fastq.gz SRR033623_1.fastq.gz
                      SRR033557_2.fastq.gz SRR033568_2.fastq.gz SRR033579_2.fastq.gz SRR033590_2.fastq.gz SRR033601_2.fastq.gz SRR033612_2.fastq.gz SRR033623_2.fastq.gz
                      SRR033558_1.fastq.gz SRR033569_1.fastq.gz SRR033580_1.fastq.gz SRR033591_1.fastq.gz SRR033602_1.fastq.gz SRR033613_1.fastq.gz SRR033624_1.fastq.gz
                      SRR033558_2.fastq.gz SRR033569_2.fastq.gz SRR033580_2.fastq.gz SRR033591_2.fastq.gz SRR033602_2.fastq.gz SRR033613_2.fastq.gz SRR033624_2.fastq.gz
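The script itself was not posted. As a hypothetical stand-in for its last step (FASTQ to fastq.gz), the Python sketch below gzips every .fastq file in a directory using only the standard gzip and shutil modules, deleting each original only after its compressed copy has been fully written. The SRA-to-FASTQ step would still be done with fastq-dump.

```python
import gzip
import shutil
import sys
from pathlib import Path


def gzip_fastq_files(directory: str) -> list[Path]:
    """Compress each *.fastq in directory to *.fastq.gz; return the new paths."""
    created = []
    for fastq in sorted(Path(directory).glob("*.fastq")):
        target = fastq.with_name(fastq.name + ".gz")
        with open(fastq, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams in chunks, not all at once
        fastq.unlink()  # reached only if the copy above did not raise
        created.append(target)
    return created


if __name__ == "__main__":
    for path in gzip_fastq_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Because the glob pattern ends in .fastq, files that are already compressed (like most of those in the listing) are skipped.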

                      • #12
                        Thanks for the tips.

                        I got fastq-dump working on an x-large Amazon cloud instance running a CentOS AMI.

                        • #13
                          How do I convert FASTQ files to SRA format? Is there a Perl script for this conversion?

                          • #14
                            I want the table that converts a byte from the SRA file into a sequence of nucleotides.

                            The SRA toolkit source code has "4na" and "2na".
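Assuming "2na" and "4na" follow NCBI's usual nucleotide encodings (2na: A=0, C=1, G=2, T=3, four bases per byte; 4na: an IUPAC bitmask with A=1, C=2, G=4, T=8, two bases per byte), a per-byte lookup table can be built as in the sketch below. The bit order (first base in the high-order bits) is also an assumption here and should be checked against the toolkit source. Note that 2na has only four codes, so it cannot represent N or other ambiguity codes; 4na can.

```python
# Hypothetical decoding tables, assuming NCBI-style 2na/4na encodings with
# the first base stored in the most significant bits of each byte.

NA2 = "ACGT"              # 2-bit codes 0..3
NA4 = "-ACMGRSVTWYHKDBN"  # 4-bit IUPAC bitmask codes 0..15 (A=1, C=2, G=4, T=8)

# One entry per possible byte value (0..255).
TABLE_2NA = [
    "".join(NA2[(b >> shift) & 0b11] for shift in (6, 4, 2, 0)) for b in range(256)
]
TABLE_4NA = [NA4[b >> 4] + NA4[b & 0x0F] for b in range(256)]


def decode_2na(packed: bytes, length: int) -> str:
    """Decode four-bases-per-byte data, trimming padding in the last byte."""
    return "".join(TABLE_2NA[b] for b in packed)[:length]


def decode_4na(packed: bytes, length: int) -> str:
    """Decode two-bases-per-byte data, trimming padding in the last byte."""
    return "".join(TABLE_4NA[b] for b in packed)[:length]
```

The explicit length is needed because a read whose length is not a multiple of four (2na) or two (4na) leaves unused bits in its final byte.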

                            • #15
                              Why don't you either use fastq-dump or just download the gzipped fastq files from ENA (such as this one)?
                              Last edited by dpryan; 08-21-2013, 03:40 AM. Reason: forgot a word
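The ENA downloads are ordinary gzip-compressed FASTQ, so they can be read directly without decompressing to disk first. A minimal Python sketch using only the standard gzip module, assuming the common four-line FASTQ layout (header, sequence, "+" line, qualities) with no wrapped sequence lines:

```python
import gzip
import sys
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Yield records from a gzipped FASTQ file, four lines at a time."""
    with gzip.open(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = fh.readline().rstrip("\n")
            plus = fh.readline().rstrip("\n")
            qual = fh.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield FastqRecord(header[1:], seq, qual)


if __name__ == "__main__":
    count = sum(1 for _ in read_fastq_gz(sys.argv[1]))
    print(f"{count} reads")
```

Since records are yielded one at a time, memory use stays flat regardless of file size.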
