  • fastq-dump for dummies

    Can someone provide a dummies guide to fastq-dump? I mean a really dumb guide: download here, install here, open it by doing this, do this to an sra to output a fastq. Initially words would work but a video format would be highly valuable.

    I see a SEQanswers YouTube channel here, full of short videos for the SRA toolkit and beyond. This could really help the large number of biologists who will begin using dbGaP datasets to guide their research.

    Am I the only one who sees a simple visual user guide as a useful resource?

    Trust me, I have read everything Google can find on fastq-dump and still can't get it to work on an .sra file. I am working in the Windows environment. I expect most beginners will be on Windows.
    Last edited by UpsetNotMad Scientist; 10-26-2012, 09:52 AM.

  • #2
    a) Do you have access to a Linux server? If not, it shouldn't be too tricky to ask for a user account.

    b) type the following:

    fastq-dump mySRA.sra

    On Windows (guessing here), open a command shell and copy the .sra file and fastq-dump to the same directory.

    rem cd to the directory
    cd c:\temp

    rem run the program
    fastq-dump mySRA.sra

    Hope that helps


    • #3
      You might also want to look into getting your data from the ENA rather than GEO. They already do the extraction of files from the sra dumps and you can download them individually. They mirror all GEO data so you can just search with the GEO accession you want.


      • #4
        Originally posted by colindaven View Post
        fastq-dump mySRA.sra
        The only thing I'd add is that if your data has more than one read per fragment (i.e. paired end), then this will produce a single file with the two reads concatenated together. If you want separate files for the different reads you'll need to run:

        fastq-dump --split-files mySRA.sra

        ..which should really have been the default behaviour.
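To make "concatenated together" concrete: without --split-files, each FASTQ record carries both mates back to back in one sequence line. Below is a minimal Python sketch of splitting such a record by hand. It is only an illustration and not how fastq-dump works internally. The function name is hypothetical, and it assumes both mates have the same length, which is not guaranteed for every run.

```python
def split_concatenated_record(header, sequence, quality):
    """Split one FASTQ record holding two mates into two records.

    Assumption (not from the thread): both mates have equal length,
    so the midpoint of the sequence is the boundary between them.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) % 2 != 0:
        raise ValueError("odd length: mates cannot be of equal length")
    half = len(sequence) // 2
    # Tag the two mates with /1 and /2 so they can still be paired up later.
    mate1 = (header + "/1", sequence[:half], quality[:half])
    mate2 = (header + "/2", sequence[half:], quality[half:])
    return mate1, mate2
```

In practice it is simpler and safer to let fastq-dump do the split with --split-files, since the tool knows the real read boundaries.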


        • #5
          Since the OP was asking for directions on how to use SRA toolkit in a windows environment here goes:

          1. Download the right software distribution (you should be using 64-bit windows with the file sizes involved .. if not, it is time to switch).

          http://www.ncbi.nlm.nih.gov/Traces/s...?view=software (for 64-bit windows: http://ftp-private.ncbi.nlm.nih.gov/...1.16-win64.zip)

          2. Extract the toolkit software folder and place it into a suitable location. e.g. c:\

          3. Open a terminal window ("Start" --> type "cmd" in the search box --> press Enter). Generally this will put you in your "home directory" (e.g. c:\Users\your_user_name).

          4. In the terminal window, run the following:

          Code:
          rem replace with the right path for your SRA files
          cd c:\my_sra_files
          rem verify that the directory contains the .sra files
          dir *.sra
          rem adjust the toolkit folder name to match the version you extracted
          c:\sratoolkit.2.1.10-win64\bin\fastq-dump.exe --split-files filename.sra
          5. Be patient. The files are large and the extraction will take some time (5-10 min) to complete. Make sure you have enough space available on the disk where you are extracting the files. The above command should extract the "fastq" files in the same directory where your .sra files are.

          6. Repeat for additional files as needed.
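For anyone who would rather script these steps than type them, here is a minimal Python sketch of the same command. The function names are hypothetical, and the toolkit folder is only the example path from the steps above; adjust it to wherever you extracted the toolkit. It assumes Python is installed, which the original steps do not require.

```python
import subprocess
from pathlib import PureWindowsPath


def build_fastq_dump_command(toolkit_dir, sra_file, split_files=True):
    """Build the fastq-dump command line as a list of arguments."""
    # The executable lives in the "bin" folder of the extracted toolkit.
    exe = PureWindowsPath(toolkit_dir) / "bin" / "fastq-dump.exe"
    command = [str(exe)]
    if split_files:
        command.append("--split-files")
    command.append(str(sra_file))
    return command


def run_fastq_dump(toolkit_dir, sra_file, working_dir):
    """Run fastq-dump from working_dir, as the 'cd' step above does."""
    command = build_fastq_dump_command(toolkit_dir, sra_file)
    # check=True raises an error if fastq-dump exits with a failure code.
    subprocess.run(command, cwd=working_dir, check=True)
```

A call would look like run_fastq_dump(r"c:\sratoolkit.2.1.10-win64", "filename.sra", r"c:\my_sra_files"), which mirrors the cd and fastq-dump lines in step 4.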
          Last edited by GenoMax; 10-26-2012, 04:15 AM. Reason: simplified directions


          • #6
            Originally posted by GenoMax View Post
            Since the OP was asking for directions on how to use SRA toolkit in a windows environment here goes:

            Awesome. THANKS! Already more useful than any guide online.

            I unfortunately instantly get:

            The procedure entry point GetErrorMode could not be located in the dynamic link library KERNEL32.dll.

            This happens when I run fastq-dump as directed, with the .sra file in the same directory and the exact command you gave (adjusted for my directories). I am on a 32-bit Windows system running XP (I know, really dated).

            Any suggestions?

            While I wait for replies, I am going to try the same thing on a 64-bit system with the 64-bit toolkit.

            Also, I looked at ENA; however, this SRA is restricted access, so FASTQ is not available there.

            Start rant: It's nice that the Europeans don't put a burden on the less-equipped end-user. I don't get the logic of SRA anyway, especially encrypted SRA. Let's say a WGS experiment is 50G (a low estimate, too). Decrypting gives 100G total (the old copy is still there). Then making fastq adds another ~100G. This data now takes 200G of storage, while the original SRA saved 50% of the space? WTF? To save 50% (~50G) of space, you cost the end-user 400% more resources? This does not include the hours of lost productivity due to the reformatting problems (like mine). End rant.
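The arithmetic in the rant can be written out as a small Python sketch. The function name is hypothetical, the sizes are the rough estimates quoted above, and it assumes the decrypted copy is the same size as the encrypted one.

```python
def storage_footprint_gb(encrypted_gb, fastq_gb):
    """Return (total disk use in GB, total as a multiple of the original)."""
    # Assumption: the decrypted .sra is the same size as the encrypted
    # one, and the encrypted copy is kept on disk.
    decrypted_gb = encrypted_gb
    total_gb = encrypted_gb + decrypted_gb + fastq_gb
    return total_gb, total_gb / encrypted_gb


total, multiple = storage_footprint_gb(encrypted_gb=50, fastq_gb=100)
print(f"{total} GB on disk, {multiple:.0f}x the original download")
```

With the numbers above this works out to 200 GB, four times the size of the original 50 GB download. Deleting the encrypted copy after decryption would bring it down to 150 GB.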
            Last edited by UpsetNotMad Scientist; 10-26-2012, 01:01 PM.


            • #7
              Got it to work on a single SRA in Windows 7 64-bit with the 64-bit toolkit. I was previously using XP with the 32-bit toolkit. Now how do I get it to do this on a full study instead of a single run, where the .sra files are spread across a crap load of folders?

              BTW, GenoMax, I could hug you right now. Not in an awkward way either.
              Last edited by UpsetNotMad Scientist; 10-26-2012, 10:08 AM.


              • #8
                Originally posted by UpsetNotMad Scientist View Post
                I unfortunately instantly get:

                The procedure entry point GetErrorMode could not be located in the dynamic link library KERNEL32.dll.

                When I run fastq-dump as directed with .sra file in the same directory and the exact cmd you said (adjusting for the directories). I am on a 32 bit windows system running XP (I know, really dated).

                Any suggestions?
                Do you have service pack 3 for Windows XP installed? If not you may need to bite the bullet and install that. Apparently the error you mentioned may be related to absence of service pack 3 for XP.

                You are bound to run into some problem or the other using 32-bit windows. If you do have access to a 64-bit machine (and NTFS formatted disks that can handle large single files) you may want to switch.


                • #9
                  Originally posted by UpsetNotMad Scientist View Post
                  Got it to work on a single SRA in Windows 7 64-bit with the 64-bit toolkit. I was previously using XP with the 32-bit toolkit. Now how do I get it to do this on a full study instead of a single run, where the .sra files are spread across a crap load of folders?
                  Not sure if this would work. Give it a try
                  Code:
                  fastq-dump --split-files *.sra


                  • #10
                    Originally posted by GenoMax View Post
                    Do you have service pack 3 for Windows XP installed? If not you may need to bite the bullet and install that. Apparently the error you mentioned may be related to absence of service pack 3 for XP.

                    You are bound to run into some problem or the other using 32-bit windows. If you do have access to a 64-bit machine (and NTFS formatted disks that can handle large single files) you may want to switch.
                    Yup, Service Pack 3 installed. I will just use the 64-bit system, Windows 7, NTFS.


                    • #11
                      Originally posted by GenoMax View Post
                      Not sure if this would work. Give it a try
                      Code:
                      fastq-dump --split-files *.sra
                      fastq-dump --split-files *.sra doesn't work for the same .sra files even in the same directory, let alone a directory of folders.

                      The decrypt.bin can decrypt all the files in a series of directories, but fastq-dump can't convert them?
                      Last edited by UpsetNotMad Scientist; 10-26-2012, 03:13 PM.


                      • #12
                        Program input options are decided by the program authors and not all programs accept input the same way. The following does seem to work in a specific directory.

                        Code:
                        fastq-dump --split-files file1.sra file2.sra file3.sra
                        Out of curiosity, how many folders/files are you working with? You may be able to use batch file processing, but if you are going to do that it may be simpler to do it in a UNIX environment with shell scripts.
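One likely reason the *.sra form fails is that the Windows command prompt, unlike UNIX shells, does not expand wildcards into a list of file names before starting a program, so fastq-dump receives the literal text "*.sra". A minimal Python sketch that does the expansion itself and then builds the explicit file list shown above follows. The function names are hypothetical, and "fastq-dump" stands in for the full path to your fastq-dump.exe.

```python
import glob
import os


def expand_sra_pattern(directory, pattern="*.sra"):
    """Return the sorted list of files in directory matching pattern."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def build_multi_file_command(fastq_dump_exe, sra_files):
    """Build one fastq-dump command that names every file explicitly."""
    if not sra_files:
        raise ValueError("no .sra files matched")
    return [fastq_dump_exe, "--split-files", *sra_files]
```

The resulting list can be handed to subprocess.run(command, check=True). With a very large number of files the command line can get too long for Windows, so running one file at a time may be the safer choice.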

                        Originally posted by UpsetNotMad Scientist View Post
                        fastq-dump --split-files *.sra doesn't work for the same .sra files even in the same directory, let alone a directory of folders.

                        The decrypt.bin can decrypt all the files in a series of directories, but fastq-dump can't convert them?


                        • #13
                          Originally posted by GenoMax View Post

                          Code:
                          fastq-dump --split-files file1.sra file2.sra file3.sra
                          Out of curiosity, how many folders/files are you working with?
                          Thanks for your help GenoMax!

                          It is ~1400 runs. Here is an example: Click here

                          The folders are set up like this: SRP1\SRS1\SRX1\SRR1\SRA1.sra
                          Number of folders in each directory (not all the same): 1\1\18\10\56\SRA1.sra

                          Putting each individual .sra file into the line of code seems excessive. The sra folder structure was created by the same people who made fastq-dump and released it for Windows. Either I am doing something wrong or there has to be a reasonable solution. Is there a way to do file1-1400.sra?
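One possible way around typing 1400 file names is a short script that walks the nested study folders and runs fastq-dump once per .sra file. The Python sketch below is a suggestion under assumptions, not an official SRA toolkit feature: the function names are hypothetical, fastq_dump_exe is whatever path your fastq-dump.exe lives at, and it writes the output next to each .sra file by running the tool from that file's folder.

```python
import subprocess
from pathlib import Path


def find_sra_files(study_root):
    """Return every .sra file under study_root, however deeply nested."""
    return sorted(Path(study_root).rglob("*.sra"))


def convert_study(study_root, fastq_dump_exe, dry_run=True):
    """Build (and optionally run) one fastq-dump command per .sra file.

    Returns a list of (folder, command) pairs so the plan can be
    inspected before anything is actually executed.
    """
    commands = []
    for sra_path in find_sra_files(study_root):
        command = [str(fastq_dump_exe), "--split-files", sra_path.name]
        commands.append((sra_path.parent, command))
        if not dry_run:
            # Run from the file's own folder so the output lands beside it.
            subprocess.run(command, cwd=sra_path.parent, check=True)
    return commands
```

Calling convert_study with the default dry_run=True only lists what would be run, which is a cheap way to check that all ~1400 files were found before committing hours of disk activity. Pass dry_run=False to do the actual conversion.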
                          Last edited by UpsetNotMad Scientist; 11-05-2012, 02:59 PM.


                          • #14
                            What if PE/SE status is unknown?

                            I'm looking at a paper that doesn't specify what length of sequence they generated or whether it is paired end or single end.

                            What happens if you use --split-files and a .sra file that is really single end?


                            • #15
                              Originally posted by [email protected] View Post
                              I'm looking at a paper that doesn't specify what length of sequence they generated or whether it is paired end or single end.

                              What happens if you use --split-files and a .sra file that is really single end?
                              It will work fine. You'll just end up with an extra _1 on the end of your file names.
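That file naming also gives a quick way to tell afterwards whether a run was paired end or single end. Here is a minimal Python sketch, assuming the output files are named like run_1.fastq and run_2.fastq (the .fastq extension and the function name are assumptions, not something stated in this thread).

```python
from pathlib import Path


def detect_layout(output_dir, run_name):
    """Guess the library layout from the files --split-files produced."""
    out = Path(output_dir)
    has_first = (out / f"{run_name}_1.fastq").exists()
    has_second = (out / f"{run_name}_2.fastq").exists()
    if has_first and has_second:
        return "paired"
    if has_first:
        # Only a _1 file: the run had a single read per fragment.
        return "single"
    return "unknown"
```

If the result is "unknown", the conversion probably did not run in that folder or used a different naming scheme, so check the directory listing by hand.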
