  • extract forward and reverse reads?

    Hello,
    I have forward and reverse reads in a single fastq file from Ion Torrent PGM sequencing data. Does anyone know a way to extract the forward and reverse reads into two separate files?

    And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

    Thank you,
    Jennifer

  • #2
    Hi Jennifer,

    You can do that with BBTools:

    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

    and

    filterbyname.sh in=reads.fq out=filtered.fq names=names.txt


    -Brian

    • #3
      Originally posted by JenBarb:
      "And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

      Thank you,
      Jennifer"

      Heng Li's seqtk has a "subseq" command for this: https://github.com/lh3/seqtk

      • #4
        Thank you so much for a quick reply! I really appreciate it. I will try it now.
        Jennifer

        • #5
          Hi Brian,
          I am looking for the installation instructions on the page you sent and I can't find them. I also was looking for info about the two scripts that you sent. Is there a documentation page that describes what the scripts do and any argument options they take?

          Thank you,
          Jennifer

          • #6
            There is no installation required for BBTools. You just uncompress the file. Then you can run the shell scripts (you may need to add execute permissions, depending on your OS). If you run a shell script with no arguments (e.g. $ reformat.sh), it will print information about all of its command-line options.

            Here is a thread with information about the reformat tool: http://seqanswers.com/forums/showthread.php?t=46174

            • #7
              Thank you!!

              • #8
                Hello again,
                I tried the reformat.sh script on my fastq file, then took each output file (forward and reverse reads) and BLASTed the reads against a database of interest. I am still finding that some reads align in the forward direction and some in the reverse direction. My understanding was that this program would put only forward reads into one file and only reverse reads into the other, so the alignments from each file would be forward-only or reverse-only. I am not finding this to be true. Thoughts?

                • #9
                  In the original post you had talked about forward/reverse reads in a simple context (as if they are two reads from the two ends of a fragment).

                  reformat.sh will not separate reads that align in opposite orientations. It will only separate reads if they were interleaved in a single file (as long as they came from a single fragment).

                  You will need to parse the output from your alignment program (what program are you using?) to separate reads that align to +/- strands into two files. I am not sure if BBMap can write to separate alignment files based on the strand info.
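
To make the distinction concrete, here is a minimal Python sketch of what "separating interleaved reads" means. It is not reformat.sh's actual implementation; it assumes plain 4-line FASTQ records (no wrapped sequence lines) that strictly alternate read 1, read 2, and the function names and example reads are made up for illustration. Note that it never looks at the sequence itself, so it cannot know anything about strand.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as (header, sequence, plus, quality) tuples.

    Assumes exactly four lines per record (no line-wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(lines):
    """Split records ordered R1, R2, R1, R2, ... into two lists by position only."""
    records = list(read_fastq(lines))
    if len(records) % 2:
        raise ValueError("odd number of records; input is not interleaved pairs")
    return records[0::2], records[1::2]


# Hypothetical interleaved input: two fragments, two reads each.
interleaved = [
    "@frag1/1", "ACGT", "+", "IIII",
    "@frag1/2", "TTGA", "+", "IIII",
    "@frag2/1", "GGCC", "+", "IIII",
    "@frag2/2", "AATT", "+", "IIII",
]
read1, read2 = deinterleave(interleaved)
print([r[0] for r in read1])
print([r[0] for r in read2])
```

For a real file you would pass an open file handle instead of the list, e.g. `deinterleave(open("reads.fq"))`.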

                  • #10
                    Actually... there is a tool for that, "splitsam.sh", which is not part of the public distribution because I didn't think it would be of use to anyone. I've attached it to this post; just extract it and put it in the folder with the other shell scripts, then run it like this:

                    splitsam.sh mapped.sam forward.sam reverse.sam

                    You can also do that with samtools, by filtering on the 0x10 flag bit. In either case, they have to be mapped first, of course - you cannot determine which read goes to which strand from a fastq file.
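
For anyone curious what filtering on the 0x10 bit looks like, here is a rough Python sketch. It is not how splitsam.sh or samtools is implemented; it assumes an uncompressed, tab-delimited SAM file, and the function name and example records are invented for illustration. With samtools itself, the usual equivalents are `samtools view -f 16` for reverse-strand reads and `samtools view -F 20` for mapped forward-strand reads (20 = 0x10 + 0x4, so unmapped reads are excluded too).

```python
REVERSE = 0x10   # SAM flag bit: read aligned to the reverse strand
UNMAPPED = 0x4   # SAM flag bit: read is unmapped, so strand is meaningless


def split_by_strand(sam_lines):
    """Return (headers, forward, reverse) lists of SAM lines.

    Unmapped reads are dropped because they have no strand.
    """
    headers, forward, reverse = [], [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):          # header lines start with '@'
            headers.append(line)
            continue
        flag = int(line.split("\t")[1])   # FLAG is the second SAM column
        if flag & UNMAPPED:
            continue
        (reverse if flag & REVERSE else forward).append(line)
    return headers, forward, reverse


# Hypothetical records: r1 forward, r2 reverse, r3 unmapped.
sam = [
    "@SQ\tSN:ref\tLN:100",
    "\t".join(["r1", "0", "ref", "1", "60", "4M", "*", "0", "0", "ACGT", "IIII"]),
    "\t".join(["r2", "16", "ref", "5", "60", "4M", "*", "0", "0", "TTGA", "IIII"]),
    "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", "GGCC", "IIII"]),
]
headers, forward, reverse = split_by_strand(sam)
print([l.split("\t")[0] for l in forward])
print([l.split("\t")[0] for l in reverse])
```

When writing the two output files, the header lines would normally be copied into both so that each remains a valid SAM file.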
                    • #11
                      Ask and ye shall receive

                      Roll that into BBMap download Brian!

                      • #12
                        Thank you so much for your help!

                        • #13
                          Originally posted by GenoMax:
                          "Roll that into BBMap download Brian!"

                          OK... I don't like to release incomplete things so I made it faster, added some features, and then rolled it into the download.

                          Originally posted by JenBarb:
                          "Thank you so much for your help!"

                          You're welcome!

                          • #14
                            Thank you again, Brian and GenoMax for all of your help.

                            I am now trying the filterbyname script, and it does not seem to pull out only the reads whose IDs are listed in my names.txt file. Is there something I am missing?

                            sh /data/barbj/bbmap/filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt

                            • #15
                              By default, "filterbyname" discards reads with names in your name list, and keeps the rest. To include them and discard the others, do this:

                              filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t

                              Sorry for the confusion. I guess that default is kind of odd.
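
The exclude-by-default behaviour can be illustrated with a small Python sketch. This is not filterbyname's code; the function name, the example reads, and the matching rule (comparing the first whitespace-delimited token of the header, without the leading "@") are assumptions made for illustration, and the real tool's name matching may differ.

```python
def filter_by_name(records, names, include=False):
    """Keep or discard FASTQ records whose read ID is in `names`.

    include=False mirrors the default described above: listed reads are
    discarded and the rest are kept. include=True keeps only listed reads.
    """
    wanted = set(names)
    kept = []
    for record in records:
        # Assumed matching rule: header minus '@', up to the first whitespace.
        read_id = record[0][1:].split()[0]
        if (read_id in wanted) == include:
            kept.append(record)
    return kept


# Hypothetical 4-line FASTQ records as (header, sequence, plus, quality).
records = [
    ("@read1 extra info", "ACGT", "+", "IIII"),
    ("@read2", "TTGA", "+", "IIII"),
    ("@read3", "GGCC", "+", "IIII"),
]
names = ["read2"]

excluded = filter_by_name(records, names)                # default: drop listed reads
included = filter_by_name(records, names, include=True)  # keep only listed reads
print([r[0] for r in excluded])
print([r[0] for r in included])
```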
                              Last edited by Brian Bushnell; 02-09-2015, 12:02 PM.
