  • extract forward and reverse reads?

    Hello,
    I have forward and reverse reads in a single fastq file from Ion Torrent PGM sequencing. Does anyone know a way to extract the forward and reverse reads into two separate files?

    And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

    Thank you,
    Jennifer

  • #2
    Hi Jennifer,

    You can do that with BBTools:

    reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

    and

    filterbyname.sh in=reads.fq out=filtered.fq names=names.txt


    -Brian


    • #3
      Originally posted by JenBarb
      And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

      Thank you,
      Jennifer

      Heng Li's seqtk "subseq" command: https://github.com/lh3/seqtk


      • #4
        Thank you so much for a quick reply! I really appreciate it. I will try it now.
        Jennifer


        • #5
          Hi Brian,
          I am looking for the installation instructions on the page you sent and I can't find them. I also was looking for info about the two scripts that you sent. Is there a documentation page that describes what the scripts do and any argument options they take?

          Thank you,
          Jennifer


          • #6
            There is no installation required for BBTools. You just uncompress the file. Then you can run the shell scripts (you may need to add execute permissions depending on what OS you are using). If you run the shell script by itself (e.g. $ reformat.sh) it will print information about all possible command line options.

            Here is a thread with information about the reformat tool: http://seqanswers.com/forums/showthread.php?t=46174


            • #7
              Thank you!!


              • #8
                Hello again,
                I tried the reformat.sh script on my fastq file. I then took each output file (forward and reverse reads) and BLASTed the reads against a database of interest, and I still find that some reads align in the forward direction and some in the reverse direction. My understanding was that the program would put only forward reads into one file and only reverse reads into the other, so that the alignments from each file would be forward only or reverse only. I am not finding this to be true. Thoughts?


                • #9
                  In the original post you talked about forward/reverse reads in a simple context (as if they were the two reads from the two ends of a fragment).

                  reformat.sh will not separate reads that align in opposite orientations. It only separates paired reads that are interleaved in a single file (the two reads of each fragment stored one after the other).
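
To make the distinction concrete, here is a minimal Python sketch of what de-interleaving means. It is not how reformat.sh is implemented; it only illustrates the idea under two assumptions: every record is exactly four lines (no wrapped sequences), and records alternate read 1, read 2, read 1, read 2. The function names and the sample reads are made up for illustration.

```python
from itertools import islice


def read_fastq(lines):
    """Yield 4-line FASTQ records as (header, seq, plus, qual) tuples."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(lines):
    """Split interleaved records (R1, R2, R1, R2, ...) into two lists."""
    read1, read2 = [], []
    for index, record in enumerate(read_fastq(lines)):
        # Even-numbered records are read 1, odd-numbered records are read 2.
        (read1 if index % 2 == 0 else read2).append(record)
    if len(read1) != len(read2):
        raise ValueError("odd number of records; input is not interleaved pairs")
    return read1, read2


# Hypothetical interleaved input: two fragments, two reads each.
interleaved = [
    "@frag1/1", "ACGT", "+", "IIII",
    "@frag1/2", "TTGA", "+", "IIII",
    "@frag2/1", "GGCC", "+", "IIII",
    "@frag2/2", "AATT", "+", "IIII",
]
read1, read2 = deinterleave(interleaved)
print([r[0] for r in read1])
print([r[0] for r in read2])
```

Note that the split depends only on the position of each record in the file. Nothing here looks at the sequence itself, which is why this kind of splitting says nothing about which strand a read will align to.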

                  You will need to parse the output from your alignment program (what program are you using?) to separate reads that align to +/- strands into two files. I am not sure if BBMap can write to separate alignment files based on the strand info.


                  • #10
                    Actually... there is a tool for that, "splitsam.sh", which is not part of the public distribution because I didn't think it would be of use to anyone. I've attached it to this post; just extract it and put it in the folder with the other shell scripts, then run it like this:

                    splitsam.sh mapped.sam forward.sam reverse.sam

                    You can also do that with samtools, by filtering on the 0x10 flag bit. In either case, they have to be mapped first, of course - you cannot determine which read goes to which strand from a fastq file.
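
For readers who want to see what filtering on the 0x10 flag bit amounts to, here is a minimal Python sketch. It is not the implementation of splitsam.sh or samtools. It assumes an uncompressed, tab-delimited SAM file with the FLAG in the second column; skipping unmapped reads (bit 0x4) is a choice made for this sketch, and the function name and sample records are hypothetical.

```python
def split_sam_by_strand(lines):
    """Return (headers, forward, reverse) lists of SAM lines.

    A read is treated as reverse-strand when bit 0x10 of its FLAG is set.
    Unmapped reads (bit 0x4) are dropped; that is an assumption of this sketch.
    """
    headers, forward, reverse = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines belong in both output files.
            headers.append(line)
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            continue
        (reverse if flag & 0x10 else forward).append(line)
    return headers, forward, reverse


# Hypothetical records: one forward, one reverse, one unmapped.
sam_lines = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]
headers, forward, reverse = split_sam_by_strand(sam_lines)
print([line.split("\t")[0] for line in forward])
print([line.split("\t")[0] for line in reverse])
```

With samtools, the same selection is usually written as `samtools view -h -F 16` for forward-strand alignments and `samtools view -h -f 16` for reverse-strand alignments.
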

                    • #11
                      Ask and ye shall receive

                      Roll that into BBMap download Brian!


                      • #12
                        Thank you so much for your help!


                        • #13
                          Originally posted by GenoMax
                          Roll that into BBMap download Brian!

                          OK... I don't like to release incomplete things so I made it faster, added some features, and then rolled it into the download.

                          Originally posted by JenBarb
                          Thank you so much for your help!

                          You're welcome!


                          • #14
                            Thank you again, Brian and GenoMax for all of your help.

                            I now am trying the filterbyname script and it does not seem to be pulling out only those reads that match a particular read id found in my names.txt file. Is there something I am missing?

                            sh /data/barbj/bbmap/filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt


                            • #15
                              By default, "filterbyname" discards reads with names in your name list, and keeps the rest. To include them and discard the others, do this:

                              filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t

                              Sorry for the confusion. I guess that default is kind of odd.
                              Last edited by Brian Bushnell; 02-09-2015, 12:02 PM.
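
The exclude-by-default behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the filterbyname.sh implementation. The matching rule used here (compare the first whitespace-delimited token of the header, without the leading "@") is an assumption of the sketch, and the function name and sample reads are hypothetical.

```python
def filter_by_name(records, names, include=False):
    """Filter FASTQ records (header, seq, plus, qual) by read ID.

    include=False drops the listed reads and keeps the rest;
    include=True keeps only the listed reads.
    """
    wanted = set(names)
    kept = []
    for record in records:
        # Assumed ID rule: first token of the header, minus the leading '@'.
        tokens = record[0][1:].split()
        read_id = tokens[0] if tokens else ""
        if (read_id in wanted) == include:
            kept.append(record)
    return kept


# Hypothetical reads and name list.
records = [
    ("@readA extra", "ACGT", "+", "IIII"),
    ("@readB", "TTGA", "+", "IIII"),
    ("@readC", "GGCC", "+", "IIII"),
]
names = ["readB"]

discarded_listed = filter_by_name(records, names)           # default: exclude
kept_listed = filter_by_name(records, names, include=True)  # like include=t
print([r[0] for r in discarded_listed])
print([r[0] for r in kept_listed])
```

Running both modes on a small test file like this is a quick way to check which reads a name list actually selects before filtering a full run.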
