Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • thh32
    Member
    • Feb 2014
    • 60

    Code to bin reads that have been mapped

    Basicaslly the idea we have is to map our metagenomeic reads to a file containing all isolated genomes from the environment. Then remove the mapped reads from the original reads files (of which there are 12 due to 6 paired end runs). I currently have mapped the reads to the reference genomes using BOWTIE2 but am now stuck on how to remove just these reads and to remove them from all the files. I assume some form of comparison and then grep removal into another file would work but am still new to coding so would appreciate any help people can provide.

    Thanks,
    Tom
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    I think there are parameters you can set in Bowtie so that it gives you a file with the unmapped reads, if that helps.

    Comment

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      It's probably easier to just use the convert the unmapped reads in the SAM/BAM file to fastq. If you have a BAM file ("alignments.bam") with both mapped and unmapped reads from bowtie2:

      Code:
      samtools view -bo unmapped.bam -f 0x4 alignments.bam
      Then you could use one of the many bam2fastq/bam2fastx/etc. programs to convert the result to fastq. Note that weird things may happen if you allow bowtie2 to produce mixed alignments and then do this, since then the resulting fastq files may no longer be in sync (i.e., the Nth read of each file may not originate from the same fragment)! So be aware of that.

      I should note that this will be a lot faster than grepping things or otherwise parsing and comparing the alignments and fastq files.

      Edit: As mastal noted, see also the --un-conc option (I would still use the --no-discordant and --no-mixed options).

      Comment

      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        I think that thh32's question is two-fold.

        1) "how to separate the mapped and/or unmapped reads from my alignment files". Both @mastal and @dpryan's answers are good.

        2) "how do I remove the unmapped reads from my original read files?"

        This is a harder question. I have a program in my tool chest called 'extract_reads' which takes a list of ids (easily obtained via grep) and either selects or removes the reads from a fastq file. Not sure where that program came from but it is what I would use.

        Comment

        • thh32
          Member
          • Feb 2014
          • 60

          #5
          Yes westerman it is 2 part really and I am currently running mastal's solution to the first part at the moment and I have had a look for this extract_reads program you have mentions, would it happen to be part of RNA-seq-Graph ?

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            One could cheat on part 2 by just switching the "-f 0x4" to "-F 0x4" and then pipe that into picard samtofastq:
            Code:
            samtools view -hF 0x4 alignments.bam | java -Xmx30g -jar ${PICARD}/SamToFastq.jar I=/dev/stdin other_options_here
            It's not literally removing unmapped reads from the original fastq files, but rather creating a fastq file of the mapped reads. In practice, this should be the same (with the possible exception of read order).
            Last edited by dpryan; 02-27-2014, 08:32 AM. Reason: I meant "F", not "f"

            Comment

            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              @dpryan's method - using Picard tools' SamToFastq - will work. I should have thought of that myself!

              extract_reads was bundled into Tophat v.1. It isn't in the latest version of tophat. My sysadmin liked the program so much that he put it into our '/bin' directory and from that I put it on my list of tools to use. I don't use if very often.

              Comment

              • thh32
                Member
                • Feb 2014
                • 60

                #8
                Ok I will download samtools and give @dpryans's technique ago then. Thanks for all the help.

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  I stumbled upon this earlier. It is probably not directly related to the original question but in case someone stumbles upon this thread after a search it may be an option: http://mattshirley.com/de-multiplexi...equencing-data

                  Comment

                  Latest Articles

                  Collapse

                  • GATTACAT
                    Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                    by GATTACAT
                    Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                    07-01-2026, 11:43 AM
                  • SEQadmin2
                    Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                    by SEQadmin2


                    I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                    Here are nine questions we think about, in roughly the order they matter, before...
                    06-18-2026, 07:11 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 07-02-2026, 11:08 AM
                  0 responses
                  7 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-30-2026, 05:37 AM
                  0 responses
                  12 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-26-2026, 11:10 AM
                  0 responses
                  20 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-17-2026, 06:09 AM
                  0 responses
                  54 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...