Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Code to bin reads that have been mapped

    Basicaslly the idea we have is to map our metagenomeic reads to a file containing all isolated genomes from the environment. Then remove the mapped reads from the original reads files (of which there are 12 due to 6 paired end runs). I currently have mapped the reads to the reference genomes using BOWTIE2 but am now stuck on how to remove just these reads and to remove them from all the files. I assume some form of comparison and then grep removal into another file would work but am still new to coding so would appreciate any help people can provide.

    Thanks,
    Tom

  • #2
    I think there are parameters you can set in Bowtie so that it gives you a file with the unmapped reads, if that helps.

    Comment


    • #3
      It's probably easier to just use the convert the unmapped reads in the SAM/BAM file to fastq. If you have a BAM file ("alignments.bam") with both mapped and unmapped reads from bowtie2:

      Code:
      samtools view -bo unmapped.bam -f 0x4 alignments.bam
      Then you could use one of the many bam2fastq/bam2fastx/etc. programs to convert the result to fastq. Note that weird things may happen if you allow bowtie2 to produce mixed alignments and then do this, since then the resulting fastq files may no longer be in sync (i.e., the Nth read of each file may not originate from the same fragment)! So be aware of that.

      I should note that this will be a lot faster than grepping things or otherwise parsing and comparing the alignments and fastq files.

      Edit: As mastal noted, see also the --un-conc option (I would still use the --no-discordant and --no-mixed options).

      Comment


      • #4
        I think that thh32's question is two-fold.

        1) "how to separate the mapped and/or unmapped reads from my alignment files". Both @mastal and @dpryan's answers are good.

        2) "how do I remove the unmapped reads from my original read files?"

        This is a harder question. I have a program in my tool chest called 'extract_reads' which takes a list of ids (easily obtained via grep) and either selects or removes the reads from a fastq file. Not sure where that program came from but it is what I would use.

        Comment


        • #5
          Yes westerman it is 2 part really and I am currently running mastal's solution to the first part at the moment and I have had a look for this extract_reads program you have mentions, would it happen to be part of RNA-seq-Graph ?

          Comment


          • #6
            One could cheat on part 2 by just switching the "-f 0x4" to "-F 0x4" and then pipe that into picard samtofastq:
            Code:
            samtools view -hF 0x4 alignments.bam | java -Xmx30g -jar ${PICARD}/SamToFastq.jar I=/dev/stdin other_options_here
            It's not literally removing unmapped reads from the original fastq files, but rather creating a fastq file of the mapped reads. In practice, this should be the same (with the possible exception of read order).
            Last edited by dpryan; 02-27-2014, 08:32 AM. Reason: I meant "F", not "f"

            Comment


            • #7
              @dpryan's method - using Picard tools' SamToFastq - will work. I should have thought of that myself!

              extract_reads was bundled into Tophat v.1. It isn't in the latest version of tophat. My sysadmin liked the program so much that he put it into our '/bin' directory and from that I put it on my list of tools to use. I don't use if very often.

              Comment


              • #8
                Ok I will download samtools and give @dpryans's technique ago then. Thanks for all the help.

                Comment


                • #9
                  I stumbled upon this earlier. It is probably not directly related to the original question but in case someone stumbles upon this thread after a search it may be an option: http://mattshirley.com/de-multiplexi...equencing-data

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 06:37 PM
                  0 responses
                  12 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, Yesterday, 06:07 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  52 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  68 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X