  • Aligning only unique reads in Bowtie

    Hi everyone,

    I am aligning some ChIP-seq data using Bowtie. I have been using the -m option to throw out any reads with more than one reportable alignment, but I would also like to try omitting duplicate reads. Is there a command-line option to throw out any reads that are identical to one another?

    Thanks!

  • #2
    If you just have the raw reads, you can use the "uniq" command in Linux to extract the unique reads (after sorting).

    SpliceMap: De novo detection of splice junctions from RNA-seq
    • #3
      Thanks John!

      That sounds like it should be a useful command. I was just wondering if you could give me a little more detail.

      I have the ChIP-seq data as FASTQ files, which I align using Bowtie. Would I use the uniq command on the FASTQ file prior to alignment to generate another FASTQ containing only unique reads?

      i.e., prior to alignment, run uniq -u on the FASTQ?

      Thanks!



      • #4
        No worries, but the method I suggested is a bit of a hack... It will require you to fiddle with the data a bit.

        Firstly, do you need to preserve the read-quality information? If so, it is probably best to write your own Python or Perl script to do it. I'm pretty sure there are existing tools that do this, though... I just can't recall them off the top of my head.
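
For anyone who does need to keep the qualities, here is a minimal sketch of what such a script might look like. It assumes plain four-line FASTQ records (no wrapped sequences) and keeps the first record seen for each distinct sequence. The function names `read_fastq`, `dedup_fastq`, and `format_fastq` are made up for illustration and do not come from any existing tool.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines long.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated trailing record)
        header, seq, _plus, qual = record
        yield header, seq, qual


def dedup_fastq(lines):
    """Keep only the first record seen for each distinct sequence."""
    seen = set()
    for header, seq, qual in read_fastq(lines):
        if seq not in seen:
            seen.add(seq)
            yield header, seq, qual


def format_fastq(records):
    """Turn (header, sequence, quality) tuples back into FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

Passing an open file handle to `dedup_fastq` and writing the result of `format_fastq` to a new file would give a FASTQ that can be aligned as usual. The set of seen sequences is held in memory, so very large files may need a different approach.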

        -----

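        The method I suggested is to first extract the raw reads from the FASTQ file by following the instructions here

        Then sort the reads with sort (http://en.wikipedia.org/wiki/Sort_(Unix)):

        sort input_file > sorted_file

        Finally, run "uniq" on the sorted file. With -u it keeps only the reads that occur exactly once; without -u it keeps one copy of each read.

        uniq -u sorted_file > output_file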

        After you do this, you can align your reads using bowtie with the "-r" option for raw reads.
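
The same steps can be sketched in Python, which makes the difference between the two "uniq" behaviours explicit. This is only an illustration, assuming four-line FASTQ records; `raw_reads`, `unique_only`, and `one_copy_each` are hypothetical helper names.

```python
from collections import Counter


def raw_reads(fastq_lines):
    """Yield the sequence line (the 2nd of every 4 lines) of each record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def unique_only(reads):
    """Like `sort | uniq -u`: sorted reads that occur exactly once."""
    counts = Counter(reads)
    return sorted(read for read, n in counts.items() if n == 1)


def one_copy_each(reads):
    """Like `sort | uniq`: one copy of every distinct read, sorted."""
    return sorted(set(reads))
```

Writing either result to a file with one sequence per line gives input suitable for the "-r" option. Note that `unique_only` discards every copy of a duplicated read, while `one_copy_each` keeps a single representative, so the choice depends on which behaviour is wanted.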


        • #5
          You can try fastx_collapser from http://hannonlab.cshl.edu/fastx_toolkit/
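
To illustrate the general idea of collapsing, here is a rough sketch that merges identical sequences and records how often each one occurred. It is not the tool's actual implementation, and the `>rank-count` header layout used in `to_fasta` is an assumption chosen for the example; `collapse` and `to_fasta` are hypothetical names.

```python
from collections import Counter


def collapse(reads):
    """Return (sequence, count) pairs, most abundant first.

    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(collapsed):
    """Write collapsed reads as FASTA text with the count in the header."""
    return "".join(
        f">{rank}-{count}\n{seq}\n"
        for rank, (seq, count) in enumerate(collapsed, start=1)
    )
```

Keeping the counts around means highly duplicated sequences can still be inspected or filtered later instead of being silently dropped.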



          • #6
            Re: Aligning only unique reads in Bowtie

            I have a few questions about best practices, from a biological standpoint, for dealing with reads that align to multiple locations and with identical reads in the data:

            How important is it to deal with identical reads? Does having many identical reads in the data mean that something is wrong with the experiment?

            What could be considered a maximum cutoff for the number of identical reads, above which those reads should be discarded?

            In the other case, where a single read aligns at multiple places in the genome, what should the cutoff be for the number of alignments, above which those reads should be discarded?
