  • Bowtie: More reference sequences, fewer aligned reads

    Hi All,

    I use Bowtie1 (version 1.0.0 for Mac OS X).

    To discard unwanted reads, I mapped my reads to the reference sequences I want to remove.

    The problem is that Bowtie reports fewer aligned reads when I use more reference sequences.

    To be specific:
    There are 21 sequences I want to discard, in three groups of 7 sequences each.

    Group A: A1, A2, A3, A4, A5, A6, A7 -> similarity: 53%~99%, seq length: 1550 nt
    Group B: B1, B2, B3, B4, B5, B6, B7 -> similarity: 49%~99%, seq length: 2900 nt
    Group C: C1, C2, C3, C4, C5, C6, C7 -> similarity: 51%~99%, seq length: 120 nt
    ====> The main targets are A1 and B1.

    Using the two main sequences, A1 and B1, I built an index and then ran bowtie1.
    The log file reports:
    10.00% of reads were reported as aligned,
    0.01% of reads were reported as suppressed, and
    89.99% of reads were reported as failed.

    Next, I repeated the process with all 21 sequences: built an index and ran bowtie1.
    I expected this run to give more aligned reads than the first one. However, I was completely wrong!

    The second log file reports:
    0.20% of reads were reported as aligned,
    11.00% of reads were reported as suppressed, and
    88.80% of reads were reported as failed.

    I cannot understand why more reference sequences give fewer aligned reads.
    At the very least, the second run should have as many aligned reads as the first, or more.
    Fortunately, the numbers of reads that failed to align are similar in both runs.

    I used these options:
    bowtie `INDEX` -5 1 -n 0 -n 0 -k 1 -m 1 -l 20 --best --phred33-quals --un `UNMAPPED` -q `INPUT` -S `OUT` 2>> `LOG` -t

    Thank you!

    Jiyoung

  • #2
    I'm not an expert at Bowtie, but a couple of things stand out to me. First, you have -n 0 -n 0 (-n 0 is repeated), so is an option missing, with -n 0 typed in its place?

    But the main issue is the -m 1 option. It tells Bowtie to report only reads that have a single valid alignment and to suppress all others. So when you include all the sequences in the index, where sequences within a group are highly similar, you make it very likely that Bowtie will find more than one valid alignment for a read and suppress it.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
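    A toy model may make the -m 1 behaviour concrete. This is a simplified sketch, not Bowtie's implementation: it assumes each read's valid alignments can be summarised as the set of reference sequences it matches (at most one alignment per sequence), and the read IDs and hit sets are hypothetical, made up for illustration. Only hits to sequences that are in the index count, and with -m 1 any read with more than one valid alignment is suppressed.

```python
# Toy model of how Bowtie1's -m option sorts reads into
# aligned / suppressed / failed. Simplifying assumption: a read's valid
# alignments are represented as the set of reference sequences it matches,
# one alignment per sequence. All read IDs and hit sets are hypothetical.

from collections import Counter


def classify(hits: set[str], index: set[str], m: int = 1) -> str:
    """Classify one read given the references it matches and the index used.

    Only hits to sequences present in the index are valid alignments.
    A read with more than m valid alignments is suppressed, not reported.
    """
    n_valid = len(hits & index)
    if n_valid == 0:
        return "failed"
    if n_valid > m:
        return "suppressed"
    return "aligned"


# Hypothetical reads and the reference sequences each one matches.
reads = {
    "r1": {"A1", "A2", "A3"},  # region shared by several group A members
    "r2": {"B1", "B4"},        # region shared by two group B members
    "r3": {"A1"},              # region unique to A1
    "r4": set(),               # matches none of the unwanted sequences
}

small_index = {"A1", "B1"}
full_index = {f"{group}{i}" for group in "ABC" for i in range(1, 8)}

for name, index in [("A1+B1", small_index), ("all 21", full_index)]:
    counts = Counter(classify(hits, index) for hits in reads.values())
    print(name, dict(counts))
```

    With the A1+B1 index, three of the four hypothetical reads are reported as aligned. With all 21 sequences, two of those same three reads are suppressed because they now match several near-identical group members, even though nothing that matched before stops matching. Real Bowtie can also count more than one alignment within a single reference sequence (for example a repeated region), which this toy ignores.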



    • #3
      Makes sense! But why were so many reads suppressed?

      SNPsaurus, thanks!

      Yes, your explanation makes sense. So the second index, with more reference sequences, gave slightly fewer failed reads.

      But it is still unclear to me why so many reads were suppressed.
      Okay, comparing the two output files should help! Thank you!

      Jiyoung



      • #4
        No, the suppressed reads are exactly the ones that are not reported because of your -m 1 option. In your first try (using A1 and B1), very few are suppressed because very few reads align to both A1 and B1. In the second try, many more are suppressed because nearly every read that aligns, aligns to A1 and A2 through A7, or to B1 and B2 through B7. When a read aligns to more than one index sequence, it fails the -m 1 test and is suppressed.
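        The two logs already show this once the numbers are regrouped. For the original goal of discarding reads that match any unwanted sequence, the relevant quantity is the fraction of reads with at least one valid alignment, which is aligned + suppressed (equivalently 100% minus failed), because suppressed reads did align but were withheld by -m 1. A short worked calculation using only the percentages quoted in this thread:

```python
# Regroup the percentages quoted from the two Bowtie1 logs in this thread.
# "matched" = reads with at least one valid alignment, whether they were
# reported (aligned) or withheld because of -m 1 (suppressed).

runs = {
    "A1+B1 index": {"aligned": 10.00, "suppressed": 0.01, "failed": 89.99},
    "21-sequence index": {"aligned": 0.20, "suppressed": 11.00, "failed": 88.80},
}


def matched(pct: dict[str, float]) -> float:
    """Percentage of reads that aligned to at least one reference."""
    return pct["aligned"] + pct["suppressed"]


for name, pct in runs.items():
    # The three categories partition all reads, so they should sum to 100%.
    total = matched(pct) + pct["failed"]
    print(f"{name}: matched {matched(pct):.2f}%, "
          f"failed {pct['failed']:.2f}%, total {total:.2f}%")
```

        This gives 10.01% matched for the A1+B1 index and 11.20% for the 21-sequence index, so the larger index did catch more reads; almost all of them simply moved from "aligned" to "suppressed".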