  • Bowtie2 and differences in read counts. How can this be?

    So I have some RNA-seq data for prokaryotes. Let's say I have strain A and strain B, and for each I have two replicates and two conditions. I want to do differential expression analysis on these.

    Now, if I align my reads from strain A to the reference genome for strain A with bowtie2, fine. Then I align my reads from strain B to reference genome A. Still mostly good.

    Here's the thing. This is the exact same read set, just aligned to two slightly different reference genomes. The counts should be exactly the same for orthologous genes, or at least really close. But some genes show clearly different read counts depending on the reference.

    Let's say gene X is from strain A. Gene X has an ortholog, gene Y, in strain B. When you BLAST these sequences against each other, there are no mutations: 100% identity, about 600 bases long. Let's say the read counts are as follows for the genes and their replicates.
    • Gene X-1: 585
    • Gene X-2: 528
    • Gene Y-1: 372
    • Gene Y-2: 325


    So my question is, how can this happen? It must have happened at the bowtie2 alignment step, but why? If a read has multiple possible matching locations, wouldn't there be a gene out there with the missing reads that I could find? Also, if bowtie2 uses random seeding for alignment, shouldn't the two replicate runs differ from each other? What could cause this sort of thing to happen? I would be happy to hear any thoughts. Thanks!

  • #2
    Hi,
    You are aligning your reads to the whole-genome reference, am I right?
    I have an idea for something to check: maybe you could take the two groups of reads for gene 1 and build a consensus from each, to see whether a specific kind of read maps to one reference and not the other.
    Have you already checked whether certain reads map specifically to one reference and not the other?
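
    One way to run that check is to compare read names between the two alignments. The following is only a minimal sketch, not a definitive workflow: it parses plain SAM text, and the contig names, gene coordinates, and toy records are all made up for illustration. It lists reads that land in the gene on one reference but not on the other, and keeps each read's MAPQ, because a read that has an equally good second location on one reference typically gets a low MAPQ and may be dropped by a downstream counting step.

```python
def reads_in_region(sam_lines, contig, start, end):
    """Return {read_name: MAPQ} for mapped reads whose leftmost
    aligned position (SAM POS, 1-based) lies in [start, end] on contig.
    For paired-end data both mates share a name; the last one seen wins."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if flag & 4:                      # 0x4 = read is unmapped
            continue
        if rname == contig and start <= pos <= end:
            hits[name] = mapq
    return hits


def compare(hits_a, hits_b):
    """Split read names into only-in-A, only-in-B, and shared."""
    names_a, names_b = set(hits_a), set(hits_b)
    return (sorted(names_a - names_b),
            sorted(names_b - names_a),
            sorted(names_a & names_b))


# Toy records (hypothetical): the same three reads aligned to two references.
sam_a = [
    "@HD\tVN:1.0",
    "r1\t0\tchrA\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrA\t150\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchrA\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
sam_b = [
    "r1\t0\tchrB\t1100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",        # unmapped on B
    "r3\t0\tchrB\t9000\t1\t4M\t*\t0\t0\tACGT\tIIII",  # mapped elsewhere on B
]

# Gene X on reference A and gene Y on reference B (made-up coordinates).
hits_a = reads_in_region(sam_a, "chrA", 1, 600)
hits_b = reads_in_region(sam_b, "chrB", 1001, 1600)
only_a, only_b, shared = compare(hits_a, hits_b)

for name in only_a:
    print(f"{name}: in gene X on A (MAPQ {hits_a[name]}), not in gene Y on B")
```

    With real data you would first convert each BAM file to SAM text (for example with `samtools view -h`) and pass the open file to `reads_in_region`. Looking up where the "only in A" reads ended up on reference B, and with what MAPQ, should show whether they are unaligned, placed in another gene, or present but filtered out during counting.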
