
  • strange mapping bias inside/outside genes in prokaryotic RNASeq

    Hi everyone,

    As mentioned in the title, I have run into a strange problem in a recent RNASeq experiment that I have never seen before.

    Some background:
    Stranded RNASeq of prokaryotic mRNA; three samples (1 and 2 are the same condition at different harvest time points) with three replicates each; total RNA prep; DNase digestion; rRNA depletion; library prep with the dUTP method; Illumina sequencing; trimming; filtering of remaining tRNA/rRNA reads; mapping with bowtie2.

    After read counting, normalization and differential expression analysis with baySeq, I saw a strange bias towards one sample, which seems to dominate the rest. In the end I found that the summed read counts (of reads mapped inside genes) are markedly lower in 6 out of 9 samples, and this affects the results of the differential expression analysis. I have never had this problem before. I spot-checked about 6 other RNASeq projects, and even where the input read numbers differ, the normalized read counts are evenly distributed.

    To find out what the problem is, I counted all reads (sense and antisense reads inside and outside genes; see attachment). To my surprise, there is a heavy bias towards reads mapped outside genes in the problematic samples (2 and 3, all replicates).

    Does anyone have a clue what the reason for this could be?

    I thought about DNA contamination, but then I would expect an equal distribution of sense and antisense reads. Another candidate is the dUTP library prep, because it is not sufficient for high-GC organisms, but again there should be an equal distribution over the genome and, of course, over all samples. All samples except 3.1 and 3.2 were treated (DNase, depletion, library prep) in parallel.

    I would really appreciate any hint or suggestion!

    Tokikake
    Attached Files

  • #2
    No clue what the biological cause could be.
    What are your genes in this case? Protein coding sequences, or also noncoding RNA? If noncoding is included, does it also include noncoding besides tRNA and rRNA?
    I ask because rRNA read removal (assuming you don't do it by mapping to your reference, but with e.g. SortMeRNA) is not fully reliable (depending on how close your organism is to anything in the database), and you might still end up with half a ton of rRNA/tRNA. And if that's not the case, tmRNA might also have a very high expression value.

    And what does the FastQC report say?



    • #3
      Hi bastian,

      Thank you for your fast answer!
      The rRNA/tRNA reads were filtered by mapping against the reference (not with SortMeRNA). But the tmRNA was a good point. I added it to my annotation (so genes in general, not only protein-coding ones) and found up to 13% more reads mapping inside genes. I am now trying to annotate the RNAs with Rfam and Infernal and hope to reduce the number of reads mapped outside genes.
      Nevertheless it is strange, and I cannot see the same mapping bias in other projects where I also didn't annotate the tmRNA (or any RNA prior to mapping, except tRNAs/rRNAs of course). But: you never stop learning!

      Has anyone else ever tested the number of mapped reads inside/outside genes? It would be interesting to know!
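
For anyone who wants to run the same inside/outside check on their own data, here is a minimal sketch of the counting idea discussed in this thread. It is not the script used for the attachment. The function names, the tuple layout for reads and genes, and the toy coordinates are all hypothetical. It assumes that a read counts as "inside" when it overlaps an annotated gene by at least one base, and that in a dUTP library the read maps to the strand opposite the transcript (hence the default strand flip; switch it off if your protocol differs).

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_read(read, genes, flip_strand=True):
    """Classify one read as inside_sense, inside_antisense or outside.

    read  : (start, end, strand) of the alignment, strand is "+" or "-"
    genes : list of (start, end, strand) annotation intervals
    flip_strand : assumption for dUTP libraries, where the read is expected
                  to map to the strand opposite the transcript
    """
    start, end, strand = read
    if flip_strand:
        strand = "-" if strand == "+" else "+"
    # Strands of all genes this read overlaps (simple linear scan).
    hit_strands = {
        g_strand
        for g_start, g_end, g_strand in genes
        if overlaps(start, end, g_start, g_end)
    }
    if not hit_strands:
        return "outside"
    # Assumption: if genes on both strands are hit, "sense" wins.
    return "inside_sense" if strand in hit_strands else "inside_antisense"


def summarize(reads, genes, flip_strand=True):
    """Count reads per category for one sample."""
    return Counter(classify_read(r, genes, flip_strand) for r in reads)


# Toy data, purely illustrative (not taken from the experiment above).
toy_genes = [(100, 200, "+"), (300, 400, "-")]
toy_reads = [
    (120, 150, "-"),  # overlaps the "+" gene, read strand flipped to "+"
    (130, 160, "+"),  # overlaps the "+" gene, read strand flipped to "-"
    (210, 250, "+"),  # intergenic
    (350, 380, "+"),  # overlaps the "-" gene, read strand flipped to "-"
    (500, 530, "-"),  # intergenic
]

toy_counts = summarize(toy_reads, toy_genes)
for category in ("inside_sense", "inside_antisense", "outside"):
    print(category, toy_counts[category])
```

Running this per sample and comparing the three fractions should show quickly whether some samples carry an unusually large share of reads outside the annotation. For real data the linear scan over genes would be slow; an interval index or an existing counting tool is the more practical choice.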
