
  • Unmapped RNA-seq reads consist of repeated nucleotides (short homopolymeric regions)

    Hello!

    I have a low mapping rate for my SOLiD RNA-seq data (organism: a bacterium), around 30-40%, although we usually get 70-80%. I extracted the unmapped reads and the reads with multiple hits (all of these are poorly aligned and discarded from further analysis), and found the following:

    1) average quality is the same as for good samples (~26 bases)
    2) unmapped reads are enriched for TTTTT, and multiple-hit reads are enriched for various other k-mers (most of them consistent between samples)
    3) GC content is higher (53-55%) for unmapped and multiple-hit reads than for mapped reads (40%)
    4) if I look at the reads, they seem to consist of short stretches of repeated nucleotides:

    >178_1751_207_F3
    AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGT
    ACAAGGAGGGGAGAT
    >178_1751_758_F3
    CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGG
    GGGGACGGAGATGCG
    >178_1752_2_F3
    AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCT
    TGCATCTAAATTTAT

    I also tried to assemble the reads with Trinity, but all of the resulting contigs map to our bacterium. Mapping against the human genome did not give anything. It does not look like biological contamination. I checked for adapters and trimmed the reads, but that changed nothing.
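
    For anyone who wants to repeat checks 2-4 on their own unmapped reads, here is a minimal sketch in plain Python. The function names are hypothetical and the reads are assumed to be base-space strings already loaded into memory (the three reads shown above are used as sample data); it is not the tooling used in the original analysis.

```python
from collections import Counter
from itertools import groupby


def gc_content(seq):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Return (base, length) of the longest run of one repeated base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


def kmer_counts(reads, k=5):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# The three example reads from the post, with wrapped lines joined.
reads = [
    "AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGTACAAGGAGGGGAGAT",
    "CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGGGGGGACGGAGATGCG",
    "AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCTTGCATCTAAATTTAT",
]

for read in reads:
    print(round(gc_content(read), 3), longest_homopolymer(read))
print(kmer_counts(reads, k=5).most_common(5))
```

    Comparing the top k-mers and the GC distribution between the mapped and unmapped sets is what exposes the kind of enrichment described in points 2 and 3.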
    Last edited by ritandr; 05-28-2014, 01:24 AM.

  • #2
    Just because the unmapped reads do not match the human genome does not mean they are not contamination. I have found mouse contamination in tomato sequences.



    • #3
      SOLiD reads should be in colorspace; you can't accurately convert them to base-space without mapping them. So, how did you generate those base-space reads in your post? Multi-hit and unmapped reads are fundamentally different. Also, it's hard to correctly convert a poorly-aligned read to base-space.

      In summary, I think you need to BLAST the original colorspace reads (assuming there's a colorspace version of BLAST) to see what they are.
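
      To illustrate why naive conversion is risky, here is a minimal sketch of direct colorspace decoding. It assumes the usual two-base encoding (A=0, C=1, G=2, T=3, with each color equal to the XOR of adjacent base codes) and a csfasta-style read that starts with the primer base and contains no missing calls ('.'). The function name is hypothetical.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def decode_colorspace(read):
    """Decode a read such as 'T0123': a primer base followed by colors 0-3.

    Each base is the previous base XOR the color, so every call depends on
    all of the calls before it. The primer base itself is not emitted.
    """
    prev = CODE[read[0].upper()]
    decoded = []
    for color in read[1:]:
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)


correct = decode_colorspace("T0123")
one_error = decode_colorspace("T0023")  # second color miscalled (1 -> 0)
print(correct)    # TGAT
print(one_error)  # TTCG
```

      A single miscalled color changes every base downstream of it, which is why colorspace-aware mappers correct color errors against the reference before translating.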



      • #4
        Thank you for the answers,

        I did not find any difference in mapping percentage between color-space reads mapped with LifeScope and base-space reads mapped with Bowtie2, so the problem is not the conversion. I BLASTed around 1300 unmapped reads against the nucleotide database (nt): quite a lot of them (25%) hit rRNA genes, and 50% hit complete genome sequences of several bacteria (Bacillus and Enterococcus). These species are the same for two different 'bad' samples, but it is impossible that they contaminate our samples. If I map against Bacillus and Enterococcus species, I get a higher percentage of mapped reads (40-50%) than for our bacterium, but all of them are multiple-hit reads and almost none are unique. So it looks like rRNA contamination, but I do not understand from which source. The sample preparation also included rRNA exclusion...
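
        A tally like the one above can be scripted. This is a minimal sketch that assumes the BLAST results were saved in the default 12-column tabular format (`-outfmt 6`: query ID, subject ID, ..., bit score in the last column). The function names and the example rows are hypothetical, not taken from the actual run.

```python
from collections import Counter


def best_hits(lines):
    """Keep the highest-bit-score hit per query from tabular BLAST lines."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def tally_subjects(lines):
    """Count how many reads have each subject as their best hit."""
    return Counter(subject for subject, _ in best_hits(lines).values())


# Made-up rows in the 12-column layout, for illustration only.
example = [
    "read1\tsubjA\t98.0\t75\t1\t0\t1\t75\t100\t174\t1e-30\t130",
    "read1\tsubjB\t90.0\t75\t7\t0\t1\t75\t200\t274\t1e-20\t100",
    "read2\tsubjA\t99.0\t75\t0\t0\t1\t75\t100\t174\t1e-32\t135",
    "read3\tsubjC\t95.0\t70\t3\t0\t1\t70\t10\t79\t1e-25\t115",
]
print(tally_subjects(example).most_common())
```

        Grouping the subject IDs by annotation (rRNA gene versus whole genome) would still need the subject titles, which this sketch does not parse.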
