  • Low reads mapping in RNA-Seq analysis

    Dear all.
    After starting to work with prokaryotic RNA a little while ago in order to perform RNA-Seq analysis, I have finally reached the bioinformatic analysis of the reads.
    I prepared the cDNA library following a modified TruSeq protocol for Illumina, and the quality of the preparation, checked on a Bioanalyzer High Sensitivity DNA chip, was very good.
    After sequencing, I used CLC Genomics Workbench to perform the RNA-Seq analysis. First of all, I ran the read quality-check tool, and the sequencing run seemed to be almost perfect, so I was very happy. But when I run the RNA-Seq Analysis tool included in the software (using default parameters), more than 60% of the reads do not map to my reference.
    I should say that as reference I use a multi-FASTA file containing all CDS, not the assembled, annotated genome.
    I was wondering why just 30% of the reads are mapped during the analysis. Now I think it might be due to the use of a CDS list: any read falling between two CDS, i.e. in an intergenic region, will not be mapped. Am I right? Do any of you have any other suggestions?
    Thank you very much in advance!!

  • #2
    How long are your "reference" fasta sequences? Since this is a prokaryote you do not need to account for introns, so in theory the alignments should be simpler. Have you tried using the "align to reference" workflow instead of RNA-seq under transcriptome analysis in CLC?

    Does FastQC (http://www.bioinformatics.babraham.a...ojects/fastqc/) give the data a reasonably clean bill of health, i.e. no over-represented primer dimers/adapters?

    • #3
      @GenoMax: He writes "intergenic regions", not "introns".

      @buthercup_ch: Why do you use CDS sequences instead of a genome sequence (as you indicate that one is available)? If the genome is poorly annotated, it is possible, although highly unlikely, that your reads map to as-yet unknown or unannotated genes. Instead, I would rather look for non-coding RNAs in your reads. But mapping to the genome will tell you more.

      • #4
        There's a lot of weird junk in prok RNA-seq that does not map well, though normally it's under 10% of the reads. Still, mapping to the full genome, as suggested, with a splice-capable aligner will give a much better picture of what's happening. Introns are rare in prokaryotes, but self-splicing genes do exist. Furthermore, many bacteria are capable of modifying their own DNA to combat viruses (and, alternatively, can have their DNA modified by viruses), which creates reads that appear to contain structural variations.

        It's also possible that you have some kind of contamination. I suggest BLASTing a thousand or so reads to NT and RefSeq Microbial to see what they are. They could be human, phiX, or some common bacterial contaminant, for example.
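A minimal sketch of how one might pull "a thousand or so" random reads out of a FASTQ file and write them as FASTA for BLAST. It assumes plain 4-line FASTQ records; the function names (`iter_fastq`, `sample_reads`, `to_fasta`) are made up for this illustration and are not part of CLC or BLAST.

```python
import random
from typing import Iterator, List, TextIO, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def iter_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (name, sequence) from a FASTQ stream with 4 lines per record."""
    lines = iter(handle)
    for header in lines:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = next(lines, "").strip()
        next(lines, "")  # '+' separator line
        next(lines, "")  # quality line (not needed for BLAST)
        yield header[1:].split()[0], seq


def sample_reads(handle: TextIO, n: int = 1000, seed: int = 0) -> List[Read]:
    """Reservoir-sample up to n reads without loading the whole file."""
    rng = random.Random(seed)
    reservoir: List[Read] = []
    for i, read in enumerate(iter_fastq(handle)):
        if i < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir


def to_fasta(reads: List[Read]) -> str:
    """Format the sampled reads as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)
```

The resulting FASTA text can be written to a file and submitted to BLAST; sampling from the unmapped reads only, rather than from all reads, would probably be more informative here.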

        • #5
          Well, what have you looked at? Have you pulled out some of the unmapped reads to examine them? Checked their quality? BLASTed them? Done de novo assembly on them, to see if you can make a contig?
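A rough sketch of pulling out the unmapped reads, assuming the mapping can be exported as a text SAM file that still contains the unmapped records (whether and how CLC exports this is an assumption here). It relies only on the SAM FLAG bit 0x4 ("segment unmapped"); `unmapped_reads` and `to_fastq` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) for every unmapped SAM record."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & UNMAPPED:
            yield fields[0], fields[9], fields[10]


def to_fastq(records: Iterable[Record]) -> str:
    """Format records as FASTQ so they can be quality-checked, BLASTed or assembled."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)
```

Usage would be along the lines of `to_fastq(unmapped_reads(open("mapping.sam")))`, where `mapping.sam` is a placeholder file name.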

          • #6
            I would bet this is caused by ribosomal RNA, which comprises the largest fraction of all reads and does not ribo-deplete well with standard kits like Ribo-Zero (at least this is what happened to us a couple of years ago). Of course, rRNA reads will not map to CDSs; mapping to the full genome will reveal them as multi-mappers.
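One way to check the multi-mapper idea after mapping to the full genome, sketched under two assumptions: the alignments are available as text SAM, and the aligner writes the optional `NH:i:` tag (number of reported alignments for the read). Not every aligner does, so records without the tag are counted separately. `classify_alignments` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def classify_alignments(sam_lines: Iterable[str]) -> Counter:
    """Count primary SAM records as unmapped, unique, multi or mapped_no_NH.

    With paired-end data each mate is its own record, so counts are per mate.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        # Optional tags start at column 12; look for NH:i:<count>.
        nh = next((int(t[5:]) for t in fields[11:] if t.startswith("NH:i:")), None)
        if nh is None:
            counts["mapped_no_NH"] += 1
        elif nh > 1:
            counts["multi"] += 1
        else:
            counts["unique"] += 1
    return counts
```

A large "multi" fraction would be consistent with the rRNA explanation, though it does not prove it; checking where those reads land relative to the annotated rRNA genes would be the obvious follow-up.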
