  • Rna

    I aligned RNA sequencing data to the human genome using HISAT. To my surprise, the alignment rate was quite low, about 75%. So, I want to know:
    1) Why was the alignment rate (~75%) so low?
    2) For RNA sequencing data from human tissue, which alignment software is most suitable: HISAT, TopHat2, or STAR?

    Thanks a lot!

  • #2
    No one can tell you why you only got a 75% alignment rate without looking at your data. Perhaps you have some notable adapter contamination or quality issues and didn't trim (or used end-to-end alignment). Perhaps you have bacterial or other contamination; have a look at some of the unaligned reads.

    Generally HISAT or STAR would be used for RNA-seq data, since TopHat2 is just too slow. I should note that STAR is nice in that it will give you a summary of why it couldn't align reads (e.g., they're too short, too many possible hits, etc.).
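
A minimal sketch of the "look at the unaligned reads" suggestion, assuming the unaligned reads are available as a standard 4-line-per-record FASTQ. The adapter string below is an assumption (a prefix commonly seen in Illumina TruSeq libraries); swap in whatever adapter your library prep used. The function names are made up for this example.

```python
from itertools import islice

# Assumption: adapter prefix commonly seen in Illumina TruSeq libraries.
# Replace it with the adapter from your own library prep.
ADAPTER = "AGATCGGAAGAGC"


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(lines, adapter=ADAPTER, max_reads=100_000):
    """Return the fraction of the first max_reads reads that contain the adapter."""
    total = 0
    hits = 0
    for seq in islice(read_fastq_sequences(lines), max_reads):
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory example; for real data pass an open file handle instead.
    demo = [
        "@r1", "ACGTACGTAGATCGGAAGAGCACAC", "+", "IIIIIIIIIIIIIIIIIIIIIIIII",
        "@r2", "ACGTACGTACGTACGTACGTACGTA", "+", "IIIIIIIIIIIIIIIIIIIIIIIII",
    ]
    print(adapter_fraction(demo))
```

If a large share of the unaligned reads contains the adapter, trimming is the likely culprit. If not, searching a handful of the unaligned reads against a nucleotide database can help reveal contamination.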


    • #3
      Originally posted by dpryan
      No one can tell you why you only got a 75% alignment rate without looking at your data. Perhaps you have some notable adapter contamination or quality issues and didn't trim (or used end-to-end alignment). Perhaps you have bacterial or other contamination; have a look at some of the unaligned reads.

      Generally HISAT or STAR would be used for RNA-seq data, since TopHat2 is just too slow. I should note that STAR is nice in that it will give you a summary of why it couldn't align reads (e.g., they're too short, too many possible hits, etc.).
      Thank you, dpryan!
      Actually, adapters have already been removed from these RNA sequencing data, and low-quality reads have been filtered out.
      At first, I suspected that HISAT caused the low mapping rate. So I ran TopHat2 (v2.1.0) with the same reference genome and the same input RNA sequencing data, but its mapping rate was only about 70%. So I think the low mapping rate is probably caused not by the alignment software but by the reference genome I chose.
      It is worth mentioning that the reference genome sequence I downloaded from Ensembl is the “dna_rm” type (masked genomic DNA). Maybe that is the cause. Next, I will test the “dna” type (unmasked) reference genome.


      • #4
        Yup, using a hard masked reference would cause that. Just use either the soft masked or unmasked (the results will be the same).
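
A rough way to check which kind of reference you have, sketched under the assumption that the genome is a plain (uncompressed) FASTA: a hard-masked file replaces repeats with N, a soft-masked file writes them in lowercase, and an unmasked file has neither beyond the usual assembly gaps. The function names are made up for this example, and the per-character loop is slow on a whole genome, so try it on a single chromosome first.

```python
def masking_summary(fasta_lines):
    """Count N, lowercase, and other bases across all FASTA sequence lines."""
    counts = {"n": 0, "lower": 0, "other": 0}
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        for base in line:
            if base in "Nn":
                counts["n"] += 1       # hard-masked base or assembly gap
            elif base.islower():
                counts["lower"] += 1   # soft-masked base
            else:
                counts["other"] += 1   # ordinary uppercase base
    return counts


def fraction_n(counts):
    """Return the share of all bases that are N."""
    total = sum(counts.values())
    return counts["n"] / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory example; for real data pass an open file handle instead.
    demo = [">chr_demo", "ACGTNNNNacgt", "NNNNACGT"]
    summary = masking_summary(demo)
    print(summary, fraction_n(summary))
```

A very large N fraction points to a hard-masked ("dna_rm") file, and reads from repetitive regions have nothing to align to in such a reference.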


        • #5
          Originally posted by dpryan
          Yup, using a hard masked reference would cause that. Just use either the soft masked or unmasked (the results will be the same).

          Thank you, dpryan. I replaced the masked reference with the unmasked genome reference, and the mapping rate rose to about 90%. But now I have another question: the discordant alignment rate is very high. The results are at the following links. Could you help me take a look at them? Thank you so much.

          Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)


          P.S. The HISAT results with the soft-masked reference were the same as with the unmasked one, and all parameters were left at their defaults.
          Last edited by skly; 08-27-2015, 11:31 PM.
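
One way to dig into a high discordant rate is to tally pair categories from the FLAG column of the SAM output. The sketch below uses only the flag bits defined in the SAM specification. It assumes the aligner sets the proper-pair bit (0x2) for concordant pairs, so pairs with both mates mapped but without that bit are treated as candidates for "discordant"; the category names and function names are made up for this example.

```python
from collections import Counter

# FLAG bits from the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_pair(flag):
    """Classify a pair from the FLAG of its first mate; return None for records to skip."""
    if not flag & PAIRED or not flag & FIRST_IN_PAIR:
        return None  # count each pair once, via its first mate
    if flag & (SECONDARY | SUPPLEMENTARY):
        return None  # only primary records are counted
    if flag & UNMAPPED and flag & MATE_UNMAPPED:
        return "unaligned"
    if flag & UNMAPPED or flag & MATE_UNMAPPED:
        return "one_mate_aligned"
    if flag & PROPER_PAIR:
        return "concordant"
    return "not_proper_pair"  # both mates mapped, pair not flagged as proper


def tally_pairs(flags):
    """Count pair categories over an iterable of integer FLAG values."""
    tally = Counter()
    for flag in flags:
        category = classify_pair(flag)
        if category is not None:
            tally[category] += 1
    return tally


if __name__ == "__main__":
    # Hand-picked FLAG values for illustration, not real alignment output.
    print(tally_pairs([99, 147, 65, 77, 73, 355]))
```

For a real file, the flags would come from the second column of each non-header SAM line. If most pairs land in "not_proper_pair", the insert size and mate orientation of those pairs are worth inspecting.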
