  • bbm
    Member
    • Sep 2011
    • 38

    how to increase the mapping rate?

    I have an RNA-seq dataset of single-end 100 bp reads (30 million per sample). At first, using TopHat2, the mapping rate to the reference genome was only 5%. I then trimmed the raw reads to 40-100 bp, and the mapping rate increased to 18%. I'm running the mapping with untrimmed data right now...

    What else can I try to increase the mapping rate? Trim reads to 50-100 bp? Raise the Phred quality threshold based on the FastQC results?

    Any comments will be appreciated!
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Can you post the FastQC plots of what the data looks like? No point in doing random trimming of data.

    Take a few reads and do an old fashioned blast to make sure the data is from your sample/correct genome. Mistakes sometimes happen at sequencing cores.
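A minimal sketch of pulling a few random reads out of a FASTQ file as FASTA so they can be pasted into a BLAST search. It assumes plain 4-line FASTQ records; the function names and the in-memory demo data are hypothetical and only for illustration.

```python
import io
import random
from itertools import islice

def read_fastq(handle):
    """Yield (read_id, sequence) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record[0].strip()[1:], record[1].strip()

def sample_reads_as_fasta(handle, n=10, seed=0):
    """Reservoir-sample up to n reads and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(read_fastq(handle)):
        if i < n:
            reservoir.append(rec)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)

# Demo on made-up in-memory data; for real data, pass open("1.fastq") instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTTT\n+\nIIII\n")
print(sample_reads_as_fasta(demo, n=2))
```

Reservoir sampling is used here so the whole 30-million-read file never has to be held in memory.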

    • bbm
      Member
      • Sep 2011
      • 38

      #3
      I just finished the alignment without trimming, and the mapping rate is about 15%:

      15.66% overall alignment rate

      I will post the FastQC plots soon. Thank you!

      • bbm
        Member
        • Sep 2011
        • 38

        #4
        Attached here is the FastQC report from before trimming.
        Attached Files

        • bbm
          Member
          • Sep 2011
          • 38

          #5
          This is the FastQC report after trimming with Trimmomatic (reads of 40-100 bp):

          java -jar /usr/local/apps/trimmomatic/Trimmomatic-0.32/trimmomatic-0.32.jar SE 1.fastq 1.trimmed.fastq ILLUMINACLIP:/usr/local/apps/trimmomatic/Trimmomatic-0.32/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40
          Attached Files
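A quick way to confirm what "40-100bp" means in practice is to check the read lengths in the trimmed file. With `MINLEN:40`, Trimmomatic drops reads shorter than 40 bases, so the output should span 40 to 100. The sketch below assumes plain 4-line FASTQ records; `length_summary` is a hypothetical helper and the demo data is made up.

```python
import io
from collections import Counter
from itertools import islice

def length_summary(handle):
    """Return (shortest, longest, read_count) for a FASTQ stream, or None if empty."""
    lengths = Counter()
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        lengths[len(record[1].strip())] += 1
    if not lengths:
        return None
    return min(lengths), max(lengths), sum(lengths.values())

# Demo on made-up in-memory data; for real data, pass open("1.trimmed.fastq").
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(length_summary(demo))
```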

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Q-score wise there is no issue, so the problem must lie elsewhere. It is possible to get great data that may not align at all so this is only part of the QC. Report back on the blast result. Do the GC plots look strange?

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              I don't really understand what you mean by "trimming to 40-100bp". But, it would not surprise me if your problem was adapter contamination; do you know what kind of adapters were used? They might not be TruSeq.
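One rough check for adapter contamination is to count how many raw reads contain the start of the adapter. The sketch below assumes plain 4-line FASTQ records and assumes the adapter begins with `AGATCGGAAGAGC`, the common start of TruSeq-style Illumina adapters; if the library used a different adapter, substitute its sequence. `adapter_fraction` is a hypothetical helper and the demo reads are made up.

```python
import io
from itertools import islice

# Assumption: common leading bases of TruSeq-style Illumina adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def adapter_fraction(handle, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads that contain the adapter string exactly."""
    total = hits = 0
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        total += 1
        if adapter in record[1]:
            hits += 1
    return hits / total if total else 0.0

# Demo on made-up in-memory data; for real data, pass open("1.fastq").
demo = io.StringIO(
    "@r1\nACGTACGTAGATCGGAAGAGCAC\n+\nIIIIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTACGTACGTACG\n+\nIIIIIIIIIIIIIIIIIIIIIII\n"
)
print(adapter_fraction(demo))
```

This only finds exact matches, so it underestimates contamination when the adapter contains sequencing errors or is cut off at the read end. A high fraction on untrimmed reads and near zero after trimming would suggest the adapter file is the right one.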

              • WhatsOEver
                Senior Member
                • Apr 2012
                • 215

                #8
                What organism are you working with and what is your reference?
                I have seen such fastqc results just recently.
                The reason was a severe rRNA contamination. Maybe mRNA enrichment / ribo-depletion didn't work (or wasn't done)?
                If the respective sequences are not (or are only partially) represented in your reference, you can of course not map to them. Look at the sequence duplication levels: if there is an increase at 10k, this is an indication for that. If you are working with human samples, the relatively high GC content is another one.
                To verify this, simply use the rRNA sequences as reference and map to them.
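Before mapping against rRNA sequences, the two hints mentioned above (duplication and GC content) can be checked directly on a subset of reads. This is a minimal sketch assuming plain 4-line FASTQ records; `gc_and_top_sequences` and the `max_reads` cap are hypothetical choices, and the demo reads are made up. The most frequent sequences it reports are good candidates to BLAST against rRNA.

```python
import io
from collections import Counter
from itertools import islice

def gc_and_top_sequences(handle, max_reads=200_000, top=5):
    """Return (overall GC fraction, most common exact read sequences)
    computed over at most max_reads reads from a FASTQ stream."""
    counts = Counter()
    gc = bases = 0
    for _ in range(max_reads):
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        seq = record[1].strip().upper()
        counts[seq] += 1
        gc += seq.count("G") + seq.count("C")
        bases += len(seq)
    gc_fraction = gc / bases if bases else 0.0
    return gc_fraction, counts.most_common(top)

# Demo on made-up in-memory data; for real data, pass open("1.fastq").
demo = io.StringIO(
    "@r1\nGGCC\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nATAT\n+\nIIII\n"
)
gc_fraction, top_seqs = gc_and_top_sequences(demo)
print(round(gc_fraction, 3), top_seqs)
```

The cap on reads keeps memory use bounded, at the cost of only seeing the start of the file.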

                • bbm
                  Member
                  • Sep 2011
                  • 38

                  #9
                  Originally posted by GenoMax View Post
                  Q-score wise there is no issue, so the problem must lie elsewhere. It is possible to get great data that may not align at all so this is only part of the QC. Report back on the blast result. Do the GC plots look strange?
                  Here is the overall FastQC report.
                  Attached Files

                  • bbm
                    Member
                    • Sep 2011
                    • 38

                    #10
                    Originally posted by WhatsOEver View Post
                    What organism are you working with and what is your reference?
                    I have seen such fastqc results just recently.
                    The reason was a severe rRNA contamination. Maybe mRNA enrichment / ribo-depletion didn't work (or wasn't done)?
                    If the respective sequences are not (or are only partially) represented in your reference, you can of course not map to them. Look at the sequence duplication levels: if there is an increase at 10k, this is an indication for that. If you are working with human samples, the relatively high GC content is another one.
                    To verify this, simply use the rRNA sequences as reference and map to them.
                    The reference is the honeybee genome, the second version so far. Thank you for your suggestion. I think the problem may be a low-quality library prep.

                    • bbm
                      Member
                      • Sep 2011
                      • 38

                      #11
                      Originally posted by Brian Bushnell View Post
                      I don't really understand what you mean by "trimming to 40-100bp". But, it would not surprise me if your problem was adapter contamination; do you know what kind of adapters were used? They might not be TruSeq.
                      The library was prepared with the NEBNext® RNA Library Prep Kit for Illumina, so it should have TruSeq adaptors.

                      • WhatsOEver
                        Senior Member
                        • Apr 2012
                        • 215

                        #12
                        From looking at the fastqc output (btw: there is a new, slightly better fastqc version available), I can only say again that it looks very similar to our rRNA "contaminated" samples. More interestingly, we also used the NEB kit...

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          @bbm: Were you mapping to the entire genome or just the transcriptome?
