  • how to increase the mapping rate?

    I have an RNA-seq dataset of single-end 100 bp reads (30 million per sample). At first, using TopHat2, the mapping rate to the reference genome was only 5%. Then I trimmed the raw data to 40-100 bp, and the mapping rate increased to 18%. I'm mapping the untrimmed data right now...

    What else can I try to increase the mapping rate? Trim to a 50-100 bp read range? Raise the Phred score threshold based on FastQC?

    Any comments will be appreciated!

  • #2
    Can you post the FastQC plots so we can see what the data look like? There is no point in trimming the data at random.

    Take a few reads and do an old-fashioned BLAST search to make sure the data are from your sample/the correct genome. Mistakes sometimes happen at sequencing cores.
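
A minimal Python sketch of picking "a few reads" for BLAST, assuming an uncompressed FASTQ file with exactly four lines per record. The helpers read_fastq, reservoir_sample and to_fasta are hypothetical names for illustration, not part of any library. Reservoir sampling draws reads at random from the whole file in one pass, so the sample is not limited to the first reads in the file.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence) from a 4-line-per-record FASTQ stream.
    A truncated final record raises RuntimeError (PEP 479)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line (not needed for BLAST)
        # Drop the leading '@' and anything after the first whitespace.
        yield header[1:].split()[0], seq


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Draw k records uniformly at random in a single pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text, ready to paste into a BLAST search."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

For example, with open("1.fastq") as fh: print(to_fasta(reservoir_sample(read_fastq(fh), 10))) prints ten randomly chosen reads as FASTA.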

    • #3
      I just finished aligning the untrimmed data, and the rate is 15%:

      15.66% overall alignment rate

      I will post the fastqc plots soon. Thank you!

      • #4
        Attached are the FastQC results from before I trimmed.

        • #5
          These are the FastQC results after trimming with Trimmomatic (keeping reads of 40-100 bp), using this command:

          java -jar /usr/local/apps/trimmomatic/Trimmomatic-0.32/trimmomatic-0.32.jar SE 1.fastq 1.trimmed.fastq ILLUMINACLIP:/usr/local/apps/trimmomatic/Trimmomatic-0.32/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:40
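
To make the trimming steps in this command concrete, here is a simplified Python sketch of what LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15 and MINLEN:40 do to a single read, assuming Phred+33 quality encoding. It is an approximation written from the option descriptions (cut once the mean quality in a 4-base window drops below 15), not Trimmomatic's actual code, and it omits the ILLUMINACLIP adapter step. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_keep(quals: List[int], window: int = 4, min_avg: float = 15.0) -> int:
    """Number of leading bases to keep: cut at the start of the first window
    whose mean quality is below min_avg (a simplification of SLIDINGWINDOW)."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq: str, qual: str, leading: int = 3, trailing: int = 3,
              window: int = 4, min_avg: float = 15.0,
              min_len: int = 40) -> Optional[Tuple[str, str]]:
    """Apply LEADING-, TRAILING-, SLIDINGWINDOW- and MINLEN-like steps in order.
    Returns the trimmed (seq, qual), or None if the read is dropped."""
    q = phred33(qual)
    start = 0
    while start < len(q) and q[start] < leading:    # LEADING:3
        start += 1
    end = len(q)
    while end > start and q[end - 1] < trailing:    # TRAILING:3
        end -= 1
    end = start + sliding_window_keep(q[start:end], window, min_avg)  # SLIDINGWINDOW:4:15
    if end - start < min_len:                       # MINLEN:40
        return None
    return seq[start:end], qual[start:end]
```

With these settings a 100 bp read either comes out somewhere between 40 and 100 bp long or is discarded, which is what "trimmed to 40-100 bp" amounts to.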

          • #6
            Quality-score-wise there is no issue, so the problem must lie elsewhere. Data can be excellent and still not align at all, so this is only part of the QC. Report back on the BLAST result. Do the GC plots look strange?
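
One way to look at the GC question outside FastQC is to compute each read's GC fraction and bin it by whole percent. This follows the same idea as FastQC's per-sequence GC content plot but is only a sketch, not FastQC's implementation; gc_fraction and gc_histogram are hypothetical names, and N bases are simply ignored here. A distribution with an unusual shape, such as an extra sharp peak, can point to contamination or an over-represented subset of reads.

```python
from collections import Counter
from typing import Iterable


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among the A/C/G/T bases of a read (Ns ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def gc_histogram(seqs: Iterable[str]) -> Counter:
    """Count reads per whole-percent GC bin (0-100)."""
    return Counter(round(100 * gc_fraction(s)) for s in seqs)
```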

            • #7
              I don't really understand what you mean by "trimming to 40-100bp". But, it would not surprise me if your problem was adapter contamination; do you know what kind of adapters were used? They might not be TruSeq.
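
A quick Python sketch for checking the adapter question: it estimates the fraction of reads that contain AGATCGGAAGAGC, the 13 bp sequence at the start of the Illumina TruSeq adapters, either in full or as a partial match at the read's 3' end. The constant name, the function names and the 5 bp minimum overlap are assumptions for illustration. Short partial matches will produce some false positives, so treat the result as a rough estimate; to test a different adapter, pass its sequence as the adapter argument.

```python
from typing import Iterable

# Assumed adapter prefix: the 13 bp shared start of the Illumina TruSeq adapters.
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def adapter_position(seq: str, adapter: str = TRUSEQ_PREFIX, min_overlap: int = 5) -> int:
    """Index where the adapter starts in seq, or -1 if not found.
    Also detects a partial adapter (at least min_overlap bases) at the 3' end."""
    i = seq.find(adapter)
    if i != -1:
        return i
    # Try the longest partial overlaps first.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return -1


def adapter_rate(seqs: Iterable[str], adapter: str = TRUSEQ_PREFIX, min_overlap: int = 5) -> float:
    """Fraction of reads with a full or partial adapter hit."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if adapter_position(s, adapter, min_overlap) != -1)
    return hits / len(seqs)
```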

              • #8
                What organism are you working with, and what is your reference?
                I saw FastQC results like these just recently.
                The cause was severe rRNA contamination. Maybe the mRNA enrichment / ribo-depletion didn't work (or wasn't done)?
                If those sequences are not (or are only partially) represented in your reference, you obviously cannot map reads to them. Look at the sequence duplication levels: an increase at the 10k level is one indication of this. If you are working with human samples, the relatively high GC content is another.
                To verify this, simply use the rRNA sequences as the reference and map your reads to them.
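
The duplication check can be sketched in Python by counting identical reads. duplication_levels and fraction_highly_duplicated are hypothetical helpers; they count exact duplicates over every read passed in, which is simpler than FastQC's own estimate, so the numbers will not match its plot exactly. A large fraction of reads belonging to sequences seen 10,000 or more times is the kind of signal described above for rRNA contamination.

```python
from collections import Counter
from typing import Iterable


def duplication_levels(seqs: Iterable[str]) -> Counter:
    """Map duplication level (copies of an identical sequence) to the
    total number of reads at that level."""
    counts = Counter(seqs)
    reads_per_level: Counter = Counter()
    for n in counts.values():
        reads_per_level[n] += n
    return reads_per_level


def fraction_highly_duplicated(seqs: Iterable[str], threshold: int = 10_000) -> float:
    """Fraction of all reads whose exact sequence occurs at least `threshold` times."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    counts = Counter(seqs)
    dup = sum(n for n in counts.values() if n >= threshold)
    return dup / len(seqs)
```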

                • #9
                  In reply to GenoMax (#6):
                  Here is the overall FastQC report.

                  • #10
                    In reply to WhatsOEver (#8):
                    The reference is the honeybee genome, which is only in its second version so far. Thank you for your suggestion. I think the problem may be a low-quality library prep.

                    • #11
                      In reply to Brian Bushnell (#7):
                      The library was made with the NEBNext® RNA Library Prep Kit for Illumina, so it should have TruSeq adapters.

                      • #12
                        Looking at the FastQC output (by the way, a new, slightly better FastQC version is available), I can only say again that it looks very similar to our rRNA-"contaminated" samples. Interestingly, we also used the NEB kit...

                        • #13
                          @bbm: Were you mapping to the entire genome or just the transcriptome?
