  • TopHat2 - Low percentage of mapped reads

    Hi, everyone!
    I'm worried about my mapping results with TopHat2.
    I'm working with five sets of reads and these are the mapping results:

    S1 - 65.8 %
    S2 - 29.6 %
    S3 - 80.3 %
    S4 - 15.4 %
    S5 - 65.3 %

    I am mapping them to a reference genome, with a GTF annotation file, using a command like this:

    $ tophat -p 8 -G genes.gtf -o C1_R1_thout --library-type=fr-unstranded \
    genome C1_R1_1.fq C1_R1_2.fq

    What should I do about the S2 and S4?

    Would it be worth trying to map all samples with another tool like Trinity? Should I look at the unmapped reads with BLAT? Or would the best thing be to loosen my parameters, using higher values for "--read-mismatches", "--read-gap-length" and "--read-edit-dist"?
    Thanks a lot in advance, and best regards!
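    For reference, a rerun with looser alignment filters might look like the sketch below. It assumes, as we read the TopHat2 manual, that all three options default to 2 and that `--read-edit-dist` must be at least as large as the other two. The `tophat_loose` function, the `DRY_RUN` switch and the `_loose_thout` output name are hypothetical. Looser thresholds can also admit spurious alignments, so this does not replace checking what S2 and S4 actually contain.

```shell
#!/bin/sh
# Sketch: rerun TopHat2 with looser mismatch/gap/edit-distance filters.
# tophat_loose and DRY_RUN are hypothetical helpers, not part of TopHat.
# File names follow the C1_R1_1.fq / C1_R1_2.fq pattern used above.

tophat_loose() {
  local sample=$1 mm=${2:-3} gap=${3:-3}
  # Smallest edit distance that is still >= both other limits.
  local ed=$(( mm > gap ? mm : gap ))
  set -- tophat -p 8 -G genes.gtf -o "${sample}_loose_thout" \
    --library-type=fr-unstranded \
    --read-mismatches "$mm" --read-gap-length "$gap" --read-edit-dist "$ed" \
    genome "${sample}_1.fq" "${sample}_2.fq"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"   # only print the command line
  else
    "$@"                 # actually run TopHat2
  fi
}

# Usage (hypothetical sample name): DRY_RUN=1 tophat_loose C2_R1 3 2
```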

  • #2
    Trinity is not a mapping tool. Trinity is a de novo transcriptome assembly tool. You may end up using it, but it should not be the first tool you reach for to figure out this problem.

    What you need to do is figure out what species/reference your S2 and S4 reads come from. Yes, yes, yes. I know you are probably insisting that they have to come from the same reference as S1, S3 and S5, but throughout the years I have encountered a number of samples that were not from what the customer said they came from. Contamination, lab mistakes, or even de novo discovery (this is where Trinity could be useful) all make for interesting TopHat results. As a side note, I am currently working on a transcriptome project where we found not only one sample with cricket RNA instead of bacterial RNA (it turns out the customer's lab mate works on crickets) but also multiple samples that appear to be from different but related bacteria. Or perhaps our samples are from a yet-uncharacterized bacterium?

    Anyway, two things to do:

    1) If you have not done so, run FastQC on your samples to double-check their quality.

    2) Take a couple thousand reads from each sample and map them to NT (nucleotide database) and see if they are all mapping to the same species.

    That should give you some insights.
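    A minimal sketch of both checks, assuming FastQC, GNU coreutils (`shuf`) and NCBI BLAST+ are installed and that the FASTQ files use unwrapped four-line records. The `fastq_sample_to_fasta` helper, the `qc` directory and the file names are hypothetical. Reads are drawn at random rather than taken from the top of the file, and the count is a parameter, so you can BLAST a couple thousand reads or just a few dozen; for thousands of queries a local copy of nt is more practical than `-remote`.

```shell
#!/bin/sh
# Sketch: quality check plus a random read sample for a BLAST species check.
# Assumes 4-line FASTQ records (no wrapped sequences) and GNU shuf.

# 1) Quality report ("qc" is a placeholder directory; FastQC needs it to exist):
#    mkdir -p qc && fastqc -t 8 -o qc C1_R1_1.fq C1_R1_2.fq

# 2) Randomly draw n reads from a FASTQ and print them as FASTA.
fastq_sample_to_fasta() {
  # Join each 4-line record into one tab-separated line, shuffle, keep n,
  # then turn "@id ...<TAB>SEQ" into ">id ..." followed by "SEQ".
  paste - - - - < "$1" | shuf -n "${2:-2000}" |
    awk -F '\t' '{ sub(/^@/, ">", $1); print $1; print $2 }'
}

# Example with hypothetical file names:
#   fastq_sample_to_fasta C2_R1_1.fq 2000 > S2_sample.fa
#   blastn -query S2_sample.fa -db nt -remote -outfmt 6 -max_target_seqs 5 \
#     > S2_sample_vs_nt.tsv
```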


    • #3
      Originally posted by westerman:
      2) Take a couple thousand reads from each sample and map them to NT (nucleotide database) and see if they are all mapping to the same species.

      That may be overkill. Even with 20-30 reads, the problem should become apparent if there is obvious contamination with foreign DNA.


      • #4
        Hi everyone,

        I've just read these messages and I was wondering whether 60-80% is a good mapping result. I got the same result with my RNA-seq data, and although many papers report this percentage of mapped reads, shouldn't it be higher, since the mapping is against a reference genome?

        Cheers
        Pablo


        • #5
          Originally posted by pcalzadilla:
          I was wondering whether 60-80% is a good mapping result.

          It depends on your reference. Not all are as good as, say, human. In the plant and animal sequencing I deal with, I am often lucky to find a reference that is (a) highly characterized and (b) related within the last couple million years to the organism I am working with.

          For human 60% mapping would be poor. For an unknown fungus it could be very good.


          • #6
            Clearly the data that mapped is fine, so that part (60-80%) is a good result.

            If you are curious as to why the rest did not map, you can take those reads and run BLAST on a few to see if you can get a quick answer. If there is obvious contamination (from an unrelated species), then you need to start worrying.
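            A rough sketch of that check, assuming a recent samtools (1.3 or later, for `samtools fasta`), GNU `shuf`, BLAST+, and TopHat's `unmapped.bam` in the output directory. The `blast_hit_summary` helper and all file names are hypothetical. It reports how many of the sampled reads got any hit and which subject sequences come up most often, which you can then look up to see whether an unrelated species dominates.

```shell
#!/bin/sh
# Sketch: summarize a BLAST tabular (-outfmt 6) result for sampled unmapped reads.
# blast_hit_summary is a hypothetical helper:
#   $1 = query FASTA, $2 = BLAST table, $3 = how many top subjects to list.

blast_hit_summary() {
  local total hit
  total=$(grep -c '^>' "$1")
  # Column 1 of -outfmt 6 is the query id; count each read only once.
  hit=$(cut -f1 "$2" | sort -u | wc -l | tr -d ' ')
  echo "reads with a hit: $hit / $total"
  # Most frequent subject ids (column 2) across all hit rows.
  cut -f2 "$2" | sort | uniq -c | sort -rn | head -n "${3:-5}"
}

# Example with hypothetical names:
#   samtools fasta C2_R1_thout/unmapped.bam | paste - - | shuf -n 30 | tr '\t' '\n' > S2_unmapped.fa
#   blastn -query S2_unmapped.fa -db nt -remote -outfmt 6 > S2_unmapped_vs_nt.tsv
#   blast_hit_summary S2_unmapped.fa S2_unmapped_vs_nt.tsv
```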

            Have you scanned/trimmed the data for the presence of adapters etc.?


            • #7
              Yes, the trimming was OK! However, I will run BLAST on those unmapped reads to rule out any possible contamination.

              Thanks a lot!


              • #8
                Originally posted by pcalzadilla:
                Yes, the trimming was OK! However, I will run BLAST on those unmapped reads to rule out any possible contamination.

                Well, those unmapped reads are not going to contribute to read counts, but a consistent presence of unexpected foreign sequences in your samples (if that is what BLAST shows) is not a good thing. They could be influencing your experiment in unexpected ways and may lead to erroneous results.
                Last edited by GenoMax; 01-25-2016, 09:23 AM.


                • #9
                  GenoMax has a good point: check the unmapped reads to see if they are from a different species. But, as I mentioned, if you are working with a poorly characterized species, you may just find that those unmapped reads simply do not map to anything.


                  • #10
                    In this case a negative BLAST result would be good, as @Rick points out (I would be curious to know if the result actually turns out to be negative).


                    • #11
                      I ran BLAST on my unmapped reads and the results were negative, so that's a good result, as you said. Consequently, my mapping rate of about 70% is probably due to the incomplete reference genome I used. Am I right?

                      Thanks a lot
                      Pablo


                      • #12
                        Originally posted by pcalzadilla:
                        Consequently, my mapping rate of about 70% is probably due to the incomplete reference genome I used. Am I right?

                        That would be my guess, and it is what I would tell my customers with similar results.


                        • #13
                          @pcalzadilla: You could try to assemble all remaining unmapped reads to derive some additional information, but that may or may not be of interest depending on what kind of genome you are working with (complexity) and/or what the aim of your experiment is.
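                          If you do go down that route, a sketch might look like this. It assumes a recent samtools 1.x, Trinity 2.x (which, as far as we know, requires the word "trinity" in the `--output` directory name), and that TopHat's `unmapped.bam` carries usable READ1/READ2 flags; some TopHat versions reportedly write incomplete mate information there, so check the split files before trusting them. The `assemble_unmapped` and `run` functions, the `DRY_RUN` switch, and the memory and CPU values are hypothetical.

```shell
#!/bin/sh
# Sketch: assemble a sample's unmapped read pairs with Trinity.
# assemble_unmapped, run and DRY_RUN are hypothetical helpers.

run() {
  # Print the command when DRY_RUN is set, otherwise execute it.
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

assemble_unmapped() {
  local out=$1   # TopHat output directory, e.g. C2_R1_thout
  # Name-sort so mates are adjacent, then split them into _1/_2 FASTQ files.
  run samtools sort -n -o "$out/unmapped.byname.bam" "$out/unmapped.bam" &&
  run samtools fastq -1 "$out/unmapped_1.fq" -2 "$out/unmapped_2.fq" \
    -s "$out/unmapped_single.fq" "$out/unmapped.byname.bam" &&
  run Trinity --seqType fq --max_memory 20G --CPU 8 \
    --left "$out/unmapped_1.fq" --right "$out/unmapped_2.fq" \
    --output "trinity_$(basename "$out")"
}

# Usage: DRY_RUN=1 assemble_unmapped C2_R1_thout   # print the three commands
```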


                          • #14
                            I was thinking about running TopHat again on my unmapped reads.
                            Can anyone suggest parameter values that loosen the stringency of the analysis, without losing mapping quality, so that more reads map?


                            • #15
                              Have you verified that those unmapped reads are matching the right genome?
