
**westerman** · 01-06-2016, 11:03 AM

Trinity is not a mapping tool. Trinity is a de novo transcriptome assembly tool. You may end up using it, but it should not be the first tool you reach for to figure out this problem.

What you need to do is figure out what species/reference your S2 and S4 reads come from. Yes, yes, yes. I know you are probably insisting that they have to come from the same reference as S1, S3 and S5, but over the years I have encountered a number of samples that did not come from what the customer said they came from. Contamination, lab mistakes, or even de novo discovery (this is where Trinity could be useful) all make for interesting TopHat results. As a side note, I am currently working on a transcriptome project where not only did we find one sample with cricket RNA instead of bacterial RNA (turns out the customer's lab mate works on crickets) but also multiple samples which appear to be from different but related bacteria. Or perhaps our samples are from an as-yet uncharacterized bacterium?

Anyway, two things to do:

1) If you have not already done so, run FastQC on your samples to double-check their quality.

2) Take a couple thousand reads from each sample and map them to NT (the NCBI nucleotide database) and see if they are all mapping to the same species.

That should give you some insights.
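As a rough illustration of step 2, here is a minimal Python sketch for pulling a random subset of reads out of a FASTQ file and writing them as FASTA, ready to submit to BLAST against NT. It assumes plain 4-line FASTQ records. The function names (`read_fastq`, `sample_reads`, `to_fasta`) are made up for this example and do not come from any particular toolkit.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence) pairs, assuming 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line (not needed for a BLAST query)
        yield header[1:].split()[0], seq


def sample_reads(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Reservoir-sample up to n records without loading the whole file."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

For a file such as `S2.fastq` (a hypothetical name), something like `to_fasta(sample_reads(read_fastq(open("S2.fastq")), 2000))` would give a couple thousand reads; a far smaller number may already be enough to spot gross contamination.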

**GenoMax** · 01-06-2016, 03:14 PM

Originally posted by westerman

> 2) Take a couple thousand reads from each sample and map them to NT (nucleotide database) and see if they are all mapping to the same species.

That may be overkill.

Even with 20-30 reads the problem should become apparent if there is obvious contamination of foreign DNA.

**pcalzadilla** · 01-25-2016, 09:00 AM

Hi everyone,

I've just read these messages and was wondering whether 60-80% is a good mapping result. I got the same result with my RNA-seq data. Many papers report this percentage of mapped reads, but shouldn't it be higher, given that the mapping is against a reference genome?

Cheers
Pablo

**westerman** · 01-25-2016, 09:07 AM

Originally posted by pcalzadilla

> Hi everyone,
>
> I've just read these messages and was wondering whether 60-80% is a good mapping result. I got the same result with my RNA-seq data. Many papers report this percentage of mapped reads, but shouldn't it be higher, given that the mapping is against a reference genome?
>
> Cheers
> Pablo

Depends on your reference. Not all are as good as, say, human. In the plant and animal sequencing I deal with, I am often lucky to find a reference that is (a) highly characterized and (b) related within the last couple million years to the organism I am working with.

For human 60% mapping would be poor. For an unknown fungus it could be very good.

**GenoMax** · 01-25-2016, 09:08 AM

Clearly the data that mapped is fine so that part (60-80%) is a good result.

If you are curious as to why the rest did not map then you can take those reads and run blast on a few to see if you can get a quick answer. If there is obvious contamination (from an unrelated species) then you need to start worrying.

Have you scanned/trimmed the data for the presence of adapters, etc.?
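One way to get that quick answer from BLAST output is to tally the best hit per read by species. The Python sketch below assumes tab-separated BLAST output whose seventh column holds the subject's scientific name, for example from a custom tabular format such as `6 qseqid sseqid pident length evalue bitscore sscinames`, and that hits for each query are listed best first. The column layout and the function names are assumptions for illustration, not something taken from this thread.

```python
from collections import Counter
from typing import Iterable


def top_hit_species(lines: Iterable[str], species_col: int = 6) -> Counter:
    """Count reads per species, keeping only the first hit seen for each read."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        # setdefault keeps the first (assumed best) hit for each query ID
        best.setdefault(fields[0], fields[species_col])
    return Counter(best.values())


def no_hit_fraction(n_queries: int, counts: Counter) -> float:
    """Fraction of submitted reads that produced no hit at all."""
    return (n_queries - sum(counts.values())) / n_queries
```

If most hits fall on one unrelated species, that points to contamination. If most reads have no hit at all, an incomplete or distant reference is the more likely explanation.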

**pcalzadilla** · 01-25-2016, 09:16 AM

Yes, the trimming was OK! However, I will run BLAST on those unmapped reads to rule out any possible contamination.

Thanks a lot!

**GenoMax** · 01-25-2016, 09:20 AM

Originally posted by pcalzadilla

> Yes, the trimming was OK! However, I will run BLAST on those unmapped reads to rule out any possible contamination.
>
> Thanks a lot!

Well... those unmapped reads are not going to contribute to read counts, but the consistent presence of unexpected foreign sequences in your samples is not a good thing (if that is what you find via BLAST). They could be influencing your experiment in an unexpected way and may lead to erroneous results.

**westerman** · 01-25-2016, 09:35 AM

GenoMax has a good point: check the unmapped reads to see if they come from a different species. But as I mentioned, if you are working with a poorly characterized species, you may find that those unmapped reads simply do not map to anything.

**GenoMax** · 01-25-2016, 09:42 AM

In this case a negative blast result would be good (I would be curious to know if the result actually turns out to be negative) as @Rick points out.

**pcalzadilla** · 01-25-2016, 11:52 AM

I ran BLAST on my unmapped reads and the results were negative, so that's a good result, as you said. So my 70% mapping rate is probably due to the incomplete reference genome I used. Am I right?

Thanks a lot
Pablo

**westerman** · 01-25-2016, 12:41 PM

Originally posted by pcalzadilla

> I ran BLAST on my unmapped reads and the results were negative, so that's a good result, as you said. So my 70% mapping rate is probably due to the incomplete reference genome I used. Am I right?
>
> Thanks a lot
> Pablo

That would be my guess and what I would tell my customers with similar results.

**GenoMax** · 01-25-2016, 12:55 PM

@pcalzadilla: You could try to assemble all the remaining unmapped reads to derive some additional information, but that may or may not be of interest depending on what kind of genome you are working with (its complexity) and/or what the aim of your experiment is.

**caiosuz** · 02-15-2016, 10:00 AM

I was thinking about a way to run TopHat again on my unmapped reads.
Can anyone suggest which parameter values to change in order to loosen the stringency of the analysis and increase the number of mapped reads without losing mapping quality?

**GenoMax** · 02-15-2016, 10:10 AM

Have you verified that those unmapped reads are matching the right genome?


TopHat2 - Low percentage of mapped reads
