
**GenoMax** · 05-17-2017, 03:30 PM

For reference cross-posted: https://www.biostars.org/p/253337/

Please close this post with a cross-reference, if an acceptable answer is found in other forum(s).

**ShellfishGene** · 06-20-2017, 01:12 AM

repeats?

I currently have the same issue. I wonder if it's related to the fact that featureCounts counts multimapping reads once per mapping position. If you have some reads that map to repetitive positions outside genes, and each of those reads maps to, for example, 50 repeats in the genome, the Unassigned_NoFeatures count will be greatly inflated.

**aprice67** · 06-20-2017, 05:41 AM

That might be one contributing factor, but it's not enough to explain the scale of the read loss, at least not in my data. The level of multi-mapped reads I'm seeing is on average around 10-15%, and the level of Unassigned_NoFeatures reads is sometimes around half of the total aligned reads.

I have looked into this with multiple tools and tried different parameter settings, and I always find the same thing. I also looked at the alignments in a genome browser, and the reads mostly seem to align to annotated regions.

I talked to some more biology-heavy people about it and they suggested it might be some small RNAs or rRNAs that aren't annotated but are highly expressed or amplified. I did find one case where this was true.

I noticed a spike in my GC plot in FastQC around 62%, BLASTed all of my overrepresented sequences, and found them to be a single 7S rRNA that wasn't annotated. So I feel like that's a possible contributing factor as well.

I've decided that if there is sufficient depth in the counted reads, I'm going to proceed with the data. I feel like at 15-20x coverage we're still likely to be getting good expression ratios, and most of the data I'm looking at have as much as 80-100x coverage. Since I've been unable to pin down the cause exactly, I'm just filtering out samples that don't meet a high coverage cutoff, because what else can we do, right?

Anyhow, I wish you luck. Let me know if you find out anything else about this.

**ShellfishGene** · 06-20-2017, 05:58 AM

Originally posted by aprice67

That might be one contributing factor, but it's not enough to explain the scale of the read loss, at least not in my data. The level of multi-mapped reads I'm seeing is on average around 10-15%, and the level of Unassigned_NoFeatures reads is sometimes around half of the total aligned reads.

You have to be careful here: the mapper counts reads that multimap, whereas featureCounts counts mappings. So if 10% of your reads multimap, and each of those reads maps 5 times to the non-annotated part, then out of every 100 reads you get 90 unique mappings plus 50 multimapped ones, and those reads will make up ~35% of all counts in featureCounts. At least that's how I understand it.
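To make the arithmetic above concrete, here is a minimal sketch. It assumes, as in the post, that every alignment is counted once and that all multimapped reads have the same number of hits; the function name `multimapper_share` is made up for illustration and is not part of featureCounts.

```python
def multimapper_share(multi_frac, hits_per_read):
    """Fraction of all counted alignments that come from multimapped reads.

    multi_frac    -- fraction of reads that multimap (0..1), as reported per read
    hits_per_read -- assumed number of alignments for each multimapped read
    """
    if not 0.0 <= multi_frac <= 1.0:
        raise ValueError("multi_frac must be between 0 and 1")
    if hits_per_read < 1:
        raise ValueError("hits_per_read must be at least 1")
    unique_alignments = 1.0 - multi_frac          # one alignment per unique read
    multi_alignments = multi_frac * hits_per_read  # several alignments per multimapped read
    return multi_alignments / (unique_alignments + multi_alignments)


# The example from the post: 10% multimapped reads, 5 hits each.
print(f"{multimapper_share(0.10, 5):.1%}")  # 35.7%
```

The same function can be called with other values (for example 50 hits per read) to see how quickly the per-alignment share grows compared to the per-read percentage the mapper reports.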

Unannotated RNAs are of course also possible. You could look for areas of high coverage and subtract the annotation GTF with bedtools to find those areas. Maybe I'll do that now with my data.
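For anyone who wants to see what the subtraction step does, here is a rough pure-Python sketch of the idea. It is not bedtools itself, just the same interval logic for a single chromosome. The function name `subtract_intervals` and the coordinates are invented for illustration, and intervals are assumed to be 0-based, half-open (start, end) pairs as in BED.

```python
def subtract_intervals(regions, annotated):
    """Return the parts of `regions` not covered by any interval in `annotated`.

    Both arguments are lists of (start, end) tuples on the same chromosome,
    0-based and half-open. Annotated intervals may overlap each other.
    """
    annotated = sorted(annotated)
    result = []
    for start, end in sorted(regions):
        cursor = start  # leftmost position not yet known to be annotated
        for a_start, a_end in annotated:
            if a_end <= cursor:
                continue            # annotation lies entirely to the left
            if a_start >= end:
                break               # annotation (and all later ones) lies to the right
            if a_start > cursor:
                result.append((cursor, a_start))  # uncovered gap before this annotation
            cursor = max(cursor, a_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))          # uncovered tail of the region
    return result


# Hypothetical high-coverage regions and annotated gene intervals.
high_coverage = [(100, 500), (900, 1000)]
genes = [(0, 150), (300, 400), (950, 2000)]
print(subtract_intervals(high_coverage, genes))
# [(150, 300), (400, 500), (900, 950)]
```

What is left over are the high-coverage stretches with no annotation, which are the candidates for unannotated RNAs. On real data you would do this per chromosome (and per strand, if that matters for your library).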

**aprice67** · 06-20-2017, 06:07 AM

That's a very nice thing to know, thanks for pointing this out. I feel slightly safer now. I wonder, are you working with mouse data as well?

**ShellfishGene** · 06-20-2017, 06:10 AM

I wish! All so nicely annotated with all the tools for mouse. I'm working with weird fishes.

**aprice67** · 06-20-2017, 06:21 AM

As someone on the other side of the fence, let me assure you the grass is not quite as green as you might think. Maybe I'd rather have the weird fish. At least then I could say the annotation is to blame.

Large number of Unassigned_NoFeature reads from featureCounts
