**swbarnes2** · 05-21-2012, 01:39 PM

With 76-mer single reads, even for a perfectly diverse library, the theoretical depth limit at any position is 152x if you use rmdup. So any gene with more coverage than that ceiling is going to be whacked down to 152x, and you won't be able to quantify expression of those highly expressed genes.

That library sounds awfully non-diverse, but if your sample is dominated by a couple of genes at super high levels, maybe it's accurate. I guess you could examine the highly represented reads. Do they cover whole genes as if the sample had a huge amount of that RNA? Or is there just one position that has 100K reads, and adjacent positions have much less?
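One rough way to put a number on the "one position with 100K reads" versus "whole gene covered" question is to tabulate read start positions before de-duplication. This is only a sketch: `start_position_profile` and the toy input lists are hypothetical, and it assumes you have already extracted the start coordinates for one gene and one strand from your alignments.

```python
from collections import Counter

def start_position_profile(starts):
    """Summarise how reads are spread over start positions.

    starts: list of read start coordinates (one strand, one gene).
    """
    counts = Counter(starts)
    top_position, top_count = counts.most_common(1)[0]
    return {
        "distinct_starts": len(counts),              # how many positions have any read
        "top_position": top_position,                # most heavily used start site
        "top_fraction": top_count / len(starts),     # share of reads at that site
    }

# Toy data (made up): a spike at one position vs. reads spread evenly.
spike = [500] * 90 + list(range(495, 505))
broad = list(range(400, 500))

spike_profile = start_position_profile(spike)
broad_profile = start_position_profile(broad)
```

A `top_fraction` near 1 with few distinct starts looks like a pile-up at a single site; many distinct starts with a small `top_fraction` looks more like genuinely high coverage.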

**arvid** · 05-22-2012, 12:15 AM

Exactly, I'd have a look at the shape of the read alignments before de-duplication to see whether it looks like PCR or simply very high coverage. 74 % isn't exceptionally high, I usually see 60-80 % for libraries which look OK.
In any case, de-duplicating reads before downstream quantification is a delicate matter, as it is difficult to tell PCR copies from valid high-coverage reads, as swbarnes2 pointed out.

**inbarpl** · 05-22-2012, 12:28 AM

swbarnes2, thanks a lot for your answer.
I guess this is exactly the case in my data set. The samples are from Arabidopsis, so I guess the Rubisco gene is the dominant one in the library. I will check what you've recommended using IGV. Sorry for my ignorance, but could you please explain the definition of "theoretical depth limit" and the calculation you did to get it for my parameters?
Many thanks,
Inbar

**swbarnes2** · 05-22-2012, 08:36 AM

If you filter single-end data for uniqueness, you will have at most two reads beginning at any position: one in the forward direction, one in the reverse.

So with 76-mers, the base at position 100 will be covered by at most 152 reads: 76 in the forward direction, starting at bases 25-100, and 76 in the reverse direction, starting at bases 100-175. You can't have three reads all running forward and starting at position 75, because rmdup will get rid of two of them.
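The arithmetic can be checked with a few lines of Python. This is just a sketch of the counting argument, not anything rmdup itself does; the function names are made up for illustration, and coordinates are treated as 1-based with a reverse read "starting" at its rightmost base.

```python
def covering_starts(position, read_length):
    """Start positions of reads that can cover `position`, per strand."""
    # A forward read covers the base if it starts up to read_length - 1 bases before it.
    forward = range(position - read_length + 1, position + 1)
    # A reverse read covers the base if it starts up to read_length - 1 bases after it.
    reverse = range(position, position + read_length)
    return forward, reverse

def single_end_depth_ceiling(position, read_length):
    """Max depth when only one read per (start, strand) survives de-duplication."""
    forward, reverse = covering_starts(position, read_length)
    return len(forward) + len(reverse)

forward, reverse = covering_starts(100, 76)
ceiling = single_end_depth_ceiling(100, 76)
print(forward[0], forward[-1], reverse[0], reverse[-1], ceiling)
```

The ceiling is simply twice the read length, so longer single-end reads raise it, but only linearly.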

With paired-end data, you can have three reads that run in the forward direction starting at base 75 if their mates all start at different sites, because mates at different sites must have come from different fragments. So there's a ceiling there too, depending on how variable your insert sizes are, but it's far higher than the ceiling for single-read runs.
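A toy model of the difference, assuming duplicates are identified purely by start, strand and (for pairs) mate start. Real de-duplication tools look at more than this, so treat `count_after_dedup` and the example coordinates as hypothetical illustrations only.

```python
def count_after_dedup(reads, paired):
    """Count reads surviving a simplified duplicate filter.

    reads: iterable of (start, strand, mate_start) tuples.
    paired: if False, mate_start is ignored, as for single-end data.
    """
    keys = set()
    for start, strand, mate_start in reads:
        # Single-end: identity is (start, strand); paired-end adds the mate position.
        key = (start, strand, mate_start) if paired else (start, strand)
        keys.add(key)
    return len(keys)

# Three forward reads starting at base 75 whose mates start at different sites.
reads = [(75, "+", 250), (75, "+", 262), (75, "+", 281)]

single_end_survivors = count_after_dedup(reads, paired=False)
paired_end_survivors = count_after_dedup(reads, paired=True)
print(single_end_survivors, paired_end_survivors)
```

Under this model the single-end filter keeps one of the three reads, while the paired-end filter keeps all three because each has a distinct mate position.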

duplicate reads in Illumina short, single end reads of RNAseq data

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News