SEQanswers > Bioinformatics


#1 - 06-30-2009, 11:06 AM - Senior Member, USA

unique reads for downstream analysis

I have a general query about short fixed-length reads. I am asking with Solexa data in mind, but it should apply directly to SOLiD as well.

After obtaining the set of reads, do people prefer collapsing to a unique, non-redundant read set before analyses such as SNP discovery, ChIP-seq, etc.? Do the exact duplicate reads carry any information?

For de novo assembly, Velvet behaves slightly differently with a unique set of reads than with some of the reads repeated.
-- bioinfosm
#2 - 07-06-2009, 08:20 AM - Junior Member, Los Angeles
I have a similar question. I have seen people mention removing redundant reads for ChIP-seq to minimize the risk of amplification bias. I tried it, and it certainly affects the peak finding a lot. My guess is that you should not collapse to unique reads for RNA-seq or small RNA-seq, since the read counts carry the signal. I would be glad to hear other people's experience and comments on this.
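For illustration, collapsing a read set to exact-sequence-unique reads, while keeping track of the multiplicities, could be sketched like this (a hypothetical Python example, not tied to any particular tool):

```python
# Collapse reads to unique sequences, keeping a count of how many
# times each sequence occurred -- the information that a plain
# deduplication step would otherwise discard.
from collections import Counter

def collapse_reads(reads):
    """Return (unique_sequences, counts) for a list of read strings."""
    counts = Counter(reads)
    return list(counts), counts

reads = ["ACGTACGT", "ACGTACGT", "TTGCAAGC", "ACGTACGT", "TTGCAAGC"]
unique, counts = collapse_reads(reads)
# 5 input reads collapse to 2 unique sequences; counts["ACGTACGT"] is 3.
```

Whether those multiplicities are signal (as in RNA-seq) or amplification noise (as in low-coverage genomic sequencing) is exactly the question in this thread.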
-- xwu
#3 - 07-06-2009, 03:51 PM - Junior Member, Europe
I can imagine that the answer depends on the project.
If, in a low-coverage sequencing project, I have many reads starting at the same position, this would suggest PCR duplicates, especially for paired-end reads (assuming that PE duplicates are those for which both reads of the pair are exactly duplicated).
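That definition of a paired-end duplicate (both mates of the pair starting at the same coordinates) can be sketched as a simple check; the coordinates below are made up for illustration, and this is not a real BAM parser:

```python
# Flag paired-end duplicates by keying on both mates' mapped positions:
# a pair is a duplicate only if read 1 AND read 2 start at the same
# coordinates as an earlier pair.

def pe_duplicates(pairs):
    """pairs: list of (chrom, pos1, pos2) tuples for mapped read pairs.
    Returns a list of booleans, True where the pair duplicates an
    earlier pair (same chromosome and both start positions)."""
    seen = set()
    flags = []
    for key in pairs:
        flags.append(key in seen)
        seen.add(key)
    return flags

pairs = [("chr1", 100, 350), ("chr1", 100, 350), ("chr1", 100, 420)]
flags = pe_duplicates(pairs)
# The second pair is a duplicate; the third shares only the read-1
# start position, so it is kept.
```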

A highly expressed short gene in RNA-seq, on the other hand, will have many reads that start at the same position without them being PCR duplicates.
Simply removing them would then lead to an underestimate of the expression level.

The recent Sanger paper (Kozarewa et al.) calculated expected duplicate frequencies based on average coverage and read length for whole genome sequencing.
This makes sense, but I think only if you have an even distribution of coverage across the genome.
If something like mtDNA with excess coverage is present, I could overestimate the duplicate frequency by assuming that all duplicates are due to amplification bias.

So, yes, if it were easy to distinguish duplicates due to high coverage from PCR duplicates, it might be preferable to eliminate the latter, at least for SNP calling and for RNA-seq, where counts matter...

But again, in RNA-seq for example, how do you distinguish duplicates due to high coverage from PCR duplicates? Maybe by calculating an expected duplicate frequency as in the Sanger paper, but on a gene-by-gene basis?
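A back-of-the-envelope version of that expected-duplicate calculation, assuming reads fall uniformly at random over the possible start positions (the very assumption questioned above for uneven coverage), might look like this; it is a simplified stand-in, not the exact calculation from the Kozarewa et al. paper:

```python
# Expected fraction of "duplicate" read starts when n_reads fall
# uniformly at random on n_positions possible start positions.
# E[unique starts] = P * (1 - (1 - 1/P)^N), so the expected duplicate
# fraction is 1 - E[unique]/N.

def expected_duplicate_fraction(n_reads, n_positions):
    if n_reads == 0:
        return 0.0
    expected_unique = n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)
    return 1.0 - expected_unique / n_reads

# Whole genome: 10 M reads over ~6 G possible starts -> tiny expected rate.
genome_rate = expected_duplicate_fraction(10_000_000, 6_000_000_000)
# A short, highly expressed gene: 10,000 reads over ~2,000 possible
# starts -> most reads share a start even with no PCR duplication at all.
gene_rate = expected_duplicate_fraction(10_000, 2_000)
```

Applied per gene, with N set to the reads assigned to that gene and P to its number of possible start positions, this would give the gene-by-gene expectation suggested above.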

May I ask what you mean by "Velvet behaves slightly differently"?
-- Annette
#4 - 07-07-2009, 02:30 PM - Senior Member, USA
Thanks for the notes.

Velvet does de novo assembly and gives different results if you input a non-redundant set of reads than if you use all the reads as input.
Another de novo tool, Edena, produces a non-redundant set of reads before it proceeds with the de novo assembly...
-- bioinfosm
