SEQanswers

06-30-2009, 10:06 AM   #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
unique reads for downstream analysis

I have a general question about short, fixed-length reads. I am asking with Solexa data in mind, but it probably applies to SOLiD as well.

Once you have your set of reads, do people prefer to reduce it to a unique, non-redundant set before analyses such as SNP discovery or ChIP-seq? Do exactly identical reads carry any information?

For de novo assembly, Velvet behaves slightly differently with a unique set of reads than with some of the reads repeated.
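
To make "unique, non-redundant set" concrete, here is a minimal Python sketch that collapses exactly identical read sequences while keeping their counts. It assumes reads are plain strings already in memory; the function name `collapse_reads` and the example sequences are made up for illustration. It does not treat reverse complements as identical and ignores quality values.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse exactly identical read sequences.

    Returns (unique, counts): the unique sequences in first-seen order
    and a Counter of how often each sequence occurred.
    Hypothetical helper; exact string matching only.
    """
    counts = Counter(reads)
    unique = list(dict.fromkeys(reads))  # dicts keep insertion order
    return unique, counts

# Made-up example reads
reads = ["ACGTACGT", "TTGACCAG", "ACGTACGT", "GGCATCGA", "ACGTACGT"]
unique, counts = collapse_reads(reads)
for seq in unique:
    print(seq, counts[seq])
```

Keeping the counts next to the unique set means the multiplicity is still available to any downstream step that wants it.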
07-06-2009, 07:20 AM   #2
xwu
Junior Member
 
Location: Los Angeles

Join Date: Dec 2007
Posts: 9

I have a similar question. I have seen people recommend removing redundant reads for ChIP-seq to minimize the risk of amplification bias. I tried it, and it certainly affects peak finding a lot. My guess is that you should not reduce to unique reads for RNA-seq or small RNA-seq. I would be glad to hear other people's experience and comments on this.
07-06-2009, 02:51 PM   #3
Annette
Junior Member
 
Location: Europe

Join Date: May 2009
Posts: 3

I imagine the answer depends on the project.
If, in a low-coverage sequencing project, many reads start at the same position, that suggests PCR duplicates, especially for paired-end reads (assuming that PE duplicates are pairs in which both reads are exactly duplicated).
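
As a rough illustration of the paired-end case, the sketch below flags a pair as a duplicate when an earlier pair had the same chromosome and the same start position for both mates. The tuple layout `(name, chrom, start1, start2)` and the function name `mark_pe_duplicates` are assumptions for this example, not the output format of any particular aligner, and strand is ignored for brevity.

```python
def mark_pe_duplicates(pairs):
    """Return the names of pairs whose (chrom, start1, start2) was seen before.

    pairs: iterable of (name, chrom, start1, start2) tuples
    (hypothetical layout). The first pair at each key is kept,
    every later one is flagged.
    """
    seen = set()
    duplicates = set()
    for name, chrom, start1, start2 in pairs:
        key = (chrom, start1, start2)
        if key in seen:
            duplicates.add(name)
        else:
            seen.add(key)
    return duplicates

# Made-up aligned pairs
pairs = [
    ("p1", "chr1", 100, 350),
    ("p2", "chr1", 100, 350),  # same as p1 at both ends
    ("p3", "chr1", 100, 420),  # same first mate, different second mate
    ("p4", "chr2", 100, 350),  # different chromosome
    ("p5", "chr1", 100, 350),  # same as p1 at both ends
]
dups = mark_pe_duplicates(pairs)
print(sorted(dups))
```

Requiring both ends to match is what makes the paired-end criterion stricter than the single-end one: p3 shares a start with p1 but is not flagged.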

A short, highly expressed gene in RNA-seq, on the other hand, will have many reads starting at the same position without their being PCR duplicates.
Simply removing them would lead to an underestimate of the expression level.

The recent Sanger paper (Kozarewa et al.) calculated expected duplicate frequencies for whole-genome sequencing based on average coverage and read length.
This makes sense, but I think only if coverage is evenly distributed across the genome.
If something like mtDNA is present at excess coverage, I would overestimate the frequency of PCR duplicates if I assumed all duplicates were due to amplification bias.
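
To show why uneven coverage matters, here is a back-of-the-envelope sketch. It is not the calculation from the Kozarewa et al. paper; it is a simple Poisson model that assumes single-end reads whose start positions are uniformly distributed over both strands. Under that assumption the mean number of reads per start site is lambda = coverage / (2 * read_length), and the expected fraction of reads that duplicate an earlier start site is 1 - (1 - exp(-lambda)) / lambda. The function name and the example coverages are made up.

```python
import math

def expected_duplicate_fraction(coverage, read_length, strands=2):
    """Expected fraction of reads sharing a start site with another read.

    Assumes uniformly distributed, independent read starts (Poisson model);
    a sketch only, not the method of any published paper.
    """
    lam = coverage / (read_length * strands)  # mean reads per start site
    if lam == 0:
        return 0.0
    # -expm1(-lam) equals 1 - exp(-lam), computed accurately for small lam
    return 1.0 - (-math.expm1(-lam)) / lam

# Made-up values: 30x nuclear coverage vs. 2000x mtDNA coverage, 36 bp reads
nuclear = expected_duplicate_fraction(30, 36)
mito = expected_duplicate_fraction(2000, 36)
print(f"nuclear: {nuclear:.3f}  mtDNA: {mito:.3f}")
```

Even without any PCR bias, this model expects most reads in the high-coverage region to share a start site, which is why a single genome-wide expectation can mislead.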

So, yes, if it were easy to distinguish duplicates due to high coverage from PCR duplicates, it might be preferable to eliminate the PCR duplicates, at least for applications such as SNP calling and RNA-seq, where counts matter...

But again, in RNA-seq for example, how do you distinguish duplicates due to high coverage from PCR duplicates? Maybe by calculating an expected duplicate frequency as in the Sanger paper, but on a gene-by-gene basis?
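
A gene-by-gene version of that idea could look roughly like the sketch below. It assumes reads start uniformly along each transcript on a single strand, uses the same simple Poisson expectation rather than the paper's calculation, and reports observed minus expected duplicate fraction per gene. The input layout `{gene: (length, n_reads, n_duplicate_reads)}`, the function names, and all numbers are hypothetical.

```python
import math

def expected_dup_fraction(n_reads, n_start_sites):
    """Poisson expectation for the fraction of reads that repeat a start site."""
    if n_reads == 0:
        return 0.0
    lam = n_reads / n_start_sites
    return 1.0 + math.expm1(-lam) / lam  # equals 1 - (1 - exp(-lam)) / lam

def excess_duplication(genes, read_length):
    """Observed minus expected duplicate fraction for each gene.

    genes: dict of name -> (length, n_reads, n_duplicate_reads),
    a made-up layout for this example.
    """
    result = {}
    for name, (length, n_reads, n_dups) in genes.items():
        sites = max(length - read_length + 1, 1)  # possible start positions
        expected = expected_dup_fraction(n_reads, sites)
        observed = n_dups / n_reads if n_reads else 0.0
        result[name] = observed - expected
    return result

# Made-up genes: a short, highly expressed one and a long, weakly expressed one
genes = {
    "short_high": (500, 5000, 4535),
    "long_low": (5000, 100, 50),
}
excess = excess_duplication(genes, read_length=36)
for name, value in excess.items():
    print(name, round(value, 3))
```

In this toy example the short gene's high duplicate rate is about what coverage alone predicts, while the long gene shows far more duplicates than expected, which would point toward amplification.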

May I ask what you mean by "Velvet behaves slightly differently"?
07-07-2009, 01:30 PM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Thanks for the notes.

Velvet does de novo assembly and gives different results when given a non-redundant set of reads than when given all the reads.
Another de novo tool, Edena, produces a non-redundant set of reads before it proceeds with the assembly...

Tags
downstream, reads, redundant, short, unique

All times are GMT -8.