SEQanswers


samanta 09-20-2011 08:27 PM

Too many short reads and too little RAM?
 
Someone asked me whether it makes sense to remove duplicate reads to shrink the library enough to fit within the RAM limit. I think it is a bad strategy, as explained here:

http://www.homolog.us/blogs/2011/09/...n-k-mer-world/

zhidkov.ilia 09-20-2011 11:21 PM

I think duplicated reads are removed to avoid biases introduced during library preparation (for example), not to reduce the amount of data for de novo assembly.

Ilia

samanta 09-20-2011 11:35 PM

That's a good point. Some filtering is necessary to deal with pileups of reads caused by such biases. I do that for alignment and SNP discovery, but I think twice about it for de novo assembly. If no underlying genome is known, it is hard to tell whether the duplicated reads come from an artifact or from real sequence.

zhidkov.ilia 09-21-2011 12:28 AM

When you assemble reads into contigs, you want at least several reads to support each contig. If you have identical reads, you might obtain false contigs.

Ilia

samanta 09-21-2011 02:38 PM

It does not work that way for a k-mer-based assembler. Would you please explain your rationale? Why would one get false contigs?
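
To make the k-mer point concrete, here is a rough sketch. It assumes a toy de Bruijn-style assembler that stores one entry per distinct k-mer, which is a simplification and not any particular assembler's implementation. The reads, k=4, and the helper names `kmers` and `kmer_counts` are made up for illustration. Under that assumption, dropping exact duplicate reads changes how often each k-mer is seen but not which k-mers exist.

```python
from collections import Counter

def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

# Hypothetical toy reads; the first read appears three times.
reads = ["ACGTACGA", "ACGTACGA", "ACGTACGA", "GTACGATT"]
# Drop exact duplicates while keeping the original order.
deduplicated = list(dict.fromkeys(reads))

with_dups = kmer_counts(reads, k=4)
without_dups = kmer_counts(deduplicated, k=4)

# The set of distinct k-mers is identical; only the multiplicities differ.
same_kmer_set = set(with_dups) == set(without_dups)
print(len(with_dups), len(without_dups), same_kmer_set)
print(with_dups["ACGT"], without_dups["ACGT"])
```

In this toy case the number of distinct k-mers is the same before and after deduplication, so a structure keyed on distinct k-mers would not get smaller. Only the per-k-mer counts drop.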

zhidkov.ilia 09-22-2011 05:48 AM

Let me rephrase my last comment:
If duplicated reads don't contribute to the downstream de novo assembly pipeline, it would be a good idea to remove them.

Ilia

