SEQanswers


samanta 09-20-2011 09:27 PM

Too many short reads and too little RAM?
 
Someone asked me whether it makes sense to remove duplicate reads to shrink the library enough to fit within the available RAM. I think it is a bad strategy, as explained here:

http://www.homolog.us/blogs/2011/09/...n-k-mer-world/

zhidkov.ilia 09-21-2011 12:21 AM

I think duplicated reads are removed to avoid biases introduced during library preparation (for example), not to reduce the amount of data for de novo assembly.

Ilia

samanta 09-21-2011 12:35 AM

That's a good point. Some filtering is necessary to take care of pileups of reads caused by biases. I do that for alignment and SNP discovery, but I think twice about it during de novo assembly. If no underlying genome is known, it is hard to tell whether the duplicated reads come from error or from real sequence.

zhidkov.ilia 09-21-2011 01:28 AM

When you assemble reads into contigs, you want at least several reads to support each contig. If you have identical reads, you might obtain false contigs.

Ilia

samanta 09-21-2011 03:38 PM

It does not work that way for a k-mer-based assembler. Would you please explain your rationale? Why would one get false contigs?
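
A toy sketch of the k-mer point, under the assumption that the assembler keeps each distinct k-mer once together with a count (as de Bruijn graph assemblers typically do). The function names, the reads, and k=4 are made up for illustration and are not taken from any particular assembler:

```python
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Hypothetical toy reads; the third is an exact duplicate of the first.
reads = ["ACGTACGA", "CGTACGAT", "ACGTACGA"]

with_dups = kmer_counts(reads, 4)
without_dups = kmer_counts(set(reads), 4)  # exact duplicates removed

# Same set of distinct k-mers either way; only the counts differ.
same_kmers = set(with_dups) == set(without_dups)
```

In this sketch the duplicate read raises the counts of k-mers that are already present but adds no new k-mer, so under the stated assumption it adds no new sequence for the assembler to build a contig from.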

zhidkov.ilia 09-22-2011 06:48 AM

Let me rephrase my last comment:
If duplicated reads don't contribute to the downstream de novo assembly pipeline, it would be a good idea to remove them.

Ilia


All times are GMT -8.
