10-18-2016, 07:48 AM   #8

Originally Posted by horvathdp
Possibly? If that works, then why do people not just use these essentially 1X files for assembly? I normally see 20X or 30X coverage for assemblies. This all said, do you know of a way to just eliminate duplicate entries in a fastq file based on identifiers rather than sequence?
I am not sure what you are referring to here.

If one were certain that every part of the starting material was covered (e.g. with a theoretical sequencer that started at one end of the chromosome and read through its entire length), then 1x sequencing would be enough. Real reads sample the genome more or less at random, so by sequencing to 30x you make it very likely that every sequenceable region is sampled and represented in your data.
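To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes reads land uniformly at random along the genome (the usual Poisson / Lander-Waterman approximation), which ignores sequencing bias and regions that cannot be sequenced at all. The function name is only for illustration.

```python
import math

def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by zero reads.

    Assumes reads are placed uniformly at random; real libraries have
    GC and other biases, so treat this as an optimistic lower bound.
    """
    return math.exp(-coverage)

# Compare an average depth of 1x with the depths typically used for assembly.
for depth in (1, 20, 30):
    fraction = expected_uncovered_fraction(depth)
    print(f"{depth}x average coverage: ~{fraction:.2e} of bases expected uncovered")
```

Under this model roughly 37% of bases get no read at all at an average depth of 1x, while at 30x the expected uncovered fraction is vanishingly small. That is why an "essentially 1X" random data set is not equivalent to a single end-to-end read of the chromosome.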

In theory there can be no duplicate entries as far as sequence identifiers go (if you are referring to fastq headers), since each read normally gets its own header. To end up with duplicates you would need to do something like cat the same file twice into a new one.
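If a file did end up with repeated headers (for example after an accidental double cat), a minimal sketch for keeping only the first record per identifier could look like the following. It assumes plain 4-line fastq records (no line-wrapped sequences) and treats the first whitespace-delimited token after the "@" as the identifier; the function name is hypothetical, not part of any existing tool.

```python
import io

def dedupe_fastq_by_id(in_handle, out_handle):
    """Write the first record seen for each read identifier.

    Assumes strict 4-line fastq records. Returns the number of records kept.
    """
    seen = set()
    kept = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        # Sequence, '+' separator and quality lines of this record.
        rest = [in_handle.readline() for _ in range(3)]
        tokens = header[1:].split()
        read_id = tokens[0] if tokens else ""
        if read_id in seen:
            continue  # identifier already written, drop the repeat
        seen.add(read_id)
        out_handle.write(header)
        out_handle.writelines(rest)
        kept += 1
    return kept

# Small in-memory demonstration: record r1 appears twice.
example = io.StringIO(
    "@r1 extra\nACGT\n+\nIIII\n"
    "@r2\nGGGG\n+\nIIII\n"
    "@r1 extra\nACGT\n+\nIIII\n"
)
result = io.StringIO()
kept_count = dedupe_fastq_by_id(example, result)
print(kept_count, "records kept")
```

For real files you would pass handles from open("in.fastq") and open("out.fastq", "w") instead of the StringIO objects. Note that the set of identifiers is held in memory, so very large files may need a different approach.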
GenoMax