I have far more reads than I need for the genome size at hand: coverage exceeds 300x in places.
How do I get rid of reads (together with their mates) that contain runs of N (e.g. NNNN) or that have multiple low-quality bases?
For instance, say I have 28 million read pairs and I want to keep the top 33% with the highest quality (to aim for about 100x coverage).
Does anyone have an idea? I have seen that I can use FastQC to analyse the reads, but not to actually remove any.
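
To make the request concrete, here is a rough sketch in plain Python of the selection described above. It is only an illustration, not a recommended tool: the function names (`parse_fastq`, `mean_quality`, `select_pairs`) and the parameters `keep_fraction` and `max_n` are made up for this example, and it assumes Phred+33 quality encoding, well-formed 4-line FASTQ records, and two files with mates in the same order. A pair is dropped if either mate has too many N bases, and the remaining pairs are ranked by the worse of the two mates' mean qualities.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def select_pairs(r1_records, r2_records, keep_fraction=0.33, max_n=0):
    """Keep the best-scoring fraction of read pairs.

    A pair is discarded when either mate has more than max_n N bases.
    Survivors are ranked by the lower mean quality of the two mates, and
    at most keep_fraction of the *input* pairs are returned.
    """
    total = 0
    scored = []
    for rec1, rec2 in zip(r1_records, r2_records):
        total += 1
        if rec1[1].upper().count("N") > max_n or rec2[1].upper().count("N") > max_n:
            continue  # drop the read together with its mate
        score = min(mean_quality(rec1[2]), mean_quality(rec2[2]))
        scored.append((score, rec1, rec2))

    n_keep = int(total * keep_fraction)
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(rec1, rec2) for _, rec1, rec2 in scored[:n_keep]]


if __name__ == "__main__":
    # Tiny in-memory example; real input would be two open FASTQ files.
    r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACNT", "+", "IIII"]
    r2 = ["@a/2", "ACGT", "+", "IIII", "@b/2", "ACGT", "+", "IIII"]
    for rec1, rec2 in select_pairs(parse_fastq(r1), parse_fastq(r2), keep_fraction=0.5):
        print(rec1[0], rec2[0])
```

With 28 million pairs and `keep_fraction=0.33` this would keep roughly 9.2 million pairs, i.e. about a third of the 300x coverage. Note that the sketch holds every surviving pair in memory, which is unlikely to be practical at that scale; a two-pass version (score first, then write) or a dedicated trimming tool would be the more realistic route.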