I have a pair of paired-end (PE) read files generated from a TruSeq library, containing 33M pairs of 150 bp reads. The bacterium is S. aureus, which has a genome size of about 2.8 Mbp. My goal is to call protein-coding variants.
FastQC reports failures for Per Base Sequence Quality, Sequence Duplication Levels and Kmer Content.
Does this look like it is only a result of over-sequencing, so that I can leave the reads as they are and let Picard remove duplicates after mapping? If not, can you tell me what else I need to do to clean up the FASTQ files?
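For reference, here is a rough estimate of the sequencing depth implied by the numbers above. It is only a back-of-the-envelope sketch: it assumes a genome of about 2.8 Mbp, that every base of every read maps, and the function name `estimate_coverage` is just a hypothetical helper for illustration.

```python
def estimate_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Return the approximate mean depth (fold coverage) for a paired-end run.

    Assumes both mates have the same length and that all bases map,
    so the result is an upper bound on the real depth.
    """
    total_bases = read_pairs * 2 * read_length  # two mates per pair
    return total_bases / genome_size


# Numbers from this post; the 2.8 Mbp genome size is an approximation.
coverage = estimate_coverage(read_pairs=33_000_000,
                             read_length=150,
                             genome_size=2_800_000)
print(f"Approximate coverage: {coverage:.0f}x")
```

At a depth in the thousands, many read pairs would be expected to start at identical positions purely by chance, which is why I suspect over-sequencing is behind at least the duplication warning.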
Thanks a lot in advance!