Hi all,
I have a question regarding the construction of libraries from genomic DNA. We had two mate pair libraries and one fragment library (180 bp fragments) constructed, and we are having trouble with the fragment library. First, coverage varies a lot: some regions show the expected ~25x coverage, but others have very low coverage or are not covered at all. Second, we ran these libraries through FastQC and got several warnings. Something seems to be going on at the beginning of our reads, where per base sequence content, GC content and k-mer content are all off (too many As, GC content too low, and a few k-mers underrepresented). Duplication levels also seem high in this library, and per sequence GC content looks skewed. We didn't encounter these problems with the mate pair libraries, which suggests it's not a genome or DNA problem. From browsing this forum, it looks like these problems are usually reported for transcriptome data, but we're using genomic DNA.
I've attached files with some of the FastQC results. Has anyone else encountered these problems? Do you have any suggestions for how to overcome them?
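For anyone who wants to look at the start-of-read base composition outside FastQC, here is a minimal Python sketch. It is not FastQC's implementation, just a rough equivalent of the per base sequence content check. It assumes an uncompressed FASTQ file with exactly four lines per record; the function name `per_position_base_fractions` and the default of 15 positions are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import islice


def per_position_base_fractions(fastq_lines, n_positions=15):
    """Return one dict per read position (first n_positions only)
    mapping each base in "ACGTN" to its fraction at that position."""
    counts = [Counter() for _ in range(n_positions)]

    # Assumption: strict 4-line FASTQ records, so the sequence is
    # the 2nd line of every group of 4 lines.
    for seq in islice(fastq_lines, 1, None, 4):
        seq = seq.strip().upper()
        for i, base in enumerate(seq[:n_positions]):
            counts[i][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        # Positions that no read reaches get an empty dict.
        fractions.append({b: c[b] / total for b in "ACGTN"} if total else {})
    return fractions


if __name__ == "__main__":
    import sys

    # Usage: python base_content.py reads.fastq  (file name is a placeholder)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            result = per_position_base_fractions(handle)
        for pos, frac in enumerate(result, start=1):
            if frac:
                gc = frac["G"] + frac["C"]
                print(pos, {b: round(f, 3) for b, f in frac.items()}, "GC=%.3f" % gc)
```

Running it on the fragment library and on one of the mate pair libraries should make the difference in A and GC content over the first bases easy to compare side by side.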
Any feedback would be much appreciated.
Thanks,
Kira