Dear all,
I get my data from sequencing service providers, i.e., I don't have my own lab.
Since switching from 36mers to 75mers last year, I've run into unexpected trouble: for data sets with the same nominal coverage, the coverage variance is now much, much higher with the 75mers than it was with the old 36mers.
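To put a number on "coverage variance", one simple check is the variance-to-mean ratio of per-base depth. Under an idealized Poisson model of read placement the variance roughly equals the mean, so a ratio far above 1 points to systematic unevenness rather than sampling noise. This is a minimal sketch, assuming per-base depths are already available as a list of integers (for example, the third column of `samtools depth -a` output); the function name and the toy depths are hypothetical, not real data.

```python
from statistics import mean, pvariance

def coverage_dispersion(depths):
    """Return (mean, variance, variance/mean) for a list of per-base depths.

    Under an idealized Poisson model the ratio is about 1; values well
    above 1 indicate extra, non-random variation in coverage.
    """
    m = mean(depths)
    v = pvariance(depths)
    return m, v, v / m

# Hypothetical toy depths with the same mean (30x) but very different spread
even = [29, 31, 30, 28, 32, 30, 29, 31]
uneven = [55, 10, 48, 12, 0, 60, 40, 15]

for name, d in (("even", even), ("uneven", uneven)):
    m, v, ratio = coverage_dispersion(d)
    print(f"{name}: mean={m} variance={v:.2f} ratio={ratio:.2f}")
```

Comparing this ratio between a 36mer and a 75mer run at the same nominal coverage would give a single figure for the difference described above.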
As an example, one of our workhorses is a 45% GC bacterium that poses no major problems with respect to GC content or repetitiveness.
When I did resequencing projects in the past, 30x coverage with 36mers was enough to ensure that no holes were left in the genome. Also, genome duplications, when present, could be detected clearly and easily.
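The expectation that 30x leaves no holes can be checked against the Poisson approximation, in which the probability that a given base gets zero reads is exp(-c) for mean coverage c. This is a sketch under that idealized assumption (uniform, independent read placement); the 5 Mb genome size below is a hypothetical value chosen only for illustration.

```python
import math

def expected_uncovered_bases(genome_size, coverage):
    """Expected number of zero-coverage bases under a Poisson model.

    Assumes reads land uniformly and independently, so that
    P(depth == 0) = exp(-coverage) for every base.
    """
    return genome_size * math.exp(-coverage)

GENOME_SIZE = 5_000_000  # hypothetical bacterial genome size, for illustration

for c in (5, 10, 30):
    print(f"{c}x: ~{expected_uncovered_bases(GENOME_SIZE, c):.3g} uncovered bases")
```

At 30x the expected count is far below one base, so hundreds of holes at that depth cannot be explained by random sampling alone and suggest a systematic bias in the library or sequencing.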
Nowadays, with 75mers, things have gotten really, really nasty. There is now a very clear coverage bias toward low-GC regions, so strong that one could think these regions are duplicated (and they clearly are not). Furthermore, 30x coverage is no longer nearly enough to cover the whole genome: there are literally hundreds of holes left open, and many of them show the infamous GGCxG problem. To get complete coverage I now need at least 70x or 80x ... but that does not solve the problem of false-positive genome duplications.
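Both symptoms can be tabulated from the reference sequence plus per-base depths: mean depth per GC window shows the low-GC bias, and listing zero-coverage runs that sit near a GGCxG motif (x = any base, checked on both strands) shows how many holes fit that pattern. This is a minimal sketch, not an established tool; the function names, the window size, and the `flank` distance used to decide "near" are hypothetical choices, and the sequence and depths are assumed to be the same length and aligned base for base.

```python
import re

# GGCxG on the forward strand, and its reverse complement CxGCC
GGCXG = re.compile(r"GGC[ACGT]G|C[ACGT]GCC")

def window_gc_and_depth(seq, depths, window):
    """Yield (start, gc_fraction, mean_depth) for non-overlapping windows."""
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc = (chunk.count("G") + chunk.count("C")) / window
        mean_depth = sum(depths[start:start + window]) / window
        yield start, gc, mean_depth

def zero_coverage_runs(depths):
    """Return half-open (start, end) intervals where depth is 0."""
    runs, start = [], None
    for i, d in enumerate(depths):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(depths)))
    return runs

def holes_with_motif(seq, depths, flank=20):
    """Return holes with a GGCxG motif inside them or within `flank` bp."""
    seq = seq.upper()
    hits = []
    for start, end in zero_coverage_runs(depths):
        lo, hi = max(0, start - flank), min(len(seq), end + flank)
        if GGCXG.search(seq[lo:hi]):
            hits.append((start, end))
    return hits

# Hypothetical toy example: a 5 bp hole just downstream of GGCAG
seq = "A" * 10 + "GGCAGTTTTT" + "A" * 10
depths = [30] * 15 + [0] * 5 + [30] * 10
print(list(window_gc_and_depth(seq, depths, 10)))
print(holes_with_motif(seq, depths))
```

On real data, plotting mean depth against GC fraction per window, and comparing the number of motif-associated holes with the total number of holes, would make the two effects easy to show side by side.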
I have attached a PDF with two slides which show what things look like.
Has anyone else made this kind of observation? Any idea what could be the cause ... or what a remedy could be?
Regards,
B.