I am looking for a quick and simple way to identify regions of zero coverage in a sequencing data set. My ultimate goal is to determine whether there are genomic features that contribute to these gaps.
I've used both GATK and samtools depth to get the depth of coverage at each base position across my genome. What I'd like is the start and end positions of each REGION that is not covered (e.g. 2307-2412). I have depth of coverage for many samples, and would like a quick way of getting this information across all of them. Are there any existing tools that can do this? Or am I better off just sticking with some sort of bash command?
Below is a sample of the GATK (DepthOfCoverage) output as an example.
Locus Total_Depth Average_Depth_sample Depth_for_s1 Depth_for_s2 Depth_for_s3
genome:1 283 94.33 82 111 90
genome:2 284 94.67 82 111 91
genome:3 285 95.00 82 112 91
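To make the goal concrete, here is a rough Python sketch of the kind of collapsing I have in mind. It assumes whitespace-delimited rows shaped like the table above (a `contig:position` locus followed by depth columns) and that every position is present in the file, including those with zero depth. The function name `zero_coverage_regions`, the `depth_column` parameter, and the example rows are all made up for illustration; none of this comes from GATK or samtools.

```python
def zero_coverage_regions(rows, depth_column=1):
    """Yield (contig, start, end) for each run of consecutive zero-depth positions.

    rows: iterable of text lines shaped like 'genome:12 0 0.00 0 0 0'.
    depth_column: index of the depth column to test (1 = Total_Depth in the
    layout above, 3 = Depth_for_s1, and so on). Hypothetical parameter.
    """
    contig = start = end = None  # the currently open zero-depth run, if any
    for row in rows:
        fields = row.split()
        if not fields or fields[0] == "Locus":
            continue  # skip blank lines and the header row
        name, pos = fields[0].rsplit(":", 1)
        pos = int(pos)
        is_zero = float(fields[depth_column]) == 0
        if is_zero and contig == name and pos == end + 1:
            end = pos  # extend the open run by one base
            continue
        if contig is not None:
            yield contig, start, end  # close the open run
            contig = None
        if is_zero:
            contig, start, end = name, pos, pos  # open a new run
    if contig is not None:
        yield contig, start, end  # flush a run that reaches the end of the input


# Invented example rows, not real data.
example = [
    "Locus Total_Depth",
    "genome:1 5",
    "genome:2 0",
    "genome:3 0",
    "genome:4 2",
    "genome:5 0",
]
for contig, start, end in zero_coverage_regions(example):
    print(f"{contig}\t{start}-{end}")
```

Changing `depth_column` would give the gaps for a single sample rather than for the total depth, so the same loop could be run once per sample column.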