Hi
Hopefully someone knows of some software or a script that will save me from having to write one.
I want to put together some summary stats for my RAD sequence data (70 bp Illumina single reads with a restriction site at one end). The reads are spread throughout the genome depending on the presence of the restriction site, and they stack up directly on top of one another rather than forming contigs. I want a count of the number of reads for each particular sequence, so that I can get a frequency distribution of read coverage. I've tried the GATK DepthOfCoverage walker, but it reports coverage per base relative to the reference, so where two sequences overlap it gives the sum of the two sequences' coverage.
My data is in SAM format, but I can't simply do a count on chromosome position because the reverse-strand sequences are 70 bp away from the forward sequences.
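For reference, a minimal sketch of the kind of script this would take, not a tested tool. It assumes the restriction site sits at the 5' end of each read, so forward reads are keyed on the SAM POS field and reverse reads on POS plus the reference length taken from the CIGAR string. The function names (reference_length, count_stacks, coverage_distribution) are made up for illustration.

```python
import re
from collections import Counter

# CIGAR operations; only M, D, N, = and X consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment, from its CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def count_stacks(sam_lines):
    """Count reads per (chromosome, strand, 5' position).

    Assumption: the restriction site is at the 5' end of the read, so a
    reverse-strand read is keyed on its rightmost aligned base rather than
    on the leftmost position stored in the SAM POS field.
    """
    stacks = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        if flag & 0x10:  # read aligned to the reverse strand
            strand = "-"
            five_prime = pos + reference_length(cigar) - 1
        else:
            strand = "+"
            five_prime = pos
        stacks[(chrom, strand, five_prime)] += 1
    return stacks


def coverage_distribution(stacks):
    """Map read depth -> number of stacks observed at that depth."""
    return Counter(stacks.values())
```

To use it, open the SAM file, pass the file handle to count_stacks, and pass the result to coverage_distribution. Keeping the strand in the key means forward and reverse stacks from the same restriction site are counted separately instead of being summed.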
Any ideas?
Thanks
Sam