I was trying out Picard's MarkDuplicates to remove duplicate reads before SNP identification in our targeted resequencing studies, but I discovered that Picard classes non-identical reads that map to the same genomic location (same start and stop) as duplicates, and only one read is kept. This means that reads from only one haplotype can be kept at each location.
We're thinking of testing whether it would be better to apply a filter/threshold that keeps a certain number of reads mapping to the same location, instead of discarding all except one. Has anyone tried something like this?
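To make the idea concrete, here is a rough sketch of the kind of filter we have in mind. It is not how Picard works and not a tested pipeline: the `Read` record, the `cap_reads_per_location` function, the `max_keep` threshold, and the choice to rank by mean base quality are all assumptions made for illustration. Real reads would have to be parsed from the BAM file first, which is not shown here.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a Picard or SAM data type)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str
    mean_qual: float


def cap_reads_per_location(reads: Iterable[Read], max_keep: int) -> List[Read]:
    """Keep at most `max_keep` reads for each (chrom, start, end, strand).

    Within a location, reads are ranked by mean base quality (an assumed
    criterion) and the highest-ranked ones are kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.end, read.strand)].append(read)

    kept: List[Read] = []
    for key in sorted(groups):
        # Stable sort: reads with equal quality stay in input order.
        ranked = sorted(groups[key], key=lambda r: r.mean_qual, reverse=True)
        kept.extend(ranked[:max_keep])
    return kept
```

With `max_keep=1` this behaves like keeping a single read per location; a larger value would let reads from more than one haplotype survive at the same start and stop, at the cost of keeping some true PCR duplicates.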