SEQanswers


mard 03-16-2010 05:08 PM

threshold for duplicate removal?
 
I was trying out Picard's MarkDuplicates to remove duplicate reads before SNP identification in our targeted resequencing studies. I discovered that Picard classes reads that map to the same genomic location (same start and stop) as duplicates even when their sequences are not identical, and keeps only one of them. This means that a read from only one haplotype can be kept for each location.

We're thinking of testing whether it would be better to apply a filter/threshold that keeps up to a certain number of reads mapping to the same location, instead of discarding all except one. Just wondering if anyone has tried something like this?
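To make the idea of a per-location threshold concrete, here is a minimal sketch in plain Python. It is not how Picard works internally, and it does not use any Picard option. The read records, the `qual` field, and the function name `cap_reads_per_position` are all hypothetical and only for illustration. With `max_per_position=1` it mimics the "keep one read per location" behaviour described above; a larger value keeps more reads per location.

```python
from collections import defaultdict


def cap_reads_per_position(reads, max_per_position=1):
    """Keep at most max_per_position reads per (chrom, start, end, strand).

    `reads` is a list of dicts with hypothetical keys:
    name, chrom, start, end, strand, qual.
    """
    kept = []
    counts = defaultdict(int)
    # Visit reads from highest to lowest quality so that the best reads
    # at each location are the ones that survive the cap.
    for read in sorted(reads, key=lambda r: -r["qual"]):
        key = (read["chrom"], read["start"], read["end"], read["strand"])
        if counts[key] < max_per_position:
            counts[key] += 1
            kept.append(read)
    return kept


# Toy data (made up): three reads share one location, one read is alone.
reads = [
    {"name": "r1", "chrom": "chr1", "start": 100, "end": 175, "strand": "+", "qual": 30},
    {"name": "r2", "chrom": "chr1", "start": 100, "end": 175, "strand": "+", "qual": 35},
    {"name": "r3", "chrom": "chr1", "start": 100, "end": 175, "strand": "+", "qual": 20},
    {"name": "r4", "chrom": "chr1", "start": 200, "end": 275, "strand": "+", "qual": 25},
]

keep_one = [r["name"] for r in cap_reads_per_position(reads, max_per_position=1)]
keep_two = [r["name"] for r in cap_reads_per_position(reads, max_per_position=2)]
```

In this sketch the cap is a plain count. Whether the reads that survive represent both haplotypes would still depend on which reads happen to rank highest.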

bioinfosm 03-18-2010 12:57 PM

That's an interesting point. I agree that only one haplotype will be kept with that kind of filtering. So far I have only been filtering out reads that map to multiple locations and have kept the duplicates, and I guess that is what brings PCR bias into SNP identification (hets look 30-40% variant, not 50%).
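As a rough illustration of how retained PCR duplicates could pull a heterozygous site away from 50%, here is a toy calculation. The fragment counts are invented for the example and are not taken from any data in this thread; `alt_fraction` is a hypothetical helper.

```python
def alt_fraction(fragments):
    """Return the fraction of reads carrying the variant allele.

    `fragments` is a list of (allele, copies) pairs, where `copies` is the
    number of reads observed for that original fragment (1 = no duplicates).
    """
    alt = sum(copies for allele, copies in fragments if allele == "alt")
    total = sum(copies for _, copies in fragments)
    return alt / total


# Assumed scenario: 5 reference and 5 variant fragments at a het site,
# with every reference fragment amplified into 2 reads.
fragments = [("ref", 2)] * 5 + [("alt", 1)] * 5

with_dups = alt_fraction(fragments)                        # 5 / 15
deduped = alt_fraction([(a, 1) for a, _ in fragments])     # 5 / 10
```

With these made-up numbers the variant fraction is about 33% with duplicates kept and 50% once each fragment is counted once.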

mard 03-21-2010 04:45 PM

Thanks for the information. I'm pretty new to next-gen analysis, so I'm wondering whether it's recommended to remove reads that map to multiple locations before SNP calling?

