I have a SAM file containing alignments of paired-end NGS reads against a set of de novo assembled contigs.
I may need to infer coverage information for these contigs from the SAM file, for example to decide whether a contig is unique or a collapsed repeat present in several copies in the genome. In that case, should I mark and remove duplicates, either in the SAM file (using Picard MarkDuplicates or SAMtools rmdup) or in the read sequences themselves (FastUniq)? Or should I keep everything as it is, so that I do not lose anything important that might affect downstream analyses?
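To make the comparison concrete, here is a minimal sketch of the kind of coverage estimate I have in mind. It assumes duplicates have been marked but not removed (so the 0x400 bit is set in the SAM FLAG field) and that the header has `@SQ` lines with contig lengths. The function name `mean_coverage` is just something I made up for illustration, and counting only M/=/X CIGAR operations as covered bases is a simplification.

```python
import re
from collections import defaultdict

UNMAPPED = 0x4     # SAM FLAG bit: read is unmapped
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment
DUPLICATE = 0x400  # SAM FLAG bit: PCR or optical duplicate

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mean_coverage(sam_lines):
    """Return {contig: (mean depth with duplicates, mean depth without)}.

    Hypothetical helper: depth is approximated as the total number of
    aligned bases (CIGAR M, = and X) divided by the contig length.
    """
    lengths = {}
    with_dups = defaultdict(int)
    without_dups = defaultdict(int)

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line; only @SQ carries contig names and lengths.
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue

        fields = line.split("\t")
        flag, contig, cigar = int(fields[1]), fields[2], fields[5]
        if flag & (UNMAPPED | SECONDARY) or contig == "*" or cigar == "*":
            continue

        aligned = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")
        with_dups[contig] += aligned
        if not flag & DUPLICATE:
            without_dups[contig] += aligned

    return {
        name: (with_dups[name] / length, without_dups[name] / length)
        for name, length in lengths.items()
    }
```

A contig whose depth drops sharply once flagged duplicates are excluded would then look quite different from one whose depth stays, say, around twice the typical value either way.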
Could you please share your opinion and the pros and cons of duplicate removal in this case?
Thanks.