SEQanswers

04-24-2010, 09:56 PM   #1
wangzkai
Member
 
Location: Southern California, USA

Join Date: Feb 2010
Posts: 11
Picard MarkDuplicates

This may be a naive question, but I am trying to figure out whether I should set "REMOVE_DUPLICATES" to true or false when using Picard's "MarkDuplicates" to deal with duplicate reads. I want to call variants afterwards using samtools pileup, and I am not sure whether samtools pileup will ignore reads that are only flagged as duplicates when it calls SNPs.

My understanding is that with "REMOVE_DUPLICATES=true" the duplicate reads are not even written to the output file, which sounds a bit safer ...

Thanks for any insight on this!
04-25-2010, 10:32 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by wangzkai
This may be a naive question, but I am trying to figure out whether I should set "REMOVE_DUPLICATES" to true or false when using Picard's "MarkDuplicates" to deal with duplicate reads. I want to call variants afterwards using samtools pileup, and I am not sure whether samtools pileup will ignore reads that are only flagged as duplicates when it calls SNPs.

My understanding is that with "REMOVE_DUPLICATES=true" the duplicate reads are not even written to the output file, which sounds a bit safer ...

Thanks for any insight on this!
Save the duplicates so you never lose any data. You can then use the "-m" option in 'samtools pileup' to filter reads based on their flag (including duplicates: 0x400). If you need to make sure that your "FLAG" is set correctly, see http://picard.sourceforge.net/explain-flags.html.
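
To make the flag-based filtering above concrete, here is a minimal Python sketch of how a bit mask is tested against a read's FLAG. The bit values come from the SAM specification; the helper names (`is_duplicate`, `passes_mask`) are made up for illustration and are not part of samtools or Picard. The combined mask below works out to 1796, which I believe was also the default for "-m" in the samtools 0.1.x pileup, but check the usage text of your own version.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400  # the bit MarkDuplicates sets on duplicate reads


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit (0x400) is set in a FLAG value."""
    return bool(flag & FLAG_DUPLICATE)


def passes_mask(flag: int, mask: int) -> bool:
    """Return True if none of the bits in `mask` are set in `flag`.

    This mirrors the idea of a filtering mask: a read is dropped
    as soon as it carries any one of the masked bits.
    """
    return (flag & mask) == 0


# Illustrative mask: drop unmapped, secondary, QC-fail and duplicate reads.
mask = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_QC_FAIL | FLAG_DUPLICATE

if __name__ == "__main__":
    for flag in (83, 1107):
        print(flag, "duplicate:", is_duplicate(flag), "kept:", passes_mask(flag, mask))
```

Because the duplicates stay in the file and are only excluded by the mask, you can change your mind later by using a different mask instead of re-running MarkDuplicates.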
05-18-2010, 10:14 PM   #3
xguo
Member
 
Location: Maryland

Join Date: Jul 2008
Posts: 48

Quote:
Originally Posted by nilshomer
Save the duplicates so you never lose any data. You can then use the "-m" option in 'samtools pileup' to filter reads based on their flag (including duplicates: 0x400). If you need to make sure that your "FLAG" is set correctly, see http://picard.sourceforge.net/explain-flags.html.
Is the read flag included in the pileup file? I'm thinking of calling SNPs using all reads and then filtering them based on the number of non-duplicate reads supporting each SNP.
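
A rough Python sketch of the filtering idea described above: count how many reads support a candidate allele overall, and how many of those are not marked as duplicates. As far as I know, pileup lines do not carry per-read FLAG values, so this assumes the (FLAG, base) pairs for a site have already been collected from the BAM records by some other means. The function name `count_support` and the input layout are hypothetical.

```python
from typing import Iterable, Tuple

FLAG_DUPLICATE = 0x400  # duplicate bit from the SAM specification


def count_support(observations: Iterable[Tuple[int, str]], alt_base: str) -> Tuple[int, int]:
    """Count reads supporting `alt_base` at a single site.

    `observations` is an assumed input: one (FLAG, base) pair per read
    overlapping the site. Returns (all supporting reads,
    supporting reads without the duplicate bit).
    """
    total = 0
    non_dup = 0
    for flag, base in observations:
        if base.upper() != alt_base.upper():
            continue  # this read does not support the candidate allele
        total += 1
        if not flag & FLAG_DUPLICATE:
            non_dup += 1
    return total, non_dup


if __name__ == "__main__":
    # Made-up observations for one site, for illustration only.
    site = [(99, "A"), (1123, "A"), (147, "G"), (83, "a")]
    total, non_dup = count_support(site, "A")
    print("supporting reads:", total, "non-duplicate:", non_dup)
```

A SNP could then be kept only if the non-duplicate count clears whatever threshold you choose; the threshold itself is a judgment call and is not something this sketch decides.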