SEQanswers

08-03-2011, 08:15 AM   #1
fabrice
Member
Location: paris
Join Date: Oct 2009
Posts: 86
Example of using Picard to remove duplicate reads?

Can anybody give me an example of this?

java -jar ~/bin/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true METRICS_FILE=dup.txt INPUT=accepted_hits.bam OUTPUT=remove_accepted_hits.bam

Thank you very much.

08-03-2011, 10:26 AM   #2
C.R.
Member
Location: Germany
Join Date: Jun 2010
Posts: 25

Hi,
I usually increase the RAM used, change the validation stringency and define a tmp dir.

My command would look like this:
java -Xmx4g -jar ~/bin/picard/MarkDuplicates.jar INPUT=accepted_hits.bam OUTPUT=remove_accepted_hits.bam METRICS_FILE=dup.txt VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=true TMP_DIR=/tmp

08-03-2011, 11:26 AM   #3
fabrice
Member
Location: paris
Join Date: Oct 2009
Posts: 86

Thank you for your suggestions.
I am doing RNA-seq data analysis. I am still not sure whether it is necessary to remove the duplicates.

08-03-2011, 12:02 PM   #4
fabrice
Member
Location: paris
Join Date: Oct 2009
Posts: 86

After I used Picard to remove duplicates, I checked the result with FastQC. It still shows a high duplication level (63%). Why?

02-08-2013, 03:22 PM   #5
chongm
Member
Location: Canada
Join Date: Sep 2012
Posts: 21

Maybe your library has low complexity?
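
One rough way to check this is to read the metrics file that MarkDuplicates writes (dup.txt in the commands above). The sketch below pulls out the PERCENT_DUPLICATION and ESTIMATED_LIBRARY_SIZE columns by name. It assumes the usual tab-separated layout with a header row starting with LIBRARY, and it only reports the first library in the file. The function name picard_dup_summary is made up for this example.

```shell
# Print PERCENT_DUPLICATION and ESTIMATED_LIBRARY_SIZE from a Picard
# MarkDuplicates metrics file (for example the dup.txt written above).
# Columns are looked up by header name rather than by fixed position.
picard_dup_summary() {
    awk -F'\t' '
        # The header row of the metrics table starts with LIBRARY.
        $1 == "LIBRARY" && !seen {
            for (i = 1; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        # First data row after the header: print the two columns and stop.
        seen && NF > 1 {
            print "PERCENT_DUPLICATION=" $(col["PERCENT_DUPLICATION"])
            print "ESTIMATED_LIBRARY_SIZE=" $(col["ESTIMATED_LIBRARY_SIZE"])
            exit
        }
    ' "$1"
}
```

Usage would be something like `picard_dup_summary dup.txt`. A high PERCENT_DUPLICATION together with a small estimated library size would point towards low complexity; the library size field may be empty for some inputs.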

02-09-2013, 12:02 AM   #6
rosa_dentellare
Member
Location: Malaysia
Join Date: Sep 2011
Posts: 10

I'm having trouble with MarkDuplicates and hope someone can help. When I didn't include "VALIDATION_STRINGENCY=LENIENT", I got an error saying "Mate Alignment start should be !=0 because reference name != *". When I did include it, there is an error line that keeps changing very quickly, but the job keeps running. Is this normal?

02-09-2013, 12:26 AM   #7
dariober
Senior Member
Location: Cambridge, UK
Join Date: May 2010
Posts: 311

Quote:
Originally Posted by fabrice
Thank you for your suggestions.
I am doing RNA-seq data analysis. I am still not sure whether it is necessary to remove the duplicates.

I think in RNA-seq it is normal to see reads piling up at the same position (i.e. duplicates), especially in highly expressed genes and/or with high sequencing depth, so I would not recommend removing duplicates. But see also this thread for a longer discussion: http://seqanswers.com/forums/showthread.php?t=6854.
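
If the duplicates are kept (REMOVE_DUPLICATES=false), MarkDuplicates only flags them: it sets bit 0x400 (decimal 1024) in the SAM FLAG field, and downstream tools can then decide whether to ignore flagged reads. A minimal sketch of testing that bit in plain shell; is_duplicate_flag is a name made up for this example.

```shell
# Succeed (exit status 0) if a SAM FLAG value has the duplicate bit set.
# 0x400 = 1024 is the "PCR or optical duplicate" bit in the SAM format.
is_duplicate_flag() {
    [ $(( $1 & 1024 )) -ne 0 ]
}
```

For example, `is_duplicate_flag 1187 && echo duplicate` prints "duplicate", because 1187 = 1024 + 163. Assuming samtools is installed, `samtools view -c -f 1024 marked.bam` should count the flagged reads in a BAM file (marked.bam is a placeholder name).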

10-18-2013, 02:55 AM   #8
vd4mindia
Member
Location: Milan
Join Date: May 2013
Posts: 40

I need some advice about the MarkDuplicates command in Picard. Can anyone tell me whether INPUT and OUTPUT can take a full path? I want to run 6 samples together, so I will put the commands in a script and execute them one by one. Can I specify the paths in INPUT and OUTPUT like below?

java -Xmx14g -jar /data/PGP/gmelloni/picard-tools-1.84/picard-tools-1.84/MarkDuplicates.jar INPUT=/test/exome/input/SRR062634.sorted.bam OUTPUT=/test/exome/results/SRR062634_marked.sorted.bam REMOVE_DUPLICATES=False METRICS_FILE=metricN.log ASSUME_SORTED=True VALIDATION_STRINGENCY=LENIENT

10-18-2013, 03:19 AM   #9
dariober
Senior Member
Location: Cambridge, UK
Join Date: May 2010
Posts: 311

Quote:
Originally Posted by vd4mindia
Can anyone tell me whether INPUT and OUTPUT can take a full path?

java -Xmx14g -jar /data/PGP/gmelloni/picard-tools-1.84/picard-tools-1.84/MarkDuplicates.jar INPUT=/test/exome/input/SRR062634.sorted.bam OUTPUT=/test/exome/results/SRR062634_marked.sorted.bam REMOVE_DUPLICATES=False METRICS_FILE=metricN.log ASSUME_SORTED=True VALIDATION_STRINGENCY=LENIENT

I'm pretty sure you can specify the path, just try and see...
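
For several samples, the command above could be wrapped in a small loop. This is only a sketch under a few assumptions: the jar path and directories are copied from the quoted command, the per-sample metrics file name is my own choice (so that one sample does not overwrite another's metrics), and the function names markdup_one and markdup_all and the DRY_RUN switch are made up for this example.

```shell
# Sketch: run MarkDuplicates on several samples, one after another.
# Paths are copied from the post above; override them via the environment.
PICARD_JAR=${PICARD_JAR:-/data/PGP/gmelloni/picard-tools-1.84/picard-tools-1.84/MarkDuplicates.jar}
IN_DIR=${IN_DIR:-/test/exome/input}
OUT_DIR=${OUT_DIR:-/test/exome/results}

# Process one sample. With DRY_RUN=1 the command is printed, not executed.
markdup_one() {
    sample=$1
    # Per-sample metrics file (assumed naming) so runs do not overwrite each other.
    set -- java -Xmx14g -jar "$PICARD_JAR" \
        INPUT="$IN_DIR/$sample.sorted.bam" \
        OUTPUT="$OUT_DIR/${sample}_marked.sorted.bam" \
        METRICS_FILE="$OUT_DIR/$sample.metrics.txt" \
        REMOVE_DUPLICATES=false \
        ASSUME_SORTED=true \
        VALIDATION_STRINGENCY=LENIENT
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Process every sample name given as an argument; stop at the first failure.
markdup_all() {
    for s in "$@"; do
        markdup_one "$s" || { echo "MarkDuplicates failed for $s" >&2; return 1; }
    done
}
```

The block only defines the functions. To see the commands without running anything, set `DRY_RUN=1` and then call `markdup_all SRR062634 sample2 sample3` (sample2 and sample3 are placeholders for the other sample names); unset DRY_RUN to run them for real.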

10-18-2013, 03:32 AM   #10
vd4mindia
Member
Location: Milan
Join Date: May 2013
Posts: 40

Yes, I have queued a script and it's still running. Since I am working from home, my internet is a bit weak, so I have to run the script on the cluster and can't access the command line directly. That is why I was asking, but it seems to work for me. I will keep you posted on whether it succeeds or fails.

All times are GMT -8.