SEQanswers

08-19-2010, 10:31 AM   #1
RockChalkJayhawk
Senior Member

Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Picard MarkDuplicates error for RNA-Seq

I am trying to remove PCR duplicates using Picard.
My header looks like this:
Code:
@HD	VN:1.0	SO:sorted
@PG	ID:TopHat	VN:1.0.14	CL:/home/guo/bin/tophat -G ../../hg19.GFF3 -g 1 -o Dex -r 160 --solexa1.3-quals -p 4 ../../../../bowtie-0.12.3/indexes/hg19 ../../FASTQ/KUMC_PE_RNASEQ_sample6_1_sequence.txt ../../FASTQ/KUMC_PE_RNASEQ_sample6_2_sequence.txt
@SQ	SN:chr1	LN:249250621
@SQ	SN:chr2	LN:243199373
@SQ	SN:chr3	LN:198022430
@SQ	SN:chr4	LN:191154276
@SQ	SN:chr5	LN:180915260
@SQ	SN:chr6	LN:171115067
@SQ	SN:chr7	LN:159138663
@SQ	SN:chr8	LN:146364022
@SQ	SN:chr9	LN:141213431
@SQ	SN:chr10	LN:135534747
@SQ	SN:chr11	LN:135006516
@SQ	SN:chr12	LN:133851895
@SQ	SN:chr13	LN:115169878
@SQ	SN:chr14	LN:107349540
@SQ	SN:chr15	LN:102531392
@SQ	SN:chr16	LN:90354753
@SQ	SN:chr17	LN:81195210
@SQ	SN:chr18	LN:78077248
@SQ	SN:chr19	LN:59128983
@SQ	SN:chr20	LN:63025520
@SQ	SN:chr21	LN:48129895
@SQ	SN:chr22	LN:51304566
@SQ	SN:chrX	LN:155270560
@SQ	SN:chrY	LN:59373566
@SQ	SN:chrM	LN:16571
But I get this error when I run Picard:
Code:
net.sf.picard.sam.MarkDuplicates INPUT=Dex.bam OUTPUT=Dex.NoDup.bam METRICS_FILE=Dex.Metrics REMOVE_DUPLICATES=true ASSUME_SORTED=false    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 TMP_DIR=/tmp/shart3 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false
INFO	2010-08-19 12:25:28	MarkDuplicates	Start of doWork freeMemory: 31023560; totalMemory: 31588352; maxMemory: 620756992
INFO	2010-08-19 12:25:28	MarkDuplicates	Reading input file and constructing read end information.
INFO	2010-08-19 12:25:28	MarkDuplicates	Will retain up to 2463321 data points before spilling to disk.
[Thu Aug 19 12:25:28 CDT 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=51314688
Exception in thread "main" java.lang.IllegalArgumentException: No enum const class net.sf.samtools.SAMFileHeader$SortOrder.sorted
Samtools can read it fine, and samtools flagstat reports:
Code:
46021417 in total
0 QC failure
0 duplicates
46021417 mapped (100.00%)
46021417 paired in sequencing
23145972 read1
22875445 read2
33385558 properly paired (72.54%)
39196142 with itself and mate mapped
6825275 singletons (14.83%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
Why can't Picard read it?
08-19-2010, 02:26 PM   #2
nilshomer
Nils Homer

Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

The value for the sort order tag "SO" is sorted, whereas the SAM specification lists "unsorted", "queryname" or "coordinate" as allowable values. Picard validates SAM/BAM files, while samtools does not (as much).

Looks like a bug in tophat.
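
One way to work around the header problem, as a minimal sketch: dump the header text (for example with `samtools view -H`), replace the invalid `SO` value, and write it back with `samtools reheader`. The Python below only covers the text replacement. The function name `fix_sort_order` and the choice of `coordinate` as the replacement are assumptions for illustration; `coordinate` is only right if the records really are sorted by position, otherwise `unsorted` is the safer value.

```python
# Sort orders listed as allowable in the post above.
VALID_SORT_ORDERS = {"unsorted", "queryname", "coordinate"}


def fix_sort_order(header_text, replacement="coordinate"):
    """Return SAM header text with an invalid @HD SO value replaced.

    `replacement` is an assumption: use "coordinate" only if the
    alignments are actually position-sorted.
    """
    fixed_lines = []
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            fields = line.split("\t")
            # Rewrite only an SO field whose value is not a valid sort order.
            fields = [
                f"SO:{replacement}"
                if f.startswith("SO:") and f[3:] not in VALID_SORT_ORDERS
                else f
                for f in fields
            ]
            line = "\t".join(fields)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


# First header line from the original post, plus one @SQ line.
header = "@HD\tVN:1.0\tSO:sorted\n@SQ\tSN:chr1\tLN:249250621\n"
print(fix_sort_order(header), end="")
```

A header whose `SO` value is already valid passes through unchanged, so running this on files from other aligners should be harmless.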
08-19-2010, 02:33 PM   #3
RockChalkJayhawk
Senior Member

Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by nilshomer View Post
The value for the sort order tag "SO" is sorted, whereas the SAM specification lists "unsorted", "queryname" or "coordinate" as allowable values. Picard validates SAM/BAM files, while samtools does not (as much).

Looks like a bug in tophat.
Thanks Nils, I just figured that out. Another bug with TopHat is that it doesn't write the MRNM, MPOS, or ISIZE fields for paired-end reads in the SAM file. So I will try to re-align them using Bowtie to see if I can overcome this.

Interestingly, I also ran SOAPals, which showed the same problem when I converted its output to SAM. Is there any RNA-Seq aligner that outputs these fields in SAM?
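
For anyone wanting to check their own files for this, here is a rough sketch of how the missing mate fields could be detected. In SAM, MRNM, MPOS and ISIZE are columns 7, 8 and 9, and unset values are written as `*`, `0` and `0`. The function name `mate_fields_missing` and the two example records are made up for illustration and are not taken from the TopHat output above. ISIZE is not tested because it can legitimately be 0, for example when mates map to different chromosomes.

```python
def mate_fields_missing(sam_line):
    """Return True if a paired read with a mapped mate lacks MRNM or MPOS.

    Columns (1-based): 2 = FLAG, 7 = MRNM, 8 = MPOS, 9 = ISIZE.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    is_paired = bool(flag & 0x1)        # read is paired in sequencing
    mate_unmapped = bool(flag & 0x8)    # mate is flagged as unmapped
    mrnm, mpos = fields[6], int(fields[7])
    if not is_paired or mate_unmapped:
        # Unset mate fields are expected in these cases.
        return False
    return mrnm == "*" or mpos == 0


# Hypothetical records: the first has no mate information, the second does.
rec_missing = "read1\t65\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII"
rec_ok = "read1\t99\tchr1\t100\t255\t4M\t=\t260\t164\tACGT\tIIII"

for rec in (rec_missing, rec_ok):
    print(rec.split("\t")[0], mate_fields_missing(rec))
```

Feeding it the alignment lines from `samtools view` and counting the `True` results would give a quick estimate of how many records are affected.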
07-11-2012, 09:00 AM   #4
newbietonextgen
Member

Location: USA

Join Date: Nov 2010
Posts: 56

I am having the same problem with RNA-seq data. I have even tried SpliceMap, and the SAM files it produces cannot be used downstream. Did you find a way around this? If so, kindly let me know.
07-11-2012, 09:03 AM   #5
RockChalkJayhawk
Senior Member

Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

We've switched to the updated Tophat2, which seems to work well.
07-11-2012, 09:13 AM   #6
newbietonextgen
Member

Location: USA

Join Date: Nov 2010
Posts: 56

Which version, if you can share? I used 2.0.0.4 and my SAM/BAM files were not compatible with GATK. The same data aligned using SHRiMP works like a charm, no problem.
07-11-2012, 03:07 PM   #7
RockChalkJayhawk
Senior Member

Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by newbietonextgen View Post
Which version, if you can share? I used 2.0.0.4 and my SAM/BAM files were not compatible with GATK. The same data aligned using SHRiMP works like a charm, no problem.
Can you post the error message from GATK?
All times are GMT -8.

