Error "RG ID on SAMRecord not found in header" from Picard's MarkDuplicates.jar

I am trying the following command:

java -Xmx4g -jar /home/picard-tools-1.35/MarkDuplicates.jar INPUT=$dir/input.bam OUTPUT=$dir/output.bam TMP_DIR=/home/temp METRICS_FILE=PCR_duplicates REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT

It produces many errors like "RG ID on SAMRecord not found in header".

The command is still running, but output.bam has not appeared yet. Judging from my earlier attempts, I will most likely end up with no output when the command finishes.

I used BWA to map the reads, then SAMtools to sort them and to merge reads from different lanes or different Illumina flowcells. Each BAM file is named with its flowcell as the prefix, e.g., FC124_1_sorted.bam.

Because I merged different flowcells from the same Illumina machine, I used "samtools merge -r". According to the SAMtools manual, "merge -r" attaches an RG tag to each alignment, with the tag value inferred from the file names.
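
To see whether the RG tags written by "merge -r" have matching @RG lines in the header, one can compare the two sets directly. The sketch below is only an illustration: missing_rg_ids is a hypothetical helper name, and it assumes SAM text (header plus records) arrives on stdin, for example from "samtools view -h".

```shell
# Hypothetical helper: list RG IDs that alignment records use
# but that no @RG header line declares. Reads SAM text on stdin.
missing_rg_ids() {
  awk -F'\t' '
    /^@RG/ {                        # read-group header line
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ID:/) declared[substr($i, 4)] = 1
      next
    }
    /^@/ { next }                   # skip other header lines
    {
      for (i = 12; i <= NF; i++)    # optional fields start at column 12
        if ($i ~ /^RG:Z:/) used[substr($i, 6)] = 1
    }
    END {
      for (id in used)
        if (!(id in declared)) print id
    }
  ' | sort
}

# Example (input.bam stands for the merged file):
#   samtools view -h input.bam | missing_rg_ids
```

Any ID this prints is one that appears on reads without a corresponding header entry, which would be consistent with the error message above.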

I am running the commands on a cluster node with 4 GB RAM and 10 TB of disk space, which I assume should be enough for Picard.

Does anyone know how to fix the error "RG ID on SAMRecord not found in header"?
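
One direction that may be worth trying: the SAMtools manual pairs "merge -r" with "-h FILE", which copies the lines of FILE into the output header. The sketch below builds such a file from the BAM file names. It is an untested idea resting on assumptions: make_rg_header is a hypothetical helper, it assumes the RG ID that "merge -r" writes is the file name without its directory and ".bam" suffix, and the SM and PL values are placeholders.

```shell
# Hypothetical helper: print one @RG header line per BAM file name.
# Assumes the RG ID equals the file name minus directory and ".bam";
# the SM and PL values are placeholders, not taken from real data.
make_rg_header() {
  sample=$1
  shift
  for bam in "$@"; do
    id=$(basename "$bam" .bam)
    printf '@RG\tID:%s\tSM:%s\tPL:ILLUMINA\n' "$id" "$sample"
  done
}

# Example (file and sample names are illustrative):
#   make_rg_header sample1 FC124_1_sorted.bam FC125_1_sorted.bam > rg.txt
#   samtools merge -rh rg.txt merged.bam FC124_1_sorted.bam FC125_1_sorted.bam
```

If the header of the merged file then lists every RG ID found on the reads, MarkDuplicates should at least have a header entry to look up for each record.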


