Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing read group Mike_Brown Bioinformatics 3 01-06-2016 10:41 AM
bwa sampe: the read group line is not started with @RG frankyue50 Bioinformatics 3 11-22-2013 07:18 AM
read group: GATK or BWA option? m_elena_bioinfo Bioinformatics 9 12-09-2012 10:53 AM
BWA memory issues tgenahmet Bioinformatics 3 06-04-2009 03:55 AM

Thread Tools
Old 07-23-2019, 11:16 AM   #1
Location: san francisco

Join Date: Jul 2010
Posts: 11
Default bwa GATK4 read group issues

I am trying to run GATK4 HaplotypeCaller on bam files from single individuals generated with bwa mem on Ubuntu 18 with
$gatk HaplotypeCaller -R Tse_SBAPGDGG_D.fa -I AHP2746_sorted_dedup.bam -ERC GVCF -O AHP2746.gvcf
But I get the error:
A USER ERROR has occurred: Argument --emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
I assume this is a read group error.
if I run
samtools view -H AHP2746_sorted_dedup.bam | grep @RG
I get nothing back.
If I run
samtools view -H AHP2746_sorted_dedup.bam
I get a long list of headers and the bottom look like this:
@SQ     LN:500  SN:scaffold_42026
@SQ     SN:scaffold_42025       LN:500
@SQ     SN:scaffold_42022       LN:500
@SQ     SN:scaffold_42019       LN:500
@SQ     SN:scaffold_42018       LN:500
@SQ     LN:500  SN:scaffold_42017
@SQ     SN:scaffold_42016       LN:500
@SQ     SN:scaffold_42015       LN:500
@SQ     SN:scaffold_42014       LN:500
@SQ     SN:scaffold_42013       LN:500
@SQ     SN:scaffold_42012       LN:500
@SQ     SN:scaffold_42011       LN:500
@SQ     SN:scaffold_42010       LN:500
@SQ     SN:scaffold_42009       LN:500
@SQ     SN:scaffold_42008       LN:500
@SQ     SN:scaffold_42006       LN:500
@SQ     SN:scaffold_42005       LN:500
@SQ     SN:scaffold_42004       LN:500
@SQ     SN:scaffold_42003       LN:500
@SQ     SN:scaffold_42002       LN:500
@SQ     SN:scaffold_42001       LN:500
@SQ     SN:scaffold_42000       LN:500
@SQ     SN:scaffold_41998       LN:500
@SQ     SN:scaffold_41997       LN:500
@SQ     SN:scaffold_41996       LN:500
@SQ     SN:scaffold_41995       LN:500
@PG     PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 32 Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz ID:bwa
@PG     CL:elprep filter /dev/stdin AHP2746_sorted_dedup.bam --filter-unmapped-reads --mark-duplicates --mark-optical-duplicates AHP2746_output.metrics --optical-duplicates-pixel-distance 100 --remove-duplicates --sorting-order coordinate    PP:bwa  ID:elprep 4.1.3 PN:elprep       VN:4.1.3        DS:
I assume my bam files have no read group defined (I did not use the bwa mem -R option).
I then tried running bwa mem with -R
bwa mem -M -t 32 -R '@RG\tID:sample_1\tSM:AHP2746\tPL:illumina\tPU:lane1\tPI:420' Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz > bwa-mem-Rtest_AHP2746.bam
the analyses completed, but I am using elprep to dedupe and order the bam files and elprep throws the error "2019/07/23 10:07:46 gzip: invalid header in NewBGZFReader"
my elprep command:
elprep filter bwa-mem-Rtest_AHP2746.bam bwa-mem-Rtest_AHP2746-elprep.bam --mark-duplicates --mark-optical-duplicates output.metrics --remove-duplicates --sorting-order coordinate --nr-of-threads 32

Anyone know what I am doing wrong here?

Additional info:
reference genome was generated and assembled with linked reads (10X) sequenced on a NovaSeq and Hi-C data.
the resequencing data were sequenced on HiSeq400 and some on NovaSeq, but all have same issue.
I am using the latest software versions available with conda installs.
wbsimey is offline   Reply With Quote
Old 01-21-2020, 05:48 AM   #2
Junior Member
Location: Europe

Join Date: Jan 2020
Posts: 2


Quite a while since you posted this so if you reached a solution that'd be great to hear.
Either way I'm having the same problem and thought I'd add some things to the

I am however not sure this is related to the RG tag. According to SAM format
specifications, the BC tag is where sample ID is stored so my guess is that is has to do
with this.

However, in my case I haven't specified any BC tag at all and I wonder if this could be
the problem.

SAM format specs:
FrickTobias is offline   Reply With Quote
Old 01-22-2020, 09:38 AM   #3
Junior Member
Location: Europe

Join Date: Jan 2020
Posts: 2

Okay I managed to solve my issue which I'll post here if someone else finds it useful in the future.


Add an arbitrary RG tag.


From reading at their github it seems gatk HaplotypeCaller will call on a utils function to assert the number of samples, which it does by counting the number of RG entries in the header. HaplotypCaller then exits with the error message if this count is not equal to 1, which seemingly also includes when it the utils function returns zero.


Add an RG tag to your header and tag all reads with the RG tag. The fastest solution is to do this with picardtools AddOrReplaceReadGroups.

Solution complication:

When running gatk HaplotypeCaller on my RG tagged file I got an error saying the uncompression length was invalid. I ran picardtools ValidateSamFile which warned that the index file was older than the BAM file, and after rerunning the indexing everything seemingly worked great (using samtools index).


NB: I'm running picardtools/gatk installed through conda, if you're using the .jar file directly just exchange picard/gatk with java --jar path/to/jar-file.jar

picard AddOrReplaceReadGroups I=reads.bam O=reads.tag.bam RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20 

samtools index reads.tag.bam

gatk HaplotypeCaller I=reads.tag.bam -R ref.fa -ERC GVCF -O vars.vcf
PS. I noticed the BC tag is part of the RG tag, so disregard that part of my prior answer.
FrickTobias is offline   Reply With Quote

bwa, gatk4, read group

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 08:43 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO