**FrickTobias** · 01-21-2020, 05:48 AM

Hi,

It's been quite a while since you posted this, so if you reached a solution it would be great to hear about it.
Either way, I'm having the same problem and thought I'd add a few things to the
discussion.

However, I'm not sure this is related to the RG tag. According to the SAM format
specification, the BC tag is where the sample ID is stored, so my guess is that it has to do
with this.

In my case I haven't specified any BC tag at all, and I wonder whether this could be
the problem.

SAM format specs: https://samtools.github.io/hts-specs/SAMv1.pdf

**FrickTobias** · 01-22-2020, 05:43 AM

Okay, I managed to solve my issue, so anyone reading this can try the fix and judge for themselves whether it works for them.

Problem:

The problem is that gatk assumes you've set the RG tag in all your SAM/BAM/CRAM files (which hardly anyone but the Broad Institute actually does for single-sample libraries). Looking at the current state of gatk HaplotypeCaller on their GitHub, it seems to count the RG entries in your SAM header and then return the error message from the original post if this number is not exactly one (including the case where it is zero).
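The count described above can be checked before launching GATK at all. Below is a minimal bash sketch under one assumption worth flagging: as far as I can tell from the GATK source, the sample list is built from the distinct SM (sample) values on the @RG header lines, so two @RG lines sharing one SM still count as a single sample. `count_samples` is a hypothetical helper name; it reads SAM header text on stdin.

```bash
#!/usr/bin/env bash
# count_samples (hypothetical helper): read SAM header text on stdin and
# print how many distinct SM (sample) values appear on @RG lines.
# Assumption: this mirrors the "number of samples" GATK derives from the header.
count_samples() {
  awk -F '\t' '
    $1 == "@RG" {
      # scan the tab-separated fields of each @RG line for SM:<sample>
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SM:") seen[substr($i, 4)] = 1
    }
    END { n = 0; for (s in seen) n++; print n }
  '
}

# Usage (assumes samtools is on PATH):
#   n=$(samtools view -H reads.bam | count_samples)
#   [ "$n" -eq 1 ] || echo "expected exactly 1 sample in header, found $n" >&2
```

A header with no @RG lines prints 0, which is exactly the "zero counts as not one" case from the post.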

Solution:

Add an RG tag to your input file (for example with picardtools AddOrReplaceReadGroups).

Solution complications:

When I ran HaplotypeCaller on the tagged file, I instead got an "invalid uncompressedLength" error. Running picard ValidateSamFile warned me that my index file was older than the BAM file, so I re-indexed the RG-tagged file, after which HaplotypeCaller finished without problems.
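The stale-index warning ValidateSamFile gave can also be caught up front by comparing modification times. A minimal bash sketch, assuming the index sits next to the BAM as either `reads.bam.bai` (what samtools index writes) or `reads.bai` (Picard's naming); `index_is_stale` is a hypothetical helper. Whether a stale index is the only cause of the uncompressedLength error is the observation from this thread, not something the check proves.

```bash
#!/usr/bin/env bash
# index_is_stale BAM (hypothetical helper): exit 0 if no .bai index is found
# next to BAM, or if the first index found is older than BAM; exit 1 otherwise.
index_is_stale() {
  local bam=$1
  local bai
  # check samtools-style name first, then Picard-style name
  for bai in "$bam.bai" "${bam%.bam}.bai"; do
    if [ -e "$bai" ]; then
      # -ot: true when $bai was last modified before $bam
      [ "$bai" -ot "$bam" ]
      return
    fi
  done
  return 0  # no index at all also means "needs indexing"
}

# Usage (assumes samtools is on PATH):
#   if index_is_stale reads.tagged.bam; then samtools index reads.tagged.bam; fi
```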

Commands used:

NB: I'm using picard installed via conda; if you're using the .jar file directly, just replace "picard" with java -jar path/to/picardtools.jar (or gatk with java -jar path/to/gatk.jar).
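For the .jar route, two shell functions can stand in for the conda executables so the commands below can be pasted unchanged. This is a minimal sketch: `PICARD_JAR` and `GATK_JAR` are hypothetical environment variables you point at your own jars (for GATK4 I'm assuming the `gatk-package-<version>-local.jar` from the release), and the official gatk launcher script passes extra JVM options that these functions do not.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers: run picard/gatk from jar files via java -jar.
# PICARD_JAR / GATK_JAR are assumed variables, not settings read by the tools.
# ${VAR:?msg} aborts with msg if the variable is unset or empty.
picard() { java -jar "${PICARD_JAR:?set PICARD_JAR to your picard jar}" "$@"; }
gatk()   { java -jar "${GATK_JAR:?set GATK_JAR to your gatk package jar}" "$@"; }

# Usage:
#   export PICARD_JAR=path/to/picard.jar GATK_JAR=path/to/gatk.jar
#   picard ValidateSamFile I=reads.tagged.bam
```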

Code:

```
picard AddOrReplaceReadGroups I=reads.bam O=reads.tagged.bam RGLB=lib1 RGPU=unit1 RGSM=20 RGPL=ILLUMINA RGID=4c

samtools index reads.tagged.bam

gatk HaplotypeCaller -R ref.fa -I reads.tagged.bam -ERC GVCF -O vars.vcf
```

PS. I noticed that the BC tag I mentioned earlier is a field of the @RG header line (the sample name itself goes in SM), so scratch what I wrote about it in my first post.

**FrickTobias** · 01-22-2020, 09:38 AM

Okay, I managed to solve my issue; I'm posting the solution here in case someone else finds it useful in the future.

TL;DR:

Add an arbitrary RG tag.

Problem:

From reading the source on their GitHub, it seems gatk HaplotypeCaller calls a utility function to determine the number of samples, which it does by counting the RG entries in the header. HaplotypeCaller then exits with the error message if this count is not equal to 1, which apparently includes the case where the utility function returns zero.

Solution:

Add an @RG line to your header and tag every read with the matching RG tag. The quickest way to do this is with picardtools AddOrReplaceReadGroups.
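To make concrete what "add an @RG line and tag every read" means, here is a plain-text equivalent for SAM input written in awk. It is a simplified sketch, not how Picard works internally: `add_read_group` is a hypothetical helper, it writes only the ID and SM fields (AddOrReplaceReadGroups also asks for LB, PL and PU), and BAM input has to be converted to SAM text and back.

```bash
#!/usr/bin/env bash
# add_read_group ID SM (hypothetical helper): read SAM text on stdin and write
# SAM text with old @RG header lines and RG:Z: read tags replaced by one group.
add_read_group() {
  local id=$1 sm=$2
  awk -F '\t' -v OFS='\t' -v id="$id" -v sm="$sm" '
    function emit_rg() { print "@RG", "ID:" id, "SM:" sm; done_rg = 1 }
    /^@/ {
      if ($1 != "@RG") print   # keep other header lines, drop old @RG lines
      next
    }
    {
      if (!done_rg) emit_rg()  # header is over: add the new @RG line once
      out = $1
      for (i = 2; i <= NF; i++)          # copy fields, skipping any old RG:Z:
        if (substr($i, 1, 5) != "RG:Z:") out = out OFS $i
      print out, "RG:Z:" id              # tag the read with the new group
    }
    END { if (!done_rg) emit_rg() }      # header-only input
  '
}

# Usage (assumes samtools is on PATH):
#   samtools view -h reads.bam | add_read_group 4 20 | samtools view -b -o reads.tag.bam -
```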

Solution complication:

When running gatk HaplotypeCaller on my RG-tagged file, I got an error saying the uncompressed length was invalid. I ran picardtools ValidateSamFile, which warned that the index file was older than the BAM file; after re-running the indexing (with samtools index), everything seemed to work fine.

NB: I'm running picardtools/gatk installed through conda; if you're using the .jar file directly, just replace picard/gatk with java -jar path/to/jar-file.jar

Code:

```
picard AddOrReplaceReadGroups I=reads.bam O=reads.tag.bam RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20

samtools index reads.tag.bam

gatk HaplotypeCaller -I reads.tag.bam -R ref.fa -ERC GVCF -O vars.vcf
```

PS. I noticed that BC is a field of the @RG header line, so disregard that part of my earlier post.

bwa GATK4 read group issues