![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing read group | Mike_Brown | Bioinformatics | 3 | 01-06-2016 09:41 AM |
bwa sampe: the read group line is not started with @RG | frankyue50 | Bioinformatics | 3 | 11-22-2013 06:18 AM |
read group: GATK or BWA option? | m_elena_bioinfo | Bioinformatics | 9 | 12-09-2012 09:53 AM |
BWA memory issues | tgenahmet | Bioinformatics | 3 | 06-04-2009 02:55 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: san francisco Join Date: Jul 2010
Posts: 16
|
![]()
Hello,
I am trying to run GATK4 HaplotypeCaller on bam files from single individuals generated with bwa mem on Ubuntu 18 with Code:
$gatk HaplotypeCaller -R Tse_SBAPGDGG_D.fa -I AHP2746_sorted_dedup.bam -ERC GVCF -O AHP2746.gvcf A USER ERROR has occurred: Argument --emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file. I assume this is a read group error. if I run Code:
samtools view -H AHP2746_sorted_dedup.bam | grep @RG If I run Code:
samtools view -H AHP2746_sorted_dedup.bam Code:
@SQ LN:500 SN:scaffold_42026 @SQ SN:scaffold_42025 LN:500 @SQ SN:scaffold_42022 LN:500 @SQ SN:scaffold_42019 LN:500 @SQ SN:scaffold_42018 LN:500 @SQ LN:500 SN:scaffold_42017 @SQ SN:scaffold_42016 LN:500 @SQ SN:scaffold_42015 LN:500 @SQ SN:scaffold_42014 LN:500 @SQ SN:scaffold_42013 LN:500 @SQ SN:scaffold_42012 LN:500 @SQ SN:scaffold_42011 LN:500 @SQ SN:scaffold_42010 LN:500 @SQ SN:scaffold_42009 LN:500 @SQ SN:scaffold_42008 LN:500 @SQ SN:scaffold_42006 LN:500 @SQ SN:scaffold_42005 LN:500 @SQ SN:scaffold_42004 LN:500 @SQ SN:scaffold_42003 LN:500 @SQ SN:scaffold_42002 LN:500 @SQ SN:scaffold_42001 LN:500 @SQ SN:scaffold_42000 LN:500 @SQ SN:scaffold_41998 LN:500 @SQ SN:scaffold_41997 LN:500 @SQ SN:scaffold_41996 LN:500 @SQ SN:scaffold_41995 LN:500 @PG PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 32 Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz ID:bwa @PG CL:elprep filter /dev/stdin AHP2746_sorted_dedup.bam --filter-unmapped-reads --mark-duplicates --mark-optical-duplicates AHP2746_output.metrics --optical-duplicates-pixel-distance 100 --remove-duplicates --sorting-order coordinate PP:bwa ID:elprep 4.1.3 PN:elprep VN:4.1.3 DS:http://github.com/exascience/elprep I then tried running bwa mem with -R Code:
bwa mem -M -t 32 -R '@RG\tID:sample_1\tSM:AHP2746\tPL:illumina\tPU:lane1\tPI:420' Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz > bwa-mem-Rtest_AHP2746.bam my elprep command: Code:
elprep filter bwa-mem-Rtest_AHP2746.bam bwa-mem-Rtest_AHP2746-elprep.bam --mark-duplicates --mark-optical-duplicates output.metrics --remove-duplicates --sorting-order coordinate --nr-of-threads 32 Anyone know what I am doing wrong here? Additional info: reference genome was generated and assembled with linked reads (10X) sequenced on a NovaSeq and Hi-C data. the resequencing data were sequenced on HiSeq400 and some on NovaSeq, but all have same issue. I am using the latest software versions available with conda installs. |
![]() |
![]() |
![]() |
#2 |
Junior Member
Location: Europe Join Date: Jan 2020
Posts: 2
|
![]()
Hi,
Quite a while since you posted this so if you reached a solution that'd be great to hear. Either way I'm having the same problem and thought I'd add some things to the discussion. I am however not sure this is related to the RG tag. According to SAM format specifications, the BC tag is where sample ID is stored so my guess is that is has to do with this. However, in my case I haven't specified any BC tag at all and I wonder if this could be the problem. SAM format specs: https://samtools.github.io/hts-specs/SAMv1.pdf |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: Europe Join Date: Jan 2020
Posts: 2
|
![]()
Okay I managed to solve my issue which I'll post here if someone else finds it useful in the future.
RL; DR: Add an arbitrary RG tag. Problem: From reading at their github it seems gatk HaplotypeCaller will call on a utils function to assert the number of samples, which it does by counting the number of RG entries in the header. HaplotypCaller then exits with the error message if this count is not equal to 1, which seemingly also includes when it the utils function returns zero. Solution: Add an RG tag to your header and tag all reads with the RG tag. The fastest solution is to do this with picardtools AddOrReplaceReadGroups. Solution complication: When running gatk HaplotypeCaller on my RG tagged file I got an error saying the uncompression length was invalid. I ran picardtools ValidateSamFile which warned that the index file was older than the BAM file, and after rerunning the indexing everything seemingly worked great (using samtools index). Code: NB: I'm running picardtools/gatk installed through conda, if you're using the .jar file directly just exchange picard/gatk with java --jar path/to/jar-file.jar Code:
picard AddOrReplaceReadGroups I=reads.bam O=reads.tag.bam RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20 samtools index reads.tag.bam gatk HaplotypeCaller I=reads.tag.bam -R ref.fa -ERC GVCF -O vars.vcf |
![]() |
![]() |
![]() |
Tags |
bwa, gatk4, read group |
Thread Tools | |
|
|