Hello,
I am trying to run GATK4 HaplotypeCaller on BAM files from single individuals, generated with bwa mem on Ubuntu 18, with:
Code:
$gatk HaplotypeCaller -R Tse_SBAPGDGG_D.fa -I AHP2746_sorted_dedup.bam -ERC GVCF -O AHP2746.gvcf
But I get the error:
A USER ERROR has occurred: Argument --emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
I assume this is a read group error.
If I run
Code:
samtools view -H AHP2746_sorted_dedup.bam | grep @RG
I get nothing back.
If I run
Code:
samtools view -H AHP2746_sorted_dedup.bam
I get a long list of header lines, and the bottom looks like this:
Code:
@SQ LN:500 SN:scaffold_42026
@SQ SN:scaffold_42025 LN:500
@SQ SN:scaffold_42022 LN:500
@SQ SN:scaffold_42019 LN:500
@SQ SN:scaffold_42018 LN:500
@SQ LN:500 SN:scaffold_42017
@SQ SN:scaffold_42016 LN:500
@SQ SN:scaffold_42015 LN:500
@SQ SN:scaffold_42014 LN:500
@SQ SN:scaffold_42013 LN:500
@SQ SN:scaffold_42012 LN:500
@SQ SN:scaffold_42011 LN:500
@SQ SN:scaffold_42010 LN:500
@SQ SN:scaffold_42009 LN:500
@SQ SN:scaffold_42008 LN:500
@SQ SN:scaffold_42006 LN:500
@SQ SN:scaffold_42005 LN:500
@SQ SN:scaffold_42004 LN:500
@SQ SN:scaffold_42003 LN:500
@SQ SN:scaffold_42002 LN:500
@SQ SN:scaffold_42001 LN:500
@SQ SN:scaffold_42000 LN:500
@SQ SN:scaffold_41998 LN:500
@SQ SN:scaffold_41997 LN:500
@SQ SN:scaffold_41996 LN:500
@SQ SN:scaffold_41995 LN:500
@PG PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 32 Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz ID:bwa
@PG CL:elprep filter /dev/stdin AHP2746_sorted_dedup.bam --filter-unmapped-reads --mark-duplicates --mark-optical-duplicates AHP2746_output.metrics --optical-duplicates-pixel-distance 100 --remove-duplicates --sorting-order coordinate PP:bwa ID:elprep 4.1.3 PN:elprep VN:4.1.3 DS:http://github.com/exascience/elprep
So I assume my BAM files have no read group defined (I did not use the bwa mem -R option).
I then tried running bwa mem with -R:
Code:
bwa mem -M -t 32 -R '@RG\tID:sample_1\tSM:AHP2746\tPL:illumina\tPU:lane1\tPI:420' Tse_SBAPGDGG_D AHP2746_L003_R1_001.fastq.gz AHP2746_L003_R2_001.fastq.gz > bwa-mem-Rtest_AHP2746.bam
The analysis completed, but I am using elprep to deduplicate and sort the BAM files, and elprep throws the error "2019/07/23 10:07:46 gzip: invalid header in NewBGZFReader".
My elprep command:
Code:
elprep filter bwa-mem-Rtest_AHP2746.bam bwa-mem-Rtest_AHP2746-elprep.bam --mark-duplicates --mark-optical-duplicates output.metrics --remove-duplicates --sorting-order coordinate --nr-of-threads 32
Does anyone know what I am doing wrong here?
Additional info:
The reference genome was generated and assembled from linked reads (10X) sequenced on a NovaSeq, together with Hi-C data.
The resequencing data were sequenced on a HiSeq 4000 and some on a NovaSeq, but all show the same issue.
I am using the latest software versions available through conda.
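For anyone debugging the same two symptoms, here is a minimal sketch of two checks, written as plain shell helpers. The function names `rg_summary` and `is_bgzf` are hypothetical and not part of samtools, GATK, or elprep. The first summarises the @RG lines of a tab-separated SAM header read from stdin, on the assumption that HaplotypeCaller in GVCF mode wants exactly one SM value. The second tests whether a file starts with the gzip magic bytes (1f 8b) that every BGZF-compressed BAM begins with. It is relevant here because bwa mem writes plain-text SAM to stdout, so a redirected file may be SAM even when it is named .bam.

```shell
#!/bin/sh
# Hypothetical helper: count @RG lines and list distinct SM (sample) values
# in a tab-separated SAM header read from stdin.
rg_summary() {
    awk -F'\t' '
        /^@RG/ {
            n++
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) samples[substr($i, 4)] = 1
        }
        END {
            printf "read_groups=%d samples=", n
            sep = ""
            for (s in samples) { printf "%s%s", sep, s; sep = "," }
            printf "\n"
        }'
}

# Hypothetical helper: succeed if the file starts with the gzip magic
# bytes 1f 8b, as any BGZF/BAM file does; fail for plain-text SAM.
is_bgzf() {
    magic=$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example usage (assumes samtools is on PATH and the files exist):
#   samtools view -H AHP2746_sorted_dedup.bam | rg_summary
#   is_bgzf bwa-mem-Rtest_AHP2746.bam && echo "BGZF" || echo "plain text, probably SAM"
```

If the first check reports read_groups=0, that would be consistent with the HaplotypeCaller error above. If the second check reports plain text, the .bam file from the redirected bwa mem run is probably SAM, which may be why elprep cannot read it as BGZF. Converting it with `samtools view -b`, or giving it a .sam extension, might be worth trying.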