Hello everybody,
I used bwa and samtools to map reads to a reference genome and obtained several BAM files, one per individual.
I now want to call variants, so I am following the GATK Best Practices.
I am at the Indel Realignment step.
I checked my BAM file with the Picard command "ValidateSamFile" and got a "read group missing" error.
So I added read groups with the Picard command "AddOrReplaceReadGroups":
Code:
java -Xmx20g -jar /home/grosbalm/Scripts/AddOrReplaceReadGroups.jar I=../MarkDupli/Pd115_S1_t2_M1_f_d.bam O=Pd115_S1_t2_M1_f_d_RG.bam LB=Pd115 PL=ILLUMINA PU=Seq1 SM=Pd115_S1_t2_M1
Now I run the validation again:
Code:
java -Xmx20g -jar /home/grosbalm/Scripts/ValidateSamFile.jar INPUT=Pd115_S1_t2_M1_f_d_RG.bam OUTPUT=out.bam REFERENCE_SEQUENCE=/data3/users/grosbalm/IlluminaData/Ref/AlMssallem/Pdac_ref2013s.fasta/Pdac_ref2013s.fasta
but I get this new error for many reads:
ERROR: Record 415, Read name HWI-ST1206:14:C296WACXX:6:1301:7041:38865, NM tag (nucleotide differences) in file [2] does not match reality [3]
If I understand correctly, NM is the number of mismatches between the read and the reference. That would mean the number of mismatches stored in the NM tag is not the real one.
How is this possible?
At which step is the NM tag written? And the other tags?
Are these tags necessary for calling variants with GATK?
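For reference, the SAM format specification defines NM as the edit distance to the reference: mismatched aligned bases plus inserted and deleted bases, so it is slightly more than a pure mismatch count. The minimal Python sketch below (hypothetical helpers `parse_cigar` and `compute_nm`, not code taken from bwa, samtools or Picard) recomputes NM for a single record from its SEQ, POS and CIGAR fields plus the reference sequence, which is roughly the comparison a validator has to make. It assumes the whole contig is available as one string and treats any non-identical base, including N, as a mismatch; real tools may handle N or IUPAC ambiguity codes differently. The recomputed value also depends on which reference is supplied, so it can only match the stored tag if it is the same reference the reads were mapped against.

```python
import re

# CIGAR operations: M, = and X consume read and reference; I and S consume
# the read only; D and N consume the reference only; H and P consume neither.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '10M1I5M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    # The parsed pieces must cover the whole string, otherwise it is malformed
    # (this also rejects the '*' placeholder used for unmapped reads).
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def compute_nm(read: str, ref: str, pos: int, cigar: str) -> int:
    """Edit distance (NM) of an aligned read against the reference.

    read  -- read bases (SEQ field, soft-clipped bases included)
    ref   -- sequence of the contig the read is mapped to (assumption: one string)
    pos   -- 1-based leftmost mapping position (POS field)
    cigar -- CIGAR string (CIGAR field)
    """
    nm = 0
    r = 0          # index into the read
    g = pos - 1    # 0-based index into the reference
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            # Aligned bases: count every position where the bases differ.
            for i in range(length):
                if read[r + i].upper() != ref[g + i].upper():
                    nm += 1
            r += length
            g += length
        elif op == "I":
            nm += length   # inserted bases count toward NM
            r += length
        elif op == "D":
            nm += length   # deleted bases count toward NM
            g += length
        elif op == "N":
            g += length    # skipped region (e.g. intron): not counted
        elif op == "S":
            r += length    # soft clip: in SEQ but not aligned, not counted
        # H and P consume neither sequence and are not counted
    return nm


if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    print(compute_nm("ACGTTCGTAC", ref, 1, "10M"))   # one mismatch
    print(compute_nm("ACGGTACG", ref, 1, "3M1I4M"))  # no mismatch, one inserted base
```

The second call is a read with no mismatched base that still has NM = 1, because the inserted base counts toward the edit distance.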
Thanks a lot