Seqanswers Leaderboard Ad

**ulz_peter** · 01-30-2012, 03:47 AM

Would you mind posting the error message?

What I notice: you can't do the table recalibration without generatin the recal_data.csv file through the CountCovariate utility from GATK..

**fc35802** · 01-30-2012, 03:49 AM

Humm, probably that's the problem.

The error message is this

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.3-21-gcb284ee):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: SAM/BAM file sample02187A_align_sorted.bam.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------

**ulz_peter** · 01-30-2012, 04:35 AM

Ahh. My favourite GATK error:
In order to perform analyses correctly with the GATK you need to specify a read group tag when aligning your sequences with BWA. Just add the following argument within the bwa sampe step

Code:

-r "@RG\tID:sample\tLB:sample\tPL:ILLUMINA\tSM:sample"

RG defines a ReadGroup, ID: specifies the name of the ReadGroup, LB defines the name of the library sequenced and SM defines the individual sample, PL defines the platform you used. You can leave all of them as sample (or anything else) in case you really got a single sample you wish to analyze.

You might wonder what that could be good for: These tags are used when merging BAM files from different samples to distinguish between the different samples.

However, although the error message states the thing I explained above, you should carry out the CountCovariate step before the TableRecalibration step, otherwise you have no recal_file (you specified it as SM.recal_data_sample.csv)

Hope that helps...

**fc35802** · 01-30-2012, 06:55 AM

Thanks for your help.
I made the change that you suggested in the sampe step.
It work all the way until I try to CoutCovariates, he said that :

##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wi...php/ReorderSam
##### ERROR reads contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr20, chr21, chr22, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]
##### ERROR ------------------------------------------------------------------------------------------
But I already ReOrderedSam

java -jar /GenoStorage/Software/picard-tools/ReorderSam.jar I=Sample_align_sorted.bam O=Sample.karyotypic.bam REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fasta

**ulz_peter** · 01-30-2012, 11:39 PM

Ok, that's something else. GATK wants to have the chromosomes ordered this way: chr1, chr2, chr3, ..., chrX, chrY, chrM. It seems your reference fasta file contains the chromosomes in a lexikographical ordering chr10 directly after chr1. When you reorder your SAM file ot orders the reads according to the order in your reference fasta file. You could download the single-chromosome fasta files from UCSC (http://hgdownload.cse.ucsc.edu/golde...9/chromosomes/) and order them using cat like that:

Code:

cat chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM > hg19.fa

It might be working to just do the ReorderSam program again with the newly sorted reference fasta file, otherwise you'd need to repeat alignment. In case you're planning a pipeline you might want to have the reference file in order to save the ReorderSam step...

Hope that helps,
Peter

**fc35802** · 02-01-2012, 03:08 AM

I made your suggestion with the chromosomes.
Now I'm doing the ReorderSam

java -jar /GenoStorage/Software/picard-tools/ReorderSam.jar I=sample02187A_align.bam O=sample02187A.kayrotypic.bam REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa
[Wed Feb 01 11:04:17 WET 2012] net.sf.picard.sam.ReorderSam INPUT=sample02187A_align.bam OUTPUT=sample02187A.kayrotypic.bam REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa ALLOW_INCOMPLETE_DICT_CONCORDANCE=false ALLOW_CONTIG_LENGTH_DISCORDANCE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Feb 01 11:04:17 WET 2012] Executing as meusebio@IMMGene1 on Linux 2.6.32-71.29.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22
[Wed Feb 01 11:04:17 WET 2012] net.sf.picard.sam.ReorderSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1518993408
Exception in thread "main" net.sf.picard.PicardException: Mismatch between sequence dictionary fasta index for /GenoStorage/Genomas/hg19/hg19RefGenome.fa, sequence 'chr10' != 'chr1'.
at net.sf.picard.reference.IndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(IndexedFastaSequenceFile.java:143)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:86)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at net.sf.picard.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:75)
at net.sf.picard.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:58)
at net.sf.picard.sam.ReorderSam.doWork(ReorderSam.java:87)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:118)
at net.sf.picard.sam.ReorderSam.main(ReorderSam.java:77)

I already tried to do a dictionary like they say, but it doesn't work also

java -jar /GenoStorage/Software/picard-tools/CreateSequenceDictionary.jar REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa O=dictionary.bam
[Wed Feb 01 11:06:04 WET 2012] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa OUTPUT=dictionary.bam TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Feb 01 11:06:04 WET 2012] Executing as meusebio@IMMGene1 on Linux 2.6.32-71.29.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22
[Wed Feb 01 11:06:04 WET 2012] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1518993408
Exception in thread "main" net.sf.picard.PicardException: Mismatch between sequence dictionary fasta index for /GenoStorage/Genomas/hg19/hg19RefGenome.fa, sequence 'chr10' != 'chr1'.
at net.sf.picard.reference.IndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(IndexedFastaSequenceFile.java:143)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:86)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at net.sf.picard.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:75)
at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:128)
at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)

**ulz_peter** · 02-01-2012, 03:25 AM

ok. see this line:

Code:

Exception in thread "main" net.sf.picard.PicardException: Mismatch between sequence dictionary fasta index for /GenoStorage/Genomas/hg19/hg19RefGenome.fa, sequence 'chr10' != 'chr1'.

He states you need to index your newly created reference fasta file, as the old index you've created (based on the wrong order) is not comaptible with the new order.

So index your fasta file ( I think I used tabix for that but I'm not sure). However the GATK tells you how to prepare input reference files here:

http://www.broadinstitute.org/gsa/wiki/index.php/Preparing_the_essential_GATK_input_files:_the_reference_genome

**fc35802** · 02-01-2012, 03:38 AM

I think I'm really bad at this

[meusebio@IMMGene1 meusebio]$ samtools faidx /GenoStorage/Genomas/hg19/hg19RefGenome.fa
[meusebio@IMMGene1 meusebio]$ java -jar /GenoStorage/Software/picard-tools/CreateSequenceDictionary.jar REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa O=hg19RefGenome.dict
[Wed Feb 01 11:34:44 WET 2012] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa OUTPUT=hg19RefGenome.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Feb 01 11:34:44 WET 2012] Executing as meusebio@IMMGene1 on Linux 2.6.32-71.29.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22
[Wed Feb 01 11:34:44 WET 2012] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1518993408
Exception in thread "main" net.sf.picard.PicardException: Mismatch between sequence dictionary fasta index for /GenoStorage/Genomas/hg19/hg19RefGenome.fa, sequence 'chr10' != 'chr1'.
at net.sf.picard.reference.IndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(IndexedFastaSequenceFile.java:143)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:86)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at net.sf.picard.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:75)
at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:128)
at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)

**ulz_peter** · 02-01-2012, 03:46 AM

that's strange...
in case you don't want to give it another try you may download the GATK ressource
input files here:

http://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle#Obtaining_the_bundle

I have no clue why the procedure you did, did not work...

**fc35802** · 02-02-2012, 02:41 AM

Hi, again
I begin my work again using a Ref file in the correct order. I indexed it with bwa and after samtools.

Apparently everything worked fine, until the part I had to create a Sequence Dictionary.

java -jar /GenoStorage/Software/picard-tools/CreateSequenceDictionary.jar R=/GenoStorage/Genomas/hg19/hg19RefGenome.fa O=hg19RefGenome.dict
[Thu Feb 02 10:38:09 WET 2012] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/GenoStorage/Genomas/hg19/hg19RefGenome.fa OUTPUT=hg19RefGenome.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Feb 02 10:38:09 WET 2012] Executing as meusebio@IMMGene1 on Linux 2.6.32-71.29.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22
[Thu Feb 02 10:38:09 WET 2012] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1518993408
Exception in thread "main" net.sf.picard.PicardException: Mismatch between sequence dictionary fasta index for /GenoStorage/Genomas/hg19/hg19RefGenome.fa, sequence 'chr10' != 'chr1'.
at net.sf.picard.reference.IndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(IndexedFastaSequenceFile.java:143)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:86)
at net.sf.picard.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:95)
at net.sf.picard.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:75)
at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:128)
at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)

If I use the old RefGeno it will work fine, but I think that's stupid, because is not in the correct order

**ulz_peter** · 02-02-2012, 03:14 AM

maybe you should specify the absolute path of the dict output file. Maybe you've got the old dict file in the same path as the newly createed hg19RefGenome.fa file and he puts the new sequence dictionary somewhere else...
I'm afraid that's the only thing I can come up with

**fc35802** · 02-02-2012, 06:39 AM

You are being great !

I'm already at GATK stage (I hope).

Another program, another problem

java -Xmx4g -jar /GenoStorage/Software/GATK/GenomeAnalysisTK.jar \ -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa \ -knownSites /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt \ -I sample02187A_align_sorted.bam \ -T CountCovariates \ -cov ReadGroupCovariate \ -cov QualityScoreCovariate \ -cov CycleCovariate \ -cov DinucCovariate \ -recalFile sample02187A.recal_data.csv
INFO 14:32:42,831 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:32:42,833 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-21-gcb284ee, Compiled 2011/11/29 16:46:58
INFO 14:32:42,834 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:32:42,834 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 14:32:42,834 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 14:32:42,835 HelpFormatter - Program Args: -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa -knownSites /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt -I sample02187A_align_sorted.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile sample02187A.recal_data.csv
INFO 14:32:42,835 HelpFormatter - Date/Time: 2012/02/02 14:32:42
INFO 14:32:42,836 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:32:42,836 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:32:44,115 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.3-21-gcb284ee):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt for argument knownSites. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BEAGLE BeagleFeature http://www.broadinstitute.org/gsa/ga...agleCodec.html
##### ERROR BED BEDFeature http://www.broadinstitute.org/gsa/ga..._BEDCodec.html
##### ERROR BEDTABLE TableFeature http://www.broadinstitute.org/gsa/ga...ableCodec.html
##### ERROR GELITEXT GeliTextFeature http://www.broadinstitute.org/gsa/ga...TextCodec.html
##### ERROR OLDDBSNP OldDbSNPFeature http://www.broadinstitute.org/gsa/ga...bSNPCodec.html
##### ERROR RAWHAPMAP RawHapMapFeature http://www.broadinstitute.org/gsa/ga...pMapCodec.html
##### ERROR REFSEQ RefSeqFeature http://www.broadinstitute.org/gsa/ga...fSeqCodec.html
##### ERROR SAMPILEUP SAMPileupFeature http://www.broadinstitute.org/gsa/ga...leupCodec.html
##### ERROR SAMREAD SAMReadFeature http://www.broadinstitute.org/gsa/ga...ReadCodec.html
##### ERROR TABLE TableFeature http://www.broadinstitute.org/gsa/ga...ableCodec.html
##### ERROR VCF VariantContext http://www.broadinstitute.org/gsa/ga..._VCFCodec.html
##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gsa/ga...VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------

**MQ-BCBB** · 02-02-2012, 01:46 PM

Hello,

I just had the same problem but found that the solution is the following:

Add name and file type after the -knownSites argument. If you file is in vcf format, use the following.

-knownSites:name,VCF snp132CodingDbSnp.txt

That is described at the bottom of page <http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK>

**ECO** · 02-03-2012, 06:40 AM

Moving this to the appropriate forum.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 57 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 56 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Hi, I have a doubt in GATK

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News