
**emilyjia2000** · 10-26-2011, 06:09 AM

Thanks raonyguimaraes, you are right. After I put the dictionary file and the reference in the same directory, that part works, but another error message came up:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.2-26-g43b0c98):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.
##### ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
##### ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
##### ERROR ------------------------------------------------------------------------------------------

The chromosomes in the BAM file have been reordered; should I reorder the reference file?
Thanks a lot

**Heisman** · 10-26-2011, 06:21 AM

Emily,

Make sure the chromosome headers are in the same order in all the files you are using, and make sure the names match. I think they should also be in the same order as the one your files are sorted by.
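A quick way to check this before rerunning GATK is to compare the contig lists directly. The following is a minimal Python sketch (hypothetical helper names, not part of GATK or samtools) that reads contig names from a samtools `.fai` index and from the `@SQ` lines of a SAM header (e.g. the output of `samtools view -H`), and reports the two situations seen in this thread: shared contigs in a different order, or no shared names at all. It assumes contig names are unique within each file and only mimics the two GATK messages; it is not GATK's own check.

```python
from typing import Iterable, List, Optional


def fai_contigs(lines: Iterable[str]) -> List[str]:
    """Contig names from a samtools .fai index (first tab-separated column)."""
    return [line.split("\t", 1)[0] for line in lines if line.strip()]


def sam_header_contigs(lines: Iterable[str]) -> List[str]:
    """Contig names from the SN: tag of @SQ lines in a SAM header."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contig_problem(reads: List[str], reference: List[str]) -> Optional[str]:
    """Return a description of the mismatch, or None if the orders agree."""
    shared = set(reads) & set(reference)
    if not shared:
        return "no overlapping contigs"
    # Compare only the contigs both files have, each in its own file order.
    reads_order = [c for c in reads if c in shared]
    ref_order = [c for c in reference if c in shared]
    if reads_order != ref_order:
        first = next(i for i, (a, b) in enumerate(zip(reads_order, ref_order)) if a != b)
        return (f"order differs at shared contig #{first}: "
                f"{reads_order[first]} vs {ref_order[first]}")
    return None
```

Extra contigs that exist in only one file (for example the `_hap` and `_random` ones) are ignored by this check; only the relative order of the shared names matters.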

**raonyguimaraes** · 10-26-2011, 06:29 AM

Please use everything from the resource bundle: http://www.broadinstitute.org/gsa/wi...esource_bundle

**emilyjia2000** · 10-26-2011, 07:05 AM

Thanks both of you. Now it works.

**raonyguimaraes** · 10-26-2011, 03:22 PM

I have rerun my whole analysis several times, from the alignment (using BWA's -I option) through UnifiedGenotyper, and I'm still getting this output from UnifiedGenotyper:

GenomeAnalysisTK.jar -T UnifiedGenotyper -l INFO \
-I output/exome.real.dedup.recal.bam \
-R ../../input/b37/human_g1k_v37.fasta \
-B:intervals,BED ../../input/bed/exome_plus10.merged.bed \
-B:dbsnp,VCF ../../input/dbsnp/dbsnp-134.vcf \
-glm BOTH -stand_call_conf 50.0 -stand_emit_conf 20.0 -dcov 300 \
-A AlleleBalance -A DepthOfCoverage -A FisherStrand \
-o output/exome.raw.vcf -log logs/gatk/UnifiedGenotyper.log

Visited bases 3102559836
Callable bases 2864301370
Confidently called bases 1142400
% callable bases of all loci 92.321
% confidently called bases of all loci 0.037
% confidently called bases of callable loci 0.040
Actual calls made 350263
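The percentages in this summary follow directly from the three base counts. A small Python check (the `pct` helper is illustrative) reproduces the 92.321, 0.037 and 0.040 figures. It also makes the scale visible: 3,102,559,836 visited bases is roughly the size of the whole human reference, which suggests this run traversed essentially the entire genome rather than only the exome targets.

```python
# Counts copied from the UnifiedGenotyper summary above.
visited = 3102559836
callable_bases = 2864301370
confidently_called = 1142400


def pct(part: int, whole: int) -> float:
    """Percentage rounded to three decimals, as in the GATK summary."""
    return round(100.0 * part / whole, 3)


print(pct(callable_bases, visited))             # % callable bases of all loci
print(pct(confidently_called, visited))         # % confidently called bases of all loci
print(pct(confidently_called, callable_bases))  # % confidently called bases of callable loci
```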

Since it's the same number of variants I'm getting from DNAnexus, I'm starting to believe this is the right number before filtering the variants...

I decided to download the BED file from the UCSC Table Browser using "Exon plus 10bp", as suggested in the manual, and tried to use this file with UnifiedGenotyper, since I had been using a BED file from "SeqCap EZ Human Exome Library v2.0".

When I ran this step again, I received a message saying that there were overlaps among the intervals, so I decided to use bedtools to merge the intervals with the command:

mergeBed -i exome_plus10.bed > exome_plus10.merged.bed

Has anyone else had to do this?
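For readers wondering what the merge step does: overlapping intervals on the same chromosome are collapsed into one. Below is a minimal Python sketch of that idea, not bedtools' implementation. It assumes BED-style 0-based, half-open coordinates and, like the default behaviour of bedtools merge, also joins intervals that merely touch (book-ended). mergeBed itself expects input already sorted by chromosome and start (e.g. `sort -k1,1 -k2,2n`); the sketch sorts internally.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or book-ended intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # by chrom, then start
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


print(merge_intervals([("chr1", 150, 250), ("chr1", 100, 200), ("chr2", 0, 10)]))
```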

After doing this and trying again, I'm receiving this message:

exome_plus10.merged.bed and reference have incompatible contigs: No overlapping contigs found.
exome_plus10.merged.bed contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000205_random, chr18, chr19, chr19_gl000209_random, chr1_gl000191_random, chr2, chr20, chr21, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr9, chrUn_gl000211, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000222, chrUn_gl000223, chrUn_gl000228, chrX, chrY]
##### ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1]

sortBed didn't help, so I wrote a Python script to parse this BED file and rename everything as it should be (chr1 to 1, chr2 to 2, and so on). Has anyone else had to do the same?
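The renaming described above can be sketched in Python as follows. This is not raonyguimaraes's script; the mapping rules are assumptions based on the two contig lists in the error message: `chr1`..`chr22`, `chrX` and `chrY` lose the `chr` prefix, and UCSC `*_gl000NNN_random` / `chrUn_gl000NNN` contigs become `GL000NNN.1`. Anything else (such as the `_hap` alternate haplotypes, which have no b37 counterpart) is dropped. `chrM` is deliberately dropped as well, because UCSC hg19's chrM is a different mitochondrial sequence from the b37 `MT` contig, so renaming it would not make the coordinates compatible.

```python
import re
from typing import Optional

# Assumed UCSC hg19 -> b37 naming rules (check against your own .fai):
#   chr1..chr22, chrX, chrY                 -> 1..22, X, Y
#   chrN_gl000191_random, chrUn_gl000211    -> GL000191.1, GL000211.1
#   anything else (*_hap*, chrM, ...)       -> None, i.e. drop the row
_PRIMARY = re.compile(r"^chr([0-9]+|X|Y)$")
_GL = re.compile(r"^chr(?:[0-9]+|Un)_gl(\d{6})(?:_random)?$")


def ucsc_to_b37(name: str) -> Optional[str]:
    """Translate a UCSC contig name to its assumed b37 name, or None."""
    m = _PRIMARY.match(name)
    if m:
        return m.group(1)
    m = _GL.match(name)
    if m:
        return f"GL{m.group(1)}.1"
    return None


def rename_bed_line(line: str) -> Optional[str]:
    """Rename the contig column of one tab-separated BED row, or drop it."""
    fields = line.rstrip("\n").split("\t")
    new_name = ucsc_to_b37(fields[0])
    if new_name is None:
        return None
    fields[0] = new_name
    return "\t".join(fields)
```

Usage would be a loop over the BED file that writes every non-`None` result; the output may need re-sorting to follow the reference contig order.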

I checked the quality of my reads with FastQC and they looked OK, so I didn't do any cleaning of my reads before running BWA -> GATK.

What could I use to clean my Illumina reads? NGS Backbone, SeqClean, CleanSeq, Prinseq, FastQX? Has anyone improved the number of calls by doing so?

For the Variant Quality Score Recalibration they suggest that "in order to achieve the best exome results one needs to use an exome SNP callset with at least 30 samples."

Has anyone tried merging in other exomes and gotten better results from it?

Still looking for my 20k variants

**liu_xt005** · 10-27-2011, 07:36 AM

Filtering exon+10bp bed file

Originally posted by raonyguimaraes

Still looking for my 20k variants

The exon+10bp BED file helps. Without it I got 150k variants from one individual; with it, the number dropped to 30k. The UCSC-style BED file contains random and alternate-haplotype (_hap) contigs. My reference genome for alignment does not contain such sequences, so I manually filtered those out of the exon+10bp BED file (with Perl) and made it work.
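liu_xt005 did this filtering in Perl; a comparable Python sketch (hypothetical helper names, not their script) keeps only the BED rows whose contig appears in the reference's samtools `.fai` index, which drops the random and `_hap` contigs the reference lacks. It assumes a tab-separated BED file and skips `track`, `browser` and `#` header lines.

```python
from typing import Iterable, Iterator, Set


def reference_contigs(fai_lines: Iterable[str]) -> Set[str]:
    """Contig names from a samtools .fai index (first column)."""
    return {line.split("\t", 1)[0] for line in fai_lines if line.strip()}


def filter_bed(bed_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield BED rows whose contig is present in the reference."""
    for line in bed_lines:
        # Header or blank lines are not intervals; skip them.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        if line.split("\t", 1)[0] in keep:
            yield line
```

In practice this would read the `.fai` next to the alignment reference and stream the UCSC BED file through `filter_bed`, writing the kept rows to a new file.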

**raonyguimaraes** · 10-27-2011, 07:56 AM


Hello all, I have good news ...

After using ANNOVAR, I finally got down to 22,709 variants in my data.

From there I'm now trying to filter based on this approach:

The numbers are pretty close, so I think I'm on the right track.

22,709 variants
11,179 variants
4,766 variants
4,222 variants
removed frequency > 0.01
878 variants
427 variants
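The frequency step in this list can be sketched as a simple threshold filter. The record layout here (dicts with a `freq` key holding a population allele frequency, `None` when the variant is absent from the database) is hypothetical, not ANNOVAR's output format, and keeping variants with no frequency is an assumption (treating them as possibly novel).

```python
from typing import Any, Dict, List

# Hypothetical record layout: {"id": ..., "freq": float or None}.
Variant = Dict[str, Any]


def remove_common(variants: List[Variant], max_freq: float = 0.01) -> List[Variant]:
    """Drop variants whose population frequency is above max_freq.

    Variants without a frequency (None) are kept, assuming they were not
    seen in the population database.
    """
    return [v for v in variants if v.get("freq") is None or v["freq"] <= max_freq]
```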

**christophpale** · 11-22-2011, 12:48 AM

I think the -L argument expects the intervals file in SAM format
(http://www.broadinstitute.org/gsa/wi...line_arguments). If so, use bedtools' bedToBam.
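If the -L file does need a SAM-style header, one widely used layout is Picard's interval_list: SAM `@HD`/`@SQ` header lines, then one interval per line as contig, 1-based start, 1-based inclusive end, strand and name. Whether GATK 1.x required this rather than accepting BED directly is an assumption here. The Python sketch below (hypothetical helper, not part of bedtools or Picard) converts BED rows by shifting the 0-based BED start by one; the `@SQ` entries are passed in and should be copied from the reference's sequence dictionary so names and order match.

```python
from typing import Iterable, List, Tuple


def bed_to_interval_list(bed_lines: Iterable[str], sq: List[Tuple[str, int]]) -> List[str]:
    """Convert BED rows to Picard-style interval_list lines.

    sq: (contig name, length) pairs, ideally taken from the reference .dict.
    """
    out = ["@HD\tVN:1.0\tSO:coordinate"]
    out += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in sq]
    for i, line in enumerate(bed_lines):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"interval_{i}"
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
        # BED is 0-based half-open; interval_list is 1-based inclusive.
        out.append(f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}")
    return out
```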

Originally posted by liu_xt005

Following ulz_peter's original doc, I have a problem with the SNP-calling step.

java -Xmx4g -jar /path/GenomeAnalysisTK-1.1-35-ge253f6f/GenomeAnalysisTK.jar \
-glm BOTH \
-R hg18.fa \
-T UnifiedGenotyper \
-I myinput.marked.realigned.fixed.recal.bam \
-D dbsnp132_hg18.txt \
-o myoutput.snps.vcf \
-metrics snps.metrics \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-dcov 1000 \
-A DepthOfCoverage \
-A AlleleBalance \
-L hg18_exonIntervals.bed

This "-L" option does not work.
I got the hg18_exonIntervals.bed from UCSC as ulz_peter's original doc shows.
I ran the SNP calling without the "-L" line.
Then the variant quality score recalibration step does not work, generating an empty output.tranches file.

Can somebody help me out? Thanks a lot.

**blackgore** · 11-24-2011, 06:17 AM

In following the workflow mentioned above, I've come up against an error, and I'm wondering if I'm alone in this. Has anyone had difficulty using the CountCovariates tool, specifically errors about accessing information from the input BAM file? I've tried this with several samples, but I keep getting the same error: "Bad input: Could not find any usable data in the input BAM file(s)"

(for those interested, the BAM files in question are not empty, and work just fine with samtools view).

Code:

java -Xmx16g -jar /$Software/GenomeAnalysisTK-1.3-17-gc62082b/GenomeAnalysisTK.jar -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf

INFO  14:01:25,870 HelpFormatter - ---------------------------------------------------------------------------------
INFO  14:01:25,875 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-17-gc62082b, Compiled 2011/11/18 15:24:46
INFO  14:01:25,875 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  14:01:25,876 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO  14:01:25,876 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO  14:01:25,877 HelpFormatter - Program Args: -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
INFO  14:01:25,878 HelpFormatter - Date/Time: 2011/11/24 14:01:25
INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
INFO  14:01:26,052 RodBindingArgumentTypeDescriptor - Dynamically determined type of $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf to be VCF
INFO  14:01:26,064 GenomeAnalysisEngine - Strictness is SILENT
INFO  14:01:26,815 RMDTrackBuilder - Loading Tribble index from disk for file $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
INFO  14:01:30,532 MicroScheduler - Running the GATK in parallel mode with 8 concurrent threads
INFO  14:01:32,326 CountCovariatesWalker - The covariates being used here:
INFO  14:01:32,327 CountCovariatesWalker -      ReadGroupCovariate
INFO  14:01:32,327 CountCovariatesWalker -      QualityScoreCovariate
INFO  14:01:32,327 CountCovariatesWalker -      CycleCovariate
INFO  14:01:32,328 CountCovariatesWalker -      DinucCovariate
INFO  14:01:41,189 CountCovariatesWalker - Writing raw recalibration data...
INFO  14:01:44,145 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO  14:01:44,146 HttpMethodDirector - Retrying request
INFO  14:01:44,149 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO  14:01:44,149 HttpMethodDirector - Retrying request
INFO  14:01:44,152 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO  14:01:44,153 HttpMethodDirector - Retrying request
INFO  14:01:44,155 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO  14:01:44,155 HttpMethodDirector - Retrying request
INFO  14:01:44,158 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO  14:01:44,158 HttpMethodDirector - Retrying request
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.3-17-gc62082b):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).
##### ERROR ------------------------------------------------------------------------------------------

**ulz_peter** · 11-24-2011, 06:26 AM

I've never seen that error...
I've also never seen those HttpMethodDirector I/O exceptions...

Maybe you should talk to the GATK people at the GetSatisfaction Page:


http://getsatisfaction.com/gsa

**blackgore** · 11-24-2011, 06:39 AM

The I/O exceptions are just harmless warnings that the software can't "phone home" because it has to deal with proxies, firewalls and the like. They don't affect the result, and you can silence them by providing the right parameters. This is just an example output I grabbed.

**godfrey1** · 12-27-2011, 09:07 AM

Thanks!

Looks useful; thanks for distributing this.
GP

**Mali Salmon** · 01-18-2012, 03:15 AM

Hi All
I have been trying to follow the exome-analysis pipeline written by ulz_peter (thanks, Peter, for this clear and nice document). I have encountered the same problem as pc2009open when trying to run "VariantRecalibrator" (using GATK version 1.0.5336).
I got the same error message: "Argument with name '--cluster_file' is missing."
I am wondering whether you solved the problem, and whether there is an updated analysis pipeline that works with the latest GATK version.
Thanks

**ulz_peter** · 01-18-2012, 04:49 AM

Hi Mali Salmon,

I imported the document into the Seq-Wiki (see http://seqanswers.com/wiki/How-to/exome_analysis). The problem is that variant recalibration doesn't really work for single-exome analyses. In the updated version on the Seq Wiki (which is already somewhat out of date), I wrote that I went back to SNP filtering with VariantFiltration, because I often got a lot of error messages when trying to do variant recalibration on a single sample.

There is a link to the GATK Homepage on the wiki. If you still want to do Variant Quality Score Recalibration I'd recommend you stick to their guidelines.
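For the single-sample case, hard filtering with VariantFiltration works by evaluating expressions on each record's INFO annotations and writing the names of failed filters into the FILTER column (PASS otherwise) rather than deleting records. The Python sketch below illustrates that idea on raw VCF lines; it is not GATK code, the filter names are hypothetical, and the QD/FS thresholds are placeholders for illustration, not recommended values.

```python
from typing import Callable, Dict, List, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO field into key -> value (flags get an empty value)."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


# Hypothetical filter names with placeholder thresholds (illustration only).
FILTERS: List[Tuple[str, Callable[[Dict[str, str]], bool]]] = [
    ("LowQD", lambda i: "QD" in i and float(i["QD"]) < 2.0),
    ("StrandBias", lambda i: "FS" in i and float(i["FS"]) > 60.0),
]


def apply_hard_filters(vcf_line: str) -> str:
    """Set the FILTER column (index 6) from the INFO column (index 7)."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    failed = [name for name, test in FILTERS if test(info)]
    cols[6] = ";".join(failed) if failed else "PASS"
    return "\t".join(cols)
```

Marking instead of deleting keeps the filtered calls available for later inspection.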

Hope that helps (and I'm really happy that some people actually read the guideline).
Best regards,
Peter

**Mali Salmon** · 01-18-2012, 05:00 AM

Thanks a lot Peter for quick reply.
What do you mean by "single-exome" analysis? Do you mean single-end reads (as in my case) or single sample?
I actually have data from 4 patients, and I thought of finding variants for each patient separately. Would you recommend running them all in a single analysis?
Thanks again
Mali
