
  • #46
    Thanks raonyguimaraes, you are right. After I put the dictionary file and the reference in the same directory, that part works, but now I get another error message:
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.2-26-g43b0c98):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Input files reads and reference have incompatible contigs: Order of contigs differences, which is unsafe.
    ##### ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
    ##### ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
    ##### ERROR ------------------------------------------------------------------------------------------

    The chromosomes in the BAM file are in a different order from the reference. Should I reorder the reference file?
    Thanks a lot

    Comment


    • #47
      Emily,

      Make sure the chromosome (contig) headers are in the same order in every file you are using, and that the names match. I think the records in each file should also be sorted in that same contig order.
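
      A minimal sketch of that check in Python, assuming you have already pulled the contig names out of the BAM header and the reference dictionary into two lists. The function name `check_contig_order` is made up for illustration, and the rule it applies (shared contigs must appear in the same relative order) is my reading of the error message above, not GATK's actual code.

```python
def check_contig_order(reads_contigs, reference_contigs):
    """Return a list of problems; an empty list means the orders look compatible."""
    problems = []
    ref_index = {name: i for i, name in enumerate(reference_contigs)}

    # Contigs named in the reads but absent from the reference.
    missing = [c for c in reads_contigs if c not in ref_index]
    if missing:
        problems.append("not in reference: " + ", ".join(missing))

    # Shared contigs must appear in the same relative order in both lists.
    positions = [ref_index[c] for c in reads_contigs if c in ref_index]
    if positions != sorted(positions):
        problems.append("shared contigs are in a different relative order")
    return problems


# Shortened versions of the two lists from the error message in #46.
reads = ["chrM", "chr1", "chr2", "chr3"]
reference = ["chr1", "chr2", "chr3", "chrM"]
print(check_contig_order(reads, reference))
```

      With the shortened lists above, the only problem reported is the ordering one, because chrM comes first in the reads but last in the reference.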

      Comment


      • #48
        Please use everything from the resource bundle: http://www.broadinstitute.org/gsa/wi...esource_bundle

        Comment


        • #49
          Thanks both of you. Now it works.

          Comment


          • #50
            I have rerun my whole analysis several times, from the BWA alignment (with the -I option) through UnifiedGenotyper, and I'm still getting this output from UnifiedGenotyper:
            GenomeAnalysisTK.jar -T UnifiedGenotyper -l INFO -I output/exome.real.dedup.recal.bam -R ../../input/b37/human_g1k_v37.fasta -B:intervals,BED ../../input/bed/exome_plus10.merged.bed -B:dbsnp,VCF ../../input/dbsnp/dbsnp-134.vcf -glm BOTH -stand_call_conf 50.0 -stand_emit_conf 20.0 -dcov 300 -A AlleleBalance -A DepthOfCoverage -A FisherStrand -o output/exome.raw.vcf -log logs/gatk/UnifiedGenotyper.log
            Visited bases 3102559836
            Callable bases 2864301370
            Confidently called bases 1142400
            % callable bases of all loci 92.321
            % confidently called bases of all loci 0.037
            % confidently called bases of callable loci 0.040
            Actual calls made 350263

            Since it's the same number of variants I'm getting from DNANexus I'm starting to believe this is the right number before filtering the variants ...
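
            For anyone wondering how the three percentages in that summary relate to the raw counts, here is a small arithmetic check in Python. The counts are copied from the output above; the helper name `pct` and the rounding to three decimals are my own choices to match how the figures are printed.

```python
# Counts copied from the UnifiedGenotyper summary above.
visited = 3102559836
callable_bases = 2864301370
confident = 1142400


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 3 decimals."""
    return round(100.0 * part / whole, 3)


print("% callable bases of all loci:", pct(callable_bases, visited))
print("% confidently called bases of all loci:", pct(confident, visited))
print("% confidently called bases of callable loci:", pct(confident, callable_bases))
```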

            I decided to download the BED file from the UCSC Table Browser, as suggested in the manual, using "Exon plus 10bp", and tried to use it with UnifiedGenotyper in place of the BED file from "SeqCap EZ Human Exome Library v2.0" that I had been using.

            Running this step again, I got a message saying that there were overlaps in the intervals, so I decided to use bedtools to merge the intervals with the command:
            mergeBed -i exome_plus10.bed > exome_plus10.merged.bed
            Has anyone else had to do this?
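
            To make it clearer what the merge step does to the intervals, here is a rough Python sketch of the idea. It is not bedtools' implementation, and `merge_intervals` is a hypothetical name. It assumes BED-style 0-based, half-open coordinates and merges intervals that overlap or touch end to start.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    # Sorting groups intervals by contig name, then by start position.
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps or touches the previous interval: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Made-up example intervals, not taken from the real BED file.
exons = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 400), ("chr2", 10, 20)]
print(merge_intervals(exons))
```

            Note that the sketch sorts the intervals itself. Recent bedtools versions expect the input to be sorted by contig and start before merging, so running sortBed first is a sensible precaution.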

            After doing this and trying again, I'm receiving the message:

            exome_plus10.merged.bed and reference have incompatible contigs: No overlapping contigs found.
            exome_plus10.merged.bed contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000205_random, chr18, chr19, chr19_gl000209_random, chr1_gl000191_random, chr2, chr20, chr21, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr9, chrUn_gl000211, chrUn_gl000212, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000222, chrUn_gl000223, chrUn_gl000228, chrX, chrY]
            ##### ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1]
            sortBed didn't help, so I wrote a Python script to parse this BED file and rename every contig to match the reference (chr1 to 1, chr2 to 2, and so on). Has anyone else had to do the same?
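
            The script itself is not posted here, so the following is only a sketch of what such a renaming could look like, not the original. The names `ucsc_to_b37` and `rename_bed_lines` are hypothetical. The mapping is based on the two contig lists in the error message: chrN becomes N, chrM becomes MT, the gl contigs become GL000xxx.1, and the hap contigs are dropped because the reference list has nothing corresponding to them.

```python
import re


def ucsc_to_b37(name):
    """Map a UCSC-style contig name to the b37 style, or None if there is no match."""
    if name == "chrM":
        return "MT"
    if re.fullmatch(r"chr(\d+|X|Y)", name):
        return name[3:]
    # e.g. chr1_gl000191_random or chrUn_gl000211 -> GL000191.1 / GL000211.1
    m = re.fullmatch(r"chr[0-9A-Za-z]+_gl(\d{6})(_random)?", name)
    if m:
        return "GL" + m.group(1) + ".1"
    return None  # e.g. chr6_apd_hap1 has no counterpart in the reference list


def rename_bed_lines(lines):
    """Yield BED lines with the first column renamed; drop unmappable contigs."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        new_name = ucsc_to_b37(fields[0])
        if new_name is not None:
            yield "\t".join([new_name] + fields[1:])


# Made-up coordinates, for illustration only.
bed = ["chr1\t100\t200\n", "chr6_apd_hap1\t5\t50\n", "chrUn_gl000211\t1\t10\n"]
print(list(rename_bed_lines(bed)))
```

            One caveat: renaming only fixes the names. The UCSC hg19 chrM and the b37 MT are different mitochondrial sequences, so intervals on chrM may not line up after renaming and are probably safer to drop.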

            I checked the quality of my reads with FastQC and they looked OK, so I didn't clean my reads before running BWA->GATK.

            What could I use to clean my Illumina reads? NGS Backbone, SeqClean, CleanSeq, Prinseq, FastQX? Has anyone improved the number of calls by doing so?

            For Variant Quality Score Recalibration, they suggest that "in order to achieve the best exome results one needs to use an exome SNP callset with at least 30 samples."

            Has anyone tried merging other exomes and gotten better results from it?

            Still looking for my 20k variants
            Last edited by raonyguimaraes; 10-26-2011, 03:37 PM.

            Comment


            • #51
              Filtering exon+10bp bed file

              The exon+10bp BED file helps. I got 150k variants from one individual without it, and the number dropped to 30k with it. The UCSC-style BED file contains intervals on the random and haplotype (hap) contigs. My alignment reference does not contain those contigs, so I filtered them out of the exon+10bp BED file with a Perl script, and that made it work.
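
              The same filtering can be sketched in Python (the Perl script was not posted, so this is not the original). It assumes you have a samtools-style .fai index for the reference, whose first tab-separated column is the contig name. `contigs_from_fai` and `filter_bed` are hypothetical names, and the example lines are made up.

```python
def contigs_from_fai(fai_lines):
    """Collect contig names from the first column of a .fai index."""
    return {line.split("\t")[0] for line in fai_lines if line.strip()}


def filter_bed(bed_lines, allowed):
    """Yield only the BED lines whose contig is present in the reference."""
    for line in bed_lines:
        if line.strip() and line.split("\t")[0] in allowed:
            yield line


# Illustrative input; in practice these would come from open(...) on real files.
fai = ["chr1\t249250621\t6\t50\t51\n", "chr2\t243199373\t254235646\t50\t51\n"]
bed = ["chr1\t100\t200\n", "chr6_cox_hap2\t5\t50\n", "chr2\t10\t20\n"]

allowed = contigs_from_fai(fai)
print(list(filter_bed(bed, allowed)))
```

              Building the allowed set from the reference index, instead of hard-coding a pattern for "random" and "hap", means the BED file ends up matching whatever reference was actually used for alignment.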

              Comment


              • #52
                dad

                Hello all, I have good news ...

                After using annovar, I finally got to the number of 22709 variants on my data.

                From there I'm now trying to filter based on this approach:


                The numbers are pretty close so I think I'm on the right track

                22709 Variants
                11179 Variants
                4766 Variants
                4222 Variants
                removed frequency > 0.01
                878 Variants
                427 Variants

                Comment


                • #53
                  I think the -L argument expects the intervals file in SAM format
                  (http://www.broadinstitute.org/gsa/wi...line_arguments). If yes, use bedtools' bedToBam

                  Originally posted by liu_xt005 View Post
                  Following ulz_peter's original doc, I have some problem when doing the SNP-calling.

                  java -Xmx4g -jar /path/GenomeAnalysisTK-1.1-35-ge253f6f/GenomeAnalysisTK.jar \
                  -glm BOTH \
                  -R hg18.fa \
                  -T UnifiedGenotyper \
                  -I myinput.marked.realigned.fixed.recal.bam \
                  -D dbsnp132_hg18.txt \
                  -o myoutput.snps.vcf \
                  -metrics snps.metrics \
                  -stand_call_conf 50.0 \
                  -stand_emit_conf 10.0 \
                  -dcov 1000 \
                  -A DepthOfCoverage \
                  -A AlleleBalance \
                  -L hg18_exonIntervals.bed

                  This "-L" option does not work.
                  I got the hg18_exonIntervals.bed file from UCSC, as ulz_peter's original doc shows.
                  So I ran the SNP calling without the "-L" line.
                  Then the variant quality score recalibration step failed, generating an empty output.tranches file.

                  Can somebody help me out? Thanks a lot.
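
                  If the BED file itself is the problem, one possible workaround is to turn it into a plain list of interval strings of the form chr:start-stop. The sketch below is only that, a sketch: `bed_to_intervals` is a hypothetical helper, and whether your GATK version accepts such a list (and under which file extension) is an assumption to check against the documentation for that version. What is certain is the coordinate shift: BED starts are 0-based and the end is exclusive, while chr:start-stop strings are 1-based and inclusive.

```python
def bed_to_intervals(bed_lines):
    """Convert BED lines (0-based, half-open) to 1-based inclusive chr:start-stop strings."""
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and UCSC header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield f"{chrom}:{int(start) + 1}-{int(end)}"


# Made-up example lines in the style of a UCSC Table Browser export.
bed = ["track name=exons\n", "chr1\t100\t200\texon1\n", "chr2\t0\t50\texon2\n"]
for interval in bed_to_intervals(bed):
    print(interval)
```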

                  Comment


                  • #54
                    In following the workflow mentioned above, I've come up against an error, and I'm wondering if I'm alone in this. Has anyone had difficulty using the CountCovariates tool, specifically errors about reading information from the input BAM file? I've tried this with several samples, but keep getting the same error: "Bad input: Could not find any usable data in the input BAM file(s)"

                    (for those interested, the BAM files in question are not empty, and work just fine with samtools view).



                    Code:
                    java -Xmx16g -jar /$Software/GenomeAnalysisTK-1.3-17-gc62082b/GenomeAnalysisTK.jar -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                    
                    INFO  14:01:25,870 HelpFormatter - ---------------------------------------------------------------------------------
                    INFO  14:01:25,875 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-17-gc62082b, Compiled 2011/11/18 15:24:46
                    INFO  14:01:25,875 HelpFormatter - Copyright (c) 2010 The Broad Institute
                    INFO  14:01:25,876 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
                    INFO  14:01:25,876 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
                    INFO  14:01:25,877 HelpFormatter - Program Args: -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                    INFO  14:01:25,878 HelpFormatter - Date/Time: 2011/11/24 14:01:25
                    INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                    INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                    INFO  14:01:26,052 RodBindingArgumentTypeDescriptor - Dynamically determined type of $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf to be VCF
                    INFO  14:01:26,064 GenomeAnalysisEngine - Strictness is SILENT
                    INFO  14:01:26,815 RMDTrackBuilder - Loading Tribble index from disk for file $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                    INFO  14:01:30,532 MicroScheduler - Running the GATK in parallel mode with 8 concurrent threads
                    INFO  14:01:32,326 CountCovariatesWalker - The covariates being used here:
                    INFO  14:01:32,327 CountCovariatesWalker -      ReadGroupCovariate
                    INFO  14:01:32,327 CountCovariatesWalker -      QualityScoreCovariate
                    INFO  14:01:32,327 CountCovariatesWalker -      CycleCovariate
                    INFO  14:01:32,328 CountCovariatesWalker -      DinucCovariate
                    INFO  14:01:41,189 CountCovariatesWalker - Writing raw recalibration data...
                    INFO  14:01:44,145 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                    INFO  14:01:44,146 HttpMethodDirector - Retrying request
                    INFO  14:01:44,149 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                    INFO  14:01:44,149 HttpMethodDirector - Retrying request
                    INFO  14:01:44,152 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                    INFO  14:01:44,153 HttpMethodDirector - Retrying request
                    INFO  14:01:44,155 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                    INFO  14:01:44,155 HttpMethodDirector - Retrying request
                    INFO  14:01:44,158 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                    INFO  14:01:44,158 HttpMethodDirector - Retrying request
                    ##### ERROR ------------------------------------------------------------------------------------------
                    ##### ERROR A USER ERROR has occurred (version 1.3-17-gc62082b):
                    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                    ##### ERROR Please do not post this error to the GATK forum
                    ##### ERROR
                    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
                    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
                    ##### ERROR
                    ##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).
                    ##### ERROR ------------------------------------------------------------------------------------------

                    Comment


                    • #55
                      I've never seen that error...
                      I've never seen those HttpMethodDirector I/O exceptions either...

                      Maybe you should talk to the GATK people at the GetSatisfaction Page:

                      Comment


                      • #56
                        The I/O exceptions are just minor warnings that the software can't "phone home", usually because of proxies, firewalls and the like. They don't affect the result, and you can silence them by providing the right parameters. This is just an example output I grabbed.

                        Comment


                        • #57
                          Thanks !

                          Looks useful- thanks for distributing this.
                          GP

                          Comment


                          • #58
                            Hi All
                            I have been trying to follow the exome-analysis pipeline written by ulz_peter (thanks Peter for this clear and nice document). I have encountered the same problem as pc2009open when trying to run "VariantRecalibrator" (using GATK version 1.0.5336).
                            I got the same error message: "Argument with name '--cluster_file' is missing."
                            I am wondering if you solved the problem, and if there is an updated analysis pipeline that works with the latest GATK version.
                            Thanks

                            Comment


                            • #59
                              Hi Mali Salmon,

                              I imported the document to the Seq-Wiki (see http://seqanswers.com/wiki/How-to/exome_analysis). The problem is that variant recalibration doesn't really work for single-exome analyses. In the updated version in the Seq-Wiki (which is already somewhat out of date), I wrote that I went back to SNP filtering with VariantFiltration, because I often got a lot of error messages when trying to do variant recalibration on a single sample.

                              There is a link to the GATK Homepage on the wiki. If you still want to do Variant Quality Score Recalibration I'd recommend you stick to their guidelines.

                              Hope that helps (and I am really happy some people really read the guideline)
                              Best regards,
                              Peter

                              Comment


                              • #60
                                Thanks a lot Peter for quick reply.
                                What do you mean by "single-exome" analysis? Do you mean single-end reads (as in my case) or single sample?
                                I actually have data from 4 patients, and I thought of finding variants for each patient separately. Would you recommend running them all in a single analysis?
                                Thanks again
                                Mali

                                Comment
