  • #16
    Hi.

    Thanks for all your help.

    I tried what you told me to do, and now the error is the following:

    java -Xmx4g -jar /GenoStorage/Software/GATK/GenomeAnalysisTK.jar \
      -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa \
      -knownSites:name,VCF /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt \
      -I sample02187A_align_sorted.bam \
      -T CountCovariates \
      -cov ReadGroupCovariate \
      -cov QualityScoreCovariate \
      -cov CycleCovariate \
      -cov DinucCovariate \
      -recalFile sample02187A.recal_data.csv
    INFO 13:39:44,563 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,565 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-21-gcb284ee, Compiled 2011/11/29 16:46:58
    INFO 13:39:44,566 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 13:39:44,566 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 13:39:44,566 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 13:39:44,567 HelpFormatter - Program Args: -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa -knownSites:name,VCF /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt -I sample02187A_align_sorted.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile sample02187A.recal_data.csv
    INFO 13:39:44,568 HelpFormatter - Date/Time: 2012/02/06 13:39:44
    INFO 13:39:44,568 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,568 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,590 GenomeAnalysisEngine - Strictness is SILENT
    INFO 13:39:44,688 RMDTrackBuilder - Creating Tribble index in memory for file /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt
    INFO 13:39:51,195 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.3-21-gcb284ee):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    ##### ERROR ------------------------------------------------------------------------------------------


    I downloaded the knownSites file (snp132CodingDbSnp.txt) from UCSC.
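The error message above points at the -knownSites file: it is bound as VCF, but GATK never found a header line beginning with #CHROM. A quick way to confirm this before rerunning is to look for that line directly. The sketch below assumes an uncompressed text file; has_chrom_header is a hypothetical helper name, not a GATK or UCSC tool.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file contains a
# line starting with "#CHROM", which a VCF header must have.
# Assumes the file is plain text, not gzip-compressed.
has_chrom_header() {
    grep -q '^#CHROM' "$1"
}

# Example use (path taken from the command in the post):
#   has_chrom_header /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt \
#       && echo "has a #CHROM header" \
#       || echo "no #CHROM header: not a valid VCF"
```

If the check fails, the file is probably a UCSC table dump rather than a VCF (an assumption, since its contents are not shown in this thread), so it would need to be replaced with, or converted to, a real VCF before GATK will accept it under the VCF binding.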


    • #17
      Originally posted by ulz_peter
      OK, that's a different problem. GATK expects the chromosomes in this order: chr1, chr2, chr3, ..., chrX, chrY, chrM. It seems your reference FASTA file has the chromosomes in lexicographical order, with chr10 directly after chr1. When you reorder your SAM file, it orders the reads according to the order in your reference FASTA file. You could download the single-chromosome FASTA files from UCSC (http://hgdownload.cse.ucsc.edu/golde...9/chromosomes/) and concatenate them in the right order using cat, like this:
      Code:
      cat chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM > hg19.fa
      It might work to just run ReorderSam again with the newly sorted reference FASTA file; otherwise you'd need to repeat the alignment. If you're planning a pipeline, you might want to have the reference file in the right order from the start, to save the ReorderSam step.

      Hope that helps,
      Peter
      Hello Peter,
      I did order my reference this way: "cat chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM > hg19.fa". The problem is that the latest GATK bundle (2.3) has the "dbsnp_137.hg19.vcf" and "1000G_phase1.indels.hg19.vcf" files, which are ordered with chrM at the beginning.
      I tried to reorder the VCF with vcfsorter (http://code.google.com/p/vcfsorter/), but it took far too long: 24 h to get through half of the file. So I rebuilt the reference with chrM first, using "cat chrM chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > hg19.fa", but I wasn't able to reorder the already constructed BAM file with ReorderSam.jar. After I constructed the sequence dictionary with CreateSequenceDictionary.jar, it gave me this error:

      java.lang.IllegalArgumentException: File is not a supported reference file type: /home/cox/ex_storage/cromosomi/hg19_2/Sequencedictionary.bam


      What do you think I should do now?
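Since the question comes down to whether the reference FASTA and the bundle VCFs list their contigs in a compatible order, it can help to print both orders and compare them before reordering anything. The following is only a rough sketch: the function names are hypothetical (not part of GATK or Picard), it assumes uncompressed files, and it looks only at the order in which contigs first appear in the VCF.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print contig names from a FASTA in file order
# (the text after ">" up to the first whitespace).
fasta_contig_order() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Print contig names from an uncompressed VCF in order of first appearance.
vcf_contig_order() {
    grep -v '^#' "$1" | cut -f1 | awk '!seen[$0]++'
}

# Succeed if every VCF contig exists in the FASTA and the VCF contigs
# appear in the same relative order as in the FASTA.
same_contig_order() {
    ref_order=$(mktemp)
    fasta_contig_order "$1" > "$ref_order"
    vcf_contig_order "$2" | awk -v ref="$ref_order" '
        BEGIN { while ((getline name < ref) > 0) rank[name] = ++n }
        !($0 in rank)    { exit 1 }   # contig missing from the reference
        rank[$0] <= last { exit 1 }   # contig out of order
        { last = rank[$0] }
    '
    status=$?
    rm -f "$ref_order"
    return $status
}

# Example use (file names from the post above):
#   fasta_contig_order hg19.fa
#   vcf_contig_order dbsnp_137.hg19.vcf
#   same_contig_order hg19.fa dbsnp_137.hg19.vcf && echo "orders agree"
```

If the two listings disagree only in where chrM sits, that matches the situation described above, and the choice is between rebuilding the reference to follow the bundle's order or re-sorting the VCFs to follow the reference.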
