  • GATK error because of the order of reference chr.

    I ran this command:

    java -Xmx1g -jar ../GATK/GenomeAnalysisTK.jar -I P1_novo.reordered.sorted.bam -T RealignerTargetCreator -R ../reference/hg19_ucsc.fa -o P1.intervals --known ../reference/snp.vcf

    Then the following error message appeared:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.4-21-g30b937d):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
    ##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
    ##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
    ##### ERROR You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wi...php/ReorderSam
    ##### ERROR reads contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]

    I downloaded the reference files from UCSC and concatenated them in this order:

    grep chr ../reference/hg19_ucsc.fa
    >chr1
    >chr2
    >chr3
    >chr4
    >chr5
    >chr6
    >chr7
    >chr8
    >chr9
    >chr10
    >chr11
    >chr12
    >chr13
    >chr14
    >chr15
    >chr16
    >chr17
    >chr18
    >chr19
    >chr20
    >chr21
    >chr22
    >chrX
    >chrY
    >chrM
    >chrUn_gl000211
    >chrUn_gl000212
    >chrUn_gl000213
    >chrUn_gl000214
    >chrUn_gl000215
    >chrUn_gl000216
    >chrUn_gl000217
    >chrUn_gl000218
    >chrUn_gl000219
    >chrUn_gl000220
    >chrUn_gl000221
    >chrUn_gl000222
    >chrUn_gl000223
    >chrUn_gl000224
    >chrUn_gl000225
    >chrUn_gl000226
    >chrUn_gl000227
    >chrUn_gl000228
    >chrUn_gl000229
    >chrUn_gl000230
    >chrUn_gl000231
    >chrUn_gl000232
    >chrUn_gl000233
    >chrUn_gl000234
    >chrUn_gl000235
    >chrUn_gl000236
    >chrUn_gl000237
    >chrUn_gl000238
    >chrUn_gl000239
    >chrUn_gl000240
    >chrUn_gl000241
    >chrUn_gl000242
    >chrUn_gl000243
    >chrUn_gl000244
    >chrUn_gl000245
    >chrUn_gl000246
    >chrUn_gl000247
    >chrUn_gl000248
    >chrUn_gl000249

    As the error message suggests, I ran Picard's ReorderSam tool like this:

    java -jar ../picard/ReorderSam.jar I=P1_novo.sorted.bam O=P1_novo.reordered.sorted.bam REFERENCE=../reference/hg19_ucsc.fa

    However, running GATK on the reordered BAM file produces the same error message.

    Is there any solution?
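
One way to check whether the reordered BAM really follows the reference is to compare the contig order in the two files directly. The following is a minimal Python sketch, not a definitive tool. It assumes the BAM header has been dumped to text (for example with `samtools view -H`) and that the FASTA header lines are available as text. The function names and the tiny inline sample data are hypothetical, made up for illustration.

```python
def fasta_contigs(lines):
    """Contig names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def sam_header_contigs(lines):
    """Contig names from the SN: field of @SQ lines in a SAM/BAM text header."""
    names = []
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def first_order_mismatch(reference, other):
    """Return (position, reference_name, other_name) at the first disagreement, else None.

    Only positions present in both lists are compared; a difference in length
    alone is not reported.
    """
    for position, (ref_name, other_name) in enumerate(zip(reference, other)):
        if ref_name != other_name:
            return (position, ref_name, other_name)
    return None


# Hypothetical sample data: a karyotypically ordered reference and a
# lexicographically ordered BAM header.
fasta_lines = [">chr1", "ACGT", ">chr2", "ACGT", ">chr10", "ACGT"]
header_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:4",
    "@SQ\tSN:chr10\tLN:4",
    "@SQ\tSN:chr2\tLN:4",
]

mismatch = first_order_mismatch(
    fasta_contigs(fasta_lines), sam_header_contigs(header_lines)
)
print(mismatch)  # prints (1, 'chr2', 'chr10')
```

If this kind of check reports a mismatch for the file actually passed to GATK with `-I`, the BAM header is still in the old order.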

  • #2
    Hi,
    I really feel your pain as I have struggled with the same thing a good few times.
    You could save yourself a lot of bother by getting both your reference FASTA and the dbSNP VCF file from GATK; they are more likely to play together.
    Chris


    • #3
      Hi, thank you for your reply.

      Could you tell me where I can download the dbSNP and FASTA files from GATK?

      I need dbSNP 135 and the hg19 reference.

      As far as I know, the GATK bundle provides dbSNP 131 (or 129?) and the hg18 reference.

      Is it possible to download more recent data from GATK?

      Please link the site.


      • #4
        If you are planning to use the Broad bundle,
        I reckon a bundle for hg19 is available.

        1. Have you tried downloading from the following ftp yet?
        ftp://[email protected]/1.2/hg19/

        The dbSNP version in the Broad hg19 bundle, as far as I know, is dbSNP 132.
        However, unless there is a specific reason to use dbSNP 135 (I might be wrong!), I don't think there would be any problem with using dbSNP 132.

        2. Also, you must make sure your reference chromosome order and VCF chromosome order are the same.
        (Personally, I recall struggling because dbsnp132_b37 had "MT" at the top of its chromosome ID list.)
        Last edited by alexbmp; 03-20-2012, 01:48 AM.
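
The order check in point 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the VCF is read as plain text lines, the contig order is taken from the order in which names first appear in the CHROM column, and the function names and sample records are hypothetical.

```python
def vcf_contig_order(lines):
    """Contig names in order of first appearance in the CHROM column of a VCF."""
    seen = []
    for line in lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen


def same_relative_order(reference_order, vcf_order):
    """True if the contigs shared by both lists appear in the same relative order."""
    rank = {name: index for index, name in enumerate(reference_order)}
    ranks = [rank[name] for name in vcf_order if name in rank]
    return ranks == sorted(ranks)


# Hypothetical sample: a reference with MT last and a VCF that starts with MT.
reference_order = ["1", "2", "X", "Y", "MT"]
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MT\t73\t.\tA\tG\t.\t.\t.",
    "1\t100\t.\tC\tT\t.\t.\t.",
    "2\t200\t.\tG\tA\t.\t.\t.",
]

vcf_order = vcf_contig_order(vcf_lines)
print(vcf_order)                                        # prints ['MT', '1', '2']
print(same_relative_order(reference_order, vcf_order))  # prints False
```

Contigs that exist in only one of the two files are ignored by this comparison, so it says nothing about whether names such as "chr1" and "1" match.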


        • #5
          thank you

          Thank you, Alex!
          But I have some questions...

          1. As you can see in my reference file, the chromosomes are in this order: chr1 to chr22, chrX, chrY, chrM, then the chrUn sequences.
          However, after I run Novoalign, the error message reports a different chromosome order: chr1, chr10 to chr19, chr2, chr20, and so on.
          How can I handle this problem? Fixing the alignment program is beyond my control.

          2. Do I need to include the chrUn sequences in my reference FASTA file?
          These chrUn sequences are not included in the VCF file, are they?
          If I include them, will the SNP calling step cause trouble again?


          • #6
            I haven't used Novoalign, so don't trust me fully.

            1-1. If you build an alignment index before alignment, check whether your index is in chromosomal order (chr1, chr2, chr3, ..., chrX, chrY, chrM or the equivalent).

            1-2. If it is, check whether your alignment program has an output option that emits chromosome ID headers in unsorted or lexicographical (chr1, chr10, chr11, ..., chrM, chrX, chrY) order. I haven't seen this kind of alignment output option yet; I strongly suspect that your index file is ordered lexicographically, which is the case to check in 1-1 (I had the same error).

            2. If you are talking about contigs (chromosome fragments that are not fully assembled), I think it is good to include them in your alignment step.

            I reckon reads from sequence that physically exists in such contigs will be mapped there, probably decreasing your error rate. Thinking about it, I'm not sure of this, but I'll write my thoughts anyway. Somebody please correct me.

            I also think you can just exclude SNPs on contigs if their existence bothers you.
            Contigs are not fully assembled chromosomes in the first place.

            Did I understand your questions fully?
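
To make the difference between the two orderings in 1-1 and 1-2 concrete, here is a small Python sketch. It is an illustration only: the sort key below is a hypothetical approximation of karyotypic order (numbered chromosomes, then X, Y, M, then everything else by name) and is not GATK's own logic.

```python
def karyotypic_key(name):
    """Sort key placing 1..22 first, then X, Y, M/MT, then all other contigs by name."""
    base = name[3:] if name.startswith("chr") else name
    if base.isdigit():
        return (0, int(base), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if base in special:
        return (1, special[base], "")
    # Random, unplaced, and haplotype contigs go last, in plain name order.
    return (2, 0, name)


contigs = ["chr1", "chr10", "chr2", "chrM", "chrX", "chrY", "chr1_gl000191_random"]

lexicographic = sorted(contigs)
karyotypic = sorted(contigs, key=karyotypic_key)

print(lexicographic)
# prints ['chr1', 'chr10', 'chr1_gl000191_random', 'chr2', 'chrM', 'chrX', 'chrY']
print(karyotypic)
# prints ['chr1', 'chr2', 'chr10', 'chrX', 'chrY', 'chrM', 'chr1_gl000191_random']
```

The lexicographic result has the same shape as the "reads contigs" list in the error message: chr10 sorts before chr2 because the names are compared character by character.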
