  • GATK IndelRealigner error, badly formed genome loc: unknown contig chrM.

    The original fasta file used to generate the alignments has the correct label of >ChrM. When I check the header of the input BAM I find the chrM @SQ line.

    What could cause this?

  • #2
    Originally posted by gumbos
    The original fasta file used to generate the alignments has the correct label of >ChrM. When I check the header of the input BAM I find the chrM @SQ line.

    What could cause this?
    You really need to provide more info.



    • #3
      What more info can I provide? I am trying to run the GATK pipeline following the best practices page. I have generated a SAM using the newest bowtie2 beta. I convert that SAM to a BAM, sort it, index it, then run RealignerTargetCreator, which finishes successfully. I use the intervals file and the same BAM to run IndelRealigner, which runs fine until the very end when it hits chrM, and reports that error.

      INFO 15:50:36,054 TraversalEngine - chr9:55145149 4.13e+07 69.1 m 100.3 s 99.8% 69.2 m 9.1 s
      ......
      ##### ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:Unknown contig chrM

      So I went back and made sure that the fasta used for the alignment had the correct label, which it does. The @SQ header of the SAM file lists chrM, and spot-checking some alignments shows chrM in the third column (RNAME). I also searched the whole SAM for chrMt, which Ensembl sometimes uses, and did not find it.
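A minimal sketch of the checks described above, extended to the reference side. GATK reads the reference's contig list from the `.fai` index (made by `samtools faidx`) and the `.dict` sequence dictionary (made by Picard `CreateSequenceDictionary`), so one possibility worth ruling out, though nothing in the thread confirms it, is that either file was generated from a different FASTA than the one the BAM was aligned to. The helper names below are hypothetical and the contig lengths are made-up toy values. Note that Python string comparison is case-sensitive, so a FASTA header of `>ChrM` would not match `chrM`.

```python
def sq_contigs(lines):
    """(name, length) pairs from @SQ lines of a SAM header or a Picard .dict file."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            out.append((tags["SN"], int(tags["LN"])))
    return out

def used_contigs(sam_lines):
    """Distinct RNAME values (third column) of alignment records, ignoring '*'."""
    names = set()
    for line in sam_lines:
        if line and not line.startswith("@"):
            rname = line.split("\t")[2]
            if rname != "*":
                names.add(rname)
    return names

def fai_contigs(lines):
    """(name, length) pairs from the first two columns of a samtools .fai index."""
    return [(f[0], int(f[1])) for f in (l.split("\t") for l in lines if l.strip())]

def compare(bam, ref, label):
    """List BAM-header contigs that are missing from, or differ in length in, ref."""
    ref_len = dict(ref)
    problems = []
    for name, length in bam:
        if name not in ref_len:
            problems.append(f"{name}: in BAM header, missing from {label}")
        elif ref_len[name] != length:
            problems.append(f"{name}: LN {length} in BAM, {ref_len[name]} in {label}")
    return problems

# Toy inputs (made up): the .fai is "stale" and lacks chrM, the .dict is fine.
sam = [
    "@SQ\tSN:chr9\tLN:1000",
    "@SQ\tSN:chrM\tLN:500",
    "r1\t0\tchrM\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
]
fai = ["chr9\t1000\t6\t60\t61"]
ref_dict = ["@HD\tVN:1.0", "@SQ\tSN:chr9\tLN:1000", "@SQ\tSN:chrM\tLN:500"]

bam = sq_contigs(sam)
print(used_contigs(sam) - {n for n, _ in bam})          # set(): every RNAME is declared
print(compare(bam, fai_contigs(fai), "ref.fa.fai"))     # ['chrM: in BAM header, missing from ref.fa.fai']
print(compare(bam, sq_contigs(ref_dict), "ref.dict"))   # []
```

With real data you would read the lines from `samtools view -H` output, `ref.fa.fai`, and `ref.dict` instead of the toy lists.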



      • #4
        Are you using the same fasta for IndelRealigner that you used for the alignment? I know it sounds like a silly question, but everything else seems fine.
        By the way, Ensembl "always" uses MT, so you must be using UCSC references.
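One way to make sure every step sees the same FASTA, as asked above, is to define the reference path once and build every command from it. The sketch below only builds (does not run) the commands for the steps listed in post #3. The file names and helper are hypothetical; the GATK options are the GATK 2/3-era `-T` syntax of the time, and `samtools sort -o` assumes samtools 1.x (0.1.x used a different `sort` syntax and needed `view -S` for SAM input).

```python
import shlex

# Hypothetical paths; substitute your own files.
REF = "reference.fa"            # one variable, so both GATK steps get the same FASTA
GATK_JAR = "GenomeAnalysisTK.jar"

def pipeline(sample, ref=REF, gatk=GATK_JAR):
    """Return (do not run) the commands for the steps described in post #3."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    intervals = f"{sample}.intervals"
    return [
        ["samtools", "view", "-b", "-o", bam, sam],        # SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # samtools 1.x syntax
        ["samtools", "index", sorted_bam],
        ["java", "-jar", gatk, "-T", "RealignerTargetCreator",
         "-R", ref, "-I", sorted_bam, "-o", intervals],
        ["java", "-jar", gatk, "-T", "IndelRealigner",
         "-R", ref, "-I", sorted_bam, "-targetIntervals", intervals,
         "-o", f"{sample}.realigned.bam"],
    ]

for cmd in pipeline("sample1"):
    print(shlex.join(cmd))
```

To actually execute a step, each list can be passed to `subprocess.run(cmd, check=True)`; keeping the commands as lists avoids shell-quoting problems with unusual file names.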



        • #5
          I am fairly sure I am using the correct reference fasta because it was all done via a shell script. As a test, I went back and took chrM out of the fasta, and it worked. However, I am now stuck on the CountCovariates step. Before I make another post, maybe you know:

          My reference VCF comes from converting the Ensembl GVF to VCF with gvf2vcf. I then get the error "The provided VCF file is malformed at approximately line number 4: Empty alleles are not permitted in VCF records". Referring to the table at http://vcftools.sourceforge.net/VCF-poster.pdf, I noticed that my VCF does not contain the FORMAT, sample1, or sample2 columns. Are these required? If so, do you know where I can get a proper Zebrafish dbSNP VCF? As far as I can tell, UCSC doesn't have one, and the dbSNP FTP files do not come in VCF format.
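On the column question: the VCF specification makes FORMAT and the sample columns optional (a sites-only VCF has just the eight fixed columns CHROM through INFO), so their absence is probably not what the error refers to. The message points at REF/ALT instead. In VCF 4.x an indel's REF and ALT both include a preceding reference base, so a converter that does not add that padding base could plausibly leave an allele field empty. The sketch below (a hypothetical helper, toy data) scans a VCF for records with fewer than eight columns or empty alleles, using 1-based line numbers like the error message.

```python
def check_vcf(lines):
    """Return (line_number, message) pairs for structural problems in VCF text lines."""
    problems = []
    header_seen = False
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##"):          # meta-information lines
            continue
        if line.startswith("#CHROM"):      # column header line
            header_seen = True
            continue
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            problems.append((n, f"only {len(fields)} columns; 8 are required"))
            continue
        ref, alt = fields[3], fields[4]
        if ref == "":
            problems.append((n, "empty REF allele"))
        if "" in alt.split(","):           # catches an empty ALT and empty entries like "A,"
            problems.append((n, "empty ALT allele"))
    if not header_seen:
        problems.insert(0, (0, "missing #CHROM header line"))
    return problems

vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\t.\t.",
    "1\t200\trs2\t\tT\t.\t.\t.",           # toy record with an empty REF
]
print(check_vcf(vcf))  # [(4, 'empty REF allele')]
```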



          • #6
            I got the same message when I ran IndelRealigner, but my reference file does not include chrM. I ran UnifiedGenotyper anyway. Is my VCF file OK even though the run reported that my file is truncated or corrupt?
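The post does not say which tool reported "truncated or corrupt", so this is only a rough, heuristic sketch of structural checks one could run on the resulting VCF. It cannot tell whether the calls themselves are right. It assumes the file is either plain text or gzip/bgzip-compressed; Python's `gzip` module reads multi-member (bgzip) files and raises `EOFError` when a compressed stream is cut off mid-member. Helper names are hypothetical.

```python
import gzip

def read_vcf_text(path):
    """Read a plain or gzip/bgzip-compressed VCF into one string.

    A compressed stream cut off mid-member raises EOFError rather than
    returning silently; a cut exactly at a member boundary is not detected.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fh.read()

def vcf_looks_complete(text):
    """Heuristic structural checks on uncompressed VCF text."""
    issues = []
    if text and not text.endswith("\n"):
        issues.append("no trailing newline: last line may be cut off")
    n_cols = None
    for n, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#CHROM"):
            n_cols = len(line.split("\t"))
        elif line and not line.startswith("#"):
            if n_cols is None:
                issues.append(f"line {n}: data before the #CHROM header")
                break
            cols = len(line.split("\t"))
            if cols != n_cols:
                issues.append(f"line {n}: {cols} columns, header has {n_cols}")
    return issues

good = ("##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "1\t100\t.\tA\tG\t50\tPASS\t.\n")
print(vcf_looks_complete(good))        # []
print(vcf_looks_complete(good[:-12]))  # simulated truncation: two issues reported
```

Usage on a real file would be `vcf_looks_complete(read_vcf_text("calls.vcf"))`, with `calls.vcf` standing in for your own output.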

