  • GATK IndelRealigner error

    Hello,

    I'm attempting to use GATK 2.3-9-ge5ebf34 to realign some RNA-seq datasets. Below is the command I'm using.

    java -Xms16g -jar ~/bin/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T IndelRealigner -R ../../../transcripts/reference_wMT.fa -I fat_no_dup.bam -targetIntervals fat_no_dup.intervals -o fat_nd_realigned.bam

    I'm receiving 2 errors that occur at many locations in the output. The first is below:

    IndelRealigner - When first element of the alt consensus is M, the second one must be I or D. Actual: 85M208M2D15M.

    I see that the issue is that there are 2 M fields when it is only expecting 1, but I'm unsure what this means or what I can do to fix it. The BAM file was aligned using TopHat, allowing only 1 alignment per read. Any thoughts?

    The second error is:

    IndelRealigner - Not attempting realignment in interval 12:26475533-26475760 because there are too many reads.

    I know the issue here is there are too many reads at that interval, so I have to increase the -maxReadsForRealignment setting. Does anyone know of a way to determine what the highest coverage is for one of these regions, so I can set the max reads above that level, instead of picking a random number and trying again? Thanks!
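One way to approach the question above is to count the reads overlapping each target interval and look at the largest count. The following is a minimal sketch in plain Python, assuming the intervals are written as `chrom:start-end` (as in the warning message) and that read coordinates have already been extracted from the BAM file. The function names and the example reads are hypothetical, for illustration only.

```python
def parse_interval(text):
    """Parse 'chrom:start-end' (or 'chrom:pos') into (chrom, start, end), 1-based inclusive."""
    chrom, _, span = text.strip().rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end or start)


def reads_per_interval(intervals, reads):
    """Count reads overlapping each interval.

    intervals: iterable of 'chrom:start-end' strings
    reads: iterable of (chrom, start, end) tuples, 1-based inclusive
    """
    counts = {}
    for text in intervals:
        chrom, start, end = parse_interval(text)
        counts[text] = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start <= end and r_end >= start
        )
    return counts


# Hypothetical read coordinates, not taken from the original data.
reads = [
    ("12", 26475500, 26475600),  # overlaps the interval
    ("12", 26475700, 26475800),  # overlaps the interval
    ("12", 26476000, 26476100),  # starts after the interval ends
    ("1", 26475533, 26475633),   # different chromosome
]
counts = reads_per_interval(["12:26475533-26475760"], reads)
busiest = max(counts, key=counts.get)
print(busiest, counts[busiest])
```

On a real, indexed BAM file, `samtools view -c fat_no_dup.bam 12:26475533-26475760` should report a comparable per-region count of overlapping alignments, which can then be compared against the -maxReadsForRealignment setting.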

  • #2
    In the first instance, the CIGAR string makes no sense: the two adjacent M operations should be merged (85 + 208 = 293), so it should be 293M2D15M. As for the second message (these are warnings, not errors), it is just an informative message telling you that the coverage in that region is too high. There's no reason to play with -maxReadsForRealignment too much, since those regions should just be ignored anyway.
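To make the CIGAR point concrete, here is a minimal sketch of merging adjacent operations of the same type, which is how 85M208M2D15M collapses to 293M2D15M. It only illustrates the normalization; the function name is hypothetical, and it is not something GATK exposes or a fix for the warning itself.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def merge_adjacent_ops(cigar):
    """Merge consecutive CIGAR elements that share the same operation."""
    merged = []
    for length, op in CIGAR_OP.findall(cigar):
        if merged and merged[-1][1] == op:
            # Same operation as the previous element: add the lengths together.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


print(merge_adjacent_ops("85M208M2D15M"))  # 293M2D15M
```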
