Old 11-15-2011, 07:09 PM   #1
Senior Member
Location: Boston, MA

Join Date: Nov 2010
Posts: 100
GATK IndelRealigner error

Hi All,
I am trying to perform local realignment of some BAM files generated with Novoalign. I ran the following commands:

Command 1:
novoalign -f reads.fastq.gz -c 2 -d Mosaik/reference -o SAM 2> reads.novoalign_logS0.txt | samtools view -S -b -q 1 - | samtools sort - reads

Command 2:
java -Xmx2g -jar /usr/local/bin/picard/SortSam.jar I=reads.bam O=readssorted.bam SO=coordinate

Command 3:
java -Xmx2g -jar /usr/local/bin/gatk/GenomeAnalysisTK.jar -I readssorted.bam -R Mosaik/reference.fasta -T RealignerTargetCreator -o forIndelRealigner.intervals

Command 4:
java -Xmx2g -jar /usr/local/bin/gatk/GenomeAnalysisTK.jar -I readssorted.bam -R Mosaik/reference.fasta -T IndelRealigner --targetIntervals forIndelRealigner.intervals -o realignedBam.bam

The RealignerTargetCreator (3) finishes successfully and creates the required file:


However, when I run the last command (4) - IndelRealigner, I get the following error:

##### ERROR MESSAGE: File associated with name forIndelRealigner.intervals is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: LASV-reference:52-53

Any idea what might be the problem here? I have tried various things, but I always fail here.

My reference.dict looks like this:
@HD VN:1.0 SO:unsorted
@SQ SN:LASV-reference LN:3402 UR:file:/Users/kga/Desktop/Mosaik/reference.fasta M5:8a4c76005c28bef3f2775dbf6ffa2062

My .fai index looks like this:
LASV-reference 3402 89 3402 3403

Thanks very much,

Last edited by kga1978; 11-15-2011 at 07:29 PM.
kga1978 is offline   Reply With Quote
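One quick sanity check for the files quoted in the post above is to confirm that the .dict and .fai agree on sequence names and lengths, and that every contig named in the intervals file appears in the index. The sketch below does this with awk. The function names are made up for illustration and are not part of GATK, Picard, or samtools. It assumes one `contig:start-stop` (or `contig:pos`) entry per line, as in the error message. It only compares names and lengths, and says nothing about how GATK itself parses the file.

```shell
# Hypothetical helpers, not part of any of the tools in this thread.

# compare_dict_fai DICT FAI
# Checks that every @SQ entry in the .dict has a matching name and length in the .fai.
compare_dict_fai() {
    awk 'NR == FNR { len[$1] = $2; next }          # first file: the .fai index
         /^@SQ/ {
             name = ""; ln = ""
             for (i = 2; i <= NF; i++) {
                 if ($i ~ /^SN:/) name = substr($i, 4)
                 if ($i ~ /^LN:/) ln   = substr($i, 4)
             }
             if (!(name in len))                 { print "not in .fai: " name; bad = 1 }
             else if (len[name] + 0 != ln + 0)   { print "length differs: " name; bad = 1 }
         }
         END { exit bad }' "$2" "$1"
}

# check_interval_contigs INTERVALS FAI
# Checks that the contig part of each interval line (text before the last ':')
# is a sequence name listed in the .fai.
check_interval_contigs() {
    awk 'NR == FNR { known[$1] = 1; next }         # first file: the .fai index
         NF == 0 { next }                          # skip blank lines
         {
             contig = $0
             sub(/:[^:]*$/, "", contig)
             if (!(contig in known)) { print "unknown contig: " contig; bad = 1 }
         }
         END { exit bad }' "$2" "$1"
}
```

With the files from this thread, the calls might look like `compare_dict_fai Mosaik/reference.dict Mosaik/reference.fasta.fai` and `check_interval_contigs forIndelRealigner.intervals Mosaik/reference.fasta.fai`. The `.fasta.fai` file name is an assumption, since the post does not give it. Both functions print any mismatch and return a non-zero status if they find one.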
Old 11-15-2011, 08:23 PM   #2
Senior Member
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Could you maybe just post the first couple dozen lines of the various files used (or if a bunch of lines look similar just a representative line)?
Heisman is offline   Reply With Quote
Old 11-16-2011, 05:55 AM   #3
Senior Member
Location: Boston, MA

Join Date: Nov 2010
Posts: 100

Sure thing:


Sorted BAM file:
ILLUMINA_0142:3:1108:12467:139455#TGACCA/1 0 LASV-reference 2892 20 1S51M * 0 0 GTCTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACT Z_^cc`ce^aeegedghe_gggcfdhdhhaX^dfghfhhhdhdedg_dfdgh RG:Z:ZGO3HPVJRLW NM:i:2 MD:Z:24T1G24 ZA:Z:<@;0;0;;1;;>
ILLUMINA_0142:3:1104:8199:92212#TGACCA/1 0 LASV-reference 2893 19 52M * 0 0 CTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCA ^__cc``Yaa^b`beefhehhddf]dfgfhhRabcdbg`fffbcffghhfhf RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:23T1G24T1 ZA:Z:<@;0;0;;1;;>
ILLUMINA_0142:3:1108:20971:8153#TGACCA/1 16 LASV-reference 2893 19 52M * 0 0 CTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCA caQRccbeefcecb^PI[dXeb[X`hefdXbSSgbSd_`Qb[eecSc``__^ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:23T1G24T1 ZA:Z:<@;0;0;;1;;>
ILLUMINA_0142:3:2102:12125:81885#TGACCA/1 16 LASV-reference 2894 18 51M1S * 0 0 TTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAG Z_f^fd_abhgbd`cfbdbbYJ`JRe\gebXec`e`Yb[cbabba\`cc__\ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:22T1G24T1 ZA:Z:<@;0;0;;1;;>
ILLUMINA_0142:3:1208:5666:190436#TGACCA/1 16 LASV-reference 2895 20 52M * 0 0 TTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAAT c]dhee^ee^^Hehfe_deebdeZeehebgd_gafabQJJeeeeccccc___ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:21T1G24T3 ZA:Z:<@;0;0;;1;;>
ILLUMINA_0142:3:1204:18832:77734#TGACCA/1 0 LASV-reference 2897 19 49M * 0 0 GGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAA abaeeeecggfggghhgfhihiiggiiiiiiiiiihiiiiiiiihiiih RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:19T1G24T2 ZA:Z:<@;0;0;;1;;>

The .intervals, .fai and .dict files are exactly as described above - no further text in those.

Thanks very much

Last edited by kga1978; 11-16-2011 at 03:56 PM. Reason: typo
kga1978 is offline   Reply With Quote
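To confirm that every alignment in the BAM uses the same reference name as the .dict and .fai, one option is to tally column 3 (RNAME) of the SAM records. This is only a sketch; `rname_counts` is a hypothetical helper name, and it reads SAM text on standard input.

```shell
# Hypothetical helper: count alignments per reference name (SAM column 3),
# skipping header lines that start with '@'.
rname_counts() {
    awk -F'\t' '!/^@/ && NF >= 3 { n[$3]++ }
                END { for (r in n) print r "\t" n[r] }' | sort
}
```

It could be fed from the sorted BAM in this thread with `samtools view readssorted.bam | rname_counts`. For the records shown above, a single line for `LASV-reference` would be the expected result.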
Old 11-16-2011, 02:02 PM   #4
Senior Member
Location: Boston, MA

Join Date: Nov 2010
Posts: 100

Does anybody have any thoughts? This is driving me nuts, and SRMA doesn't appear to be working either (separate post).

Thanks in advance.
kga1978 is offline   Reply With Quote
Old 11-16-2011, 02:58 PM   #5
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

Does your fasta file really say "reference", and not "LASV-reference"?
swbarnes2 is offline   Reply With Quote
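The question above can be answered directly from the FASTA file by listing the name on each header line. A minimal sketch, assuming the name is the text between `>` and the first whitespace (which is how samtools faidx, for instance, reads it); `fasta_names` is a hypothetical helper name.

```shell
# Hypothetical helper: print the sequence name from each FASTA header line,
# i.e. the text between '>' and the first whitespace.
fasta_names() {
    awk '/^>/ { print substr($1, 2) }' "$1"
}
```

For this thread the call would be `fasta_names Mosaik/reference.fasta`, and the output can be compared by eye with the `SN:` value in the .dict.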
Old 11-16-2011, 03:56 PM   #6
Senior Member
Location: Boston, MA

Join Date: Nov 2010
Posts: 100

Sorry, my bad. I tried to make another reference with just the word 'reference', but the one I have been using correctly says 'LASV-reference'. I have corrected the typo.
kga1978 is offline   Reply With Quote
Old 11-17-2011, 02:59 AM   #7
Senior Member
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

You do know you can get in touch directly with the GATK team here:

They're very responsive to questions.
Bukowski is offline   Reply With Quote
Old 12-16-2011, 08:58 AM   #8
Junior Member
Location: USA

Join Date: Jul 2009
Posts: 1

GATK is picky about the file name. Try changing the extension to ".interval_list".
JLand52 is offline   Reply With Quote
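The suggestion above can be tried without losing the original RealignerTargetCreator output by copying the file under the new extension. This is a sketch only: the helper names are hypothetical, and the thread does not confirm whether the rename resolves the error.

```shell
# Hypothetical helper: derive a ".interval_list" name from a ".intervals" name.
interval_list_name() {
    printf '%s\n' "${1%.intervals}.interval_list"
}

# Copy rather than move, so the original intervals file is kept.
copy_as_interval_list() {
    cp -- "$1" "$(interval_list_name "$1")"
}
```

After `copy_as_interval_list forIndelRealigner.intervals`, Command 4 from the first post can be rerun unchanged except for `--targetIntervals forIndelRealigner.interval_list`.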
