
  • GATK IndelRealigner error

    Hi All,
    I am trying to perform a local realignment of some BAM files generated with Novoalign. I ran the following commands:

    Command 1:
    novoalign -f reads.fastq.gz -c 2 -d Mosaik/reference -o SAM 2> reads.novoalign_logS0.txt | samtools view -S -b -q 1 - | samtools sort - reads

    Command 2:
    java -Xmx2g -jar /usr/local/bin/picard/SortSam.jar I=reads.bam O=readssorted.bam SO=coordinate

    Command 3:
    java -Xmx2g -jar /usr/local/bin/gatk/GenomeAnalysisTK.jar -I readssorted.bam -R Mosaik/reference.fasta -T RealignerTargetCreator -o forIndelRealigner.intervals

    Command 4:
    java -Xmx2g -jar /usr/local/bin/gatk/GenomeAnalysisTK.jar -I readssorted.bam -R Mosaik/reference.fasta -T IndelRealigner --targetIntervals forIndelRealigner.intervals -o realignedBam.bam

    The RealignerTargetCreator (3) finishes successfully and creates the required file:

    LASV-reference:52-53
    LASV-reference:305-339
    LASV-reference:439-519

    However, when I run the last command (4) - IndelRealigner, I get the following error:

    ##### ERROR MESSAGE: File associated with name forIndelRealigner.intervals is malformed: Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: LASV-reference:52-53

    Any idea what might be the problem here? I have tried various things, but I always fail here.

    My reference.dict looks like this:
    @HD VN:1.0 SO:unsorted
    @SQ SN:LASV-reference LN:3402 UR:file:/Users/kga/Desktop/Mosaik/reference.fasta M5:8a4c76005c28bef3f2775dbf6ffa2062

    reference.fasta.fai:
    LASV-reference 3402 89 3402 3403
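One way to rule out a plain mismatch between the files is to check every line of the intervals file against the contig names and lengths in the .fai index. The following is a minimal sketch of such a check, not GATK's own parser; the function names are hypothetical. It splits each interval on the last colon, so a hyphen in the contig name (as in LASV-reference) is not mistaken for the start-stop separator.

```python
def parse_interval(text):
    """Parse 'contig:start-stop' (or 'contig:pos'), splitting on the LAST colon."""
    contig, sep, span = text.strip().rpartition(":")
    if not sep or not contig:
        raise ValueError(f"no 'contig:range' separator in {text!r}")
    start_text, dash, stop_text = span.partition("-")
    start = int(start_text)  # raises ValueError if not a number
    stop = int(stop_text) if dash else start
    return contig, start, stop


def read_fai_lengths(fai_lines):
    """Map contig name -> length from .fai lines (columns 1 and 2)."""
    lengths = {}
    for line in fai_lines:
        fields = line.split()
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_intervals(interval_lines, lengths):
    """Return a list of human-readable problems; an empty list means all lines look sane."""
    problems = []
    for line in interval_lines:
        if not line.strip():
            continue
        try:
            contig, start, stop = parse_interval(line)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        if contig not in lengths:
            problems.append(f"{line.strip()}: contig {contig!r} not in .fai")
        elif not (1 <= start <= stop <= lengths[contig]):
            problems.append(f"{line.strip()}: outside 1..{lengths[contig]}")
    return problems


if __name__ == "__main__":
    # Values copied from the post above.
    fai = ["LASV-reference 3402 89 3402 3403"]
    intervals = ["LASV-reference:52-53", "LASV-reference:305-339", "LASV-reference:439-519"]
    print(check_intervals(intervals, read_fai_lengths(fai)))
```

If this reports no problems, the interval strings themselves agree with the index, and the failure is more likely down to how the tool reads the file (format detection or contig-name parsing) than to the coordinates.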


    Thanks very much,
    Kristian
    Last edited by kga1978; 11-15-2011, 08:29 PM.

  • #2
    Could you maybe just post the first couple dozen lines of the various files used (or if a bunch of lines look similar just a representative line)?



    • #3
      Sure thing:

      Reference:
      >LASV-reference
      GCGCACAGTGGATCCTAGGCATTTTTGGTTGCGCAATTCAAGTGTCCTATTTAAAATGGGACAAATAGTGACATTCTTCCAGGAAGTGCCTCATGTAATAGAAGAGGTGATGAACATTGTTCTCATTGCACTGTCTGTACTAGCAGTGCTGAAAGGTCTGTACAATTTTGCAACGTGTGGCCTTGTTGGTTTGGTCACTTTCCTCCTGTTGTGTGGTAGGTCTTGCACAACCAGTCTTTATAAAGGGGTTTATGAGCTTCAGACTCTGGAACTAAACATGGAGACACTCAATATGACCATGCCTCTCTCCTGCACAAAGAACAACAGTCATCATTATATAATGGTGGGCAATGAGACAGGACTAGAACTGACCTTGACCAACACGAGCATTATTAATCACAAATTTTGCAATCTGTCTGATGCCCACAAAAAGAACCTCTATGACCACGCTCTTATGAGCATAATCTCAACTTTCCACTTGTCCATCCCCAACTTCAATCAGTATGAGGCAATGAGCTGCGATTTTAATGGGGGAAA

      Sorted BAM file:
      ILLUMINA_0142:3:1108:12467:139455#TGACCA/1 0 LASV-reference 2892 20 1S51M * 0 0 GTCTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACT Z_^cc`ce^aeegedghe_gggcfdhdhhaX^dfghfhhhdhdedg_dfdgh RG:Z:ZGO3HPVJRLW NM:i:2 MD:Z:24T1G24 ZA:Z:<@;0;0;;1;;>
      ILLUMINA_0142:3:1104:8199:92212#TGACCA/1 0 LASV-reference 2893 19 52M * 0 0 CTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCA ^__cc``Yaa^b`beefhehhddf]dfgfhhRabcdbg`fffbcffghhfhf RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:23T1G24T1 ZA:Z:<@;0;0;;1;;>
      ILLUMINA_0142:3:1108:20971:8153#TGACCA/1 16 LASV-reference 2893 19 52M * 0 0 CTTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCA caQRccbeefcecb^PI[dXeb[X`hefdXbSSgbSd_`Qb[eecSc``__^ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:23T1G24T1 ZA:Z:<@;0;0;;1;;>
      ILLUMINA_0142:3:2102:12125:81885#TGACCA/1 16 LASV-reference 2894 18 51M1S * 0 0 TTTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAG Z_f^fd_abhgbd`cfbdbbYJ`JRe\gebXec`e`Yb[cbabba\`cc__\ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:22T1G24T1 ZA:Z:<@;0;0;;1;;>
      ILLUMINA_0142:3:1208:5666:190436#TGACCA/1 16 LASV-reference 2895 20 52M * 0 0 TTGGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAAT c]dhee^ee^^Hehfe_deebdeZeehebgd_gafabQJJeeeeccccc___ RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:21T1G24T3 ZA:Z:<@;0;0;;1;;>
      ILLUMINA_0142:3:1204:18832:77734#TGACCA/1 0 LASV-reference 2897 19 49M * 0 0 GGTCAAGTTGCTGTGAGCTCAAGTTGCCCATATAGACACCTGCACTCAA abaeeeecggfggghhgfhihiiggiiiiiiiiiihiiiiiiiihiiih RG:Z:ZGO3HPVJRLW NM:i:3 MD:Z:19T1G24T2 ZA:Z:<@;0;0;;1;;>

      The .intervals, .fai and .dict files are exactly as described above - no further text in those.

      Thanks very much
      Last edited by kga1978; 11-16-2011, 04:56 PM. Reason: typo



      • #4
        Anybody any thoughts? This is driving me nuts and SRMA doesn't appear to be working either (separate post)

        Thanks in advance.



        • #5
          Does your fasta file really say "reference", and not "LASV-reference"?



          • #6
            Sorry, my bad. I had tried making another reference named just 'reference', but the one I have been using correctly says 'LASV-reference'. I have corrected the typo.



            • #7
              You do know you can get in touch directly with the GATK team here:



              They're very responsive to questions.



              • #8
                GATK is picky about the file name. Try changing the extension to ".interval_list"
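For reference, the .interval_list extension is conventionally used for Picard-style interval lists: a SAM-style header (the @HD and @SQ lines, as in the .dict file) followed by tab-separated rows of contig, start, end, strand and interval name. Whether a plain rename is enough or the contents also need to be in that layout is not confirmed in this thread. The sketch below assumes the latter and converts contig:start-stop lines into that layout; the function name and the interval_N labels are made up for illustration.

```python
def to_picard_interval_list(dict_lines, interval_lines):
    """Build Picard-style interval_list text from .dict header lines and
    'contig:start-stop' interval lines.

    Assumption: dict_lines are the real, tab-separated lines of the .dict file;
    they are copied through unchanged.
    """
    out = [line.rstrip("\n") for line in dict_lines if line.startswith("@")]
    count = 0
    for line in interval_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so hyphens in the contig name are preserved.
        contig, sep, span = line.rpartition(":")
        if not sep or not contig:
            raise ValueError(f"cannot parse interval {line!r}")
        start_text, dash, stop_text = span.partition("-")
        start = int(start_text)
        stop = int(stop_text) if dash else start
        count += 1
        # Columns: contig, start, end, strand, name (name is a hypothetical label).
        out.append(f"{contig}\t{start}\t{stop}\t+\tinterval_{count}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    header = ["@HD\tVN:1.0\tSO:unsorted", "@SQ\tSN:LASV-reference\tLN:3402"]
    intervals = ["LASV-reference:52-53", "LASV-reference:305-339"]
    print(to_picard_interval_list(header, intervals), end="")
```

To use it on real files, read reference.dict and forIndelRealigner.intervals line by line, write the returned text to a file ending in .interval_list, and pass that file to --targetIntervals.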
