  • GATK complains that my bam file isn't indexed

    I am running GATK's RealignerTargetCreator with this command:

    java -Xmx36g -jar GenomeAnalysisTK.jar -S LENIENT -T RealignerTargetCreator -R human_g1k_v37.fasta -o SRR098359.interval_list -I SRR098359.bam -B:snps,VCF 00-All.vcf

    The process quits with an error that includes this:

    Cannot process the provided BAM file(s) because they were not indexed.

    However, the bam file WAS indexed. I see the .bai file there. I recreated the index with the following command (in case something had gone wrong creating it):

    samtools index SRR098359_sorted.bam

    It created an identical .bai file and I ran RealignerTargetCreator again, and the same thing happened. Does anyone know what I'm doing wrong?

    Thank you.

    Eric

  • #2
    From a quick glance at what GATK requires, it seems your bam file needs to be coordinate sorted (before .bai file creation). You should have a look at picard tools.


    • #3
      Originally posted by cedance
      From a quick glance at what GATK requires, it seems your bam file needs to be coordinate sorted (before .bai file creation). You should have a look at picard tools.

      Hi cedance,

      Thanks for the suggestion, but I don't think this is my problem. My previous command coordinate-sorted them:

      java -jar /home/efoss/sequencing/picard-tools-1.52/SortSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=SRR098359.bam OUTPUT=SRR098359_sorted.bam SORT_ORDER=coordinate

      Eric


      • #4
        One last thing I can think of: the documentation says the input is one or more aligned bam files. After you mapped the reads to your reference with the software of your choice, did you keep only the aligned reads? Maybe you should try picard tools "ViewSam" with ALIGNMENT_STATUS=aligned to extract the aligned reads from the bam file, and then sort and index the result. I would use picard tools for every operation instead of samtools. Sorry I couldn't be of more help, but I guess this is worth a try.


        • #5
          Maybe a typo, but why are you not using the SRR098359_sorted.bam file when you call GATK? Your command says you are using the unsorted BAM file.
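
          A mismatch like this (indexing SRR098359_sorted.bam but passing SRR098359.bam to -I) is the kind of thing a small pre-flight check can catch. Below is a minimal sketch in POSIX shell. It is not part of GATK, samtools, or picard: the function names find_bam_index and check_bams_indexed are made up for illustration, and it assumes only that an index sits next to its bam file as either name.bam.bai or name.bai.

```shell
# Print the index path for a BAM file, or return non-zero if none is found.
# Both common naming conventions are tried: sample.bam.bai and sample.bai.
find_bam_index() {
    target="$1"
    for idx in "${target}.bai" "${target%.bam}.bai"; do
        if [ -f "$idx" ]; then
            printf '%s\n' "$idx"
            return 0
        fi
    done
    return 1
}

# Check every BAM that is about to be passed to -I.
# Returns non-zero if at least one of them has no index next to it.
check_bams_indexed() {
    status=0
    for f in "$@"; do
        if ! find_bam_index "$f" >/dev/null; then
            printf 'no index found for: %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

          Calling check_bams_indexed SRR098359.bam before the GATK command would print a message and return non-zero if only SRR098359_sorted.bam had been indexed.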


          • #6
            Hi maubp,

            THANK YOU, THANK YOU, THANK YOU!!!!!!!!! I stared at that so long without seeing my mistake. I feel very stupid, but also very grateful that you caught it.

            Best wishes,

            Eric


            • #7


              Happy to help.


              • #8
                I get two different error messages when I run GATK.

                If I use the output of picard MarkDuplicates, I get an error message about an unindexed bam file, even though the bam file should already be indexed: it was generated by picard SortSam before I ran picard MarkDuplicates. The .bai file exists too.

                And if I use the output of picard SortSam directly, I get:
                ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

                What would you advise?

                Thanks,

                Carol
                -----------------------------------
                java -jar SortSam.jar SO=coordinate INPUT=~/NGS/data/SRR062641.filt.sam OUTPUT=~/NGS/data/SRR062641.filt.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

                - no error is generated

                ~/NGS/pgm/GenomeAnalysisTK-2.4-9-g532efad$ java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/carolw/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -o ~/NGS/data/SRR062641.filt.bam.list -I ~/NGS/data/SRR062641.filt.bam

                ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

                -----------------------------------------------------------
                java -jar MarkDuplicates.jar INPUT=~/NGS/data/SRR062641.filt.bam OUTPUT=~/NGS/data/SRR062641.filt.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

                - no error is generated

                java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/carolw/NGS/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -o ~/NGS/data/SRR062641.filt.bam.list -I ~/NGS/data/SRR062641.filt.marked.bam

                ERROR MESSAGE: Invalid command line: Cannot process the provided BAM file(s) because they were not indexed. The GATK does offer limited processing of unindexed BAMs in --unsafe mode, but this GATK feature is currently unsupported.


                • #9
                  Hi CarolW,

                  Sorry - I don't know what to suggest other than to look very carefully at the name of the index file compared to the name of the bam file.

                  Good luck.

                  Eric
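
                  One way to make that comparison is to print which index file, if any, sits next to a bam file. samtools index normally writes name.bam.bai, while picard's CREATE_INDEX=true normally writes name.bai, so the sketch below looks for both. The function name bam_index_state is hypothetical, and the "stale" check (bam file newer than its index) is only a heuristic assumed here, not something taken from the GATK documentation.

```shell
# Report the state of the index for one BAM file.
# Prints "ok <index>", "stale <index>", or "missing".
# "stale" means the BAM is newer than the index found next to it, which
# may mean the BAM was rewritten after indexing (heuristic, an assumption).
bam_index_state() {
    target="$1"
    for idx in "${target}.bai" "${target%.bam}.bai"; do
        [ -f "$idx" ] || continue
        if [ "$target" -nt "$idx" ]; then
            echo "stale $idx"
        else
            echo "ok $idx"
        fi
        return 0
    done
    echo "missing"
    return 1
}
```

                  Running it on the exact path given to -I shows whether the index name matches that bam file or belongs to a differently named one.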


                  • #10
                    I ran "samtools index bamfile", which created a bam.bai file, and then ran it again. It was successful. Thanks a lot.
