  • GATK BAM error

    Hi

    I am having trouble getting GATK to read my BAM files. The BAMs were created using TopHat 2.0.0.4, and I used AddOrReplaceReadGroups from Picard tools to add the read groups. The command used was

    java -Xmx1g -jar ~/programs/picard-tools-1.47/AddOrReplaceReadGroups.jar I=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits.bam O=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG.bam SORT_ORDER=coordinate RGLB=Infected RGPL=illumina RGPU=HSWI72892 RGSM=1_4I.

    I did use VALIDATION_STRINGENCY=LENIENT, but to no effect. I do index the BAM files. I even tried SortSam to see if I had a problem. I looked at another thread posted here, but that did not help...


    The GATK run code is below....

    java -Xmx4g -jar GenomeAnalysisTK.jar -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
    INFO 02:04:00,711 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,714 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-11-g3b2fab9, Compiled 2012/06/20 13:28:25
    INFO 02:04:00,714 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 02:04:00,714 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 02:04:00,715 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 02:04:00,715 HelpFormatter - Program Args: -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
    INFO 02:04:00,716 HelpFormatter - Date/Time: 2012/06/29 02:04:00
    INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 02:04:00,737 GenomeAnalysisEngine - Strictness is SILENT
    INFO 02:04:00,822 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 02:04:00,851 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
    INFO 02:04:00,867 RMDTrackBuilder - Loading Tribble index from disk for file ./trial_middle.vcf
    INFO 02:04:01,774 CountCovariatesWalker - The covariates being used here:
    INFO 02:04:01,774 CountCovariatesWalker - ReadGroupCovariate
    INFO 02:04:01,774 CountCovariatesWalker - QualityScoreCovariate
    INFO 02:04:01,775 CountCovariatesWalker - CycleCovariate
    INFO 02:04:01,775 CountCovariatesWalker - DinucCovariate
    INFO 02:04:01,854 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 02:04:01,855 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 02:04:03,192 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.6-11-g3b2fab9):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]
    ##### ERROR -------------------------------------


    Please help. It could be something very simple.

  • #2
    I did find something.

    This problem occurs only with TopHat-based BAM files. I have a SHRiMP-based BAM alignment, and GATK works like a charm on it. Can anyone shed some light on why?

    Comment


    • #3
      Maybe a tophat bug? I think this is the important line of that error:

      12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]

      It is saying that this read has 100 bases but no quality values attached. Here is something you could do: run samtools view and grab that specific read, then see whether the SAM line really has no quality score information, or whether it looks weird in some other way. If that is the case, then maybe there is a bug in the version of tophat you are using?


      Here is one way to get that sequence:

      samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

      then you can look at bad_read.sam and see what's up.
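The same check can be run over every record rather than one read name. Below is a minimal Python sketch, assuming the input is the tab-separated text that samtools view prints; the function name and the toy records are made up for illustration and are not taken from the real BAM.

```python
def find_bad_quality_records(sam_lines):
    """Yield (read_name, n_bases, n_quals) where SEQ and QUAL lengths differ."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, seq, qual = fields[0], fields[9], fields[10]
        if seq == "*":                    # no sequence stored, nothing to compare
            continue
        # In SAM, a QUAL of "*" means that no qualities are stored.
        n_quals = 0 if qual == "*" else len(qual)
        if n_quals != len(seq):
            yield name, len(seq), n_quals


# Toy records, made up for illustration.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read_ok\t0\t1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "read_bad\t16\t1\t134\t3\t4M\t*\t0\t0\tACGT\t*",
]
for name, n_bases, n_quals in find_bad_quality_records(example):
    print(f"{name} [{n_bases} bases] [{n_quals} quals]")
```

On real data the lines could come from the output of samtools view (for example by passing sys.stdin to the function), which would show whether the offending read is the only one or one of many.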

      Comment


      • #4
        Also what tophat command did you use to do the mapping? Could always be a good-ol phred+33 vs phred+64 issue.
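A quick way to tell the two encodings apart is to look at the lowest ASCII character in the quality strings. This is a rough Python sketch of that heuristic, not a definitive test; the function name and the example strings are made up. It relies on the fact that the +64 encodings do not use characters below ASCII 59, so anything lower points to phred+33.

```python
def guess_phred_offset(quality_strings):
    """Return 33, 64, or None (cannot tell) from the lowest character seen."""
    lowest = min(
        (min(map(ord, q)) for q in quality_strings if q),
        default=None,
    )
    if lowest is None:       # no quality characters at all
        return None
    if lowest < 59:          # below ';' is impossible in the +64 encodings
        return 33
    if lowest >= 64:         # consistent with +64 (or very high-quality +33 data)
        return 64
    return None              # 59..63: ambiguous


print(guess_phred_offset(["IIII#5"]))   # '#' is ASCII 35
print(guess_phred_offset(["hhhhB"]))    # lowest character is 'B', ASCII 66
```

The answer is only as good as the sample: a handful of high-quality reads can look like +64, so it is worth feeding it a few thousand quality lines from the fastq file.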

        Comment


        • #5
          Also you might want to see if you can find that read in your fastq file and double check that it has quality values there. Sometimes fastq files can become screwed up by various processing steps. Some programs that do mapping and other downstream stuff treat a bad fastq record differently so it could be that one program is dropping that read since it has no quality scores, and the other is including it? I don't know, I am just guessing at possibilities now.
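Looking up the read in the fastq file could be sketched like this in Python. It assumes plain four-line fastq records (no wrapped sequence lines); the function name and the toy data are hypothetical.

```python
import io


def check_fastq_record(handle, read_id):
    """Return (n_bases, n_quals) for the first record named read_id, else None."""
    while True:
        header = handle.readline()
        if not header:                     # end of file, read not found
            return None
        seq = handle.readline().rstrip("\n")
        handle.readline()                  # '+' separator line
        qual = handle.readline().rstrip("\n")
        # Header looks like '@<id> <description>' or '@<id>/1'.
        parts = header[1:].split()
        name = parts[0] if parts else ""
        if name == read_id or name.rsplit("/", 1)[0] == read_id:
            return len(seq), len(qual)


# Toy fastq, made up for illustration; the second record has an empty quality line.
toy_fastq = "@r1 1:N:0\nACGT\n+\nIIII\n@HWI:1:2/2\nACGTA\n+\n\n"
print(check_fastq_record(io.StringIO(toy_fastq), "HWI:1:2"))
```

For a gzipped file the handle could be opened in text mode with gzip.open(path, "rt") from the standard library. If the two numbers match in the fastq file, the qualities were lost somewhere after the raw reads.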

          Comment


          • #6
            Tophat: 2.0.0.4

            The run command I used:

            ./tophat -p 4 -G /home/sudeep/work/6-20-12/Gallus_gallus.WASHUC2.67.gtf -o /home/sudeep/work/6-20-12/layers/Infected/1_4I /home/sudeep/programs/bowtie2-2.0.0-beta6/index/chicken_order /home/sudeep/work/6-20-12/layers/Infected/1_4I_R1.fastq.gz /home/sudeep/work/6-20-12/layers/Infected/1_4I_R2.fastq.gz

            Comment


            • #7
              samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

              Output below. It looks like there is a * where the quality string should be. Now I have to check the fastq file...

              HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 153 1 134 3 100M * 0 0 GCCTTCAGATCCTTCTCTCCGGACCGTATGCTGACGGACTTCCCTGGCCCTGCTACCTGAGACCTGCTGCTTCCTCCCTGACTTACTCTGCGGCTTCTTC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 CC:Z:= CP:i:34437630 HI:i:0
              HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 393 1 34437630 3 100M * 0 0 GAAGAAGCCGCAGAGTAAGTCAGGGAGGAAGCAGCAGGTCTCAGGTAGCAGGGCCAGGGAAGTCCGTCAGCATACGGTCCGGAGAGAAGGATCTGAAGGC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 HI:i:1
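The second column of the two records above is the SAM FLAG (153 and 393). A small Python sketch for decoding it is below; the bit values follow the SAM specification, while the short labels and the function name are my own.

```python
# FLAG bits from the SAM specification; labels are informal.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


for flag in (153, 393):
    print(flag, decode_flag(flag))
# 153 ['paired', 'mate_unmapped', 'reverse_strand', 'second_in_pair']
# 393 ['paired', 'mate_unmapped', 'second_in_pair', 'secondary']
```

So both lines are the second read of a pair whose mate is unmapped, and the second line is flagged as a secondary alignment, which fits the NH:i:2 tag. Whether that is what leads to the * in the quality column is only a guess at this point.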

              Comment


              • #8
                The qualities are Sanger-encoded (Illumina pipeline 1.9).

                Comment


                • #9
                  Also, I have tried other software such as SpliceMap, and even its SAM file, when converted to BAM, doesn't get through GATK. So strange. The SHRiMP alignment works fine... why is there so much difference in SAM format?

                  Comment


                  • #10
                    Hi newbietonextgen

                    I am also facing a similar problem. Have you sorted it out?

                    Regards

                    Comment
