SEQanswers

Old 06-29-2012, 06:36 AM   #1
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56
GATK BAM error

Hi

I am having trouble with GATK's ability to read my BAM files. The BAMs were created using TopHat 2.0.0.4, and I used AddOrReplaceReadGroups from Picard tools to add read groups. The command used was:

java -Xmx1g -jar ~/programs/picard-tools-1.47/AddOrReplaceReadGroups.jar I=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits.bam O=/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG.bam SORT_ORDER=coordinate RGLB=Infected RGPL=illumina RGPU=HSWI72892 RGSM=1_4I.

I did use VALIDATION_STRINGENCY=LENIENT, but to no effect. I do index the BAM files. I even tried SortSam to see if I had a sorting problem. I looked at another thread posted here, but it did not help:

http://seqanswers.com/forums/showthread.php?t=16905

The GATK command and its output are below:

java -Xmx4g -jar GenomeAnalysisTK.jar -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
INFO 02:04:00,711 HelpFormatter - ---------------------------------------------------------------------------------
INFO 02:04:00,714 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.6-11-g3b2fab9, Compiled 2012/06/20 13:28:25
INFO 02:04:00,714 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 02:04:00,714 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 02:04:00,715 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 02:04:00,715 HelpFormatter - Program Args: -R chicken_order.fa --default_platform illumina --knownSites:variant,vcf ./trial_middle.vcf -I /home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam -T CountCovariates -cov ReadGroupcovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile /home/sudeep/work/6-20-12/layer/all_infected_bams/1_4I_recaldata.csv
INFO 02:04:00,716 HelpFormatter - Date/Time: 2012/06/29 02:04:00
INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
INFO 02:04:00,716 HelpFormatter - ---------------------------------------------------------------------------------
INFO 02:04:00,737 GenomeAnalysisEngine - Strictness is SILENT
INFO 02:04:00,822 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 02:04:00,851 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
INFO 02:04:00,867 RMDTrackBuilder - Loading Tribble index from disk for file ./trial_middle.vcf
INFO 02:04:01,774 CountCovariatesWalker - The covariates being used here:
INFO 02:04:01,774 CountCovariatesWalker - ReadGroupCovariate
INFO 02:04:01,774 CountCovariatesWalker - QualityScoreCovariate
INFO 02:04:01,775 CountCovariatesWalker - CycleCovariate
INFO 02:04:01,775 CountCovariatesWalker - DinucCovariate
INFO 02:04:01,854 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 02:04:01,855 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 02:04:03,192 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.6-11-g3b2fab9):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/sudeep/work/6-20-12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]
##### ERROR -------------------------------------


Please help. It could be something very simple.
Old 06-29-2012, 08:08 AM   #2
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

I did find something.

This problem occurs only with TopHat-based BAM files. I have a SHRiMP-based BAM alignment, and GATK works like a charm on it. Can anyone shed some light on why?
Old 06-29-2012, 08:58 AM   #3
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35

Maybe a tophat bug? I think this is the important line of that error:

12/layers/all_infected_bams/1_4I_accepted_hits_RG_reorder.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 [100 bases] [0 quals]

It is saying that the read has no quality values attached. Here is something you could do: run samtools view and grab that specific read, then see whether the SAM line indeed has no quality score information attached, or whether it looks weird in some other way. If that is the case, then maybe there is a bug in whatever version of tophat you are using?


Here is one way to get that sequence:

samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

then you can look at bad_read.sam and see what's up.
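
If you would rather scan for every such read instead of eyeballing one, the same check can be scripted. This is only a minimal sketch: it assumes tab-separated SAM text (for example the output of samtools view file.bam), and the function name find_bad_quals is made up for illustration, not part of samtools or GATK. It reports records whose QUAL column is "*" or differs in length from SEQ.

```python
def find_bad_quals(sam_lines):
    """Yield (read_name, n_bases, n_quals) for SAM records whose
    SEQ and QUAL lengths differ. A '*' field counts as length 0."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # fewer than the 11 mandatory SAM columns
            continue
        name, seq, qual = fields[0], fields[9], fields[10]
        n_bases = 0 if seq == "*" else len(seq)
        n_quals = 0 if qual == "*" else len(qual)
        if n_bases != n_quals:
            yield name, n_bases, n_quals
```

To use it on a real file, one option is to pipe samtools view file.bam into a small script that calls find_bad_quals(sys.stdin) and prints each tuple.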
Old 06-29-2012, 09:01 AM   #4
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35

Also, what tophat command did you use to do the mapping? It could always be a good ol' phred+33 vs phred+64 issue.
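
One rough way to tell the two encodings apart is to look at the ASCII range of the quality characters in the FASTQ file. This is a heuristic sketch only: the function name guess_phred_offset and the two thresholds are assumptions chosen for typical Illumina data, not something taken from tophat or GATK.

```python
def guess_phred_offset(quality_strings):
    """Return 33, 64, or None when the observed range is ambiguous."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    # Characters below ';' (ASCII 59) cannot occur in +64 encoded data.
    if min(codes) < 59:
        return 33
    # Illumina phred+33 data normally tops out at 'J' (ASCII 74).
    if max(codes) > 74:
        return 64
    return None
```

Feeding it the quality lines of the first few thousand reads is usually enough; a result of None means the sample did not contain a character that settles the question.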
Old 06-29-2012, 09:03 AM   #5
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35

Also you might want to see if you can find that read in your fastq file and double check that it has quality values there. Sometimes fastq files can become screwed up by various processing steps. Some programs that do mapping and other downstream stuff treat a bad fastq record differently so it could be that one program is dropping that read since it has no quality scores, and the other is including it? I don't know, I am just guessing at possibilities now.
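
Looking the read up in the FASTQ could be done along these lines. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences) in a file that may be gzipped; find_fastq_record is a hypothetical helper name, and the path and read name are whatever you pass in.

```python
import gzip

def find_fastq_record(path, read_name):
    """Return (header, seq, qual) for the first record whose ID matches
    read_name, or None if the read is not in the file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file reached
                return None
            seq = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line
            qual = handle.readline().rstrip("\n")
            # ID is the text after '@', up to the first space or '/'
            parts = header[1:].split()
            read_id = parts[0].split("/")[0] if parts else ""
            if read_id == read_name:
                return header.rstrip("\n"), seq, qual
```

If the returned seq and qual differ in length, or qual is empty, the problem is already present in the input file rather than being introduced by the aligner.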
Old 06-29-2012, 09:08 AM   #6
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

Tophat: 2.0.0.4

The run command I used:

./tophat -p 4 -G /home/sudeep/work/6-20-12/Gallus_gallus.WASHUC2.67.gtf -o /home/sudeep/work/6-20-12/layers/Infected/1_4I /home/sudeep/programs/bowtie2-2.0.0-beta6/index/chicken_order /home/sudeep/work/6-20-12/layers/Infected/1_4I_R1.fastq.gz /home/sudeep/work/6-20-12/layers/Infected/1_4I_R2.fastq.gz
Old 06-29-2012, 09:41 AM   #7
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

samtools view file.bam | grep "HWI-ST913:105:C0EYJACXX:5:1304:11235:16705" > bad_read.sam

Output: it looks like there is a * in place of the quality scores. Now I have to check the FASTQ file.

HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 153 1 134 3 100M * 0 0 GCCTTCAGATCCTTCTCTCCGGACCGTATGCTGACGGACTTCCCTGGCCCTGCTACCTGAGACCTGCTGCTTCCTCCCTGACTTACTCTGCGGCTTCTTC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 CC:Z:= CP:i:34437630 HI:i:0
HWI-ST913:105:C0EYJACXX:5:1304:11235:16705 393 1 34437630 3 100M * 0 0 GAAGAAGCCGCAGAGTAAGTCAGGGAGGAAGCAGCAGGTCTCAGGTAGCAGGGCCAGGGAAGTCCGTCAGCATACGGTCCGGAGAGAAGGATCTGAAGGC * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:2 HI:i:1
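
For reading the second column of those two records (153 and 393), the FLAG bits defined in the SAM specification can be unpacked with a few lines. This is just an illustrative sketch; the labels and the name decode_flag are my own wording, not names from samtools.

```python
# FLAG bits as defined in the SAM specification; labels are informal.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary_alignment"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]
```

Here decode_flag(153) gives paired, mate_unmapped, reverse_strand, second_in_pair, and decode_flag(393) gives paired, mate_unmapped, second_in_pair, secondary_alignment. So both lines are the second read of a pair whose mate did not map, and the record at 34437630 is marked as a secondary alignment.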
Old 06-29-2012, 11:40 AM   #8
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

It's Sanger quality (Illumina 1.9 pipeline).
Old 07-01-2012, 02:51 AM   #9
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56

I have also tried other software such as SpliceMap, and even its SAM file, once converted to BAM, does not meet GATK's BAM requirements. So strange. The SHRiMP alignment works fine... why is there so much difference in the SAM format between tools?
Old 01-07-2013, 10:31 AM   #10
figo1019
Member
 
Location: germany

Join Date: Jun 2012
Posts: 32

Hi newbietonextgen

I am also facing a similar problem. Have you sorted it out?

Regards