SEQanswers

Old 01-17-2012, 03:58 AM   #1
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
input BAM files for GATK

Hello,

I am trying to use GATK, and I think that my BAM files are not correctly formatted. I read that:

Quote:
The file must be binary (.bam).
The file must be indexed.
The file must be sorted in coordinate order with respect to the reference (i.e. the contig ordering in your bam must exactly match that of the reference you are using).
The file must have a proper bam header with read groups. Each read group must contain the platform (PL) and sample (SM) tags. For the platform value, we currently support 454, LS454, Illumina, Solid, ABI_Solid, and CG (all case-insensitive).
Each read in the file must be associated with exactly one read group.
I did the first three steps with samtools, but I'm not sure whether my file is correctly sorted.
My main problem is with the last two requirements. I don't know how to meet them, and I don't understand the notion of a read group.
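
On the sorting question, here is a minimal sketch of one way to check coordinate order. It assumes the input is SAM text with its header (for example from samtools view -h) and that the header carries @SQ lines. The function name check_coordinate_sorted is made up for this illustration, and reads without a contig are simply skipped.

```shell
# Sketch: report whether SAM records follow the contig order of the @SQ
# header lines and ascending position within each contig.
# check_coordinate_sorted is a hypothetical helper name.
check_coordinate_sorted() {
    awk -F '\t' '
        $1 == "@SQ" {
            # remember the rank of each contig as listed in the header
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
            next
        }
        /^@/ { next }               # other header lines
        $3 == "*" { next }          # reads without a contig are skipped
        {
            r = rank[$3]; p = $4 + 0
            if (r < last_r || (r == last_r && p < last_p)) {
                print "unsorted at " $1; bad = 1; exit
            }
            last_r = r; last_p = p
        }
        END { if (!bad) print "sorted" }
    '
}
```

Typical use would be something like samtools view -h file.bam | check_coordinate_sorted. It prints "sorted" or the name of the first read found out of order.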

Could someone clarify this please?
Thanks,
Jane
Old 01-17-2012, 04:32 AM   #2
Robby
Member
 
Location: Germany

Join Date: Mar 2011
Posts: 68

Hello,

Which mapper/assembler do you use? BWA for example has the parameter "-r" in the sampe/samse step, which specifies the read group (in the header as well as in the reads).
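
For anyone who does realign with BWA, a small sketch of building the read-group string for "-r" might look like this. The helper name rg_line and the argument order are assumptions made for the example. The fields are separated by the two characters backslash and t, as in the '@RG\tID:foo\tSM:bar' form that the bwa usage text shows.

```shell
# Sketch: build a read-group header string for bwa's -r option.
# rg_line is a hypothetical helper; $1 = read group ID, $2 = sample, $3 = platform.
rg_line() {
    # '\\t' in the printf format emits a literal backslash followed by "t"
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$1" "$2" "$3"
}
```

It could then be used along the lines of bwa sampe -r "$(rg_line lane1 sample1 illumina)" ref.fa r1.sai r2.sai r1.fq r2.fq > out.sam, where all the file names are placeholders.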

Best regards
Robby
Old 01-17-2012, 05:25 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

If you look at the SAM header in your BAM file, do you have any @RG lines (read groups) as well as one @SQ line per reference sequence?
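
A quick way to answer that could be sketched as below. It reads header text on stdin, counts @SQ and @RG lines, and flags @RG lines that lack the PL or SM tag that GATK asks for. The function name check_header and the output format are invented for this example.

```shell
# Sketch: summarise a SAM header read from stdin.
# check_header is a hypothetical helper name.
check_header() {
    awk -F '\t' '
        $1 == "@SQ" { sq++ }
        $1 == "@RG" {
            rg++
            has_pl = 0; has_sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^PL:/) has_pl = 1
                if ($i ~ /^SM:/) has_sm = 1
            }
            if (!has_pl || !has_sm) bad++   # read group missing a required tag
        }
        END { printf "SQ=%d RG=%d RG_missing_PL_or_SM=%d\n", sq, rg, bad }
    '
}
```

For a BAM file the header can be piped in with samtools view -H file.bam | check_header.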
Old 01-17-2012, 05:28 AM   #4
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

The alignment was done by a sequencing platform (an external service provider). They used CASAVA, but I don't know whether it was version 1.7 or 1.8...
Old 01-17-2012, 05:30 AM   #5
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Quote:
Originally Posted by Robby
Hello,

Which mapper/assembler do you use? BWA for example has the parameter "-r" in the sampe/samse step, which specifies the read group (in the header as well as in the reads).

Best regards
Robby
Here is my sam file:
Quote:
more s_garma-fibros_converted.sam
@PG ID:illumina_export2sam.pl VN:2.0.0 CL:/usr/local/bin/illumina_export2sam.pl --read1=s_garma-fibros_1_export.txt
--read2=s_garma-fibros_2_export.txt
HWI-ST584_81:4:1101:1198:2065 89 chr7 156837088 28 76M * 0 0 GGCTTGAACAACGGAAATGTGTCAAATGT
GTCAGCTCCCAGCTCAGAGACTGGGAGACCAGGCCGAGGCGCCGGCN ############################################################################ BC:Z:
AGTCAA XD:Z:75A SM:i:28 AS:i:0
HWI-ST584_81:4:1101:1198:2065 165 * 0 0 * chr7 156837088 0 GGAGCCCTGCTGCGTAGTNNNNNNNNNNA
CACGGTGTATTATTACTTTCCCAGGACCACCGTAACAAAGTAGCACA CCCFFFFFHHHHGJHGIH########################################################## BC:Z:
AGTCAA
HWI-ST584_81:4:1101:1174:2078 73 chr5 41048452 254 76M * 0 0 AAACGTGTTTTCCATAGGTCTACCAATTT
TGGGTGAATTATCTCAGGCAGTATCTTCAAAAGCCCTATTGCACCAG CCCFFFFDHHHHHIIIJJIJJIJJJJJJJJJJJGHGJJJIJIJJJJIJJJIIJJJJJJJIIJGIJJJIIIJJHHHA BC:Z:
AGTCAA XD:Z:76 SM:i:359 AS:i:0
HWI-ST584_81:4:1101:1174:2078 133 * 0 0 * chr5 41048452 0 GCCCCTGAAATTGATGAGNNNNNNNNNNT
CTATCCTTCAGGTAATATCTATGCCTGCCAGTTTAGGGGAGTTACAT @@CFFFFFHGHFHIJJFG########################################################## BC:Z:
AGTCAA
HWI-ST584_81:4:1101:1242:2096 73 chr9 135156764 254 76M * 0 0 CCAATCCAAGAAAGATGTCTCTCCCTCCT
GAAAACAAAAATTTTAAAAAGCCCCTTCCATTTTAAAGCAATCTGAA ;<<:BDDD<D;FFFABFFCFEHFFFIFIIIIIIII=BGIIIFEFC>G@GDECD;BFF@FFEFCFDAEFEFCEEFA? BC:Z:
AGTCAA XD:Z:76 SM:i:359 AS:i:0
HWI-ST584_81:4:1101:1242:2096 133 * 0 0 * chr9 135156764 0 ACTGCCTCTCTCTTCTGTNTCNNNNNNNG
GCTGGACAGTCTTGTGAAATTGAGACTCTTACTCCACTCATCCATCC ;?@DHHDHDEBE<G########################################################## BC:Z:
AGTCAA
And my bam file is not readable:

Quote:
more s_garma-fibros_converted_sorted.bam
 7�,7�cr"�1e��9���1X3�8�
��5�PCY�N(ǒ���,6��^L�%D�<CN8ψ�{�/vϘ!,�.�g*–��284��yf^L�Xa<s�S���x^L���<K��6m�P��C�ƃp�!Cz�=&ψ��3�
^L
la��d����
--Plus--(0%)
Thanks for your answers !
Old 01-17-2012, 06:14 AM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 833

You can read BAM files using samtools view <file.bam>. Pipe the output through more or less to reduce information overload.
Old 01-17-2012, 06:45 AM   #7
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Quote:
Originally Posted by gringer
You can read BAM files using samtools view <file.bam>. Pipe the output through more or less to reduce information overload.
Thank you.
So my bam file looks like this:

Quote:
HWI-ST584_81:4:2107:8344:147434 147 chr1 13474 26 76M = 13404 -146 TCCTGACAGGCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGTGTTTGTCAT #A@CA@@38(CCACC@?>AA?B?BA=A<BA;ED@=GACGCCEFB;???1?)0081@>HF;GFDFHHDFBDFFF@@@ BC:Z:AGTCAA XD:Z:76 SM:i:3 AS:i:26
HWI-ST584_81:4:1204:11080:95072 147 chr1 13480 26 76M = 13418 -138 CAGGCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGTGTTTGTCATGGGCCT >CACDACCAC?>CCABBDDB?;8>DFED=;:7D===8DGCIGCJIJGIIGGIHIIDHE=BGFGGHHHHFFFFFC@@ BC:Z:AGTCAA XD:Z:76 SM:i:3 AS:i:26
There are no @RG lines and no @SQ line...
Old 01-17-2012, 06:47 AM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 833

Quote:
There are no @RG lines and no @SQ line...
*cough*

Sorry, I should have mentioned that earlier. If you want to show the header information, you need to put a -h in there as well.

Code:
samtools view -h <file.bam>
I recommend looking at the samtools documentation and the SAM specification:

http://samtools.sourceforge.net/
Old 01-17-2012, 06:59 AM   #9
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

You definitely need @RG records in your file.

To check this do
Code:
samtools view -H file.bam | grep ^@RG
This will probably yield no results.

Use Picard to add a read group to your file, e.g.
Code:
java -jar $PICARD/AddOrReplaceReadGroups.jar I=file.bam O=newfile.bam \
SORT_ORDER=coordinate CREATE_INDEX=true  \
RGPL=illumina RGID=11 RGSM=mysample
After this is done then run GATK on newfile.bam

Replace the RGID/RGSM with the values you desire
Old 01-17-2012, 08:36 AM   #10
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Quote:
Originally Posted by zee
You definitely need @RG records in your file.

To check this do
Code:
samtools view -H file.bam | grep ^@RG
This will probably yield no results.



Use Picard to add a read group to your file, e.g.
Code:
java -jar $PICARD/AddOrReplaceReadGroups.jar I=file.bam O=newfile.bam \
SORT_ORDER=coordinate CREATE_INDEX=true  \
RGPL=illumina RGID=11 RGSM=mysample
After this is done then run GATK on newfile.bam

Replace the RGID/RGSM with the values you desire
Thanks for your help!

Exactly, I get no result:
Quote:
[merlevede@U1009-PCJane patient1]$ samtools view -H s_garma-fibros_converted_sorted.bam | grep ^@RG
[merlevede@U1009-PCJane patient1]$
Is it easier to use Picard to add the header or to rerun the alignment, with BWA this time and with the right options?
Old 01-17-2012, 11:19 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Using Picard is easy enough; the command that zee posted should solve your problems.
Old 01-18-2012, 03:01 AM   #12
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Thanks a lot for your help.
I have installed Picard and tried to run it:

Quote:
java -jar ./AddOrReplaceReadGroups.jar I=~/../../../data/patient1/s_garma-fibros_converted_sorted.bam O=~/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam \
> SORT_ORDER=coordinate CREATE_INDEX=true \
> RGPL=illumina RGID=11 RGSM=garma-fibros
ERROR: Option 'RGLB' is required.
but I need one more option, RGLB, which is defined as:
Quote:
RGLB=String
LB=String Read Group Library Required.
I found this discussion http://seqanswers.com/forums/showthread.php?t=11887 and RGLB seems to require a FASTQ file... So, will Picard more or less redo my alignment?
I will try to add my FASTQ file.
Old 01-18-2012, 03:15 AM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

You're just telling Picard to create the missing information section at the start of the file and to add a small label to the associated reads; it won't do anything else. The UnifiedGenotyper error message you posted at the start of the thread indicated that it doesn't even use the RGLB field, so you can probably enter anything you want in there. You could probably enter gibberish and not have it matter for this purpose.
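
To make that concrete, here is a rough sketch of the same idea on SAM text. It is not Picard's implementation: it just adds one @RG header line after the existing header and appends an RG:Z: tag to every read. The function name add_read_group is hypothetical, and unlike Picard it does not replace a read-group tag that is already present.

```shell
# Sketch only: add one read group to SAM text read from stdin.
# add_read_group is a hypothetical helper; $1 = ID, $2 = sample (SM), $3 = platform (PL).
add_read_group() {
    awk -F '\t' -v OFS='\t' -v id="$1" -v sm="$2" -v pl="$3" '
        /^@/ { print; next }      # existing header lines pass through unchanged
        !emitted { print "@RG", "ID:" id, "PL:" pl, "SM:" sm; emitted = 1 }
        { print $0, "RG:Z:" id }  # label each read with the read group ID
        END { if (!emitted) print "@RG", "ID:" id, "PL:" pl, "SM:" sm }
    '
}
```

Something like samtools view -h file.bam | add_read_group garma-fibros garma illumina shows the effect on the text form of the file. For real data Picard remains the safer route, since it also writes and indexes the BAM.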
Old 01-18-2012, 05:24 AM   #14
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Quote:
Originally Posted by dpryan
You're just telling Picard to create the missing information section at the start of the file and to add a small label to the associated reads; it won't do anything else. The UnifiedGenotyper error message you posted at the start of the thread indicated that it doesn't even use the RGLB field, so you can probably enter anything you want in there. You could probably enter gibberish and not have it matter for this purpose.
OK, so I filled the option with "gibberish". It ran for a while and gave:
Quote:
java -jar ./AddOrReplaceReadGroups.jar I=~/../../../data/patient1/s_garma-fibros_converted_sorted.bam O=~/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam \
> SORT_ORDER=coordinate CREATE_INDEX=true \
> RGPL=illumina RGID=garma-fibros RGSM=garma RGLB=toto RGPU=tata
[Wed Jan 18 13:53:51 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups INPUT=/home/merlevede/../../../data/patient1/s_garma-fibros_converted_sorted.bam OUTPUT=/home/merlevede/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam SORT_ORDER=coordinate RGID=garma-fibros RGLB=toto RGPL=illumina RGPU=tata RGSM=garma CREATE_INDEX=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Wed Jan 18 13:53:51 CET 2012] Executing as merlevede@U1009-PCJane on Linux 3.1.6-1.fc16.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22; Picard version: 1.60(1086)
INFO 2012-01-18 13:53:51 AddOrReplaceReadGroups Created read group ID=garma-fibros PL=illumina LB=toto SM=garma


[Wed Jan 18 14:08:08 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups done. Elapsed time: 14,29 minutes.
Runtime.totalMemory()=2009399296
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWI-ST584_81:4:1106:7158:91967, CIGAR M operator maps off end of reference
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMRecord.getCigar(BAMRecord.java:247)
at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:136)
at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:170)
at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:93)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:61)
[merlevede@U1009-PCJane picard-tools-1.60]$
I also needed to add the RGPU option.
The BAM file that I got is, of course, empty.
Old 01-18-2012, 05:34 AM   #15
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

From the Picard FAQ:
Quote:
Q: A Picard program complains that CIGAR M operator maps off the end of reference. I want this record to be treated as valid despite the fact that the alignment end is greater than the length of the reference sequence.

A: Picard validation errors may be turned into warnings by passing the command line argument VALIDATION_STRINGENCY=LENIENT. Picard validation messages may be suppressed completely with VALIDATION_STRINGENCY=SILENT. Another option is to use CleanSam to soft-clip these reads so they don't map off the end of the reference.
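
To see which reads trigger that message, a sketch along these lines could help. It takes SAM text with header on stdin, computes each alignment's end from POS and the CIGAR operators that consume reference bases (M, D, N, = and X), and prints the reads whose end lies beyond the contig length given by LN in the @SQ line. The function name reads_off_reference_end and the output layout are assumptions for this example.

```shell
# Sketch: list reads whose alignment runs past the end of their contig.
# reads_off_reference_end is a hypothetical helper name.
# Output columns: read name, contig, alignment end, contig length.
reads_off_reference_end() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            reflen[name] = len
            next
        }
        /^@/ || $3 == "*" || $6 == "*" { next }   # headers and unaligned reads
        !($3 in reflen) { next }                  # contig not declared in the header
        {
            cigar = $6; span = 0
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                n = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MDN=X]/) span += n     # operators that consume reference bases
                cigar = substr(cigar, RLENGTH + 1)
            }
            aln_end = $4 + span - 1
            if (aln_end > reflen[$3]) print $1, $3, aln_end, reflen[$3]
        }
    '
}
```

It would be run as samtools view -h file.bam | reads_off_reference_end. Which contigs the flagged reads sit on can hint at whether the BAM and the reference really describe the same sequences.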
Old 01-18-2012, 07:48 AM   #16
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Thank you dpryan,

I think that it's finally working. I ran Picard and got a non-empty file and this output:
Quote:
java -jar ./AddOrReplaceReadGroups.jar I=~/../../../data/patient1/s_garma-fibros_converted_sorted.bam O=~/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam \
> SORT_ORDER=coordinate CREATE_INDEX=true \
> RGPL=illumina RGID=garma-fibros RGSM=garma RGLB=toto RGPU=tata VALIDATION_STRINGENCY=LENIENT
[Wed Jan 18 16:42:12 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups INPUT=/home/merlevede/../../../data/patient1/s_garma-fibros_converted_sorted.bam OUTPUT=/home/merlevede/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam SORT_ORDER=coordinate RGID=garma-fibros RGLB=toto RGPL=illumina RGPU=tata RGSM=garma VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Wed Jan 18 16:42:12 CET 2012] Executing as merlevede@U1009-PCJane on Linux 3.1.6-1.fc16.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_22-b22; Picard version: 1.60(1086)
INFO 2012-01-18 16:42:12 AddOrReplaceReadGroups Created read group ID=garma-fibros PL=illumina LB=toto SM=garma

Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1106:7158:91967, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1205:8058:144770, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1206:20528:185225, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1203:4551:95140, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2205:4442:123188, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1204:17464:45698, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2208:16832:136911, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1107:17717:4065, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2104:7277:38433, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1105:16587:151639, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2108:18278:149545, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1104:12598:60315, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1105:1489:49925, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2205:16036:86626, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1204:9783:76003, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1103:8346:98868, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1105:8354:181686, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1108:9333:173330, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2106:5867:60527, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:2202:1737:149838, CIGAR M operator maps off end of reference
Ignoring SAM validation error: ERROR: Read name HWI-ST584_81:4:1203:14352:92986, CIGAR M operator maps off end of reference
[Wed Jan 18 17:23:48 CET 2012] net.sf.picard.sam.AddOrReplaceReadGroups done. Elapsed time: 41,60 minutes.
Runtime.totalMemory()=733675520
[merlevede@U1009-PCJane picard-tools-1.60]$
and zee's command gives a result this time:
Quote:
[merlevede@U1009-PCJane patient1]$ samtools view -H picard_s_garma-fibros_converted_sorted.bam | grep ^@RG
@RG ID:garma-fibros PL:illumina PU:tata LB:toto SM:garma
To be sure that my BAM files are correct now, I'm also running Picard on my tumor BAM file.
Then I will rerun GATK on my two files to do the SomaticIndelDetector analysis.
I will let you know whether it definitely works. I hope so!
Old 01-19-2012, 01:19 PM   #17
Carlos Borroto
Member
 
Location: Baltimore, MD

Join Date: Mar 2011
Posts: 19

Quote:
Originally Posted by Jane M
Code:
[merlevede@U1009-PCJane patient1]$ samtools view -H picard_s_garma-fibros_converted_sorted.bam | grep ^@RG
@RG	ID:garma-fibros	PL:illumina	PU:tata	LB:toto	SM:garma
I think you need to be a little more careful than just putting gibberish in some of these fields. I'm in the process of getting familiar with GATK, and there are some parts I don't quite understand yet, like the use of these fields.

These two links are my main sources of information:
Best Practice Variant Detection with the GATK v3 [broadinstitute.org]
Exome sequencing analysis manual [seqanswers.com]

From what I can understand, several of the tools in the GATK pipeline use these fields in order to do their magic. Again, I'm far from sure what the dos and don'ts are here, but my rule of thumb is to set these three to the same value, the sample name (e.g. control01, treatment01, etc.):
RGLB=String Read Group Library Required.
RGPU=String Read Group platform unit (eg. run barcode) Required.
RGSM=String Read Group sample name Required.

And this one to my platform(ex. illumina):
RGPL=String Read Group platform (e.g. illumina, solid) Required.

This might not be completely correct, but this way at least I'm not confusing the tools by telling them that two samples are from the same sequencer lane or library preparation when they are actually not. That is what you might be doing if you reuse the same 'gibberish' for more than one BAM file.
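
As a sketch of that rule of thumb, the helper below only prints the Picard command for one BAM file, so it can be reviewed before anything is run. The name picard_rg_command and the argument order are invented here, the jar path is a placeholder, and reusing the sample name for RGID as well is my own assumption. The options themselves are the ones already used earlier in this thread.

```shell
# Sketch: print (do not run) an AddOrReplaceReadGroups command that reuses
# the sample name for RGID, RGSM, RGLB and RGPU.
# picard_rg_command is a hypothetical helper.
# $1 = input BAM, $2 = output BAM, $3 = sample name, $4 = platform
picard_rg_command() {
    printf 'java -jar AddOrReplaceReadGroups.jar I=%s O=%s SORT_ORDER=coordinate CREATE_INDEX=true RGPL=%s RGID=%s RGSM=%s RGLB=%s RGPU=%s\n' \
        "$1" "$2" "$4" "$3" "$3" "$3" "$3"
}
```

Calling it once per file with a different sample name, for example picard_rg_command normal.bam normal_rg.bam control01 illumina and then again with treatment01, keeps the read groups of the two BAM files distinct.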

I would actually love to find clear documentation on how to set the correct values, and mainly on what information I need to get from the sequencing center to set them. Is there any documentation I could take a look at?

Regards,
Carlos
Old 01-20-2012, 02:24 AM   #18
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Thanks for the information, Carlos. If you find clear documentation for setting these parameters, I am also interested!


I ran Picard on my two datasets and my BAM files seem to be correctly formatted now.
I reran GATK, but I still get an error. It seems that it doesn't accept the mitochondrial chromosome. That's a pity, because I was finally at the end of the analysis after one week of trials.

Quote:
/opt/jdk1.7.0_02/bin/java -Xmx10g -jar GenomeAnalysisTK.jar -R ~/fasta/hg19.fasta -T SomaticIndelDetector --minCoverage 10 -o ~/../../../data/patient1/garma_indels.vcf -verbose indels.txt -I:normal ~/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam -I:tumor ~/../../../data/patient1/picard_s_garma-296_converted_sorted.bam

INFO 09:28:04,178 HelpFormatter - ---------------------------------------------------------------------------------
INFO 09:28:04,180 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-15-gcd43f01, Compiled 2012/01/12 16:14:10
INFO 09:28:04,180 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 09:28:04,180 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 09:28:04,181 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 09:28:04,181 HelpFormatter - Program Args: -R /home/merlevede/fasta/hg19.fasta -T SomaticIndelDetector --minCoverage 10 -o /home/merlevede/../../../data/patient1/garma_indels.vcf -verbose indels.txt -I:normal /home/merlevede/../../../data/patient1/picard_s_garma-fibros_converted_sorted.bam -I:tumor /home/merlevede/../../../data/patient1/picard_s_garma-296_converted_sorted.bam
INFO 09:28:04,182 HelpFormatter - Date/Time: 2012/01/20 09:28:04
INFO 09:28:04,182 HelpFormatter - ---------------------------------------------------------------------------------
INFO 09:28:04,182 HelpFormatter - ---------------------------------------------------------------------------------
INFO 09:28:04,195 GenomeAnalysisEngine - Strictness is SILENT
INFO 09:28:04,237 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 09:28:04,261 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 09:28:04,343 SomaticIndelDetectorWalker - No gene annotations available
INFO 09:28:09,169 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 09:28:09,169 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 09:28:34,684 TraversalEngine - chr1:28598356 2.83e+06 30.0 s 10.6 s 0.9% 54.1 m 53.6 m
INFO 09:29:04,690 TraversalEngine - chr1:60378092 6.04e+06 60.0 s 9.9 s 2.0% 51.3 m 50.3 m
INFO 09:29:34,692 TraversalEngine - chr1:104116800 9.18e+06 90.0 s 9.8 s 3.4% 44.6 m 43.1 m
INFO 09:30:04,697 TraversalEngine - chr1:155165725 1.29e+07 2.0 m 9.3 s 5.0% 39.9 m 37.9 m
INFO 09:30:34,708 TraversalEngine - chr1:186157226 1.64e+07 2.5 m 9.1 s 6.0% 41.6 m 39.1 m
INFO 09:31:04,714 TraversalEngine - chr1:230810753 1.99e+07 3.0 m 9.0 s 7.5% 40.2 m 37.2 m
INFO 09:31:34,716 TraversalEngine - chr2:15519846 2.22e+07 3.5 m 9.5 s 8.6% 40.9 m 37.4 m
INFO 09:32:04,720 TraversalEngine - chr2:61729349 2.57e+07 4.0 m 9.4 s 10.0% 39.8 m 35.8 m
INFO 09:32:34,728 TraversalEngine - chr2:113404649 2.91e+07 4.5 m 9.3 s 11.7% 38.4 m 33.9 m
INFO 09:33:04,729 TraversalEngine - chr2:170063504 3.24e+07 5.0 m 9.3 s 13.5% 36.9 m 31.9 m
INFO 09:33:34,734 TraversalEngine - chr2:204820624 3.59e+07 5.5 m 9.2 s 14.7% 37.5 m 32.0 m
INFO 09:34:04,740 TraversalEngine - chr3:3111817 3.89e+07 6.0 m 9.3 s 16.0% 37.5 m 31.5 m
INFO 09:34:34,743 TraversalEngine - chr3:48501244 4.22e+07 6.5 m 9.2 s 17.5% 37.2 m 30.7 m
INFO 09:35:04,744 TraversalEngine - chr3:108117422 4.54e+07 7.0 m 9.2 s 19.4% 36.1 m 29.1 m
INFO 09:35:34,748 TraversalEngine - chr3:148577626 4.88e+07 7.5 m 9.2 s 20.7% 36.2 m 28.7 m
INFO 09:36:04,754 TraversalEngine - chr3:197748351 5.21e+07 8.0 m 9.2 s 22.3% 35.9 m 27.9 m
INFO 09:36:34,757 TraversalEngine - chr4:56325162 5.50e+07 8.5 m 9.3 s 24.1% 35.2 m 26.7 m
INFO 09:37:04,764 TraversalEngine - chr4:106861604 5.84e+07 9.0 m 9.3 s 25.8% 34.9 m 25.9 m
INFO 09:37:34,773 TraversalEngine - chr4:169215011 6.16e+07 9.5 m 9.3 s 27.8% 34.2 m 24.7 m
INFO 09:38:04,775 TraversalEngine - chr5:37665146 6.44e+07 10.0 m 9.3 s 29.7% 33.7 m 23.7 m
INFO 09:38:34,784 TraversalEngine - chr5:94859344 6.77e+07 10.5 m 9.3 s 31.5% 33.3 m 22.8 m
INFO 09:39:04,794 TraversalEngine - chr5:141334425 7.11e+07 11.0 m 9.3 s 33.0% 33.3 m 22.3 m
INFO 09:39:34,806 TraversalEngine - chr6:5109692 7.40e+07 11.5 m 9.3 s 34.5% 33.4 m 21.8 m
INFO 09:40:04,817 TraversalEngine - chr6:41773298 7.73e+07 12.0 m 9.3 s 35.7% 33.6 m 21.6 m
INFO 09:40:34,825 TraversalEngine - chr6:91364545 8.07e+07 12.5 m 9.3 s 37.3% 33.5 m 21.0 m
INFO 09:41:04,826 TraversalEngine - chr6:147249706 8.40e+07 13.0 m 9.3 s 39.1% 33.3 m 20.3 m
INFO 09:41:34,831 TraversalEngine - chr7:23650833 8.69e+07 13.5 m 9.3 s 40.6% 33.2 m 19.7 m
INFO 09:42:04,836 TraversalEngine - chr7:87046856 9.00e+07 14.0 m 9.3 s 42.7% 32.8 m 18.8 m
INFO 09:42:34,838 TraversalEngine - chr7:128220270 9.34e+07 14.5 m 9.3 s 44.0% 33.0 m 18.5 m
INFO 09:43:04,839 TraversalEngine - chr8:16032718 9.64e+07 15.0 m 9.3 s 45.5% 33.0 m 18.0 m
INFO 09:43:34,848 TraversalEngine - chr8:70978649 9.97e+07 15.5 m 9.3 s 47.3% 32.8 m 17.3 m
INFO 09:44:04,849 TraversalEngine - chr8:133141853 1.03e+08 16.0 m 9.3 s 49.3% 32.5 m 16.5 m
INFO 09:44:34,861 TraversalEngine - chr9:36607567 1.06e+08 16.5 m 9.4 s 50.9% 32.4 m 15.9 m
INFO 09:45:04,863 TraversalEngine - chr9:111848126 1.09e+08 17.0 m 9.3 s 53.3% 31.9 m 14.9 m
INFO 09:45:34,870 TraversalEngine - chr10:5788378 1.12e+08 17.5 m 9.4 s 54.5% 32.1 m 14.6 m
INFO 09:46:04,881 TraversalEngine - chr10:63662048 1.15e+08 18.0 m 9.4 s 56.3% 32.0 m 14.0 m
INFO 09:46:34,884 TraversalEngine - chr10:102740355 1.19e+08 18.5 m 9.3 s 57.6% 32.1 m 13.6 m
INFO 09:47:04,885 TraversalEngine - chr11:4967349 1.22e+08 19.0 m 9.4 s 58.8% 32.3 m 13.3 m
INFO 09:47:34,886 TraversalEngine - chr11:47843747 1.25e+08 19.5 m 9.4 s 60.2% 32.4 m 12.9 m
INFO 09:48:04,894 TraversalEngine - chr11:83180407 1.28e+08 20.0 m 9.3 s 61.3% 32.6 m 12.6 m
INFO 09:48:34,902 TraversalEngine - chr11:125325718 1.32e+08 20.5 m 9.3 s 62.7% 32.7 m 12.2 m
INFO 09:49:04,904 TraversalEngine - chr12:25232248 1.35e+08 21.0 m 9.4 s 63.8% 32.9 m 11.9 m
INFO 09:49:34,911 TraversalEngine - chr12:56646266 1.38e+08 21.5 m 9.4 s 64.9% 33.2 m 11.7 m
INFO 09:50:04,920 TraversalEngine - chr12:102038451 1.41e+08 22.0 m 9.4 s 66.3% 33.2 m 11.2 m
INFO 09:50:34,922 TraversalEngine - chr13:21429694 1.44e+08 22.5 m 9.4 s 68.0% 33.1 m 10.6 m
INFO 09:51:04,924 TraversalEngine - chr13:77736127 1.47e+08 23.0 m 9.4 s 69.9% 32.9 m 9.9 m
INFO 09:51:34,930 TraversalEngine - chr14:34395037 1.50e+08 23.5 m 9.4 s 72.2% 32.6 m 9.1 m
INFO 09:52:04,936 TraversalEngine - chr14:76644221 1.54e+08 24.0 m 9.4 s 73.5% 32.6 m 8.6 m
INFO 09:52:34,947 TraversalEngine - chr15:32450575 1.57e+08 24.5 m 9.4 s 75.6% 32.4 m 7.9 m
INFO 09:53:04,951 TraversalEngine - chr15:62991093 1.60e+08 25.0 m 9.4 s 76.6% 32.7 m 7.7 m
INFO 09:53:35,534 TraversalEngine - chr16:81725 1.64e+08 25.5 m 9.4 s 77.8% 32.8 m 7.3 m
INFO 09:54:05,545 TraversalEngine - chr16:48204121 1.67e+08 26.0 m 9.3 s 79.4% 32.8 m 6.7 m
INFO 09:54:35,749 TraversalEngine - chr17:69409 1.70e+08 26.5 m 9.4 s 80.8% 32.8 m 6.3 m
INFO 09:55:05,756 TraversalEngine - chr17:28791767 1.74e+08 27.0 m 9.3 s 81.7% 33.1 m 6.1 m
INFO 09:55:35,767 TraversalEngine - chr17:56399591 1.77e+08 27.5 m 9.3 s 82.6% 33.3 m 5.8 m
INFO 09:56:05,781 TraversalEngine - chr18:8370781 1.80e+08 28.0 m 9.3 s 83.7% 33.5 m 5.5 m
INFO 09:56:35,800 TraversalEngine - chr18:70149611 1.83e+08 28.5 m 9.3 s 85.7% 33.3 m 4.8 m
INFO 09:57:05,803 TraversalEngine - chr19:31039619 1.87e+08 29.0 m 9.3 s 86.9% 33.4 m 4.4 m
INFO 09:57:35,805 TraversalEngine - chr19:55119393 1.90e+08 29.5 m 9.3 s 87.7% 33.7 m 4.1 m
INFO 09:58:05,806 TraversalEngine - chr20:34599131 1.93e+08 30.0 m 9.3 s 88.9% 33.8 m 3.7 m
INFO 09:58:35,812 TraversalEngine - chr21:34721693 1.97e+08 30.5 m 9.3 s 91.0% 33.5 m 3.0 m
INFO 09:59:05,817 TraversalEngine - chr22:38336813 2.00e+08 31.0 m 9.3 s 92.6% 33.5 m 2.5 m
INFO 09:59:35,822 TraversalEngine - chrX:64028096 2.03e+08 31.5 m 9.3 s 95.1% 33.1 m 96.7 s
INFO 10:00:05,834 TraversalEngine - chrY:5605887 2.06e+08 32.0 m 9.3 s 98.3% 32.6 m 34.0 s
INFO 10:00:10,787 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.4-15-gcd43f01):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:Unknown contig chrMt
##### ERROR
I don't know if I should remove this chromosome from my fasta file or my bam files. I guess that it's present in both...

Here is my fasta.fai file:
Quote:
...
chr21 48129895 2837230949 50 51
chr22 51304566 2886323449 50 51
chrX 155270560 2938654113 50 51
chrY 59373566 3097030091 50 51
chrM 16571 3157591135 50 51
Do you know what should be done with this mitochondrial chromosome?
Old 01-20-2012, 02:28 AM   #19
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 833

GATK is reporting 'chrMt', while your FASTA index file lists 'chrM'. These names are different, and they need to match. You should be using exactly the same FASTA file that was used for the mapping.
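
One way to spot this kind of mismatch up front is sketched below. It lists the contig names found in the @SQ lines of a SAM header that do not appear in the first column of a .fai index. The function name contigs_missing_from_fai and the file names in the usage line are placeholders for this example.

```shell
# Sketch: print contigs named in the SAM header but absent from the FASTA index.
# contigs_missing_from_fai is a hypothetical helper.
# $1 = file holding the SAM header text, $2 = FASTA index (.fai)
contigs_missing_from_fai() {
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }   # first file: .fai, contig name in column 1
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && !(substr($i, 4) in known)) print substr($i, 4)
        }
    ' "$2" "$1"
}
```

For example, after samtools view -H file.bam > header.sam, run contigs_missing_from_fai header.sam hg19.fasta.fai. An empty result means every contig name in the BAM header is also in the index. It does not check contig order or lengths.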
Old 01-20-2012, 02:35 AM   #20
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239

Thanks for pointing it out! I hadn't noticed that the names are different.
I cannot use the same FASTA file that was used for the mapping, because the mapping was done by an external service provider and they didn't give us this file... So, I will change the name chrM to chrMt in my FASTA files.
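
A minimal sketch of that rename, assuming the FASTA is read from stdin and that only the header whose first word is exactly the old name should change (rename_contig is a made-up helper name):

```shell
# Sketch: rename one contig in a FASTA stream.
# rename_contig is a hypothetical helper; $1 = old contig name, $2 = new contig name.
rename_contig() {
    awk -v old_name="$1" -v new_name="$2" '
        /^>/ {
            # compare only the first word of the header line
            split(substr($0, 2), parts, /[ \t]/)
            if (parts[1] == old_name) sub("^>" old_name, ">" new_name)
        }
        { print }
    '
}
```

Used as rename_contig chrM chrMt < hg19.fasta > hg19_chrMt.fasta, followed by rebuilding the index with samtools faidx on the new file (the output file name is a placeholder). This only makes the names agree. If the mitochondrial sequence in this FASTA differs from the one used for the mapping, the coordinates on that contig may still not line up.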