
**Heisman** · 06-05-2012, 09:18 PM

Your pipeline is fine; it's pretty much identical to the one you'll see if you search for "exome sequencing manual". You can use samtools/GATK to call SNPs, or look into VarScan; lots of options.

**shyam_la** · 06-06-2012, 08:53 AM

Thanks for your reply Heisman.

I don't think GATK/Samtools can analyse tumour-normal pairs, can they? VarScan 2 sounds good; I will check it out. I also found MuTect (beta) from the Broad Institute. Any experience with it?

**shyam_la** · 06-06-2012, 09:46 AM

Just ran MuTect with my data:

E:\Exome>java -Xmx1g -jar MuTect\mutect.jar --analysis_type MuTect --reference_sequence UCSChg19\ucsc.hg19.fasta -B:cosmic,VCF Mutect\hg19_cosmic.vcf -B:dbsnp,VCF ucschg19\dbsnp_135.hg19.vcf --input_file:normal P01_normal_ready.bam --input_file:tumor P01_cancer_ready.bam --out call_stats.out --coverage_file coverage.wig.txt
INFO 10:38:26,672 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:38:26,682 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.1-37-g5cedb2d, Compiled 2011/09/14 10:01:32
INFO 10:38:26,683 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:38:26,683 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 10:38:26,683 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 10:38:26,684 HelpFormatter - Program Args: --analysis_type MuTect --reference_sequence UCSChg19\ucsc.hg19.fasta -B:cosmic,VCF Mutect\hg19_cosmic.vcf -B:dbsnp,VCF ucschg19\dbsnp_135.hg19.vcf --input_file:normal P01_normal_ready.bam --input_file:tumor P01_cancer_ready.bam --out call_stats.out --coverage_file coverage.wig.txt
INFO 10:38:26,684 HelpFormatter - Date/Time: 2012/06/06 10:38:26
INFO 10:38:26,684 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:38:26,686 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:38:26,707 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:38:27,308 RMDTrackBuilder - Loading Tribble index from disk for file Mutect\hg19_cosmic.vcf
INFO 10:38:27,862 RMDTrackBuilder - Loading Tribble index from disk for file ucschg19\dbsnp_135.hg19.vcf
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-37-g5cedb2d):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Input files cosmic and reference have incompatible contigs: No overlapping contigs found.
##### ERROR cosmic contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
##### ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
##### ERROR ------------------------------------------------------------------------------------------

Any ideas on how to solve this error?

**SeekAnswers** · 06-06-2012, 10:55 AM

Your reference uses chr1, chr2, ... as contig names, while the COSMIC VCF uses 1, 2, ... without the "chr" prefix. Beyond that, GATK can also complain if the set of contigs in one file doesn't match the other.
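A quick way to confirm that diagnosis before changing any files is to count how many contig names line up under each naming rule. This is a minimal sketch, assuming you have already pulled the two contig lists into Python (for example from the reference .fai and the VCF); `overlap_under_renaming` is a hypothetical helper, and mitochondrial naming (chrM vs MT) is deliberately not handled.

```python
def overlap_under_renaming(reference_contigs, other_contigs):
    """Count how many of other_contigs match a reference contig under each naming rule."""
    ref = set(reference_contigs)
    rules = {
        "as_is": lambda c: c,
        # UCSC/hg19 style: prefix bare names, "1" -> "chr1"
        "add_chr": lambda c: c if c.startswith("chr") else "chr" + c,
        # Ensembl/b37 style: drop the prefix, "chr1" -> "1"
        "strip_chr": lambda c: c[3:] if c.startswith("chr") else c,
    }
    return {label: sum(rule(c) in ref for c in other_contigs)
            for label, rule in rules.items()}


# Contig lists from the error message (reference trimmed to the primary chromosomes).
cosmic = [str(i) for i in range(1, 23)]
reference = ["chrM"] + [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
print(overlap_under_renaming(reference, cosmic))
# {'as_is': 0, 'add_chr': 22, 'strip_chr': 0}
```

Only the `add_chr` rule produces any overlap here, which points at renaming the COSMIC contigs rather than touching the BAMs.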

**shyam_la** · 06-06-2012, 11:59 AM

SeekAnswers - yeah, but how do I FIX it? Without starting from scratch, that is.

I am doing my analysis on a laptop, and reprocessing my reads against a new reference genome that matches the one MuTect understands is going to take forever. I'm looking for an easy way, if anybody knows one.

**SeekAnswers** · 06-06-2012, 12:15 PM

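You could try editing the BAM header with sed to strip the 'chr' prefix from the contig names, then swapping the edited header back in with samtools reheader. BAM records refer to contigs by their index in the header, so the reads themselves don't need rewriting (running sed 's/chr//g' over the whole samtools view -h output would also hit any other text containing "chr", and would write SAM text, not BAM):

samtools view -H input.bam | sed 's/SN:chr/SN:/' > header.sam

samtools reheader header.sam input.bam > modified.bam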

Not 100% sure though; you could try it on one chromosome first to see if it works.
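Rather than running sed over an entire SAM stream, the rename can be limited to the SN: fields of the @SQ header lines, since a blanket s/chr//g also rewrites anything else containing "chr" (for example @PG command lines). A minimal Python sketch, assuming the header was dumped with samtools view -H; `strip_chr_from_sq` and the @PG line in the demo are hypothetical. Note the renamed BAM would still have to match the contig names in the reference FASTA, so on its own this does not fix the MuTect error.

```python
def strip_chr_from_sq(header_text):
    """Drop a leading 'chr' from SN: fields on @SQ lines only; leave every other line alone."""
    lines = []
    for line in header_text.splitlines():
        if line.startswith("@SQ\t"):
            fields = line.split("\t")
            # "SN:chr1" -> "SN:1"; all other fields are kept verbatim
            fields = ["SN:" + f[len("SN:chr"):] if f.startswith("SN:chr") else f
                      for f in fields]
            line = "\t".join(fields)
        lines.append(line)
    return "\n".join(lines) + "\n"


header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@PG\tID:bwa\tPN:bwa\tCL:bwa sampe ucsc.hg19.fasta chr1_R1.sai chr1_R2.sai\n"
)
print(strip_chr_from_sq(header), end="")
```

The @SQ line becomes SN:1, while the "chr1" inside the @PG command line survives untouched.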

**swbarnes2** · 06-06-2012, 01:10 PM

Why is your reference sequence not the same all the way through?

Whatever you used for the alignment only had chr1-22. So use that reference all the way through, rather than switching to a new one that has all those other partial chromosomes.

**shyam_la** · 06-06-2012, 02:24 PM

swbarnes - My reference has been the same from the very beginning. It's the UCSC hg19 build I got from ftp://ftp.broadinstitute.org/bundle/1.5/hg19/ (the GATK resource bundle). So what I used for alignment did not have only chr1-22, as you seem to have understood; it had all the haps and randoms too. Everything ran perfectly through my processing pipeline (described in the first post in this thread).

But when I attempted variant calling with MuTect (beta), this error showed up, because one of the input files they provide for use with the caller (hg19_cosmic.vcf, as seen in the java command line above) names its contigs differently from the reference I used.

One way to solve the issue is to redo my pipeline with the reference build that MuTect provides (also hg19, strangely), which presumably has contigs named 1, 2, 3, ..., 22. But I don't have the computing power to do that without wasting a lot of time.

**shyam_la** · 06-06-2012, 02:56 PM

SeekAnswers - that would modify my input files, right? And what about the extra contigs? Is there a way to selectively remove the M, X, Y, hap and random contigs from .bam files? (Though I doubt that's recommended, even if possible.)
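If you did want to try it, dropping contigs from SAM text means filtering the @SQ header lines and the alignment records together. A minimal sketch, assuming SAM text (e.g. from samtools view -h) rather than binary BAM; `keep_contigs` is a hypothetical helper, and reads whose mate lies on a removed contig are simply dropped rather than repaired. As suspected in the question, this is probably not recommended: the trimmed file would likely no longer match the full hg19 reference dictionary, so the reference would need the same trimming.

```python
def keep_contigs(sam_lines, keep):
    """Yield SAM lines restricted to the contigs in `keep` (header @SQ entries and alignments)."""
    keep = set(keep)
    rname_ok = keep | {"*"}          # '*' = no reference (unmapped, unplaced)
    rnext_ok = keep | {"=", "*"}     # '=' = mate on the same contig as RNAME
    for line in sam_lines:
        if line.startswith("@SQ\t"):
            names = [f[3:] for f in line.split("\t") if f.startswith("SN:")]
            if names and names[0] in keep:
                yield line
        elif line.startswith("@"):
            yield line  # other header lines (@HD, @RG, @PG) pass through
        else:
            fields = line.split("\t")
            # Column 3 is RNAME, column 7 is RNEXT: drop reads on, or mated to, removed contigs.
            if fields[2] in rname_ok and fields[6] in rnext_ok:
                yield line


lines = [
    "@HD\tVN:1.0",
    "@SQ\tSN:chr1\tLN:100",
    "@SQ\tSN:chrM\tLN:16571",
    "r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchrM\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t65\tchr1\t10\t60\t5M\tchrM\t20\t0\tACGTA\tIIIII",
]
for kept in keep_contigs(lines, ["chr1"]):
    print(kept)
```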

**swbarnes2** · 06-06-2012, 03:11 PM

Fix the chromosome nomenclature in the COSMIC VCF; don't try to fix your .bam.
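Fixing the VCF means rewriting its chromosome names to the reference's convention, i.e. adding the "chr" prefix to the CHROM column and to any ##contig header lines. A minimal sketch, assuming an uncompressed VCF whose contigs are the bare 1-22 seen in the error message (a bare "MT" would become "chrMT", not "chrM", so that case is not handled); `add_chr_prefix` is a hypothetical helper. The same generator can be fed an open file line by line and its output written to a new file. The existing Tribble index (.idx) describes the unmodified file, so it presumably needs to be deleted so GATK can rebuild it.

```python
def add_chr_prefix(vcf_lines):
    """Yield VCF lines with 'chr' prefixed to the CHROM column and to ##contig IDs."""
    contig_tag = "##contig=<ID="
    for line in vcf_lines:
        if line.startswith(contig_tag):
            rest = line[len(contig_tag):]
            # Assumes ID is the first key inside ##contig=<...>, as is conventional.
            if not rest.startswith("chr"):
                line = contig_tag + "chr" + rest
        elif line.strip() and not line.startswith("#"):
            # Data line: CHROM is everything before the first tab.
            chrom, sep, remainder = line.partition("\t")
            if not chrom.startswith("chr"):
                line = "chr" + chrom + sep + remainder
        yield line


demo = [
    "##fileformat=VCFv4.1\n",
    "##contig=<ID=1,length=249250621>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tT\t.\t.\t.\n",
]
print("".join(add_chr_prefix(demo)), end="")
```

Because 1..22 maps onto chr1..chr22 in the same relative order as the hg19 reference, no re-sorting should be needed, assuming the records were sorted numerically to begin with.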

**shyam_la** · 06-06-2012, 03:17 PM

I am new to NGS. How?

**xhyuo** · 07-03-2012, 11:27 AM

I came across the same problem!

GATK requires consistency in reference contig names and ordering.
Use the Broad reference genome for alignment:

ftp://ftp.broadinstitute.org/pub/seq...sembly19.fasta

I guess you'll be fine!
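The point about names and ordering can be checked up front, before a long GATK run. A minimal sketch, assuming you already have the contig lists (e.g. the first column of the reference .fai, and the contigs of a BAM or VCF in file order); `order_is_consistent` is a hypothetical helper, a simplified stand-in for GATK's own sequence-dictionary validation rather than a copy of it.

```python
def order_is_consistent(reference_order, other_order):
    """True if every contig in other_order is in the reference and appears in the same relative order."""
    position = {name: i for i, name in enumerate(reference_order)}
    if any(name not in position for name in other_order):
        return False  # a contig the reference doesn't know about
    indices = [position[name] for name in other_order]
    return all(a < b for a, b in zip(indices, indices[1:]))


reference = ["chrM"] + [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
print(order_is_consistent(reference, ["chr1", "chr2", "chr10"]))  # True
print(order_is_consistent(reference, ["chr1", "chr10", "chr2"]))  # False: lexical, not karyotypic, order
print(order_is_consistent(reference, ["1", "2"]))                 # False: names don't match
```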

**shyam_la** · 07-03-2012, 11:38 AM

Thank you! I have made considerable progress since then; that was a month ago!
In the end I used GRCh37.67 from Ensembl and concatenated ("cat") the chromosome FASTA files together, leaving out the haps and randoms since they are not particularly useful.
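A minimal Python equivalent of that cat step, with the chromosome order written out explicitly, since GATK cares about contig order as well as names. The file-name pattern `chromosome.{name}.fa`, the order 1-22, X, Y, MT, and the helper `concat_fastas` are all assumptions (Ensembl's real file names differ), and the files are assumed to be already decompressed. The result still needs a .fai index (samtools faidx) and a .dict sequence dictionary (Picard CreateSequenceDictionary) before GATK will use it.

```python
from pathlib import Path

# Assumed order, b37-style names without "chr": 1-22, X, Y, MT.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def concat_fastas(fasta_dir, out_path, names=PRIMARY, pattern="chromosome.{name}.fa"):
    """Write the listed per-chromosome FASTA files into one FASTA, in the given order.

    Anything not in `names` (haplotypes, unplaced GL contigs) is simply never read.
    A missing file raises FileNotFoundError rather than being skipped.
    """
    fasta_dir = Path(fasta_dir)
    with open(out_path, "w") as out:
        for name in names:
            text = (fasta_dir / pattern.format(name=name)).read_text()
            # Guarantee a newline between records even if a file lacks a trailing one.
            out.write(text if text.endswith("\n") else text + "\n")
```

Passing an explicit `names` list keeps the output order independent of how the files happen to sort on disk.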

Need advice on whole exome sequence analysis..

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News