Seqanswers Leaderboard Ad

**lh3** · 08-12-2011, 05:36 AM

Three suggestions:

a) map your reads to:

ftp://ftp-trace.ncbi.nih.gov/1000gen...mbly_sequence/

b) use two distinct mappers (e.g. bwaaln and bwa-sw), call mutations separately and then take the intersection.

c) (optional) you may consider samtools contrast caller.

**sdvie** · 08-16-2011, 04:53 AM

I think the question was more referring to how to discover the differences between paired normal and tumor samples from one single patient.

The multi-sample vcf allows the following information:

- It lists the variant positions and changes detected in any of the two samples.
- In the columns after the FORMAT column, normal and tumor in the example, it displays the stats for each of the variants in the two samples. Obviously, there are variants that are recorded in both normal and tumor, and there are also those that are recorded in the normal sample only and not in the tumor and vice versa:
- Variants present in both "normal" and "tumoral" may be interpreted as "germline" variants that are present in all cells of the patient, tumor or not.
- Then, we can obtain a rough selection of "tumor-specific" variants (those that are recorded only in "tumor").
- Variants that appear in "normal" only, may point at deletions of this region in the tumor sample, given sufficient read coverage at the region of interest.

However, the multi-sample vcf does not give the statistical information described in FORMAT for the respective sample that does NOT contain the variant (entry is ./. ).

Maybe it would help to run the Unified Genotyper without distinguishing between the two samples, in order to get the overall genotype stats, but I am not sure about that.

Any opinion on / experience with that?

**LI Jia** · 01-21-2013, 04:58 AM

There are my command lines for calling snps and indels with GATK. It did work well for me and might be useful for you.

bwa aln -t 4 ucsc.hg19 /home/li/prostate_cancer_RNA/ERR031030_1.fastq > ERR031030_1.sai

bwa aln -t 4 ucsc.hg19 /home/li/prostate_cancer_RNA/ERR031030_2.fastq > ERR031030_2.sai

bwa sampe -r "@RG\tID:ERR031030\tLB:ERR031030\tSM:ERR031030\tPL:ILLUMINA" ucsc.hg19 ERR031030_1.sai ERR031030_2.sai /home/li/prostate_cancer_RNA/ERR031030_1.fastq /home/li/prostate_cancer_RNA/ERR031030_2.fastq > ERR031030.sam

java -Xmx8g -Djava.io.tmpdir=/tmp -jar /home/li/prostate_cancer_RNA/picard-tools-1.82/SortSam.jar SO=coordinate INPUT=ERR031030.sam OUTPUT=ERR031030.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

java -Xmx8g -Djava.io.tmpdir=/tmp -jar /home/li/prostate_cancer_RNA/picard-tools-1.82/MarkDuplicates.jar INPUT=ERR031030.bam OUTPUT=ERR031030.marked.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -o ERR031030.bam.list -I ERR031030.marked.bam
java -Xmx8g -Djava.io.tmpdir=/tmp -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -I ERR031030.marked.bam -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals ERR031030.bam.list -o ERR031030.marked.realigned.bam

java -Xmx8g -Djava.io.tmpdir=/tmp -jar /home/li/prostate_cancer_RNA/picard-tools-1.82/FixMateInformation.jar INPUT=ERR031030.marked.realigned.bam OUTPUT=ERR031030.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T BaseRecalibrator -R ucsc.hg19.fasta -I ERR031030.marked.realigned.fixed.bam -knownSites Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites 1000G_phase1.indels.hg19.vcf -knownSites dbsnp_135.hg19.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o ERR031030.recal_data.grp

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta -I ERR031030.marked.realigned.fixed.bam -BQSR ERR031030.recal_data.grp --out ERR031030.marked.realigned.fixed.recal.bam

java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper -I ERR031029.marked.realigned.fixed.recal.bam -I ERR031030.marked.realigned.fixed.recal.bam -D dbsnp_135.hg19.vcf -o ERR031030.raw.snps.indels.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.snpsonly.vcf -selectType SNP

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.indelsonly.vcf -selectType INDEL

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T VariantRecalibrator -R ucsc.hg19.fasta -input ERR031030.snpsonly.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource

mni,known=false,training=true,truth=false,prior=12.0 1000G_phase1.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an MQ -mode SNP -recalFile ERR031030.snp.recal.vcf -tranchesFile ERR031030.snp.tranches.vcf -rscriptFile ERR031030.plots.R

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ApplyRecalibration -input ERR031030.snpsonly.vcf -tranchesFile ERR031030.snp.tranches.vcf -recalFile ERR031030.snp.recal.vcf -o ERR031030.snps.filtered.vcf

java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration --variant ERR031030.snps.filtered.vcf -o ERR031030.final.filtered.vcf --filterName "Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5" --filterExpression "HaplotypeScore > 13.0"

By the way, to obtain a rough selection of "tumor-specific" variants (those that are recorded only in "tumor"), what command line do you use? any suggestion?

**david.tamborero** · 01-21-2013, 12:47 PM

Hi,

I'm far (honestly) from being an expert in this issue, but as far as I know, if you perform the mutation call for the tumor sample and for the normal sample separately, once you have the final set of mutations for each then you simply substract one from the other to get the somatic ones.

Of course, there is the option to use a variant caller which is already intended to deal with normal/tumor sample experiments. I've used Varscan, and seems pretty good for SNPs (not so much for indels), altough as I told you I'm not an expert in the field and for instance I was not able to use some of the filters that the second release of the tool made available. I also heard good things about SomaticSniper, but I did not use this one. At the end of the day, I think that this issue is far from being solved, and what many people do is to combine the results of several methods to obtain a list of highly confident somatic mutations.

Hope it helps,
best
David

**id0** · 08-08-2013, 09:46 AM

Doing variant calls separately and then looking at the different will produce a large number of false positives. You really should be using somatic variant callers like MuTect or SomaticSniper that look at both the tumor and normal samples at the same time.

**ymc** · 08-09-2013, 02:05 AM

Originally posted by id0 View Post

Doing variant calls separately and then looking at the different will produce a large number of false positives. You really should be using somatic variant callers like MuTect or SomaticSniper that look at both the tumor and normal samples at the same time.

Do I still need to use two aligners as suggested by lh3?

Topics	Statistics	Last Post
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, Today, 06:57 AM	0 responses 9 views 0 likes	Last Post by seqadmin Today, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, Yesterday, 07:17 AM	0 responses 14 views 0 likes	Last Post by seqadmin Yesterday, 07:17 AM
A Close Examination at Probiotic-Related Bacteremia by seqadmin Started by seqadmin, 05-02-2024, 08:06 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-02-2024, 08:06 AM
Expanded Genetic Insights into Blood Pressure Regulation by seqadmin Started by seqadmin, 04-30-2024, 12:17 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-30-2024, 12:17 PM

Seqanswers Leaderboard Ad

Announcement

Pipeline to find somatic mutations

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News