#1
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Hi guys,
I am looking at some whole genome sequencing (WGS) data for tumor-normal pairs, and I want to find somatic copy number alterations in the tumors. What tools do you guys recommend for this? I have read about a few, e.g., BIC-Seq, OncoSNP-SEQ, CREST, etc., but have no experience with them. Any recommendations? Thanks in advance.
#2
Senior Member
Location: Budapest Join Date: Mar 2010
Posts: 329
Control-FREEC is a good choice.
And an interesting article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875755/
#3
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Thanks. Yeah, I read that article just recently, and I'd like to hear about people's experiences with the different software.
Control-FREEC seems pretty good. What's your experience with it? I also want to try out BIC-Seq, OncoSNP-SEQ, and CNAnorm to see how they perform. Does anyone have any experience with those?
#4
Senior Member
Location: Budapest Join Date: Mar 2010
Posts: 329
Control-FREEC is easy to use if you are working with human or mouse. If you are working with non-model organisms, it is a bit tricky, because you need to generate the mappability files. Read the documentation carefully, because there are a lot of settings. But we were happy with the results.
#5
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Quote:
I ran Control-FREEC with a pair of simulated tumor-normal data from this Synapse thing: https://www.synapse.org/#!Synapse:syn312572/wiki/60702
It took 3 hours to complete. The *_ratio.txt file is 113M. It took 3 minutes to produce the graph.
For the real tumor-normal pair (the data file is slightly larger), it took 14 hours to complete. The *_ratio.txt file is 169M. It took about an hour to finish plotting.
Why that discrepancy in time?
Last edited by lethalfang; 03-17-2014 at 10:43 AM.
#6
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Just checking: it's a bad idea to split copy number analysis jobs chromosome by chromosome, right? It would hurt the statistical power.
#7
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
I'm wondering, for tumor-normal paired WGS, what are the recommended settings for the following:
coefficientOfVariation: The test file had 0.062. The manual has an "example" of 0.05. What's a good value from people's experience?
forceGCcontentNormalization: Does GC content normalization improve the results for tumor-normal pairs? If so, is 1 or 2 better?
1: normalize GC content first, and then calculate the sample/control ratio.
2: calculate the sample/control ratio first, and then normalize GC content.
intercept: 1 = with GC content, 0 = with a control dataset. What if I have both?
minCNAlength: What's a good setting based on people's experience?
Are there any other settings that people find work particularly well for tumor-normal pairs? Thanks.
#8
Senior Member
Location: Budapest Join Date: Mar 2010
Posts: 329
It can be anything: slow hard disk, old computer, etc.
Quote:
Last edited by TiborNagy; 03-18-2014 at 07:29 AM. Reason: spell correction
#9
Member
Location: London Join Date: Apr 2014
Posts: 12
Quote:
It consists of a Python script that generates a suitable input file for the R package; alternatively, VarScan2 output can be used. It was developed for exome sequencing, but it works even better with whole-genome data. It's available from the institute page or from CRAN; the Python script is bundled with the R package, as are the documentation and example data. As usual, higher depth and higher tumor content are a good thing, but I have managed to analyse tumor samples with relatively low depth (10x), as well as samples with around 20% tumor content, with satisfying results.
#10
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Quote:
Is it possible to see the pre-print of your submitted paper, or do you have to wait until it's published?
#11
Member
Location: London Join Date: Apr 2014
Posts: 12
:-).
Well, I could, but I'd like to see what the reviewers/editors have to say first. We just describe the algorithm and compare the results of running it on exome data with the respective SNP arrays from TCGA, as well as with the results of other similar algorithms. Sequenza's results, when not exactly the same, were pretty close to the SNP array predictions...
#12
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Quote:
#13
Member
Location: London Join Date: Apr 2014
Posts: 12
Well, oncoSNP-seq has a different inference implementation, plus it uses dbSNP to set heterozygous positions, while we use information from the germline.
It's written in MATLAB, while we have implemented ours in R, which should make it easier to use, I suppose. Anyway, I started the software from scratch, to have something that works properly with exome data, without borrowing any concept from oncoSNP. They are pretty different from each other, I would say. I've tried to use oncoSNP-seq on exome data, but it doesn't work well, as warned in the manual. Sequenza, on the other hand, works pretty well with WGS. If you want to do some testing, it shouldn't be difficult to try them both and benchmark the difference.
#14
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Quote:
It's much better to have software that doesn't require installing a proprietary third-party language.
#15
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Was trying
Quote:
And I got an empty output file, with only the header. No error message. The two pileups have chromosomes labeled as 1, 2, 3, ..., X, Y, M. Any idea why? Thanks.
#16
Member
Location: London Join Date: Apr 2014
Posts: 12
Hi lethalfang,
the chromosomes in the 2 pileups and in the GC file are in the same order, right? Also note that the pileups have to be generated with the fasta reference (-f argument), otherwise there might be problems.
You could try lowering the '-n' parameter to allow consensus positions with less depth to be taken into account. The default is 20, so to be included a position needs, for example, 10 reads in the normal and 10 in the tumor (or any other combination where the sum is 20). This might be too high for low-pass WGS.
If you have a chance to paste part of the content of your 3 files (e.g. in pastebin or similar), I could have a look and see if there is something clearly wrong.
EDIT: additionally, you could have a look here https://bitbucket.org/ffavero/sequen...Sequenza_Utils, for tips on how to use sequenza-utils.
Last edited by ffavero; 05-02-2014 at 08:22 AM.
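The depth rule described in the post above can be sketched as follows. This is only an illustration of the rule as stated there, not the sequenza-utils code; the function name `passes_depth_threshold` and its arguments are made up for the example, and it assumes the threshold applies to the summed depth of both samples.

```python
def passes_depth_threshold(normal_depth, tumor_depth, min_total=20):
    """Return True if a position would be kept under the rule described above.

    Assumption: the threshold is compared against the sum of the normal and
    tumor read depths at the position, with 20 as the default quoted in the
    post. The real tool may apply further filters.
    """
    return normal_depth + tumor_depth >= min_total


# 10 + 10 reaches the default threshold. A low-pass position with 6 + 8 reads
# does not, unless the threshold is lowered.
print(passes_depth_threshold(10, 10))
print(passes_depth_threshold(6, 8))
print(passes_depth_threshold(6, 8, min_total=12))
```

With low-pass WGS most positions look like the second case, which is one way to end up with a header-only output file.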
#17
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Yes, they're in the same order. They were generated using the following command (the pileups were first created for VarScan2).
The reference used here is the Broad Institute's version b37, i.e., the chromosomes are written as 1, 2, 3, ..., X, Y, MT.
Quote:
The GC content file was generated using the hg19 version, i.e., chromosomes are written as chr1, chr2, chr3, ..., chrX, chrY, chrM. I tried to create the GC content file with the b37 fasta file, but it has a small number (maybe no more than 1000) of characters that are not G, C, T, A, or N. The script failed with a dictionary key error when it tried to count M, R, etc. In any case, the GC content file shouldn't be the cause of an empty seqz file.
First 5 lines of the normal pileup.gz:
Quote:
First 5 lines of the gc content file:
Quote:
All the pileups and gc-content files are bgzipped (into gz). Hope that isn't the problem.
Never mind, it may be due to the gc-content file having the chromosomes in the wrong order. Let me fix that and try again.
Last edited by lethalfang; 05-02-2014 at 02:09 PM.
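A dictionary key error of the kind described above is what usually happens when base counts are kept in a dictionary that only has entries for A, C, G, T and N. The following is a minimal sketch of a GC calculation that tolerates IUPAC ambiguity codes such as M and R by ignoring them, the same way N is ignored. It is not the sequenza-utils script; `gc_fraction` is a hypothetical helper written for this example.

```python
from collections import Counter


def gc_fraction(sequence):
    """GC fraction of a sequence window, or None if no base is unambiguous.

    Only A, C, G and T count as called bases. N and every other IUPAC
    ambiguity code (M, R, Y, ...) are skipped instead of raising an error.
    """
    counts = Counter(sequence.upper())  # Counter returns 0 for missing keys
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    called = gc + at
    return gc / called if called else None


print(gc_fraction("GCATMRNN"))  # M, R and N are ignored
print(gc_fraction("NNNN"))      # nothing to count in this window
```

Using `Counter` avoids the missing-key problem because unknown characters simply get their own count and are never looked up.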
#18
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
Update:
Simply re-ordering the chromosomes wasn't enough. I converted chrom=chr1 into chrom=1, etc., and now it's working.
Because the pileup files were generated using the b37 format (i.e., chromosomes were named 1, 2, 3, ..., X, Y, MT), and the gc-content file was generated using the hg19 format (i.e., chromosomes were named chr1, chr2, chr3, ..., chrX, chrY, chrM), the chromosome names did not match.
Two things:
1) The b37 fasta file has characters like M and R in the sequence, which is why the python script failed when trying to generate a gc-content file from it. There aren't many of those, so you could simply treat them as "N" in the script.
2) I guess you could modify the python script so that it doesn't matter which chromosome naming format is used.
Last edited by lethalfang; 05-02-2014 at 02:09 PM.
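The renaming step in the update above can be sketched like this. The helper names `to_b37_name` and `names_missing_from` are invented for the example and are not part of sequenza-utils; the sketch only reconciles the labels, under the assumption that the two references otherwise describe the same chromosomes.

```python
def to_b37_name(chrom):
    """Map an hg19-style name (chr1, chrX, chrM) to b37 style (1, X, MT).

    Names that are already in b37 style are returned unchanged.
    Caveat: this only changes the label. The mitochondrial sequences in
    hg19 (chrM) and b37 (MT) are not the same sequence, so GC values for
    that contig should not be carried over blindly.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return "MT" if name == "M" else name


def names_missing_from(reference_names, other_names):
    """Sorted names in reference_names that never occur in other_names."""
    return sorted(set(reference_names) - set(other_names))


pileup_names = ["1", "2", "X", "Y", "MT"]
gc_names = ["chr1", "chr2", "chrX", "chrY", "chrM"]

# Before renaming, no pileup chromosome is found in the GC file.
print(names_missing_from(pileup_names, gc_names))
# After renaming, every pileup chromosome has a match.
print(names_missing_from(pileup_names, [to_b37_name(n) for n in gc_names]))
```

Running a check like `names_missing_from` on the input files before the conversion would have flagged the mismatch that otherwise shows up only as an empty output file.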
#19
Member
Location: San Francisco, CA Join Date: Aug 2011
Posts: 91
By the way, in the user guide (http://cran.r-project.org/web/packag...s/sequenza.pdf), you had "-r" to flag normal.pileup and "-s" to flag tumor.pileup.
#20
Member
Location: London Join Date: Apr 2014
Posts: 12
Oops, you are right!
That's because from version 1.* to 2.0 I changed all the arguments from reference/sample (-r/-s) to normal/tumor (-n/-t), which is more on-topic for cancer research. I was careful to change it everywhere, but clearly not there. I haven't tested it with the b37 fasta; I will add a way to handle M and R. Thanks for bringing this to my attention! Both your points were really relevant.
Tags
cancer, cna, cnv, copy number, somatic