SEQanswers


valeu 02-10-2015 08:30 AM

ONCOCNV: a method to extract CNAs from amplicon (or targeted) sequencing data
 
We are happy to present ONCOCNV, a method to detect copy number alterations in amplicon or targeted sequencing data. The method can be applied to exome-seq data as well, but it will not adjust the profiles for contamination by normal cells or evaluate genotypes (LOH).

ONCOCNV was developed by OncoDNA in collaboration with the Bioinformatics Laboratory of Institut Curie (Paris). It automatically computes, normalizes, and segments copy number profiles, then calls copy number alterations. The user can provide any number of control samples to construct the baseline. However, we recommend using at least three control samples. The more the better :)

Webpage: http://oncocnv.curie.fr/
Publication: Boeva, V. et al. (2014) Multi-factor data normalization enables the detection of copy number aberrations in amplicon sequencing data. Bioinformatics, 30(24):3443-3450.

Input for CNA detection: aligned single-end or paired-end data in the BAM format.
Output: Annotation of genes with copy number changes + visualization of the profile (.png).

Paper abstract:
MOTIVATION:
Because of its low cost, amplicon sequencing, also known as ultra-deep targeted sequencing, is now becoming widely used in oncology for detection of actionable mutations, i.e. mutations influencing cell sensitivity to targeted therapies. Amplicon sequencing is based on the polymerase chain reaction amplification of the regions of interest, a process that considerably distorts the information on copy numbers initially present in the tumor DNA. Therefore, additional experiments such as single nucleotide polymorphism (SNP) or comparative genomic hybridization (CGH) arrays often complement amplicon sequencing in clinics to identify copy number status of genes whose amplification or deletion has direct consequences on the efficacy of a particular cancer treatment. So far, there has been no proven method to extract the information on gene copy number aberrations based solely on amplicon sequencing.
RESULTS:
Here we present ONCOCNV, a method that includes a multifactor normalization and annotation technique enabling the detection of large copy number changes from amplicon sequencing data. We validated our approach on high and low amplicon density datasets and demonstrated that ONCOCNV can achieve a precision comparable with that of array CGH techniques in detecting copy number aberrations. Thus, ONCOCNV applied on amplicon sequencing data would make the use of additional array CGH or SNP array experiments unnecessary.

xxqtony 08-24-2015 09:16 AM

Hi Valeu,
I wonder if you can help me with this. I tried to run ONCOCNV v6.1 on the test data and got the error: Error in file(file, "rt") : cannot open the connection. Then the program quits.
More details are shown below.
Thanks!
-Tony

=====================================================
$ ./RUNME.sh
Package 'mclust' version 5.0.2
Type 'citation("mclust")' for citing this R package in publications.
Warning: you have both male and female samples in the control. We will try to assign sex using read coverage on chrX
0.5 0.5 0.5 1 1 0.5 0.5 1 1 0.5 1 1 0.5 1 1
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol=0.354678
Iteration 2 tol=0.401774
Iteration 3 tol=0.485327
Iteration 4 tol=0.644948
Iteration 5 tol=0.960465
Iteration 6 tol=0.518261
Iteration 7 tol=0.071013
Iteration 8 tol=0.006314
Iteration 9 tol=0.004754
Iteration 10 tol=0.004012
Iteration 11 tol=0.003472
Iteration 12 tol=0.004472
Iteration 13 tol=0.005501
Iteration 14 tol=0.005915
Iteration 15 tol=0.005593
Iteration 16 tol=0.004960
Iteration 17 tol=0.004012
Iteration 18 tol=0.002834
Iteration 19 tol=0.001732
Iteration 20 tol=0.001043
Iteration 21 tol=0.000649
Iteration 22 tol=0.000395
Iteration 23 tol=0.000245
Iteration 24 tol=0.000161
Iteration 25 tol=0.000133
Iteration 26 tol=0.000131
Iteration 27 tol=0.000133
Iteration 28 tol=0.000138
Iteration 29 tol=0.000146
Iteration 30 tol=0.000157
Iteration 31 tol=0.000171
Iteration 32 tol=0.000186
Iteration 33 tol=0.000203
Iteration 34 tol=0.000221
Iteration 35 tol=0.000239
Iteration 36 tol=0.000256
Iteration 37 tol=0.000272
Iteration 38 tol=0.000285
Iteration 39 tol=0.000294
Iteration 40 tol=0.000299
Iteration 41 tol=0.000298
Iteration 42 tol=0.000290
Iteration 43 tol=0.000276
Iteration 44 tol=0.000257
Iteration 45 tol=0.000233
Iteration 46 tol=0.000206
Iteration 47 tol=0.000179
Iteration 48 tol=0.000151
Iteration 49 tol=0.000128
Iteration 50 tol=0.000107
Iteration 51 tol=0.000088
Iteration 52 tol=0.000072
Iteration 53 tol=0.000058
Iteration 54 tol=0.000047
Iteration 55 tol=0.000037
Iteration 56 tol=0.000030
Iteration 57 tol=0.000024
Iteration 58 tol=0.000019
Iteration 59 tol=0.000015
Iteration 60 tol=0.000012
Iteration 61 tol=0.000010
Iteration 62 tol=0.000008
Iteration 63 tol=0.000006
Iteration 64 tol=0.000005
Iteration 65 tol=0.000004
Iteration 66 tol=0.000003
Iteration 67 tol=0.000003
Iteration 68 tol=0.000002
Iteration 69 tol=0.000002
Iteration 70 tol=0.000001
Iteration 71 tol=0.000001
Iteration 72 tol=0.000001
Explained variance by the first pronicpal components of PCA:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
0.7874551 0.8498553 0.9053619 0.9284231 0.9443718 0.9596778 0.9682324 0.9741502 0.9794208 0.983909 0.9881928 0.991948 0.9950968 0.9978144 1
null device
1
Package 'mclust' version 5.0.2
Type 'citation("mclust")' for citing this R package in publications.
PSCBS v0.44.0 (2015-02-22) successfully loaded. See ?PSCBS for help.

Attaching package: ‘PSCBS’

The following objects are masked from ‘package:base’:

append, load

R.cache v0.10.0 (2014-06-10) successfully loaded. See ?R.cache for help.
Loading required package: lattice
Loading required package: grid
Loading required package: parallel
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file './Test.stats.txt': No such file or directory
Execution halted

xxqtony 08-24-2015 09:49 AM

I also tried with my own data and finally managed to get the program to run; however, the results are not what I expected. I have CNV regions/amplicons at 4x, but they are predicted as 2x. I wonder if there's anything I missed.
Thanks.

valeu 08-26-2015 04:48 AM

For the test dataset, do you see that './Test.stats.txt' has been created?

For the second dataset, I don't understand what is wrong.

Please, contact me by email.

bioWizz 03-30-2016 05:23 AM

Hi Valeu,

I have 10 samples (8 test and 2 control) on which I am trying to run ONCOCNV v6.4. I configured the ONCOCNV.sh file as per the instructions given, but it throws the following error.

Quote:

Detected 2 control sample(s)
reading 11.bam
sample name: 11
read 100000 reads
read 200000 reads
read 300000 reads
read 400000 reads
read 500000 reads
reading 12.bam
sample name: 12
read 100000 reads
read 200000 reads
read 300000 reads
read 400000 reads
Total target length: 272944
processed 2 controls, 11 12
Illegal division by zero at
/san2/mallya/exome_cnv/anantha/unmapped/oncocnv/ONCOCNV//ONCOCNV_getCounts.v6.4.pl line 466 (#1)
(F) You tried to divide a number by 0. Either something was wrong in
your logic, or you need to put a conditional in to guard against
meaningless input.

Uncaught exception from user code:
Illegal division by zero at /san2/mallya/exome_cnv/anantha/unmapped/oncocnv/ONCOCNV//ONCOCNV_getCounts.v6.4.pl line 466.
at /san2/mallya/exome_cnv/anantha/unmapped/oncocnv/ONCOCNV//ONCOCNV_getCounts.v6.4.pl line 466.

------------------------


--Coordinates are read--


------------------------

Total target length: 0
Detected 8 tumor sample(s)
reading 1.bam
reading 2.bam
reading 3.bam
reading 4.bam
reading 5.bam
reading 6.bam
reading 7.bam
reading 8.bam
Error: The requested bed file (/san2/mallya/exome_cnv/anantha/unmapped/oncocnv/result//target.bed) could not be opened. Exiting!
Any suggestions on how to proceed?

Thanks

valeu 04-25-2016 01:39 AM

I believe something is wrong with your .bed file with regions. Please check the readme.
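
One quick sanity check on a regions file is to sum the interval lengths yourself: the log above reports "Total target length: 0" in the tumor step, which suggests no usable intervals were read. The sketch below is only an illustration. The function name total_target_length is hypothetical, and summing end - start over tab-delimited lines is an assumption about what such a total means, not ONCOCNV's actual code.

```python
def total_target_length(lines):
    """Sum (end - start) over tab-delimited BED lines.

    Lines whose 2nd and 3rd columns are not integers (headers, blank
    lines, space-delimited lines) are silently skipped, so a result of 0
    means no usable interval was found.
    """
    total = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            continue
        total += int(end) - int(start)
    return total


if __name__ == "__main__":
    tab_line = "chr1\t2488068\t2488201\tAMPL223847\t0\tTNFRSF14\n"
    space_line = "chr1 2488068 2488201 AMPL223847 0 TNFRSF14\n"
    print(total_target_length([tab_line]))
    print(total_target_length([space_line]))
```

To run it on a real file, pass an open file handle, e.g. total_target_length(open("my_regions.bed")), where the file name is a placeholder. A total of 0 would point to a missing, empty, or non-tab-delimited file.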

arnoldliao 08-16-2016 01:51 PM

span is too small
 
Has anyone experienced this issue?
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
span is too small

I assume this is because my bed file contains sections where there are no reads? Outputs at https://drive.google.com/file/d/0B4x...ew?usp=sharing

*** rest of stdout ***
Calls: loess -> simpleLoess
Execution halted
Package 'mclust' version 5.2
Type 'citation("mclust")' for citing this R package in publications.
PSCBS v0.61.0 (2016-02-03) successfully loaded. See ?PSCBS for help.

Attaching package: 'PSCBS'

The following objects are masked from 'package:base':

append, load

R.cache v0.12.0 (2015-11-12) successfully loaded. See ?R.cache for help.
Loading required package: lattice
Loading required package: grid
Loading required package: parallel
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '/apps/outputDEEPCNA//Control.stats.Processed.txt': No such file or directory
Execution halted

arnoldliao 08-16-2016 02:08 PM

Perhaps switching out loess (https://stat.ethz.ch/R-manual/R-deve...tml/loess.html)
for rlm (https://stat.ethz.ch/R-manual/R-deve.../html/rlm.html) would work?

sachin 01-20-2017 01:25 AM

Quote:

Originally Posted by arnoldliao (Post 197848)
Anyone experience this issue
Error in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
span is too small

Hi,
I got the same error. It is resolved now; the problem was with the BAM file.

Sachin A

arnoldliao 06-14-2017 08:14 AM

mclust error
 
I figured it out. My bed file contained only chr, start, end, while ONCOCNV needs chr, start, end, name, score, geneName.

Now I'm getting a new error:
Error in if (minFrac < minFractionOfShortOrLongAmplicons & maxFrac < minFractionOfShortOrLongAm
missing value where TRUE/FALSE needed

Any idea where I can start to debug? Use a smaller bed file?

stderr


/outputDEEPCNA//Test.stats.txt was created
-rw-rw-rw- 1 root root 29 Jun 14 06:26 /outputDEEPCNA//Test.stats.txt
creating target.bed
-rw-rw-rw- 1 root root 0 Jun 14 06:26 /outputDEEPCNA//target.bed
creating target.GC.txt
..Oops.. File /outputDEEPCNA//target.fasta is empty!
..It seems that there is not 'chr' prefixes in your reference genome fasta file..
..But no worries! OncoCNV will adjust for it
-rw-rw-rw- 1 root root 10 Jun 14 06:27 /outputDEEPCNA//target.GC.txt
running processControl.R
running processSamples.R

Package 'mclust' version 5.3
Type 'citation("mclust")' for citing this R package in publications.
Error in if (minFrac < minFractionOfShortOrLongAmplicons & maxFrac < minFractionOfShortOrLongAm
missing value where TRUE/FALSE needed
Execution halted
ls: cannot access '/outputDEEPCNA//Control.stats.Processed.txt': No such file or directory
Package 'mclust' version 5.3
Type 'citation("mclust")' for citing this R package in publications.
PSCBS v0.62.0 (2016-11-10) successfully loaded. See ?PSCBS for help.

valeu 07-05-2017 05:55 AM

Does your new (tab-delimited) .bed file satisfy the requirements listed in the OncoCNV manual?

Check formats:
o reads should be given in .BAM format
o amplicon coordinates should be given in .bed format (with or without a header line) and have the amplicon ID in column 4 and the gene symbol in column 6, e.g.: chr1 2488068 2488201 AMPL223847 0 TNFRSF14

It is mandatory to provide gene names in the 6th column.

VERY IMPORTANT

Please make sure that:
- There are no duplicates in the coordinates
- Coordinates are sorted
- Gene names are real gene names, in the sense that the corresponding amplicons fall in the same genomic locus and not on different chromosomes
- Gene names are not the same as amplicon names or IDs, because ONCOCNV assumes there are several amplicons per gene
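
The checks listed above can be automated before launching a run. The following is a minimal sketch in Python, not part of ONCOCNV: the function name check_bed is hypothetical, and the exact sort order the tool expects is an assumption (here: each chromosome in one contiguous block, with start/end non-decreasing inside it).

```python
def check_bed(lines):
    """Return a list of problems found in a 6-column, tab-delimited BED."""
    problems = []
    seen_coords = set()       # (chrom, start, end) already encountered
    finished_chroms = set()   # chromosomes whose block has ended
    gene_chroms = {}          # gene symbol -> set of chromosomes
    amplicon_ids = set()
    current_chrom, prev_pos = None, None
    first_data = True

    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            problems.append(f"line {n}: found {len(fields)} tab-separated columns, need 6")
            first_data = False
            continue
        chrom, start, end, amp_id, _score, gene = fields[:6]
        if not (start.isdigit() and end.isdigit()):
            if first_data:
                first_data = False
                continue  # treat a non-numeric first line as the header
            problems.append(f"line {n}: start/end are not integers")
            continue
        first_data = False

        pos = (int(start), int(end))
        if chrom != current_chrom:
            if chrom in finished_chroms:
                problems.append(f"line {n}: {chrom} appears in more than one block (not sorted)")
            if current_chrom is not None:
                finished_chroms.add(current_chrom)
            current_chrom, prev_pos = chrom, None
        if prev_pos is not None and pos < prev_pos:
            problems.append(f"line {n}: coordinates not sorted within {chrom}")
        prev_pos = pos

        key = (chrom, pos[0], pos[1])
        if key in seen_coords:
            problems.append(f"line {n}: duplicate coordinates {chrom}:{start}-{end}")
        seen_coords.add(key)
        amplicon_ids.add(amp_id)
        gene_chroms.setdefault(gene, set()).add(chrom)

    for gene, chroms in sorted(gene_chroms.items()):
        if len(chroms) > 1:
            problems.append(f"gene {gene} is on several chromosomes: {sorted(chroms)}")
        if gene in amplicon_ids:
            problems.append(f"gene name {gene} is also used as an amplicon ID")
    return problems


if __name__ == "__main__":
    example = [
        "chr1\t2488068\t2488201\tAMPL223847\t0\tTNFRSF14\n",  # example line from the post above
        "chr1\t2488300\t2488400\n",                            # made-up 3-column line
    ]
    for problem in check_bed(example):
        print(problem)
```

An empty list means none of these particular checks failed; it does not guarantee that ONCOCNV will accept the file. To check a file on disk, pass an open handle, e.g. check_bed(open("my_amplicons.bed")), where the file name is a placeholder.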

arnoldliao 07-05-2017 08:20 AM

Thank you
 
Merci for the reply. I got it to work with a correct bed file. I did get many NA values, though. I will email you separately about those issues.

bbz 01-16-2018 06:29 PM

oncoCNV training samples
 
Should the control samples go through the same library prep as the samples on which CNVs are to be called?
If I have an amplicon library and want to call CNVs, should the control samples also be amplicon-sequenced, or can I just use database samples?

Thank you!

aron_iral 02-24-2018 01:03 AM

What does the copy.number column in the summary.txt file represent? That is, what do copy.number values such as 1, 2 and 2.5 mean?

