#41 |
Junior Member
Location: Liege, Belgium Join Date: Jun 2011
Posts: 4
Hello everyone,
I have been trying out Control-FREEC with some test data (exome samples), and I encountered an error when trying to specify a target BED file. Basically, Control-FREEC seems to run fine whether I use a control sample or not (I tried both options), but when I add these lines:

Code:
[target]
captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed

Code:
FREEC v5.9 (Control-FREEC v2.9) : calling copy number alterations and LOH regions using deep-sequencing data
..Using 1 process(es)
..Minimal CNA length (in windows) was set to 4
..consider the sample being male
..breakPointThreshold set to 0.8
..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
..FREEC is not going to output normalized copy number profiles into a BedGraph file. Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Output directory: /home/volatile/swe/2013-01-10/Test-FREEC5
..Directory with files containing chromosome sequences: /home/genmol/genomes/homo_sapiens/hg19/chromosomes
..Sample file: /home/volatile/swe/exomes/exome2.bam
..Sample input format: BAM
..will use this instance of samtools: samtools to read BAM files
..Control file: /home/volatile/swe/exomes/exome1.bam
..Input format for the control file: BAM
..File with chromosome lengths: hg19.len
..Coefficient Of Variation set equal to 0.062
..Note, this coefficient won't be used if "window" is set
..File hg19.len was read
total genome size: 3.09568e+09
..samtools should be installed to be able to read BAM files
read number: 76963934
coefficientOfVariation: 0.062
evaluated window size: 10464
..Starting reading /home/volatile/swe/exomes/exome2.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome2.bam
76963934 lines read..
75080830 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome2.bam_sample.cpn
..Window size: 10464
..Will use hg19.len to calculate RC for the control sample
..File hg19.len was read
..Starting reading /home/volatile/swe/exomes/exome1.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome1.bam
51311982 lines read..
50082356 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome1.bam_control.cpn
..FREEC will take into account only regions from /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Mappability and GC-content won't be used
..Control-FREEC won't use minimal mappability. All windows overlaping capture regions will be considered
..Reading /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Your file must be in .BED format, and it must be sorted
..Reading capture for chromosome 1
..Reading capture for chromosome 2
..Reading capture for chromosome 3
..Reading capture for chromosome 4
..Reading capture for chromosome 5
..Reading capture for chromosome 6
..Reading capture for chromosome 7
..Reading capture for chromosome 8
..Reading capture for chromosome 9
..Reading capture for chromosome 10
..Reading capture for chromosome 11
..Reading capture for chromosome 12
..Reading capture for chromosome 13
..Reading capture for chromosome 14
..Reading capture for chromosome 15
..Reading capture for chromosome 16
..Reading capture for chromosome 17
..Reading capture for chromosome 18
..Reading capture for chromosome 19
..Reading capture for chromosome 20
..Reading capture for chromosome 21
..Reading capture for chromosome 22
..Reading capture for chromosome X
..Reading capture for chromosome Y
file /home/volatile/swe/exomes/TruSeq-for-FREEC.bed is read
..Setting read counts to Zero for all windows outside of capture
..Total size of captured regions 6.18842e+07bp
..processing chromosome 1
..processing chromosome 2
..processing chromosome 3
..processing chromosome 4
..processing chromosome 5
..processing chromosome 6
..processing chromosome 7
..processing chromosome 8
..processing chromosome 9
..processing chromosome 10
..processing chromosome 11
..processing chromosome 12
..processing chromoso..At this point you need to profide window size, option 'window' in group of parameters [general] in your config file
me 13
..processing chromosome 14
..processing chromosome 15
..processing chromosome 16
..processing chromosome 17
..processing chromosome 18
..processing chromosome 19
..processing chromosome 20
..processing chromosome 21
..processing chromosome 22
..processing chromosome X
..processing chromosome Y
..telocenromeric set to 1 since it is a minimal capture region

I formatted my BED file as follows: chr start end (tab-delimited), and it's ordered by chr (chr1, chr2, ... chr22, chrX, chrY), and then by start position.
Am I doing something wrong here?
Thanks in advance.
Regards,
Stephane

PS: Since samtools' pileup function is now deprecated, it's not possible to generate pileup files anymore. Do you plan on supporting BAM or VCF files as input for the BAF calculation function? Or do you know how I can work around this limitation? Thanks.

Last edited by stephwen; 01-10-2013 at 05:08 AM. Reason: added question about BAM or VCF support for BAF calculation
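For anyone who wants to double-check a capture file before running FREEC, here is a minimal Python sketch of the layout described in this post: three tab-separated fields per line, sorted by chromosome (chr1 ... chr22, chrX, chrY) and then by start position. The names `chrom_rank` and `check_bed` are made up for illustration; this is not part of Control-FREEC and only approximates what the tool expects.

```python
def chrom_rank(chrom):
    """Rank chromosomes as chr1..chr22, chrX, chrY; anything else sorts last."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return int(name)
    return {"X": 23, "Y": 24}.get(name, 25)


def check_bed(lines):
    """Return a list of problems found in BED-like lines (chr, start, end)."""
    problems = []
    previous = None  # sort key of the last valid line
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: expected 3 tab-separated fields")
            continue
        chrom, start, end = fields[0], fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: start/end are not integers")
            continue
        if int(start) >= int(end):
            problems.append(f"line {number}: start is not smaller than end")
        key = (chrom_rank(chrom), int(start))
        if previous is not None and key < previous:
            problems.append(f"line {number}: not sorted by chromosome, then start")
        previous = key
    return problems
```

Calling `check_bed(open("capture.bed"))` (the file name is a placeholder) returns an empty list when none of these checks fail.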
#42 |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
You need to define window size (window=1000) and you have to run it with a control dataset when you use the "target" option
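A small Python sketch of that advice, assuming the config file can be parsed as a plain INI file: it warns when a [target] section is present but [general] has no window, or [control] has no mateFile. The function name `check_target_config` is hypothetical, and the check only mirrors the two requirements stated in this post.

```python
import configparser


def check_target_config(text):
    """Return warnings for a Control-FREEC-style config that uses [target]."""
    parser = configparser.ConfigParser(strict=False)
    parser.optionxform = str  # keep key case, e.g. captureRegions, mateFile
    parser.read_string(text)
    warnings = []
    if parser.has_section("target"):
        # has_option returns False when the section itself is missing
        if not parser.has_option("general", "window"):
            warnings.append("[general] window is not set")
        if not parser.has_option("control", "mateFile"):
            warnings.append("[control] mateFile is not set")
    return warnings
```

With the config from post #41 plus the [target] lines, this would report only the missing window, since a control file was already given there.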
#43 |
Member
Location: Melbourne (Victoria) Australia Join Date: Sep 2011
Posts: 30
Hi Valeu,
This is Fernando again. I have re-run FREEC on one of my samples, for which I previously ran a CNA analysis from a SAM file (unsorted, using the FR mateOrientation parameter). The difference this time was that I wanted to run CNA + BAF analyses. To run BAF, I first created a pileup from the sample SAM file and then ran FREEC using exactly the same parameters.

Even though the results look the same graphically (R-generated plots), the CNV text files produced by the two analyses differ slightly. The differences are in the start and end positions (the regions are roughly the same) and in the predicted copy number. Are there any reasons why this could be happening? Which result should be more reliable?

Thanks in advance.
Cheers,
Fernando

Last edited by fjrossello; 01-17-2013 at 07:45 PM. Reason: typo
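One way to quantify such differences is to compare the two CNV files call by call. The Python sketch below assumes each line is tab-separated with chromosome, start, end and an integer copy number in the first four columns (an assumption about the output layout, so check it against your own files); all function names are illustrative.

```python
def parse_cnvs(lines):
    """Parse calls as (chrom, start, end, copy_number) tuples."""
    calls = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip headers or malformed lines instead of failing on them.
        if len(fields) < 4 or not all(f.isdigit() for f in fields[1:4]):
            continue
        calls.append((fields[0], int(fields[1]), int(fields[2]), int(fields[3])))
    return calls


def overlap(a, b):
    """Number of shared bases between two calls (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def compare_calls(first, second):
    """Sort calls of `first` into same, changed, or unmatched relative to `second`."""
    report = {"same": [], "changed": [], "unmatched": []}
    for call in first:
        partners = [other for other in second if overlap(call, other) > 0]
        if not partners:
            report["unmatched"].append(call)
        elif any(other[3] == call[3] for other in partners):
            report["same"].append(call)
        else:
            report["changed"].append(call)
    return report
```

Calls that land in "same" differ at most in their boundaries, while "changed" and "unmatched" are the ones worth inspecting by hand.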
#44 |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Hi Fernando,
I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC keeps only pairs mapped in the correct orientation and with the correct insert size.

Also, in some cases having BAF info can improve predictions (e.g., when the float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region).

Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. The new version also works ~10x faster on an 8-core computer: it can process a 30x genome (with control, BAF, in pileup.gz) in one hour.
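To illustrate the 2.5 example: the Python sketch below assumes the float copy number is ratio × ploidy and treats values within a made-up tolerance of a half-integer as ambiguous. The function name and the tolerance are hypothetical; this only shows why extra evidence such as BAF helps in that case, and it is not FREEC's actual decision rule.

```python
import math


def candidate_copy_numbers(ratio, ploidy=2, tolerance=0.1):
    """Return the integer copy numbers compatible with a normalized ratio."""
    value = ratio * ploidy  # float copy number (assumed relationship)
    lower, upper = math.floor(value), math.ceil(value)
    if lower == upper:
        return [lower]
    # Close to x.5: read depth alone cannot separate the two neighbours.
    if abs(value - (lower + 0.5)) <= tolerance:
        return [lower, upper]
    return [round(value)]
```

For a ratio of 1.25 at ploidy 2, both 2 and 3 copies remain candidates, which is the situation where allele frequencies can tip the balance.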
#45 | |
Member
Location: Melbourne (Victoria) Australia Join Date: Sep 2011
Posts: 30
Quote:

Just to be clear, when you use a pileup file, should the mateOrientation parameter be set to 0? Is that parameter relevant at all when using this format?
Thanks in advance.
Cheers,
Fernando
#46 |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
No, mateOrientation is not relevant when you use pileup. Still, you need to set this parameter to something
#47 |
Member
Location: Melbourne (Victoria) Australia Join Date: Sep 2011
Posts: 30
Hi Valeu,
Sorry to be so insistent on this point. I re-ran Control-FREEC on an mpileup file of one of my samples, with and without the BAF options, and I found a few differences between the two runs.

First, a simple and rather obvious question: if you have a matched control file, does the CNA-only analysis output only the somatic gain/loss regions of the sample? This question arises because the CNA+BAF run outputs a CNVs file that reports genotype information and gain/loss/normal for the predicted copy number. When I filter this file to report only somatic gains/losses and compare the result to the CNA-only output, the results are not quite the same. Is this a fair comparison? Am I missing something that prevents me from understanding these results?

Thanks in advance.
Cheers,
Fernando

PS: Find below the parameters of my config file. As I said, I ran it with and without BAF, i.e., with the [BAF] section commented out.

[general]
chrLenFile = hg19.len
coefficientOfVariation = 0.05
outputDir = ./ch209_cnv_CNA_only
degree = 3
ploidy = 2
samtools = /usr/local/biotools/bin/samtools
sex = XY
chrFiles = /home/fernandr/biotools/references/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
# step = 5000
# window = 20000

[sample]
mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/sample_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR

[control]
mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/control_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR

# [BAF]
#
# SNPfile = /home/fernandr/biotools/references/freec/hg19/hg19_snp131.SingleDiNucl.1based.txt
# minimalCoveragePerPosition = 1
# minimalQualityPerPosition = 0
# shiftInQuality = 33
#48 | |
Junior Member
Location: Portugal Join Date: Jun 2013
Posts: 3
Quote:

I'm having problems finding the most recent version of this. Thanks in advance.
#49 |
Member
Location: Atlanta Join Date: Apr 2011
Posts: 32
First off, I must say FREEC is a great tool for CNV detection in exome-seq data!
I have a few questions about the output files I obtained. I have two sets of _cnv, _ratio and _BAF files. For instance, how is *_mpileup_CNV different from *mpileup_normal_CNV? Also, depending on which file I use, my R plots are very different. Why would this be?
PS: I have paired-end tumor-normal Illumina data from exome sequencing.
Thanks for your help,
Rini
#50 | |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Quote:
#51 | ||
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Quote:
Quote:
Another reason it can be different: imagine you have a region present in 3 copies in the normal and in 9 copies in the tumor. If you don't use BAF, you will get ratio 3 for this region. Since it is 3>1, this region will be called "gain". If you use [BAF], this region will be identified as "gain" for both normal and tumor samples and thus this gain will be called germline. |
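The 3-copies-versus-9-copies example can be written out as a small Python sketch. It is a simplification for illustration only: the function names are made up, ploidy 2 is assumed, and the real program works on segmented read counts and allele frequencies rather than on known copy numbers.

```python
def classify(copies, ploidy=2):
    """Label a copy number relative to the expected ploidy."""
    if copies > ploidy:
        return "gain"
    if copies < ploidy:
        return "loss"
    return "normal"


def call_from_ratio(tumor_copies, normal_copies):
    """Without BAF: only the tumor/normal ratio is visible."""
    ratio = tumor_copies / normal_copies
    if ratio > 1:
        status = "gain"
    elif ratio < 1:
        status = "loss"
    else:
        status = "normal"
    return ratio, status


def call_with_baf(tumor_copies, normal_copies, ploidy=2):
    """With BAF: each sample is called on its own, then the two calls are compared."""
    tumor_status = classify(tumor_copies, ploidy)
    normal_status = classify(normal_copies, ploidy)
    if tumor_status == "normal":
        origin = "none"
    elif tumor_status == normal_status:
        origin = "germline"  # same type of alteration already in the normal
    else:
        origin = "somatic"
    return tumor_status, origin
```

Here `call_from_ratio(9, 3)` gives a ratio of 3.0 and a "gain", while `call_with_baf(9, 3)` labels the same region a germline gain, so filtering the second output for somatic events drops it.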
#52 |
Junior Member
Location: California Join Date: Mar 2011
Posts: 3
Hi all. First, I wanted to say thanks to valeu for taking the time to answer questions on here, I've found some of the advice to be very useful.
Second, I've been having a problem with segmentation faults when trying to use Freec to compute BAF. I had previously been using Freec to analyze exome samples without calculating BAF, and had success calling CNVs from the .bam files. However, when I tried to compute BAF (which required converting the .bams into pileup files, as well as using a file of known SNPs) I ran into some problems. Specifically, the program dies after about 1 second of runtime with the following as the specific error:

line 13: 269410 Segmentation fault

This occurs after the program outputs the following:

..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

Which suggests that it might be something with my SNP file. However, I have omitted this file and still get a segfault when it tries to read my sample pileups. My configuration file is below. Any help on what may be causing this would be appreciated.

[general]
window = 5000
step = 1000
ploidy = 2
samtools = /home/sf062971/programs/samtools-0.1.18/samtools
minCNAlength = 4
BedGraphOutput = TRUE
chrLenFile = /home/sf062971/resources/freec_resources/mm10.len
chrFiles = /data/sf062971/data/reference/chr_files
noisyData = TRUE
printNA=FALSE
maxThreads=6
sex=XX
breakPointType=4
outputDir = 1148T_1205N_V3
contamination = 0.5
contaminationAdjustment = TRUE

[sample]
mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351148/1148_ALIGN_RECAL_V3/1148_EXOME.mpileup
inputFormat = pileup
mateOrientation = 0

[control]
mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351205/1205_ALIGN_RECAL_V3/1205_EXOME.mpileup
inputFormat = pileup
mateOrientation = 0

[BAF]
SNPfile = /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt
minimalCoveragePerPosition = 10

[target]
captureRegions = /home/sf062971/resources/agilent_data/covered_regions_mm10_merged_sorted.bed
#53 | |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Quote:

To create my dbSNP files, I downloaded a file with SNPs from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), from the "Variation and Repeats"/"All SNPs" table. I kept columns 2, 4, 10, 7, 8 and 5, and only entries with "genomic single".

Once you are sure you are using the correct SNPfile, check your pileups. They should look like this (http://samtools.sourceforge.net/pileup.shtml):

seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
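The column selection described in this post can be sketched in Python as follows. The column order 2, 4, 10, 7, 8, 5 is taken from the post; reading "genomic single" as molType in column 11 and class in column 12 is an assumption about the UCSC table layout, so verify it against the header of your download. The function name is made up for illustration.

```python
def ucsc_to_freec_snp(lines):
    """Yield SNP lines built from rows of a UCSC 'All SNPs' table dump."""
    for line in lines:
        if line.startswith("#"):
            continue  # header line of the table dump
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        # Assumed positions: molType = column 11, class = column 12 (1-based).
        mol_type, snp_class = fields[10], fields[11]
        if mol_type != "genomic" or snp_class != "single":
            continue
        # Columns 2, 4, 10, 7, 8 and 5 (1-based), in that order.
        picked = [fields[1], fields[3], fields[9], fields[6], fields[7], fields[4]]
        yield "\t".join(picked)
```

The output rows then have the form chromosome, position, observed alleles, strand, reference base, rs identifier.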
#54 |
Junior Member
Location: California Join Date: Mar 2011
Posts: 3
Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:
chr1 3000568 A/G + T rs29444956
chr1 3000621 A/C + C rs31439779
chr1 3001490 A/C + C rs31521921
chr1 3001579 A/T + A rs30468828
chr1 3001712 C/G + C rs32793997
chr1 3003268 A/G + A rs30748911
chr1 3003414 A/G + A rs31953890
chr1 3003449 C/T + T rs32186899
chr1 3003464 A/G + G rs31079645
chr1 3003508 C/T + C rs32044173

My .pileup files are formatted as follows:

chr1 3216016 T 44 ......................,.........,..,,,...,^].^];=>>?>?<=??>>???>=>?>>??>>??>>>>@=>?<<::=;<<
chr1 3216017 G 45 ......................,.........,..,,,...,..^].=@@ACCC?ACC@ACCC<BAAB5?CCBCCAAABAAB?:?>?=;?><
chr1 3216018 T 47 ......................,.........,..,,,...,...^].^], :9?=>@9==@>>>@>?=><@>=<==>@?>>=?>>?=;<===9;;;<9
chr1 3216019 A 48 .$.....................,.........,..,,,...,....,^]. 99><<??3<??<<???<<;><<<=?=?><<<>?>>=8<<<=6;:9;:<
chr1 3216020 T 47 .....................,.........,..,,,...,....,.:==???<=?>==???===?=4?<?=??===?@>??<??=?>=;;:;<

I believe this is standard .pileup format. Despite appearances, both of the above are actually tab-delimited.

In the event that I can't get BAF calculation to work, what are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
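Since the forum display hides the tabs, a quick sanity check on the actual file may help. The Python sketch below assumes the six-column samtools pileup layout (chromosome, position, reference base, depth, read bases, base qualities) and checks that the number of quality characters equals the depth. The function name is hypothetical, and passing this check does not guarantee that FREEC will accept the file.

```python
def check_pileup_line(line):
    """Return None if the line looks like a 6-column pileup record, else a message."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return f"expected 6 tab-separated fields, found {len(fields)}"
    chrom, pos, ref, depth, bases, quals = fields
    if not (pos.isdigit() and depth.isdigit()):
        return "position and depth must be integers"
    if int(depth) == 0:
        return None  # zero-depth lines carry placeholder columns
    if len(quals) != int(depth):
        return f"depth is {depth} but there are {len(quals)} quality characters"
    return None
```

Running it over the first few thousand lines of each pileup is usually enough to spot a delimiter or truncation problem.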
#55 | |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Quote:
If you disable [BAF] you may get less accurate calls. However, the result should be almost the same as the one you will obtain with [BAF] and noisyData=FALSE. |
#56 |
Junior Member
Location: California Join Date: Mar 2011
Posts: 3
I ended up figuring out what was going on. I had some multiallelic variants in the .snp file that were causing it to fail to load, and my sex variable in the configuration file didn't match up with the actual sample sex which caused problems as well. I ended up dropping the sex argument and using the following general configuration file for my samples:
[general]
window = 8000
step = 2500
samtools = samtools
minCNAlength = 4
BedGraphOutput = TRUE
chrLenFile = NCBIM37_um.fa.len
chrFiles = chrfiles
outputDir = 31208T_31668N_FREEC_V1
printNA = FALSE
maxThreads = 6
ploidy = 2
breakPointType = 4
contaminationAdjustment = TRUE
noisyData = TRUE

[sample]
mateFile = 31208_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0

[control]
mateFile = 31668_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0

[target]
captureRegions = S0276129_Merged_Sorted_Probes.bed

[BAF]
SNPfile = snp128.singlebases.monoalleleic.freec_baf.txt
minimalCoveragePerPosition = 5

If anyone is interested, I also have the commands I used to generate the pileups from the .bams, as well as the script I used to generate a working Mm9 and Mm10 .snp file.
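The poster's own script is not shown in the thread. As a rough idea of what dropping multiallelic entries could look like, here is a Python sketch that keeps only lines whose third column lists exactly two single-base alleles (for example A/G). The column position follows the SNP file excerpt in post #54; the function names are hypothetical.

```python
def is_biallelic_single_base(observed):
    """True for values such as 'A/G': exactly two alleles, each one base."""
    alleles = observed.split("/")
    return len(alleles) == 2 and all(a in ("A", "C", "G", "T") for a in alleles)


def filter_snp_lines(lines):
    """Keep tab-separated SNP lines whose third column is biallelic."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and is_biallelic_single_base(fields[2]):
            kept.append(line)
    return kept
```

Entries such as A/C/G or -/A are removed, which also covers any indels that slipped through an earlier filter.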
#57 |
Senior Member
Location: Hong Kong Join Date: Mar 2010
Posts: 498
Does Control-FREEC allow normal and tumor samples with different coverage? (ExomeCNV doesn't allow different coverage, which is why I ask.)
Also, does it allow estimation of the contamination rate in the tumor sample? (Probably via the LOH route, like ExomeCNV?)
#58 | |
Member
Location: Milan Join Date: May 2013
Posts: 40
Hi,
I am using Control-FREEC with exome sequencing data, and so far I have been able to run it on my normal control/tumor pairs for CNA detection. I would now like to apply it to CNA-LOH detection; however, when I try to run it, it ends with a segmentation fault. I checked the files and everything looks fine. I am attaching the config file below. If anyone has already run into this problem and overcome it, I would appreciate some suggestions.

Quote:
#59 |
Member
Location: Paris Join Date: Sep 2008
Posts: 69
Hi,
The config looks good. Could you please send me the complete command-line output? freec@curie.fr
Thank you
Valentina
#60 | |
Member
Location: Milan Join Date: May 2013
Posts: 40
Dear Valentina,
Please find below the output for the above call. I have made 6 such calls, but all had the same problem. I am attaching the log of the run, which shows the stage at which I get the segmentation fault.

Quote:
Tags |
cna, copy number, loh, whole genome sequencing |