SEQanswers

Old 01-12-2012, 02:19 AM   #1
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default Control-FREEC: a tool for assessing copy number and allelic content using NGS data

Control-FREEC enables automatic calculation of copy number and allelic content profiles from next generation sequencing data, and consequently predicts regions of genomic alteration such as gains, losses, and loss of heterozygosity (LOH).

Taking as input aligned reads, Control-FREEC constructs copy number and B-allele frequency profiles. The profiles are then normalized, segmented and analyzed in order to assign genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events.

Control-FREEC is able to analyze over-diploid tumor samples and samples contaminated by normal cells.

Low mappability regions can be excluded from the analysis using provided mappability tracks.

Publications:


Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E. (2011) Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 27(2):268-9. PMID: 21081509.

Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. (2011) Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Bioinformatics, 6 Dec 2011 [Epub ahead of print]. PMID: 22155870.

Input for detection of copy number alterations (CNAs):

Aligned single-end, paired-end or mate-pair data in SAM, BAM, SAMtools pileup, Eland, BED, SOAP, arachne, psl (BLAT) and Bowtie formats. Control-FREEC accepts .GZ files.

Input for CNA+LOH detection:

Aligned reads in SAMtools pileup format. The file can be GZipped.

Output:

Regions of gains, losses and LOH; copy number and BAF profiles.

Availability:

http://bioinfo.curie.fr/projects/freec/
Old 02-07-2012, 01:49 PM   #2
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

I have a couple of questions about Control-FREEC:
  1. How does it use the normal sample if provided?
  2. If one provides a normal sample as well as chromosome fasta files, does it account for GC-content bias using the normal sample as well as the fasta files? Or just one of them?
  3. Does Control-FREEC get its bin size from genomic coordinates, or does it use read density?
Old 02-08-2012, 12:20 AM   #3
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi!

Quote:
How does it use the normal sample if provided?
If you do not want to predict allelic status (the [BAF] group of parameters is empty), then the normal sample will be used instead of GC-content to normalize the read count in the tumor sample.
If you want to calculate BAF and allelic status, then the normalization is done using GC-content but CNVs will be annotated as somatic or germline using information from the normal sample.

Quote:
If one provides a normal sample as well as chromosome fasta files, does it account for gc-content bias using the normal sample as well as the fasta files? Or just one of them.
If you only look for CNVs, no: only the normal sample is used. If you look for allelic status, it accounts for both of them.
In the first case, you can force GC-content normalization using the option "forceGCcontentNormalization" (see http://bioinfo-out.curie.fr/projects...al.html#CONFIG)

Quote:
Does Control-FREEC get its bin size from genomic coordinates, or does it use read density?
It uses genomic coordinates.
BTW, if you are not sure about a good value for the window size, use the option "coefficientOfVariation" to let the program evaluate it.
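
To make "bins from genomic coordinates" concrete, here is a minimal Python sketch of fixed-width windows and of normalizing a tumor sample against a control. It is not Control-FREEC's implementation: the function names, the 0-based read positions, and the scaling by total read count are assumptions made for illustration only.

```python
def count_reads_per_window(read_starts, chrom_length, window):
    """Count reads in fixed-width windows defined by genomic coordinates.

    read_starts: iterable of 0-based read start positions on one chromosome.
    Window i covers positions [i * window, (i + 1) * window).
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def ratio_to_control(sample_counts, control_counts):
    """Per-window sample/control ratio, scaled so that equal coverage gives 1.0.

    Windows without control reads are reported as None (no estimate).
    """
    total_sample = sum(sample_counts)
    total_control = sum(control_counts)
    if total_sample == 0 or total_control == 0:
        raise ValueError("both samples need at least one read")
    scale = total_control / total_sample  # assumed library-size correction
    return [
        s * scale / c if c > 0 else None
        for s, c in zip(sample_counts, control_counts)
    ]
```

The window boundaries depend only on the chromosome length and the window size, never on where the reads fall, which is the point of the answer above.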
Old 02-08-2012, 09:27 AM   #4
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Thank you for the reply, that is helpful to know.

Another question,
How would one go about getting or creating a text file with SNPs, such as the one you provide on the website, so as to use the BAF parameters?
Old 02-08-2012, 10:41 AM   #5
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

I downloaded it from UCSC. It should be their standard format.
Old 02-08-2012, 12:37 PM   #6
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Could you point out where you found it? When I look at hg18 downloads all I see are fasta files corresponding to SNPs.
Old 02-08-2012, 01:15 PM   #7
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

I downloaded it through "Tables".
Old 02-09-2012, 01:32 PM   #8
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Could you be more specific? I have looked at the tables and gone through a lot of SNPs, but none of them match the formatting of the file you include on the tutorial page where it gives the description of the BAF parameters. Can you go through it step by step?
Old 02-10-2012, 12:29 PM   #9
aggp11
Member
 
Location: Wisconsin

Join Date: Jun 2011
Posts: 87
Default

Hello valeu,

This is a very nice tool, especially since it can analyze CNVs without needing a control sample. In your opinion, could it be used for certain custom capture experiments? If so, would it be better to provide the lengths of the targeted regions (within chromosomes) instead of the chromosome lengths?

Thanks,
Praful
Old 02-10-2012, 12:34 PM   #10
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

I managed to find out which file you were using. It is under Tables > your assembly > all tracks > SNP130 (if you're using hg18; for hg19 it's 131) > hg18.snp130OrthoPt2Pa2Rm2

You then have to pick the columns you want, which are:
chrom, chromStart, humanObserved, humanAllele, humanStrand

Even if you do this, the file is not the same size as the one provided on the Control-FREEC website, and the columns are in a slightly different order.
Old 02-10-2012, 01:35 PM   #11
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

You are right, indeed, I changed the order of columns. It should be 2, 4, 10, 7, 8 and 5.

Sorry, I should have mentioned it.
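
A column reordering like this can be scripted. The following is a minimal Python sketch, assuming the table dump is tab-separated, that header lines start with "#", and that the numbers 2, 4, 10, 7, 8 and 5 are 1-based column positions in the downloaded table. The function name is hypothetical.

```python
def reorder_columns(lines, order=(2, 4, 10, 7, 8, 5), delimiter="\t"):
    """Yield each data row with its columns picked in the given 1-based order.

    Empty lines and lines starting with '#' (assumed header/comments) are skipped.
    """
    needed = max(order)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split(delimiter)
        if len(fields) < needed:
            raise ValueError(f"expected at least {needed} columns: {line!r}")
        yield delimiter.join(fields[i - 1] for i in order)
```

For a real file, pass an open file handle as `lines` and write the yielded rows to the output file one per line.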
Old 02-27-2012, 08:38 AM   #12
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Hi, I have been using this software for multiple analyses. However, there are some quirks.

Here is a template of my config file:
Code:
[general]

chrLenFile = /projects/copy_num_ana/x07_controlfreec/hg18/res_all_genome.len
coefficientOfVariation = 0.062
ploidy = 2
outputDir = /projects/copy_num_ana/apollo_freec/4.24/RG/RG014/output
chrFiles = /projects/copy_num_ana/x07_controlfreec/hg18_fastas
forceGCcontentNormalization = 2

[sample]

mateFile = /projects/DLBCL/CNV/RG014/tumour/A01414_10_lanes_dupsFlagged.bam
inputFormat = BAM
mateOrientation = FR

[control]

mateFile = /projects/DLBCL/CNV/RG014/normal/A01443_9_lanes_dupsFlagged.bam
inputFormat = BAM
mateOrientation = FR
I am getting strange copy number calls, almost point-like gains in the middle of neutral or otherwise regions. Here is an example of what I am talking about (If you consider the red to be a gain, the blue to be neutral, and the green to be a loss):




Notice how these calls seem to coincide with areas of high GC-content and low read counts. The IGV diagram above corresponds to chromosome 17, specifically the point right above RG034 at 3.98e10bp and a ratio of 2.75, depicted below:


What is causing this? Has it been encountered before? And is there a way to correct this?
Old 02-27-2012, 09:06 AM   #13
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi!

To avoid "point" CNVs, you can use "minCNAlength=4" or more. By default, it is 1.

Does it help?
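
As an illustration of what a minimum CNA length means, here is a small Python sketch that resets short runs of non-neutral calls back to the neutral copy number. It assumes the length is counted in consecutive windows and works on an already called per-window profile; it is not how Control-FREEC applies "minCNAlength" internally, and the function name is hypothetical.

```python
from itertools import groupby


def suppress_short_calls(copy_numbers, neutral=2, min_len=4):
    """Reset runs of non-neutral calls shorter than min_len windows to neutral.

    copy_numbers: per-window copy number calls along one chromosome.
    """
    cleaned = []
    for value, run in groupby(copy_numbers):
        run = list(run)
        if value != neutral and len(run) < min_len:
            # Too short to be trusted: treat it as a point artifact.
            cleaned.extend([neutral] * len(run))
        else:
            cleaned.extend(run)
    return cleaned
```

With min_len=1 nothing is suppressed, which matches the default described above, where even a single-window call is reported.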
Old 02-29-2012, 09:18 AM   #14
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Yes, this helps. I was also wondering if there is a way to smooth out calls around centromeres? It seems to call very high or very short CNVs near centromeres.
Old 02-29-2012, 09:33 AM   #15
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Do you use the options "gemMappabilityFile" + "minMappabilityPerWindow"? Usually this helps, since centromeric regions are not uniquely mappable. See http://bioinfo-out.curie.fr/projects...al.html#CONFIG
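
For anyone scripting many runs, the two options can be added to an existing config with Python's standard configparser. This is a minimal sketch: the helper name is hypothetical, the options are placed in [general] because that is where they appear in configs posted in this thread, and the example values are placeholders. Note that configparser drops comment lines when it writes the file back.

```python
import configparser
import io


def add_mappability_options(config_text, gem_file, min_mappability):
    """Return config text with the mappability options set in [general]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    if not parser.has_section("general"):
        parser.add_section("general")
    parser.set("general", "gemMappabilityFile", gem_file)
    parser.set("general", "minMappabilityPerWindow", str(min_mappability))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Setting `optionxform = str` matters here because configparser lowercases option names by default, which would turn "chrLenFile" into "chrlenfile".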
Old 02-29-2012, 09:35 AM   #16
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

No, I currently do not, to save time. I will try it with the GEM file though. I'll let you know how it goes!
Old 02-29-2012, 11:06 AM   #17
dmacmillan
Member
 
Location: British Columbia

Join Date: Jan 2012
Posts: 49
Default

Oh, I forgot to ask about the Circos format script. I plugged in my *ratios.txt file and got some output, but it does not look like Circos format to me. How am I supposed to use the output of this script?
Old 03-01-2012, 01:41 AM   #18
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

If it looks like:

Code:
hs1 1631000 1731000 3.953692
hs1 1731000 1831000 3.953692
hs1 1831000 1931000 3.953692
hs1 1931000 2031000 3.953692
Circos will understand it. You should add to the Circos config something like:

Code:
<plot>
show = yes
type = scatter
file = yourFile.CNP.circos.txt

glyph            = rectangle
glyph_size       = 4
fill_color       = dgrey
stroke_color     = dgrey
stroke_thickness = 1

min = 0
max = 6
#r0 = 0.7r
#r1 = 0.975r
r0  = 0.76r
r1  = 0.975r

background                  = no
background_color            = vvlgrey
background_stroke_color     = black
background_stroke_thickness = 1

axis           = yes
axis_color     = lgrey
axis_thickness = 1
axis_spacing   = 1

<rules>

<rule>
importance       = 100
condition        = var(value) >= 6
value            = 6
glyph            = rectangle
glyph_size       = 4
fill_color       = red
stroke_color     = red
stroke_thickness = 1
</rule>

<rule>
importance       = 100
condition        = var(value) > 2.5 && var(value) < 6
glyph            = rectangle
glyph_size       = 4
fill_color       = orange
stroke_color     = orange
stroke_thickness = 1
</rule>

<rule>
importance       = 85
condition        = var(value) < 1.5
glyph_size       = 4
fill_color       = blue
stroke_color     = blue
stroke_thickness = 1
</rule>

<rule>
show       = 0
importance = 85
condition  = var(value) < 0
color      = blue
</rule>
</rules>

</plot>
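
For reference, four-column lines of the kind shown at the top of this post ("hs1 start end value") can also be produced with a few lines of Python. This is only a sketch under stated assumptions: the input is taken to be (chromosome, window start, value) tuples with a known window size, and the function name is hypothetical. It is not the conversion script distributed with Control-FREEC.

```python
def to_circos_lines(rows, window, chrom_prefix="hs"):
    """Yield 'hsN start end value' lines for a Circos scatter track.

    rows: iterable of (chromosome, start, value); 'chr' prefixes are stripped.
    """
    for chrom, start, value in rows:
        chrom = str(chrom)
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        start = int(start)
        yield f"{chrom_prefix}{chrom} {start} {start + window} {value}"
```

The "hs" prefix matches the human chromosome names used in the example lines above; a karyotype file with different chromosome IDs would need a different prefix.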

Last edited by valeu; 07-30-2014 at 07:00 AM.
Old 05-18-2012, 11:52 AM   #19
bw.
Member
 
Location: San Francisco, CA

Join Date: Mar 2012
Posts: 21
Default Calling CNA and BAF for exome seq data without a control

Hi, I'm trying to use FREEC to call CNA and BAF in some exome seq samples, but haven't been able to make it work. The tool ran fine on the chr19 test data.

The problems I'm seeing are
- the tool runs but doesn't generate any of the output files except mpileup.txt_sample.cpn
It exits with the error message:
..There is no control sample!!!
You have to use a matching control sample to get adequite results since GC-bias is not the only bias in targeted sequencing

- even the .cpn file doesn't seem right because it tries to call copy number outside the target regions I provided via the TruSeq_exome_targeted_regions.hg19.bed
and not surprisingly calls them all as 0 copy number.


I'm running the tool on a pileup generated using samtools v.0.1.12. I'm using the following config file:


[general]
chrLenFile=hg19.len

window = 3000
step = 1000
ploidy = 2

#breakPointThreshold = -.001

#GCcontentProfile = GC_profile.cnp

intercept=1
minMappabilityPerWindow = 0.7

outputDir = .

sex=XY
breakPointType=4

#degree=3
#coefficientOfVariation = 0.05
#gemMappabilityFile = /hg19/out76_hg19.gem

chrFiles = FREEC_Linux64/chromosomes/

[sample]

mateFile=mpileup.txt
#mateCopyNumberFile=mpileup.txt_sample.cpn

inputFormat = pileup
mateOrientation = FR

[control]

[BAF]

SNPfile = hg19_snp131.SingleDiNucl.1based.txt
minimalCoveragePerPosition = 5


[target]
captureRegions=TruSeq_exome_targeted_regions.hg19.bed
Old 05-18-2012, 12:51 PM   #20
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi, at the moment there is no option to run FREEC on exome-seq data without a control sample. But since you are the second person who wants to do this, I will probably add this option soon.
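
Until then, a config can be checked for this situation before launching a long run. Below is a minimal Python sketch using the standard configparser; the function name is hypothetical, and the check itself (captureRegions set in [target] but no input file in [control]) is inferred from the error message quoted above rather than taken from the program's source. Accepting "mateCopyNumberFile" in [control] is an assumption.

```python
import configparser


def check_targeted_config(config_text):
    """Return a list of problems found in a FREEC-style config (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)

    problems = []
    targeted = parser.has_option("target", "captureRegions")
    # "mateFile" is the option used in this thread; "mateCopyNumberFile" is assumed.
    has_control = any(
        parser.has_option("control", key)
        for key in ("mateFile", "mateCopyNumberFile")
    )
    if targeted and not has_control:
        problems.append("captureRegions is set but [control] has no input file")
    return problems
```

Run on the config from the previous post, this reports one problem, because [control] is present but empty.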
Tags
cna, copy number, loh, whole genome sequencing
