SEQanswers

Old 05-18-2012, 02:05 PM   #21
bw.
Member
 
Location: San Francisco, CA

Join Date: Mar 2012
Posts: 21
Default

That would be really useful!
My backup was to use per-exon mean target coverage generated by Picard HS Metrics, with some added correction for %GC, but I'd rather use your tool.
Thanks,
-Ben
Old 06-07-2012, 11:25 PM   #22
bwt4383
Junior Member
 
Location: Germany

Join Date: Jun 2012
Posts: 2
Default Display of CNV-data in IGV

Hi, I am struggling to display CNV data generated with FREEC in IGV. Can somebody give me some advice on how to display CNVs in an intuitive way?

Thanks!
Old 06-08-2012, 01:30 AM   #23
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi!

I think I should write a script to create a .WIG file from _ratio.txt (http://www.broadinstitute.org/igv/WIG).

If you code in Perl, you could do it yourself and share the script. Otherwise, I will do it when I have time.
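
A minimal sketch of such a conversion, written in Python rather than Perl. It assumes that _ratio.txt is tab-separated with a header containing at least the columns Chromosome, Start and Ratio, that Start is 1-based, and that negative ratios mark windows without a usable value. The function name ratio_to_wig and the example rows are hypothetical, so check them against your own output before relying on this.

```python
import csv

def ratio_to_wig(ratio_lines, span, track_name="FREEC ratio"):
    """Convert rows of a FREEC-style _ratio.txt file into WIG lines.

    ratio_lines: iterable of text lines, header first (assumed layout).
    span: number of bases each value covers; with overlapping windows,
          pass the step rather than the window size to avoid overlaps.
    """
    lines = [f'track type=wiggle_0 name="{track_name}"']
    current_chrom = None
    for row in csv.DictReader(ratio_lines, delimiter="\t"):
        ratio = float(row["Ratio"])
        if ratio < 0:  # assumed marker for "no value in this window"
            continue
        if row["Chromosome"] != current_chrom:
            # start a new variableStep block for each chromosome
            current_chrom = row["Chromosome"]
            lines.append(f"variableStep chrom={current_chrom} span={span}")
        lines.append(f'{row["Start"]} {ratio:g}')
    return lines

# Made-up example rows, for illustration only
example = [
    "Chromosome\tStart\tRatio\tMedianRatio\tCopyNumber\n",
    "1\t1\t1.02\t1.0\t2\n",
    "1\t1001\t-1\t-1\t2\n",
    "2\t1\t1.48\t1.5\t3\n",
]
wig = ratio_to_wig(example, span=1000)
print("\n".join(wig))
```

To convert a real file, pass an open file handle as ratio_lines and write the returned lines to a .wig file that IGV can load.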
Old 06-08-2012, 02:30 AM   #24
bwt4383
Junior Member
 
Location: Germany

Join Date: Jun 2012
Posts: 2
Default

Thanks for your reply! It would be very helpful if you could implement a script for .WIG file creation!
:-)
Old 07-05-2012, 04:18 AM   #25
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Version 5.7 of Control-FREEC can now create BedGraph tracks on demand. Just set

Code:
[general]
BedGraphOutput=TRUE
This format is supported by the UCSC genome browser as well as IGV (http://www.broadinstitute.org/software/igv/bedgraph)
Old 07-17-2012, 07:44 AM   #26
monkeychild
Junior Member
 
Location: Cambridge

Join Date: Jul 2012
Posts: 1
Default

Hi

I am (slowly) trying to integrate Control-FREEC into a pipeline, and I was wondering if there's a way to generate a GC-content profile outside of running the whole thing once. Like maybe one of the already existing scripts could be called outside of freec first?
Old 07-18-2012, 11:06 AM   #27
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi,

I did not foresee an option to create a GC-content profile outside of FREEC. But you could use a trick: something like providing a read file with 10 reads and setting a window size.

I will think about adding a separate function to calculate this profile.
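
For readers who only need to see what a GC-content profile is, here is a minimal sketch that computes the GC fraction per window from a sequence string. It is not FREEC's own code and does not reproduce the format of GC_profile.cnp; the function name gc_profile and the handling of N-only windows (reported as None) are assumptions made for illustration.

```python
def gc_profile(sequence, window, step=None):
    """Return (start, gc_fraction) per window; start is 0-based.

    gc_fraction is None when the window contains no A, C, G or T
    (for example a run of N), since no fraction can be computed.
    """
    step = step or window  # non-overlapping windows unless a step is given
    profile = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window].upper()
        known = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / known if known else None))
    return profile

profile = gc_profile("GGCCAATTGCGCNNNN", window=4)
for start, fraction in profile:
    print(start, fraction)
```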
Old 07-31-2012, 07:06 AM   #28
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi,

You can now generate a GC-content profile independently of FREEC:
http://bioinfo-out.curie.fr/projects...orial.html#GCP
Old 07-31-2012, 11:45 PM   #29
yuhao
Member
 
Location: beijing

Join Date: Jul 2012
Posts: 33
Default An unusual problem in FREEC

I have run into an unusual problem when running FREEC. If I provide a mateCopyNumberFile and then qsub the job, it runs successfully. Normally, though, I don't have a mateCopyNumberFile, and if I don't provide one, FREEC fails with this error message:

sh:samtools:command not found

Error: FREEC was not able to extract reads from /database/chenxi/task/cancerProgram/pipline/bwa/BPMICS4/sortedbam/PD1W_XXL_BPMICS4_NoIndex_modified.bam

Check your parameters: inputFormat and matesOrientation
Use "matesOrientation=0" if you have single end reads
Check the list of possible input formats at http://bioinfo-out.curie.fr/projects...al.html#CONFIG



I can't figure out what is happening here. I am sure my parameters are correct, and I have samtools both in my working directory and on my PATH environment variable.

Can anyone help? I am not able to run FREEC because of this.
Old 08-01-2012, 01:10 AM   #30
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

I think I know what happens. When you provide mateCopyNumberFile, FREEC does not try to read your .BAM file. When there is no mateCopyNumberFile available FREEC tries to read your file by calling "samtools" in the command line. And it seems that "samtools" does not point to anything.

I would suggest two solutions:
  1. In the bash script that you qsub, add something like
    Code:
    export PATH=$PATH:/pathToSamtools
    # check whether samtools exists
    samtools 2>samtools.usage.txt
    # run FREEC
    freec -conf myConfig.txt
  2. Convert your .BAM file to .SAM. Then FREEC will not need samtools.
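
If the job is launched from a Python wrapper instead of a bash script, the same check can be sketched as below. The helper name find_tool is hypothetical and /pathToSamtools is the same placeholder as in the bash example; shutil.which is the standard-library call that looks a command up on a search path.

```python
import os
import shutil

def find_tool(name, extra_dirs=()):
    """Return the full path of an executable, or None if it is not found.

    Searches the current PATH plus any extra directories, which mimics
    'export PATH=$PATH:/pathToSamtools' in the bash example.
    """
    dirs = [d for d in [os.environ.get("PATH", ""), *extra_dirs] if d]
    return shutil.which(name, path=os.pathsep.join(dirs))

samtools = find_tool("samtools", extra_dirs=["/pathToSamtools"])
if samtools is None:
    print("samtools not found: FREEC will not be able to read .BAM input")
else:
    print("samtools found at", samtools)
```

Running this inside the submitted job, rather than in the login shell, shows what the job environment actually sees.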

If you prefer, you can write to me directly to [email protected]
Old 08-02-2012, 10:40 PM   #31
yuhao
Member
 
Location: beijing

Join Date: Jul 2012
Posts: 33
Default

Thank you for your help! I have another question: I want to plot the graph using makeGraph.R, but when I run it, it shows:

null device
1
Error in if (type.convert(args[6])) { :
missing value where TRUE/FALSE needed
Execution halted

Can you give me some help? Thank you!
Old 08-03-2012, 05:14 AM   #32
yuhao
Member
 
Location: beijing

Join Date: Jul 2012
Posts: 33
Default

May I ask what "ratio" means in FREEC? Thanks!
Old 08-03-2012, 06:17 AM   #33
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

"ratio" is actually "normalized read count". Values around 1 correspond to the main ploidy of the sample.

If you use a control sample and you set degree=1, then "ratio" is simply the ratio of the read count in the sample to the read count in the control.
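
A minimal sketch of that idea with made-up read counts. The median centering used here is an assumption standing in for FREEC's actual normalization, and converting a ratio to a copy number by multiplying by the ploidy is likewise an illustration that follows from "values around 1 correspond to the main ploidy", not FREEC's code.

```python
from statistics import median

def normalized_ratio(sample_counts, control_counts):
    """Per-window sample/control ratio, scaled so the median ratio is 1.

    Windows with no control reads yield None.
    """
    raw = [s / c if c > 0 else None
           for s, c in zip(sample_counts, control_counts)]
    center = median(r for r in raw if r is not None)
    return [r / center if r is not None else None for r in raw]

def copy_number(ratio, ploidy=2):
    """Nearest integer copy number for a ratio, given the main ploidy."""
    return round(ratio * ploidy)

# Hypothetical read counts for five windows
sample = [200, 210, 300, 190, 100]
control = [100, 105, 100, 95, 100]
ratios = normalized_ratio(sample, control)
copies = [copy_number(r) for r in ratios]
print(ratios)
print(copies)
```

Here the third window comes out as a gain and the fifth as a loss relative to a diploid baseline.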
Old 08-05-2012, 06:57 PM   #34
yuhao
Member
 
Location: beijing

Join Date: Jul 2012
Posts: 33
Default

I really appreciate your patient help! I have some other questions that I hope you can help with:

The output intervals have some overlaps, e.g., 58000, 8387999, 3, gain and 8386000, 9404999, 5, gain, so 8386000 < 8387999. How can this happen?

What does control database mean here? Normally we just have a test genome and a reference genome.

As far as I know, there are typically two kinds of methods for calling CNVs: segmentation-based and hidden Markov model-based. Is FREEC segmentation-based?

How do we choose the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial, so I care a lot about it.

Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but its results seem very different from FREEC's.

I appreciate your help!
Old 08-06-2012, 12:59 AM   #35
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi Hao,

Quote:
Originally Posted by yuhao View Post
The output intervals have some overlaps, e.g., 58000, 8387999, 3, gain and 8386000, 9404999, 5, gain, so 8386000 < 8387999. How can this happen?
This can happen if you use overlapping windows (e.g., step=1000; window=3000). Most likely the breakpoint occurred in the overlapping area of the two windows (8386000;8386000+window.size) and (8387999-window.size;8387999), i.e. in (8386000;8387999).
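
The effect can be illustrated with a short sketch. The helper names windows and overlap are hypothetical, and window coordinates are 0-based and inclusive here purely for the example. With window=3000 and step=1000, neighbouring windows share 2000 positions, which is also the size of the overlap between the two reported segments.

```python
def windows(chrom_length, window, step):
    """Inclusive (start, end) windows of a fixed size, advanced by step."""
    return [(s, s + window - 1)
            for s in range(0, chrom_length - window + 1, step)]

def overlap(a, b):
    """Inclusive overlap of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

w = windows(10000, window=3000, step=1000)
print(w[0], w[1], overlap(w[0], w[1]))

# The two segments from the question overlap by window - step positions
shared = overlap((58000, 8387999), (8386000, 9404999))
print(shared, shared[1] - shared[0] + 1)
```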

Quote:
Originally Posted by yuhao View Post
What does control database mean here? Normally we just have a test genome and a reference genome.
If you analyze a cancer sample, you are interested in somatic gains and losses. In this case you use the patient's normal DNA (e.g. from blood) as a control.

Quote:
Originally Posted by yuhao View Post
As far as I know, there are typically two kinds of methods for calling CNVs: segmentation-based and hidden Markov model-based. Is FREEC segmentation-based?
The method has been published:

Pubmed links

Both papers are in open access. Have a look!

FREEC uses Lasso-based segmentation.

Quote:
Originally Posted by yuhao View Post
How do we choose the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial, so I care a lot about it.
Window size can be determined automatically if you use the parameter "coefficient of variation". See the Supplementary Methods of the first publication.

Using "step" will help to improve sensitivity and get prettier graphs, but it can be time consuming.

One of the most important parameters is "breakpoint threshold" (positive, default 0.8). Use smaller values to get more segments, if by eye you see that segmentation was not sensitive enough.

Quote:
Originally Posted by yuhao View Post
Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but its results seem very different from FREEC's.
It is better to put this question to the community. You need to be more precise about your data: whether you have paired-end reads, your coverage, whether it is human data, a normal individual or a cancer patient, whether you have a control sample, etc.
Old 08-09-2012, 03:13 AM   #36
yuhao
Member
 
Location: beijing

Join Date: Jul 2012
Posts: 33
Default

Hi, valeu,

I currently have data from two human cancer cell samples (the same cancer), with coverage depths of about 33 and 39, and per-base depth statistics. In this case, what is the best software for CNV detection? I ran FREEC with window=3000 and step=1000, leaving the other parameters as in the test config file provided on the website. My problem now is how to view the CNVs and how to compare the two results. Instead of listing all the CNVs with their type, start and end positions, and copy number, what other statistics are usually used to analyze CNVs?

The CNVs detected in the two samples have nothing in common: the breakpoints differ and the copy numbers differ. It seems strange that two cancer cell samples of the same cancer have completely different CNVs. Could something be wrong here?

Thank you !
Old 08-09-2012, 03:29 AM   #37
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi Hao,

You know, two cell lines for the same type of cancer can be very different, especially for "non-copy-number" tumors.

But even for "copy-number" tumors, such as neuroblastoma, CNA regions can be different. See, for example, the sequencing data for neuroblastoma samples in the supplementary figures of Molenaar et al., 2012.
Old 12-20-2012, 01:51 PM   #38
fjrossello
Member
 
Location: Melbourne (Victoria) Australia

Join Date: Sep 2011
Posts: 30
Default

Hi Valeu,

I am using Control-FREEC to detect CNV and LOH in normal vs tumor samples (low-pass whole genome).
I had no problems running it at all. However, I would like to ask you a couple of questions about the output files and the plotting process.
First, when I run CNV + LOH using SAM pileups, apart from creating the standard _CNVs, _ratio.txt, _BAF.txt, _sample.cnp, _control.cnp and GC_profile.cnp output files, it also generates three extra files with the suffixes _normal_CNVs, _normal_ratio.txt and _normal_BAF.txt. Are they the output obtained when CNV and LOH were calculated on the control sample using the GC_profile.cnp?
Second, even though it works flawlessly for the CNV ratio data, I cannot get the makeGraph.R script to plot the LOH _BAF.txt file.

I used the following line:

cat /usr/local/biotools/freec/scripts/makeGraph.R | R --slave --args 2 sample_bwa_wg.mpileup_ratio.txt sample_bwa_wg.mpileup_BAF.txt

Any idea why this is happening?

Thanks in advance.

Cheers,

Fernando

Last edited by fjrossello; 12-20-2012 at 01:52 PM. Reason: Typo
Old 12-21-2012, 01:55 AM   #39
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi Fernando,

Quote:
Are they the output obtained when CNV and LOH were calculated on the control sample using the GC_profile.cnp?
Yes, you are right.

Quote:
Any idea why this is happening?
I recently updated makeGraph.R. Can you download the latest version from the site and see if it produces the same error?

What does it print to the command line?
Old 12-21-2012, 03:05 PM   #40
fjrossello
Member
 
Location: Melbourne (Victoria) Australia

Join Date: Sep 2011
Posts: 30
Default

Hi Valeu,
Thanks for your explanation. Regarding the R plots, I downloaded the latest makeGraph.R and it works perfectly.
Cheers,
Fernando

Tags
cna, copy number, loh, whole genome sequencing
