SEQanswers

09-05-2011, 11:32 PM   #1
tez
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 4
Structural variation detection using BreakDancer on Whole Genome SOLiD data

Hello,

I have been struggling for the last few weeks to get BreakDancer to run across some whole genome data. The data was sequenced on SOLiD machines and aligned using Bioscope.

I have been able to get BreakDancer to build a configuration file using the parameters for SOLiD (the -C color space option). The actual command looks like:

bam2cfg.pl -n 1000000 -g -h -C normal.bam tumor.bam > breakdancer.cfg

I am then able to run breakdancer_max using that config file, like so:

breakdancer_max breakdancer.cfg -g output.GBrowse -d fast_q_evidence.o

This command runs.. and runs.. and runs... and finally either runs out of memory or computation time.

The last run I did ran for 100 hours, using 48GB of memory before the job was cancelled for running too long. The output of this was about 6.7 million "detected" structural variations. And it only just got up to chromosome 3!

This leads me to believe it would need 1,000 hours or so of computation time to run fully (about 42 days!), which is not feasible at the moment. At that rate it would also find 67 million SVs, which doesn't quite seem right!
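The estimate above is a straight linear extrapolation. A minimal sketch of that arithmetic follows, assuming the cancelled run had covered roughly 10% of the work; that fraction is only what the 1,000-hour figure implies and is not measured anywhere in the post. The helper name `extrapolate` is made up for illustration.

```python
def extrapolate(observed, fraction_done):
    """Linearly scale a quantity observed on a partial run to a full run."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return observed / fraction_done


hours_so_far = 100     # wall time before the job was cancelled
svs_so_far = 6.7e6     # SVs reported by that point
fraction_done = 0.10   # ASSUMPTION: implied by the 1,000-hour estimate

total_hours = extrapolate(hours_so_far, fraction_done)
total_days = total_hours / 24
total_svs = extrapolate(svs_so_far, fraction_done)

print(f"about {total_hours:.0f} hours ({total_days:.0f} days), "
      f"about {total_svs / 1e6:.0f} million SVs")
```

If the true fraction completed were larger or smaller, both totals would shift proportionally, so the numbers are only as good as that guess.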

Is this in line with anyone else's experience?

The tumor and normal files are 120GB and 180GB each, so I don't expect it to be a fast process, but 40 days seems excessive.

I have also attempted to run Breakdancer in single chromosome mode, but this fails with a segmentation fault immediately.

Has anyone been able to get the single chromosome version to work? Or know why it would segfault?


Thank you.
09-05-2011, 11:37 PM   #2
tez
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 4
I have now also seen that there is a "-r" option for setting the minimum number of read-pairs required to call an SV.

There isn't much mention of this in the manual, but looking through the source code I see it defaults to 2, which would explain the huge number of results, the poor run time and the memory usage.

Does anyone have any experience with this parameter? Our data is supposed to be at ~30x depth. I am now giving it a try at min_read_pair=10, and I'll let you know how it goes.

Cheers
12-12-2011, 06:10 AM   #3
aquinom85
Research Bioinformaticist
 
Location: Boston

Join Date: Dec 2011
Posts: 19
How did things turn out after tweaking that parameter? I'm looking into BreakDancer too, but there is no FAQ and it's rather hard to get a clear picture of the limitations of the software. Do you know if BreakDancer calls samples jointly, or if you have to run it on each of your samples and then cross-validate the results?
12-12-2011, 12:24 PM   #4
tez
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 4
Hello,

The results did not look good at all. Basically it called about 10,000 structural variations in the "normal" sample, and about 1,300 in the "tumour" sample.

The only way I could get these results was to run BreakDancer with the -r 10 option, and then to break each whole genome down into chromosomes and run each chromosome separately. Even then it was still a 3-4 day process, running them all in parallel on a fairly powerful cluster.
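The post does not say how the genomes were split, so the following is only one plausible way to lay out the per-chromosome jobs. It is a sketch that builds (but does not run) the shell commands for one chromosome: `samtools view -b` with a region to extract that chromosome from each indexed BAM, then the same `bam2cfg.pl` and `breakdancer_max` invocations quoted earlier in the thread, with `-r 10` added. The function name `plan_for_chromosome` and all output file names are hypothetical, and the chromosome labels must match whatever naming the BAM headers use.

```python
import shlex


def plan_for_chromosome(chrom, bams, min_read_pairs=10):
    """Return the shell commands for one chromosome as a list of strings.

    Nothing is executed here; the strings are meant to be written into
    one cluster job script per chromosome.
    """
    steps = []
    split_bams = []
    for bam in bams:
        stem = bam[:-4] if bam.endswith(".bam") else bam
        out = f"{stem}.{chrom}.bam"  # hypothetical naming scheme
        # Extract one chromosome; the input BAM needs to be indexed.
        steps.append(shlex.join(["samtools", "view", "-b", "-o", out, bam, chrom]))
        split_bams.append(out)

    cfg = f"breakdancer.{chrom}.cfg"
    # Same bam2cfg.pl flags as in the first post of this thread.
    steps.append(
        shlex.join(["bam2cfg.pl", "-n", "1000000", "-g", "-h", "-C", *split_bams])
        + " > " + shlex.quote(cfg)
    )
    # -r is the minimum read-pair option discussed in post #2.
    steps.append(
        shlex.join(["breakdancer_max", "-r", str(min_read_pairs), cfg])
        + " > " + shlex.quote(f"sv.{chrom}.txt")
    )
    return steps


for chrom in ["chr1", "chr2"]:  # extend to the full chromosome list
    print(f"# job for {chrom}")
    for command in plan_for_chromosome(chrom, ["normal.bam", "tumor.bam"]):
        print(command)
```

Keeping the commands as plain strings makes it easy to inspect them before submitting anything, which seems worthwhile when each job can run for days.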

Looks like the biggest issue is data quality. The alignment / mapping was not done by us, and it looks like it may contain quite a lot of noise. So we are now experimenting with different ways to "clean" up the data.

Cheers
01-12-2012, 01:01 PM   #5
P-Richmond
Member
 
Location: Boulder, Co

Join Date: Oct 2010
Posts: 13
Any luck in "cleaning up the data"? I have a similar problem, but I'm working in S. cerevisiae and keep running across artifacts of the alignments I'm using: read pairs that map to familial genes (genes with very high sequence identity on different chromosomes).

One possible methodology would be to generate reads from a perfect genome, then run through breakdancer and call that the noise model. I have a system in place for this read generation if you are interested in trying that. Then by simply creating an intersect with the calls from your data, you could produce a set that is more likely to be structural variations that aren't simply artifacts of the alignment or the underlying sequence.
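A minimal sketch of the comparison step described above, under one reading of it: calls from the real sample that also show up in the simulated "perfect genome" run are flagged as likely artifacts, and the rest are kept. The `SVCall` record, the `tolerance` of 500 bp and the matching rule (same type, same chromosomes, both breakpoints close together) are all assumptions made for illustration, not anything BreakDancer defines.

```python
from typing import NamedTuple


class SVCall(NamedTuple):
    """Hypothetical minimal record for one SV call."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    sv_type: str


def matches(a, b, tolerance):
    """True if two calls have the same type and nearby breakpoints."""
    return (
        a.sv_type == b.sv_type
        and a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and abs(a.pos1 - b.pos1) <= tolerance
        and abs(a.pos2 - b.pos2) <= tolerance
    )


def split_by_noise(sample_calls, noise_calls, tolerance=500):
    """Split sample calls into (kept, flagged) using the noise-model calls.

    This is a simple quadratic scan; for large call sets the noise calls
    would need to be indexed by chromosome and position first.
    """
    kept, flagged = [], []
    for call in sample_calls:
        if any(matches(call, noise, tolerance) for noise in noise_calls):
            flagged.append(call)
        else:
            kept.append(call)
    return kept, flagged


noise = [SVCall("chrIV", 10_000, "chrXII", 52_000, "CTX")]
sample = [
    SVCall("chrIV", 10_120, "chrXII", 51_900, "CTX"),  # near a noise call
    SVCall("chrII", 80_000, "chrII", 83_500, "DEL"),   # no noise counterpart
]
kept, flagged = split_by_noise(sample, noise)
print(len(kept), "kept;", len(flagged), "flagged as likely alignment artifacts")
```

The flagged list is worth reviewing rather than discarding outright, since a real event can coincide with a region that is also prone to mapping artifacts.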

-Phil
01-20-2012, 06:46 AM   #6
aquinom85
Research Bioinformaticist
 
Location: Boston

Join Date: Dec 2011
Posts: 19
I just ran BreakDancer on one human genome sample and got 29,500 SVs called. In my naive opinion this seems outrageously high, so I think I'll try raising the -r value. Does anyone know what a normal range of SV counts is in a human genome, for comparison? Also, how should the confidence score be interpreted in general?
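As a cheaper first look than a full rerun, an existing call file can be filtered after the fact on its score and supporting-read columns. The sketch below assumes a tab-delimited file with `#`-prefixed header lines, the score in the 9th column and the supporting read count in the 10th; that layout is an assumption to check against your own output, which is why the column indices are parameters. The thresholds are arbitrary placeholders, not recommended values, and this is not equivalent to rerunning with a higher -r, which changes what gets called in the first place.

```python
def filter_calls(lines, min_score=80, min_reads=10, score_col=8, reads_col=9):
    """Keep calls whose score and read support both meet the thresholds.

    Column indices are 0-based and are ASSUMPTIONS about the file layout.
    Header, blank and malformed lines are skipped.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            score = float(fields[score_col])
            reads = int(fields[reads_col])
        except (IndexError, ValueError):
            continue
        if score >= min_score and reads >= min_reads:
            kept.append(fields)
    return kept


# Tiny in-memory example with made-up rows in the assumed layout.
header = "#Chr1\tPos1\tOri1\tChr2\tPos2\tOri2\tType\tSize\tScore\tnum_Reads"
strong = "\t".join(["1", "1000", "10+0-", "1", "5000", "0+10-", "DEL", "4000", "99", "12"])
weak = "\t".join(["1", "9000", "3+0-", "1", "9500", "0+3-", "DEL", "500", "40", "3"])

for fields in filter_calls([header, strong, weak]):
    print("\t".join(fields))
```

With a real file, the same function can be fed an open file handle, for example `filter_calls(open("sv.txt"))`, where `sv.txt` stands in for whatever the output was named.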
Tags
breakdancer, cancer genomics, structural variations
