  • Structural variation detection using BreakDancer on Whole Genome SOLiD data

    Hello,

    I have been struggling for the last few weeks to get BreakDancer to run across some whole-genome data. The data was sequenced on SOLiD machines and aligned using Bioscope.

    I have been able to get BreakDancer to build a configuration file using the parameters for SOLiD (the -C color space option). The actual command looks like:

    bam2cfg.pl -n 1000000 -g -h -C normal.bam tumor.bam > breakdancer.cfg

    I am then able to run breakdancer_max using that config file as follows:

    breakdancer_max breakdancer.cfg -g output.GBrowse -d fast_q_evidence.o

    This command runs.. and runs.. and runs... and finally either runs out of memory or computation time.

    The last run I did ran for 100 hours, using 48GB of memory before the job was cancelled for running too long. The output of this was about 6.7 million "detected" structural variations. And it only just got up to chromosome 3!

    This leads me to believe it would need 1,000 hours or so of computation time to run fully, which is not feasible at the moment (42 days!). At that rate it would also find 67 million SVs, which doesn't quite seem right!
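
    The extrapolation above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the 10% progress fraction is an assumption (it is what the 1,000-hour estimate implies), not a measured value.

```python
# Back-of-envelope extrapolation of the figures quoted in the post.
# ASSUMPTION: the 100-hour run covered roughly 10% of the total work,
# which is what the 1,000-hour estimate implies. It was not measured.
hours_so_far = 100
svs_so_far = 6_700_000
fraction_done = 0.10  # assumed

total_hours = hours_so_far / fraction_done
total_days = total_hours / 24
total_svs = svs_so_far / fraction_done

print(f"estimated run time: {total_hours:.0f} h (~{total_days:.0f} days)")
print(f"estimated SV calls: {total_svs / 1e6:.0f} million")
```

    Changing fraction_done shows how sensitive both estimates are to the guess about how far the run had progressed.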

    Is this in line with anyone else's experience?

    The tumor and normal files are 120GB and 180GB respectively, so I don't expect it to be a fast process, but 40 days seems excessive.

    I have also attempted to run Breakdancer in single chromosome mode, but this fails with a segmentation fault immediately.

    Has anyone been able to get the single chromosome version to work? Or know why it would segfault?


    Thank you.

  • #2
    I have now also seen that there is a "-r" option for setting the minimum number of read-pairs required to call an SV.

    There isn't much mention of this in the manual, but looking through the source code I see it defaults to 2, which would explain the huge number of results, the long run time and the high memory usage.

    Does anyone have any experience with this parameter? Our data is supposed to be at ~30x depth. I am now giving it a try at min_read_pair=10, and I'll let you know how it goes.
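
    For output that has already been produced, a similar cutoff can be applied after the fact. The sketch below is a minimal illustration, assuming a tab-separated output in which column 9 holds the confidence score and column 10 the number of supporting read pairs; that column layout, the function name filter_calls, and the example records are assumptions to check against the header of your own output.

```python
import io

# ASSUMPTION: tab-separated BreakDancer-style output where column 9 is the
# confidence score and column 10 is the number of supporting read pairs.
# Check the header line of your own output before relying on these indices.
SCORE_COL = 8      # 0-based index, assumed
NUM_READS_COL = 9  # 0-based index, assumed


def filter_calls(lines, min_reads=10, min_score=0):
    """Yield data lines whose read-pair support and score pass the cutoffs."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        enough_reads = int(fields[NUM_READS_COL]) >= min_reads
        enough_score = float(fields[SCORE_COL]) >= min_score
        if enough_reads and enough_score:
            yield line


# Made-up example records, for illustration only.
example = io.StringIO(
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads\n"
    "chr1\t1000\t5+0-\tchr1\t5000\t0+5-\tDEL\t4000\t99\t12\n"
    "chr1\t9000\t2+0-\tchr2\t7000\t0+2-\tCTX\t-200\t35\t2\n"
)
kept = list(filter_calls(example, min_reads=10))
print(len(kept), "call(s) kept")
```

    Filtering afterwards only trims the result list. It does not reduce run time or memory, which is why the -r option itself is the thing to test.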

    Cheers



    • #3
      How did things turn out after tweaking that parameter? I'm looking into BreakDancer as well, but there is no FAQ and it's rather hard to get a clear picture of the software's limitations. Do you know whether BreakDancer calls samples jointly, or whether you have to run it on each sample and then cross-validate the results?



      • #4
        Hello,

        The results did not look good at all. Basically it called about 10,000 structural variations in the "normal" sample, and about 1,300 in the "tumour" sample.

        The only way I could get these results was to run BreakDancer with the -r 10 option, and to break each whole genome down into chromosomes and run each chromosome separately. Even then it was still a 3-4 day process, running them all in parallel on a fairly powerful cluster.
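
        One way to organise such a per-chromosome run is to generate the job list with a small script. The sketch below only prints commands and executes nothing. The chromosome names, output file names, and the helper commands_for are hypothetical; it assumes the BAM files are indexed so that samtools view can extract a region, and that breakdancer_max writes its calls to standard output.

```python
import shlex

# ASSUMPTIONS: chromosome names and file names are hypothetical and must match
# the reference used for alignment. Region extraction with `samtools view`
# needs an indexed BAM. breakdancer_max is assumed to write calls to stdout.
chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
samples = {"normal": "normal.bam", "tumor": "tumor.bam"}


def commands_for(chrom, min_read_pairs=10):
    """Return the shell commands for one chromosome, in run order."""
    cmds = []
    split_bams = []
    for name, bam in samples.items():
        out = f"{name}.{chrom}.bam"
        split_bams.append(out)
        # Extract one chromosome from the whole-genome BAM.
        cmds.append(
            f"samtools view -b -o {shlex.quote(out)} "
            f"{shlex.quote(bam)} {shlex.quote(chrom)}"
        )
    cfg = f"breakdancer.{chrom}.cfg"
    # Same bam2cfg.pl options as in the first post (-C for color space).
    cmds.append(f"bam2cfg.pl -n 1000000 -g -h -C {' '.join(split_bams)} > {cfg}")
    # -r is the minimum read-pair option discussed earlier in the thread.
    cmds.append(f"breakdancer_max -r {min_read_pairs} {cfg} > breakdancer.{chrom}.out")
    return cmds


# Dry run: print one chained command line per chromosome.
for chrom in chromosomes:
    print(" && ".join(commands_for(chrom)))
```

        Each printed line could then be submitted as a separate cluster job. Note that splitting by chromosome this way cannot see read pairs whose mates map to different chromosomes.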

        Looks like the biggest issue is data quality. The alignment / mapping was not done by us, and it looks like it may contain quite a lot of noise. So we are now experimenting with different ways to "clean" up the data.

        Cheers



        • #5
          Any luck in "cleaning up the data"? I have a similar problem, but I'm working in S. cerevisiae and keep running into artifacts of the alignments I'm using: read pairs that map to genes from the same family (genes with very high sequence identity on different chromosomes).

          One possible approach would be to generate reads from a perfect genome, run them through BreakDancer, and call the result the noise model. I have a system in place for this read generation if you are interested in trying that. Then, by intersecting that with the calls from your data, you could produce a set that is more likely to be structural variations that aren't simply artifacts of the alignment or the underlying sequence.
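
          The comparison step might look like the sketch below. It is one reading of the idea, not an established method: calls from the real data that also appear in the simulated "noise" run are treated as likely artifacts and set aside, and the rest are kept as candidates. The Call record, the function names, the 500 bp matching tolerance, and the example calls are all hypothetical.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    sv_type: str


def same_event(a: Call, b: Call, tol: int = 500) -> bool:
    """True if both calls share type and chromosomes and both breakpoints
    lie within tol bp of each other (tol is an arbitrary choice)."""
    return (
        a.sv_type == b.sv_type
        and a.chr1 == b.chr1
        and a.chr2 == b.chr2
        and abs(a.pos1 - b.pos1) <= tol
        and abs(a.pos2 - b.pos2) <= tol
    )


def split_by_noise(real, noise, tol=500):
    """Partition real-data calls into (likely_artifacts, candidates)."""
    artifacts, candidates = [], []
    for call in real:
        if any(same_event(call, n, tol) for n in noise):
            artifacts.append(call)   # also seen in the simulated run
        else:
            candidates.append(call)  # not explained by the noise model
    return artifacts, candidates


# Made-up calls, for illustration only.
noise = [Call("chrIV", 10_000, "chrXII", 45_000, "CTX")]
real = [
    Call("chrIV", 10_120, "chrXII", 44_900, "CTX"),
    Call("chrII", 200_000, "chrII", 203_000, "DEL"),
]
artifacts, candidates = split_by_noise(real, noise)
print(len(artifacts), "likely artifact(s);", len(candidates), "candidate(s) kept")
```

          The pairwise comparison is quadratic in the number of calls, which is fine for thousands of calls but would need an index by chromosome for millions.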

          -Phil



          • #6
            I just ran BreakDancer on one human genome sample and got 29,500 SVs called. In my naive opinion this seems outrageously high, so I think I'll try raising the -r value. Does anyone know what a normal number of SVs is in a human genome, for comparison? Also, how should the confidence score be interpreted in general?
