  • Misunderstanding breakdancer output

    Hello
    I am running breakdancer for my samples and I get some results that I don't understand.
    For example I get these:

    1. scaffold_4 1084242 29+0- scaffold_4 13700966 1+16- DEL 12616730 77 11 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|11 1.992.
    2. scaffold_4 5580548 60+0- scaffold_4 11167464 0+23- DEL 5586874 99 23 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|23 1.88
    3. scaffold_4 6439582 8+0- scaffold_4 10304779 0+8- DEL 3864989 93 8 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|8 1.81
    4. scaffold_4 13059872 40+1- scaffold_4 14799329 0+38- DEL 1694903 99 39 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|39 1.35
    5. scaffold_4 17963169 23+9- scaffold_4 19018791 17+47- DEL 1055818 99 23 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|23 2.16
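For readers following along, a line like the ones above can be split into named fields before trying to interpret it. This is only a minimal Python sketch. The column order (chromosome, position, and orientation for each breakpoint, then SV type, size, score, and number of supporting read pairs) follows the usual BreakDancer output layout as I understand it. The names `SVCall` and `parse_call` are made up for illustration, and anything after column 10 is left unparsed.

```python
from typing import List, NamedTuple


class SVCall(NamedTuple):
    """One BreakDancer call; field names are illustrative, not official."""
    chr1: str
    pos1: int
    orient1: str      # e.g. "60+0-"
    chr2: str
    pos2: int
    orient2: str      # e.g. "0+23-"
    sv_type: str      # DEL, INV, CTX, ...
    size: int
    score: int
    num_reads: int    # column 10: supporting read pairs
    extra: List[str]  # per-BAM counts and any trailing columns, unparsed


def parse_call(line: str) -> SVCall:
    """Split one whitespace-separated output line into named fields."""
    f = line.split()
    if len(f) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(f)}")
    return SVCall(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5],
                  f[6], int(f[7]), int(f[8]), int(f[9]), f[10:])


# Result 2 from the post above, without the list numbering.
line = ("scaffold_4 5580548 60+0- scaffold_4 11167464 0+23- DEL 5586874 99 23 "
        "/scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|23 1.88")
call = parse_call(line)
print(call.sv_type, call.size, call.score, call.num_reads)  # DEL 5586874 99 23
```

With the fields named, it is easier to see that the score and the supporting-pair count are separate columns from the two orientation columns.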

    In the first result I get a deletion of 12616730 bp, which is about half of scaffold_4. From inspecting that BAM file (/scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam) in IGV and Savant Genome Browser, I know that this "deletion" region has full coverage and that no such deletion exists. I guess I am misunderstanding something about the output, because the score is quite high.
    The same happens with the others; for example, result 4 has a score of 99 and 39 reads supporting it.

    I am quite new to this field, so could anyone help me?

    Thanks!!

  • #2
    I believe Breakdancer determines deletions by inspecting the insert sizes of paired reads in an alignment (in the case of deletions, a greater distance between read pairs) rather than looking at the read depth along the alignment. So, in these cases where you aren't seeing any changes in coverage I'm guessing that perhaps something weird has happened in the alignment.

    I think the Breakdancer quality score is derived from the mapping and quality scores of the reads in the BAM file.

    Perhaps the reference genome is not correctly assembled at scaffold 4...?

    You could try changing the upper and lower bounds of the 'acceptable' insert size and see if that helps?
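To make the insert-size idea concrete, here is a rough Python sketch of how pairs with an unusually large insert size could be flagged as deletion-like. It is not BreakDancer's actual code. The bounds here use the median and the median absolute deviation (MAD), which I chose for illustration because they are robust to outliers. The function names, the `n_dev` cutoff, and the insert sizes are all hypothetical.

```python
import statistics


def insert_size_bounds(insert_sizes, n_dev=3.0):
    """Return (lower, upper) bounds as median +/- n_dev * scaled MAD."""
    med = statistics.median(insert_sizes)
    mad = statistics.median([abs(s - med) for s in insert_sizes])
    scaled = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return med - n_dev * scaled, med + n_dev * scaled


def flag_deletion_like(insert_sizes, n_dev=3.0):
    """Indices of pairs whose insert size is above the upper bound."""
    _, upper = insert_size_bounds(insert_sizes, n_dev)
    return [i for i, s in enumerate(insert_sizes) if s > upper]


# Hypothetical insert sizes: a ~300 bp library plus one stretched pair.
inserts = [290, 310, 300, 305, 295, 298, 302, 307, 293, 300, 5000]
lower, upper = insert_size_bounds(inserts)
flagged = flag_deletion_like(inserts)
print(flagged)  # [10]
```

Widening or narrowing the bounds (here via `n_dev`) changes which pairs count as discordant, which is why adjusting the acceptable insert-size range can change the calls.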

    • #3
      Hi, I am also having problems interpreting the output from Breakdancer. I hope someone can help me. The breakdancer mailing list seems to be silent. I am using version 1.1_2011_02_21

      I find it strange that, in most of my results, columns 3 and 6 are equal, yet column 10 shows only a very small number of read pairs compared with the counts in columns 3 and 6. Please see the following examples:

      chr10 15253864 104+105- chr10 15256621 104+105- INV -84 73 4
      chr10 15257851 10+14- chr10 15258198 10+14- INV -170 49 2
      chr10 15561154 11+10- chr10 15561442 11+10- INV -188 51 2
      chr10 15614060 11+14- chr10 15614763 11+14- INV -131 44 2
      chr10 15645913 10+18- chr10 15646639 10+18- INV -127 95 4
      chr10 15649499 31+22- chr10 15650496 31+22- INV -209 99 5
      chr10 15685568 9+15- chr10 15686258 9+15- DEL 426 46 2
      chr10 24831499 103+64- chr10 24833457 103+64- DEL 460 99 7

      Is row 1, for example, saying that a total of 209 reads align at chr10:15253864 and 209 reads align at chr10:15256621, but only 4 pairs support the inversion? Is this because the SV is a short inversion and the same reads are detected at both positions?

      Also, the sixth row (INV, size -209) has a confidence score of 99, but only 5 pairs properly support the inversion while 53 reads align on each side of the SV. Am I interpreting it correctly?

      For the last row, a 460bp deletion with 99 confidence, the same is happening. Only 7 pairs are properly supporting the deletion. Is this correct?
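The arithmetic behind these questions can be checked with a few lines of Python. The sketch below only sums the plus and minus counts in an orientation column and puts the result next to column 10. It assumes, following my reading of the BreakDancer README, that columns 3 and 6 count reads in the anchoring regions by strand, while column 10 counts supporting read pairs. The helper name `orientation_counts` is made up for this example.

```python
import re

# Orientation fields look like "104+105-": plus-strand count, then minus-strand count.
ORIENT_RE = re.compile(r"^(\d+)\+(\d+)-$")


def orientation_counts(field):
    """Return (plus, minus) read counts from a field such as '104+105-'."""
    m = ORIENT_RE.match(field)
    if m is None:
        raise ValueError(f"unexpected orientation field: {field!r}")
    return int(m.group(1)), int(m.group(2))


# Row 1 from the examples above.
line = "chr10 15253864 104+105- chr10 15256621 104+105- INV -84 73 4"
fields = line.split()
plus1, minus1 = orientation_counts(fields[2])
anchor_reads = plus1 + minus1      # reads in the first anchoring region
supporting_pairs = int(fields[9])  # column 10
print(anchor_reads, supporting_pairs)  # 209 4
```

If that reading of the columns is right, the two numbers measure different things, so a large count in columns 3 and 6 next to a small count in column 10 is not contradictory by itself.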



      Any comments would be greatly appreciated.

      Dave

      • #4
        I have encountered the same issue in the output files, using the same breakdancer version 1.1_2011_02_21, and am curious to know why there is such a big discrepancy in the number of supporting reads given in different columns of the output file.

        This is the first time I've run breakdancer, on a human genome sequenced paired-end on a single lane of HiSeq2000, from a patient with an inter-chromosomal translocation identified through cytogenetics. I identified the breakpoint easily because I knew where to look, so I ran breakdancer to see whether we could have identified it without prior cytogenetic work. Thanks to the previous posts on this forum, it only took an hour or so to tweak the cpp and perl files and get it all running, and another 20 min to generate the results (for trans-chromosomal rearrangements only)!

        My results: I got 140-odd CTX calls, including the real translocation, but nothing in the output file suggests that it's any more real than most of the other calls: 3/0 supporting reads on the +/- strands (although there are actually 7 supporting reads in total, 4/3 on +/- strands, displayed nicely in IGV with different colours for read pairs on discordant chromosomes), and a confidence score of 43 (one of the lowest scores generated):

        chrA posXXX 3+0- chrB posYYY 3+0- CTX -364 43 2 myfile.bam|2

        So in addition to the above question, does anyone have an idea why the output file does not include all 7 supporting reads for this translocation - are there any options that I should change? Also, as per previous post, how should we interpret the confidence score?

        Many thanks.
