  • #31
    Oh, I see now why I never had this problem. I always remove duplicate paired reads before SV detection (which is what the "d" means).

    I can't recommend keeping duplicates in the files for SV analysis. Part of the robustness of SV detection is based on accurately counting the number of unique paired reads across an SV breakpoint. If you leave duplicates in, those counts will be inflated and you'll potentially end up with additional false positives. A long mate-pair library should not have a large number of paired duplicates anyway, unless the library was unfortunately low in complexity.
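
To illustrate why duplicates distort breakpoint support, here is a minimal Python sketch. It is not taken from Breakway: the `ReadPair` record, `unique_pairs`, and `count_spanning` are hypothetical names, and "duplicate" is assumed to mean two pairs with identical coordinates and strands on both ends.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    # Hypothetical minimal record for one aligned read pair.
    chrom1: str
    pos1: int      # leftmost aligned position of the first end
    strand1: str
    chrom2: str
    pos2: int      # leftmost aligned position of the second end
    strand2: str


def unique_pairs(pairs):
    """Collapse pairs that share coordinates and strands on both ends."""
    seen = set()
    kept = []
    for p in pairs:
        if p not in seen:
            seen.add(p)
            kept.append(p)
    return kept


def count_spanning(pairs, chrom, breakpoint):
    """Count same-chromosome pairs whose ends lie on opposite sides of a position."""
    n = 0
    for p in pairs:
        if p.chrom1 == chrom and p.chrom2 == chrom:
            lo, hi = sorted((p.pos1, p.pos2))
            if lo < breakpoint <= hi:
                n += 1
    return n


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),  # duplicate of the first pair
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),  # duplicate of the first pair
    ReadPair("chr1", 1200, "+", "chr1", 9100, "-"),
]

raw_support = count_spanning(pairs, "chr1", 5000)
dedup_support = count_spanning(unique_pairs(pairs), "chr1", 5000)
print(raw_support, dedup_support)
```

With duplicates left in, the toy breakpoint appears to have twice the support it really has, which is the kind of inflation that can push a spurious event over a read-count threshold.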
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]



    • #32
      Hi Michael,
      Thanks much for your reply! Makes sense!
      Uma



      • #33
        Collaboration?

        Hi MJ,

        I am working on a very large project, on the order of the 1000 Genomes Project, and was wondering if I could collaborate with you and your tool Breakway. What is the status of the tool, as I do not see any post after 2010 in this thread at SEQanswers? Have you kept the tool updated and maintained? Does your Breakway tool work for all variants of SAM/BAM files, such as those processed by the latest versions of SAMtools and Picard? What is the rough estimate of the memory requirement and execution time for the tool to detect SVs? Also, what kinds of structural variations does your tool find: insertions, deletions, duplications, tandem duplications, inversions, novel sequence insertions, CNVs, SNPs, etc.? Have you published your tool?


        Aby



        • #34
          Originally posted by narain
          What is the status of the tool, as I do not see any post after 2010 in this thread at SEQanswers? Have you kept the tool updated and maintained?
          Breakway should work as well now as it did the last time I updated it. As far as I know, it's bug free and works as described.

          Does your Breakway tool work for all variants of SAM/BAM files, such as those processed by the latest versions of SAMtools and Picard?
          As far as I know, yes. I do not think any of the samtools functions that Breakway relies on have been deprecated. That said, the Breakway Compendium (on the site) tells you which version of samtools is guaranteed to work with Breakway if the most recent one does not.

          What is the rough estimate of the memory requirement and execution time for the tool to detect SVs?
          Depends on the amount of data being processed at one time. For a single whole genome at reasonably high depth (30x), it typically takes on the order of a couple of hours to run. It does not require a large amount of RAM.

          Also, what kinds of structural variations does your tool find: insertions, deletions, duplications, tandem duplications, inversions, novel sequence insertions, CNVs, SNPs, etc.?
          Breakway reports structural variation breakpoints, and then determines whether each breakpoint is at the boundary of an interchromosomal translocation, an intrachromosomal insertion or an intrachromosomal deletion. It also provides scores for how likely the event is to be a true positive. It includes its own cross-referencing scripts for comparing to RepeatMasker, segmental duplications and self-chaining events. Finally, it very accurately estimates the precise base position of breakpoints.
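
For readers new to paired-end SV calling, the three event classes named above can be illustrated with a generic insert-size heuristic. This is a minimal sketch of the general paired-end mapping idea, not Breakway's actual algorithm; `classify_pair`, `expected_insert`, and `tolerance` are hypothetical names chosen for illustration.

```python
def classify_pair(chrom1, pos1, chrom2, pos2, expected_insert, tolerance):
    """Label one read pair by comparing its mapped span with the library insert size."""
    if chrom1 != chrom2:
        # Ends on different chromosomes suggest an interchromosomal event.
        return "interchromosomal translocation"
    span = abs(pos2 - pos1)
    if span > expected_insert + tolerance:
        # Ends map farther apart on the reference than the library allows,
        # as expected when the sample lacks sequence the reference has.
        return "intrachromosomal deletion"
    if span < expected_insert - tolerance:
        # Ends map closer together than expected, as when the sample
        # carries extra sequence between them.
        return "intrachromosomal insertion"
    return "concordant"


# Toy mate-pair library: 3 kb inserts, 500 bp tolerance (assumed values).
print(classify_pair("chr1", 100, "chr7", 500, 3000, 500))
print(classify_pair("chr1", 1000, "chr1", 9000, 3000, 500))
print(classify_pair("chr1", 1000, "chr1", 2000, 3000, 500))
```

A real caller clusters many such pairs before reporting an event and scores the cluster; a single pair is only weak evidence.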

          Have you published your tool?
          No, not really. I published it in the U87MG genome sequencing paper. I also published it in my thesis (it is chapter 2) at UCLA. At this time, I do not intend to write a paper on the algorithm in its current incarnation. I feel the Breakway Compendium is adequate explanation of what it does and how to use it.

          I would be more than happy to pursue collaboration!


          • #35
            Thanks, MJ, for the response. Could you cite the paper you mentioned, the one published along with the genome sequencing work? If it's on the internet, can you provide a download link? I'm sure the compendium is good, and I will look at it.

            In the meantime, if you could look at Breakway and see whether it's still stable to work with, or whether it needs any README updates or minor bug fixes, that would be great. There is also a newer tool called Picard for creating SAM/BAM files, which is becoming increasingly popular; when you have time, you might want to check that Breakway works on the BAM files Picard generates. I will keep you updated on my work and bother you again if I run into any trouble using Breakway.

            Aby



            • #36
              The paper is "U87MG Decoded", PLoS Genetics, Jan 2010

              As I said, Breakway is stable. It hasn't been updated in a while because it doesn't require any updating. I'm not sure what you mean by a Picard SAM/BAM creation tool, but Breakway is 100% compatible with SAM files that match the SAM spec, which is what Picard generates.


              • #37
                Thanks, Michael, for your input. I will certainly look into the paper you mentioned. You might consider releasing a more recent version of Breakway that finds not just SVs but also SNPs in the same run on the aligned BAM files.


                Aby



                • #38
                  SNP detection is another type of analysis altogether, really. I'm not a big believer in "reinventing the wheel", and I feel both GATK and samtools do a fantastic job at SNP detection.

                  My recommendation if you want to detect SNPs is either GATK or samtools (or both).

                  As for the future of Breakway, in its current form it is, as I said, complete. I have some thoughts of changing it to be completely self-reliant (no dependence on DNAA/samtools/etc) in the future, but it wouldn't change the fundamental way Breakway works (because its analytical approach is still unique and powerful, I think).


                  • #39
                    Dear MJ

                    Thank you for pointing this out. I agree with you that GATK does a great job of finding SNPs. I am not asking you to reinvent the wheel; I only proposed fitting the existing wheel to your vehicle, so that people don't have to drive two different vehicles to find SNPs and then SVs. Anyway, it was just a suggestion to keep in mind.

                    I will keep you updated once I have tried Breakway on my data. I read in the paper you mentioned that a variant has to be confirmed by at least 4 reads to be called. Can this number be changed through an optional parameter? Most of my sequenced genomes have just 12x coverage, though some are above 18x. Please pardon me if this option is already described in the compendium; I have not looked at it yet.

                    Aby



                    • #40
                      Actually, Breakway doesn't have a 4-read limit. It can report an event based on as few as one read if you tell it to.


                      • #41
                        Hey MJ,

                        Does Breakway detect copy change, inversions, translocations, or rearrangements? Also does it report on the zygosity of the SVs it detects and is it able to jointly call samples?

                        -Mark



                        • #42
                          Hi Mark,

                          Breakway will call five types of SVs:
                          Deletions
                          Insertions
                          Interchromosomal Translocations
                          Intrachromosomal Translocations
                          Inversions

                          Copy changes are typically going to fall under one of these classes. Breakway does call Indels by looking at the copy state of the region between the ends of an SV call. So in that sense, it calls copy changes. However, it does not call CNVs (nor does any other SV algorithm I know of). Rather, CNVs are typically called with another algorithm or program and then cross-referenced with the SV list. You can do this with Breakway using any CNV program (like CNVnator).

                          As for zygosity, the copy number of indels is determined by comparison with the local sequence context. So yes, zygosity can be determined from it--e.g. a deletion in a diploid region is going to result in a homozygous region.

                          Breakway calls one sample at a time, but of course it can be run in parallel on multiple samples, etc. It does not leverage SV calls across samples (though that would be a fantastic addition to this or another program--I am not aware of an SV program that does this).

                          MJ
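
The cross-referencing step described above (SV calls against a separate CNV call set) comes down to interval overlap. Below is a minimal sketch assuming both call sets have already been parsed into simple dictionaries; the field names, the `min_fraction` cutoff, and the functions themselves are hypothetical and are not Breakway's or CNVnator's own formats or scripts.

```python
def overlap_fraction(sv_start, sv_end, cnv_start, cnv_end):
    """Fraction of the SV interval covered by the CNV interval."""
    overlap = min(sv_end, cnv_end) - max(sv_start, cnv_start)
    length = sv_end - sv_start
    if overlap <= 0 or length <= 0:
        return 0.0
    return overlap / length


def annotate_with_cnv(sv_calls, cnv_calls, min_fraction=0.5):
    """Attach to each SV the same-chromosome CNV segments covering enough of it."""
    annotated = []
    for sv in sv_calls:
        hits = [
            cnv for cnv in cnv_calls
            if cnv["chrom"] == sv["chrom"]
            and overlap_fraction(sv["start"], sv["end"],
                                 cnv["start"], cnv["end"]) >= min_fraction
        ]
        annotated.append({**sv, "cnv_hits": hits})
    return annotated


# Toy inputs; real ones would be parsed from each tool's output files.
sv_calls = [
    {"chrom": "chr1", "start": 10_000, "end": 20_000, "type": "deletion"},
    {"chrom": "chr2", "start": 5_000, "end": 6_000, "type": "deletion"},
]
cnv_calls = [
    {"chrom": "chr1", "start": 9_000, "end": 21_000, "copy_number": 1},
    {"chrom": "chr2", "start": 50_000, "end": 60_000, "copy_number": 3},
]

result = annotate_with_cnv(sv_calls, cnv_calls)
for sv in result:
    print(sv["chrom"], sv["type"], [c["copy_number"] for c in sv["cnv_hits"]])
```

In this toy data the chr1 deletion is supported by a copy-number loss over the same region, while the chr2 deletion has no matching CNV segment and would deserve a closer look.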


                          • #43
                            Thanks for getting back to me so quickly and clarifying those points, MJ. FYI, PennCNV and GASV jointly call samples. It seems like a hard-to-find feature, but such tools do exist!

                            -Mark



                            • #44
                              I don't think PennCNV is an SV algorithm, is it? I mean, CNVs are technically SVs, but it's a CNV program, isn't it? Or does it call things like translocations and indels?

                              GASV I haven't used, so I can't comment beyond their paper. It does indeed work with multiple samples (in fact, that's a major benefit to it), but my understanding is that it doesn't empirically determine SV boundaries. I don't really know, though, so definitely look into it!

                              To the question of how to generally approach SV detection, my advice to you based on what we do here at Stanford is to use multiple algorithms, weight their scores based on their strengths and strategies, and combine the results.
                              Last edited by Michael.James.Clark; 12-12-2011, 02:02 PM. Reason: Added some clarification.
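
One way to picture the "multiple algorithms, weighted scores, combined results" advice is the sketch below. It is only an illustration under stated assumptions: the caller names, weights, 500 bp merge window, and the 0-to-1 score scale are all hypothetical, and real tools report scores on different scales that would need normalizing first.

```python
def combine_calls(calls_by_caller, weights, window=500):
    """Merge nearby calls of the same type and sum per-caller weighted scores."""
    flat = []
    for caller, calls in calls_by_caller.items():
        for c in calls:
            flat.append((c["chrom"], c["type"], c["pos"], caller, c["score"]))
    # Sorting groups calls of one type on one chromosome by position.
    flat.sort(key=lambda t: (t[0], t[1], t[2]))

    merged = []
    for chrom, sv_type, pos, caller, score in flat:
        last = merged[-1] if merged else None
        if (last and last["chrom"] == chrom and last["type"] == sv_type
                and pos - last["last_pos"] <= window):
            # Keep only the best score per caller so one tool cannot vote twice.
            last["scores"][caller] = max(last["scores"].get(caller, 0.0), score)
            last["last_pos"] = pos
        else:
            merged.append({"chrom": chrom, "type": sv_type, "pos": pos,
                           "last_pos": pos, "scores": {caller: score}})

    for m in merged:
        m["combined"] = sum(weights.get(c, 0.0) * s for c, s in m["scores"].items())
        del m["last_pos"]
    return sorted(merged, key=lambda m: m["combined"], reverse=True)


# Hypothetical callers and weights, with scores assumed to lie in [0, 1].
calls_by_caller = {
    "caller_a": [
        {"chrom": "chr1", "type": "deletion", "pos": 10_000, "score": 0.9},
        {"chrom": "chr3", "type": "inversion", "pos": 70_000, "score": 0.4},
    ],
    "caller_b": [
        {"chrom": "chr1", "type": "deletion", "pos": 10_200, "score": 0.8},
    ],
}
weights = {"caller_a": 0.6, "caller_b": 0.4}

ranked = combine_calls(calls_by_caller, weights)
for call in ranked:
    print(call["chrom"], call["type"], call["pos"], sorted(call["scores"]))
```

Events reported by more than one caller rise to the top of the ranking, while single-caller events are kept but ranked lower rather than discarded.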


                              • #45
                                I believe you are correct about PennCNV. I just briefly checked what its input is (probe intensity data), and based on that I can't imagine it's able to call anything but copy number variation. My mistake.
