SEQanswers





07-22-2010, 02:56 PM   #21
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Quote:
Originally Posted by Lee Sam View Post
I'm very interested in using this with RNA-Seq. I figure aligning against transcriptome is an issue because it limits the size of the indel that you can have (e.g. no 2-transcript mappings, where one end maps to one transcript and the other maps to a completely different transcript).
True, it would be blind to fusion transcripts if you aligned against a transcriptome.

An alternative might be using all possible fusions as a reference.

I believe TopHat/Cufflinks are very popular for this type of analysis, so you may want to take a look at them!
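A rough sketch of what an "all possible fusions" reference could look like, in Python and for illustration only. The function name, the input layout (gene name mapped to a list of exon sequences) and the flank length are all hypothetical, not part of any tool mentioned in this thread. For every ordered pair of different genes, it joins the 3' end of each exon of the 5' partner to the 5' end of each exon of the 3' partner, giving short junction sequences that reads could be aligned against.

```python
from itertools import permutations


def fusion_junctions(genes, flank):
    """Build a hypothetical 'all possible fusions' junction reference.

    genes: dict mapping gene name -> list of exon sequences (5' to 3').
    For every ordered pair of different genes, join the last `flank` bases
    of each exon of the 5' partner to the first `flank` bases of each exon
    of the 3' partner.
    """
    junctions = {}
    for gene5, gene3 in permutations(genes, 2):
        for i, exon5 in enumerate(genes[gene5]):
            for j, exon3 in enumerate(genes[gene3]):
                name = f"{gene5}_e{i + 1}--{gene3}_e{j + 1}"
                junctions[name] = exon5[-flank:] + exon3[:flank]
    return junctions


toy_genes = {"A": ["AAAACC", "GGGGTT"], "B": ["CCCCAA"]}
for name, seq in fusion_junctions(toy_genes, flank=3).items():
    print(name, seq)
```

The junction count grows with the product of exon counts over every gene pair, so in practice one would probably restrict the candidate partners, for example to genes suggested by discordant read pairs.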
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
07-22-2010, 11:40 PM   #22
megnetz
Junior Member
 
Location: sweden

Join Date: Jul 2010
Posts: 4

I'll try that, thanks!
09-09-2010, 02:56 PM   #23
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

Hi Michael,

Shouldn't Breakway.ReadCluster.pl find both the clusters of reads implicating insertions or deletions (pairs outside the floor-pe-length/ceiling-pe-length range) and those implicating translocations? In a quick test of some Illumina mate-pair data, I only see the intra-chromosomal events and not the inter-chromosomal events, even though a quick parse of the dtranslocations table clearly identifies positive-control events that should meet the -mincs and -maxcs options used.
09-19-2010, 10:14 AM   #24
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Quote:
Originally Posted by Jon_Keats View Post
Hi Michael,

Shouldn't Breakway.ReadCluster.pl find both the clusters of reads implicating insertions or deletions (pairs outside the floor-pe-length/ceiling-pe-length range) and those implicating translocations? In a quick test of some Illumina mate-pair data, I only see the intra-chromosomal events and not the inter-chromosomal events, even though a quick parse of the dtranslocations table clearly identifies positive-control events that should meet the -mincs and -maxcs options used.
Hi Jon,

Sorry for the late reply, I've been otherwise occupied, but I hope I can help solve this with you.

I'm a little unclear on what you're seeing. Are you observing that an event that should pass your parameters is not being reported by Breakway? If so, could you provide the library design (insert size, read length, sequence depth, etc.), the parameters you used for dtranslocations and Breakway, and the segment of the dtranslocations file in question?

Usually when this happens, I find it's because the dtranslocations signal at that spot is too sporadic for the event to meet Breakway's minimum requirements. Those requirements are set by mincs/maxcs, so decreasing mincs and increasing maxcs will often let such events come through.
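A minimal Python sketch of that kind of filtering. It assumes, without confirmation from this thread, that mincs and maxcs are the minimum and maximum number of supporting read pairs a cluster may have, both inclusive. The function name, cluster IDs and counts are made up; the Breakway Compendium is the place to check the real definitions.

```python
def passing_clusters(cluster_sizes, mincs, maxcs):
    """Return IDs of clusters whose support lies within [mincs, maxcs].

    cluster_sizes maps a (made-up) cluster ID to its number of supporting
    discordant read pairs. Treating both bounds as inclusive is an
    assumption about how the real options behave.
    """
    return sorted(cid for cid, n in cluster_sizes.items() if mincs <= n <= maxcs)


clusters = {
    "chr1:1000-chr5:2000": 3,    # sporadic support, below a mincs of 4
    "chr2:500-chr2:9000": 12,
    "chr7:10-chr9:20": 250,      # very high support, above a maxcs of 100
}
print(passing_clusters(clusters, mincs=4, maxcs=100))  # ['chr2:500-chr2:9000']
print(passing_clusters(clusters, mincs=2, maxcs=300))  # all three clusters pass
```

Loosening both bounds, as suggested above, is what lets the sporadic first cluster through in the second call.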
10-28-2010, 06:55 AM   #25
shu
Junior Member
 
Location: India

Join Date: Jan 2010
Posts: 6

Dear Michael,

We are trying to install Breakway. We successfully installed BFAST and SAMtools in the root as suggested, but are having issues installing DNAA. During ./configure it prints "fatal: Not a git repository", and running make gives this error:

make all-recursive
make[1]: Entering directory `/storage/Software/dnaa-0.1.2'
Making all in dkbaseencoding
make[2]: Entering directory `/storage/Software/dnaa-0.1.2/dkbaseencoding'
make[2]: *** No rule to make target `all'. Stop.
make[2]: Leaving directory `/storage/Software/dnaa-0.1.2/dkbaseencoding'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/storage/Software/dnaa-0.1.2'
make: *** [all] Error 2

We are using 64bit Debian.

Could you please help?
10-28-2010, 03:02 PM   #26
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Hm, not sure what's going on. I'm not the author of DNAA, I'm afraid, but I have gotten it to install successfully myself.

I assume you got the tar.gz from here:
http://sourceforge.net/projects/dnaa/files/
and followed the INSTALL instructions.
If you got it through git instead, that may be the problem, and you should try building from the tarball.

A Google search for the error "fatal: Not a git repository" turns up a number of hits that you might want to look at.

For what it's worth, I just installed DNAA from scratch on my Mac Pro here without problems.

Last edited by Michael.James.Clark; 10-29-2010 at 12:59 PM.
10-29-2010, 11:45 AM   #27
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

The most common mistake I see people make is forgetting to index their BAM file. Always index your BAM file! Breakway looks in the same folder as the BAM file for a file with exactly the same name plus ".bai" appended (e.g. sample.bam.bai), which is the standard output of the samtools index program.
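A small pre-flight check along these lines can catch the missing-index problem before a long run. This is a hypothetical Python helper, not part of Breakway. It only encodes the naming rule stated above, so an index named sample.bai (without the .bam) would be reported as missing.

```python
import os


def expected_bai_path(bam_path):
    """Index path under the rule above: same name with '.bai' appended."""
    return bam_path + ".bai"


def check_bam_indexed(bam_path):
    """Fail early if the .bai index is not sitting next to the BAM file."""
    bai = expected_bai_path(bam_path)
    if not os.path.isfile(bai):
        raise FileNotFoundError(
            f"{bai} not found; run 'samtools index {bam_path}' first")
    return bai
```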
11-09-2010, 04:45 PM   #28
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Hi all,

Breakway has been updated to version 0.7.

In this update:

-The breakway.parameters.pl script has been improved. It no longer requires the dbampairedenddist program from DNAA to run. BAM files can now be passed directly to breakway.parameters.pl along with an insert size range, and the program will report the mean, standard deviation and 95% bounds of the insert sizes across the entire BAM file. See The Breakway Compendium at breakway.sf.net for usage.

-A bug in breakway.sharpenedges.pl has been fixed. The --score parameter was supposed to default to zero, but it was actually left undefined, so running the program without this optional parameter would make it crash. The script now runs successfully with the default --score.
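As an illustration of the statistics the new breakway.parameters.pl reports, here is a Python sketch that summarises insert sizes within a user-given range. How the script actually defines its 95% bounds isn't stated in this thread; the sketch assumes a normal approximation (mean ± 1.96 SD) and uses the sample standard deviation. The function name is hypothetical.

```python
import statistics


def insert_size_summary(insert_sizes, floor, ceiling):
    """Mean, sample SD and approximate 95% bounds of insert sizes in [floor, ceiling].

    Insert sizes are taken as absolute values, because SAM reports a negative
    template length for one mate of each pair. The 95% bounds use
    mean +/- 1.96 * SD, which assumes a roughly normal distribution.
    """
    kept = [abs(s) for s in insert_sizes if floor <= abs(s) <= ceiling]
    if len(kept) < 2:
        raise ValueError("need at least two insert sizes inside the range")
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    return mean, sd, (mean - 1.96 * sd, mean + 1.96 * sd)


# 5000 lies outside the range (e.g. a discordant pair) and is excluded.
mean, sd, bounds = insert_size_summary([190, 200, 210, -200, 5000], floor=100, ceiling=1000)
print(mean, round(sd, 2), [round(b, 1) for b in bounds])
```

Restricting to a range before computing the spread keeps the discordant pairs that SV detection is looking for from inflating the standard deviation.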
04-01-2011, 10:58 AM   #29
unagaswamy
Member
 
Location: Texas

Join Date: May 2010
Posts: 13

Hi,
We have the same problem. Breakway chokes at:
Quote:
samtools view -X sample.bam chr1:56-230|egrep "pPUr[0-9]d"| head -5
286_89_1940 pPUr1d chr1 97 16 50M ...
because the string "pPUr1d" is not captured in its entirety by this line in the load_alignments function of breakway.sharpenedges.pl:
Quote:
if($line =~ m/^(\S+)\s+([pPrRuU12]*)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)/)

Is there a particular reason for accepting only strings of type "pPrR1"?
04-01-2011, 11:29 AM   #30
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Thanks for pointing that out! I honestly was at a loss for what this bug was as I hadn't seen the "d" before.

Can I ask what version of Samtools you've been using? I have only tested it against an old version that Breakway was designed to work with (v0.1.6 (r453) as stated in the Breakway script headers).

This quick fix should work. You can change that line to the following:

Code:
if($line =~ m/^(\S+)\s+(.*[pPrRuU12]*.*)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)/)
That way, it should be robust against anything else in the flag field that might get added subsequently.

I have uploaded the program with that bug fix to the Breakway site, so alternatively you can just download and extract it (the only difference is that line!).

http://breakway.sf.net
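The failure can be reproduced outside Perl. The Python sketch below compiles the original pattern (the constructs it uses behave the same in Python's re module for plain ASCII lines) and a hypothetical tighter alternative that takes the flag to be any run of non-whitespace. It then tests both on a made-up, tab-separated line in the samtools view -X layout. Because [pPrRuU12]* can match the empty string, the flag group in the quick fix above accepts any text at all; the \S+ variant at least keeps the flag within a single field.

```python
import re

# Pattern from load_alignments in breakway.sharpenedges.pl, as quoted above.
ORIGINAL = re.compile(
    r"^(\S+)\s+([pPrRuU12]*)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)")

# Hypothetical tighter variant: the flag is any single non-whitespace field.
ROBUST = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+-?([0-9]+)\s+(\w+)")

# Made-up alignment line; the flag carries the duplicate letter "d".
line = "286_89_1940\tpPUr1d\tchr1\t97\t16\t50M\t=\t300\t253\tACGTACGTAC\tIIIIIIIIII"

print(ORIGINAL.match(line))          # None: the trailing "d" blocks the match
print(ROBUST.match(line).groups())   # name, flag, chrom, pos, mate pos, |isize|, seq
```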
04-01-2011, 04:12 PM   #31
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Oh, I see now why I never had this problem. I always remove duplicate paired reads before SV detection (which is what the "d" means).

I can't recommend keeping duplicates in the files for SV analysis. Part of the robustness of SV detection comes from accurately counting the number of unique paired reads across an SV breakpoint. If you leave duplicates in, those counts will be inflated and you'll potentially end up with additional false positives. A long mate-pair library should not have a large number of paired duplicates anyway, unless the library was unfortunately low in complexity.
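For reference, the letters in the samtools view -X flag column correspond to numeric SAM FLAG bits. The Python sketch below assumes the letter order pPuUrR12sfd used by samtools 0.1.x (worth checking against your samtools version) and shows that a flag rendered as "pPUr1d", like the record in the earlier post, carries the duplicate bit 0x400. The function names are hypothetical, and the filter only drops reads that a tool such as Picard MarkDuplicates has already flagged; it does not detect duplicates itself.

```python
# Letters used by the samtools view -X flag column, one per FLAG bit in order
# 0x1, 0x2, ..., 0x400 (assumed from samtools 0.1.x): p paired, P proper pair,
# u unmapped, U mate unmapped, r reverse strand, R mate reverse strand,
# 1 first in pair, 2 second in pair, s secondary, f QC fail, d duplicate.
FLAG_LETTERS = "pPuUrR12sfd"
DUPLICATE = 0x400


def flag_to_letters(flag):
    """Render a numeric SAM FLAG as a samtools -X style letter string."""
    return "".join(ch for bit, ch in enumerate(FLAG_LETTERS) if flag & (1 << bit))


def drop_flagged_duplicates(records):
    """Keep (read_name, flag) records whose FLAG lacks the duplicate bit."""
    return [(name, flag) for name, flag in records if not (flag & DUPLICATE)]


records = [("286_89_1940", 0x1 | 0x2 | 0x8 | 0x10 | 0x40 | 0x400), ("read_b", 99)]
print([flag_to_letters(flag) for _, flag in records])  # ['pPUr1d', 'pPR1']
print(drop_flagged_duplicates(records))                 # [('read_b', 99)]
```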
04-04-2011, 11:38 AM   #32
unagaswamy
Member
 
Location: Texas

Join Date: May 2010
Posts: 13

Hi Michael,
Thanks much for your reply! Makes sense!
Uma
10-15-2011, 10:20 AM   #33
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Collaboration?

Hi MJ

I am working on a very large project, on the order of the 1000 Genomes Project, and was wondering if I could collaborate with you using your tool Breakway. What is the status of the tool, as I do not see any posts after 2010 in this thread on SEQanswers? Have you kept the tool updated and maintained? Does your Breakway tool work with all variants of SAM/BAM files, such as those processed by the latest versions of SAMtools and Picard? What is a rough estimate of the memory requirement and execution time for the tool to detect SVs? Also, what kinds of structural variation does your tool find: insertion, deletion, duplication, tandem duplication, inversion, novel sequence insertion, CNVs, SNPs, etc.? Have you published anything on your tool?


Aby
10-15-2011, 06:07 PM   #34
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Quote:
Originally Posted by narain View Post
What is the status of the tool, as I do not see any posts after 2010 in this thread on SEQanswers? Have you kept the tool updated and maintained?
Breakway should work as well now as it did the last time I updated it. As far as I know, it's bug free and works as described.

Quote:
Does your Breakway tool work with all variants of SAM/BAM files, such as those processed by the latest versions of SAMtools and Picard?
As far as I know, yes. I don't think samtools has deprecated any of the functions Breakway relies on. That said, the Breakway Compendium (on the site) tells you which version of samtools is guaranteed to work with Breakway if the most recent one does not.

Quote:
What is a rough estimate of the memory requirement and execution time for the tool to detect SVs?
It depends on the amount of data being processed at one time. For a single whole genome at reasonably high depth (30x), it typically takes on the order of a couple of hours to run. It does not require a large amount of RAM.

Quote:
Also, what kinds of structural variation does your tool find: insertion, deletion, duplication, tandem duplication, inversion, novel sequence insertion, CNVs, SNPs, etc.?
Breakway reports structural variation breakpoints, and then determines whether each breakpoint is at the boundary of an interchromosomal translocation, an intrachromosomal insertion or an intrachromosomal deletion. It also provides scores for how likely each event is to be a true positive. It includes its own cross-referencing scripts for comparing against RepeatMasker, segmental duplications and self-chaining events. Finally, it very accurately estimates the precise base position of breakpoints.
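The three breakpoint types correspond to the usual paired-end signatures. The Python sketch below shows that textbook logic for a single read pair. It is not Breakway's algorithm (which works on clusters of pairs and scores them); the parameter names only echo the floor-pe-length/ceiling-pe-length options mentioned earlier in the thread, the mate-position span is a simplification of the true insert size, and read orientation is ignored, so inversions are not considered.

```python
def classify_pair(chrom1, pos1, chrom2, pos2, floor_pe_length, ceiling_pe_length):
    """Rough breakpoint-type call for one read pair (hypothetical helper).

    Mates on different chromosomes suggest a translocation. A span longer
    than the library's upper bound suggests sequence deleted from the sample
    genome; a span shorter than the lower bound suggests inserted sequence.
    """
    if chrom1 != chrom2:
        return "interchromosomal translocation"
    span = abs(pos2 - pos1)
    if span > ceiling_pe_length:
        return "intrachromosomal deletion"
    if span < floor_pe_length:
        return "intrachromosomal insertion"
    return "concordant"


print(classify_pair("chr1", 1000, "chr5", 2000, 2000, 4000))
print(classify_pair("chr1", 1000, "chr1", 10000, 2000, 4000))
```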

Quote:
Have you published anything on your tool?
No, not really. I published it in the U87MG genome sequencing paper. I also published it in my thesis (it is chapter 2) at UCLA. At this time, I do not intend to write a paper on the algorithm in its current incarnation. I feel the Breakway Compendium is adequate explanation of what it does and how to use it.

I would be more than happy to pursue collaboration!
10-16-2011, 02:19 AM   #35
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78

Thanks MJ for the response. Could you cite the sequencing paper you mentioned? If it's available online, could you provide a link to download it? The Compendium should be good for sure, and I will look at it.

In the meantime, if you could check whether Breakway is still stable to work with, or whether it needs any updates to the README file or minor bug fixes, that would be great. There is also a newer tool called Picard for creating SAM/BAM files, which is becoming increasingly popular; when you have time, you might want to check that Breakway works on BAM files generated by Picard. I will keep you updated on my work and will bother you again if I run into any trouble using Breakway.

Aby
10-16-2011, 03:09 AM   #36
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

The paper is "U87MG Decoded", PLoS Genetics, Jan 2010

As I said, Breakway is stable. It hasn't been updated in a while because it doesn't require any updating. I'm not sure what you mean by a Picard SAM/BAM creation tool, but Breakway is 100% compatible with SAM files that match the SAM spec, which is what Picard generates.
10-16-2011, 06:32 AM   #37
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78

Thanks Michael for your input. I will certainly look into the paper you mentioned. You might consider putting out a newer version of Breakway that finds not just SVs but also SNPs in the same run over the alignment BAM files.


Aby
10-16-2011, 11:07 AM   #38
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

SNP detection is another type of analysis altogether, really. I'm not a big believer in "reinventing the wheel", and I feel both GATK and samtools do a fantastic job at SNP detection.

My recommendation if you want to detect SNPs is either GATK or samtools (or both).

As for the future of Breakway, in its current form it is, as I said, complete. I have some thoughts of changing it to be completely self-reliant (no dependence on DNAA/samtools/etc) in the future, but it wouldn't change the fundamental way Breakway works (because its analytical approach is still unique and powerful, I think).
10-16-2011, 11:25 AM   #39
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78

Dear MJ

Thank you for pointing this out. I agree with you that GATK does a great job at SNP finding. I am not asking you to reinvent the wheel; I was just proposing to fit the existing wheel onto your vehicle, so that people don't have to drive two different vehicles to find SNPs and then SVs. But anyway, it was just a suggestion to keep in mind.

I will keep you updated once I have tried Breakway on my data. I read in the paper you mentioned that a variant has to be supported by at least 4 reads to be called. Can this number be changed through an optional parameter, as most of my sequenced genomes have only 12x coverage, though some are above 18x? Please pardon me if this option is already described in the Compendium; I have not looked at it yet.

Aby
10-17-2011, 02:47 PM   #40
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213

Actually, Breakway doesn't have a 4-read limit. It can report an event based on as little as one read if you tell it to.
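Whether a low read threshold is needed at 12x depends heavily on library design. A back-of-the-envelope estimate is the number of pairs whose unsequenced gap spans a given breakpoint: with read coverage c, read length L and insert size I, that is c*(I - 2L)/(2L). This assumes uniform coverage and a single fixed insert size; real support will be lower because of mappability and insert-size spread. The function below is a hypothetical Python sketch of that formula, not anything from Breakway.

```python
def expected_spanning_pairs(read_coverage, read_length, insert_size):
    """Expected number of read pairs whose gap between mates covers one breakpoint.

    A genome of size G at read coverage c holds c*G/read_length reads, i.e.
    c*G/(2*read_length) pairs. Each pair leaves an unsequenced gap of
    insert_size - 2*read_length bases, so the expected number of gaps covering
    a given position is c*(insert_size - 2*read_length)/(2*read_length).
    """
    gap = insert_size - 2 * read_length
    if gap <= 0:
        return 0.0  # overlapping or abutting mates leave no gap to span
    return read_coverage * gap / (2 * read_length)


print(expected_spanning_pairs(12, 100, 300))   # 6.0   (2x100 bp reads, 300 bp inserts)
print(expected_spanning_pairs(12, 50, 3000))   # 348.0 (2x50 bp reads, 3 kb mate pairs)
```

At the same 12x read depth, a long mate-pair library gives far more spanning pairs than a short-insert library, so a sensible minimum-support threshold can differ a lot between the two.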

All times are GMT -8.