SEQanswers

Old 04-28-2010, 03:32 AM   #1
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default Breakway: Identify Structural Variations in Genomic Data

I would like to announce the release of Breakway, a program for identifying structural variations in genomic data!

http://breakway.sourceforge.net

Breakway is a suite of programs (written in Perl) that take aligned genomic data and report structural variation breakpoints. Features include:
  • Takes in BAM formatted input, the current standard for genomic alignments.
  • Compatible with standard output from major alignment algorithms such as BFAST, BWA, MAQ, et cetera.
  • Capable of analyzing data from any major platform--Solexa, SOLiD, 454, et cetera.
  • Empirically identifies structural variation breakpoints.
  • Highly specific analysis generates very few false positives.
  • Includes a suite of downstream tools for annotating identified breakpoints and reducing false positives.
  • Intuitive output tells you the type of event (INT, DEL, or INS), scores, inversion status, and more.

I've also designed Breakway to be compatible with pipelines: it can be plugged into your genome analysis pipeline to generate a Breakway report automatically.

Development of Breakway started during analysis of the U87MG whole genome sequence and continued throughout subsequent genome sequencing projects in the Stanley F. Nelson Lab at UCLA. Since that first project, Breakway has become significantly more powerful, and I feel it has evolved (through concerted effort!) into something that the community would benefit from.

I hope that Breakway can help others easily identify structural variation breakpoints in their genomic data. Please try it out!

Last edited by Michael.James.Clark; 04-28-2010 at 10:39 AM.
Old 04-28-2010, 10:38 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Note, the URL should be http://breakway.sourceforge.net
Old 04-28-2010, 10:40 AM   #3
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Oh, thanks! Serves me right for posting at 3:30AM right after I finished setting up the webpage.
Old 04-28-2010, 01:11 PM   #4
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Nice-looking application. Do you think it can be used on mRNA-seq and exon capture datasets, or just on whole genome sequencing?
Old 04-28-2010, 08:31 PM   #5
townway
Member
 
Location: Rockville

Join Date: May 2009
Posts: 40
Default

I'd like to try Breakway, but it first needs both BFAST and DNAA in the path. When I install DNAA, I run into a problem. Could you help me fix it?
The error looks like this:

$ make
make all-recursive
make[1]: Entering directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1'
Making all in dkbaseencoding
make[2]: Entering directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1/dkbaseencoding'
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -Wall -g -O2 -pthread -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -m64 -MT RGIndex.o -MD -MP -MF ".deps/RGIndex.Tpo" -c -o RGIndex.o `test -f '../bfast/bfast/RGIndex.c' || echo './'`../bfast/bfast/RGIndex.c; \
then mv -f ".deps/RGIndex.Tpo" ".deps/RGIndex.Po"; else rm -f ".deps/RGIndex.Tpo"; exit 1; fi
../bfast/bfast/RGIndex.c:20:26: error: RGIndexExons.h: No such file or directory
make[2]: *** [RGIndex.o] Error 1
make[2]: Leaving directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1/dkbaseencoding'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1'
make: *** [all] Error 2
Old 04-28-2010, 09:51 PM   #6
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by townway View Post
I'd like to try Breakway, but it first needs both BFAST and DNAA in the path. When I install DNAA, I run into a problem. Could you help me fix it?
The error looks like this:

$ make
make all-recursive
make[1]: Entering directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1'
Making all in dkbaseencoding
make[2]: Entering directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1/dkbaseencoding'
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -Wall -g -O2 -pthread -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -m64 -MT RGIndex.o -MD -MP -MF ".deps/RGIndex.Tpo" -c -o RGIndex.o `test -f '../bfast/bfast/RGIndex.c' || echo './'`../bfast/bfast/RGIndex.c; \
then mv -f ".deps/RGIndex.Tpo" ".deps/RGIndex.Po"; else rm -f ".deps/RGIndex.Tpo"; exit 1; fi
../bfast/bfast/RGIndex.c:20:26: error: RGIndexExons.h: No such file or directory
make[2]: *** [RGIndex.o] Error 1
make[2]: Leaving directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1/dkbaseencoding'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gs1/users/tangwei/dnaa-0.1.1/dnaa-0.1.1'
make: *** [all] Error 2
That's my fault. I was in the middle of trying to put a tarball up for distribution and it was not being created correctly. Please try again.

Nils

Last edited by nilshomer; 04-28-2010 at 09:51 PM. Reason: esl
Old 04-28-2010, 09:59 PM   #7
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Quote:
Originally Posted by Jon_Keats View Post
Nice-looking application. Do you think it can be used on mRNA-seq and exon capture datasets, or just on whole genome sequencing?
While I haven't tested it on such datasets, it ought to work on them. The key will be in the reference genome used.

Breakway functions by looking for clusters of aberrantly spaced paired reads, so the key is to have an appropriate reference genome for it to compare to.
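
To illustrate the general idea of "clusters of aberrantly spaced paired reads", here is a minimal Python sketch. It is not Breakway's actual algorithm or data model: the `Pair` record, the function name, and every threshold (`expected`, `tolerance`, `window`, `min_support`) are assumptions chosen for illustration only.

```python
from collections import namedtuple

# Hypothetical minimal record for one aligned read pair (not Breakway's format).
Pair = namedtuple("Pair", ["chrom", "left_pos", "right_pos"])


def find_discordant_clusters(pairs, expected, tolerance, window, min_support):
    """Group pairs whose spacing deviates from the expected distance.

    A pair counts as aberrant when |(right_pos - left_pos) - expected| exceeds
    `tolerance`. Aberrant pairs on the same chromosome whose left positions lie
    within `window` bases of the previous pair are merged into one cluster.
    Clusters with fewer than `min_support` pairs are dropped.
    """
    aberrant = sorted(
        (p for p in pairs
         if abs((p.right_pos - p.left_pos) - expected) > tolerance),
        key=lambda p: (p.chrom, p.left_pos),
    )
    clusters, current = [], []
    for p in aberrant:
        starts_new_cluster = current and (
            p.chrom != current[-1].chrom
            or p.left_pos - current[-1].left_pos > window
        )
        if starts_new_cluster:
            if len(current) >= min_support:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_support:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    demo = [
        Pair("chr1", 1000, 1300),  # spacing as expected
        Pair("chr1", 5000, 9000),  # much too far apart
        Pair("chr1", 5050, 9040),
        Pair("chr1", 5100, 9100),
        Pair("chr2", 200, 4000),   # aberrant, but only one supporting pair
    ]
    for cluster in find_discordant_clusters(demo, 300, 100, 500, 3):
        print(cluster[0].chrom, cluster[0].left_pos, cluster[-1].right_pos, len(cluster))
```

Requiring several supporting pairs per cluster is one common way to keep single mis-mapped pairs from being reported as events; the real tool may score clusters quite differently.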

For exon capture, it should work with the normal reference genome just as well as it will with whole genomes.

For RNA-seq (I'm not an expert, so I welcome other suggestions), the transcriptome will probably be the best reference to use.
Old 04-28-2010, 10:04 PM   #8
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Quote:
Originally Posted by nilshomer View Post
That's my fault. I was in the middle of trying to put a tarball up for distribution and it was not being created correctly. Please try again.

Nils
Thanks, Nils.

townway, I just successfully installed DNAA from the current tarball on Sourceforge without any problems following the directions in the INSTALL file, so just try again and hopefully it'll work for you.
Old 04-29-2010, 06:38 PM   #9
orcy
Junior Member
 
Location: Brisbane

Join Date: Jan 2010
Posts: 8
Default

Does this work with mate pair data generated by the SOLiD platform? All I see in the manual are references to paired-end data, and the DNAA manual seems a little sparse on handling mate pair data too.

cheers
Old 04-29-2010, 06:44 PM   #10
orcy
Junior Member
 
Location: Brisbane

Join Date: Jan 2010
Posts: 8
Default

Sorry, I'm the idiot. Just found it after a closer look.
Old 04-29-2010, 08:10 PM   #11
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Breakway should work with any paired data--paired-end or mate pair or even split long reads.
Old 05-05-2010, 12:24 PM   #12
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Hi all,
A significant bug fix was just implemented such that breakway.run.pl will now function properly. Please update to Breakway 0.5.1!

Please let me know if you find any more!
MJ
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]

Last edited by Michael.James.Clark; 05-05-2010 at 12:27 PM.
Old 05-10-2010, 11:22 PM   #13
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

A request was made for a filtering script that cross-checks the Breakway file being created against another Breakway file to find events present in both. This is particularly useful for two things (and maybe more):

1) Comparing a tumor with its germline genome when both have been aligned to the same reference. The germline will often contain variants relative to the reference, since the reference is unrelated. Because the tumor is derived from the germline, it is expected to contain those same variants unless a mutation has occurred, so filtering them out should allow one to identify tumor-specific mutations.

2) Removing native events, i.e. those detected in the reference itself, from the genome in question. Some structural events (for example, segmental duplications) can be detected in the reference, so they may be worth marking in the sequenced genome.

If you want to use this function, please go to the Breakway website and download Breakway 0.6. The script is in the scripts folder and is called "breakway.bwfilter.pl". You can also use the usual breakway.run.pl with the --bwfile option and it will work.
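
The cross-check described above can be pictured with a short Python sketch. This is not the code in breakway.bwfilter.pl, and it does not read Breakway's real file format: the `Event` record, the `slop` parameter, and the overlap rule are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical event record; Breakway's real output columns may differ.
Event = namedtuple("Event", ["chrom", "start", "end", "kind"])


def overlaps(a, b, slop=0):
    """True if two events share chromosome and type and lie within `slop` bases."""
    return (
        a.chrom == b.chrom
        and a.kind == b.kind
        and a.start <= b.end + slop
        and b.start <= a.end + slop
    )


def split_shared_and_unique(sample_events, other_events, slop=100):
    """Partition sample events into those also seen in the other file and the rest."""
    shared, unique = [], []
    for event in sample_events:
        if any(overlaps(event, other, slop) for other in other_events):
            shared.append(event)
        else:
            unique.append(event)
    return shared, unique


if __name__ == "__main__":
    tumor = [Event("chr1", 1000, 2000, "DEL"), Event("chr2", 500, 510, "INS")]
    germline = [Event("chr1", 1050, 1990, "DEL")]
    shared, unique = split_shared_and_unique(tumor, germline)
    print("shared with germline:", shared)
    print("tumor-specific candidates:", unique)
```

In the tumor/germline case the `unique` list holds the candidate tumor-specific events; in the reference case the `shared` list holds the events worth marking as native.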

Last edited by Michael.James.Clark; 05-10-2010 at 11:24 PM.
Old 06-24-2010, 06:57 PM   #14
orcy
Junior Member
 
Location: Brisbane

Join Date: Jan 2010
Posts: 8
Default

I'm getting errors on BAM files that have string flag fields instead of numerical flag fields.


i.e., a read starting with this

Code:
1155_400_505   pP1     chr1    2571    255     8M6D17M =
results in a "problem processing reads. See reads file:" error

Is this a known bug, or something that can be worked around?

cheers
Old 06-24-2010, 07:05 PM   #15
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by orcy View Post
I'm getting errors on BAM files that have string flag fields instead of numerical flag fields.


i.e., a read starting with this

Code:
1155_400_505   pP1     chr1    2571    255     8M6D17M =
results in a "problem processing reads. See reads file:" error

Is this a known bug, or something that can be worked around?

cheers
The string flag field is not up to the SAM spec, and is only meant for viewing. Try using the numerical flag field since your usage is currently non-standard.
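
For readers who only have output with string flags, here is a hedged Python sketch of converting them back to numbers. It assumes the one-character-per-bit table `pPuUrR12sfd` that older samtools releases used for string flags (to the best of my recollection, printed by `samtools view -X`); verify that against your samtools version before relying on it. The function name is made up for this example.

```python
# Assumed character table: position i in the string stands for bit (1 << i).
# p=paired, P=proper pair, u=unmapped, U=mate unmapped, r=reverse,
# R=mate reverse, 1=first in pair, 2=second in pair, s=secondary,
# f=QC fail, d=duplicate.
FLAG_CHARS = "pPuUrR12sfd"


def string_flag_to_int(flag):
    """Convert a string flag such as 'pP1' to its numeric SAM flag.

    Purely numeric input is returned as an integer unchanged. Note the
    ambiguity: a string flag made up only of '1' and '2' characters cannot be
    told apart from a number, and is treated as a number here.
    """
    if flag.isdigit():
        return int(flag)
    value = 0
    for ch in flag:
        bit = FLAG_CHARS.find(ch)
        if bit < 0:
            raise ValueError("unrecognised flag character: %r" % ch)
        value |= 1 << bit
    return value


if __name__ == "__main__":
    # The read from the post above: paired, proper pair, first in pair.
    print(string_flag_to_int("pP1"))
```

Regenerating the SAM/BAM with numeric flags, as suggested above, is still the safer route; a converter like this is only a stopgap.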
Old 06-25-2010, 12:48 AM   #16
orcy
Junior Member
 
Location: Brisbane

Join Date: Jan 2010
Posts: 8
Default

OK. I think that problem was simply the viewer telling me that read was a problem. I've got past that, but now get a

[main_samview] fail to get the reference name. Continue anyway.

error, and nothing in the output.

Does anyone know what that means? It happens during the sharpenedges part of the script.

cheers
Old 07-21-2010, 06:28 AM   #17
megnetz
Junior Member
 
Location: sweden

Join Date: Jul 2010
Posts: 4
Default

Hello!

I'm trying to make structural variation calls from 1000 genomes data. I thought I might try breakway but ran into problems :/. When calculating PED values with the dnaa script dbampairedenddist I need to specify a certain range based on predicted PED from library generation. As far as I know the 1000 genomes bam-files take input from several different raw read files so how can I know which range to choose? Or does this make 1000 genomes data incompatible with breakway SV detection?

Thank you very much!
Old 07-21-2010, 11:57 AM   #18
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57
Default

Quote:
Originally Posted by Michael.James.Clark View Post
While I haven't tested it on such datasets, it ought to work on them. The key will be in the reference genome used.

Breakway functions by looking for clusters of aberrantly spaced paired reads, so the key is to have an appropriate reference genome for it to compare to.

For exon capture, it should work with the normal reference genome just as well as it will with whole genomes.

For RNA-seq (I'm not an expert, so I welcome other suggestions), the transcriptome will probably be the best reference to use.
I'm very interested in using this with RNA-Seq. I figure aligning against the transcriptome is an issue because it limits the size of the indel that you can detect (e.g. no two-transcript mappings, where one end maps to one transcript and the other maps to a completely different transcript).
Old 07-22-2010, 03:48 PM   #19
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Quote:
Originally Posted by orcy View Post
OK. I think that problem was simply the viewer telling me that read was a problem. I've got past that, but now get a

[main_samview] fail to get the reference name. Continue anyway.

error, and nothing in the output.

Does anyone know what that means? It happens during the sharpenedges part of the script.

cheers
Sorry for the late reply on this.

Sharpenedges uses samtools as part of its activity, and this is a samtools error.

Make sure that you've properly indexed the BAM file, and that the file is in BAM format.

If you still have a problem, please run samtools view and post an example read here for me to look at.
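
The two checks suggested above (is the file really BAM, and is there an index next to it) can be sketched in Python without samtools. This relies on BAM being BGZF-compressed, which Python's `gzip` module can read, and on the decompressed stream starting with the magic bytes `BAM\1`. The function names and the two index file names tried are assumptions for this example; it does not verify that the index is current or valid.

```python
import gzip
import os


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF data whose content starts with BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Missing file, not gzip-compressed, or truncated.
        return False


def find_index(path):
    """Return the path of a .bai index next to the BAM file, or None."""
    candidates = (path + ".bai", os.path.splitext(path)[0] + ".bai")
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    import sys

    for bam in sys.argv[1:]:
        print(bam, "BAM format:", looks_like_bam(bam), "index:", find_index(bam))
```

A SAM text file renamed to `.bam` fails the first check, which is one way to end up with samtools errors about the reference name or header.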
Old 07-22-2010, 03:53 PM   #20
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Quote:
Originally Posted by megnetz View Post
Hello!

I'm trying to make structural variation calls from 1000 genomes data. I thought I might try breakway but ran into problems :/. When calculating PED values with the dnaa script dbampairedenddist I need to specify a certain range based on predicted PED from library generation. As far as I know the 1000 genomes bam-files take input from several different raw read files so how can I know which range to choose? Or does this make 1000 genomes data incompatible with breakway SV detection?

Thank you very much!
Breakway works on a library-by-library basis. One can combine libraries with very similar PEDs in a single analysis and it will still function.

If you have libraries with very different PEDs, it will have difficulty working correctly. You can isolate reads with very different PEDs from each other and run it independently on each one, then combine the results, though. This is what I have done.

I'm not very familiar with 1000 genomes data, but if their BAM files use read groups (RG) with the library field indicating which library each RG comes from, you can use that to isolate the reads.
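
As a rough illustration of isolating reads by library, the Python sketch below works on SAM text (for example the output of `samtools view -h`). It reads the `ID` and `LB` fields from `@RG` header lines and groups alignment lines by the library of their `RG:Z:` tag. The function names and the "unknown" fallback label are assumptions for this example, and whether a given 1000 genomes file actually fills in `LB` has to be checked in its header.

```python
from collections import defaultdict


def read_group_libraries(header_lines):
    """Map read-group ID -> library name (LB) from SAM @RG header lines."""
    mapping = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        fields = dict(
            item.split(":", 1)
            for item in line.rstrip("\n").split("\t")[1:]
            if ":" in item
        )
        if "ID" in fields:
            mapping[fields["ID"]] = fields.get("LB", "unknown")
    return mapping


def split_by_library(sam_lines):
    """Group SAM alignment lines by the library of their read group."""
    lines = list(sam_lines)
    rg_to_library = read_group_libraries(l for l in lines if l.startswith("@"))
    by_library = defaultdict(list)
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        # Optional tags start after the 11 mandatory SAM columns.
        tags = line.rstrip("\n").split("\t")[11:]
        read_group = next((t[5:] for t in tags if t.startswith("RG:Z:")), None)
        by_library[rg_to_library.get(read_group, "unknown")].append(line)
    return dict(by_library)


if __name__ == "__main__":
    import sys

    for library, reads in split_by_library(sys.stdin).items():
        print(library, len(reads))
```

Each per-library group could then be written to its own file and analysed separately, with the results combined afterwards as described above.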

Sorry I can't be more help--Breakway was designed to function optimally on a sample-by-sample basis, not on a batch of samples.