
**nilshomer** · 03-30-2010, 11:51 AM

Originally posted by quinlana

Hi all,

I recently released HYDRA, a new algorithm I've developed for detecting structural variation (SV) breakpoints using paired-end mapping. Similar to other algorithms, Hydra detects SV breakpoints by clustering discordant paired-end alignments whose "signatures" corroborate the same putative breakpoint. Hydra can detect breakpoints caused by all classes of structural variation. It was also designed to detect variation in both unique and duplicated genomic regions (e.g., mutations in segmental duplications and transposon insertions); therefore, it will examine paired-end reads having multiple discordant alignments.

Hydra does not align paired-end data itself. It relies upon very sensitive alignment in order to eliminate all concordant pairs prior to SV discovery. Accordingly, please look at the suggested workflow.

This algorithm is under constant development and evolves as the aligners and sequencing technologies mature, so please check the site for updates.

If interested, HYDRA was used in a recent study published in Genome Research.

http://code.google.com/p/hydra-sv/

Best,
Aaron

Are there plans for a paper? It would be very useful for the community to have a detailed explanation. Great work!

**quinlana** · 03-30-2010, 01:10 PM

Hi Nils,

The general method is covered in the Methods/SuppMeth section of the GR paper. That said, a proper paper is in the works but will probably be held up until some of the newer approaches I am working on are complete. I'd also like to add direct BAM support, as well as a split-read component.

The major bottleneck with the approach is sensitive alignment. If the alignments are crap the calls will be too. Perhaps not surprisingly (not surprising to you, given that you have apparently written a sensitive aligner), 99% of the time spent in the outlined workflow is on alignments screening for concordant pairs that are difficult to find owing to polymorphism and sequence error.

Take care,
Aaron

**bawee** · 05-10-2010, 09:00 PM

Hi Aaron,

In your documentation, the first step seems to be to determine the median insert size and M.A.D. How do you do this? Should I run a preliminary alignment with say, MAQ and find the summary from the .log file? Then again, this only gives the mean and S.D and not the median.

Cheers

Bryan

**Margarida** · 05-11-2010, 01:38 PM

Hi Aaron,

I am very impressed about how Hydra deals with finding CNVs in duplicate regions. You made some very cool novel observations in your Genome Research paper.

Can Hydra also work with Illumina's mate-pair reads? If not, would this be something you would try to implement in the future?

Thanks!

P.S. Huge fan of BEDTools...

**quinlana** · 05-11-2010, 04:25 PM

Originally posted by bawee

Hi Aaron,

In your documentation, the first step seems to be to determine the median insert size and M.A.D. How do you do this? Should I run a preliminary alignment with say, MAQ and find the summary from the .log file? Then again, this only gives the mean and S.D and not the median.

Cheers

Bryan

Hi Bryan,
Yes, I gloss over this step though it is important. The main reason for the vagueness is that each lab seems to have different opinions about how this should be done. One should extract the insert sizes from a BAM file, export file or MAQ output and then use R, Matlab, Perl, whathaveyou, for calculating median and MAD. I use slightly different criteria for which pairs to include in this calculation, depending on the experiment. In my mind, the main point here is that standard deviation can often grossly overstate the variability in a library owing to large outliers (stdev is the square root of the variance, which squares each deviation from the mean). MAD is more stable in this regard.
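For anyone looking for a starting point, here is a minimal sketch of the median/MAD calculation in Python. It assumes the insert sizes have already been extracted from the BAM, export, or MAQ output into a plain list of numbers. The function name and the toy values are made up for illustration and are not part of Hydra.

```python
import statistics


def median_and_mad(insert_sizes):
    """Return (median, MAD), where MAD is the median absolute deviation
    of the values from their median."""
    values = list(insert_sizes)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


# Toy library with one chimeric-looking outlier (illustrative values only).
toy_sizes = [480, 490, 495, 500, 505, 510, 520, 530, 25000]
toy_median, toy_mad = median_and_mad(toy_sizes)
toy_stdev = statistics.stdev(toy_sizes)  # inflated by the single outlier
```

On data like this, the single outlier dominates the standard deviation while the MAD barely moves, which is the point made above.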

I hope this helps.
Aaron

**quinlana** · 05-11-2010, 04:34 PM

Originally posted by Margarida

Hi Aaron,

I am very impressed about how Hydra deals with finding CNVs in duplicate regions. You made some very cool novel observations in your Genome Research paper.

Can Hydra also work with Illumina's mate-pair reads? If not, would this be something you would try to implement in the future?

Thanks!

P.S. Huge fan of BEDTools...

Hi Margarida,
Hydra makes no assumptions about the protocol used to generate your sequence data. Hydra requires that you exclude all concordant paired-ends or matepairs prior to execution. What should be input to Hydra are all of the non-duplicate discordant mappings: from these mappings Hydra will cluster similarly discordant mappings and report aberrant breakpoints. The onus then falls on the user to determine if a given breakpoint suggests a deletion (i.e., excessive distance with +/- for paired-end, -/+ for matepair), duplication, etc.
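As a rough illustration of the "exclude all concordant pairs" step, a pair could be screened along these lines. This is only a sketch under stated assumptions: the dictionary keys, the `k` multiplier on the MAD, and the functions themselves are hypothetical and are not Hydra's actual filter or defaults. The expected orientation is +/- for paired-end and -/+ for matepair, as noted above.

```python
def is_concordant(pair, median, mad, k=10, expected_orientation=("+", "-")):
    """Return True if a pair looks concordant under these assumed criteria.

    pair: dict with hypothetical keys chrom1, chrom2, strand1, strand2,
          insert_size.
    k:    hypothetical tolerance in MAD units; pick a value that suits
          your own library.
    """
    same_chrom = pair["chrom1"] == pair["chrom2"]
    orientation_ok = (pair["strand1"], pair["strand2"]) == expected_orientation
    size_ok = abs(pair["insert_size"] - median) <= k * mad
    return same_chrom and orientation_ok and size_ok


def discordant_pairs(pairs, median, mad, **kwargs):
    """Keep only the pairs that fail the concordance test."""
    return [p for p in pairs if not is_concordant(p, median, mad, **kwargs)]
```

For a matepair library one would pass `expected_orientation=("-", "+")` instead of the paired-end default.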

That said, currently the hard part with matepairs is alignment. Some aligners handle matepairs, some don't. The approach I have been using is to reverse complement the fastq files (reverse the qualities) prior to alignment. Upon doing so, all aligners are fair game as the reads now smell like paired-end. In the case of Illumina reads, the downside is that the lower quality bases are now the "seed" for Burrows-Wheeler (BW) aligners. To get around this, I use a tiered approach as described on the Hydra website. First align with BWA or Bowtie and assume that truly concordant pairs will be missed for the reason above. Then take all remaining discordant matepairs and align them with more sensitive aligners that examine all possible k-mer seeds (e.g., Mosaik, Novoalign, BFAST, others).
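The reverse-complement step can be sketched in a few lines of Python. This assumes plain four-line FASTQ records (no wrapped sequence lines) and only the bases A, C, G, T and N. The function names are made up for this example and do not come from Hydra or any aligner.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, seq, qual):
    """Reverse complement the bases and reverse the quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header, seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def reverse_complement_fastq(lines):
    """Yield the lines of a reverse-complemented FASTQ stream.

    lines: iterable of text lines, four per record.
    A trailing incomplete record is silently dropped.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        header, seq, qual = reverse_complement_record(
            header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")
        )
        yield from (header, seq, plus.rstrip("\n"), qual)
```

Reversing the qualities along with the bases keeps each quality score attached to its base, which is why the low-quality 3' end ends up at the start of the read.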

I hope this helps.
Aaron

**Michael.James.Clark** · 05-12-2010, 06:26 AM

Originally posted by quinlana

Hi Bryan,
Yes, I gloss over this step though it is important. The main reason for the vagueness is that each lab seems to have different opinions about how this should be done. One should extract the insert sizes from a BAM file, export file or MAQ output and then use R, Matlab, Perl, whathaveyou, for calculating median and MAD. I use slightly different criteria for which pairs to include in this calculation, depending on the experiment. In my mind, the main point here is that standard deviation can often grossly overstate the variability in a library owing to large outliers (stdev is the square root of the variance, which squares each deviation from the mean). MAD is more stable in this regard.

I hope this helps.
Aaron

Do you think 99% threshold cut-offs (or whatever value you choose) rather than MAD would be even better?

That was my feeling, at least. Neither MAD nor stdev is going to do a particularly good job with non-normal skewed/bimodal distributions, which is what LMP SOLiD datasets, at least, usually look like (though MAD should do better than stdev). I'm not sure what your test sets have been with Hydra, but MAD was probably fine if the distribution of paired-end distances approached a normal distribution/wasn't heavily skewed or bimodal in those sets. However, in LMP data like that I've dealt with, I think you're really going to hurt your accuracy using MAD (or stdev, for that matter). Food for thought.
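To make the percentile idea concrete, here is a small sketch of empirical cut-offs taken directly from the observed insert-size distribution. The nearest-rank percentile definition, the 0.5/99.5 defaults, and the function names are assumptions for illustration only; nothing here comes from Hydra.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("no values given")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def percentile_cutoffs(insert_sizes, lower_p=0.5, upper_p=99.5):
    """Return (lower, upper) insert-size bounds covering the central
    upper_p - lower_p percent of the observed pairs."""
    values = sorted(insert_sizes)
    return (
        nearest_rank_percentile(values, lower_p),
        nearest_rank_percentile(values, upper_p),
    )
```

Because the bounds are read off the data rather than derived from a centre and a spread, they follow a skewed or bimodal library without assuming symmetry. The trade-off is that they are only as good as the set of pairs used to build the distribution.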

**quinlana** · 08-20-2010, 10:03 AM

Hi all,

I just posted version 0.5.3 of Hydra (http://code.google.com/p/hydra-sv/). This release corrects a couple of bugs and provides an improved tool for removing duplicate mappings --- especially those arising from repetitive sequence such as segmental duplications and transposons. As a result, there are fewer false positive breakpoint calls.

I am now focusing on new features such as support for multiple sequence libraries/samples and split-read breakpoint detection.

Best,
Aaron

**tinacai** · 08-24-2010, 07:21 PM

Hi all,
I have been using Hydra and am at this step:
Tier 2 alignment. Grab the discordant alignments from the tier 1 BAM files and create FASTQ files for the discordant pairs. Align the tier 1 discordant pairs with a more sensitive aligner such as Novoalign or Mosaik.
I am using Mosaik, but I cannot get MosaikBuild to build the human genome index, so I would like to see whether Novoalign can handle the human genome. If anyone has Novoalign, could you send me a copy?

**zee** · 08-25-2010, 02:29 AM

Hi Everybody,

Just a heads up that novoalign as of V2.07.00 now handles Illumina mate pairs, so you won't need to reverse complement those FASTQ files. Get the latest version at www.novocraft.com

That said, currently the hard part with matepairs is alignment. Some aligners handle matepairs, some don't. The approach I have been using is to reverse complement the fastq files (reverse the qualities) prior to alignment. Upon doing so, all aligners are fair game as the reads now smell like paired-end. In the case of Illumina reads, the downside is that the lower quality bases are now the "seed" for BW aligners. To get around this, I use a tiered approach as described on the Hydra website. First align with BWA or Bowtie and assume that truly concordant pairs will be missed for the reason above. Then take all remaining discordant matepairs and align them with more sensitive aligners that examine all possible kmer seeds (e.g., Mosaik, Novoalign, BFAST, others).

**tinacai** · 08-25-2010, 03:02 AM

Hi Zee,
I want to download Novoalign from www.novocraft.com, but I can't open the website. If you have a local copy, could you send it to me? Thanks very much. My e-mail is "[email protected]".

**tinacai** · 08-25-2010, 04:27 AM

Hi,
I followed the workflow at http://code.google.com/p/hydra-sv/wiki/TypicalWorkflow and ran Hydra, but I am stuck at the last step, which uses BEDTools to annotate the structural variation. I really hope somebody can help me: which BEDTools program should I use to annotate my Hydra results if I only want to find deletions and insertions?

**quinlana** · 08-25-2010, 04:59 AM

Originally posted by tinacai

Hi,
I followed the workflow at http://code.google.com/p/hydra-sv/wiki/TypicalWorkflow and ran Hydra, but I am stuck at the last step, which uses BEDTools to annotate the structural variation. I really hope somebody can help me: which BEDTools program should I use to annotate my Hydra results if I only want to find deletions and insertions?

Unfortunately, there is not a simple answer to this. The basics are as follows (assuming paired-end Illumina reads):

1. Intra-chromosomal Hydra breakpoints with +/- orientation and a size >> library size are typically deletions in the test genome or transposon insertions that occurred in the reference genome. One can discern the two by using the pairToBed program and asking for substantial overlap b/w the inner span of a breakpoint and existing transposon annotations.

2. Intra-chromosomal Hydra breakpoints with +/- orientation and a size << library size are typically insertions in the test genome or transposon insertions that occurred in the reference genome.

3. Intra-chromosomal Hydra breakpoints with -/- or +/+ orientation are typically inversion breakpoints in the test genome.

4. Inter-chromosomal Hydra breakpoints are typically either transposon insertions in the test genome, translocations in the test genome, chimeric molecules, or mapping artifacts. Transposon insertions can be identified with pairToBed while asking that one end of the breakpoint overlaps with an annotated and recent transposon. Reciprocal translocations should have symmetric breakpoints between the two chromosomes involved. Chimeras are unavoidable, but most folks require at least two supporting readpairs to eliminate them under the assumption that they occur randomly (yet they don't always). Mapping artifacts also happen, but this is why we suggest aligning with the most sensitive settings possible to avoid pairs that appear to be discordant solely because the aligner could not find the concordant alignment.
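The four rules above can be sketched as a first-pass classifier. This is a rough illustration for paired-end Illumina data only, not a Hydra or BEDTools feature: the function names, the label strings, and the `min_size`/`max_size` bounds standing in for "<< library size" and ">> library size" are all assumptions. It does not separate deletions from reference transposon insertions, which still needs the pairToBed overlap step.

```python
def classify_breakpoint(chrom1, chrom2, strand1, strand2, span, min_size, max_size):
    """Assign a coarse label to one breakpoint call.

    min_size / max_size: hypothetical bounds on the concordant span for the
    library, chosen by the user.
    """
    if chrom1 != chrom2:
        # Rule 4: transposon insertion, translocation, chimera, or artifact.
        return "inter-chromosomal"
    orientation = (strand1, strand2)
    if orientation == ("+", "-"):
        if span > max_size:
            return "deletion-like"   # Rule 1
        if span < min_size:
            return "insertion-like"  # Rule 2
        return "span within library range"
    if orientation in {("+", "+"), ("-", "-")}:
        return "inversion-like"      # Rule 3
    return "unclassified"            # not covered by the rules above


def has_enough_support(num_supporting_pairs, min_support=2):
    """Apply the common 'at least two supporting readpairs' filter."""
    return num_supporting_pairs >= min_support
```

Anything the rules do not cover comes back as "unclassified" rather than being guessed at, since many loci do not fit these simple signatures.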

I hope this helps. I am planning to write a script this fall that will help with this process. The difficulty is that the structures of many SV loci are not easily described by the rules above (see the Hydra GR paper for examples). In short, many SV loci are incredibly complex.

Aaron

**tinacai** · 08-25-2010, 06:24 PM

Hi Quinlana,
I still have no idea how to do the last step. The workflow in your paper includes BWA, Novoalign, Hydra and BEDTools, but I don't think the page at http://code.google.com/p/hydra-sv/wiki/TypicalWorkflow is a good enough guideline for it.


Hydra: SV discovery in unique and duplicated sequence
