#1
Member
Location: San Mateo, CA
Join Date: Feb 2010
Posts: 17
Hello,
I have an interesting problem and am looking for some advice. I have whole-genome resequencing paired-end (PE) Illumina data, and I am interested in doing a de novo assembly of three particular genes. So far, we have done reference-based assemblies with bwa against the genome sequence of a closely related species (4-5% divergence). However, the genes I am interested in are newly inserted in our species, so they are absent from our current assembly. These genes are also rapidly evolving, and I expect a lot of structural rearrangements relative to the gene sequences I already have.

My plan had been this:

1. Filter my reads for quality using the FASTX-Toolkit and build a BLAST database of the reads.
2. BLAST the reads against a reference sequence of my genes to identify the subset of reads that map to this region (and their mates).
3. Do a de novo assembly of those reads (we have used SOAPdenovo in our lab; other suggestions?).

However, simply building the BLAST database of the reads is taking more than 12 hours, and I imagine the BLAST search itself will be even slower. Is there a better way to pull out the reads that map to my genes of interest? Should I just do a bwa alignment using my three genes as the reference instead of BLAST?

Thanks!
Sarah Kingan
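If the bwa route is taken, step 2 becomes "align all reads to the three gene sequences, then keep every pair in which at least one mate aligned". A minimal sketch of that filtering step, assuming a plain-text SAM file from aligning against a reference that contains only the three genes, plus the original R1/R2 FASTQ files. The function names are hypothetical and not part of bwa or any other tool.

```python
from typing import Iterable, Iterator, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit 0x4: this read has no alignment


def normalize(name: str) -> str:
    """Drop an old-style /1 or /2 mate suffix so both mates share one key."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Normalized QNAMEs of every read that aligned to one of the gene references."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if not int(fields[1]) & UNMAPPED:
            names.add(normalize(fields[0]))
    return names


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, seq, plus, qual) from plain 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue
        yield header, next(it), next(it), next(it)


def recruit(fastq_lines: Iterable[str], keep: Set[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Keep FASTQ records whose read name (first word, mate suffix removed) is in `keep`."""
    for rec in read_fastq(fastq_lines):
        if normalize(rec[0][1:].split()[0]) in keep:
            yield rec
```

Running `recruit` over both the R1 and the R2 file with the same name set keeps a pair whenever either mate aligned, which is what pulls in mates that overhang the known gene sequence into new, possibly rearranged sequence.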

#2
Member
Location: Heidelberg
Join Date: Feb 2011
Posts: 69
So you have reference sequences for the genes you are looking for? And what %-identity do you expect? Blasting all your reads against your reference genes does not seem like the smartest approach. ;-) Using bwa or vmatch might be a lot faster, but of course your results depend on the sequence identity.
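To put rough numbers on why identity matters for a fast aligner: if differences were scattered independently along the read, the number of mismatches in an L-bp read at divergence d would follow Binomial(L, d). The sketch below computes the fraction of reads with at most k mismatches. The 100 bp read length is an assumption (the thread does not give one), and real divergence is clustered in indels and repeats, so treat these figures as optimistic.

```python
from math import comb


def frac_reads_within(divergence: float, read_len: int, max_mismatches: int) -> float:
    """P(X <= max_mismatches) for X ~ Binomial(read_len, divergence).

    Assumes each base differs independently with probability `divergence`.
    """
    p, n = divergence, read_len
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(max_mismatches + 1))


# Divergence levels mentioned in this thread (<3% orthologs, 4-5% related species).
for d in (0.01, 0.03, 0.05):
    print(d, [round(frac_reads_within(d, 100, k), 3) for k in (2, 4, 6)])
```

At 3% divergence, only about 42% of 100 bp reads would carry two or fewer mismatches, so the mismatch allowance given to bwa (or vmatch) largely decides how many reads are recovered.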

#3
Member
Location: San Mateo, CA
Join Date: Feb 2010
Posts: 17
Hi Thorondor,
The % identity should be very high: <3% divergence for the orthologous sequences. The problem is that there are many repeat elements in and around the genes, so the structure is not conserved. Right now I am pulling the reads that align to the flanking sequence in my bwa alignment and will do a de novo assembly of those reads using MIRA. MIRA claims to be good at assembling repetitive sequence and regions that are difficult to align. I may then do another iteration: using bwa, I will map the previously unmapped reads to the contig(s) I built with MIRA, then pull the singletons whose mates mapped and do another de novo assembly with MIRA.

Sarah
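The "singletons whose mates mapped" in the iteration described above can be selected from the SAM FLAG field alone: the read is paired (0x1), unmapped (0x4), and its mate is not unmapped (0x8 clear). A minimal sketch, assuming plain-text SAM from mapping the read pairs to the current contigs; the function name is hypothetical.

```python
from typing import Iterable, Iterator

PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE = 0x1, 0x4, 0x8, 0x10
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def anchored_unmapped(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for unmapped reads whose mate did map."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        # Paired, this read unmapped, mate mapped: the read probably lies just beyond
        # a contig end (or in sequence too divergent to align).
        if flag & PAIRED and flag & UNMAPPED and not flag & MATE_UNMAPPED:
            if flag & REVERSE:  # SEQ stored reverse-complemented; restore read orientation
                seq = seq.translate(COMPLEMENT)[::-1]
                qual = qual[::-1]
            yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

These records, together with the reads already assembled, would form the input for the next MIRA round.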

#4
Member
Location: Denmark
Join Date: Aug 2010
Posts: 26
E.g., using only the reads in the vicinity of where you expect your gene to be. We normally use CLC, as it is extremely fast and memory-efficient (and expensive...). However, most assemblers should be able to handle the repeats if they are only local. In my experience, the problem arises when the same repeat occurs in multiple areas of the genome, and that is solved by doing the local assembly.

Regards,
Mads

Last edited by MadsAlbertsen; 05-02-2011 at 09:11 AM. Reason: Clarifying.
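Selecting "only the reads in the vicinity" can be sketched as a position-window filter that keeps a pair if either the read or its mate aligned within a flank of the expected locus. This assumes plain-text SAM against the related species' genome; the contig name, coordinates and 500 bp flank are hypothetical. On a sorted, indexed BAM, `samtools view` with a region argument does the per-read part of this.

```python
from typing import Iterable, Set


def local_read_names(sam_lines: Iterable[str], contig: str, start: int, end: int,
                     flank: int = 500) -> Set[str]:
    """Names of reads whose own or mate's leftmost position falls in [start-flank, end+flank]."""
    lo, hi = max(1, start - flank), end + flank  # 1-based, like SAM POS/PNEXT
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        f = line.rstrip("\n").split("\t")
        qname, rname, pos, rnext, pnext = f[0], f[2], int(f[3]), f[6], int(f[7])
        mate_ref = rname if rnext == "=" else rnext  # "=" means same reference as this read
        if (rname == contig and lo <= pos <= hi) or (mate_ref == contig and lo <= pnext <= hi):
            names.add(qname)
    return names
```

Unmapped reads with a mapped mate are conventionally given the mate's RNAME/POS, so they fall into the window too; those are exactly the reads that can carry sequence missing from the reference.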

#5
Member
Location: Heidelberg
Join Date: Feb 2011
Posts: 69
Well, if your genes of interest are not well covered, you might also take a look at LOCAS for your assembly:
http://ab.inf.uni-tuebingen.de/software/locas/

#6
Junior Member
Location: Melbourne
Join Date: Aug 2010
Posts: 2
Sue

Last edited by shiva; 06-20-2011 at 09:37 PM.

#7
Junior Member
Location: Rennes, France
Join Date: Nov 2008
Posts: 7
Dear all,
I'd like to mention a tool called mapsembler. It takes some sequence fragments and a set of (Illumina) reads. It tries to reconstruct each sequence fragment using the reads (allowing some substitutions), and each sequence it reconstructs is then extended to the left and right by targeted assembly. The output may be either a FASTA file (a contig containing the sequence) or a graph that shows indels, SNPs, or more complex events such as gene fusion, exon skipping, etc. The tool and documentation are accessible here: http://alcovna.genouest.org/mapsembler/

Any comments or feedback are welcome.
Pierre
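The left/right extension idea can be illustrated with a toy greedy k-mer walk: count k-mers in the reads (both strands), then repeatedly append the single base that continues the contig's last k-1 bases, stopping at a dead end or at a branch. This is only a sketch of the general seed-and-extend idea, not mapsembler's actual algorithm: it uses exact k-mers, skips the substitution-tolerant seed reconstruction, and simply stops where mapsembler would report a graph (SNPs, indels, etc.). All names and parameters are illustrative.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer on both strands of every read."""
    counts: Counter = Counter()
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def extend_right(seed: str, counts: Counter, k: int, min_count: int = 2,
                 max_len: int = 10_000) -> str:
    """Greedily append bases while exactly one supported continuation exists."""
    contig = seed
    while len(contig) < max_len:  # guard against looping through repeats
        suffix = contig[-(k - 1):]
        options = [b for b in "ACGT" if counts[suffix + b] >= min_count]
        if len(options) != 1:  # 0 = dead end, >1 = branch (e.g. SNP or repeat)
            break
        contig += options[0]
    return contig


def extend_both(seed: str, counts: Counter, k: int, **kw) -> str:
    """Extend right, then extend left by extending the reverse complement rightwards."""
    right = extend_right(seed, counts, k, **kw)
    return revcomp(extend_right(revcomp(right), counts, k, **kw))
```

Stopping at branches rather than guessing is the conservative choice; a real tool would keep both paths and report them, which is what the graph output described above is for.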

#8
Junior Member
Location: Texas
Join Date: Feb 2011
Posts: 5
Pierre
mapsembler sounds like it may work for one of my projects. Have you used it before? Can I import the output into a viewer so that I can see how it attempted to assemble the sequences around the 'starter'?

Thanks

#9
Junior Member
Location: Rennes, France
Join Date: Nov 2008
Posts: 7

Tags |
assembly, blast, bwa, denovo, illumina |