Hello,
I have an interesting problem and am looking for some advice.
I have whole-genome paired-end (PE) Illumina resequencing data and I am interested in doing a de novo assembly of three particular genes. Thus far, we have done reference-guided assemblies with bwa against the genome sequence of a closely related species (4-5% divergence). However, the genes I am interested in are newly inserted in our species, so they are absent from our current assembly. These genes are also rapidly evolving, and I expect a lot of structural rearrangements relative to the gene sequences I already have. My plan had been this:
1. Filter my reads for quality using the FASTX-Toolkit and build a BLAST database of the reads.
2. BLAST a reference sequence of my genes against that read database to identify the subset of reads that match this region (and their mates).
3. Do a de novo assembly of those reads (we have used SOAPdenovo in our lab; other suggestions?).
However, simply building the blast database of the reads is taking more than 12 hours and I imagine the blast itself will be even slower. Is there a better way to pull down reads that map to my gene of interest? Should I just do a bwa alignment using my three genes as a reference instead of blast?
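Whichever tool ends up finding the matching reads, the "and their mates" part of step 2 comes down to pulling whole pairs out of the two FASTQ files once I have a list of hit IDs. Here is a minimal sketch of that step, assuming the two files list pairs in the same order and that read names end in /1 and /2 or are otherwise identical between mates. The function names and file paths are hypothetical, not part of any existing tool.

```python
import gzip
from itertools import islice, zip_longest


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in record)


def base_id(header):
    """Drop the leading '@', any description, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def extract_pairs(fq1, fq2, hit_ids, out1, out2):
    """Write every pair whose base ID is in hit_ids; return the pair count.

    hit_ids is a set of base IDs (no '@', no /1 or /2), so a hit on either
    mate is enough to keep both reads of the pair.
    """
    kept = 0
    with open_text(fq1) as in1, open_text(fq2) as in2, \
            open_text(out1, "wt") as o1, open_text(out2, "wt") as o2:
        for rec1, rec2 in zip_longest(read_fastq(in1), read_fastq(in2)):
            if rec1 is None or rec2 is None:
                raise ValueError("FASTQ files have different numbers of reads")
            name = base_id(rec1[0])
            if name != base_id(rec2[0]):
                raise ValueError(f"mates out of sync at read {name}")
            if name in hit_ids:
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept
```

The hit IDs would come from whatever search I end up using (for example, the subject IDs from tabular BLAST output, or the read names of aligned reads), normalised with `base_id`-style trimming before being put in the set. The resulting two files could then go straight into the assembler as a paired library.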
Thanks!
Sarah Kingan