**Jeremy** · 08-06-2012, 11:03 PM

Is that a typo, or is the genome really only about 28 Mb? A genome that small is considered pretty easy as far as assembly goes.
With that much data you should have several-hundred-fold coverage from each one of the data sets listed. I would leave the RNA-seq data out of the genome assembly.
Anyway, this thread should help you.
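For a sense of the arithmetic behind "several-hundred-fold": average coverage is just total sequenced bases divided by genome size. A minimal Python sketch; the read count and read length below are made-up illustration values, not numbers from this thread.

```python
def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size

# Hypothetical data set: 50 million 100 bp reads against a ~28 Mb genome.
GENOME_SIZE = 28_000_000
depth = fold_coverage(50_000_000, 100, GENOME_SIZE)
print(f"~{depth:.0f}x coverage")  # prints "~179x coverage"
```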

**Wallysb01** · 08-06-2012, 11:41 PM

First off, that thread Jeremy linked to is very good. I've found some guidance in that same place.

Personally, my first strategy would probably be to leave the 454 data alone for now; you have plenty of Illumina coverage. So, first assemble the Illumina data with something like ABySS or SOAPdenovo, including scaffolding. Then throw the 454 data in to fill gaps (BaseClear has a standalone gap filler that I believe takes 454 reads).
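Scaffolders typically represent the unknown sequence between contigs as runs of N, and those runs are what a gap filler tries to close. A small Python sketch, assuming gaps are encoded as N/n runs, that lists gap positions so you can compare gap count and total gap length before and after gap filling. `find_gaps` is a hypothetical helper, not part of any assembler.

```python
import re
from typing import List, Tuple

def find_gaps(scaffold: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of N runs in a scaffold."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", scaffold)
            if m.end() - m.start() >= min_len]

# Toy scaffold with two gaps (6 N and 3 N).
scaffold = "ACGTACGTNNNNNNGGCCTTAANNNACGT"
gaps = find_gaps(scaffold)
print(gaps)                                  # [(8, 14), (22, 25)]
print(sum(e - s for s, e in gaps), "gap bases")  # 9 gap bases
```

Running it over every scaffold before and after gap filling gives a quick check of how much the 454 data actually closed.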

If that doesn't work out as you need (which I doubt), you could assemble contigs only from the Illumina and 454 data separately, then merge them with something like CAP3. Then scaffold and gap fill again using standalone programs.
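To illustrate the basic idea behind merging contigs from two assemblies: if the end of one contig matches the start of another, they can be joined into a longer sequence. The sketch below is a deliberately simplified, exact-match, pairwise version written for illustration; it is not CAP3's algorithm, which also tolerates sequencing errors and uses base qualities. The function names and the `min_overlap` default are hypothetical choices.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def merge_pair(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join `left` and `right` if a suffix of `left` exactly matches a prefix of `right`.

    The longest overlap is tried first; returns None if no overlap >= min_overlap exists.
    """
    for olap in range(min(len(left), len(right)), max(min_overlap, 1) - 1, -1):
        if left[-olap:] == right[:olap]:
            return left + right[olap:]
    return None

def merge_contigs(a: str, b: str, min_overlap: int = 20) -> Optional[str]:
    """Try both orders and both strands, since contigs from two different
    assemblers may be reported in either orientation."""
    rc_b = revcomp(b)
    for left, right in ((a, b), (b, a), (a, rc_b), (rc_b, a)):
        merged = merge_pair(left, right, min_overlap)
        if merged is not None:
            return merged
    return None

illumina_contig = "ACGTTGCAAGGCTTAC"
contig_454 = "AGGCTTACGGATCC"
print(merge_contigs(illumina_contig, contig_454, min_overlap=8))
# ACGTTGCAAGGCTTACGGATCC
```

Real data needs mismatch tolerance and a check that the overlap isn't just a shared repeat, which is exactly why a dedicated tool like CAP3 is the better choice.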

Alternatively, you could give all types of data to Ray and assemble them together. Ideally, you'd try all three methods and compare what you get. Don't just trust simple stats like N50 or NG50. I'd suggest aligning your genome assemblies to whatever is the most closely related species with a high-quality genome and visualizing the alignments somehow. BWA-SW could help you with this, as could something like lastz or MUMmer. With a genome that small, you should be able to get a decent sense of how the assembly is going just by scrolling along the alignments in IGV and checking for any sort of funny business (yes, that's the technical term).
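Even if N50 and NG50 shouldn't be trusted on their own, it helps to know exactly what they measure. Both sort contigs by length and report the length at which the running total first reaches half of a target: the assembly's own total size for N50, an estimated genome size for NG50. A minimal Python sketch with toy contig lengths:

```python
from typing import List, Optional

def nx50(lengths: List[int], target: int) -> Optional[int]:
    """Length L where contigs of length >= L first cover half of `target` bases."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= target:
            return length
    return None  # assembly is too small to reach half the target

def n50(lengths: List[int]) -> Optional[int]:
    return nx50(lengths, sum(lengths))

def ng50(lengths: List[int], genome_size: int) -> Optional[int]:
    return nx50(lengths, genome_size)

contigs = [100, 200, 300, 400, 500]  # toy contig lengths
print(n50(contigs))         # 400
print(ng50(contigs, 2800))  # 200
print(ng50(contigs, 4000))  # None
```

Both numbers come from lengths alone, so a misassembly that glues two unrelated contigs together makes them look better; that's why the alignment-based checks matter.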

Ignore the RNA-seq data until you have a genome that you like, then align the reads to that genome to aid in the annotation process. You could also de novo assemble the RNA-seq reads into transcripts and align those to the genome, or do both. MAKER is a nice program for guiding you through annotating your genome. Incorporating RNA-seq into a genome assembly could prove useful one day, but it's pretty difficult to do now. That said, the RNA-seq alignments and/or de novo assembled transcript alignments will also help you determine the quality of your assemblies. For example, gaps or misassemblies in the genome will interfere with transcript and raw-read alignments, which you can also visualize in IGV. So you may want to carry all major versions of your assembly through this far, to see which ones contain the most complete genes.
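One way to turn "which ones contain the most complete genes" into a number: for each transcript, take its best single alignment to an assembly and count the transcripts covered over at least some fraction of their length. The sketch assumes you've already reduced your alignments to hypothetical `(transcript_id, aligned_bases)` pairs with whatever aligner you use; the 90% threshold and the toy data are arbitrary examples.

```python
from typing import Dict, Iterable, Tuple

def count_complete(transcript_lengths: Dict[str, int],
                   hits: Iterable[Tuple[str, int]],
                   min_cov: float = 0.9) -> int:
    """Count transcripts whose best single hit covers >= min_cov of their length."""
    best: Dict[str, int] = {}
    for tid, aligned in hits:
        # Keep only the longest alignment per transcript; a transcript split
        # across two scaffolds (fragmented or misassembled) scores lower here.
        best[tid] = max(best.get(tid, 0), aligned)
    return sum(1 for tid, length in transcript_lengths.items()
               if best.get(tid, 0) >= min_cov * length)

transcripts = {"t1": 1000, "t2": 2000, "t3": 1500}
assembly_a = [("t1", 980), ("t2", 1200), ("t2", 1950)]
assembly_b = [("t1", 990), ("t2", 1100), ("t3", 700)]
print(count_complete(transcripts, assembly_a))  # 2
print(count_complete(transcripts, assembly_b))  # 1
```

Computing this for each major assembly version gives a simple ranking to go alongside what you see in IGV.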

Good luck!
