Improving Illumina assembly with 454 reads?

Linnea

Member

Join Date: Mar 2010

Posts: 23
#1

10-19-2010, 06:05 AM

Hi all,

I'm working on a large genome assembly (~1 Gbp) with Illumina paired-end reads, and currently I'm down to ~90,000 scaffolds (N50 = 26 kb). Now I've got some additional 454 data (single-end) and would like to use it to improve my assembly. I've heard about people assembling the two sets separately and then trying to merge them into one, and also about people doing one big assembly with all reads.
I would instead like to map the 454 reads onto my Illumina assembly and see if I can get rid of NNNs in the scaffolds, or even link some scaffolds to each other. I tried the Roche GSReferenceMapper, and most reads mapped fully within scaffolds, but some are marked as "Chimeric". It seems like these reads map to more than one scaffold - possibly exactly what I'm looking for! But there seems to be no way to get the information on which scaffolds they map to (and at what positions) - I guess the software discards them as wrongly mapped?
Does anyone more familiar with this software know if this information can be retrieved? Or is there better software for this purpose?
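
To make the idea concrete, here is a minimal sketch of how reads that map to more than one scaffold could be turned into candidate scaffold links, assuming an aligner that reports every partial hit of a read. The `Hit` record layout, the `max_read_overlap` parameter and the scaffold names are made up for illustration; this is not GSReferenceMapper output.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout; adapt it to whatever your aligner reports.
Hit = namedtuple("Hit", "read read_start read_end scaffold scaf_start scaf_end")


def find_scaffold_links(hits, max_read_overlap=10):
    """Count reads whose consecutive segments align to different scaffolds."""
    by_read = defaultdict(list)
    for h in hits:
        by_read[h.read].append(h)

    links = defaultdict(set)
    for read, read_hits in by_read.items():
        # Walk along the read from left to right.
        read_hits.sort(key=lambda h: h.read_start)
        for left, right in zip(read_hits, read_hits[1:]):
            if left.scaffold == right.scaffold:
                continue
            # The two segments should cover different parts of the read.
            if left.read_end - right.read_start > max_read_overlap:
                continue
            pair = tuple(sorted((left.scaffold, right.scaffold)))
            links[pair].add(read)
    return {pair: len(reads) for pair, reads in links.items()}


# Made-up example data: r1 and r3 both bridge scaffold_12 and scaffold_7.
hits = [
    Hit("r1", 1, 200, "scaffold_12", 25800, 26000),
    Hit("r1", 195, 400, "scaffold_7", 1, 206),
    Hit("r2", 1, 350, "scaffold_3", 500, 850),
    Hit("r3", 1, 180, "scaffold_7", 20, 200),
    Hit("r3", 181, 390, "scaffold_12", 25790, 26000),
]
for pair, n_reads in sorted(find_scaffold_links(hits).items()):
    print(pair, n_reads)
```

The sketch ignores strand and does not check that the hits lie near scaffold ends, so anything it reports is only a candidate link; with 1X of 454 data, requiring support from more than one read would probably be sensible.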

Any input would be appreciated!
colindaven

Senior Member

Join Date: Oct 2008

Posts: 417
#2

10-20-2010, 05:02 AM

I haven't used it extensively myself, but MIRA can apparently deal with 454 and Illumina reads at once.

Wouldn't it be better to use the 454 data first and use the Illumina data to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.
Linnea

Member

Join Date: Mar 2010

Posts: 23
#3

10-20-2010, 05:12 AM

Originally posted by colindaven

I haven't used it extensively myself, but MIRA can apparently deal with 454 and Illumina reads at once.

Wouldn't it be better to use the 454 data first and use the Illumina data to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.

The problem is that I don't have a decent 454 coverage...
The Illumina coverage is ~30X, while for the 454 I only have 1X (it's only a test run, and no more runs are planned). When I assembled the 454 data separately with Newbler, the assembly covered about one third of the genome. Hence the Illumina data must be the foundation of the assembly, and the 454 data can only be used to improve it. Unfortunately, MIRA does not seem able to cope with this huge amount of data.
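
As a rough sanity check on these numbers, a simple Poisson (Lander-Waterman style) model gives the fraction of bases expected to be hit by reads at a given coverage. This is only a sketch under the assumption of uniformly random read placement; the function names are illustrative.

```python
import math


def expected_fraction_covered(coverage):
    """Poisson estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)


def expected_fraction_at_least(coverage, k):
    """Fraction of bases covered by at least k reads under the same model."""
    below = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


for c in (1, 30):
    print(
        f"{c}X: >=1 read {expected_fraction_covered(c):.3f}, "
        f">=2 reads {expected_fraction_at_least(c, 2):.3f}"
    )
```

Under this model, 1X touches about 63% of the bases at least once, but only about 26% are covered by two or more reads, which is what an assembler needs to find overlaps. So a stand-alone 454 assembly covering a fraction of the genome is not surprising, while mapping single reads onto existing scaffolds can still use the 63%.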
colindaven

Senior Member

Join Date: Oct 2008

Posts: 417
#4

10-21-2010, 12:03 AM

OK, this sounds tricky. I don't know how much information you'll gain, but you could try MUMmer and Bambus, a scaffolder. Apparently Bambus can use output directly from MUMmer, and MUMmer is a good and fast aligner. I think you have to specify reference names though, which in your case, with 90,000 scaffolds, is going to be prohibitive. Perhaps another aligner like Novoalign might be effective; you'll have to see what works for you.

A large number of contigs is the norm for eukaryotic projects - I think the Panda assembly is a good example.

I don't know if this is accurate, but it is fairly astounding!

De novo assembly of short sequence reads - PubMed
http://www.ncbi.nlm.nih.gov/pubmed/20724458

"This has led to the generation of several draft genome sequences based exclusively on short sequence Illumina sequence reads, recently culminating in the assembly of the 2.25-Gb genome of the giant panda from Illumina sequence reads with an average length of just 52 nucleotides."
themerlin

Member

Join Date: Feb 2010

Posts: 51
#5

10-22-2010, 04:46 AM

Have you tried using GapCloser from the SOAP package? I'm not sure how well it will work on single-end 454 reads, but it works great with PE Illumina data at resolving internal Ns in scaffolds.

J
Linnea

Member

Join Date: Mar 2010

Posts: 23
#6

10-25-2010, 10:39 PM

Originally posted by themerlin

Have you tried using GapCloser from the SOAP package? I'm not sure how well it will work on single-end 454 reads, but it works great with PE Illumina data at resolving internal Ns in scaffolds.

Can you provide GapCloser with 454 reads even if the original assembly was done with Illumina reads? I have never tried it, but if this is possible I'll certainly give it a shot!
themerlin

Member

Join Date: Feb 2010

Posts: 51
#7

10-26-2010, 04:42 AM

I believe so. You can also map your Illumina reads back to the scaffolds, which often resolves some internal Ns.

J
Linnea

Member

Join Date: Mar 2010

Posts: 23
#8

10-28-2010, 04:04 AM

SOAP GapCloser not for 454 reads

I just tested GapCloser on my 454 reads, and it failed with the message "read max length should be less than 188bp". So it does not actually work with 454 reads at all (since most of them are much longer than 188 bp). Too bad.

I'm also currently running it on the original Illumina data; hopefully this will work better!
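
For anyone wanting to check their own reads against that limit before starting a run, here is a minimal sketch that reports the fraction of reads in a FASTA file longer than a given length. The 188 bp default comes from the error message above; the function names and the demo sequences are made up for illustration.

```python
import io


def read_lengths(fasta_handle):
    """Yield (name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the read name.
            name = line[1:].split(maxsplit=1)[0] if len(line) > 1 else ""
            length = 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def fraction_over_limit(fasta_handle, max_len=188):
    """Return the fraction of reads strictly longer than max_len."""
    lengths = [n for _, n in read_lengths(fasta_handle)]
    if not lengths:
        return 0.0
    return sum(n > max_len for n in lengths) / len(lengths)


# Tiny made-up example; in practice pass an open file handle instead.
demo = io.StringIO(
    ">read1\n" + "A" * 150 + "\n"
    ">read2 extra info\n" + "C" * 200 + "\n" + "G" * 120 + "\n"
    ">read3\n" + "T" * 400 + "\n"
)
print(fraction_over_limit(demo))
```

If most reads exceed the limit, as here, filtering them out would throw away most of the 454 data, so a different tool is probably the better route.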
flxlex

Moderator

Join Date: Nov 2008

Posts: 414
#9

10-28-2010, 11:18 PM

If you can get your assembly in ace file format, perhaps you can use consed? It has a way to add 454 reads from an sff file.
Linnea

Member

Join Date: Mar 2010

Posts: 23
#10

10-29-2010, 05:41 AM

Originally posted by flxlex

If you can get your assembly in ace file format, perhaps you can use consed? It has a way to add 454 reads from an sff file.

So you can actually align 454 reads (that have not been aligned before) to an assembly with consed? I thought it was more for viewing. Sounds interesting! I don't have the assembly in ace format, but that shouldn't be too hard to fix.
Do you know if consed can handle very large assemblies (>1 Gb, made with 600,000,000 Illumina reads)?
flxlex

Moderator

Join Date: Nov 2008

Posts: 414
#11

11-01-2010, 06:21 AM

Originally posted by Linnea

Do you know if consed can handle very large assemblies (>1 Gb, made with 600,000,000 Illumina reads)?

You never know, but I guess this could be tricky... Perhaps best to ask the phui mailing list?
kbushley

Member

Join Date: Jan 2010

Posts: 22
#12

11-18-2010, 09:08 AM

Supply MIRA with assembled Illumina contigs

Hi,

MIRA is a great package and will give you the info you're looking for in terms of contig connections. It can't handle being supplied with the raw Illumina reads, but I've heard of people inputting assembled Illumina contigs. It has a limit on how large a contig it can handle, but I think it's in the range of 50 kb or so. You could then map the 454 reads to these contigs. I haven't tried this, but it would be worth a shot. You might also want to post to the MIRA mailing list; the author may have some useful suggestions.
roshanbernard

Member

Join Date: Feb 2011

Posts: 31
#13

02-28-2011, 10:40 PM

Hi,

I have 454 sequencing data, which assembled into 6 contigs.

I also have Illumina (Solexa) sequences.

Can I use the Illumina data for gap filling of the 6 contigs?

I am using the Geneious software. Is there a better platform for this assembly?

(P.S.: when I assemble the Illumina reads, I get 361 contigs.)

Please help me with this.
ProfYorke

Junior Member

Join Date: May 2011

Posts: 1
#14

05-06-2011, 06:06 PM

As reported in the paper below, the turkey genome (~1 Gb) was assembled from 454 and Illumina data by feeding all the reads into the Celera Assembler. A nice assembly emerged. Having only 1X coverage by 454 is not a problem; it should complement the Illumina coverage. For the turkey, over 2% of the assembly was covered only by 454 and over 2% was covered only by Illumina. The Celera Assembler is not easy to use.

Rami Dalloul, Julie Long, Aleksey Zimin, ... James Yorke, Liqing Zhang, Hong-Bin Zhang, Xiaojun Zhang, Yang Zhang, and Kent Reed;
Multi-platform Next Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis,
PLoS Biology. Published Sept 7 2010
