SEQanswers

10-19-2010, 06:05 AM   #1
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23
Improving Illumina assembly with 454 reads?

Hi all,

I'm working on a large genome assembly (~1 Gbp) with Illumina paired-end reads, and I'm currently down to ~90,000 scaffolds (N50 = 26 kb). Now I've got some additional 454 data (single-end) and would like to use it to improve my assembly. I've heard of people assembling the two sets separately and then trying to merge them into one, and also of people doing one big assembly with all the reads.
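For readers less familiar with the N50 figure used here: it is the scaffold length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the usual calculation (the toy lengths are invented for illustration):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy scaffold lengths in bp (hypothetical, not from this thread).
    print(n50([100, 80, 60, 40, 20]))  # 80
```

Note that N50 alone says nothing about correctness; it only summarizes contiguity.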
I would instead like to map the 454 reads onto my Illumina assembly and see if I can get rid of NNNs in the scaffolds, or even link some scaffolds to each other. I tried the Roche GSReferenceMapper; most reads mapped fully within scaffolds, but some are marked as "Chimeric". These reads seem to map to more than one scaffold, which is possibly exactly what I'm looking for! But there seems to be no way to get the information on which scaffolds they map to (and at what positions). I guess the software discards them as wrongly mapped?
Does anyone more familiar with this software know if this information can be retrieved? Or is there better software for this purpose?

Any input would be appreciated!
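The "Chimeric" reads described above are the interesting ones for linking scaffolds. As a hedged sketch, assuming the 454 reads have been aligned with some aligner that can write BLAST-style tabular output (12 columns, as produced by BLAST+ `-outfmt 6`), reads whose alignments touch the ends of two different scaffolds could be pulled out as below. The function names, the 500 bp end window, and the toy data are illustrative assumptions, not features of GSReferenceMapper.

```python
from collections import defaultdict

# Expected columns (BLAST+ -outfmt 6): qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore
# Hypothetical helper names; not part of any assembler's API.


def near_end(sstart, send, scaffold_len, window):
    """True if the aligned region lies within `window` bp of either scaffold end."""
    lo, hi = min(sstart, send), max(sstart, send)
    return lo <= window or hi > scaffold_len - window


def find_linking_reads(tabular_lines, scaffold_lengths, window=500):
    """Return {read_id: [(scaffold, sstart, send), ...]} for reads whose
    end-proximal alignments hit two or more different scaffolds.

    scaffold_lengths must contain every scaffold named in the alignments.
    """
    hits = defaultdict(list)
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comment or malformed lines
        read, scaffold = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        if near_end(sstart, send, scaffold_lengths[scaffold], window):
            hits[read].append((scaffold, sstart, send))
    # Keep only reads anchored near the ends of at least two distinct scaffolds.
    return {
        read: alns
        for read, alns in hits.items()
        if len({scaffold for scaffold, _, _ in alns}) >= 2
    }


if __name__ == "__main__":
    lengths = {"scf1": 10000, "scf2": 8000}
    rows = [
        "r1\tscf1\t99.0\t300\t3\t0\t1\t300\t9701\t10000\t1e-100\t500",
        "r1\tscf2\t99.0\t200\t2\t0\t301\t500\t1\t200\t1e-80\t350",
        "r2\tscf1\t99.0\t400\t4\t0\t1\t400\t5001\t5400\t1e-120\t600",
    ]
    print(find_linking_reads(rows, lengths))
    # {'r1': [('scf1', 9701, 10000), ('scf2', 1, 200)]}
```

Read r2 aligns in the middle of scf1, so it is not reported; r1 bridges the end of scf1 and the start of scf2.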

10-20-2010, 05:02 AM   #2
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

I haven't used it extensively myself, but MIRA can apparently deal with 454 and Illumina reads at once.

Wouldn't it be better to use the 454 first and use the Illumina to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.

10-20-2010, 05:12 AM   #3
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23

Quote:
Originally Posted by colindaven
I haven't used it extensively myself, but MIRA can apparently deal with 454 and Illumina reads at once.

Wouldn't it be better to use the 454 first and use the Illumina to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.

The problem is that I don't have decent 454 coverage...
The Illumina coverage is ~30X, while for the 454 I only have 1X (it's only a test run, and no more runs are planned). When I assembled these data separately with Newbler, the assembly covered about one third of the genome. Hence the Illumina data must be the foundation of the assembly, and the 454 reads can only be used to improve it. Unfortunately, MIRA seems unable to handle this huge amount of data.
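A quick back-of-the-envelope check on what 1X can deliver: under the idealized Lander-Waterman (Poisson) model, the expected fraction of bases touched by at least one read at mean depth c is 1 - e^(-c). A small Python sketch; the 400 bp average 454 read length is an assumption for illustration, and the model ignores repeats and sequencing bias.

```python
import math


def expected_fraction_covered(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of the genome
    covered by at least one read at a given mean depth."""
    return 1.0 - math.exp(-coverage)


def reads_needed(genome_size, read_length, coverage):
    """Number of reads of a given length needed for a target mean depth."""
    return math.ceil(coverage * genome_size / read_length)


if __name__ == "__main__":
    for c in (1, 30):
        print(f"{c}X -> {expected_fraction_covered(c):.3f} of bases covered")
    # Assumed 400 bp 454 reads on a ~1 Gbp genome at 1X:
    print(reads_needed(1_000_000_000, 400, 1))
```

At 1X only about 63% of bases are expected to be hit even once, and assembly additionally needs overlapping reads, which may help explain why a 1X-only assembly recovers just part of the genome.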

10-21-2010, 12:03 AM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

OK, this sounds tricky. I don't know how much information you'll gain, but you could try MUMmer together with Bambus, a scaffolder. Apparently Bambus can use output directly from MUMmer, and MUMmer is a good, fast aligner. I think you have to specify reference names, though, which with your 90,000 scaffolds is going to be prohibitive. Perhaps another aligner such as Novoalign would be effective; you'll have to see what works for you.
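Bambus's own algorithm is more involved than a forum snippet, but the basic evidence a scaffolder works from can be illustrated: count how many distinct reads support each scaffold-to-scaffold join and keep only joins seen more than once. This is a generic Python sketch of that idea, not Bambus or MUMmer code; the (read_id, scaffold_a, scaffold_b) input format and the default threshold of 2 are assumptions. It ignores orientation and gap size, which a real scaffolder also has to resolve.

```python
from collections import Counter


def supported_links(read_links, min_support=2):
    """Count distinct reads supporting each unordered scaffold pair and
    keep pairs with at least `min_support` reads.

    read_links: iterable of (read_id, scaffold_a, scaffold_b) tuples
    (a hypothetical input format for this sketch).
    """
    seen = set()
    counts = Counter()
    for read_id, a, b in read_links:
        if a == b:
            continue  # a read within one scaffold links nothing
        pair = tuple(sorted((a, b)))  # treat A-B and B-A as the same join
        if (read_id, pair) in seen:
            continue  # count each read at most once per pair
        seen.add((read_id, pair))
        counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    links = [
        ("r1", "scf1", "scf2"),
        ("r2", "scf2", "scf1"),
        ("r3", "scf3", "scf7"),
        ("r1", "scf1", "scf2"),  # duplicate record for the same read
    ]
    print(supported_links(links))  # {('scf1', 'scf2'): 2}
```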

A large number of contigs is the norm for eukaryotic projects - I think the Panda assembly is a good example.

I don't know if this is accurate, but it is fairly astounding!

http://www.ncbi.nlm.nih.gov/pubmed/20724458
"This has led to the generation of several draft genome sequences based exclusively on short sequence Illumina sequence reads, recently culminating in the assembly of the 2.25-Gb genome of the giant panda from Illumina sequence reads with an average length of just 52 nucleotides."

10-22-2010, 04:46 AM   #5
themerlin
Member
 
Location: Flagstaff, AZ

Join Date: Feb 2010
Posts: 51

Have you tried GapCloser from the SOAPdenovo package? I'm not sure how well it will work with single-end 454 reads, but it works great with Illumina PE data for resolving internal Ns in scaffolds.

J
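Whatever gap-closing tool is used, it helps to measure the N gaps before and after the run. A minimal Python sketch, assuming the scaffolds have already been read into a dict of name -> sequence; it only reports where the runs of N are and is unrelated to GapCloser's own code. The function names are hypothetical.

```python
import re

# A gap is any run of N/n characters in a scaffold sequence.
GAP_RE = re.compile(r"[Nn]+")


def gap_runs(sequence, min_len=1):
    """Return 0-based, half-open (start, end) coordinates of N runs."""
    return [
        (m.start(), m.end())
        for m in GAP_RE.finditer(sequence)
        if m.end() - m.start() >= min_len
    ]


def gap_summary(scaffolds, min_len=1):
    """Return (number of gaps, total gap bases) over a dict name -> sequence."""
    n_gaps = n_bases = 0
    for seq in scaffolds.values():
        for start, end in gap_runs(seq, min_len):
            n_gaps += 1
            n_bases += end - start
    return n_gaps, n_bases


if __name__ == "__main__":
    toy = {"scf1": "ACGTNNNNACGTNAC", "scf2": "ACGTACGT"}
    print(gap_runs(toy["scf1"]))  # [(4, 8), (12, 13)]
    print(gap_summary(toy))       # (2, 5)
```

Running `gap_summary` on the scaffolds before and after gap closing gives a simple measure of how many gaps and N bases were removed.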

10-25-2010, 10:39 PM   #6
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23

Quote:
Originally Posted by themerlin
Have you tried GapCloser from the SOAPdenovo package? I'm not sure how well it will work with single-end 454 reads, but it works great with Illumina PE data for resolving internal Ns in scaffolds.

Can you provide GapCloser with 454 reads, even if the original assembly was made from Illumina reads? I have never tried it, but if it's possible I'll certainly give it a shot!

10-26-2010, 04:42 AM   #7
themerlin
Member
 
Location: Flagstaff, AZ

Join Date: Feb 2010
Posts: 51

I believe so. You can also map your Illumina reads back to the scaffolds, which often resolves some internal Ns.

J

10-28-2010, 04:04 AM   #8
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23
SOAP GapCloser not for 454 reads

I just tested GapCloser on my 454 reads, and it failed with the message "read max length should be less than 188bp". So it does not work with 454 reads at all (since most of them are much longer than 188 bp). Too bad.

For the moment I'm also running it on the original Illumina data; hopefully that will work better!
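Before trying a tool with a read-length ceiling like the 188 bp one in the error message above, it can be worth checking how many reads would exceed it. A small Python sketch, assuming the 454 reads have been exported to FASTA (for example from the SFF file); the function names and the file name in the comment are hypothetical, and the strict "less than" test simply follows the wording of the error message.

```python
def fasta_lengths(lines):
    """Yield sequence lengths from FASTA-formatted lines (multi-line records OK)."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def over_limit(lengths, limit=188):
    """Return (reads not strictly shorter than `limit`, total reads).

    The strict '<' follows the error message's wording; the tool's
    actual check may differ.
    """
    total = too_long = 0
    for n in lengths:
        total += 1
        if n >= limit:
            too_long += 1
    return too_long, total


# Example (hypothetical file name):
# with open("reads_454.fasta") as fh:
#     print(over_limit(fasta_lengths(fh)))
```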

10-28-2010, 11:18 PM   #9
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

If you can get your assembly into ACE file format, perhaps you can use Consed? It has a way to add 454 reads from an SFF file.

10-29-2010, 05:41 AM   #10
Linnea
Member
 
Location: Uppsala, Sweden

Join Date: Mar 2010
Posts: 23

Quote:
Originally Posted by flxlex
If you can get your assembly into ACE file format, perhaps you can use Consed? It has a way to add 454 reads from an SFF file.

So you can actually align 454 reads (that have not been aligned before) to an assembly with Consed? I thought it was mainly for viewing. Sounds interesting! I don't have the assembly in ACE format, but that shouldn't be too hard to fix.
Do you know if Consed can handle very large assemblies (>1 Gb, made from 600,000,000 Illumina reads)?

11-01-2010, 06:21 AM   #11
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by Linnea
Do you know if Consed can handle very large assemblies (>1 Gb, made from 600,000,000 Illumina reads)?

You never know, but I guess this could be tricky... Perhaps it's best to ask on the phui mailing list?

11-18-2010, 08:08 AM   #12
kbushley
Member
 
Location: Oregon

Join Date: Jan 2010
Posts: 22
Supply MIRA with assembled Illumina contigs

Hi,

MIRA is a great package and will give you the information you're looking for in terms of contig connections. It can't handle raw Illumina reads as input here, but I've heard of people inputting assembled Illumina contigs instead. It has a limit on how large a contig it can handle, but I think it's in the range of 50 kb or so. You could then map the 454 reads to these contigs. I haven't tried this, but it would be worth a shot. You might also want to post to the MIRA listserv; the author may have some useful suggestions.
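If MIRA does impose a per-contig size limit around 50 kb (the figure above is the poster's recollection, not a checked value), longer Illumina scaffolds could be cut into overlapping pieces before loading them. A minimal Python sketch; the 1 kb overlap, the function name, and the piece-naming scheme are arbitrary assumptions.

```python
def split_long(name, seq, max_len=50_000, overlap=1_000):
    """Split `seq` into pieces of at most `max_len` bp, with consecutive
    pieces overlapping by `overlap` bp. Returns (piece_name, piece_seq)
    tuples; sequences already short enough are returned unchanged."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    if len(seq) <= max_len:
        return [(name, seq)]
    step = max_len - overlap
    pieces = []
    start = 0
    while True:
        end = min(start + max_len, len(seq))
        # Piece names record 1-based, inclusive coordinates on the original.
        pieces.append((f"{name}_part{len(pieces) + 1}_{start + 1}-{end}", seq[start:end]))
        if end == len(seq):
            break
        start += step
    return pieces


if __name__ == "__main__":
    for piece_name, piece in split_long("scf1", "ACGT" * 7, max_len=10, overlap=2):
        print(piece_name, piece)
```

The overlaps let the pieces be placed back together afterwards; splitting at existing N gaps instead would be another option.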

02-28-2011, 09:40 PM   #13
roshanbernard
Member
 
Location: South Korea

Join Date: Feb 2011
Posts: 31

Hi,

I have 454 sequencing data, and I have obtained 6 contigs from it.

I also have Illumina/Solexa sequences.

Can I use these Illumina data to fill the gaps between the 6 contigs?

I am using the Geneious software. Is there a better platform for the assembly?

(P.S.: When I assemble the Illumina reads, I get 361 contigs.)

Please help me with this.

05-06-2011, 06:06 PM   #14
ProfYorke
Junior Member
 
Location: Maryland

Join Date: May 2011
Posts: 1

As reported in the paper below, the ~1 Gb turkey genome was assembled from 454 and Illumina data by feeding all of it into the Celera Assembler. A nice assembly emerged. Having only 1X 454 coverage is not a problem; it should add to the Illumina coverage. For the turkey, over 2% of the assembly was covered only by 454 reads and over 2% only by Illumina reads. The Celera Assembler is not easy to use, though.

Rami Dalloul, Julie Long, Aleksey Zimin, ... James Yorke, Liqing Zhang, Hong-Bin Zhang, Xiaojun Zhang, Yang Zhang, and Kent Reed. Multi-platform Next Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis. PLoS Biology, published 7 September 2010.
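Percentages like "covered only by 454" and "covered only by Illumina" can be computed from per-base depth tracks. A minimal Python sketch, assuming per-position read depths for each platform (e.g. exported from two separate read mappings against the same assembly) are available as equal-length lists; this is an illustration, not the method used in the paper.

```python
def platform_only_fractions(depth_a, depth_b):
    """Given per-base read depths from two platforms over the same assembly
    positions, return the fractions of positions covered only by A, only
    by B, by both, and by neither."""
    if len(depth_a) != len(depth_b):
        raise ValueError("depth arrays must cover the same positions")
    if not depth_a:
        raise ValueError("empty depth arrays")
    only_a = only_b = both = neither = 0
    for a, b in zip(depth_a, depth_b):
        if a > 0 and b > 0:
            both += 1
        elif a > 0:
            only_a += 1
        elif b > 0:
            only_b += 1
        else:
            neither += 1
    n = len(depth_a)
    return {"only_a": only_a / n, "only_b": only_b / n,
            "both": both / n, "neither": neither / n}


if __name__ == "__main__":
    depth_454 = [0, 1, 0, 2, 0]       # toy values
    depth_illumina = [3, 1, 0, 0, 5]  # toy values
    print(platform_only_fractions(depth_454, depth_illumina))
    # {'only_a': 0.2, 'only_b': 0.4, 'both': 0.2, 'neither': 0.2}
```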

All times are GMT -8.