06-26-2011, 02:44 PM   #1
salmonella (Junior Member, Location: Texas, Join Date: Feb 2011, Posts: 5)

advice from more experienced users

Hi
I am new to using NGS and need advice. Let me start by thanking all of you who post replies to the forums; it takes time to share, but believe me, we newbies really appreciate it! The learning curve on this stuff is steep!

I am trying to assemble a 5 Mb bacterial genome from Illumina 40 bp single-end reads. FastQC says we got about 30 million reads. The base quality on all of the reads is above 30, so I do not believe I have to trim or filter.
I would like to try de novo assembly, since the genes I am interested in are novel and likely to reside on a plasmid, so it would be difficult to use a reference genome to make the contigs. In addition, they are likely flanked by non-unique DNA. I am not computer savvy, but I do have access to Velvet and have run it a few times before. I have been using Tablet to view the .afg output.

Here are my questions:
1. Apart from changing the k-mer length, what other parameters should I manipulate to optimize the assembly?

2. On a related note, until recommendations come back from projects like GAGE, are there any hints anyone can forward, or a post somewhere, that would help with this kind of analysis for single-read data?

3. Is there free software that can take any of the Velvet output files and calculate the N50 value, so that as we iterate we can figure out what works better? As I said, I am not a programmer, so I am looking for something that is plug and play.

4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1700 contigs longer than 100 bp, with a few that are 50-70 kb, is this considered good, or do I have a long way to go on optimization?

Many thanks
07-04-2011, 10:29 PM   #2
tonybolger (Senior Member, Location: Berlin, Join Date: Feb 2010, Posts: 156)

I'm no Velvet expert, but here goes:

Quote (Originally posted by salmonella):
1. Apart from changing the k-mer length, what other parameters should I manipulate to optimize the assembly?
The coverage parameters: expected coverage (-exp_cov) and the coverage cutoff (-cov_cutoff).
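Since those velvetg options are expressed in k-mer coverage rather than nucleotide coverage, it helps to estimate the expected k-mer coverage from your read count, read length, genome size, and chosen k. A back-of-the-envelope sketch in Python, using the numbers from the original post and an example k of 25 (use whatever k you actually gave velveth):

Code:
# Rough estimate of Velvet's expected k-mer coverage.
# The figures below come from the original post; k is an example value.

n_reads = 30_000_000      # total reads (from FastQC)
read_len = 40             # read length in bp
genome_size = 5_000_000   # approximate genome size in bp
k = 25                    # example k-mer length

nt_cov = n_reads * read_len / genome_size           # nucleotide coverage C
kmer_cov = nt_cov * (read_len - k + 1) / read_len   # k-mer coverage Ck = C*(L-k+1)/L

print(f"nucleotide coverage ~{nt_cov:.0f}x, k-mer coverage ~{kmer_cov:.0f}x")
# A common starting point is -exp_cov near the k-mer coverage estimate
# and -cov_cutoff somewhere around half of it; tune from there.

Newer Velvet versions also accept -exp_cov auto and -cov_cutoff auto if you would rather not supply the numbers yourself.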

Quote (Originally posted by salmonella):
3. Is there free software that can take any of the Velvet output files and calculate the N50 value, so that as we iterate we can figure out what works better? As I said, I am not a programmer, so I am looking for something that is plug and play.
Curtain, a project related to Velvet, comes with a fairly simple program, statsContigAll, which gives a reasonable set of stats: min, max, N10 .. N90, etc.
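If installing another package is a hurdle, N50 is also easy to compute directly from Velvet's contigs.fa (its default contig output) with a few lines of Python. A minimal sketch, no dependencies beyond Python itself:

Code:
#!/usr/bin/env python
"""Basic contig stats (count, total, min, max, N50) from a FASTA file."""
import sys

def contig_lengths(path):
    # collect the length of each sequence in the FASTA file
    lengths, current = [], 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths

def n50(lengths):
    # length of the contig at which the running total passes half the assembly
    lengths = sorted(lengths, reverse=True)
    half, running = sum(lengths) / 2.0, 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

if __name__ == "__main__":
    lens = contig_lengths(sys.argv[1])   # e.g. python contig_stats.py contigs.fa
    print("contigs:", len(lens))
    print("total bp:", sum(lens))
    print("min/max:", min(lens), "/", max(lens))
    print("N50:", n50(lens))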

Quote (Originally posted by salmonella):
4. Does anyone have advice on the contig sizes that are 'normally expected' for this kind of assembly? In other words, if I get about 1700 contigs longer than 100 bp, with a few that are 50-70 kb, is this considered good, or do I have a long way to go on optimization?
That depends on the target genome. If you have a relatively simple genome (and I guess you do), you should get only a few large contigs. You should also get a total assembly size in the right range (unless you have a lot of repeats).

That said, you could really use longer, paired reads: 40 bases is on the short side, and paired data is so much better for de novo assembly.

BTW, I wouldn't rule out the need to filter/trim. Most assemblers ignore quality scores entirely, so why not help them by removing the crud, even if it is a relatively small percentage of the data? Removing adapters is also strongly recommended.
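For what it's worth, here is a very rough sketch of the kind of 3'-end quality trimming that dedicated trimming tools do far more carefully. It assumes Phred+33 quality encoding, which may not match older Illumina pipelines (many used Phred+64), so check your data first; the thresholds are arbitrary examples:

Code:
#!/usr/bin/env python
"""Very rough 3'-end quality trimming of a FASTQ file.

Assumes Phred+33 quality encoding (older Illumina pipelines used
Phred+64). Dedicated trimming tools are a better choice for real
analyses; this just illustrates the idea.
"""
import sys

QUAL_OFFSET = 33   # change to 64 for Phred+64 encoded data
MIN_QUAL = 20      # trim bases below this quality from the 3' end
MIN_LEN = 25       # discard reads shorter than this after trimming

def trim(seq, qual):
    # walk back from the 3' end while quality is below the threshold
    end = len(seq)
    while end > 0 and (ord(qual[end - 1]) - QUAL_OFFSET) < MIN_QUAL:
        end -= 1
    return seq[:end], qual[:end]

with open(sys.argv[1]) as fh, open(sys.argv[2], "w") as out:
    while True:
        header = fh.readline().rstrip()
        if not header:
            break
        seq = fh.readline().rstrip()
        plus = fh.readline().rstrip()
        qual = fh.readline().rstrip()
        seq, qual = trim(seq, qual)
        if len(seq) >= MIN_LEN:
            out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")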
07-17-2011, 05:43 AM   #3
chkuo (Member, Location: Taiwan, Join Date: May 2010, Posts: 11)

As has been suggested, the coverage parameters are important, particularly when you want to separate the chromosome and plasmids (plasmids tend to have much higher coverage).
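One way to act on that: Velvet writes per-node length and k-mer coverage to stats.txt in the output directory, so you can script the coverage distribution and pull out the unusually high-coverage nodes that are likely plasmid-derived. A small sketch; the column names (ID, lgth, short1_cov) match the stats.txt files I have seen, but check your own header line, and the 2x-median threshold is arbitrary:

Code:
#!/usr/bin/env python
"""Flag unusually high-coverage nodes in Velvet's stats.txt.

Plasmid-derived nodes often show much higher k-mer coverage than the
chromosome. Column names ('ID', 'lgth', 'short1_cov') should match the
header line of your stats.txt; adjust them if they differ.
"""
import csv
import sys

THRESHOLD_FACTOR = 2.0   # flag nodes above 2x the median coverage (arbitrary)
MIN_LENGTH = 1000        # ignore very short nodes

with open(sys.argv[1]) as fh:                      # e.g. stats.txt
    rows = list(csv.DictReader(fh, delimiter="\t"))

covs = sorted(float(r["short1_cov"]) for r in rows
              if r["short1_cov"] not in ("", "Inf"))
median_cov = covs[len(covs) // 2]
print(f"median node coverage: {median_cov:.1f}")

for r in rows:
    if r["short1_cov"] in ("", "Inf"):
        continue
    cov = float(r["short1_cov"])
    if cov > THRESHOLD_FACTOR * median_cov and int(r["lgth"]) >= MIN_LENGTH:
        print(f"node {r['ID']}\tlength {r['lgth']}\tcoverage {cov:.1f}")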

N50, max contig size, and some other information about the assembly should be readily available in the Velvet log file (the Log file in the output directory).

Hope this helps.