SEQanswers

Old 10-09-2012, 01:10 AM   #1
Gorbenzer
Member
 
Location: italy

Join Date: Apr 2012
Posts: 45
search genes in a reads pool or (very) draft assembly: any useful tool?

Hi everyone,
I need to screen a few bacterial genomes for the presence of around 100 genes that I have defined.

We don't have much sequencing data for these genomes (about 20 Mbp of 200 bp reads), which is far less than what is needed for a good assembly.

I've already tried running blastn/tblastx with my gene database against the reads and against the contigs from the very draft assemblies we can get with this data, but I'm not satisfied with the results because the ratio of false positives is too high.

Is there a tool that can help me with this work? Maybe some kind of metagenomic tool?


Thanks in advance to anyone who can help.
Old 10-09-2012, 03:29 AM   #2
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82
I haven't tried it myself, but maybe this could be of interest: http://www.plosone.org/article/info%...l.pone.0019816
Old 10-09-2012, 07:28 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
TBLASTX is a good way to get a lot of difficult-to-interpret noise; I strongly recommend using BLASTX against a carefully focused database of proteins. If you are concerned with false positives, tighten up your P-value cutoff.
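
As a rough illustration of tightening the cutoff, here is a minimal Python sketch that filters BLASTX hits after the search. It assumes the search was run with BLAST+ tabular output (`-outfmt 6`, default columns); the function names, the `1e-10` threshold, and the example rows are made up for illustration, not recommended values.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(handle, max_evalue=1e-10, min_pident=0.0):
    """Yield hits (as dicts) that pass the E-value and identity cutoffs.

    The default thresholds are placeholders; tune them for your data.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["pident"]) >= min_pident):
            yield hit

def genes_detected(hits):
    """Map each subject protein to the best bit score among its hits."""
    best = {}
    for hit in hits:
        score = float(hit["bitscore"])
        if score > best.get(hit["sseqid"], float("-inf")):
            best[hit["sseqid"]] = score
    return best

# Invented example rows: read2 is a weak hit and should be dropped.
example = (
    "read1\tgeneA\t95.0\t60\t3\t0\t1\t180\t10\t69\t1e-30\t120\n"
    "read2\tgeneA\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.5\t25.0\n"
    "read3\tgeneB\t88.0\t55\t6\t0\t3\t167\t1\t55\t2e-20\t98.5\n"
)
detected = genes_detected(filter_hits(io.StringIO(example)))
print(detected)
```

For a real run you would pass an open file handle for the BLASTX output instead of the `io.StringIO` example.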

Merging your reads with FLASH prior to assembly may boost things a bit, if you have paired-end data. Still, depending on your bug, 20Mb of reads should be 2-10X coverage and you would be expected to get some sizable contigs. What sort of contig size distribution are you getting?
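
The 2-10X figure is just total read bases divided by genome size, so it can be sanity-checked in a few lines. The sketch below assumes genome sizes of 2 Mb and 10 Mb as illustrative bounds for a bacterium, and uses N50 as one common summary of a contig size distribution; the contig lengths are invented.

```python
def coverage(total_read_bases, genome_size):
    """Expected average depth: total sequenced bases / genome size."""
    return total_read_bases / genome_size

def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input

# 20 Mb of reads against assumed genome sizes of 2 Mb and 10 Mb.
high = coverage(20e6, 2e6)
low = coverage(20e6, 10e6)
print(low, high)

# Invented contig lengths, only to show the calculation.
print(n50([100, 200, 300, 400]))
```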

Also, it is relevant which sequencing technology you are using. Ion's indel issue could be causing you much grief, if that is the technology.

In the end, what is the question you wish the data to answer? It may be that this dataset is simply too sparse. On the other hand, for some questions it may well be good enough.
All times are GMT -8.