SEQanswers

09-21-2012, 01:41 AM #1
chris_bioinfo
Junior Member
Location: London
Join Date: Aug 2012
Posts: 8
novel species discovery metagenomics

Hello everybody,

I have two environmental bacterial datasets sequenced on Illumina for metagenomics. I have already determined the taxonomic content of the species in the samples. Now I want to find any novel species. Can somebody please suggest some bioinformatics tools for that?

Can you please help?

I appreciate your help.

Christopher
09-27-2012, 04:09 AM #2
Mark
Member
Location: Raleigh, NC
Join Date: Nov 2008
Posts: 48

Can you be more specific about what type of data you have (16S?) and what you have already done?
09-27-2012, 04:14 AM #3
chris_bioinfo
Junior Member
Location: London
Join Date: Aug 2012
Posts: 8

Dear Mark,

It's genomic data, and I've already done taxonomic classification of the species present in the sample using MEGAN. I mapped the reads against all bacterial sequences in the NCBI database using bowtie, imported the resulting SAM file into MEGAN, and got a nice tree view. But now I want to find the novel species in the sequenced sample, using the reads that did not align at all.
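A minimal Python sketch of pulling out the reads that did not align, assuming a plain-text SAM file from bowtie/bowtie2: records with FLAG bit 0x4 set are unmapped, and secondary/supplementary records (0x100/0x800) are skipped so each read appears once. The function names unmapped_reads and write_fasta and the /1 and /2 mate suffixes are illustrative choices, not part of any tool.

```python
from typing import Iterable, Iterator, Tuple

def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for every primary SAM record flagged unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        if not flag & 0x4 or seq == "*":  # keep only unmapped reads that carry a sequence
            continue
        if flag & 0x40:  # first mate of a pair
            name += "/1"
        elif flag & 0x80:  # second mate of a pair
            name += "/2"
        yield name, seq

def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (name, sequence) pairs to a FASTA file and return how many were written."""
    count = 0
    with open(path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")
            count += 1
    return count
```

For example, write_fasta(unmapped_reads(open("sample.sam")), "unmapped.fa") with hypothetical file names. On the command line, samtools view -f 4 applies the same unmapped-bit filter.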
09-27-2012, 04:25 AM #4
Mark
Member
Location: Raleigh, NC
Join Date: Nov 2008
Posts: 48

Well, given your approach (bowtie against a nucleotide database), it seems likely that your hits will be very close matches. To see the next tier of taxonomic relatedness, you might try aligning your reads with BLAST (or another such tool) in translated searches against a comprehensive protein database. Note that when you do this and examine the taxonomic assignments made by MEGAN, the hits identified are often significant yet still far from exact (much more so than with bowtie), implying the presence of potentially novel species.
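A minimal Python sketch of one way to act on this, assuming the translated search results were saved in BLAST's default tabular format (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It keeps the highest-bitscore hit per read and lists reads whose best hit falls below an identity cutoff. The 90% default is a hypothetical placeholder, not a validated threshold for novelty; low identity can also come from sequencing errors or short alignments.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float, float]  # (subject id, percent identity, bit score)

def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 lines."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blank and comment lines
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        pident, bitscore = float(f[2]), float(f[11])  # columns 3 and 12
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best

def divergent_reads(best: Dict[str, Hit], max_pident: float = 90.0) -> List[str]:
    """Return reads whose best hit is below max_pident percent identity (hypothetical cutoff)."""
    return sorted(q for q, (_, pident, _) in best.items() if pident < max_pident)
```

Reads flagged this way are only candidates for closer inspection, not evidence of a new species by themselves.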
09-27-2012, 04:47 AM #5
chris_bioinfo
Junior Member
Location: London
Join Date: Aug 2012
Posts: 8

I'm sorry, Mark, if I'm wrong, since I'm new to metagenomics, but as far as I understand, if it were metatranscriptome data I should use tblastx and search against the nr database, right? My feeling is that since this is genomic data, matching by similarity against the nt database would serve the purpose.

I also tried standalone BLAST, but I have a tremendous number of reads: 18 million paired-end Illumina read pairs, 36 million reads in total. BLAST ran for four days and was still running, so I had to stop it and opted for bowtie2 instead. I am confident this is not a memory problem, since I am running it on a cluster with more than 210 GB of RAM.

I'm truly thankful for your replies.

Best,
Christopher
09-27-2012, 09:19 AM #6
Mark
Member
Location: Raleigh, NC
Join Date: Nov 2008
Posts: 48

Hi Chris

Actually, you would use blastx against a protein database. tblastx translates both the query and the subject and searches them in protein space; this might also work, but it is even more computationally demanding than blastx.
I think you probably do want to search in protein space because it is more sensitive: amino acid sequences evolve more slowly than nucleotide sequences (synonymous substitutions change the DNA but not the protein).
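To illustrate what searching in protein space involves, here is a minimal Python sketch of the six-frame translation that blastx conceptually applies to each nucleotide query, using the standard genetic code (NCBI translation table 1). It also shows the point about synonymous changes: CTT and CTG differ at the DNA level but both encode leucine. Translating codons with ambiguous bases as X is an assumption of this sketch.

```python
from itertools import product
from typing import List

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# first base varying slowest, third base fastest; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frames(seq: str) -> List[str]:
    """Return the three forward-frame and three reverse-frame translations."""
    fwd = seq.upper()
    rc = reverse_complement(seq)
    return [translate(fwd[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]
```

Each read of length L gives six peptides of roughly L/3 residues; tblastx additionally translates every database sequence the same way, which is why it is heavier still.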

Yes, running a tool like BLAST on that much NGS data is burdensome unless you have prolonged access to a large cluster. One alternative that would still let you search in protein space is RAPSearch2. It achieves 50-100x speedups over blastx with only a limited loss in sensitivity. Parallelizing its execution may give you the speed you need to get the job done.
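A minimal Python sketch of the parallelization step, assuming the reads have been exported to a single FASTA file: it deals records round-robin into n chunk files of near-equal size, so each chunk can be searched as an independent cluster job (with RAPSearch2 or blastx) and the outputs concatenated afterwards. The chunk naming (chunk_000.fa, ...) is a hypothetical convention, and the search command itself is left out because its options depend on the tool and version.

```python
import os
from typing import Iterator, List, Tuple

def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def split_fasta(path: str, n_chunks: int, out_dir: str, prefix: str = "chunk") -> List[str]:
    """Deal records round-robin into n_chunks files; return the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"{prefix}_{i:03d}.fa") for i in range(n_chunks)]
    handles = [open(p, "w") for p in paths]
    try:
        for i, (header, seq) in enumerate(read_fasta(path)):
            handles[i % n_chunks].write(f">{header}\n{seq}\n")
    finally:
        for h in handles:
            h.close()
    return paths
```

Because each read is searched independently, separating mates into different chunks does not matter for this step.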

Mark

Tags
bacteria, bioinformatics, metagenomics, novel species discovery

All times are GMT -8.