
**RickBioinf** · 10-02-2012, 03:50 AM

How many reads do you have? The more reads you have, the longer it takes (sometimes even 3 months), so it's probably not a good idea. You should check whether you can make your dataset smaller.
Maybe you should also try a program other than BLAST; it is slower than a lot of other tools, for example: https://github.com/csmiller/EMIRGE
If it still takes a long time, let me know; I can help you look for other options too.

Good luck.
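One way to make the dataset smaller is random subsampling. The sketch below is a hypothetical helper (not part of BLAST, EMIRGE, or any other tool named in this thread) that keeps a fixed number of reads drawn uniformly from a FASTQ file in a single pass, using reservoir sampling. It assumes plain 4-line FASTQ records with unwrapped sequences; the in-memory demo stands in for a real file.

```python
import io
import random
from typing import Iterator, List, TextIO, Tuple

# (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def reservoir_sample(records: Iterator[Record], k: int, seed: int = 42) -> List[Record]:
    """Return a uniform random sample of k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, i)          # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec         # replace with probability k/(i+1)
    return reservoir


# Demo: 1,000 synthetic reads, keep 10 of them.
demo = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(1000)))
sample = reservoir_sample(read_fastq(demo), k=10)
print(len(sample))
```

Memory use depends only on `k`, not on the total number of reads, which matters when the input has tens of millions of reads.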

**wkrhc4mia** · 10-03-2012, 01:50 PM

Originally posted by newBioinfo:

Hi Everyone,
I am very new to the field of bioinformatics. I have Illumina sequencing data of 16S rDNA from water. I looked at the quality score distribution of the data, and it lies within the 30-35 range, which means it is good data to start with. I want to know what the next step should be.
I am thinking of doing this:
1) BLAST the data against a ribosomal database

Can anyone give me some ideas on how to start with this data?

Thanks for any help!

Take a look at QIIME (www.qiime.org, and the overview tutorial there) or mothur (www.mothur.org). Those provide standard pipelines for dealing with 16S sequences. BLASTing some of the sequences against a database such as RDP or Greengenes is usually part of the pipeline.
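For reference, the 30-35 quality range mentioned in the question is on the Phred scale, where a score Q corresponds to a base-call error probability P = 10^(-Q/10): Q30 is about 1 error in 1,000 bases and Q35 about 1 in 3,000. A minimal sketch of the conversion, assuming the FASTQ quality strings use Phred+33 encoding (Illumina 1.8+); older Illumina 1.3-1.7 files use an offset of 64 instead. The function names are illustrative only.

```python
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding by default; pass offset=64 for
    Illumina 1.3-1.7 style files.
    """
    return [ord(c) - offset for c in qual]


for q in (30, 35):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.5f}")

print(decode_quality("I?5"))  # 'I'=73, '?'=63, '5'=53 -> [40, 30, 20]
```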

**newBioinfo** · 10-04-2012, 07:37 AM

Thanks, RickBioinf, for the help.
I have around 78 million reads, and after filtering out the reads that contain an 'N' I have 77 million left. I tried BLASTing the data against the non-redundant database, but it was taking too long. I will try what you suggested. So this database contains only ribosomal DNA?
Thanks for the help!!!
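Removing reads that contain an ambiguous base ('N'), as described above, can be sketched as a simple streaming filter. This is a hypothetical stand-alone helper, not the tool the poster actually used (the thread does not say which one); it assumes 4-line FASTQ records and reports total and kept counts, like the 78 million to 77 million figures above.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def filter_n_reads(in_handle: TextIO, out_handle: TextIO) -> Tuple[int, int]:
    """Copy reads without any 'N' (case-insensitive) to out_handle; return (total, kept)."""
    total = kept = 0
    for header, seq, sep, qual in read_fastq(in_handle):
        total += 1
        if "N" in seq.upper():
            continue  # drop reads with an ambiguous base call
        kept += 1
        out_handle.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
    return total, kept


demo_in = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACNT\n+\nIIII\n@c\nnnnn\n+\nIIII\n")
demo_out = io.StringIO()
total, kept = filter_n_reads(demo_in, demo_out)
print(total, kept)  # 3 1
```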

**newBioinfo** · 10-04-2012, 07:39 AM

Thanks wkrhc4mia,
I am thinking of using mothur. So you mean I do not have to BLAST the data against a database separately, because that is part of the mothur pipeline?

Thanks for the help!!!

**GenoMax** · 10-04-2012, 09:32 AM

If you run a single BLAST search, it is going to take a long time. Instead, you could break your initial query file into multiple smaller chunks and then run the searches in parallel (this works best if you have access to a compute cluster).
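The split-and-parallelize approach can be sketched as follows. This is a hypothetical helper, assuming the reads have already been converted to FASTA (BLAST queries are FASTA). It deals records round-robin into N outputs so each chunk gets a similar number of reads, and it streams, so memory use stays flat even for tens of millions of reads. Each chunk file could then be submitted as its own BLAST job on a cluster; the `chunk_{k}.fasta` naming in the comment is illustrative only.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; joins line-wrapped sequences."""
    header = None
    parts: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(in_handle: TextIO, out_handles: List[TextIO]) -> List[int]:
    """Distribute records round-robin across out_handles; return records per chunk."""
    counts = [0] * len(out_handles)
    for i, (header, seq) in enumerate(read_fasta(in_handle)):
        k = i % len(out_handles)
        out_handles[k].write(f"{header}\n{seq}\n")
        counts[k] += 1
    return counts


# Real use (hypothetical names): outs = [open(f"chunk_{k}.fasta", "w") for k in range(n)]
demo = io.StringIO("".join(f">r{i}\nACGT\nAC\n" for i in range(5)))
outs = [io.StringIO(), io.StringIO()]
counts = split_fasta(demo, outs)
print(counts)  # [3, 2]
```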

There are parallel implementations of BLAST, such as mpiBLAST (http://www.mpiblast.org/), that can be useful. Installing and using mpiBLAST is not trivial, though; just a fair warning.

**fanyucai1** · 10-27-2012, 06:19 PM

There is a program named MEGAN (http://ab.inf.uni-tuebingen.de/software/megan/) that you could use. For the reference database, you could choose SILVA, Greengenes, or RDP.

**Polecat** · 10-28-2012, 06:43 PM

Be careful when using MEGAN that you don't waste time by running your BLAST searches against the wrong database.

MEGAN works with the NCBI taxonomy, so it expects BLASTN, BLASTX, or BLASTP results against NCBI-NT, NCBI-NR, or genome-specific databases. MEGAN can also parse files generated by the RDP website or by SILVA, as well as files in SAM format.


How to deal with 16S rDNA data from Illumina