**rhinoceros** · 05-08-2013, 12:38 AM

How many months or years do you expect this query to take? Do you think you have enough HDD space for the output file? If you can't run your query on a more powerful platform, at least split the input into smaller files.

**flacchy** · 05-08-2013, 12:46 AM

We do have enough space for the output file. I know somebody tried this before and it took 6 months, which is why I was looking for a way to keep track of progress...
Do you know if there is a different way than clustering the data? Or a free platform I could use?

**rhinoceros** · 05-08-2013, 01:28 AM

If I were you, I'd run my blasts on Amazon EC2 or something similar. It's not that expensive..

**maubp** · 05-08-2013, 02:21 AM

How many sequences are in your contig FASTA file?

Are your contigs from a transcriptome assembly, meaning each is not that long (typical genes)? Or genomic meaning some could be very large? Either way, try smaller batches of 100 or 1000 sequences at a time - that should let you estimate how long the whole assembly will take.

Does your computer have enough RAM for the NT database?

Does your computer have multiple CPU cores? Have you tried running BLAST with multiple threads and/or multiple copies of BLAST on separate query files?

Are you using the plain text output? If so what will you do with it - parse it? Perhaps a more compact and computer friendly output might be wiser, like the tabular output?
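
The batching idea above could be sketched roughly as follows. This is only an illustration, not something from the thread: the function names (`read_fasta`, `split_fasta`, `estimate_total_seconds`) and the `batch_00001.fa` naming scheme are made up for the example, and the time estimate assumes run time scales linearly with the number of query sequences, which is only a rough approximation.

```python
import itertools
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, batch_size, out_dir):
    """Write consecutive batches of batch_size records; return the new file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    paths = []
    for index in itertools.count(1):
        # Pull the next batch_size records from the generator.
        batch = list(itertools.islice(records, batch_size))
        if not batch:
            break
        batch_path = out_dir / f"batch_{index:05d}.fa"  # hypothetical naming scheme
        with open(batch_path, "w") as out:
            for header, seq in batch:
                out.write(f">{header}\n{seq}\n")
        paths.append(batch_path)
    return paths


def estimate_total_seconds(batch_seconds, batch_size, total_sequences):
    """Linear extrapolation from one timed batch to the whole file (an assumption)."""
    return batch_seconds * total_sequences / batch_size
```

For example, if a batch of 100 sequences took 60 seconds and the file holds 1000 sequences, `estimate_total_seconds(60, 100, 1000)` gives 600 seconds under that linear assumption.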

**flacchy** · 05-08-2013, 02:38 AM

Thanks maubp... so

The metagenome has been sequenced with Illumina, and we know that the read length is in a range between 15 and 99 bp.

Do you suggest using software such as CD-Hit to split the file into smaller batches?

We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there some link I can look at?)

As output I set a fasta file (I saw that in some workshops), so I told the program to write its output to a file named contigs.fa.blastn

**rhinoceros** · 05-08-2013, 02:44 AM

Originally posted by flacchy

The metagenome has been sequenced with Illumina, and we know that the read length is in a range between 15 and 99 bp.

Do you suggest using software such as CD-Hit to split the file into smaller batches?

Why not assemble before doing anything else, or alternatively send the reads for blast to mg-rast or img/m or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?

We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there some link I can look at?)

http://www.ncbi.nlm.nih.gov/books/NBK1762/ ..you had already set up 4 threads with the -a flag. In newer versions of blast -num_threads replaces this flag, and really, for speed gains you should be using the latest version..
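
As a rough illustration of the flag change, here is a small sketch that only builds the two command lines and does not run anything. The helper names and all file and database names are placeholders invented for the example. It also asks for tabular output (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+), which is an addition here and not part of the original command.

```python
import shlex


def legacy_blastall_cmd(query, db, out, threads=4):
    """Legacy blastall: -a sets the number of processors, -m 8 asks for tabular output."""
    return ["blastall", "-p", "blastn", "-d", db, "-i", query,
            "-o", out, "-a", str(threads), "-m", "8"]


def blast_plus_cmd(query, db, out, threads=4):
    """BLAST+ blastn: -num_threads replaces -a, -outfmt 6 is the tabular format."""
    return ["blastn", "-query", query, "-db", db, "-out", out,
            "-num_threads", str(threads), "-outfmt", "6"]


def commands_for_batches(batch_files, db, threads=4):
    """One BLAST+ command per query batch, so several copies can run side by side."""
    return [blast_plus_cmd(f, db, f + ".blastn.tsv", threads) for f in batch_files]


if __name__ == "__main__":
    # Placeholder file and database names, for illustration only.
    print(shlex.join(legacy_blastall_cmd("contigs.fa", "nt", "contigs.fa.blastn")))
    print(shlex.join(blast_plus_cmd("contigs.fa", "nt", "contigs.fa.blastn")))
```

On an 8-core machine one option would be two batch files at 4 threads each; how well that scales depends on RAM and disk, so it is worth timing a small batch first.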

**maubp** · 05-08-2013, 02:47 AM

Originally posted by rhinoceros

Why not assemble before doing anything else, or alternatively send the reads for blast to mg-rast or img/m or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?

http://www.ncbi.nlm.nih.gov/books/NBK1762/ ..you had already set up 4 threads with the -a flag. In newer versions of BLAST, -num_threads replaces this flag..

I assumed from the filename contigs.fa in your question that you had already assembled the data. If not, you should do that first.

**flacchy** · 05-08-2013, 05:15 AM

I assembled these reads with Velvet, and now I am trying to set up MetaVelvet to get better contigs, since the contigs I obtained are still short (some of them are only 41 nt).

At the same time we are running a search on the reads to see what kind of 'organisms' to expect from the data. Does that make sense?

**GenoMax** · 05-08-2013, 05:23 AM

Wouldn't it be preferable to use a resource like MG-RAST (http://metagenomics.anl.gov/) for this type of analysis? Assuming that the sample here is metagenomic, of course.

**flacchy** · 05-08-2013, 06:11 AM

Yes, it is a metagenome (specifically marine viromes). I'll have a look.. Thank you so much, this was a great help!

**flacchy** · 05-08-2013, 07:01 AM

If anyone is curious, there is a script to keep track of BLAST progress (useful if you are dealing with huge data sets):

http://wilkox.wordpress.com/2010/07/19/a-simple-progress-monitor-for-blast-searches/
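
For readers who can't reach that link, here is a separate minimal sketch of one way to track progress; it is not the linked script. It assumes tabular output (`-m 8` or `-outfmt 6`), where the first column is the query ID, and it assumes BLAST reports queries in the same order as the input FASTA. The function names are invented for the example. Queries with no hits never appear in tabular output, so the result can lag behind the true progress.

```python
def fasta_ids(path):
    """Return query IDs (first word of each header) in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                ids.append(parts[0] if parts else "")
    return ids


def last_query_id(tabular_path):
    """Return the query ID on the last non-comment line of the tabular output."""
    last = None
    with open(tabular_path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                last = line.split("\t", 1)[0]
    return last


def blast_progress(query_fasta, tabular_output):
    """Fraction of queries processed, judged by the position of the last reported query."""
    ids = fasta_ids(query_fasta)
    last = last_query_id(tabular_output)
    if not ids or last is None or last not in ids:
        return 0.0
    return (ids.index(last) + 1) / len(ids)
```

Calling `blast_progress("contigs.fa", "contigs.fa.blastn")` while the search is running returns a number between 0 and 1. Both file names are placeholders, and for very large output files reading only the tail of the file would be faster than scanning all of it.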

**kmcarr** · 05-08-2013, 07:38 AM

Originally posted by flacchy

Yes, it is a metagenome (specifically marine viromes). I'll have a look.. Thank you so much, this was a great help!

DO NOT use nt!! If your query sequences are from marine viruses don't search against the entire universe of DNA sequences.

One of the very first things you should do when setting up a BLAST experiment (yes, think of running BLAST as an in silico experiment) is choosing a database appropriate to your experimental system and objective. The nt database has DNA from every branch of the taxonomic tree and every species from aardvark to zyzzyva. I am hard pressed to think of a time when nt is the correct database to use. Construct a target database focused to the experiment and it will greatly speed up your BLAST.
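
One way to build such a focused database with BLAST+ is `makeblastdb`. The sketch below only assembles the command line and does not run it. The input file `viral_genomes.fasta` and the database name `viral_db` are hypothetical placeholders; which sequences belong in the FASTA file depends on the experiment.

```python
import shlex


def makeblastdb_cmd(fasta, db_name, title=None):
    """Build a makeblastdb command for a nucleotide database."""
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]
    if title:
        cmd += ["-title", title]
    return cmd


if __name__ == "__main__":
    # Placeholder names: substitute a FASTA file of sequences relevant to the study.
    print(shlex.join(makeblastdb_cmd("viral_genomes.fasta", "viral_db", "Viral genomes")))
```

The resulting database name would then be passed to `blastn` with `-db viral_db` in place of `nt`.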


Tracking blastall
