SEQanswers

Old 05-08-2013, 12:24 AM   #1
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Tracking blastall

Hi,
I just started my PhD and I am working with a huge dataset (~7 million reads).
I started a blastall search against nt in my Bio-Linux shell, and since it's going to take forever I wanted to ask for some help on how to keep track of the analysis.
Using the less command I can see what's in the output file, but is there a way to get some numbers out of it, such as how many reads have been processed already?
Could someone help?

PS: this is the command I used:
blastall -d 'nt' -p 'blastn' -i contigs.fa -o contigs.fa.blastn -e 1e-06 -b 10 -v 10 -a 4

Thanks
Old 05-08-2013, 12:38 AM   #2
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

How many months or years do you expect this query to take? Do you think you have enough disk space for the output file? If you can't run the query on a more powerful platform, at least split the input into smaller files.
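
A minimal sketch of splitting the input into smaller files, assuming a plain FASTA file where every record starts with a ">" header line. The function name split_fasta and the output naming scheme are illustrative choices, not part of any BLAST tool:

```shell
# split_fasta FILE N PREFIX
# Writes PREFIX.1.fa, PREFIX.2.fa, ... each holding at most N FASTA records.
# (Hypothetical helper; names and layout are assumptions for illustration.)
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # Start a new chunk file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                chunk++
                out = prefix "." chunk ".fa"
            }
            count++
        }
        # Copy header and sequence lines into the current chunk.
        out != "" { print > out }
    ' "$1"
}

# Example: split_fasta contigs.fa 1000 contigs.part
```

Each chunk can then be searched on its own, so a crash or reboot costs only one chunk instead of the whole run.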

Last edited by rhinoceros; 05-08-2013 at 01:26 AM.
Old 05-08-2013, 12:46 AM   #3
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33

We do have enough space for the output file. I know somebody tried this before and it took 6 months, which is why I was looking for a way to keep track of it.
Do you know if there is another way besides clustering the data? Or a free platform I could use?
Old 05-08-2013, 01:28 AM   #4
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

If I were you, I'd run my blasts on Amazon EC2 or something similar. It's not that expensive..
Old 05-08-2013, 02:21 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

How many sequences are in your contig FASTA file?

Are your contigs from a transcriptome assembly, meaning each is not that long (typical genes)? Or genomic, meaning some could be very large? Either way, try smaller batches of 100 or 1000 sequences at a time; that should let you estimate how long the whole assembly will take.
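
As a rough sketch of that timing test: take the first N records, time a BLAST run on them, and scale up. The helper names first_n_seqs and estimate_hours are made up for illustration, and the linear extrapolation assumes the first records are representative of the whole file, which may not hold:

```shell
# first_n_seqs FILE N : print the first N FASTA records of FILE.
# (Hypothetical helper.)
first_n_seqs() {
    awk -v n="$2" '/^>/ { seen++ } seen > n { exit } seen > 0 { print }' "$1"
}

# estimate_hours TOTAL_SEQS SAMPLE_SEQS SAMPLE_SECONDS
# Linear extrapolation from a timed sample run to the full data set.
estimate_hours() {
    awk -v total="$1" -v sample="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", total / sample * secs / 3600 }'
}

# Example:
#   first_n_seqs contigs.fa 1000 > sample.fa
#   time blastall -d nt -p blastn -i sample.fa -o sample.blastn -e 1e-06 -a 4
#   estimate_hours 7000000 1000 120   # if the sample took 120 seconds
```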

Does your computer have enough RAM for the NT database?

Does your computer have multiple CPU cores? Have you tried running BLAST with multiple threads and/or multiple copies of BLAST on separate query files?

Are you using the plain text output? If so, what will you do with it: parse it? Perhaps a more compact and computer-friendly output, like the tabular output, might be wiser?
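
For illustration, legacy blastall writes tabular output with -m 8 (or -m 9, which adds "#" comment lines), one hit per line with the query ID in the first tab-separated column. A small sketch for summarising such a file follows; the function name is hypothetical. Note that queries with no hits never appear in the table, so this counts queries with hits, not queries processed:

```shell
# queries_with_hits TABULAR_FILE
# Number of distinct query IDs (column 1) in blastall -m 8 / -m 9 output.
# (Hypothetical helper; skips the "#" comment lines that -m 9 adds.)
queries_with_hits() {
    grep -v '^#' "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}

# Example:
#   blastall -d nt -p blastn -i contigs.fa -o contigs.fa.tsv -e 1e-06 -m 8 -a 4
#   queries_with_hits contigs.fa.tsv
```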
Old 05-08-2013, 02:38 AM   #6
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33

Thanks maubp... so

The metagenome has been sequenced with Illumina and we know that the read length ranges between 15 and 99 bp.

Do you suggest using software such as CD-HIT to split the file into smaller batches?

We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there a link I can look at?)

As output I set a FASTA file (I saw that in some workshops), so I told the program to write its output to a file named contigs.fa.blastn
Old 05-08-2013, 02:44 AM   #7
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

Quote:
Originally Posted by flacchy View Post
The metagenome has been sequenced with Illumina and we know that the read length ranges between 15 and 99 bp.

Do you suggest using software such as CD-HIT to split the file into smaller batches?
Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST or IMG/M or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?
Quote:
We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there a link I can look at?)
http://www.ncbi.nlm.nih.gov/books/NBK1762/ You had already set up 4 threads with the -a flag. In newer versions of BLAST (BLAST+), -num_threads replaces this flag, and really, for speed gains you should be using the latest version.
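
To make the flag change concrete, here is a sketch that prints a BLAST+ command line roughly corresponding to the blastall command from the first post. The function name is hypothetical. The -task blastn option is included on the assumption that you want the classic blastn algorithm, since BLAST+ blastn defaults to megablast:

```shell
# blastplus_cmd QUERY [THREADS]
# Prints (does not run) a BLAST+ command mirroring:
#   blastall -d nt -p blastn -i QUERY -o QUERY.blastn -e 1e-06 -b 10 -v 10 -a 4
# Flag mapping: -d -> -db, -i -> -query, -o -> -out, -e -> -evalue,
#   -b -> -num_alignments, -v -> -num_descriptions, -a -> -num_threads
blastplus_cmd() {
    echo "blastn -task blastn -db nt -query $1 -out $1.blastn" \
         "-evalue 1e-06 -num_alignments 10 -num_descriptions 10" \
         "-num_threads ${2:-4}"
}

# Example: blastplus_cmd contigs.fa 8
```

With 8 cores, an alternative to raising the thread count is to run several single-threaded copies on separate query files at once.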

Last edited by rhinoceros; 05-08-2013 at 02:52 AM.
Old 05-08-2013, 02:47 AM   #8
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Quote:
Originally Posted by rhinoceros View Post
Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST or IMG/M or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?

http://www.ncbi.nlm.nih.gov/books/NBK1762/ You had already set up 4 threads with the -a flag. In newer versions of BLAST, -num_threads replaces this flag.
I assumed from the filename contigs.fa in your question that you had already assembled the data. If not, you should do that first.
Old 05-08-2013, 05:15 AM   #9
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33

I assembled these reads with Velvet; now I am trying to set up MetaVelvet to get better contigs, since the contigs I obtained are still short (some of them 41 nt).

At the same time we are running a search on the reads to see what kinds of 'organisms' to expect from the data. Does that make sense?

Last edited by flacchy; 05-08-2013 at 05:17 AM.
Old 05-08-2013, 05:23 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,794

Wouldn't it be preferable to use a resource like MG-RAST (http://metagenomics.anl.gov/) for this type of analysis? Assuming that the sample here is metagenomic, of course.
Old 05-08-2013, 06:11 AM   #11
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33

Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was a great help!
Old 05-08-2013, 07:01 AM   #12
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33

If anyone is curious, there is a script for keeping track of BLAST progress (useful if you are dealing with huge datasets):

http://wilkox.wordpress.com/2010/07/...last-searches/
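
Independently of the linked script, a minimal sketch of the same idea: compare the number of records in the input FASTA with the number of "Query= " header lines written so far to the default pairwise output. The function name is made up, and this assumes the plain text output (not tabular); the count may be off by one around the query currently being written:

```shell
# blast_progress INPUT_FASTA BLAST_OUTPUT
# Reports how many queries appear in the default (pairwise) BLAST output
# so far, out of the total number of FASTA records in the input.
# (Hypothetical helper; assumes one "Query= " line per processed query.)
blast_progress() {
    total=$(grep -c '^>' "$1")
    done_n=$(grep -c '^Query= ' "$2")
    awk -v d="$done_n" -v t="$total" \
        'BEGIN { printf "%d of %d queries (%.1f%%)\n", d, t, (t > 0 ? 100 * d / t : 0) }'
}

# Example: blast_progress contigs.fa contigs.fa.blastn
```

Running it at two points in time also gives a rough rate, which can be used to project when the search will finish.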
Old 05-08-2013, 07:38 AM   #13
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150

Quote:
Originally Posted by flacchy View Post
Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was a great help!
DO NOT use nt!! If your query sequences are from marine viruses don't search against the entire universe of DNA sequences.

One of the very first things you should do when setting up a BLAST experiment (yes, think of running BLAST as an in silico experiment) is to choose a database appropriate to your experimental system and objective. The nt database has DNA from every branch of the taxonomic tree and every species from aardvark to zyzzyva. I am hard pressed to think of a time when nt is the correct database to use. Construct a target database focused on the experiment and it will greatly speed up your BLAST.
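
A sketch of what building and using a focused database could look like with the legacy tools used in this thread. The file name viral_genomes.fa and database name viral_db are placeholders for whatever reference set suits the experiment, and the function only prints the commands so they can be reviewed before running:

```shell
# focused_db_cmds REFERENCE_FASTA DB_NAME QUERY_FASTA
# Prints (does not run) the legacy commands to build a nucleotide
# database from REFERENCE_FASTA and search QUERY_FASTA against it.
# (Hypothetical helper; all file and database names are placeholders.)
# BLAST+ equivalent of the first step:
#   makeblastdb -in REFERENCE_FASTA -dbtype nucl -out DB_NAME
focused_db_cmds() {
    echo "formatdb -i $1 -p F -n $2"
    echo "blastall -p blastn -d $2 -i $3 -o $3.blastn -e 1e-06 -m 8 -a 4"
}

# Example: focused_db_cmds viral_genomes.fa viral_db contigs.fa
```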
All times are GMT -8.