Thanks a lot for the useful input. As I mentioned, the main challenge is that I have to work without a reference sequence, which leaves only a few options for de novo discovery. BLAT is indeed one of the attractive options; I have not tried it yet, as I need a local install because of the data volume. This data set was generated after only a few MiSeq runs, so as the data set grows a cluster or cloud becomes unavoidable, but I first need to design a smart heuristic de novo strategy, otherwise any capacity can eventually be saturated. That is what I am trying to do on the local machine. De novo discovery has a dedicated forum, but I could not find much discussion there. Any ideas are appreciated and will be taken into consideration.
-
I'm not sure I completely understand what you are actually trying to do.
I have not used this strategy on the bigger dataset yet, as all online servers kick me out, so it is likely still quite contaminated with known sequences. What I eventually want to know is what the sequences that do not match anything are: whether they carry any biologically meaningful information (as far as I can understand it) or are simply artefactual junk (again, as far as I can understand it). Since I know from the analysis of the smaller dataset that blastn cannot give me any clue, I want to try blastx in the hope that it will be more sensitive in detecting biologically meaningful information. Does this make sense?
Last edited by yaximik; 01-04-2013, 06:20 AM.
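To make the "sequences that do not match anything" step concrete, here is a minimal Python sketch. It assumes the blastn results were written in tabular form (`-outfmt 6`), where the first column is the query ID, and that query IDs are the FASTA header text up to the first whitespace. The function names and the file names in the usage comment are hypothetical, not anything from the thread.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ids_with_hits(tabular_lines):
    """Collect query IDs (first column) from BLAST tabular output."""
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            hits.add(line.split("\t")[0])
    return hits


def unmatched_records(fasta_lines, tabular_lines):
    """Return FASTA records whose ID never appears as a query in the hit table."""
    hits = ids_with_hits(tabular_lines)
    kept = []
    for header, seq in read_fasta(fasta_lines):
        # Assumption: the query ID is the header up to the first whitespace.
        query_id = (header.split() or [""])[0]
        if query_id not in hits:
            kept.append((header, seq))
    return kept


# Usage sketch (hypothetical file names):
# with open("reads.fasta") as fa, open("blastn_hits.tsv") as tsv:
#     leftovers = unmatched_records(fa, tsv)
# with open("no_hits.fasta", "w") as out:
#     for header, seq in leftovers:
#         out.write(f">{header}\n{seq}\n")
```

The resulting no-hit FASTA file is what would then go into blastx, so the slower translated search only sees reads that blastn could not place.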
-
Originally posted by pallevillesen:
So have you simply tested the
-num_threads <Integer, >=1>
Number of threads (CPUs) to use in the BLAST search
Default = `1'
option as suggested? And how did it go?
Anyway, for blasting 300k sequences I would find a cluster somewhere and parallelize it on as many nodes as possible.
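For the `-num_threads` option quoted above, a small Python sketch of how the command line might be assembled. The value is a number of worker threads, which the operating system schedules onto whatever cores are available, so it is usually compared against the number of logical cores rather than the number of CPU sockets. The helper name `blast_command` and the file and database names are hypothetical; the flags (`-query`, `-db`, `-out`, `-outfmt`, `-num_threads`) are the standard BLAST+ ones.

```python
import os


def blast_command(query, db, out, threads=None, program="blastn"):
    """Build a BLAST+ command line as a list of arguments.

    If threads is None, fall back to the number of logical cores the
    operating system reports (os.cpu_count()), or 1 if that is unknown.
    """
    if threads is None:
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return [
        program,
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",            # tabular output, one hit per line
        "-num_threads", str(threads),
    ]


# Usage sketch (hypothetical names); pass the list to subprocess.run to execute:
# import subprocess
# subprocess.run(blast_command("reads.fasta", "nt", "hits.tsv", threads=16), check=True)
```

Asking for more threads than there are cores should not fail, but it is unlikely to make the search faster, and whether all requested threads are actually kept busy depends on the search itself.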
I just found that the T610 cannot accept any GPUs, either because of inadequate PCI slots or because it lacks the needed power connectors. The two options are either to use the vCORE GPU servers that remain in production, which can be connected to the T610, or to get an entirely separate GPU server. The first option is about to be discontinued, and the second is more expensive. Any advice on reliable GPU server vendors? Unfortunately, many of them are Windows-based, which is not a good option for bioinformatics.
-
Originally posted by yaximik:
As I am a self-educated user negotiating steep learning curves, I am not sure I am clear on the use of multithreading. Does this apply to cores (the current box has 16) or CPUs (it has 2)? What happens if I specify 16 when the box actually has only 2 CPUs?
I just found that the T610 cannot accept any GPUs, either because of inadequate PCI slots or because it lacks the needed power connectors. The two options are either to use the vCORE GPU servers that remain in production, which can be connected to the T610, or to get an entirely separate GPU server. The first option is about to be discontinued, and the second is more expensive. Any advice on reliable GPU server vendors? Unfortunately, many of them are Windows-based, which is not a good option for bioinformatics.
If this is a one-time project, then perhaps splitting the workload across a cluster (as others have suggested previously) may be the most efficient route to follow.
If this is a personally owned machine then you could always look at upgrading components (motherboard/power supply) and pursue the GPU compute route.
-
retraction
Originally posted by yaximik:
Thanks for the input; it is overwhelming how much I need to learn. Before I got the MiSeq, I was using online BLAST with its pretty interface. Once NCBI kicked me out because of the sharply increased data volume, I had to make a local install. Now it is time to learn the nuts and bolts of the CLI. Thanks for the advice, maupb and mjp; it is time to digest the manual.
Wow, that looks very promising! Could you enlighten me a bit more, as this is yet another set of nuts and bolts I have to learn on the fly? GPU means graphics processing unit, correct? Perhaps I could find and install one, but how do you plug it in to work with blast+, I presume in place of the motherboard CPU?
-
A while ago I contacted NCBI about this and here was their reply:
####################
Not all phases of the algorithm are multi-threaded, which often means that even with "-num_threads" set >1, only 1 cpu is used. You might try formatting the large fasta file as a database and run the gene sequence against that, but that may not work either. It often takes a large input file against a large database to invoke multi-threading; it is not implemented for the traceback or formatting phases.
#####################
There are apparently other versions of BLAST that speed it up (e.g. mpiBLAST, but this is probably overkill), so, as others have mentioned, it is probably fastest to split your inputs, give each batch 1 or 2 cores, and run the batches simultaneously.
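A rough Python sketch of the split-and-batch approach described above: distribute the query records into a few FASTA files and run one blastn process per file, each with a small `-num_threads`. All function names, file prefixes, and the batch and thread counts are hypothetical choices for illustration; only the blastn flags are standard BLAST+ options. Records are assumed to be already loaded as (header, sequence) pairs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_records(records, n_batches):
    """Distribute records round-robin into up to n_batches non-empty lists."""
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    batches = [[] for _ in range(n_batches)]
    for i, record in enumerate(records):
        batches[i % n_batches].append(record)
    return [batch for batch in batches if batch]


def write_fasta(records, path):
    """Write (header, sequence) pairs to a FASTA file."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")


def run_batches(records, db, n_batches=8, threads_per_job=2, prefix="batch"):
    """Write one FASTA file per batch and run the blastn jobs concurrently.

    Returns the exit code of each job, in batch order.
    """
    jobs = []
    for i, batch in enumerate(split_records(records, n_batches)):
        query = f"{prefix}_{i}.fasta"
        write_fasta(batch, query)
        jobs.append([
            "blastn", "-query", query, "-db", db,
            "-out", f"{prefix}_{i}.tsv", "-outfmt", "6",
            "-num_threads", str(threads_per_job),
        ])
    # Python threads only wait on the external processes, so one per job is enough.
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, jobs))


# Usage sketch (hypothetical database name), e.g. 8 jobs x 2 threads on a 16-core box:
# codes = run_batches(records, db="nt", n_batches=8, threads_per_job=2)
```

The per-batch result files can simply be concatenated afterwards, since tabular output has one independent line per hit.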
-
Try mpiBLAST.
-
Originally posted by Kennels:
There apparently are other versions of blast which speed it up (e.g. mpiBlast, but this is probably an overkill), so as others have mentioned it probably is fastest if you split your inputs, giving them all 1 or 2 cores and working in batches simultaneously.
Last edited by rhinoceros; 08-29-2013, 01:36 PM.