Old 03-05-2012, 01:49 PM   #1
Junior Member
Location: Louisiana

Join Date: Sep 2011
Posts: 9
BLAST parameters for large datasets

I am mapping contig sequences against a plant database (2 GB in size) using tblastx and blastx. I have about 1 million contig sequences. Can anyone suggest which blastx/tblastx parameters I should use for this search?
In particular, what E-value cutoff should I use, and how much of the query sequence should the alignment cover?
renesh
Old 03-06-2012, 06:52 AM   #2
Rick Westerman
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

For the E-value, it depends on how many false positives you can stand. A cutoff of E=1 means that, for each contig you search, you would expect about one hit to the database by random chance, with many other hits only weakly associated with the database. You may wish to cast such a wide net and then take the time to track down the false positives. Personally, I use an E-value of 1e-6 as a first pass. I may miss some interesting correlations, but at the same time I can focus on what I do obtain, since those results are going to be strong.
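To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes the usual reading of the E-value as the number of chance hits expected per query search, and it treats the searches as independent, so it is only a back-of-the-envelope estimate. The function name is made up for illustration; the contig count is the figure from the original question.

```python
def expected_chance_hits(evalue_cutoff: float, num_queries: int) -> float:
    """Rough estimate of hits expected by chance across all queries.

    Assumption: the E-value cutoff applies to each query search
    separately, so the expectations simply add up.
    """
    return evalue_cutoff * num_queries


NUM_CONTIGS = 1_000_000  # "about 1 million contig sequences" from post #1

for cutoff in (1.0, 1e-3, 1e-6):
    estimate = expected_chance_hits(cutoff, NUM_CONTIGS)
    print(f"E <= {cutoff:g}: roughly {estimate:g} chance hits over all contigs")
```

Under these assumptions, a cutoff of 1e-6 brings the expected number of chance hits over the whole dataset down to about one, which is in line with using it as a strict first pass.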
westerman
Old 03-06-2012, 07:25 AM   #3
Location: Pittsburgh, PA

Join Date: Feb 2011
Posts: 49

The -culling_limit and -max_target_seqs options should also affect the speed. -culling_limit discards a hit if its query range is enveloped by at least n higher-scoring hits. -max_target_seqs is the maximum number of target sequences to retain. I generally set max_target_seqs > culling_limit, but you can tune them to what you want.
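As an illustration of how these options fit on a command line, here is a small Python sketch that only assembles a blastx argument list; it does not run BLAST. The file names, database name, and numeric values are placeholders chosen for the example, not recommendations. The flags themselves (-query, -db, -out, -evalue, -outfmt, -max_target_seqs, -culling_limit, -num_threads) are standard BLAST+ options.

```python
def build_blastx_command(query, db, out, evalue=1e-6,
                         max_target_seqs=10, culling_limit=5,
                         num_threads=4):
    """Assemble a blastx command as a list of arguments.

    All default values here are illustrative assumptions.
    """
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-out", out,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one hit per line
        "-max_target_seqs", str(max_target_seqs),
        "-culling_limit", str(culling_limit),
        "-num_threads", str(num_threads),
    ]


# Hypothetical input and output names, for illustration only.
cmd = build_blastx_command("contigs.fasta", "plant_proteins", "hits.tsv")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed and the database exists.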

Hope this helps!
twaddlac
Old 05-06-2015, 04:17 PM   #4
Junior Member
Location: California

Join Date: May 2015
Posts: 1
Why max_target_seqs > culling_limit?

It isn't immediately obvious to me how these two parameters are related. I believe twaddlac suggested these settings because, for any given query, you'd like to retrieve at least as many hits as the number of better matches than the alignment in question. This seems like a recursive definition, because when looking across many species, most of the hits for a query will not be the best hit, so you'll just end up printing all the hits no matter what...

I'm pretty sure I'm wrong about something here, but I cannot reconcile that with my current understanding of how these parameters function:

-max_target_seqs <Integer, >=1>
Maximum number of aligned sequences to keep

Example 1. max_target_seqs = 5
If target1 is the best match for query1, but targets 2, 3, 4, and 5 are also good matches, this will retrieve the 5 alignments in a target-centric manner.

-culling_limit <Integer, >=0>
If the query range of a hit is enveloped by that of at least this many
higher-scoring hits, delete the hit

Example 2. culling_limit = 5
Queries (q1, q2, q3, q4, q5) are all good matches for target 1. However, if any of these queries has higher scoring matches to 5+ other targets, then the query->target1 alignment hit will be deleted.
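To make the two help-text descriptions concrete, here is a toy Python sketch that applies each rule literally to a handful of made-up hits for a single query. It is not BLAST's actual implementation, which applies these limits internally during the search. The Hit type, function names, and scores are hypothetical, and treating a culling limit of 0 as "no culling" is an assumption of this sketch.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str
    q_start: int   # start of the aligned range on the query
    q_end: int     # end of the aligned range on the query
    score: float


def cull(hits: List[Hit], culling_limit: int) -> List[Hit]:
    """Delete a hit if its query range is enveloped by at least
    `culling_limit` higher-scoring hits (literal reading of the help text)."""
    if culling_limit == 0:  # assumption: 0 disables culling
        return list(hits)
    kept = []
    for hit in hits:
        enveloping = sum(
            1 for other in hits
            if other.score > hit.score
            and other.q_start <= hit.q_start
            and other.q_end >= hit.q_end
        )
        if enveloping < culling_limit:
            kept.append(hit)
    return kept


def keep_top_targets(hits: List[Hit], max_target_seqs: int) -> List[Hit]:
    """Keep at most `max_target_seqs` hits, best score first (simplified)."""
    return sorted(hits, key=lambda h: h.score, reverse=True)[:max_target_seqs]


# Made-up hits for one query; t4 aligns to a different region of the query.
hits = [
    Hit("t1", 1, 300, 500.0),
    Hit("t2", 1, 300, 450.0),
    Hit("t3", 10, 290, 400.0),
    Hit("t4", 400, 600, 120.0),
]

after_culling = cull(hits, culling_limit=2)
print([h.target for h in after_culling])
print([h.target for h in keep_top_targets(after_culling, max_target_seqs=2)])
```

In this toy model, culling works per region of the query: t3 is dropped because two better hits cover the same stretch, while the low-scoring t4 survives because nothing else covers its region. The cap on target sequences is a separate, global cut on how many targets are reported.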

So if the query matches X genes better than it matches a given target, the query-target relationship is deleted. It seems to me this has nothing to do with the number of hits you want to retrieve at the end.

Can anyone explain if/how these two parameters are related? Thanks in advance.

Last edited by flow_science; 05-06-2015 at 04:19 PM.
flow_science

All times are GMT -8.
