
**BioTalk** · 07-26-2010, 07:46 AM

Originally posted by rglover

What are the names of the database files that formatdb is creating? Could you list them here? You could also try putting "-o T" on the end of your formatdb. Other than that I'm not really sure!

The files created by formatdb are:
.nhr, .nin, .nsq, .nsd, .nsi

**maubp** · 07-26-2010, 08:10 AM

Originally posted by BioTalk

The files created by formatdb are:
.nhr, .nin, .nsq, .nsd, .nsi

What are the full file names? This matters for what you tell BLAST: for example.nin etc. the database name is example, but for example.fas.nin etc. the database name is example.fas instead.
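A minimal Python sketch of that naming rule, assuming a single-volume nucleotide database with the index extensions listed earlier in the thread. The function name is made up for illustration; it drops only the final index extension and leaves the rest of the file name alone.

```python
from pathlib import PurePosixPath

# Index-file extensions reported in this thread for a nucleotide database.
NUCL_INDEX_SUFFIXES = {".nhr", ".nin", ".nsq", ".nsd", ".nsi"}


def db_name_from_index(index_file: str) -> str:
    """Return the database name by removing only the last (index) extension."""
    path = PurePosixPath(index_file)
    if path.suffix not in NUCL_INDEX_SUFFIXES:
        raise ValueError(f"not a recognised nucleotide index file: {index_file}")
    return str(path.with_suffix(""))


print(db_name_from_index("example.nin"))      # example
print(db_name_from_index("example.fas.nin"))  # example.fas
```

The value returned is what would go after the database option on the BLAST command line.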

**rglover** · 07-26-2010, 08:14 AM

Hiya. Looking at the link you posted for where you downloaded your executables, you're definitely using BLAST+, so blastall (probably) won't work well.
If you try formatting your database with:

makeblastdb -in yourfasta.fasta -dbtype nucl

that should format your database for use with the Blast+ executables. I've managed to use formatdb/makeblastdb interchangeably between blast/blast+ in the past, but on Windows so you never know, it might error on Linux.
If you then use the "blastn -db <yourfasta.fasta> -word_size" etc convention for starting your blast it might work.
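The two steps above can be sketched in Python as argument lists, which avoids shell quoting problems if they are later passed to subprocess.run. The helper names and the query/output file names are placeholders, not anything from the original posts; only the makeblastdb and blastn options shown are assumed to exist. The sketch relies on makeblastdb naming the database after the input file when no output name is given.

```python
import shlex
from typing import List


def makeblastdb_cmd(fasta: str) -> List[str]:
    # Same options as in the post above: input FASTA and nucleotide database type.
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl"]


def blastn_cmd(query: str, db: str, out: str) -> List[str]:
    # The database name passed to -db is the FASTA file name used for makeblastdb.
    return ["blastn", "-query", query, "-db", db, "-out", out]


fasta = "yourfasta.fasta"  # placeholder name from the post above
commands = [
    makeblastdb_cmd(fasta),
    blastn_cmd("queries.fasta", fasta, "hits.txt"),  # hypothetical file names
]
for cmd in commands:
    # Print a copy-and-paste form of each command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run a command, pass the list to subprocess.run(cmd, check=True) on a machine where the BLAST+ executables are on the PATH.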

**BioTalk** · 07-26-2010, 08:25 AM

Thank you all very much! I am now able to generate a BLAST output file like the one below.
It is a huge file because the same information is repeated for every query. Does anyone know how to get the output in another format?

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 1-72342
(20 letters)

Database:/home//Desktop/mma.faa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 2-55421
(19 letters)

Database: /home//Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 3-46574
(21 letters)

Database: /home/Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 4-38013
(17 letters)

Database: /home//Desktop/mma.fa
15,632 sequences; 339,921 total letters

Searching..................................................done

***** No hits found ******

**rglover** · 07-26-2010, 08:30 AM

If you have a look in the BLAST+ manual, there are some formatting guidelines for tabulating the output.

Glad you got it working!
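As a rough illustration of the tabular route: BLAST+ can write one tab-separated line per hit with -outfmt 6, and queries without hits produce no lines at all. The Python sketch below parses that format, assuming the default twelve columns. The sample line is made up for illustration (it reuses the query name and scores of the hit shown later in this thread), and the function name is hypothetical.

```python
from typing import Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(text: str) -> List[Dict[str, str]]:
    """Turn tab-separated hit lines into dictionaries keyed by column name."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment lines written by -outfmt 7.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(OUTFMT6_COLUMNS):
            raise ValueError(f"expected {len(OUTFMT6_COLUMNS)} columns: {line!r}")
        rows.append(dict(zip(OUTFMT6_COLUMNS, fields)))
    return rows


# Illustrative line only, not real program output.
sample = "3-46574\tzma-miR159k\t100.00\t21\t0\t0\t1\t21\t1\t21\t1e-06\t39.2\n"
for row in parse_outfmt6(sample):
    print(row["qseqid"], row["sseqid"], row["pident"], row["evalue"])
```

Values are kept as strings here; convert them to float or int only where a calculation needs it.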

**BioTalk** · 07-26-2010, 08:37 AM

Thanks to you! I will certainly have a look at the BLAST+ manual.

**westerman** · 07-27-2010, 07:17 AM

Also, it looks like your query sequences are very short (20 bp). You will probably have to take this into consideration via non-default command line parameters.

**BioTalk** · 07-27-2010, 07:29 AM

Originally posted by westerman

Also, it looks like your query sequences are very short (20 bp). You will probably have to take this into consideration via non-default command line parameters.

Yes, my query sequences are about 20 bp or shorter. Which non-default options do I need to use?

**westerman** · 07-27-2010, 07:51 AM

For short sequences with BLAST+, using 'blastn-short' or 'megablast' will be preferable to the regular commands. If those are not available as separate commands, run 'blastn' with the command line option '-task blastn-short' or '-task megablast'.

There may be other options that I am unaware of, since I do not do many short sequence alignments. The most important point is simply to be aware that BLAST is generally used to align longer sequences, and that at 20 bp you are getting close to the word (window) sizes that BLAST uses. BLAST, like many tools, is not something to use without some thought.
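To make the word-size point concrete, here is a small Python sketch. It assumes the default word sizes that the BLAST+ manual lists for three blastn tasks (7 for blastn-short, 11 for blastn, 28 for megablast); check them against your installed version. Under that assumption a 20 bp query is shorter than the megablast word, so of the two tasks mentioned above only blastn-short can seed a hit at default settings. The helper names are made up for illustration.

```python
from typing import List

# Assumed default word sizes per blastn task (verify for your BLAST+ version).
DEFAULT_WORD_SIZE = {"blastn-short": 7, "blastn": 11, "megablast": 28}


def tasks_able_to_seed(query_length: int) -> List[str]:
    """List tasks whose default word is no longer than the query itself."""
    return sorted(
        task for task, word in DEFAULT_WORD_SIZE.items() if query_length >= word
    )


def short_query_cmd(query: str, db: str, out: str) -> List[str]:
    # blastn with the short-query task selected explicitly.
    return ["blastn", "-task", "blastn-short",
            "-query", query, "-db", db, "-out", out]


print(tasks_able_to_seed(20))  # ['blastn', 'blastn-short']
print(short_query_cmd("queries.fasta", "yourfasta.fasta", "hits.txt"))
```

The file names in the last line are placeholders.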

**robs** · 07-27-2010, 12:02 PM

If you expect errors in your sequences or want to look for more distant relationships, you might want to lower the seed (word) length (default of 11 for legacy blastn and for -task blastn; try 6-8; the parameter is -W in legacy blastall and -word_size in BLAST+). BLAST(+) also filters low-complexity regions by default, and your short sequences might be masked before any alignment. You can turn off the filtering and see if it makes any difference (-F F in legacy blastall, -dust no in BLAST+ blastn).
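A back-of-the-envelope check of why the seed length matters for 20 bp queries, written as a Python sketch. It assumes the simplest case: the whole query aligns without gaps, errors are substitutions only, and a hit needs at least one exact match as long as the word size. With k mismatches the matching bases fall into k + 1 runs, so the longest run is at least ceil((L - k) / (k + 1)). The function names are hypothetical.

```python
import math


def longest_guaranteed_exact_run(query_length: int, mismatches: int) -> int:
    """Worst-case length of the longest exact run in an ungapped alignment."""
    matched = query_length - mismatches
    return math.ceil(matched / (mismatches + 1))


def seed_guaranteed(query_length: int, mismatches: int, word_size: int) -> bool:
    """True if every placement of the mismatches still leaves a full-length seed."""
    return longest_guaranteed_exact_run(query_length, mismatches) >= word_size


for word_size in (11, 7, 6):
    for mismatches in (0, 1, 2):
        ok = seed_guaranteed(20, mismatches, word_size)
        print(f"word_size={word_size} mismatches={mismatches} guaranteed={ok}")
```

For a 20 bp query, one badly placed mismatch already leaves a longest exact run of 10, which is below a word size of 11, so such a hit can be missed; a word size of 7 still guarantees a seed with one mismatch, and 6 with two.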

**BioTalk** · 07-28-2010, 07:24 AM

Does anyone know how to get a BLAST output file that contains only the details of the aligned regions?

I am comparing two files of FASTA sequences, and I get a huge file that contains both the queries that matched and those that did not.

**rglover** · 07-28-2010, 07:29 AM

If you use this command you'll only get the alignments:
-num_descriptions 0 -num_alignments <however-many-you-want>

You'll still get an output for the sequences where no matches have been found though. You could also try using BioPerl to process the Blast results.

**BioTalk** · 07-28-2010, 07:45 AM

Originally posted by rglover

If you use this command you'll only get the alignments:
-num_descriptions 0 -num_alignments <however-many-you-want>

You'll still get an output for the sequences where no matches have been found though. You could also try using BioPerl to process the Blast results.

I tried -num_descriptions 0 -num_alignment 1 -outfmt 0, but I am still getting all the matched and unmatched regions in the same file.

**rglover** · 07-28-2010, 11:55 AM

Just to clarify - you're getting the one alignment that you want, but you're also getting the "No hits found" ones too?
If that's the case, you could use BioPerl to go through the file and then choose to only print out the ones that have hits to a new file.
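The same idea can be sketched in plain Python instead of BioPerl. This is a minimal sketch, not a full parser: it assumes the BLAST+ text layout in which the program header appears once and each query section starts with a line beginning "Query=", and it drops every section that contains "No hits found". The function names and the sample text are made up for illustration.

```python
from typing import List, Tuple

NO_HITS_MARKER = "No hits found"


def split_queries(report: str) -> Tuple[str, List[str]]:
    """Split a report into its header and one text block per 'Query=' section."""
    header: List[str] = []
    blocks: List[List[str]] = []
    for line in report.splitlines():
        if line.startswith("Query="):
            blocks.append([line])        # a new query section begins here
        elif blocks:
            blocks[-1].append(line)      # line belongs to the current query
        else:
            header.append(line)          # everything before the first query
    return "\n".join(header), ["\n".join(block) for block in blocks]


def keep_queries_with_hits(report: str) -> str:
    """Return the report with all 'No hits found' query sections removed."""
    header, blocks = split_queries(report)
    kept = [block for block in blocks if NO_HITS_MARKER not in block]
    return "\n".join([header] + kept)


sample = """BLASTN 2.2.23+

Query= 1-72342
Length=20

***** No hits found *****

Query= 3-46574
Length=21

>lcl|zma-miR159k
 Score = 39.2 bits (42), Expect = 1e-06
"""
print(keep_queries_with_hits(sample))
```

For a real file, read it with open(path).read(), pass the text to keep_queries_with_hits, and write the result to a new file. The older blastall layout repeats the header before every query, so it would need a different split rule.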

**BioTalk** · 07-30-2010, 10:44 AM

Originally posted by rglover

Just to clarify - you're getting the one alignment that you want, but you're also getting the "No hits found" ones too?
If that's the case, you could use BioPerl to go through the file and then choose to only print out the ones that have hits to a new file.

Yes, that's correct. But the output file has an irregular layout, which makes it more difficult for me to extract only the aligned regions. Below is an example of the file.

Please let me know if anyone knows how to deal with this. Thank you!

BLASTN 2.2.23+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Database: Desktop/RNA.fa
15,632 sequences; 339,921 total letters

Query= 1-72342
Length=20

***** No hits found *****

Lambda K H
0.634 0.408 0.912

Gapped
Lambda K H
0.625 0.410 0.780

Effective search space used: 956935

Query= 2-55421
Length=19

***** No hits found *****

Lambda K H
0.634 0.408 0.912

Gapped
Lambda K H
0.625 0.410 0.780

Effective search space used: 1066359

Query= 3-46574
Length=21
Score E
Sequences producing significant alignments: (Bits) Value

>lcl|zma-miR159k MIMAT0013980 Zea mays miR159k
Length=21

Score = 39.2 bits (42), Expect = 1e-06
Identities = 21/21 (100%), Gaps = 0/21 (0%)
Strand=Plus/Plus

Query 1 TTTGGATTGAAGGGAGCTCTG 21
|||||||||||||||||||||
Sbjct 1 TTTGGATTGAAGGGAGCTCTG 21

>lcl|
MIMAT0013979 Zea mays miR159j
Length=21

Score = 39.2 bits (42), Expect = 1e-06
Identities = 21/21 (100%), Gaps = 0/21 (0%)
Strand=Plus/Plus

Query 1 TTTGGATTGAAGGGAGCTCTG 21
|||||||||||||||||||||
Sbjct 1 TTTGGATTGAAGGGAGCTCTG 21

>lcl|zma-miR159f MIMAT0013975 Zea mays miR159f
Length=21

Score = 39.2 bits (42), Expect = 1e-06
Identities = 21/21 (100%), Gaps = 0/21 (0%)
Strand=Plus/Plus

Query 1 TTTGGATTGAAGGGAGCTCTG 21
|||||||||||||||||||||
Sbjct 1 TTTGGATTGAAGGGAGCTCTG 21

>lcl|tae-miR159b MIMAT0005344 Triticum aestivum miR159b
Length=21

Score = 39.2 bits (42), Expect = 1e-06
Identities = 21/21 (100%), Gaps = 0/21 (0%)
Strand=Plus/Plus

Query 1 TTTGGATTGAAGGGAGCTCTG 21
|||||||||||||||||||||
Sbjct 1 TTTGGATTGAAGGGAGCTCTG 21
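A report laid out like the one above can be reduced to one row per hit with a short Python sketch. It is only a sketch under stated assumptions: queries start with "Query=", subjects with ">", and each hit has a "Score = ... Expect = ..." line followed by an "Identities = ..." line, as in this example. Only the first line of a subject description is kept, and the function name is hypothetical.

```python
import re
from typing import Dict, List

SCORE_RE = re.compile(r"Score\s*=\s*([\d.]+)\s+bits.*?Expect\s*=\s*([^\s,]+)")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def summarise_hits(report: str) -> List[Dict[str, str]]:
    """Collect query, subject, bit score, E-value and identities for each hit."""
    rows: List[Dict[str, str]] = []
    query = subject = ""
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Query="):
            query = line[len("Query="):].strip()
        elif line.startswith(">"):
            subject = line[1:].strip()
        else:
            score = SCORE_RE.search(line)
            if score:
                # A score line opens a new hit row for the current query/subject.
                rows.append({"query": query, "subject": subject,
                             "bits": score.group(1), "evalue": score.group(2)})
                continue
            ident = IDENT_RE.search(line)
            if ident and rows:
                rows[-1]["identities"] = f"{ident.group(1)}/{ident.group(2)}"
    return rows


sample = """Query= 3-46574
Length=21

>lcl|zma-miR159k MIMAT0013980 Zea mays miR159k
Length=21

 Score = 39.2 bits (42), Expect = 1e-06
 Identities = 21/21 (100%), Gaps = 0/21 (0%)
"""
for row in summarise_hits(sample):
    print("\t".join(row[key] for key in
                    ("query", "subject", "bits", "evalue", "identities")))
```

Queries with no hits never produce a score line, so they simply do not appear in the result.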
