  • BLAST Help

    Hello all,

    I am trying to compare two .fasta files, each containing many sequences, using BLAST. But for some reason BLAST is considering only the first sequence from the FASTA files. I am not sure which parameter I should use (for standalone BLAST) to compare all the sequences in both files.

    Please let me know if anyone knows how to do it.

    Thank you!

  • #2
    It should "just work".

    Which flavour and version of standalone BLAST do you have (the NCBI "legacy" version in C, the new NCBI BLAST+ written in C++, or one of the 3rd party BLAST implementations)?

    What BLAST command line are you using?

    Have you checked your FASTA files are using the right new line characters for your OS? The command unix2dos or dos2unix can help here. Also try this to count the entries:

    grep -c "^>" your_file.fasta


    • #3
      Thank you for your prompt response!

      I am using Blast for Linux 64 bit downloaded from:


      The command line I am using is: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>


      • #4
        I am not sure about this question: Have you checked your FASTA files are using the right new line characters for your OS?


        • #5
          Originally posted by BioTalk
          I am not sure about this question: Have you checked your FASTA files are using the right new line characters for your OS?
          Unix/Linux/Mac OS X etc all use a LF character for a new line, while DOS/Windows uses the CR LF characters. This incompatibility is a common problem when dealing with files created on another OS.
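A minimal sketch of how one might check a FASTA file for DOS/Windows line endings when `dos2unix` is not installed. The function names `count_cr_lines` and `strip_cr` are made up for this illustration; only the standard `printf`, `grep` and `tr` utilities are assumed.

```shell
# Count the lines in a file that end with a carriage return (CR LF endings).
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c "${cr}\$" "$1" || true
}

# Write a copy of a file with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

A count of 0 from `count_cr_lines inputfile.fa` suggests the file already uses Unix line endings; otherwise something like `strip_cr inputfile.fa inputfile.unix.fa` produces a cleaned copy.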


          • #6
            Originally posted by BioTalk
            Thank you for your prompt response!

            I am using Blast for Linux 64 bit downloaded from:


            The command line I am using is: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>
            If you are using "blastall" then you are using the old legacy BLAST executables based on the NCBI C Toolkit.

            If you are using the new BLAST+ suite written in C++ then the command here would be "blastn" instead (and all the options have been renamed).
            Last edited by maubp; 07-26-2010, 07:28 AM. Reason: blastn vs blastp typo


            • #7
              Originally posted by maubp
              Unix/Linux/Mac OS X etc all use a LF character for a new line, while DOS/Windows uses the CR LF characters. This incompatibility is a common problem when dealing with files created on another OS.
              http://en.wikipedia.org/wiki/Newline
              Oh okay, I think this is not a problem with the FASTA files, as they were all created in Linux and are being used in Linux. Also, I opened both files and they look like normal FASTA files.

              Please let me know if you know how I should deal with the "blank output file" problem!


              • #8
                Originally posted by BioTalk
                Please let me know if you know how I should deal with the "blank output file" problem!
                You never mentioned a "blank output file" until now. I thought you said you were having trouble getting BLAST to use multiple input query sequences.


                Does blastall give any error messages?

                Did you remember to create a BLAST database first using formatdb?
                Last edited by maubp; 07-26-2010, 07:21 AM.


                • #9
                  I am sorry for the confusion! It is producing an almost blank output file that contains only the following instead of alignment results:

                  BLASTN 2.2.21 [Jun-14-2009]


                  Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
                  Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
                  "Gapped BLAST and PSI-BLAST: a new generation of protein database search
                  programs", Nucleic Acids Res. 25:3389-3402.

                  Query= Cluster_573384 1
                  (23 letters)

                  So I thought BLAST was not using multiple input query sequences.


                  • #10
                    Originally posted by maubp
                    If you are using "blastall" then you are using the old legacy BLAST executables based on the NCBI C Toolkit.

                    If you are using the new BLAST+ suite written in C++ then the command here would be "blastn" instead (and all the options have been renamed).
                    I tried the blastn command as you suggested, and got an error about indexing:
                    @biocomp:~/Desktop/Blast/bin$ ./blastn -word_size 16 -query <inputseq.fa> -db <sequencetocompare.fa> -out <outputfile.fa>
                    BLAST Database error: No alias or index file found for nucleotide database [/home/Desktop/sequencetocompare.fa] in search path [/home/Desktop/Blast/bin::]
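The error above says that no index files exist for the name passed to `-db`: BLAST+ searches a formatted database, not a raw FASTA file. A minimal sketch of the BLAST+ workflow follows, assuming `makeblastdb` and `blastn` are on the `PATH`. The file names reuse the placeholders from the post, `blast_report.txt` is a hypothetical output name, and the `run` helper with its `DRY_RUN` switch is an illustration only, not part of BLAST+.

```shell
# Placeholder file names; substitute your own.
DB=sequencetocompare.fa
QUERY=inputseq.fa
OUT=blast_report.txt

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

blastplus_blastn() {
    # Index the FASTA file as a nucleotide database (the BLAST+ counterpart of "formatdb -p F").
    run makeblastdb -in "$DB" -dbtype nucl || return 1
    # "-dust no" corresponds to the legacy "-F F" switch; "-word_size 16" to "-W 16".
    run blastn -word_size 16 -dust no -query "$QUERY" -db "$DB" -out "$OUT"
}
```

Setting `DRY_RUN=1` before calling `blastplus_blastn` prints the two commands without running them, which is a cheap way to check the file names first.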


                    • #11
                      You should have one line starting "Query=" for each query sequence.

                      If that is the full file, it looks like BLAST is crashing or failing to finish.

                      If I recall correctly, the next output would have been information about the BLAST database - are you sure that is set up correctly using formatdb? For example, can you do single queries against this database?
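The "one Query= line per query sequence" check can be sketched in shell. The function names below are hypothetical, and the sketch assumes the default pairwise report format, in which each query starts a line beginning with `Query=`. A matching count only shows that every query was started, not that every search finished cleanly.

```shell
# Number of FASTA records in a file (header lines start with ">").
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Number of queries listed in a default-format BLAST report.
count_report_queries() {
    grep -c '^Query=' "$1" || true
}

# Succeeds only if the report lists as many queries as the FASTA file has records.
report_is_complete() {
    [ "$(count_fasta_records "$1")" -eq "$(count_report_queries "$2")" ]
}
```

For example, `report_is_complete inputfile.fa outputfile.fa || echo "report is truncated"` would flag a report like the one posted above, which stops after the first query.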


                      • #12
                        It could be that when you formatted the BLAST database you didn't set it for nucleotide sequences - formatdb assumes protein unless an option specifies nucleotide.
                        Try "formatdb -i <yourfasta.fasta> -p F"
                        The -p F turns protein off and nucleotide on.


                        • #13
                          I just tried a single query sequence with the command: blastall -p blastn -F F -W 16 -i <inputfile.fa> -d <knownsequences.fa> -o <outputfile.fa>

                          and I got almost the same output:

                          BLASTN 2.2.21 [Jun-14-2009]


                          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
                          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
                          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
                          programs", Nucleic Acids Res. 25:3389-3402.

                          Query= 1-72342
                          (20 letters)
                          Do you think there is an installation problem, or is the command I am using incorrect?


                          • #14
                            Originally posted by rglover
                            It could be that when you've formatted the blast database you didn't set it for nucleotide sequences - formatdb defaults to protein if it finds no command to specify nucleotide.
                            Try "formatdb -i <yourfasta.fasta> -p F"
                            The -p F turns protein off and nucleotide on
                            I tried "formatdb -i <yourfasta.fasta> -p F" and then my previous command, but it gave me the same output as before.


                            • #15
                              What are the names of the database files that formatdb is creating? Could you list them here? You could also try putting "-o T" on the end of your formatdb command. Other than that I'm not really sure!
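One way to answer the question about database file names is to look for the index files next to the FASTA file. This sketch assumes a small, single-volume legacy database, for which formatdb normally writes `.nhr`, `.nin` and `.nsq` files for nucleotide input and `.phr`, `.pin` and `.psq` for protein input. The function name `db_kind` is hypothetical.

```shell
# Report which kind of formatdb index files exist for a database name.
db_kind() {
    if [ -e "$1.nhr" ] && [ -e "$1.nin" ] && [ -e "$1.nsq" ]; then
        echo nucleotide
    elif [ -e "$1.phr" ] && [ -e "$1.pin" ] && [ -e "$1.psq" ]; then
        echo protein
    else
        echo missing
    fi
}
```

If `db_kind knownsequences.fa` prints `protein` or `missing`, a `blastall -p blastn` search against that name has no nucleotide database to use.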
