
  • #16
    Originally posted by rglover View Post
    What are the names of the database files that formatdb is creating? Could you list them here? You could also try putting "-o T" on the end of your formatdb. Other than that I'm not really sure!
    The files created by formatdb are:
    .nhr, .nin, .nsq, .nsd, .nsi

    Comment


    • #17
      Originally posted by BioTalk View Post
      The files created by formatdb are:
      .nhr, .nin, .nsq, .nsd, .nsi
      What are the full names? This matters for the database name you give blast: for example.nin etc. the database name is example, but for example.fas.nin etc. the database name is example.fas instead.
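
To make the naming rule above concrete, here is a minimal Python sketch. The function name is made up for illustration, and the suffix list is just the one reported in this thread; it only strips the final index suffix and does nothing else.

```python
import os

# Index suffixes reported in this thread for a nucleotide database.
INDEX_SUFFIXES = (".nhr", ".nin", ".nsq", ".nsd", ".nsi")


def db_name_from_index_file(path):
    """Return the database name to pass to blast, given one index file."""
    root, ext = os.path.splitext(path)
    if ext not in INDEX_SUFFIXES:
        raise ValueError(f"{path!r} does not look like a BLAST index file")
    return root


print(db_name_from_index_file("example.nin"))      # example
print(db_name_from_index_file("example.fas.nin"))  # example.fas
```

So whatever comes before the final .nin/.nhr/.nsq is what goes after the database option.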

      Comment


      • #18
        Hiya. Looking at the link you posted for where you downloaded your executables, you're definitely using BLAST+, so blastall (probably) won't work well.
        If you try formatting your database with:

        makeblastdb -in yourfasta.fasta -dbtype nucl

        that should format your database for use with the BLAST+ executables. I've managed to use formatdb/makeblastdb interchangeably between blast/blast+ in the past, but that was on Windows, so it might still error on Linux.
        If you then start your blast with the "blastn -db <yourfasta.fasta> -word_size" etc. convention, it might work.
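
If you drive this from a script, the two steps above could be put together roughly like this. It is only a sketch: yourfasta.fasta comes from the post, while queries.fasta and results.txt are placeholder names I made up, and the commands are printed rather than run so you can check them first.

```python
import shlex


def makeblastdb_cmd(fasta, dbtype="nucl"):
    """Argument list for formatting a FASTA file as a BLAST+ database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype]


def blastn_cmd(query, db, out):
    """Argument list for searching the formatted database with blastn."""
    return ["blastn", "-query", query, "-db", db, "-out", out]


# The database name passed to -db is the FASTA name given to makeblastdb.
db = "yourfasta.fasta"
for cmd in (makeblastdb_cmd(db), blastn_cmd("queries.fasta", db, "results.txt")):
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```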

        Comment


        • #19
          Thank you all very much! Now I am able to generate the following type of Blast output file, which is huge because the same information is repeated for every query. Does anyone know how we can get it in another format?

          BLASTN 2.2.21 [Jun-14-2009]


          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
          programs", Nucleic Acids Res. 25:3389-3402.

          Query= 1-72342
          (20 letters)

          Database:/home//Desktop/mma.faa
          15,632 sequences; 339,921 total letters

          Searching..................................................done

          ***** No hits found ******


          BLASTN 2.2.21 [Jun-14-2009]


          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
          programs", Nucleic Acids Res. 25:3389-3402.

          Query= 2-55421
          (19 letters)

          Database: /home//Desktop/mma.fa
          15,632 sequences; 339,921 total letters

          Searching..................................................done

          ***** No hits found ******


          BLASTN 2.2.21 [Jun-14-2009]


          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
          programs", Nucleic Acids Res. 25:3389-3402.

          Query= 3-46574
          (21 letters)

          Database: /home/Desktop/mma.fa
          15,632 sequences; 339,921 total letters

          Searching..................................................done

          ***** No hits found ******


          BLASTN 2.2.21 [Jun-14-2009]


          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
          programs", Nucleic Acids Res. 25:3389-3402.

          Query= 4-38013
          (17 letters)

          Database: /home//Desktop/mma.fa
          15,632 sequences; 339,921 total letters

          Searching..................................................done

          ***** No hits found ******

          Comment


          • #20
            If you have a look in the blast+ manual, there are some formatting guidelines for tabulating the output etc. Glad you got it working!
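
As a rough illustration of what the tabular route looks like: with BLAST+, "-outfmt 6" writes one tab-separated line per hit using the twelve default columns listed below, and queries without hits produce no lines at all. The Python below is a minimal sketch of reading such a file; the sample line is hand-written from the miR159k alignment quoted later in this thread, not real program output.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_outfmt6(handle):
    """Return a list of dicts, one per hit line in a tabular BLAST file."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


# Hand-written sample line, for illustration only.
sample = "3-46574\tlcl|zma-miR159k\t100.00\t21\t0\t0\t1\t21\t1\t21\t1e-06\t39.2\n"
for hit in parse_outfmt6(io.StringIO(sample)):
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["evalue"])
```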

            Comment


            • #21
              Thanks to you! Sure, I will have a look at the blast+ manual.

              Comment


              • #22
                Also it looks like your query sequences are very short (20 bp). You will probably have to take this into consideration via non-default command line parameters.

                Comment


                • #23
                  Originally posted by westerman View Post
                  Also it looks like your query sequences are very short (20 bp). You will probably have to take this into consideration via non-default command line parameters.
                  Yes, my query sequences are around 20 bp or shorter. What non-default options do I need to use?

                  Comment


                  • #24
                    For short sequences with blast+, the 'blastn-short' task is preferable to the regular defaults: run 'blastn' with the command line option '-task blastn-short'. The '-task megablast' setting is less suitable here, since its default word size of 28 is longer than a 20-bp query.

                    There may be other options that I am unaware of since I do not do many short sequence alignments. The most important concept is simply to be aware that blast is generally used to align longer sequences and that at 20 bp you are getting close to the window sizes that blast uses. Blast, like many tools, is not something to use without some thought.
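
To put numbers on the window-size point: a query of length L contains L - W + 1 words of length W, and an exact word match is what seeds an alignment. The small Python sketch below uses the default word sizes of the three blastn tasks (28, 11 and 7); the function name is made up for illustration.

```python
def seed_count(query_len, word_size):
    """Number of words of length word_size contained in a query."""
    return max(0, query_len - word_size + 1)


# Default word sizes of the blastn tasks in BLAST+.
for task, word_size in (("megablast", 28), ("blastn", 11), ("blastn-short", 7)):
    print(task, word_size, seed_count(20, word_size))
# megablast 28 0
# blastn 11 10
# blastn-short 7 14
```

With a word size of 28, a 20-bp query has no words to seed with at all, which is why the task matters so much for reads this short.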

                    Comment


                    • #25
                      If you expect errors in your sequences or want to look for more distant relationships, you might want to lower the seed length (default of 11 for blastn; try 6-8; parameter -W in blastall, -word_size in BLAST+). BLAST(+) also filters low-complexity regions, and your short sequences might be filtered out before any alignment. You can turn off the filtering and see if it makes any difference (-F F in blastall, -dust no in BLAST+ blastn).

                      Comment


                      • #26
                        Does anyone know how to get a Blast output file with only the details of the aligned regions?

                        I am trying to compare two files of fasta sequences, and I am getting a huge file containing the matched as well as the unmatched queries.

                        Comment


                        • #27
                          If you use these options you'll only get the alignments:
                          -num_descriptions 0 -num_alignments <however-many-you-want>

                          You'll still get an output for the sequences where no matches have been found though. You could also try using BioPerl to process the Blast results.

                          Comment


                          • #28
                            Originally posted by rglover View Post
                            If you use these options you'll only get the alignments:
                            -num_descriptions 0 -num_alignments <however-many-you-want>

                            You'll still get an output for the sequences where no matches have been found though. You could also try using BioPerl to process the Blast results.
                            I tried -num_descriptions 0 -num_alignment 1 -outfmt 0, but I am still getting all the matched and unmatched regions in the same file.

                            Comment


                            • #29
                              Just to clarify - you're getting the one alignment that you want, but you're also getting the "No hits found" ones too?
                              If that's the case, you could use BioPerl to go through the file and then choose to only print out the ones that have hits to a new file.
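
If you would rather not set up BioPerl, the same idea can be sketched in plain Python (a different tool from the one suggested above, same approach). It assumes a BLAST+ pairwise report in which every query section starts with a line beginning "Query=" and an empty result contains the text "No hits found"; the function names and the sample text are made up for illustration.

```python
def split_report(text):
    """Split a pairwise report into its header and per-query sections."""
    header, sections = [], []
    current = header
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("Query="):
            current = []  # a new query section starts here
            sections.append(current)
        current.append(line)
    return "".join(header), ["".join(section) for section in sections]


def keep_hits_only(text):
    """Return the report without the sections that found no hits."""
    header, sections = split_report(text)
    return header + "".join(s for s in sections if "No hits found" not in s)


# Shortened, hand-made sample in the BLAST+ layout.
sample = """BLASTN 2.2.23+

Query= 1-72342
Length=20

***** No hits found *****

Query= 3-46574
Length=21

>lcl|zma-miR159k MIMAT0013980 Zea mays miR159k
 Score = 39.2 bits (42), Expect = 1e-06
"""

print(keep_hits_only(sample))
```

For a real file, read it with open(...).read(), pass the text to keep_hits_only, and write the result to a new file.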

                              Comment


                              • #30
                                Originally posted by rglover View Post
                                Just to clarify - you're getting the one alignment that you want, but you're also getting the "No hits found" ones too?
                                If that's the case, you could use BioPerl to go through the file and then choose to only print out the ones that have hits to a new file.
                                Yes, that's correct. But the output file has an irregular layout, which makes it more difficult for me to extract only the aligned regions. Below is an example of the file.

                                Please let me know if anyone knows how to deal with this. Thank you!

                                BLASTN 2.2.23+


                                Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
                                Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
                                Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
                                protein database search programs", Nucleic Acids Res. 25:3389-3402.



                                Database: Desktop/RNA.fa
                                15,632 sequences; 339,921 total letters



                                Query= 1-72342
                                Length=20


                                ***** No hits found *****



                                Lambda K H
                                0.634 0.408 0.912

                                Gapped
                                Lambda K H
                                0.625 0.410 0.780

                                Effective search space used: 956935


                                Query= 2-55421
                                Length=19


                                ***** No hits found *****



                                Lambda K H
                                0.634 0.408 0.912

                                Gapped
                                Lambda K H
                                0.625 0.410 0.780

                                Effective search space used: 1066359


                                Query= 3-46574
                                Length=21
                                Score E
                                Sequences producing significant alignments: (Bits) Value



                                >lcl|zma-miR159k MIMAT0013980 Zea mays miR159k
                                Length=21

                                Score = 39.2 bits (42), Expect = 1e-06
                                Identities = 21/21 (100%), Gaps = 0/21 (0%)
                                Strand=Plus/Plus

                                Query  1  TTTGGATTGAAGGGAGCTCTG  21
                                          |||||||||||||||||||||
                                Sbjct  1  TTTGGATTGAAGGGAGCTCTG  21


                                >lcl|
                                MIMAT0013979 Zea mays miR159j
                                Length=21

                                Score = 39.2 bits (42), Expect = 1e-06
                                Identities = 21/21 (100%), Gaps = 0/21 (0%)
                                Strand=Plus/Plus

                                Query  1  TTTGGATTGAAGGGAGCTCTG  21
                                          |||||||||||||||||||||
                                Sbjct  1  TTTGGATTGAAGGGAGCTCTG  21


                                >lcl|zma-miR159f MIMAT0013975 Zea mays miR159f
                                Length=21

                                Score = 39.2 bits (42), Expect = 1e-06
                                Identities = 21/21 (100%), Gaps = 0/21 (0%)
                                Strand=Plus/Plus

                                Query  1  TTTGGATTGAAGGGAGCTCTG  21
                                          |||||||||||||||||||||
                                Sbjct  1  TTTGGATTGAAGGGAGCTCTG  21


                                >lcl|tae-miR159b MIMAT0005344 Triticum aestivum miR159b
                                Length=21

                                Score = 39.2 bits (42), Expect = 1e-06
                                Identities = 21/21 (100%), Gaps = 0/21 (0%)
                                Strand=Plus/Plus

                                Query  1  TTTGGATTGAAGGGAGCTCTG  21
                                          |||||||||||||||||||||
                                Sbjct  1  TTTGGATTGAAGGGAGCTCTG  21

                                Comment
