
  • BLAST database related question

    Hi, all,
    I'm downloading the nt database from the NCBI BLAST FTP site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

    It is split into individual files, from nt.00.tar.gz to nt.13.tar.gz. Do I need to somehow put them together after downloading them individually?

    Like this?

    Code:
    cat nt.00 nt.01 ... nt.13

    Or does it not matter whether I have a single file or multiple files?

    Also, what is the .md5 file that accompanies each nt.*tar.gz file?

    Thanks.

  • #2
    If I recall correctly, you don't need to paste them together.

    The .md5 files are checksums. If you're worried that a file didn't download properly, you can run the md5 program on your own computer (md5sum on most Linux systems, md5 on macOS) on the file, then check that the result matches the checksum in the corresponding .md5 file. I have never needed to do this.
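If you do want to run that check on every volume at once, here is a minimal shell sketch. It assumes GNU coreutils `md5sum` and that each `.md5` file holds a line in the `<checksum>  <filename>` format that `md5sum -c` accepts (an assumption worth confirming against one of the downloaded files). The function name `verify_volumes` is hypothetical.

```shell
# Hypothetical helper: verify each *.tar.gz in a directory against its .md5 file.
# Assumes GNU md5sum and .md5 files in "<checksum>  <filename>" format.
verify_volumes() {
  local dir="${1:-.}"
  local status=0
  local sum
  for sum in "$dir"/*.tar.gz.md5; do
    [ -e "$sum" ] || continue   # glob matched nothing; skip the literal pattern
    # Run md5sum -c from inside the directory so the filename in the .md5 file resolves.
    if ! (cd "$dir" && md5sum -c --quiet "$(basename "$sum")"); then
      echo "checksum mismatch: $sum" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `verify_volumes /path/to/downloads` returns 0 when every volume matches and 1 if any archive should be downloaded again.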



    • #3
      No, you should not merge the files. They do all need to be in the same directory, though. Unless you are the worrying kind (or your network is unreliable), it may be OK to skip the md5sum checks, since they can take some time on large files.

      You only need to provide the name of the database, as in this minimal example (no volume numbers needed):

      Code:
      blastn -db nt -query query.fa -out results.out
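To make "same directory" concrete, here is a minimal shell sketch that unpacks each volume side by side instead of concatenating anything. The function name `extract_volumes` and the destination `$HOME/blastdb` are hypothetical. `BLASTDB` is the environment variable BLAST+ reads to locate databases; my understanding (worth checking after extraction) is that the archives also contain an `nt.nal` alias file listing the volumes, which is why `-db nt` needs no numbers.

```shell
# Hypothetical helper: extract every downloaded volume into ONE directory.
# The archives are not merged; each one is simply unpacked next to the others.
extract_volumes() {
  local dest="$1"; shift
  mkdir -p "$dest"
  local archive
  for archive in "$@"; do
    tar -xzf "$archive" -C "$dest" || return 1
  done
}

# Example (assumes BLAST+ is installed and on PATH):
#   extract_volumes "$HOME/blastdb" nt.*.tar.gz
#   export BLASTDB="$HOME/blastdb"   # lets "-db nt" resolve without a full path
#   blastn -db nt -query query.fa -out results.out
```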



      • #4
        Thanks for the replies!

        I have another question:
        In the README file here (ftp://ftp.ncbi.nlm.nih.gov/blast/db/README):

        nr.*tar.gz | non-redundant protein sequence database with
                   | entries from GenPept, Swissprot, PIR, PDF, PDB,
                   | and NCBI RefSeq

        nt.*tar.gz | nucleotide sequence database, with entries
                   | from all traditional divisions of GenBank,
                   | EMBL, and DDBJ excluding bulk divisions (gss,
                   | sts, pat, est, and htg divisions. wgs entries
                   | are also excluded. Not non-redundant.

        So nr now refers to protein sequences? Should I use nt for DNA?



        • #5
          Originally posted by gene_x:
          Thanks for the replies!

          I have another question:
          In the README file here (ftp://ftp.ncbi.nlm.nih.gov/blast/db/README):

          So nr now refers to protein sequences? Should I use nt for DNA?
          The answer is in the text you quoted in post #4.
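In practice the README rule comes down to the database side of the search: a nucleotide database means nt, a protein database means nr. A small shell sketch encoding that mapping for the standard BLAST+ programs (the function name `blast_db_for` is hypothetical):

```shell
# Hypothetical helper: pick the NCBI preformatted database for a BLAST+ program,
# following the README: nr = protein sequences, nt = nucleotide sequences.
blast_db_for() {
  case "$1" in
    blastn|tblastn|tblastx) echo nt ;;  # the database side is nucleotide
    blastp|blastx)          echo nr ;;  # the database side is protein
    *) echo "unknown program: $1" >&2; return 1 ;;
  esac
}
```

For example, a DNA query against proteins would be `blastx -db "$(blast_db_for blastx)" -query query.fa -out results.out`, which searches nr.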



          • #6
            I know. I read somewhere (http://openwetware.org/wiki/Wikiomic...utorial#blastn) that nr is also used to refer to nucleotides, which is why I'm confused.

            So people previously used nr for both proteins and nucleotides, and now it's just proteins?



            • #7
              Originally posted by gene_x:
              I know. I read somewhere (http://openwetware.org/wiki/Wikiomic...utorial#blastn) that nr is also used to refer to nucleotides, which is why I'm confused.

              So people previously used nr for both proteins and nucleotides, and now it's just proteins?
              That seems to have changed at some point; I'm not sure when.

              This page at NCBI still refers to the old-style options, where "nr" could be used for either.



              • #8
                Right. They need to update the old pages to clear things up.
