
  • blast database creation ( multiple file )

    Hi,

    I'm new to standalone BLAST and I'm working on the Bos taurus genome. How do I make a BLAST database of the Bos taurus genome? The Bos taurus genome directory on the NCBI FTP site ( ftp://ftp.ncbi.nih.gov/genomes/Bos_taurus/ ) contains a lot of files. Which ones should I use to create the database? Also, how do I combine the per-chromosome files into one database?

    Thanks a lot,

    Nicolas

  • #2
    There are lots of files because the Bos taurus genome is far from complete, and people have various ways of dealing with the incomplete data.

    I'm not in your shoes, so I can't say for certain, but I suspect that the 'bt_ref*.fa' (non-masked reference chromosomal) files from the assembled section ( ftp://ftp.ncbi.nih.gov/genomes/Bos_t...romosomes/seq/ ) are what you want. As for combining the files, the BLAST database creation program ('formatdb') will do this for you if you list multiple files after the '-i' option.
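As a rough illustration of the multi-file '-i' usage described above, here is a minimal sketch. It assumes legacy formatdb accepts a quoted, space-separated list of input files; the demo file names, the toy sequences, and the database name bos_taurus_demo are hypothetical stand-ins for the real bt_ref*.fa downloads.

```shell
# Work in a scratch directory with two tiny stand-in FASTA files
# (hypothetical names; real input would be the downloaded bt_ref*.fa files).
workdir=$(mktemp -d)
cd "$workdir"
printf '>chr1_demo\nACGTACGT\n' > bt_ref_demo_chr1.fa
printf '>chr2_demo\nTTGGCCAA\n' > bt_ref_demo_chr2.fa

# Expand the glob into one space-separated string for the -i option.
files=$(echo bt_ref_*.fa)
echo "input files: $files"

# Only call formatdb when it is installed.
# -p F marks the input as nucleotide; -n sets the database base name.
if command -v formatdb >/dev/null 2>&1; then
    formatdb -i "$files" -p F -n bos_taurus_demo
else
    echo "formatdb not found; skipping database creation"
fi
```

Giving the database an explicit name with -n keeps the output files from being named after the input list.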



    • #3
      OK, thanks, I'll try that.

      Should I also take the bt_ref_*_unplaced.fa files?

      On the NCBI BLAST site, when a BLAST search is run against the Bos taurus genome, which sequences are used?
      Last edited by NicoBxl; 10-04-2010, 11:37 PM.



      • #4
        Alternatively, put all the files in the same folder and run the following command:

        cat *.fa > complete_bos.fasta && formatdb -i complete_bos.fasta -p F
        All your FASTA files will be concatenated into complete_bos.fasta, and the formatting will then be performed on that file.

        This command works on Unix-like systems only, not on Windows.
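One thing worth checking with the concatenation approach is that no input file lacks a trailing newline, since `cat` would then glue the next header onto the previous sequence line. The sketch below, using hypothetical demo files, compares the number of FASTA headers before and after concatenation and only then builds the database. It tries `makeblastdb` from the newer BLAST+ suite first and falls back to the legacy `formatdb` call from the post; the output name complete_bos is an assumption.

```shell
# Scratch directory with two tiny stand-in FASTA files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
printf '>chr1_demo\nACGTACGT\n' > demo_chr1.fa
printf '>chr2_demo\nTTGGCCAA\n' > demo_chr2.fa

# Concatenate; the *.fa glob does not match the .fasta output file.
cat *.fa > complete_bos.fasta

# Count headers per input file, then in the combined file.
expected=$(grep -h '^>' *.fa | wc -l | tr -d ' ')
actual=$(grep -c '^>' complete_bos.fasta)
echo "headers expected: $expected, found: $actual"

if [ "$expected" -ne "$actual" ]; then
    echo "header count mismatch; check for files without a trailing newline" >&2
elif command -v makeblastdb >/dev/null 2>&1; then
    # BLAST+ equivalent of formatdb for a nucleotide database.
    makeblastdb -in complete_bos.fasta -dbtype nucl -out complete_bos
elif command -v formatdb >/dev/null 2>&1; then
    # Legacy command, as given in the post above.
    formatdb -i complete_bos.fasta -p F
else
    echo "no BLAST database tool found; skipping database creation"
fi
```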
        Francois Sabot, PhD

        Be realistic. Demand the Impossible.
        www.wikiposon.org
