  • BLASTn virus with CLC

    Hello

    I am a newcomer to bioinformatics, and need your assistance.

    I bought CLC Genomics Workbench and started by using BLAST at NCBI.
    I selected CLC's tutorial dataset (P. aeruginosa) and
    chose "Limit by entrez query" as Bacteria [ORGN] or Viruses [ORGN].

    However, the process expired after half an hour,
    and returned only an error message.

    I contacted the supplier of CLC, who replied that NCBI has recently been rejecting (or giving very low priority to) requests made through CLC Workbench. The supplier recommended that I use the "Download BLAST Databases" tool.

    My ultimate interest is viruses. My plan is to
    first exclude reads that align to the human genome,
    then de novo assemble the remaining reads,
    and align the resulting contigs (scaffolds) to known bacteria/viruses.
    Contigs that do not align would be from a novel virus.

    Please let me know
    which bacterial/viral database (with URL) I should download.

    I would like to download not only the NCBI DB but also the GOLD database,
    which includes unpublished sequences.

    Thank you
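
The last step of the plan above (finding contigs that align to nothing known) can be sketched in Python. This is a minimal sketch, assuming the contigs are in FASTA format and the BLAST results were saved as tab-separated text (BLAST+ `-outfmt 6`, where the first column is the query ID). The function names and the example records are hypothetical. A contig with no hit only means that nothing similar was found in the chosen database, so the result is a list of candidates rather than confirmed novel viruses.

```python
from typing import Iterable, List, Set


def read_fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs (first word after '>') in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def queries_with_hits(blast_tab_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (column 1) from tab-separated BLAST output."""
    hits = set()
    for line in blast_tab_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (present in -outfmt 7).
        if not line or line.startswith("#"):
            continue
        hits.add(line.split("\t")[0])
    return hits


def contigs_without_hits(contig_fasta_lines: Iterable[str],
                         blast_tab_lines: Iterable[str]) -> List[str]:
    """List contig IDs that never appear as a query in the BLAST table."""
    hit_ids = queries_with_hits(blast_tab_lines)
    return [cid for cid in read_fasta_ids(contig_fasta_lines)
            if cid not in hit_ids]


if __name__ == "__main__":
    # Hypothetical inline data; real use would pass open file handles.
    contigs = [">contig_1 len=500", "ACGT", ">contig_2", "GGCC", ">contig_3", "TTAA"]
    blast = ["contig_1\tNC_000001.1\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350"]
    print(contigs_without_hits(contigs, blast))
```

Because both functions accept any iterable of lines, open file handles can be passed directly, which avoids loading a large contig file into memory.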

  • #2
    I don't think there is a special NCBI virus-only BLAST database.

    I made a BLAST database of complete viruses by using Entrez to download virus-only sequences in FASTA format: http://blastedbio.blogspot.co.uk/201...-chimeras.html

    Alternatively, you might want to use the viral sequences from the NCBI FTP site.

    • #3
      Hi maubp

      Thank you very much for your useful information.

      CLC told me to download the database from
      ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/

      Is it possible to merge this with the sequence set you recommended in CLC?

      thanks

      • #4
        My Entrez idea ought to give the latest (largest) set of viruses, but it is also more complicated.

        The viral sequences from RefSeq on the NCBI FTP site should be fine (and are much easier to reproduce, since RefSeq has a clear release process).

        • #5
          @doggy: Your ultimate interest is "virus", but perhaps there is a more specific type of virus that you are interested in. If that is the case, then Peter's tax ID solution may still be better, since it will allow you to get all viral sequences for that ID. With BLAST it is always best to start with the smallest search space you can use.
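
As an illustration of restricting a search by tax ID, here is a minimal Python sketch that builds an Entrez term of the form `txid<ID>[Organism:exp]` and an NCBI E-utilities `esearch` URL for it. It only constructs the URL and does not contact NCBI. The function names, the `nuccore` database choice and the `retmax` value are assumptions made for the example; 10239 is the NCBI Taxonomy ID for Viruses, and a narrower ID could be substituted to shrink the search space.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# NCBI Taxonomy ID for Viruses; replace with a narrower ID if known.
VIRUSES_TAXID = 10239


def taxid_term(taxid: int) -> str:
    """Entrez term matching a taxon and everything below it."""
    return f"txid{taxid}[Organism:exp]"


def esearch_url(taxid: int, db: str = "nuccore", retmax: int = 20) -> str:
    """Build (but do not fetch) an esearch URL for the given tax ID.

    The database and retmax defaults are illustrative assumptions.
    """
    params = {"db": db, "term": taxid_term(taxid), "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(taxid_term(VIRUSES_TAXID))
    print(esearch_url(VIRUSES_TAXID))
```

The same `txid...[Organism:exp]` term can be used wherever an Entrez query is accepted as a limit, so one term serves both for downloading sequences and for restricting a search.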

          • #6
            Hi Peter,

            I read your post "trouble with chimeras...", which raises what seems to be a very important issue.

            Have you generated a database of complete viral genomes with your script?

            If so, I would like to download it, since I have never run Python.

            Thank you,

            • #7
              Thanks,

              However, I have no idea which taxonomic branch (DNA, RNA, ds, ss) the virus I am looking for belongs to.

              • #8
                If you read the comments on http://blastedbio.blogspot.co.uk/201...-chimeras.html you'll see that Zag pointed out I was missing many complete viral genomes due to annotation properties (they were not flagged as complete). In any case, my download is already about a year old, so I would not encourage you to use it.

                See also ftp://ftp.ncbi.nih.gov/genomes/Viruses/ which has many virus genomes in separate folders - note the very useful tar-balls like ftp://ftp.ncbi.nih.gov/genomes/Viruses/all.faa.tar.gz for all the viral protein sequences (as many separate files).

                I would suggest that you initially follow CLC Bio's advice and start from ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/ to make a virus BLAST database. This is simple, and easy to describe or reproduce.
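
For readers using command-line BLAST+ rather than CLC, the step of turning the downloaded RefSeq viral files into a BLAST database can be sketched as follows. This is a minimal sketch, assuming the nucleotide files were already downloaded into a local folder as gzip-compressed FASTA; the file pattern, folder, function names and database name are assumptions for the example. The second function only builds the `makeblastdb` argument list and does not run anything.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def concatenate_gzipped_fasta(src_dir: Path, pattern: str, out_fasta: Path) -> int:
    """Decompress every file in src_dir matching pattern into one FASTA file.

    Assumes each input ends with a newline, as NCBI FASTA files normally do.
    Returns the number of files merged.
    """
    files = sorted(src_dir.glob(pattern))
    with open(out_fasta, "wb") as out:
        for path in files:
            with gzip.open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return len(files)


def makeblastdb_command(fasta: Path, db_name: str) -> List[str]:
    """Argument list for building a nucleotide database with BLAST+."""
    return [
        "makeblastdb",
        "-in", str(fasta),
        "-dbtype", "nucl",   # use "prot" for protein FASTA
        "-out", db_name,
        "-title", db_name,
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the files were downloaded.
    downloads = Path("refseq_viral")
    merged = Path("viral_refseq.fna")
    if downloads.is_dir():
        count = concatenate_gzipped_fasta(downloads, "*.genomic.fna.gz", merged)
        print(f"merged {count} files into {merged}")
    print(" ".join(makeblastdb_command(merged, "viral_refseq")))
```

If BLAST+ is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Recording the RefSeq release number alongside the database name keeps the build easy to describe and reproduce.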

                • #9
                  Hi Peter,

                  Thanks again for your very useful advice.

                  Following your previous post, I am downloading multiple viral taxonomic groups
                  one by one (e.g. deltavirus, dsDNA with no RNA, dsRNA, Retro...) using the Taxonomy Browser, and combining them all.

                  I have also downloaded ftp://ftp.ncbi.nih.gov/genomes/Viruses/all.fna.tar.gz
                  and ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/.

                  I will compare these three DBs by BLASTing specific sequences against each of them.

                  Thanks again
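
Alongside the BLAST-based comparison described above, a quick complementary check is to compare which sequence IDs each of the three collections contains. The sketch below is not the poster's method; it is a simple set comparison, assuming each collection is available as FASTA and that the first word of each header is a comparable accession. Header styles can differ between sources, so IDs may need normalizing first. The function names and collection labels are hypothetical.

```python
from typing import Dict, Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Set of sequence IDs (first word after '>') in a FASTA file."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def compare_id_sets(named_ids: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Report IDs shared by all collections and IDs unique to each one."""
    if not named_ids:
        return {}
    report = {"shared_by_all": set.intersection(*named_ids.values())}
    for name, ids in named_ids.items():
        # Union of every other collection's IDs.
        others = set().union(*(v for k, v in named_ids.items() if k != name))
        report["only_in_" + name] = ids - others
    return report


if __name__ == "__main__":
    # Hypothetical toy data standing in for the three downloads.
    collections = {
        "taxonomy": fasta_ids([">NC_1.1 a", "ACGT", ">NC_2.1 b", "GGCC"]),
        "genomes": {"NC_1.1", "NC_3.1"},
        "refseq": {"NC_1.1", "NC_2.1", "NC_4.1"},
    }
    for key, ids in compare_id_sets(collections).items():
        print(key, sorted(ids))
```

Sequences listed as unique to one collection are natural candidates to use as the "specific sequences" when BLASTing against the other two.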

                  • #10
                    Making a custom-made databank for local BLAST

                    Hello,

                    We provide a plugin for the CLC bio Workbench platform that contains a tool (Databank Manager) for setting up custom-made databanks. Among other things, it can automatically download public databanks and filter them by taxonomic criteria. This may be useful for working with local BLAST. Please have a look at: http://www.clcbio.com/clc-plugin/klast-4-cores/ and http://www.korilog.com/sequence-databank-manager

                    regards,
                    Patrick

                    • #11
                      Patrick,

                      Thank you very much for letting me know about this plugin, KLAST.
                      I will read the user's manual.

                      By the way, does this cost $910?

                      • #12
                        Since KLAST is a high-performance sequence comparison tool, the fee depends on the number of cores available on your system. $910 is the fee for a 4-core computer, and it is a perpetual license. For more on KLAST performance and use, you may also read the use cases available at http://www.korilog.com/use-cases, in addition to the plugin manual.
