  • #16
    In that case you need to use -db 16sBLASTdb when you are doing the blast search.
    Code:
    $ blastn -query unibac.fasta -db 16sBLASTdb  -out blastn.outfmt6 -evalue 1e-5 -num_threads 6 -max_target_seqs 1 -outfmt 6
    That said, are you not using the latest Blast+ package where the command is now makeblastdb?



    • #17
      Originally posted by GenoMax
      In that case you need to use -db 16sBLASTdb when you are doing the blast search.
      Code:
      $ blastn -query unibac.fasta -db 16sBLASTdb  -out blastn.outfmt6 -evalue 1e-5 -num_threads 6 -max_target_seqs 1 -outfmt 6
      That said, are you not using the latest Blast+ package where the command is now makeblastdb?

      Ok this time I used makeblastdb with the following command:

      makeblastdb -in 16s.fasta -out 16sdatabaseBLAST -dbtype nucl -parse_seqids

      I ended up receiving 6 16sdatabaseBLAST named files, each with a different file extension.

      I then ran the blast query with the code that was mentioned previously. As a result, I received a single file called blastn.outfmt6. The only problem is that the output file is empty... At least there isn't an error this time.
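An empty output file with no error can simply mean that nothing in the query matched. A quick sanity check, sketched below in Python, is to count how many sequences went in and how many hit lines came out. The function names are made up for illustration, and the sketch assumes a tabular result (-outfmt 6 or 7).

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_hit_lines(path):
    """Count non-empty, non-comment lines in a tabular (-outfmt 6 or 7) BLAST result."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def describe_run(query_fasta, blast_out):
    """Summarise a run so an empty result is easy to tell apart from a tiny input."""
    n_queries = count_fasta_records(query_fasta)
    # A missing output file is reported as zero hit lines rather than raising.
    n_hits = count_hit_lines(blast_out) if Path(blast_out).exists() else 0
    return {"queries": n_queries, "hit_lines": n_hits}
```

Calling describe_run("unibac.fasta", "blastn.outfmt6") with the file names from this thread would show at a glance whether the query file held only one sequence and whether any hits were written.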



      • #18
        Start by removing the e-value restriction and other limits. Once you are sure the blast is working then you can start adding filters in as needed.
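One way to follow this "no filters first, then add them back" approach is to build the command from optional pieces. The Python sketch below is only an illustration: build_blastn_command is a hypothetical helper, it uses only the blastn flags already shown in this thread, and it assumes the -db name should match the -out name given to makeblastdb (16sdatabaseBLAST).

```python
import shlex


def build_blastn_command(query, db, out, evalue=None, max_target_seqs=None,
                         num_threads=None, outfmt=6):
    """Return a blastn argument list; each filter is added only when a value is given."""
    cmd = ["blastn", "-query", query, "-db", db, "-out", out, "-outfmt", str(outfmt)]
    if evalue is not None:
        cmd += ["-evalue", str(evalue)]
    if max_target_seqs is not None:
        cmd += ["-max_target_seqs", str(max_target_seqs)]
    if num_threads is not None:
        cmd += ["-num_threads", str(num_threads)]
    return cmd


# First pass: no e-value or target limits, just to confirm that hits come back.
unfiltered = build_blastn_command("unibac.fasta", "16sdatabaseBLAST", "blastn.outfmt6")

# Later pass: the limits from the earlier command, added back once the search works.
filtered = build_blastn_command("unibac.fasta", "16sdatabaseBLAST", "blastn.outfmt6",
                                evalue="1e-5", max_target_seqs=1, num_threads=6)

print(shlex.join(unfiltered))
print(shlex.join(filtered))
```

Either list can be passed to subprocess.run(cmd, check=True) on a machine where BLAST+ is installed.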



        • #19
          Originally posted by GenoMax
          Start by removing the e-value restriction and other limits. Once you are sure the blast is working then you can start adding filters in as needed.
          I just recalled that I gave only one short sequence that belonged to the forward primer that's used to amplify Bacteroidales sequences. When I used a fasta file that contained many more sequences, I received a 6.5 GB file.

          I'm also looking through all the types of outputs I can generate... I'm just not sure which one's the best to use for downstream analyses, or what some of these terms mean.

          Here's the list of formats that can be generated.

          0 = pairwise,
          1 = query-anchored showing identities,
          2 = query-anchored no identities,
          3 = flat query-anchored, show identities,
          4 = flat query-anchored, no identities,
          5 = XML Blast output,
          6 = tabular,
          7 = tabular with comment lines,
          8 = Text ASN.1,
          9 = Binary ASN.1,
          10 = Comma-separated values,
          11 = BLAST archive format (ASN.1),
          12 = JSON Seqalign output,
          13 = JSON Blast output,
          14 = XML2 Blast output



          • #20
            I am not sure why you are using blast here, but there are well-known programs (QIIME and mothur) that are designed for NGS data and computational ecology. They are going to be more efficient than using blast.



            • #21
              Originally posted by GenoMax
              I am not sure why you are using blast here, but there are well-known programs (QIIME and mothur) that are designed for NGS data and computational ecology. They are going to be more efficient than using blast.
              I'm mainly using blast because my profs suggested it for the course. For my project, I plan on working with QIIME; a fellow Master's student has already prepared a pipeline for processing sequences with QIIME. I heard that he had trouble incorporating a check for chimeric sequences, though. Once he's finished his Master's project and I've finished the course, I'll most likely take over his pipeline.



              • #22
                Originally posted by Naphtap
                I just recalled that I gave only one short sequence that belonged to the forward primer that's used to amplify Bacteroidales sequences. When I used a fasta file that contained many more sequences, I received a 6.5 GB file.

                I'm also looking through all the types of outputs I can generate... I'm just not sure which one's the best to use for downstream analyses, or what some of these terms mean.

                Here's the list of formats that can be generated.

                0 = pairwise,
                1 = query-anchored showing identities,
                2 = query-anchored no identities,
                3 = flat query-anchored, show identities,
                4 = flat query-anchored, no identities,
                5 = XML Blast output,
                6 = tabular,
                7 = tabular with comment lines,
                8 = Text ASN.1,
                9 = Binary ASN.1,
                10 = Comma-separated values,
                11 = BLAST archive format (ASN.1),
                12 = JSON Seqalign output,
                13 = JSON Blast output,
                14 = XML2 Blast output
                Formats 6 and 7 are popular when the output needs to be parsed programmatically. The first few options are more visual (compare at a glance) formats. Peter Cock has a post on these here: http://blastedbio.blogspot.com/2012/...criptions.html
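To give an idea of what parsing format 6 or 7 looks like, here is a minimal Python sketch. It assumes the default 12-column layout (no custom field list was passed to -outfmt), and the helper names and sample lines are invented for illustration. It reads one line at a time, so it should also cope with a multi-gigabyte result file.

```python
# Default column order for -outfmt 6 / 7 when no custom field list is given.
DEFAULT_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(lines):
    """Yield one dict per hit from tab-separated lines, skipping '#' comment lines (format 7)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        row = dict(zip(DEFAULT_FIELDS, line.split("\t")))
        # Convert the columns most often used for filtering to numbers.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        yield row


def best_hit_per_query(hits):
    """Keep, for each query ID, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up sample lines in the default layout, only to show the shape of the data.
sample = [
    "# BLASTN comment line as written by format 7\n",
    "q1\tref_A\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350\n",
    "q1\tref_B\t92.0\t200\t16\t0\t1\t200\t5\t204\t1e-70\t280\n",
    "q2\tref_C\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-75\t270\n",
]
best = best_hit_per_query(read_hits(sample))
for query_id, hit in best.items():
    print(query_id, hit["sseqid"], hit["pident"], hit["evalue"])
```

For a real run, passing an open file handle (for example open("blastn.outfmt6")) to read_hits in place of the sample list works the same way.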
