  • Blast2GO Beginner's Question

    Hi everybody, I'm new to NGS. I used the STAR/Cufflinks/Cuffcompare workflow and now I need to annotate my transcripts. I decided to use Blast2GO because it seems more intuitive.
    But it is not entirely intuitive for a total beginner like me, and I don't know which kind of BLAST I should run. The basic version of the program lets me use these methods:

    QBlast@NCBI. NCBI offers a public service that allows searching molecular sequence
    databases with the BLAST algorithm. The main advantages of making use of this service
    are its versatility and that no database maintenance is required. Therefore by selecting
    this option at Blast2GO no additional installations have to be done.

    Remote BLAST. Blast2GO will download the latest BLAST+ executable from NCBI and
    will use it to query NR or other databases remotely.

    Local BLAST against own database. It is possible to use the BLAST+ executable to query a
    local/own database.

    WWW-BLAST. Alternatively, BLAST can be done locally against a custom database. For
    this, you need to place a copy of your FASTA formatted custom DB plus a WWW-BLAST
    installation on a local BLAST server and indicate their location to Blast2GO.

    My FASTA file has 16,450 sequences, and I want to use the full NCBI nr database. My computer has an i7-3770 and 8 GB of RAM.

    So the question is: with these resources, what is the safest and easiest way to BLAST?

  • #2
    Do this locally. Download the nr database and use BLASTX against it.

    BUT, it is MUCH faster if you use mpiBLAST on a cluster.

    The command for mpiBLAST (with the correct flags for B2GO) is something like:

    mpiblast -p blastx -d nr -i input.fa -v 20 -b 20 -I T -e 0.001 -m 7 -o output.xml


    • #3
      Originally posted by cement_head
      Do this locally. Download the nr database and use BLASTX against it.

      BUT, it is MUCH faster if you use mpiBLAST on a cluster.

      The command for mpiBLAST (with the correct flags for B2GO) is something like:

      mpiblast -p blastx -d nr -i input.fa -v 20 -b 20 -I T -e 0.001 -m 7 -o output.xml
      Is mpiBLAST much faster even on this PC alone?


      • #4
        No, MPI-BLAST only makes sense on a cluster (if you have access to one). Local multithreading BLAST will be faster than local MPI multiprocessing, and more memory-efficient.

        You might run into trouble with only 8 GB RAM if you want to BLAST the complete nr locally. Give it a try, but you may hit out-of-memory problems. The i7-3770 has 4 cores with hyperthreading, so be prepared for several days of BLASTing...
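For the local multithreaded route, here is a rough sketch of how the mpiBLAST flags from post #2 might translate to a BLAST+ `blastx` call. The mapping is an assumption, not something stated in the thread: `-outfmt 5` is BLAST+'s XML output (the counterpart of legacy `-m 7`), `-max_target_seqs 20` stands in for `-v 20 -b 20`, and legacy `-I T` has no equivalent here and is dropped. The helper name `blastx_command` and the file names are hypothetical.

```python
import os
import shlex


def blastx_command(query, db="nr", out="output.xml",
                   threads=None, evalue=1e-3, max_hits=20):
    """Build a BLAST+ blastx argument list (hypothetical helper)."""
    if threads is None:
        # Default to every logical core the OS reports.
        threads = os.cpu_count() or 1
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        "-outfmt", "5",               # XML output, for import into Blast2GO
        "-num_threads", str(threads),
        "-out", out,
    ]


# The i7-3770 exposes 8 logical cores (4 physical, hyperthreaded).
cmd = blastx_command("input.fa", threads=8)
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)` once BLAST+ and a formatted nr database are installed.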


        • #5
          Originally posted by sarvidsson
          No, MPI-BLAST only makes sense on a cluster (if you have access to one). Local multithreading BLAST will be faster than local MPI multiprocessing, and more memory-efficient.

          You might run into trouble with only 8 GB RAM if you want to BLAST the complete nr locally. Give it a try, but you may hit out-of-memory problems. The i7-3770 has 4 cores with hyperthreading, so be prepared for several days of BLASTing...
          Yes, correct - only if you have (access to) a cluster.


          • #6
            Blasting against nr is not easy. Even with 4 threads, to blast 16,000 sequences will take around 4,000 minutes, or 66 days. Performing GO assignment is also not easy. Importing 16,000 blastx results into the free version of Blast2GO, and then doing GO assignments, will take many days.


            • #7
              Sorry, my math was all wrong on my last post. Let me try again.

              In reality, it takes at least 5 minutes for blastx to align one transcript to nr. For 16,000 sequences, with 4 threads, that is (16,000 x 5)/4 = 20,000 minutes, or about 13.9 days. Then if you want to get GOs by importing into the free version of Blast2GO, that takes several more days at least.
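The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the 5 minutes per sequence quoted in the post and perfectly linear scaling across threads, which real BLAST runs rarely achieve. The function name `blast_days` is made up for illustration.

```python
def blast_days(n_sequences, minutes_per_sequence, threads):
    """Estimated wall-clock days, assuming ideal linear thread scaling."""
    total_minutes = n_sequences * minutes_per_sequence / threads
    return total_minutes / (60 * 24)  # 1,440 minutes per day


# Figures from the post: 16,000 sequences, 5 min each, 4 threads.
print(round(blast_days(16_000, 5, 4), 1))

# The original poster's actual count of 16,450 sequences.
print(round(blast_days(16_450, 5, 4), 1))
```

With the post's numbers this gives roughly 13.9 days, and slightly more for the full 16,450 sequences.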


              • #8
                @Will Nelson: curious whether you have a computer with roughly the same specs as the OP's. Did you actually time a search?


                • #9
                  Well, I'm having a hard time with this. Remote BLAST in Blast2GO Basic takes too long per sequence, so I got more speed by running blastx locally with BLAST+ and importing the output XML into Blast2GO for the subsequent steps. But Will Nelson is right, it is impractical to do this on this computer. Our lab is about to buy a server with 128 GB RAM; until then I want to get more experience with this, so I made a sample of 100 sequences.

                  I get this repeatedly when running blastx:

                  CFastaReader: Bad gap size at line ***
                  CFastaReader: Problem parsing gap mods at line ***

                  where "***" are line numbers. These lines correspond to the sequence ID lines, which use this format:

                  >?_GroupUn999_2_939_+

                  What in this format is generating that error?


                  • #10
                    What format are your sequences in?

                    That error seems to indicate that there may be a problem with your fasta file. Can you try to replace the "?_" at the beginning of the header? Looks like that may be causing a problem.
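A minimal sketch of the header fix suggested above, assuming the leading "?_" really is what the FASTA reader trips over (the "Bad gap size" message would fit a reader that interprets a line starting with ">?" as a gap specification, but that is not confirmed in this thread). The function names and file paths are hypothetical.

```python
def clean_header(line, bad_prefix="?_"):
    """Drop a leading '?_' from a FASTA header; leave other lines untouched."""
    if line.startswith(">" + bad_prefix):
        return ">" + line[1 + len(bad_prefix):]
    return line


def clean_fasta(in_path, out_path):
    """Copy a FASTA file, rewriting only the offending header lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(clean_header(line))


print(clean_header(">?_GroupUn999_2_939_+"))
```

Calling `clean_fasta("sample.fa", "sample.clean.fa")` would write a copy with the renamed headers; it is worth checking afterwards that the shortened IDs are still unique.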


                    • #11
                      Originally posted by Romualdo
                      Well, I'm having a hard time with this. Remote BLAST in Blast2GO Basic takes too long per sequence, so I got more speed by running blastx locally with BLAST+ and importing the output XML into Blast2GO for the subsequent steps. But Will Nelson is right, it is impractical to do this on this computer. Our lab is about to buy a server with 128 GB RAM; until then I want to get more experience with this, so I made a sample of 100 sequences.

                      I get this repeatedly when running blastx:

                      CFastaReader: Bad gap size at line ***
                      CFastaReader: Problem parsing gap mods at line ***

                      where "***" are line numbers. These lines correspond to the sequence ID lines, which use this format:

                      >?_GroupUn999_2_939_+

                      What in this format is generating that error?
                      Make absolutely sure that you buy ECC RAM for the new server. Anything less and you will have major problems.
