  • Tracking blastall

    Hi,
    I just started my PhD and I am working with a huge dataset (~7 million reads).
    I started blastall against nt in my Bio-Linux shell, and since it is going to take forever I wanted to ask for some help on how to keep track of the analysis.
    Using the less command I can see what is in the output file, but is there a way to get some numbers out of it, such as how many reads have been processed already?
    Could someone help?

    PS: this is the command I used:
    blastall -d 'nt' -p 'blastn' -i contigs.fa -o contigs.fa.blastn -e 1e-06 -b 10 -v 10 -a 4

    Thanks

  • #2
    How many months or years do you expect this query to take? Do you think you have enough disk space for the output file? If you can't run your query on a more powerful platform, at least split the input into smaller files.
    Last edited by rhinoceros; 05-08-2013, 01:26 AM.
    savetherhino.org

    • #3
      We do have enough space for the output file. I know somebody tried this before and it took 6 months, which is why I was looking for a way to keep track...
      Do you know if there is a different way than clustering the data, or a free platform I could use?

      • #4
        If I were you, I'd run my blasts on Amazon EC2 or something similar. It's not that expensive..
        savetherhino.org

        • #5
          How many sequences are in your contig FASTA file?

          Are your contigs from a transcriptome assembly, meaning each is not that long (typical genes)? Or genomic meaning some could be very large? Either way, try smaller batches of 100 or 1000 sequences at a time - that should let you estimate how long the whole assembly will take.

          Does your computer have enough RAM for the NT database?

          Does your computer have multiple CPU cores? Have you tried running BLAST with multiple threads and/or multiple copies of BLAST on separate query files?

          Are you using the plain text output? If so what will you do with it - parse it? Perhaps a more compact and computer friendly output might be wiser, like the tabular output?
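A minimal sketch of the small-batch idea above, in Python. The function names (`split_fasta`, `estimate_total_hours`) and the `batches` directory are assumptions made up for illustration, not part of BLAST. It writes the query FASTA out in files of `batch_size` records each, and scales the wall-clock time of one timed batch up to the whole input. The estimate is only a rough linear extrapolation.

```python
from pathlib import Path


def split_fasta(fasta_path, batch_size=1000, out_dir="batches"):
    """Write consecutive batches of `batch_size` FASTA records to numbered files.

    Returns the list of batch file paths in the order they were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    n_records = 0
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # Start a new batch file every `batch_size` records.
                if n_records % batch_size == 0:
                    if handle is not None:
                        handle.close()
                    path = out / f"batch_{len(written) + 1:05d}.fa"
                    handle = open(path, "w")
                    written.append(path)
                n_records += 1
            if handle is not None:  # ignore anything before the first header
                handle.write(line)
    if handle is not None:
        handle.close()
    return written


def estimate_total_hours(batch_seconds, batch_size, total_sequences):
    """Linear extrapolation from one timed batch to the whole input."""
    return batch_seconds / batch_size * total_sequences / 3600.0
```

As a purely hypothetical example, if one batch of 1,000 sequences took 30 minutes, `estimate_total_hours(1800, 1000, 7_000_000)` works out to about 3,500 hours (roughly 146 days) on the same machine, which is the kind of number worth knowing before launching the full run.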

          • #6
            Thanks maubp... so

            The metagenome has been sequenced with Illumina, and we know that the read length ranges between 15 and 99 bp.

            Do you suggest using software such as CD-HIT to split the file into smaller batches?

            We installed the nt database on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there a link I can look at?)

            As output I set a FASTA file (I saw that in some workshops), so I told the program to write its output to a file named contigs.fa.blastn

            • #7
              Originally posted by flacchy:
              The metagenome has been sequenced with Illumina, and we know that the read length ranges between 15 and 99 bp.

              Do you suggest using software such as CD-HIT to split the file into smaller batches?
              Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST, IMG/M, or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?
              We installed the nt database on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there a link I can look at?)
              http://www.ncbi.nlm.nih.gov/books/NBK1762/ ..you had already set up 4 threads with the -a flag. In newer versions of BLAST, -num_threads replaces this flag, and really, for speed gains you should be using the latest version..
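For reference, here is a hedged sketch of how the original blastall command might translate to BLAST+, written as a small Python helper that only builds the argument list. The helper name `blastplus_command` is made up for illustration. It assumes BLAST+ is installed and on the PATH. `-task blastn` is included because BLAST+ `blastn` defaults to megablast, whereas `blastall -p blastn` ran the classic algorithm. With `tabular=True` it switches to tabular output (`-outfmt 6`), where the hit limit is set with `-max_target_seqs` instead of `-num_descriptions`/`-num_alignments`.

```python
def blastplus_command(query, out, db="nt", evalue="1e-06", hits=10,
                      threads=4, tabular=False):
    """Build a BLAST+ blastn argument list roughly matching the legacy call
    blastall -p blastn -d nt -i QUERY -o OUT -e 1e-06 -b 10 -v 10 -a 4
    """
    cmd = [
        "blastn",
        "-task", "blastn",             # classic blastn rather than megablast
        "-db", db,                     # was: -d
        "-query", query,               # was: -i
        "-out", out,                   # was: -o
        "-evalue", evalue,             # was: -e
        "-num_threads", str(threads),  # was: -a
    ]
    if tabular:
        # Tabular output; the hit limit is given by -max_target_seqs here.
        cmd += ["-outfmt", "6", "-max_target_seqs", str(hits)]
    else:
        # Plain-text report: -v and -b become these two options.
        cmd += ["-num_descriptions", str(hits), "-num_alignments", str(hits)]
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`, or joined with spaces and pasted into a shell.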
              Last edited by rhinoceros; 05-08-2013, 02:52 AM.
              savetherhino.org

              • #8
                Originally posted by rhinoceros:
                Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST, IMG/M, or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?

                http://www.ncbi.nlm.nih.gov/books/NBK1762/ ..you had already set up 4 threads with the -a flag. In newer versions of BLAST, -num_threads replaces this flag..
                I assumed from the filename contigs.fa in your question that you had already assembled the data. If not, you should do that first.

                • #9
                  I assembled these reads with Velvet, and now I am trying to set up MetaVelvet to get better contigs, since the contigs I obtained are still short (some of them are only 41 nt).

                  At the same time we are running a search on the reads to see what kind of 'organisms' to expect from the data. Does that make sense?
                  Last edited by flacchy; 05-08-2013, 05:17 AM.

                  • #10
                    Wouldn't it be preferable to use a resource like MG-RAST (http://metagenomics.anl.gov/) for this type of analysis? Assuming that the sample here is metagenomic, of course.

                    • #11
                      Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was of great help!

                      • #12
                        If anyone is curious, there is a script to keep track of BLAST (useful if you are dealing with huge data)
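This is not the script mentioned above, just a minimal sketch of one way to do this kind of tracking in Python. It assumes the default plain-text report, in which each query's section begins with a line starting with `Query=`, and compares that count with the number of `>` headers in the input FASTA. The function names are made up for illustration. Because BLAST buffers its output, the count is approximate.

```python
def count_lines_starting_with(path, prefix):
    """Count the lines in a text file that begin with `prefix`."""
    n = 0
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.startswith(prefix):
                n += 1
    return n


def blast_progress(query_fasta, blast_output):
    """Return (queries reported so far, total queries in the input FASTA)."""
    total = count_lines_starting_with(query_fasta, ">")
    done = count_lines_starting_with(blast_output, "Query=")
    return done, total


def format_progress(done, total):
    """Human-readable progress line; guards against an empty input file."""
    pct = 100.0 * done / total if total else 0.0
    return f"{done}/{total} queries ({pct:.1f}%)"
```

For the run in this thread, `format_progress(*blast_progress("contigs.fa", "contigs.fa.blastn"))` could be called every so often while blastall is still writing. It reads the whole output file each time, so on a very large report it is better not to call it too frequently.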

                        • #13
                          Originally posted by flacchy:
                          Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was of great help!
                          DO NOT use nt!! If your query sequences are from marine viruses don't search against the entire universe of DNA sequences.

                          One of the very first things you should do when setting up a BLAST experiment (yes, think of running BLAST as an in silico experiment) is to choose a database appropriate to your experimental system and objective. The nt database has DNA from every branch of the taxonomic tree and every species from aardvark to zyzzyva. I am hard pressed to think of a time when nt is the correct database to use. Construct a target database focused on the experiment and it will greatly speed up your BLAST.
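As a rough sketch of what building a focused database can look like with BLAST+, the Python helper below assembles a `makeblastdb` call for a nucleotide FASTA file. It assumes BLAST+ is installed and that the reference sequences of interest have already been collected into a single FASTA file. The helper name and the file and database names (`viral_refs.fna`, `marine_viruses`) are placeholders, not real resources.

```python
def makeblastdb_command(fasta, db_name, title=None):
    """Build a makeblastdb argument list for a nucleotide database."""
    cmd = [
        "makeblastdb",
        "-in", fasta,       # FASTA file with the chosen reference sequences
        "-dbtype", "nucl",  # nucleotide database, for use with blastn
        "-out", db_name,    # base name of the database files to create
    ]
    if title:
        cmd += ["-title", title]
    return cmd


# Placeholder names; substitute your own reference file and database name.
example = makeblastdb_command("viral_refs.fna", "marine_viruses",
                              title="Marine virus references")
```

Once the database has been built, it can be passed to blastn with `-db marine_viruses` in place of `-db nt`.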
