  • GNU parallel + usearch piping

    Greetings, I am wondering if anyone knows how to pipe sequence data into usearch. I am trying to use GNU parallel to break up and distribute multiple smaller ublast jobs over our small server. I can do this with regular blast, but I get fatal errors when I try it with usearch ublast.

    cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe /public2/Tony/usearch -db /public2/Tony/CAMERA/RefSeqMicrobial/microbial.nonredundant.all.udb -top_hits_only -threads 16 -blast6out DATA

    Any suggestions would be helpful. Thanks.
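For readers unfamiliar with what `--pipe --block 100k --recstart '>'` is being asked to do here: it cuts the input stream into blocks of roughly the requested size, with every block beginning at a `>` header so that no FASTA record is split. A minimal sketch of that idea in plain awk follows. It is an illustration only, not how GNU Parallel is implemented, and the function name `split_fasta` and the `chunk_` prefix are made up for this example.

```shell
#!/bin/sh
# Sketch: split FASTA on stdin into chunk files. A new chunk is started only
# at a '>' header, and only once the current chunk has reached max_bytes.
split_fasta() {
    max_bytes=$1   # approximate chunk size in bytes
    prefix=$2      # hypothetical output prefix, e.g. /tmp/x/chunk_
    awk -v max="$max_bytes" -v prefix="$prefix" '
        /^>/ && (n == 0 || size >= max) {
            if (n > 0) close(file)      # finish the previous chunk
            n++
            file = prefix n ".fasta"
            size = 0
        }
        n > 0 {
            print > file                # lines before the first header are skipped
            size += length($0) + 1      # +1 for the newline
        }
    '
}

# Demo with three tiny records and a 10-byte chunk size.
tmpdir=$(mktemp -d)
printf '>a\nACGT\n>b\nGGCC\n>c\nTTAA\n' | split_fasta 10 "$tmpdir/chunk_"
for f in "$tmpdir"/chunk_*.fasta; do
    echo "$(basename "$f"): $(grep -c '^>' "$f") record(s)"
done
rm -r "$tmpdir"
```

In the demo, records a and b land in the first chunk (the 10-byte limit is only checked when a new header arrives) and record c starts the second.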

  • #2
    Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

    You can do:

    cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"
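The command above works around the tool needing a file name rather than stdin: each block is saved to a file named after the job sequence number (`{#}`), the tool runs on that file, the result is printed, and both files are removed. The same four steps can be sketched as a small shell function. This is an illustration under assumptions: `run_on_spooled_input` and `fake_tool` are hypothetical names, `fake_tool` merely stands in for usearch, and `mktemp` is used here instead of `{#}` to pick file names.

```shell
#!/bin/sh
# Spool stdin into a temporary file, run a file-based command on it,
# print the command's output file, then clean up.
# "$@" is the command; by a convention assumed only for this sketch it
# receives the input path and the output path as its last two arguments.
run_on_spooled_input() {
    in=$(mktemp) || return 1
    out=$(mktemp) || { rm -f "$in"; return 1; }
    cat > "$in"            # step 1: save the block arriving on stdin
    "$@" "$in" "$out"      # step 2: run the tool on the saved file
    status=$?
    cat "$out"             # step 3: send the result to stdout
    rm -f "$in" "$out"     # step 4: remove both temporary files
    return "$status"
}

# Stand-in for usearch: copies only the FASTA header lines to the output file.
fake_tool() {
    grep '^>' "$1" > "$2"
}

printf '>a\nACGT\n>b\nGGCC\n' | run_on_spooled_input fake_tool
```

Using `mktemp` avoids leaving files called `1`, `2`, `3` in the working directory if a job is interrupted, at the cost of no longer being able to tell from the file name which block a job handled.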


    • #3
      Yes, this works for cluster_fast, but it's blast I am really after. Thanks.


      • #4
        I am no expert in usearch, but if you show the command line you would run to do it without GNU Parallel, then I might be able to help you parallelize it.


        • #5
          Thanks. Below is the usearch command I would like to pipe:

          usearch -ublast ./454reads.fa -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./454reads_refseq_results


          • #6
            Extremely similar to the cluster_fast command:

            cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -ublast ./{#} -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./{#}.out; cat {#}.out; rm {#} {#}.out"


            • #7
              Usearch has annoying default stdout output

              Originally posted by tange
              Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

              You can do:

              cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"
              This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
              usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
              (C) Copyright 2013 Robert C. Edgar, all rights reserved.


              Licensed to: [email protected]

              The best solution I have found is to add:
              grep -E "^>|^[ACGT]" > tyt
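A small sketch of how a grep filter like this behaves on mixed output. It uses `[ACGT]` without commas, since commas inside a bracket expression are matched literally. The banner text is a placeholder rather than real usearch output, and `loose_filter` is a hypothetical variant, not something proposed in the thread. It shows one caveat of filtering by first character: sequence lines that start with `N` or a lowercase base are dropped along with the banner.

```shell
#!/bin/sh
# Simulated tool output: banner lines followed by FASTA records.
# (Placeholder banner text, not real usearch output.)
fake_output() {
    printf 'tool v0.0, banner line\n'
    printf '(C) placeholder copyright line\n'
    printf '\n'
    printf 'Licensed to: nobody\n'
    printf '>seq1\nACGTNACGT\n>seq2\nNNACGT\nacgt\n'
}

# Keep headers and lines that start with A, C, G or T.
strict_filter() { grep -E '^>|^[ACGT]'; }

# More permissive variant (an assumption for illustration): keep headers and
# lines made up entirely of upper- or lowercase bases, N, '*' or '-'.
loose_filter() { grep -E '^>|^[ACGTUNacgtun*-]+$'; }

echo "strict:"; fake_output | strict_filter
echo "loose:";  fake_output | loose_filter
```

With the sample input, the strict filter keeps three lines and silently loses `NNACGT` and `acgt`, while the loose one keeps all five FASTA lines. Whether that matters depends on whether the reads can contain such lines.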


              • #8
                Originally posted by GisleVestergaard
                This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
                usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
                (C) Copyright 2013 Robert C. Edgar, all rights reserved.


                Licensed to: [email protected]

                The best solution I have found is to add:
                grep -E "^>|^[ACGT]" > tyt
                Would this work with GNU Parallel 20140822:

                parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"


                • #9
                  Originally posted by tange
                  Would this work with GNU Parallel 20140822:

                  parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"
                  Yes, this works and is faster than sed. Thanks!
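For reference, `tail -n +7` starts printing at line 7, so it discards exactly the first six lines without inspecting their content. A minimal sketch with a placeholder six-line banner (not real usearch output; the function names are made up for this example):

```shell
#!/bin/sh
# Placeholder six-line banner followed by two FASTA records.
banner_then_fasta() {
    printf 'banner 1\nbanner 2\nbanner 3\nbanner 4\nbanner 5\nbanner 6\n'
    printf '>seq1\nACGT\n>seq2\nNNGG\n'
}

# Drop the first six lines, keep everything from line 7 onward.
skip_banner() { tail -n +7; }

banner_then_fasta | skip_banner
```

Because nothing is pattern-matched, sequence lines such as `NNGG` pass through untouched. The trade-off is that the approach relies on the banner always being exactly six lines, which may not hold across usearch versions.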
