  • find files which contain a string

    Hi members,

    Pardon me if this looks like a trivial question.
    I know this has been asked several times on stackoverflow, unix and many other forums.

    I'm having a hard time finding a file with 'find'. I have tens of folders, each of which contains tens more folders, and so on.
    The names of the folders and of the file I am interested in are way too long.

    Example:

    Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz

    I want to find the file whose name contains "96_CN_11_B21_M1_C3_P2".

    Code:
    find . -name "*.fastq.*" | xargs grep "96_CN"
    Code:
     find . -name "*.fastq.gz" -exec egrep -Hn "96_CN_" {} \;
    The commands above take ages. I waited more than 90 minutes for output and they were still running.

    Please advise.
    Again, sorry for this small query.
    Last edited by bio_informatics; 03-04-2015, 05:27 AM.
    Bioinformaticscally calm

  • #2
    I assume you are looking to locate file names/paths?

    If you are going to need to do this frequently, consider creating a database of file/folder names like so:

    Code:
    $ updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o mydb
    Then you can use the "locate" command to find file names/paths very rapidly:

    Code:
    $ locate -d mydb filename_to_search
    Creating the database could take a significant amount of time, but the fast lookups afterwards would be worth it.
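
Where updatedb/locate is not available, the same build-once, search-many idea can be approximated with a plain text list of paths. The sketch below is a stand-in for illustration, not how locate stores its database; the scratch directory, the notes.txt file, and the `index_file` variable are assumptions.

```shell
# Hypothetical scratch tree; point search_root at the real top-level folder instead.
search_root=$(mktemp -d)
mkdir -p "$search_root/batch1/lane1"
: > "$search_root/batch1/lane1/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"
: > "$search_root/batch1/lane1/notes.txt"

# Build the index once: one path per line. This is the slow step on a big tree.
# (Assumes no path contains a newline.)
index_file=$(mktemp)
find "$search_root" -type f -name '*.fastq.gz' > "$index_file"

# Each later lookup reads one small text file instead of walking the directories.
hit=$(grep -F '96_CN_11_B21_M1_C3_P2' "$index_file")
printf '%s\n' "$hit"

# Remove the scratch files created for this demonstration.
rm -rf "$search_root" "$index_file"
```

The index goes stale as new files arrive, so it would need to be rebuilt whenever the tree changes.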


    • #3
      Hi Genomax,
      Thank you for your reply.
      I will be working on these files for a few months; after that, new data will arrive with new file names.
      Creating a database is a nice idea, but not worth it when new data floods in on a regular basis.

      I will post this on another forum (and will mention this post as well).
      Bioinformaticscally calm


      • #4
        This is not a regular database, and it may be the fastest way of finding files (see http://linux-sxs.org/utilities/updatedb.html and http://en.wikipedia.org/wiki/Locate_%28Unix%29).

        You could look into running this as a cron job each day so you would not need to worry about it.


        • #5
          Thank you for your valuable suggestions. I didn't know anything of this type existed.
          I'm on a cluster and do not have many permissions.
          I will definitely take these suggestions into consideration.
          Bioinformaticscally calm


          • #6
            Here is what I was doing wrong:
            find . -name "*.fastq.*" | xargs grep "96_CN"
            This finds the .fastq files and then greps inside each of them for "96_CN".
            The fastq.gz files are compressed binary files, so searching their contents would definitely take ages, and the string I wanted is in the file name, not in the data.

            I didn't try this on small directories first.

            --
            The correct command turned out to be:

            Code:
            time find . -name "*96_CN_11_B21_M1_C3_P2*"
            This returned the paths and file names I needed. Timing:
            real 0m0.434s
            user 0m0.037s
            sys 0m0.152s
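
A minimal sketch of the difference between the two approaches, assuming a POSIX shell and a throwaway directory. The run1 folder and the file contents are made up for illustration; the file is a plain-text stand-in rather than a real gzip archive.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/run1"
sample="$demo_dir/run1/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"
# Plain-text stand-in for sequence data; a real .fastq.gz would be compressed.
printf '@read1\nACGT\n+\nIIII\n' > "$sample"

# Content search: each matching file is opened and read in full,
# and the sample ID does not appear inside the data anyway.
content_hits=$(find "$demo_dir" -name '*.fastq.*' -exec grep -l '96_CN' {} + | grep -c . || true)

# Name search: only directory entries are examined, no file is opened.
name_hits=$(find "$demo_dir" -name '*96_CN_11_B21_M1_C3_P2*' | grep -c . || true)

echo "content hits: $content_hits, name hits: $name_hits"
rm -rf "$demo_dir"
```

The content search comes back empty while the name search finds the file, which is why only the second form was needed here.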
            Last edited by bio_informatics; 03-04-2015, 05:44 AM.
            Bioinformaticscally calm


            • #7
              Can you try this?

              Code:
              $ find . -type f -name "*.fastq.*" | grep "96_CN"
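
One thing to keep in mind with this form: grep sees each full path as text, so a folder name containing the string also counts as a match. A small sketch under assumed names (the 96_CN_batch and other folders and unrelated_sample.fastq.gz are hypothetical):

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/96_CN_batch" "$demo_dir/other"
: > "$demo_dir/96_CN_batch/unrelated_sample.fastq.gz"
: > "$demo_dir/other/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"

# Piping into grep filters whole path strings, so the folder name matches too.
by_path=$(find "$demo_dir" -type f -name '*.fastq.*' | grep -c '96_CN' || true)

# -name compares against the base name of each file only.
by_name=$(find "$demo_dir" -type f -name '*96_CN*.fastq.*' | grep -c . || true)

echo "path matches: $by_path, name matches: $by_name"
rm -rf "$demo_dir"
```

Here the pipe reports two paths and the -name pattern reports one, so which form fits depends on whether folder names should count.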


              • #8
                Genomax:
                I was making a horrible error in my command.
                A fresh mind in the morning picked it up instantly.
                Bioinformaticscally calm


                • #9
                  Consider adding "-type f" to your find command since you are only looking for files.
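
A short sketch of what "-type f" changes, assuming a scratch directory in which a folder and a file both carry the sample ID (the _results folder is hypothetical):

```shell
demo_dir=$(mktemp -d)
# A directory and a regular file that both have the sample ID in their names.
mkdir -p "$demo_dir/96_CN_11_B21_M1_C3_P2_results"
: > "$demo_dir/Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz"

# Without -type f, directories whose names match are reported as well.
any_type=$(find "$demo_dir" -name '*96_CN_11_B21_M1_C3_P2*' | grep -c . || true)

# With -type f, only regular files are reported.
files_only=$(find "$demo_dir" -type f -name '*96_CN_11_B21_M1_C3_P2*' | grep -c . || true)

echo "without -type f: $any_type, with -type f: $files_only"
rm -rf "$demo_dir"
```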


                  • #10
                    That worked.

                    Thanks much for your help and time.
                    Bioinformaticscally calm
