  • find files which contain a string

    Hi members,

    Pardon me if this seems like a trivial question; I know it has been asked several times on Stack Overflow, Unix and many other forums.

    I'm having a hard time finding a file using 'find'. I have dozens of folders, each containing dozens more, and so on.
    The folder names and the names of the files I want to find are very long, for example:

    Isolate_96_CN_11_B21_M1_C3_P2_GGATTAGG_L001_R2_001.fastq.gz

    In this example, I want to find the file whose name contains "96_CN_11_B21_M1_C3_P2".

    Code:
    find . -name "*.fastq.*" | xargs grep "96_CN"
    Code:
    find . -name "*.fastq.gz" -exec egrep -Hn "96_CN_" {} \;
    The above commands take ages. I waited more than 90 minutes for the output and they were still running.

    Any guidance would be appreciated.
    Again, sorry for such a small question.
    Last edited by bio_informatics; 03-04-2015, 05:27 AM.
    Bioinformaticscally calm

  • #2
    I assume you are looking to locate file names/paths?

    If you need to do this frequently, consider creating a database of file/folder names, like so:

    Code:
    $ updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o mydb
    Then you can use the "locate" command to find file names/paths very quickly:

    Code:
    $ locate -d mydb filename_to_search
    Creating the database could take a significant amount of time, but it would pay off.

    • #3
      Hi Genomax,
      Thank you for your reply.
      I'll be working on these files for a few months; then new data arrives with new file names.
      Creating a database is a nice idea, but it is not worth it when new data floods in on a regular basis.

      I will post this on another forum (and will mention this thread there as well).

      • #4
        This is not a regular database, and it may be the fastest way of finding files (see http://linux-sxs.org/utilities/updatedb.html and http://en.wikipedia.org/wiki/Locate_%28Unix%29).

        You could look into running this as a cron job each day so you would not need to worry about it.
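A daily rebuild could be scheduled with a crontab entry along these lines. This is a minimal sketch, not a tested recipe: the 02:00 schedule and the database path `$HOME/mydb` are illustrative assumptions, the search root is the placeholder from the `updatedb` example in this thread, and it assumes the cluster lets ordinary users install a crontab (`crontab -e`).

```
# minute hour day-of-month month day-of-week  command
# Rebuild the private locate database every day at 02:00 (illustrative time and path).
0 2 * * * updatedb --require-visibility 0 -U /folder_hierarchy_to_search -o "$HOME/mydb"
```

Queries would then point at the same file, e.g. `locate -d "$HOME/mydb" 96_CN_11_B21_M1_C3_P2`.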

        • #5
          Thank you for your valuable suggestions. I didn't know anything like this existed.
          I'm on a cluster and don't have many privileges.
          I will definitely take these suggestions into consideration.

          • #6
            Here is what I was doing wrong:
            find . -name "*.fastq.*" | xargs grep "96_CN"
            It finds the .fastq files and then greps inside each one for "96_CN".
            The .fastq.gz files are compressed binary files, so grepping through them takes ages.
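For contrast, if the goal really had been to search inside the compressed reads, plain `grep` is the wrong tool, because it scans the gzip-compressed bytes rather than the FASTQ text. A minimal sketch, assuming `zgrep` (shipped with gzip) is on the PATH; the function name `search_fastq_contents` is hypothetical. It still decompresses every file in full, so on a large tree it will remain slow; to find files by name, `-name` is what's needed.

```shell
#!/bin/sh
# Sketch: list *.fastq.gz files whose decompressed contents contain a pattern.
# Assumes zgrep (shipped with gzip) is on PATH.
#   -l     print only the names of matching files
#   {} +   pass many files to one zgrep call rather than one call per file
search_fastq_contents() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.fastq.gz' -exec zgrep -l "$pattern" {} +
}

# Usage (hypothetical project path):
#   search_fastq_contents '96_CN' /path/to/project
```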

            I didn't test this on small directories first.

            --
            The correct command turned out to be:

            Code:
            time find . -name "*96_CN_11_B21_M1_C3_P2*"
            It returned the paths and file names I needed, with these timings:
            real 0m0.434s
            user 0m0.037s
            sys 0m0.152s
            Last edited by bio_informatics; 03-04-2015, 05:44 AM.

            • #7
              Can you try this?

              Code:
              $ find . -type f -name "*.fastq.*" | grep "96_CN"

              • #8
                Genomax:
                I was making a horrible error in my command.
                With a fresh mind in the morning, I spotted it instantly.

                • #9
                  Consider adding "-type f" to your find command since you are only looking for files.
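To illustrate the point about `-type f`: without it, `find . -name "*96_CN_11_B21_M1_C3_P2*"` would also list any folder whose name carries the same ID, which is likely here since the folder names are built from the same identifiers. A minimal sketch wrapping the command in a shell function; the name `find_sample_files` and the folder layout used to exercise it are hypothetical.

```shell
#!/bin/sh
# Sketch: list regular files (not directories) whose *name* contains a sample ID.
# -name tests only the last path component, so no file contents are read;
# -type f drops directories that happen to carry the same ID in their names.
find_sample_files() {
    sample_id=$1
    dir=${2:-.}
    find "$dir" -type f -name "*${sample_id}*"
}

# Usage (hypothetical project path):
#   find_sample_files 96_CN_11_B21_M1_C3_P2 /path/to/project
```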

                  • #10
                    That worked.

                    Thanks very much for your help and time.