
  • perl?

    Hi,
    I have a file like this:
    ID : 741158
    PARENT ID : 9605
    RANK : species
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Homo sp. Altai
    GENBANK COMMON NAME : Denisova hominin
    //
    ID : 756884
    PARENT ID : 9598
    RANK : subspecies
    GC ID : 1
    MGC ID : 2
    SCIENTIFIC NAME : Pan troglodytes ellioti
    //
    I need just the ID number if the rank is species, so for this example the output should be: 741158.
    My Perl script looks like this:
    #!/usr/bin/perl -w
    use strict;
    use warnings;


    open (FILE, 'm.txt');
    while (my $p = <FILE>){
    if ($p =~ /^\/\/\n/){
    last;
    }elsif ($p =~ /GC ID : 1/){
    next;
    }elsif ($p =~ /MGC ID : 2/){
    next;
    }elsif ($p =~ /SCIENTIFIC NAME :\D/){
    next;
    }elsif ($p =~ /\bspecies$/){
    print "ID number";?????
    }
    }

    Any suggestions? Thanks.

  • #2
    This should work if the file is formatted exactly as you've shown:

    Code:
    open (FILE, 'm.txt');
    while(<FILE>) {
      if ($_ =~ m/^ID :/ ) {
        @id = split(/ : /,$_);
      }
      if ($_ =~ m/\bspecies$/) {
        print $id[1];
      }
    }
    Note that it's pretty fragile -- it doesn't do any checking and assumes the input is just as you've shown. In particular: (1) Every record must have an ID; (2) the ID must always come before the rank; (3) the ID line must have a space, colon, space and then the ID number; (4) Any line that ends with the word "species" will cause the ID to be printed, so "subspecies" (etc) must always be one word and "species" shouldn't appear as the final word of any other line.
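A minimal sketch of a less fragile variant, for comparison. It relaxes assumptions (2) to (4) above by reading a whole record at a time and comparing the RANK field exactly. The helper name `species_ids` is hypothetical (it is not from this thread), and the sketch assumes every record is closed by a line containing only `//`, as in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the IDs of all records
# whose RANK field is exactly "species".
sub species_ids {
    my ($text) = @_;
    my @ids;
    # Assumption: a record ends with a line holding only "//"
    # (leading/trailing blanks and a trailing CR are tolerated).
    for my $record (split m{^[ \t]*//[ \t\r]*$}m, $text) {
        my %field;
        for my $line (split /\r?\n/, $record) {
            # "KEY : value", tolerating extra blanks around the colon.
            if ($line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/) {
                $field{$1} = $2;
            }
        }
        # Skip records that lack an ID or a RANK instead of warning.
        next unless defined $field{'ID'} && defined $field{'RANK'};
        push @ids, $field{'ID'} if $field{'RANK'} eq 'species';
    }
    return @ids;
}

# Usage sketch: pass the file name on the command line, e.g. "perl script.pl m.txt".
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> } // '';   # slurp the whole file
    close $fh;
    print "$_\n" for species_ids($text);
}
```

Because the rank is compared with `eq` rather than matched at the end of a line, "subspecies" and any other line that happens to end in "species" no longer trigger output, and the order of the ID and RANK lines within a record does not matter.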



    • #3
      Hi thurisaz,
      Thanks for your answer. All your assumptions are true but your code is not working.



      • #4
        Hi,

        Just to be clear: I omitted the initial "#!/usr/bin/perl -w" line. If you want to copy & paste into a file, you will need to include that, like this:

        Code:
        #!/usr/bin/perl -w
        
        open (FILE, 'm.txt');
        while(<FILE>) {
          if ($_ =~ m/^ID :/ ) {
            @id = split(/ : /,$_);
          }
          if ($_ =~ m/\bspecies$/) {
            print $id[1];
          }
        }
        If that code isn't working for you, then please let me know what exactly is going wrong. I just copied & pasted to be sure and it seems to work fine.



        • #5
          I just copied your code, but I get this error:
          Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
          To fix that I declared the array and the ID variable with my, but nothing changed.
          Thanks.



          • #6
            That warning is raised at line 13 of your script while it is reading line 3 of your file (m.txt): a line that ends in "species" is reached _before_ an ID has been captured, so $id[1] is still undefined. Like I said, it's a fragile script that assumes everything is well-behaved. A few changes make sure that an ID has been assigned before printing and also clear the ID at the end of each record:

            Code:
            #!/usr/bin/perl -w
            
            open (FILE, 'm.txt');
            while(<FILE>) {
                if ($_ =~ m/^ID :/ ) {
                  @id = split(/ : /,$_);
                }
                if ($_ =~ m/\bspecies$/ && $id[1]) {
                  print $id[1];
                }
                if ($_ =~ m{^//$}) {
                  $id[1]=0;
                }
            }
            It sounds like you have something unexpected going on with your input file, though, so I strongly recommend having a good look at it, especially since the script makes so many assumptions.
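One way to take that look, sketched under assumptions: print the first few lines of the input with whitespace and control characters made visible. The helper name `show_invisibles` is hypothetical. As an example of what it can reveal, if every line starts with spaces or a tab (which can happen when text is copied from an indented web page), `/^ID :/` never matches while `/\bspecies$/` still does, and that produces exactly an "uninitialized value" warning on input line 3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the line wrapped in [ ] with tabs, carriage
# returns, newlines and other non-printable bytes shown as escapes.
sub show_invisibles {
    my ($line) = @_;
    $line =~ s/\\/\\\\/g;    # escape literal backslashes first
    $line =~ s/\t/\\t/g;
    $line =~ s/\r/\\r/g;
    $line =~ s/\n/\\n/g;
    # Anything else outside printable ASCII becomes \x{..}.
    $line =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return "[$line]";
}

# Usage sketch: "perl show.pl m.txt" prints the first 8 lines of the file.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        print "$.: ", show_invisibles($line), "\n";
        last if $. >= 8;
    }
    close $fh;
}
```

In the output, leading blanks show up right after the opening bracket, and a trailing `\r` before `\n` indicates Windows line endings, which would stop `/\bspecies$/` from matching at all.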
            Last edited by thurisaz; 07-18-2011, 05:02 AM.



            • #7
              Thanks, but it is still not working. Each time I used exactly the file posted on this page. It is really strange. But anyway, thanks so much for your help.

              Comment


              • #8
                Originally posted by semna
                I just copied your code, but I get this error:
                Use of uninitialized value in print at 110.pl line 13, <FILE> line 3
                To fix that I declared the array and the ID variable with my, but nothing changed.
                Thanks.
                Make sure you define your id-array with
                Code:
                my @id;
                and not
                Code:
                my $id;
                If this is not it, just copy & paste your code here.
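A related pitfall, sketched here as an assumption about what might have happened rather than a diagnosis: where the declaration sits matters as much as its sigil. If `my @id;` is placed inside the loop body, the array is fresh and empty on every iteration, so the ID captured on line 1 is already gone when the RANK line is read. The sketch below declares it once, before the loop; the sub name `ids_for_species` is hypothetical and it takes the file's lines as a list so that it can be tried without a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same logic as the scripts above, under "use strict".
sub ids_for_species {
    my @lines = @_;
    my @id;       # declared ONCE, outside the loop, so it survives between lines
    my @found;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^ID :/) {
            @id = split / : /, $line;      # ("ID", "741158")
        }
        if ($line =~ /\bspecies$/ && defined $id[1]) {
            push @found, $id[1];
        }
        @id = () if $line =~ m{^//$};      # forget the ID at the end of a record
    }
    return @found;
}

# Usage sketch: "perl script.pl m.txt" prints one matching ID per line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for ids_for_species(<$fh>);
    close $fh;
}
```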



                • #9
                  unix script

                  ID : 741158
                  PARENT ID : 9605
                  RANK : species
                  GC ID : 1
                  MGC ID : 2
                  SCIENTIFIC NAME : Homo sp. Altai
                  GENBANK COMMON NAME : Denisova hominin
                  //

                  At the command prompt you can run

                  grep '^ID :' filename.txt | awk '{print ":" $3}' | more
                  That will give you the list of all IDs, each with a colon in front. Note that it does not filter by rank.
