
  • Perl script: Make Statistics Of Mirna Abundances For Many Samples

    Dear All,

    I posted the following question a few weeks ago on Biostars (https://www.biostars.org/p/97538/#97599). I got a very nice answer there, but it was a Perl one-liner. I would like to solve the problem with a Perl hash of hashes. Could anyone here give me an answer?

    I need to tabulate miRNA abundances across many samples. Below is an example.
    Code:
    SAMPLE    MIR    ABUNDANCE
    sample1   mir1   30
    sample1   mir3   100
    sample1   mir4   120
    sample2   mir1   40
    sample2   mir2   200
    sample3   mir1   190
    ......
    I want to change the format to the one below.
    Code:
              sample1    sample2    sample3
    mir1      30           40         190
    mir2      0            200         0
    mir3      100          0           0
    mir4      120          0           0
    ......
    I tried to do this with a Perl hash of hashes, but got stuck (see below). Could a Perl expert help me with this? I greatly appreciate your help!
    Code:
    open FH, '<', $ARGV[0] or die "open failed: $!";
    my %h;
    while (<FH>){    # read from the handle opened above
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$sample}{$mir} = $abun;
    }
    foreach my $sample (keys %h){
            foreach my $mir (keys %{$h{$sample}}){
                    print "   ";     # i am stuck here. Need your help!
            }
    }

  • #2
    See if this works:

    Code:
            my ($sample, $mir, $abun) = /(.+?)\t(.+)\t(.+)/;
            $h{$mir}{$sample} = $abun; 
    }
    foreach my $mir (sort keys %h){
            print "$mir\t";
            foreach my $sample (sort keys %{h{$mir}}){
                    print "$$h{$mir}{$sample}\t;"
            }
            print "\n";
    }
    Last edited by mastal; 05-02-2014, 12:21 PM.


    • #3
      Hi mastal, I still have a problem, but thanks a lot anyway.


      • #4
        What problem are you still having? I think mastal re-organized it correctly. You want to print a line that has a mir, and then the value for each sample. So you would definitely want to have the outer loop be mir, and the inner loop be sample. That way it prints the mir, then on the same line prints each of the sample values.

        mastal's code may have some typos in it (with Perl, it is difficult to tell the difference between a typo and brilliant code, so I am not sure), but I edited it and it works:

        Code:
        foreach my $mir (sort keys %h){
                print "$mir\t";
                foreach my $sample (sort keys %{$h{$mir}}){ # changed h{$mir} to $h{$mir}
                        print "$h{$mir}{$sample}\t"; # changed $$h to $h and \t;" to \t";
                }
                print "\n";
        }

        When I made a little tester, it output this:
        m1 1.1 1.2 1.3
        m2 2.1 2.2 2.3
        which is correct.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


        • #5
          Hi SNPSaurus,
          Thanks a lot for your comments! Actually, it's not so easy. If the input data is as follows, then it's OK.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          However, if the input data changes to the following, the output is not what I expect. The problem is MISSING VALUES.
          Code:
          sample1	mir1	1.1
          sample1	mir2	1.2
          sample2	mir1	2.1
          sample2	mir2	2.2
          sample3	mir4	3.1
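To make the problem concrete, here is a minimal sketch that runs the loop from post #4 over this input. The sub name rows_without_placeholders is hypothetical, and the sub collects the output in a string instead of printing directly so that it can be inspected.

```perl
use strict;
use warnings;

# Loop from post #4, returning the text instead of printing it.
# rows_without_placeholders is a hypothetical name used only for this sketch.
sub rows_without_placeholders {
    my ($h) = @_;
    my $out = '';
    foreach my $mir (sort keys %$h) {
        $out .= "$mir\t";
        foreach my $sample (sort keys %{ $h->{$mir} }) {
            $out .= "$h->{$mir}{$sample}\t";
        }
        $out .= "\n";
    }
    return $out;
}

# Build the hash of hashes (mir first, then sample) from the five input lines.
my %h;
for my $line (
    "sample1\tmir1\t1.1",
    "sample1\tmir2\t1.2",
    "sample2\tmir1\t2.1",
    "sample2\tmir2\t2.2",
    "sample3\tmir4\t3.1",
) {
    my ($sample, $mir, $abun) = split /\t/, $line;
    $h{$mir}{$sample} = $abun;
}

print rows_without_placeholders(\%h);
# The mir4 line comes out as "mir4", a tab, then "3.1": the value lands in the
# first data column even though it belongs to sample3.
```

Because the inner loop only visits the samples that exist for a given mir, nothing holds the place of the missing sample1 and sample2 values in the mir4 row.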
          I am a beginner learning Perl. I tried my best to write the script below (I added some comments to make it easier to follow), but I still have a problem. Could you please check it? I appreciate your help!
          Code:
          #!/usr/bin/perl
          use strict;
          use warnings;

          # "or" binds more loosely than "||", so die fires when open itself fails
          open FH, '<', $ARGV[0] or die "open failed: $!";
          my %h;
          my %h2;
          while (<FH>){
              my ($sample, $mir, $abun) = /(\S+?)\t(\S+)\t(\S+)/;
              $h{$mir}{$sample} = $abun;
              $h2{$sample} += 1; # increment; the keys of %h2 give the list of samples
          }

          foreach my $sample_h2 (sort keys %h2){ # print sample header
              print "\t$sample_h2";
          }
          print "\n";

          foreach my $mir (sort keys %h){
              print "$mir\t";  # print mir name
              foreach my $sample2 (sort keys %h2){ # sort according to sample header
                  foreach my $sample (sort keys %{$h{$mir}}){  # look for the %h2 sample name among those in %h
                      if ($sample eq $sample2) {
                          print "$h{$mir}{$sample}\t"; # print when matched
                          last;
                      }
                  }
              }
              print "\n";
          }
          Last edited by pony2001mx; 05-04-2014, 05:36 AM.


          • #6
            I think I see what you are trying to do. Some mirs don't have data for all samples, so you build a hash of samples separate from the hash of hashes. You go through the hash of samples, then through the samples in your hash of hashes, and print when they match. This is probably better done with an "exists" check, printing a blank if the value is not present:

            Code:
            foreach my $mir (sort keys %h){
                print "$mir\t";  # print mir name
                foreach my $sample2 (sort keys %h2){ # sort according to sample header
                    if (exists $h{$mir}{$sample2}) {
                        print "$h{$mir}{$sample2}\t"; # if exists print
                    } else {
                        print "\t"; # print a blank if that sample doesn't exist for that mir
                    }
                }
                print "\n";
            }
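For reference, the pieces of this thread can be put together roughly as follows. This is only a sketch under a few assumptions: the sub name pivot_abundance is hypothetical, the input lines are passed in as a list rather than read from a file, and a 0 is printed for a missing value (as in the table in the original question) instead of a blank.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "sample<TAB>mir<TAB>abundance" lines into a mir-by-sample table.
# pivot_abundance is a hypothetical name chosen for this sketch.
sub pivot_abundance {
    my (@lines) = @_;                   # work on a copy of the caller's lines
    my (%h, %samples);
    for my $line (@lines) {
        chomp $line;
        my ($sample, $mir, $abun) = split /\t/, $line;
        next unless defined $abun;      # skip blank or malformed lines
        $h{$mir}{$sample} = $abun;
        $samples{$sample} = 1;          # remember every sample seen
    }
    my @samples = sort keys %samples;
    my $out = join("\t", '', @samples) . "\n";   # header row, empty first cell
    for my $mir (sort keys %h) {
        # One cell per sample in header order; 0 where the mir has no value.
        my @cells = map { exists $h{$mir}{$_} ? $h{$mir}{$_} : 0 } @samples;
        $out .= join("\t", $mir, @cells) . "\n";
    }
    return $out;
}

my @input = (
    "sample1\tmir1\t1.1",
    "sample1\tmir2\t1.2",
    "sample2\tmir1\t2.1",
    "sample2\tmir2\t2.2",
    "sample3\tmir4\t3.1",
);
print pivot_abundance(@input);
```

Building each row with join avoids the trailing tab that the print-in-a-loop versions leave at the end of every line. A column header line such as SAMPLE/MIR/ABUNDANCE is not treated specially here and would need to be skipped before calling the sub.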


            • #7
              Hi SNPSaurus, Thank you very much! It's really good stuff for me to learn. Thanks.
