
  • Help with awk

    Hi guys

    My knowledge of the Linux/Unix shell is pretty basic. I need some help with a command that I don't think is working; it may be something simple, which I find is usually the case with informatics!

    One line of my dataset looks like this:

    "exonic","GPR153","synonymous SNV","NM_207370:c.A144G.T48T",,,0.68,0.81,0.60,rs11590458,,1,6237409,6237409,T,C,"het","204","49","60"

    I want to pull out those lines which have values in columns 7, 8 and 9 (in this line the values are 0.68, 0.81 and 0.60).

    Is there a simple way of doing this using awk?

    I initially tried matching those lines with no values in those columns with this command:

    awk '{ if ($7=="" && $8=="" && $9=="" && $10=="") print $0'} | wc -l

    but it doesn't work

    Any advice is appreciated!

    Cheers

  • #2
    There is a fairly simple way, and you were on the right track.

    Try:

    awk '{print $7,$8,$9}' FS="," filename

    This prints columns 7, 8 and 9 of each line; you can then redirect stdout to a new file.
    Hopefully this helps!



    • #3
      You can also set the field separator within the awk code using 'BEGIN{FS=",";}', which can be helpful. Also, if you're checking for the presence of values, make sure to use !="", which checks that a value is not empty.

      awk 'BEGIN{FS=",";}{ if ($7!="" && $8!="" && $9!=""){print $0}}' input_filename > output_filename
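
A quick way to try this kind of filter on a small sample before running it on the full dataset. This is only a sketch: the first row is the line from the question, the second row (GENE2, rs0 and so on) is made up for illustration, and it assumes no quoted field contains a comma, since awk splits on every comma it sees.

```shell
# Two sample rows: the first is the line from the question; the second is a
# hypothetical row whose columns 7, 8 and 9 are empty.
sample='"exonic","GPR153","synonymous SNV","NM_207370:c.A144G.T48T",,,0.68,0.81,0.60,rs11590458,,1,6237409,6237409,T,C,"het","204","49","60"
"exonic","GENE2","synonymous SNV","NM_000000:c.A1G.T1T",,,,,,rs0,,1,100,100,A,G,"het","10","20","30"'

# Keep rows where columns 7, 8 and 9 are all non-empty.
with_values=$(printf '%s\n' "$sample" | awk -F',' '$7 != "" && $8 != "" && $9 != ""')

# Count rows where columns 7, 8 and 9 are all empty
# (n + 0 prints 0 instead of an empty string when nothing matched).
empty_count=$(printf '%s\n' "$sample" |
    awk -F',' '$7 == "" && $8 == "" && $9 == "" { n++ } END { print n + 0 }')

printf '%s\n' "$with_values"
echo "rows with empty columns 7-9: $empty_count"
```

Counting inside awk avoids the extra pipe to wc -l that the original attempt used.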

      Hope this helps!



      • #4
        That works, thanks guys!



        • #5
          I'd have done this:
          awk -F"," '$7 && $8 && $9' input > output

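          If only the condition is specified, the default action is already "print $0". Note that "$N" as a condition is FALSE when the column is empty, but also when it holds a numeric zero (such as 0 or 0.00); a column holding only a space is TRUE. If zero is a legitimate value in your data, compare against the empty string instead ($7 != "").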
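
How a bare field behaves as a condition is easy to check directly. The sketch below uses hypothetical three-column rows (not from the dataset in the question) where only the middle column varies, and compares the bare-field test with an explicit comparison against the empty string. It is worth running on your own awk, since the result depends on how awk treats fields that look like numbers.

```shell
# Hypothetical rows; the middle column is a value, a zero, empty, and a space.
rows='a,0.68,x
b,0,x
c,,x
d, ,x'

# Bare field as the condition: a numeric-looking field is tested as a number,
# so a zero is dropped along with the empty field.
by_truth=$(printf '%s\n' "$rows" | awk -F',' '$2 { print $1 }' | tr '\n' ' ')

# Explicit comparison with the empty string: only the empty field is dropped.
by_empty=$(printf '%s\n' "$rows" | awk -F',' '$2 != "" { print $1 }' | tr '\n' ' ')

echo "bare \$2:   $by_truth"
echo "\$2 != \"\": $by_empty"
```

If a frequency of 0 can occur in columns 7 to 9, the explicit comparison is the safer choice.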



          • #6
            Try:
            cut -d ',' -f 7-9 inputfile > outputfile

            where -d ',' sets the delimiter to a comma (cut splits on tabs by default) and 7-9 means columns 7, 8 and 9.
            If you want everything except those columns, use something like --complement (a GNU cut option).
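
A minimal sketch of the column extraction on the first ten fields of the line from the question (the line is shortened here only for brevity). Because the data is comma-separated, cut is given -d ','; like the awk one-liners, this assumes no quoted field contains a comma.

```shell
# First ten fields of the sample line from the question.
line='"exonic","GPR153","synonymous SNV","NM_207370:c.A144G.T48T",,,0.68,0.81,0.60,rs11590458'

# Extract columns 7 to 9, splitting on commas instead of the default tab.
cols=$(printf '%s\n' "$line" | cut -d ',' -f 7-9)

echo "$cols"
```

Note that this returns only the three columns, not the whole line.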



            • #7
              Originally posted by husamia
              Try:
              cut -d ',' -f 7-9 inputfile > outputfile

              where -d ',' sets the delimiter to a comma (cut splits on tabs by default) and 7-9 means columns 7, 8 and 9.
              If you want everything except those columns, use something like --complement (a GNU cut option).
              I think vanisha wanted lines in output, not just columns ("i want to pull out those lines which have values in columns 7, 8 and 9").



              • #8
                Hi guys!
                I am dealing with a problem where a single file contains three columns.

                e.g.:
                1 7 -0.2567
                2 8 -0.4321
                2 10 -0.4560
                3 12 -0.2210
                4 12 -0.3243
                5 11 -0.9870
                5 15 -0.7860
                6 16 -0.2345

                Description: when (1) links with (7), its energy is [-0.2567].
                The same holds for the other rows.

                If you look at column 1,
                (2) is linked with (8) and (10) of column 2, with respective energies (-0.4321) and (-0.4560). In such cases I would prefer the least energy, say (-0.4321).

                The same applies when we look at column 2: for value (12) two options exist, 3 (-0.2210) and
                4 (-0.3243). I would prefer the least one, as mentioned earlier (-0.2210).


                So finally the result should look something like this:

                1 7 -0.2567
                2 8 -0.4321
                3 12 -0.2210
                5 15 -0.7860
                6 16 -0.2345

                Can anyone help me get this?
                Thanks in advance.



                • #9
                  Hi rahulvarma,

                  Looking at your question: while it could be done by scripting in awk, it is more complex than just pulling lines with a value in certain columns. You would need to script it according to the rules you've set. Something like the following nearly works in awk; it currently gives the wrong results, but it might give you a start in resolving your problem.

                  Code:
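                  #!/usr/bin/awk -f
                  # Compares each row with the previous row only.
                  BEGIN{
                          lastnumber1=0;
                          lastnumber2=0;
                          energy=0;
                          toprint=0;
                          alreadydone=0;
                  }
                  
                  {
                          if (lastnumber1==$1) {                  # same column-1 value as the previous row
                                  if ($3>energy && lastnumber1 != 0) {
                                          toprint = $0;
                                  }
                          } else if (lastnumber2 == $2) {         # same column-2 value as the previous row
                                  if ($3>energy && lastnumber2 != 0) {
                                          toprint = $0;
                                  }
                          } else {                                # new group: flush the pending row
                                  if (toprint != 0) {
                                          print toprint;
                                  } else {
                                          print lastline;
                                  }
                                  toprint = 0;
                          }
                  
                          # Remember this row for the next comparison
                          lastnumber1 = $1;
                          lastnumber2 = $2;
                          energy = $3;
                          lastline = $0;
                  }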
                  On your test set it gives the results:
                  1 7 -0.2567
                  2 10 -0.4560
                  4 12 -0.3243
                  5 15 -0.7860
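
For comparison, a two-pass sketch that reproduces the expected output given in the question. It assumes the rule is: keep a row only if its energy is the highest (closest to zero) among all rows sharing its column-1 value and also among all rows sharing its column-2 value. That rule is inferred from the example rather than stated outright, so treat it as an assumption. The temporary file stands in for the real input file.

```shell
# Write the example rows from the question to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
1 7 -0.2567
2 8 -0.4321
2 10 -0.4560
3 12 -0.2210
4 12 -0.3243
5 11 -0.9870
5 15 -0.7860
6 16 -0.2345
EOF

# The file is read twice. While NR == FNR (first read), record the highest
# energy for each column-1 value and for each column-2 value. On the second
# read, print a row only if its energy is the best for both of its keys.
result=$(awk '
NR == FNR {
    e = $3 + 0
    if (!($1 in best1) || e > best1[$1]) best1[$1] = e
    if (!($2 in best2) || e > best2[$2]) best2[$2] = e
    next
}
($3 + 0) == best1[$1] && ($3 + 0) == best2[$2]
' "$input" "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Reading the file twice avoids relying on the row order, which is where comparing only against the previous row goes wrong.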

                  Good luck!



                  • #10
                    hi pj_oz,
                    Thank you so much for your effort.
                    I appreciate your work.
