  • Help with awk

    Hi guys

    My knowledge of the Linux/Unix shell is pretty basic. I need some help with a command that I don't think is working. It may be something simple, which I find is usually the case with informatics!

    One line of my dataset looks like this:

    "exonic","GPR153","synonymous SNV","NM_207370:c.A144G.T48T",,,0.68,0.81,0.60,rs11590458,,1,6237409,6237409,T,C,"het","204","49","60"

    I want to pull out those lines which have values in columns 7, 8 and 9 (in this line the values are 0.68, 0.81 and 0.60).

    Is there a simple way of doing this using awk?

    I initially tried matching those lines with no values in those columns with this command:

    awk '{ if ($7=="" && $8=="" && $9=="" && $10=="") print $0'} | wc -l

    but it doesn't work

    Any advice is appreciated!

    Cheers

  • #2
    There is a fairly simple way; you were on the right track.

    Try:

    awk '{print $7,$8,$9}' FS="," filename

    and then you can redirect stdout to a new file.
    Hopefully this helps!



    • #3
      You can also set the field separator within the awk code using 'BEGIN{FS=",";}', which can be helpful. Also, if you're checking for the presence of values, make sure to use !="", which checks that a value is not empty.

      awk 'BEGIN{FS=",";}{ if ($7!="" && $8!="" && $9!=""){print $0}}' input_filename > output_filename
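As a quick check of the filter above, here is a minimal sketch on a two-line sample. The first line is the one from the original question; the second is a made-up variant with columns 7 to 9 left empty, and the file names sample.csv and with_values.csv are placeholders. It assumes no field contains a comma inside its quotes, since a plain FS="," split would miscount the columns in that case.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Line 1: the line from the question. Line 2: made-up, columns 7-9 empty.
printf '%s\n' \
  '"exonic","GPR153","synonymous SNV","NM_207370:c.A144G.T48T",,,0.68,0.81,0.60,rs11590458,,1,6237409,6237409,T,C,"het","204","49","60"' \
  '"exonic","GENE_X","synonymous SNV","NM_PLACEHOLDER",,,,,,,,1,100,100,A,G,"het","10","20","30"' \
  > sample.csv

# Keep whole lines where columns 7, 8 and 9 are all non-empty.
awk 'BEGIN{FS=","} $7!="" && $8!="" && $9!=""' sample.csv > with_values.csv

# Count the kept lines and, for comparison, the lines where all three columns are empty.
kept=$(awk 'END{print NR}' with_values.csv)
empty=$(awk 'BEGIN{FS=","} $7=="" && $8=="" && $9=="" {n++} END{print n+0}' sample.csv)
echo "kept=$kept empty=$empty"
```

Counting inside awk (rather than piping to wc -l) avoids the leading spaces some wc implementations print.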

      Hope this helps!



      • #4
        That works, thanks guys!



        • #5
          I'd have done this:
          awk -F"," '$7 && $8 && $9' input > output

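          If only the condition is specified, the default action is already "print $0". Note that "$N" as a condition is FALSE when the column is empty, but also when it holds a numeric zero (0, 0.0), because awk treats numeric-looking fields as numbers. A column containing only a space is TRUE. If zero is a valid value in your data, test with $N != "" instead.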
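A small sketch to check how a bare field behaves as a condition, compared with an explicit != "" test. The three rows are made up for illustration. In the awk implementations I am aware of, a field holding 0 is treated as the number zero and therefore fails the bare test.

```shell
# Made-up 9-column rows: full values, a zero in column 7, and empty columns 7-9.
rows=$(printf '%s\n' \
  'a,b,c,d,e,f,0.68,0.81,0.60' \
  'a,b,c,d,e,f,0,0.81,0.60' \
  'a,b,c,d,e,f,,,')

# Bare-field test: a numeric-looking field equal to zero counts as false.
truthy=$(printf '%s\n' "$rows" | awk -F',' '$7 && $8 && $9 {n++} END{print n+0}')

# Explicit string test: only truly empty fields are rejected.
nonempty=$(printf '%s\n' "$rows" | awk -F',' '$7!="" && $8!="" && $9!="" {n++} END{print n+0}')

echo "truthy=$truthy nonempty=$nonempty"
```

The second row is the one that separates the two tests: it is dropped by the bare-field condition and kept by the explicit one.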



          • #6
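            try
            cut -d ',' -f 7-9 inputfile > outputfile

            where -d ',' sets the comma as the delimiter (cut splits on tabs by default) and 7-9 means cut columns 7, 8 and 9
            if you want everything except those columns then use something like --complement (a GNU cut option)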



            • #7
              Originally posted by husamia:
              try
              cut -d ',' -f 7-9 inputfile > outputfile

              where 7-9 means cut columns 7, 8 and 9
              if you want everything except those columns then use something like --complement

              I think vanisha wanted whole lines in the output, not just columns ("i want to pull out those lines which have values in columns 7, 8 and 9").
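To make the difference concrete, here is a minimal sketch that runs both approaches on two made-up rows. It assumes a comma-delimited file with no commas inside quoted fields.

```shell
# Made-up rows: only the first has values in columns 7, 8 and 9.
rows=$(printf '%s\n' \
  'a,b,c,d,e,f,0.68,0.81,0.60,rs1' \
  'a,b,c,d,e,f,,,,rs2')

# cut keeps columns 7-9 of every line, including lines where they are empty.
columns=$(printf '%s\n' "$rows" | cut -d ',' -f 7-9)

# awk keeps the complete line, but only when columns 7-9 all hold a value.
lines=$(printf '%s\n' "$rows" | awk -F',' '$7!="" && $8!="" && $9!=""')

printf 'cut:\n%s\nawk:\n%s\n' "$columns" "$lines"
```

cut still emits a line (just the two delimiters) for the second row, whereas awk drops that row entirely and returns the first one unchanged.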



              • #8
                Hi guys!
                I am dealing with a problem where a single file contains 3 different columns.

                e.g.:
                1 7 -0.2567
                2 8 -0.4321
                2 10 -0.4560
                3 12 -0.2210
                4 12 -0.3243
                5 11 -0.9870
                5 15 -0.7860
                6 16 -0.2345

                Description: when (1) links with (7), its energy is [-0.2567].
                The same follows for the other rows.

                If you look at column 1, (2) is linked with (8) and (10) of column 2, giving energies of (-0.4321) and (-0.4560) respectively. In those cases I would prefer the least energy, i.e. (-0.4321).

                The same case arises in column 2: for value (12) two options exist, 3 (-0.2210) and 4 (-0.3243). I would prefer the least one, as mentioned earlier, i.e. (-0.2210).


                So finally the result should look something like this:

                1 7 -0.2567
                2 8 -0.4321
                3 12 -0.2210
                5 15 -0.7860
                6 16 -0.2345

                Can anyone help me to get this?
                Thanks in advance.



                • #9
                  Hi rahulvarma,

                  Looking at your question: while it could be achieved by scripting in awk, it is a more complex problem than just pulling lines with a value in certain columns. You would need to script it according to the rules you've set. Something like the following nearly works in awk. It currently gives the wrong results, but it might give you a start in resolving your problem.

                  Code:
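                  #!/usr/bin/awk -f
                  # Compares each row only with the row directly before it.
                  BEGIN{
                          lastnumber1=0;
                          lastnumber2=0;
                          energy=0;
                          toprint=0;
                  }
                  
                  {
                          if (lastnumber1==$1) {
                                  # Same column 1 value as the previous row
                                  if ($3>energy && lastnumber1 != 0) {
                                          toprint = $0;
                                  }
                          } else if (lastnumber2 == $2) {
                                  # Same column 2 value as the previous row
                                  if ($3>energy && lastnumber2 != 0) {
                                          toprint = $0;
                                  }
                          } else {
                                  # New group: flush the row chosen for the previous group.
                                  # Known flaws: lastline is the group's final row, not its best one,
                                  # it is still empty on the first row, and the last group is never printed.
                                  if (toprint != 0) {
                                          print toprint;
                                  } else {
                                          print lastline;
                                  }
                                  toprint = 0;
                          }
                  
                          lastnumber1 = $1;
                          lastnumber2 = $2;
                          energy = $3;
                          lastline = $0;
                  }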
                  On your test set it gives the results:
                  1 7 -0.2567
                  2 10 -0.4560
                  4 12 -0.3243
                  5 15 -0.7860
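For comparison, here is a minimal sketch of one way to reproduce the expected output from post #8. It rests on assumptions read off that example: "least energy" is taken to mean the value closest to zero (numerically the largest, since all energies are negative), duplicates in column 1 are resolved first and duplicates in column 2 second, and rows are reported in order of first appearance. The array and variable names are my own.

```shell
# Sample rows from post #8: column 1, column 2, energy.
input=$(printf '%s\n' \
  '1 7 -0.2567' '2 8 -0.4321' '2 10 -0.4560' '3 12 -0.2210' \
  '4 12 -0.3243' '5 11 -0.9870' '5 15 -0.7860' '6 16 -0.2345')

result=$(printf '%s\n' "$input" | awk '
# Stage 1: per column 1 value, remember the row with the largest energy.
{
    if (!($1 in e1)) order1[++n1] = $1
    if (!($1 in e1) || $3 + 0 > e1[$1]) {
        e1[$1] = $3 + 0; c2of[$1] = $2; row1[$1] = $0
    }
}
# Stage 2: among the stage 1 winners, keep the best row per column 2 value.
END {
    for (i = 1; i <= n1; i++) {
        k = order1[i]; c2 = c2of[k]
        if (!(c2 in e2)) order2[++n2] = c2
        if (!(c2 in e2) || e1[k] > e2[c2]) {
            e2[c2] = e1[k]; row2[c2] = row1[k]
        }
    }
    for (i = 1; i <= n2; i++) print row2[order2[i]]
}')

echo "$result"
```

Because everything is held in arrays until the END block, matching rows do not need to be adjacent in the file. If the two stages should run in the opposite order for your data, swap the roles of columns 1 and 2.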

                  Good luck!



                  • #10
                    Hi pj_oz,
                    Thank you so much for your effort.
                    I appreciate your work.
