  • print first occurrence of a line

    Hi all,

    I extracted ORFs from an initial fasta file and now I want to get the longest ORF for each transcript.

    After extracting the sizes of the ORFs with faSize and sorting them by size, the code I usually use is:

    Code:
    perl -ane'print unless $x{$F[0]}++'
    This time I have a problem using the perl command.

    After extracting the sizes and sorting the transcripts I have something like this:

    Code:
        Singlet_1000_61 3844
    
        Singlet_2000_73 3508
    
        Singlet_1000_62 3081
    
        Singlet_2000_62 3008
    
        Singlet_3500_48 2973
    
        Singlet_4000_48 2964
    
        Singlet_3500_54 2863
    
    What I want is:
    
        Singlet_1000_61 3844
    
        Singlet_2000_73 3508
    
        Singlet_3500_48 2973
    ...

    The perl command is not working in this case.

    Do you have any suggestions on how I can make it work?

    Or an awk command?

    Thanks for your help.

  • #2
    Try stackoverflow.com, they love this sort of thing, but put on your thick skin.



    • #3
      Or, as a hint, use a hash to keep track of whether a transcript has already been printed. Hashes are a wonderful data structure to know about.
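To make the hash hint concrete, here is a minimal sketch in awk. It assumes the IDs follow the pattern Singlet_<transcript>_<orf>, so that the transcript is identified by the first two underscore-separated parts, and that the list is already sorted by length with the longest first. That is a guess from the sample; if it holds, it may also be why keying on the whole first column (as `$F[0]` does) never sees a repeat. The function name `first_per_transcript` is made up for illustration.

```shell
# Hypothetical helper: print only the first line seen for each transcript.
# Assumption: IDs look like Singlet_<transcript>_<orf>, and the input is
# already sorted by ORF length in descending order.
first_per_transcript() {
  awk 'NF >= 2 {                       # skip blank or incomplete lines
    split($1, parts, "_")              # parts[1]="Singlet", parts[2]=transcript
    id = parts[1] "_" parts[2]         # drop the ORF suffix from the key
    if (!seen[id]++) print $1, $2      # the hash remembers printed transcripts
  }' "$@"
}
```

Usage would be something like `first_per_transcript list`, where `list` is the sorted file, or piping the sorted output into the function.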



      • #4
        I don't get it. What do the numbers in your input refer to? If the first one is the ID of the transcript and the last one the length, I would do:

        Code:
        tac list | awk -F "_" '{t[$2]=$0}END{for (i in t)print t[i]}'
        If the ID of each transcript is the whole first column, then it is even simpler:

        Code:
        tac list | awk '{t[$1]=$0}END{for (i in t)print t[i]}'
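        Pipe the output to sort if needed, since awk does not return the array entries in any particular order.
        Since you already sorted the list by size in descending order, the script reads it from the end (tac) and keeps only the last occurrence of each ID, which is the longest ORF for that ID.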
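If the list is not sorted beforehand, a variant of the same idea is to keep the largest length seen per transcript instead of the last line. The sketch below again assumes IDs of the form Singlet_<transcript>_<orf> and a length in the second column; `longest_per_transcript` is a made-up name, not part of any tool.

```shell
# Hypothetical helper: longest ORF per transcript from an unsorted list.
# Assumption: column 1 is Singlet_<transcript>_<orf>, column 2 is the length.
longest_per_transcript() {
  awk 'NF >= 2 {
    split($1, parts, "_")
    id = parts[1] "_" parts[2]                 # transcript key without ORF suffix
    if (!(id in best) || $2 + 0 > len[id]) {   # first sighting or a longer ORF
      best[id] = $1 " " $2
      len[id] = $2 + 0
    }
  }
  END { for (id in best) print best[id] }' "$@" |
    sort -k2,2nr                               # longest first, since awk order is arbitrary
}
```

Ties in length keep whichever ORF was read first.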
