  • question?

    Hi,
    I have a file with a single column like this:

    ENSSSCG00000000005|ENSSSCT00000000006
    ENSSSCG00000000005|ENSSSCT00000000006
    ATXN10|ENSSSCT00000000009
    ENSSSCG00000019685|ENSSSCT00000021280
    LDOC1L|ENSSSCT00000000023
    -
    TSPO|ENSSSCT00000000035
    ENSSSCG00000000032|ENSSSCT00000000034
    ENSSSCG00000000032|ENSSSCT00000000034
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    How can I get rid of the lines that start with ENS, and strip the |ENS... part that follows a gene name on the other lines? I want to keep just the gene names; in this example, ATXN10, LDOC1L, TSPO and TTLL1.
    Does anyone know how I can do that? Thanks for your help.

  • #2
    Considering your other, related post as well, I think it would be worth your while to learn Perl and regular expressions. Since that won't help you today, here's some sed to play around with (I'm no guru). Hope that helps!

    cheers

    Code:
    sed -e '
      # drop lines whose first field is an Ensembl gene ID
      /^ENS/d
      # strip the trailing |ENS... transcript ID after a gene name
      s/|ENS.*$//
    ' inputfile.txt > output.txt
    Code:
    ENSSSCG00000000005|ENSSSCT00000000006
    ENSSSCG00000000005|ENSSSCT00000000006
    ATXN10|ENSSSCT00000000009
    ENSSSCG00000019685|ENSSSCT00000021280
    LDOC1L|ENSSSCT00000000023
    -
    TSPO|ENSSSCT00000000035
    ENSSSCG00000000032|ENSSSCT00000000034
    ENSSSCG00000000032|ENSSSCT00000000034
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    TTLL1|ENSSSCT00000000037
    pls use code tags
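
For comparison, the same filtering could also be sketched with awk, which treats the "|" as a field separator instead of pattern-matching the whole line. This is only a rough sketch under a few assumptions: the helper name `gene_names` is made up for illustration, the Ensembl IDs are assumed to always start with uppercase "ENS", and the lone "-" placeholder lines are assumed to be unwanted (the question does not say what should happen to them).

```shell
# Hypothetical helper (not from the thread): print only the gene names.
# Each line is split on "|"; field 1 is kept unless it is an Ensembl ID
# (starts with "ENS") or the "-" placeholder (assumed unwanted).
gene_names() {
  awk -F'|' '$1 !~ /^ENS/ && $1 != "-" { print $1 }' "$@"
}

# Example usage: write each remaining gene name once
# gene_names inputfile.txt | sort -u > output.txt
```

Called with no file argument, the function reads from standard input, so it can sit in the middle of a pipeline. Piping through `sort -u` is optional and only matters if the repeated names (such as TTLL1 above) should be collapsed.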
