  • converting genomic coordinates to transcript coordinates

    Hello,
    I need to convert the genomic coordinates of the UTR and CDS features in the Arabidopsis GFF3 gene annotations to transcript coordinates. Can anyone suggest a way to do this using awk or another language? I first tried to calculate the length of the features for each ID, so that I can then derive each feature's start and end position within its transcript from those lengths. However, I am not sure how to write the code for the second part. I have searched related threads, but none of the solutions seemed to provide what I want. Is there already a package or script for this kind of conversion? Any suggestion is most welcome!

    Here is a sample of my input data:
    Code:
    more Athaliana_167_TAIR10.gene.gff3 
    ##gff-version 3
    ##annot-version TAIR10
    Chr1    phytozomev10    gene    3631    5899    .       +       .       ID=AT1G01010.TAIR10;Name=AT1G01010
    Chr1    phytozomev10    mRNA    3631    5899    .       +       .       ID=AT1G01010.1.TAIR10;Name=AT1G01010.1;pacid=19656964;longest=1;Parent=AT1G01010.TAIR10
    Chr1    phytozomev10    five_prime_UTR  3631    3759    .       +       .       ID=AT1G01010.1.TAIR10.five_prime_UTR.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     3760    3913    .       +       0       ID=AT1G01010.1.TAIR10.CDS.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     3996    4276    .       +       2       ID=AT1G01010.1.TAIR10.CDS.2;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     4486    4605    .       +       0       ID=AT1G01010.1.TAIR10.CDS.3;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     4706    5095    .       +       0       ID=AT1G01010.1.TAIR10.CDS.4;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     5174    5326    .       +       0       ID=AT1G01010.1.TAIR10.CDS.5;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     5439    5630    .       +       0       ID=AT1G01010.1.TAIR10.CDS.6;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    three_prime_UTR 5631    5899    .       +       .       ID=AT1G01010.1.TAIR10.three_prime_UTR.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    gene    5928    8737    .       -       .       ID=AT1G01020.TAIR10;Name=AT1G01020
    Chr1    phytozomev10    mRNA    5928    8737    .       -       .       ID=AT1G01020.1.TAIR10;Name=AT1G01020.1;pacid=19655142;longest=1;Parent=AT1G01020.TAIR10
    Chr1    phytozomev10    CDS     8571    8666    .       -       0       ID=AT1G01020.1.TAIR10.CDS.1;Parent=AT1G01020.1.TAIR10;pacid=19655142

    Here is my working code, based on the thread here, for calculating the UTR and CDS feature length for each transcript:

    Code:
    # print every UTR/CDS line followed by its transcript ID and its own length (end - start + 1)
    awk -F'\t' '$3!="gene" && $3!="mRNA" && $9~/Parent=/{id=$9;sub(/.*Parent=/,"",id);sub(/\.TAIR10.*/,"",id);print $0"\t"id"\t"($5-$4+1)}' Athaliana_167_TAIR10.gene.gff3
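
For the second part, one possible approach is a running offset per transcript: order each transcript's features from 5' to 3', then place every feature directly after the previous one. Below is a minimal Python sketch of that idea, not an existing package. The function name `genomic_to_transcript` is made up for illustration, and the sketch assumes that the file is tab-delimited, that each feature has a single `Parent`, and that the UTR and CDS segments of a transcript do not overlap and together make up the spliced transcript (as in the sample above, which lists no exon lines).

```python
import re
from collections import defaultdict

# Feature types to convert; everything else (gene, mRNA, ...) is skipped.
FEATURES = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def genomic_to_transcript(gff_lines):
    """Map UTR/CDS features to 1-based, inclusive transcript coordinates.

    Returns {transcript_id: [(feature_type, tx_start, tx_end), ...]},
    with each list ordered from the 5' end to the 3' end of the transcript.
    Assumes tab-delimited GFF3 and one Parent per feature (hypothetical helper).
    """
    by_tx = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in FEATURES:
            continue
        match = re.search(r"Parent=([^;]+)", cols[8])
        if match is None:
            continue
        by_tx[match.group(1)].append((int(cols[3]), int(cols[4]), cols[6], cols[2]))

    result = {}
    for tx_id, feats in by_tx.items():
        strand = feats[0][2]
        # 5'->3' order: ascending genomic start on '+', descending on '-'.
        feats.sort(key=lambda f: f[0], reverse=(strand == "-"))
        offset = 0  # number of transcript bases already used by earlier features
        converted = []
        for start, end, _strand, ftype in feats:
            length = end - start + 1
            converted.append((ftype, offset + 1, offset + length))
            offset += length
        result[tx_id] = converted
    return result
```

Called as `genomic_to_transcript(open("Athaliana_167_TAIR10.gene.gff3"))`, the result is keyed by the `Parent` value, e.g. `AT1G01010.1.TAIR10`. For the sample above, the five_prime_UTR at 3631-3759 becomes 1-129 and the first CDS at 3760-3913 becomes 130-283. Sorting in descending order on the minus strand makes the feature with the highest genomic coordinate start at transcript position 1.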
