  • sorting merged GTFs for Tophat/Cufflinks pipeline

    Hello,

    I am trying to prepare a mouse gene annotation GTF for the Tophat/Cufflinks pipeline. I would like to merge the Gencode GTF with non-overlapping transcripts from the UCSC Genes GTF. I noticed that these GTFs are sorted somewhat differently. For example, if a transcript is on the minus strand, Gencode lists its features in descending order of start coordinate (so the start codon comes before the stop codon), and then returns to ascending order for the next transcript on the plus strand. In the UCSC Genes GTF, however, a transcript on the minus strand is also listed in ascending order (so the stop codon comes before the start codon).

    In both formats, however, if an exon is shared between two alternative splice isoforms, it is listed among the exons of the first isoform and then listed again among the exons of the second isoform, even though its coordinate is then out of order overall, because it follows the later exons of the first transcript, whose features were listed first.

    My first question: if I simply concatenate these GTFs, the result will (in addition to the differences above) have entries for one chromosome interrupted by entries from other chromosomes (though never in the middle of a transcript). Would this limit its use by the Tophat/Cufflinks pipeline?

    My second question: if I sort the merged GTF with LC_ALL="C" sort -k 1,1 -k 4,4n input.gtf > sorted.gtf, everything is sorted by chromosome and then by ascending start coordinate. An exon shared between two alternative splice isoforms would then be listed twice consecutively among the features of the first transcript, instead of appearing a second time later among the features of the second isoform (as in the original GTFs). Would this difference in where the shared exon is placed limit the GTF's use by the Tophat/Cufflinks pipeline?
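
To make the concern in the second question concrete, here is a minimal Python sketch of what a flat chromosome-then-start sort does to a shared exon. The records, the `exon` helper, and the names `tx1`, `tx2`, and `g1` are made up for illustration, and `flat_sort` only approximates the `sort -k 1,1 -k 4,4n` command (it compares column 1 as a string and column 4 as an integer).

```python
# Hypothetical, made-up GTF records: isoforms tx1 and tx2 share the exon at 300-400.
def exon(chrom, start, end, transcript):
    """Build one tab-separated, 9-column GTF exon line (illustrative values only)."""
    attrs = f'gene_id "g1"; transcript_id "{transcript}";'
    return "\t".join([chrom, "example", "exon", str(start), str(end), ".", "+", ".", attrs])

# Original layout: each transcript's exons are kept together.
original = [
    exon("chr1", 100, 200, "tx1"),
    exon("chr1", 300, 400, "tx1"),
    exon("chr1", 500, 600, "tx1"),
    exon("chr1", 300, 400, "tx2"),
    exon("chr1", 700, 800, "tx2"),
]

def flat_sort(gtf_lines):
    """Order lines by chromosome (column 1), then by numeric start (column 4)."""
    def key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[3]))
    return sorted(gtf_lines, key=key)

def transcript_of(line):
    """Pull the transcript_id value out of the attribute column (column 9)."""
    return line.split("\t")[8].split('transcript_id "')[1].split('"')[0]

# Print start, end, and transcript for each line after the flat sort.
for line in flat_sort(original):
    fields = line.split("\t")
    print(fields[3], fields[4], transcript_of(line))
```

After the flat sort, the tx2 copy of the 300-400 exon sits between the exons of tx1, so the features of each transcript are no longer contiguous in the file.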

    My third question: which command or script could I use so that the shared exon is listed only among the features of the transcript isoforms it belongs to (as in the original GTFs)?
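
One possible approach to the third question, as a rough sketch rather than a tested solution: group the lines by `transcript_id`, order the groups by chromosome and smallest start coordinate, and sort features by start within each group. The function name `sort_gtf_by_transcript` and the demo records are hypothetical. The sketch assumes tab-separated lines with a `transcript_id "..."` attribute in column 9, and it lists minus-strand transcripts in ascending order (the UCSC-style layout described above), not in Gencode's descending order. Whether Tophat/Cufflinks needs this layout is exactly what I am unsure about.

```python
import re
from collections import defaultdict

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')

def sort_gtf_by_transcript(gtf_lines):
    """Keep each transcript's features together, ordered by chromosome and start."""
    groups = defaultdict(list)
    for index, raw in enumerate(gtf_lines):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        match = TRANSCRIPT_RE.search(fields[8])
        # Lines without a transcript_id (e.g. gene-level records) form their own group.
        tid = match.group(1) if match else f"__no_transcript_{index}"
        groups[(fields[0], tid)].append((int(fields[3]), int(fields[4]), line))
    # Order groups by chromosome, then smallest start in the group, then transcript id.
    ordered = sorted(groups.items(), key=lambda item: (item[0][0], min(item[1])[0], item[0][1]))
    result = []
    for _, features in ordered:
        # Within a transcript, list features by ascending start, then end.
        result.extend(line for _, _, line in sorted(features))
    return result

def demo_exon(chrom, start, end, transcript):
    """Build one made-up GTF exon line for the demonstration below."""
    attrs = f'gene_id "g1"; transcript_id "{transcript}";'
    return "\t".join([chrom, "example", "exon", str(start), str(end), ".", "+", ".", attrs])

# Made-up input in scrambled order; tx1 and tx2 share the exon at 300-400.
demo = [
    demo_exon("chr1", 700, 800, "tx2"),
    demo_exon("chr1", 300, 400, "tx1"),
    demo_exon("chr1", 100, 200, "tx1"),
    demo_exon("chr1", 300, 400, "tx2"),
    demo_exon("chr1", 500, 600, "tx1"),
]

for line in sort_gtf_by_transcript(demo):
    print(line)
```

With this grouping the shared exon appears once within tx1 and once within tx2, and neither copy is interleaved with the other transcript's features.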

    Would appreciate help, thank you!
