  • TSS on "-" strand (Ensembl notation)

    Hello,

    I have a silly question with respect to finding a TSS position from the Ensembl annotation provided by Biomart, please correct me if I am wrong:

    They have the column "Gene start", which I guess corresponds to the TSS if the gene is on the "+" strand. They have the column "Gene end", which I guess corresponds to the TSS if the gene is on the "-" strand.

    Is it correct?

  • #2
    Yes, it's one of them. Briefly playing with Biomart yielded the outermost TSS for me (gene start/end). I assume you can get all of the TSSs (annotated in Ensembl at least) through Biomart, but if not, you can just parse the gtf file from Ensembl (http://www.ensembl.org/info/data/ftp/index.html).
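A minimal sketch of the GTF route mentioned above, assuming the file contains `transcript` feature rows (recent Ensembl GTF releases do, but check your file) and that the TSS is taken as the transcript's 5' end. The function names and the demo lines are hypothetical and made up for illustration; they are not real Ensembl records.

```python
import re

def parse_attributes(attr_field):
    """Turn a GTF attribute string (key "value"; ...) into a dict."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"', attr_field))

def transcript_tss_from_gtf(lines):
    """Yield (transcript_id, chrom, strand, tss) for every 'transcript' row."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level features carry one TSS each
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # In GTF, start <= end on both strands, so the TSS is the start
        # on "+" and the end on "-".
        tss = start if strand == "+" else end
        attrs = parse_attributes(fields[8])
        yield attrs.get("transcript_id"), chrom, strand, tss

# Made-up demo lines (not real annotation).
demo_lines = [
    "#!demo header\n",
    '1\tdemo\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tdemo\ttranscript\t150\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    '2\tdemo\ttranscript\t1000\t2000\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n',
    '2\tdemo\texon\t1800\t2000\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n',
]

for record in transcript_tss_from_gtf(demo_lines):
    print(record)
```

For a real file, the same generator can be fed an open file handle instead of the demo list.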

    • #3
      Thank you!
      But what do you mean by "one of them"?

      Basically, what I want to check is whether the TSS of a gene on the "-" strand is marked as "Gene start" or "Gene end".

      And is it the same for "Transcript start" and "Transcript end"?

      • #4
        To put it even more explicitly:

        Is it true that what is marked as "Gene start" is not the TSS, but just the smaller of the two values for the TSS and the 3' end of the gene? Looking at the values in Biomart, I see that Gene start is always smaller than Gene end (which could not be the case for a gene on the "-" strand if Gene start were the TSS).
        Last edited by rebrendi; 12-20-2011, 02:52 PM.

        • #5
          The only change is that there are often multiple TSSs for a single gene.

          • #6
            Originally posted by dpryan
            The only change is that there are often multiple TSSs for a single gene.
            can they be found as
            "Transcript start" &
            "Transcript end"
            instead of "Gene start"
            and "Gene end"?

            • #7
              And I still do not understand what "Gene start" means in this notation. Is it not the real start of the gene? Is it not necessarily the TSS?

              • #8
                Originally posted by rebrendi
                can they be found as
                "Transcript start" &
                "Transcript end"
                instead of "Gene start"
                and "Gene end"?
                It appears so. I checked a single gene that I happen to remember and it looks correct. You'll still have to take strand into account (though that's trivial).

                • #9
                  yes, I just want to be sure that I understand that trivial thing correctly:

                  is it true that the TSS of the gene on the "-" strand
                  is marked as "Gene end"?

                  • #10
                    Originally posted by rebrendi
                    is it true that the TSS of the gene on the "-" strand
                    is marked as "Gene end"?
                    It is true that a TSS of a gene on the "-" strand is marked as "gene end". Look at, for example, Ing2, which has 3 transcript start sites.

                    • #11
                      thank you, I got the point.

                      • #12
                        Something worth noting: Ensembl coordinate conventions are always 5' to 3' regardless of strand, so on reverse-strand genes the 3'-most coordinate is the first coordinate.

                        Gene start and end represent the outer limits of that gene locus, i.e. the 5'-most coordinate of one of its transcripts and the 3'-most coordinate of one of its transcripts.

                        The TSS, as you have already been told, is best defined by the start coordinate of a particular transcript, which is the 5'-most coordinate for a forward-strand gene and the 3'-most coordinate for a reverse-strand gene.
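To make the relationship between gene limits and per-transcript TSSs concrete, here is a small sketch. It assumes Biomart-style records in which start is always the smaller coordinate, end the larger, and strand is 1 or -1. The `Transcript` type, the function names, and the coordinates are hypothetical and invented for illustration.

```python
from typing import NamedTuple

class Transcript(NamedTuple):
    name: str
    start: int   # smaller genomic coordinate
    end: int     # larger genomic coordinate
    strand: int  # 1 for forward, -1 for reverse

def transcript_tss(t):
    """TSS of one transcript: start on the forward strand, end on the reverse."""
    return t.start if t.strand == 1 else t.end

def gene_summary(transcripts):
    """Outer limits of a gene plus the distinct TSSs of its transcripts."""
    strand = transcripts[0].strand
    gene_start = min(t.start for t in transcripts)
    gene_end = max(t.end for t in transcripts)
    return {
        "gene_start": gene_start,
        "gene_end": gene_end,
        "tss_sites": sorted({transcript_tss(t) for t in transcripts}),
        # The outermost TSS coincides with one of the gene limits.
        "outermost_tss": gene_start if strand == 1 else gene_end,
    }

# A made-up reverse-strand gene with three transcripts.
demo_gene = [
    Transcript("tx_a", 5000, 9000, -1),
    Transcript("tx_b", 5200, 8700, -1),
    Transcript("tx_c", 5000, 8700, -1),
]

summary = gene_summary(demo_gene)
print(summary)
```

In this toy gene, two of the three transcripts share a TSS, so the gene has two distinct TSSs, and only the outermost one equals "Gene end".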

                        • #13
                          It appears that the Ensembl annotation specifies "gene start" as the ORF start, but not as the TSS. For several manually checked genes the TSS does not coincide with the ORF start or end.

                          Could someone please comment on how to find the TSS coordinates best?

                          • #14
                            Gene start in Ensembl is the 5'-most coordinate of the transcripts that make up the gene. If none of the transcripts have UTRs, this will be the 5'-most ORF start, though at least in human and mouse it is very rare for transcripts to have no UTR.

                            In Ensembl, to get specific coordinates for a single transcript you should be looking at the transcript table; again, the position in the table is the 5'-most position in the transcript.
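The difference between a TSS and an ORF start can be sketched as follows, assuming you have both the transcript limits and the coding-region limits for one transcript, each given as smaller and larger genomic coordinate. The function name and the coordinates are hypothetical. The offset returned here is a genomic distance, so it also counts any intron that lies inside the 5' UTR.

```python
def tss_and_cds_start(tx_start, tx_end, cds_start, cds_end, strand):
    """Return (tss, coding_start, genomic_offset) for a single transcript.

    strand is 1 (forward) or -1 (reverse). On the reverse strand the
    transcript and the ORF both begin at their larger coordinate.
    """
    if strand == 1:
        tss, coding_start = tx_start, cds_start
    else:
        tss, coding_start = tx_end, cds_end
    return tss, coding_start, abs(coding_start - tss)

# Made-up coordinates: same limits, opposite strands.
forward = tss_and_cds_start(1000, 5000, 1200, 4800, 1)
reverse = tss_and_cds_start(1000, 5000, 1200, 4800, -1)
# A transcript without a 5' UTR: TSS and ORF start coincide.
no_utr = tss_and_cds_start(1000, 5000, 1000, 4800, 1)

print(forward, reverse, no_utr)
```

Only in the last case, where the transcript has no 5' UTR, does the ORF start fall on the TSS.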

                            • #15
                              Laura, the output file that Biomart gives me includes the following potentially interesting columns: "gene start", "transcript start", "gene end", "transcript end". It appears that none of these columns contains the TSS coordinate that I am looking for. Could you please suggest exactly where to look?
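Following the rule given earlier in the thread (transcript start on the forward strand, transcript end on the reverse strand), a TSS column can be derived from such an export, provided a strand column is exported as well. The sketch below assumes a tab-separated file with strand reported as 1 or -1. The column headers and the demo rows are hypothetical placeholders, so adjust them to whatever your export actually uses.

```python
import csv
import io

def add_tss_column(handle, start_col="transcript_start",
                   end_col="transcript_end", strand_col="strand"):
    """Read a tab-separated export and add a strand-aware 'tss' field per row."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        strand = int(row[strand_col])
        # Start is always the smaller coordinate, so pick by strand.
        row["tss"] = int(row[start_col]) if strand == 1 else int(row[end_col])
        rows.append(row)
    return rows

# Made-up export with hypothetical headers and coordinates.
demo_export = "\n".join([
    "\t".join(["transcript_id", "strand", "transcript_start", "transcript_end"]),
    "\t".join(["T1", "1", "100", "500"]),
    "\t".join(["T3", "-1", "1000", "2000"]),
]) + "\n"

rows = add_tss_column(io.StringIO(demo_export))
for row in rows:
    print(row["transcript_id"], row["strand"], row["tss"])
```

With a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.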
