  • Converting DNA position to transcript position

    Hi Friends,

    I have a simple problem that many of you must have considered before me. I have a DNA position showing variation (~SNP) within an exon of a gene/transcript. Is there already a script out there to convert a DNA position to a "transcript position" given a GTF file? Would be really happy to use that script in that case.

    Thanks!
    Boel

  • #2
    I think you mean that you have a chromosomal position such as chr1:222222 and a DNA change A>T, and you want to know the coding sequence change relative to the start of the coding sequence (the ATG). If this isn't what you want, please give an example. If it is, the problem is that there may be more than one version of the coding sequence (isoforms), and you have to decide which isoform you want; that is probably why no tool does this automatically. I have done it myself based on Ensembl's exon definitions. I found errors in the UCSC browser, which is another place you can go. I wanted highly accurate, manually annotated exons, and Ensembl worked best for me. There are a lot of other issues that I won't go into. It's not as straightforward as it seems; most people have specific genes of interest, in which case you have to prepare it yourself.



    • #3
      Hi husamia, and thanks for your reply.
      No, I am not interested in the coding consequence, just in the position in the transcript, i.e. in the mRNA sequence.

      For example, if the DNA position is chr1:30000 and this falls within gene X's first exon, I want to know the corresponding position in the mRNA (position 1 if gene X starts at chr1:30000). If a gene has several isoforms, this will be reflected in my GTF file. It is a fairly simple mathematical exercise, just very nitty-gritty to do, hence I wanted to hear if someone had a simple script. Thanks though.
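
For readers who land here with the same question, the arithmetic described above could be sketched roughly as follows. This is only a minimal illustration, not anyone's script from this thread: the function name `genomic_to_transcript_pos` and the toy exon coordinates are made up, and it assumes exons are given as 1-based inclusive (start, end) pairs as in a GTF file, for a single isoform, with no overlapping exons.

```python
def genomic_to_transcript_pos(pos, exons, strand):
    """Map a 1-based genomic position to a 1-based transcript (mRNA) position.

    exons:  list of (start, end) tuples, 1-based inclusive (GTF convention).
    strand: "+" or "-".
    Returns None when pos does not fall inside any exon.
    """
    # Walk the exons in transcription order:
    # ascending coordinates on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript bases contributed by exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            if strand == "+":
                return offset + (pos - start + 1)
            return offset + (end - pos + 1)
        offset += end - start + 1
    return None


# Toy isoform with two 100 bp exons (hypothetical coordinates).
toy_exons = [(30000, 30099), (30500, 30599)]
print(genomic_to_transcript_pos(30000, toy_exons, "+"))  # 1
print(genomic_to_transcript_pos(30500, toy_exons, "+"))  # 101
print(genomic_to_transcript_pos(30599, toy_exons, "-"))  # 1
print(genomic_to_transcript_pos(30200, toy_exons, "+"))  # None (intronic)
```

With several isoforms in the GTF, the same function would simply be called once per transcript with that transcript's exon list.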



      • #4
        I had to do this exact exercise myself (though going further, to the amino acid as husamia described). I wrote my own script, but it is not simple. It makes use of the BioPerl module Bio::Coordinate::GeneMapper, which is meant for these types of transformations between coordinate spaces. But to use it, everything must be a Bio::SeqFeature object. Since I was working in Arabidopsis, I already had a Bio::DB::SeqFeature database of TAIR9 set up (the back end for GBrowse). If you are conversant with some serious BioPerl, I could offer some guidance.



        • #5
          Hi kmcarr,

          I'm looking into biopython, and there is some functionality there. Might cross over to BioPerl if I feel the need later on. Thanks a lot.



          • #6
            drop me an email @ joachim dot deschrijver at ugent dot be

            I have such a script ready in Perl that you could use



            • #7
              Ensembl's Variant Effect Predictor (VEP) may be of use here. If you enter a genomic position and allele(s), it will tell you the position in the cDNA and the protein (if there is one) and the amino acid change. Have a look at the example:



              It's available online, or through the API:



              Email us at [email protected] for more help.



              • #8
                Originally posted by Giulietta
                Ensembl's Variant Effect Predictor (VEP) may be of use here. If you enter a genomic position and allele(s), it will tell you the position in the cDNA and the protein (if there is one) and the amino acid change. Have a look at the example:



                It's available online, or through the API:



                Email us at [email protected] for more help.
                The link [http://uswest.ensembl.org/info/website/upload/var.html] gives a 404 error, but I think the correct link is [http://uswest.ensembl.org/Homo_sapie...oadVariations]



                • #9
                  Originally posted by husamia
                  Sorry about the broken link; we will endeavor to fix it.

                  The link at www.ensembl.org is working:



                  Try changing uswest to www (and go back to the UK site if it redirects you again!). The UploadVariations link you quote is not quite the one I was trying to point you to.

                  Cheers.



                  • #10
                    Originally posted by Boel
                    Hi kmcarr,

                    I'm looking into biopython, and there is some functionality there. Might cross over to BioPerl if I feel the need later on. Thanks a lot.
                    Hi Boel, could you share the Biopython functionality you used for converting genomic coordinates to transcript coordinates? I have a GFF file in which I would like to convert the genomic coordinates of the UTR and CDS features to transcript coordinates, but I am having a hard time finding a script or function that can do this. Thanks!
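
No Biopython call was ever named in this thread, so the following is a plain-Python sketch of one way the interval conversion asked about here might be done, not the functionality Boel used. It assumes GTF-style attributes (`transcript_id "..."` in column 9); a GFF3 file with `Parent=` attributes would need a different attribute pattern. The helper names (`read_gtf`, `to_transcript`, `convert_features`) and the toy rows are invented for illustration.

```python
import re
from collections import defaultdict

# Assumption: GTF-style attributes such as  transcript_id "t1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_gtf(lines):
    """Collect exons, non-exon features, and strand per transcript_id."""
    exons = defaultdict(list)
    features = defaultdict(list)
    strands = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        tid = match.group(1)
        kind, start, end = cols[2], int(cols[3]), int(cols[4])
        strands[tid] = cols[6]
        if kind == "exon":
            exons[tid].append((start, end))
        else:
            features[tid].append((kind, start, end))
    return exons, features, strands


def to_transcript(pos, exons, strand):
    """1-based genomic position -> 1-based transcript position, or None."""
    offset = 0  # transcript bases in exons already passed
    for start, end in sorted(exons, reverse=(strand == "-")):
        if start <= pos <= end:
            inside = pos - start if strand == "+" else end - pos
            return offset + inside + 1
        offset += end - start + 1
    return None


def convert_features(lines):
    """Return {transcript_id: [(feature, tx_start, tx_end), ...]}."""
    exons, features, strands = read_gtf(lines)
    result = {}
    for tid, feats in features.items():
        converted = []
        for kind, start, end in feats:
            a = to_transcript(start, exons[tid], strands[tid])
            b = to_transcript(end, exons[tid], strands[tid])
            # Features whose ends fall outside the exons are skipped.
            if a is not None and b is not None:
                converted.append((kind, min(a, b), max(a, b)))
        result[tid] = converted
    return result


# Toy rows (hypothetical transcript "t1" with two exons and two CDS parts).
toy_gtf = [
    'chr1\ttoy\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t300\t399\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\tCDS\t150\t199\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\tCDS\t300\t349\t.\t+\t1\tgene_id "g1"; transcript_id "t1";',
]
print(convert_features(toy_gtf))
```

Because only the two end points of each feature are mapped, the same approach should work for any interval whose ends lie in exons, not just UTR and CDS rows.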



                    • #11
                      Ensembl VEP is your best bet for custom annotation (fast, robust, reliable, and easily automated).




                      • #12
                        Originally posted by m_two
                        As far as I understand from the documentation, the Ensembl VEP requires variant information as input. The sites I would like to convert are not SNP positions but miRNA target sites, so I could not use VEP for that conversion.



                        • #13
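                          You basically need to subtract the position of the transcription start site (TSS) from the position of the variant. That is enough for a plus-strand gene when the site lies in the first exon; for sites in later exons you also have to subtract the lengths of the introns in between, and for minus-strand genes you count back from the other end. This info is in several places. The source I use is the UCSC Table Browser.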



                          The values for clade, genome, and assembly should be obvious.

                          Group: Genes and Gene Predictions
                          Track: RefSeq Genes
                          Table: refGene
                          Output format: all fields from selected table
                          Output file: refGene_human (or whatever your organism is)
                          File type returned: gzip (speeds up download a lot)

                          Unzip the file and either load it into an SQL table set up with the refGene schema (click the "describe table schema" button for info) or programmatically search the unzipped text file for your gene to pull its TSS.
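
The plain-text route might look something like the sketch below. It assumes the standard 16-column refGene layout (a leading `bin` column, `name2` holding the gene symbol, and comma-separated `exonStarts`/`exonEnds`) with 0-based, half-open coordinates, which is how UCSC stores them; check the table schema page for your download. The function names and both toy rows (`NM_TOY1`, `GENEX`, and so on) are made up and are not real annotation.

```python
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_refgene_line(line):
    """Turn one tab-separated refGene row into a dict with parsed exons."""
    rec = dict(zip(REFGENE_COLUMNS, line.rstrip("\n").split("\t")))
    rec["txStart"] = int(rec["txStart"])
    rec["txEnd"] = int(rec["txEnd"])
    # exonStarts/exonEnds carry a trailing comma, hence the "if x" filter.
    starts = [int(x) for x in rec["exonStarts"].split(",") if x]
    ends = [int(x) for x in rec["exonEnds"].split(",") if x]
    # Convert 0-based half-open intervals to 1-based inclusive ones.
    rec["exons"] = [(s + 1, e) for s, e in zip(starts, ends)]
    return rec


def find_transcripts(lines, gene_symbol):
    """Return all parsed rows whose name2 column equals gene_symbol."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and the "#bin ..." header row
        rec = parse_refgene_line(line)
        if rec.get("name2") == gene_symbol:
            hits.append(rec)
    return hits


def tss(rec):
    """1-based TSS: txStart + 1 on the plus strand, txEnd on the minus."""
    return rec["txStart"] + 1 if rec["strand"] == "+" else rec["txEnd"]


# Toy rows for illustration only (not real annotation).
toy_refgene = [
    "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd",
    "0\tNM_TOY1\tchr1\t+\t29999\t30599\t30010\t30550\t2\t"
    "29999,30499,\t30099,30599,\t0\tGENEX\tcmpl\tcmpl\t0,0,",
    "0\tNM_TOY2\tchr1\t-\t39999\t40100\t40010\t40090\t1\t"
    "39999,\t40100,\t0\tGENEY\tcmpl\tcmpl\t0,",
]

for rec in find_transcripts(toy_refgene, "GENEX"):
    print(rec["name"], rec["strand"], tss(rec), rec["exons"])
```

For the real download, any iterable of lines works as input, for example a file opened with `gzip.open(path, "rt")` if you would rather not unzip it first.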

                          If you don't know databases, searching the plain text will be faster in the short run. But if this is part of a major pipeline that you will be running a lot, it would be worthwhile to become comfortable with a relational database system and with embedding calls to that database inside your language of choice. That may sound like a major hurdle, but all the info you need is on the web. Message me if you need help finding resources to get started.
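
As a rough idea of what embedding database calls in a script can look like, here is a small sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in chosen because it needs no server; the post does not say which database system was meant. The table keeps just a subset of the refGene columns, and the rows, the index name, and the `tss_for_gene` helper are hypothetical.

```python
import sqlite3

# In-memory database for the sketch; a file path would make it persistent.
conn = sqlite3.connect(":memory:")

# Simplified table: only a subset of the refGene columns is kept here.
conn.execute(
    """
    CREATE TABLE refGene (
        name       TEXT,
        chrom      TEXT,
        strand     TEXT,
        txStart    INTEGER,
        txEnd      INTEGER,
        exonStarts TEXT,
        exonEnds   TEXT,
        name2      TEXT
    )
    """
)
# An index on the gene symbol keeps repeated lookups fast.
conn.execute("CREATE INDEX idx_refgene_name2 ON refGene (name2)")

# Toy rows for illustration only (not real annotation).
toy_rows = [
    ("NM_TOY1", "chr1", "+", 29999, 30599, "29999,30499,", "30099,30599,", "GENEX"),
    ("NM_TOY2", "chr1", "+", 29899, 30599, "29899,30499,", "30099,30599,", "GENEX"),
    ("NM_TOY3", "chr1", "-", 39999, 40100, "39999,", "40100,", "GENEY"),
]
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?, ?, ?)", toy_rows)
conn.commit()


def tss_for_gene(connection, gene_symbol):
    """Return {transcript name: 1-based TSS} for every isoform of a gene.

    UCSC coordinates are 0-based half-open, so the TSS is txStart + 1 on
    the plus strand and txEnd on the minus strand.
    """
    cursor = connection.execute(
        "SELECT name, strand, txStart, txEnd FROM refGene WHERE name2 = ?",
        (gene_symbol,),
    )
    return {
        name: (tx_start + 1 if strand == "+" else tx_end)
        for name, strand, tx_start, tx_end in cursor
    }


print(tss_for_gene(conn, "GENEX"))
print(tss_for_gene(conn, "GENEY"))
```

Note that a gene symbol can return several rows, one per isoform, so the TSS is reported per transcript rather than per gene.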
