  • Converting reads mapped from transcriptome back to genome

    Hi, I'm trying to reproduce the pipeline of a paper that mapped RNA-Seq reads to the transcriptome and then converted the mapped transcript coordinates to genomic coordinates.

    Does anyone have an easy way of performing this? I emailed the author but never got a response.

  • #2
    What do you mean by mapped coordinates? I feel that they are the same thing...


    • #3
      I have a FASTA file of transcripts. If a read maps to a transcript, I need to convert its coordinates on the transcript to coordinates on the genome. This shouldn't be too hard as long as I have the name and genomic location of the transcript and where the read maps on the transcript.

      I can determine the genomic coordinates based on the annotation of the transcript. I was hoping someone already had a program to do this.
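
The conversion described in post #3 can be sketched in Python as follows. This is a minimal illustration, not the method used in the paper: the function name `transcript_to_genome` is hypothetical, and it assumes 1-based inclusive coordinates, non-overlapping exons given as genomic (start, end) pairs, and a transcript sequence made of exactly those exons joined in transcript order.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed: 1-based inclusive genomic (start, end)


def transcript_to_genome(pos: int, exons: List[Exon], strand: str) -> int:
    """Map a 1-based transcript position to a 1-based genomic position."""
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    # Walk the exons in transcript order: ascending genomic order on the
    # plus strand, descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


exons = [(100, 199), (300, 399)]
print(transcript_to_genome(101, exons, "+"))  # 300
print(transcript_to_genome(101, exons, "-"))  # 199
```

On the minus strand the first transcript base corresponds to the highest genomic coordinate, which is why the exons are walked in reverse.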


      • #4
        I'm working on the same issue; I guess you are talking about the Berger et al. paper? Placing each read on the genome seems relatively easy with a relational database for reads that map within an exon, so creating a modified SAM file with genome coordinates is relatively easy. But if this is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how do you split them, and how do you avoid double-counting them if you use the result for an expression estimate?
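
One possible way to handle the split reads raised in post #4 is to keep each read as a single record and describe the intron with an `N` operation in the SAM CIGAR string, so the read is still counted once. The sketch below is an assumption-laden illustration, not the approach of the paper: the function names are hypothetical, the read is assumed to align to the transcript without gaps, coordinates are 1-based inclusive, and neighbouring exons are assumed to be separated by an intron of at least one base.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # assumed: 1-based inclusive genomic (start, end)


def read_to_genomic_blocks(t_start: int, read_len: int,
                           exons: List[Block], strand: str) -> List[Block]:
    """Project an ungapped transcript alignment onto the genome.

    Returns the aligned genomic blocks sorted by genomic coordinate.
    """
    ordered = sorted(exons, reverse=(strand == "-"))  # transcript order
    t_end = t_start + read_len - 1
    blocks = []
    offset = 0  # transcript bases covered by the exons already visited
    for start, end in ordered:
        length = end - start + 1
        ex_t_start, ex_t_end = offset + 1, offset + length
        lo, hi = max(t_start, ex_t_start), min(t_end, ex_t_end)
        if lo <= hi:  # the read overlaps this exon
            if strand == "+":
                blocks.append((start + lo - ex_t_start, start + hi - ex_t_start))
            else:
                blocks.append((end - (hi - ex_t_start), end - (lo - ex_t_start)))
        offset += length
    if t_start < 1 or t_end > offset:
        raise ValueError("read extends outside the transcript")
    return sorted(blocks)


def blocks_to_cigar(blocks: List[Block]) -> str:
    """Build a CIGAR string with M for aligned blocks and N for introns."""
    parts = []
    prev_end = None
    for start, end in blocks:
        if prev_end is not None:
            parts.append(f"{start - prev_end - 1}N")
        parts.append(f"{end - start + 1}M")
        prev_end = end
    return "".join(parts)


exons = [(100, 199), (300, 399)]
blocks = read_to_genomic_blocks(91, 20, exons, "+")
print(blocks)                   # [(190, 199), (300, 309)]
print(blocks_to_cigar(blocks))  # 10M100N10M
```

The leftmost block start would become the SAM `POS` field. For a minus-strand transcript the read sequence and qualities would also need to be reverse-complemented and reversed, which this sketch does not do.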


        • #5
          If you are into BioPerl there is a module, Bio::Coordinate::GeneMapper, which is designed to do transformations between coordinate systems like this.

          Caveats:
          - The documentation for this module is sparse.
          - The module appears to contain a couple of bugs.
          - You really have to grok the BioPerl object model.


          • #6
            If you're using Ensembl transcripts, I think Ensembl somewhere stores the set of exons that go into making up each transcript, with corresponding genomic coordinates for exons, so you can probably just write a program to match the numbers there for every transcript.

            Otherwise, you can always do your own alignment with a cDNA alignment program such as sim4 or splign.
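
For the annotation-based route suggested in post #6, the per-transcript exon table can be built from a GTF file, which stores one tab-separated record per exon with the transcript ID in the ninth column. The following is a rough sketch only: the function name `load_exons` and the example IDs are made up for illustration, and it assumes standard 9-column GTF lines with a `transcript_id "..."` attribute.

```python
import re
from typing import Dict, Iterable

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def load_exons(gtf_lines: Iterable[str]) -> Dict[str, dict]:
    """Collect chromosome, strand and exon coordinates per transcript."""
    models: Dict[str, dict] = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records carry the blocks we need
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        model = models.setdefault(
            match.group(1),
            {"chrom": fields[0], "strand": fields[6], "exons": []},
        )
        # GTF start/end are 1-based and inclusive.
        model["exons"].append((int(fields[3]), int(fields[4])))
    for model in models.values():
        model["exons"].sort()
    return models


# Hypothetical two-exon transcript, for illustration only.
example = [
    'chr1\texample\texon\t300\t399\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\texample\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(load_exons(example)["T1"]["exons"])  # [(100, 199), (300, 399)]
```

With a real annotation, the lines would come from an open file handle instead of the in-memory list.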


            • #7
              Originally posted by Jon_Keats
              I'm working on the same issue; I guess you are talking about the Berger et al. paper? Placing each read on the genome seems relatively easy with a relational database for reads that map within an exon, so creating a modified SAM file with genome coordinates is relatively easy. But if this is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how do you split them, and how do you avoid double-counting them if you use the result for an expression estimate?
              What is the title of this paper? Mapping the reads to the "transcriptome" is a very interesting methodology, and I am wondering why they needed to convert back to the genome.


              • #8
                @thinkRNA - The paper is "Integrative analysis of the melanoma transcriptome". I've emailed Mike Berger 3 times with no response. I'm a bit annoyed.

                I'll probably just write my own perl script to do the conversion.


                • #9
                  I'm not sure I would trust a transcriptome file, since any inaccuracies in the transcriptome annotation will propagate into the mapping. Currently available bioinformatics tools cannot produce a perfect transcriptome annotation, and the bias introduced by imperfect annotations may skew your experimental results.

                  If you have any capability to do the junction mapping and alternative splicing analysis yourself (i.e., mapping to the genome, not the transcriptome), I would go that route. If that's not an option, be sure your analysis includes a discussion of how the results are skewed by the inaccuracies of the transcriptome annotation.


                  • #10
                    Hi golharam! Have you had any success in solving your question, i.e. mapping transcript alignments back to genome coordinates?


                    • #11
                      I never managed to reproduce the results in the paper. But I do see translocations in other NGS datasets. I used BWA to map the reads to the ENTIRE genome.

                      After some discussion here, I'm not convinced mapping to just the known transcriptome is the best approach as novel transcripts may be missed.

                      As far as mapping transcript coordinates to genomic coordinates, I wrote a Perl script that uses BioPerl to do this.


                      • #12
                        Want to share your script? : ) I'm about to write the same thing. Maybe.


                        • #13
                          I think it is a good approach. There are fewer pseudogenes in the transcriptome, so the alignments are more accurate. Not to mention that splice boundaries are iffy at best with short reads.
