**john_mu** · 06-02-2010, 04:33 PM

What do you mean by mapped coordinates? I feel that they are the same thing...

**golharam** · 06-02-2010, 07:18 PM

I have a FASTA file of transcripts. If a read maps to a transcript, I need to convert its coordinates on the transcript to coordinates on the genome. This shouldn't be too hard as long as I have the name and genomic location of the transcript and the position where the read maps on the transcript.

I can determine the genomic coordinates based on the annotation of the transcript. I was hoping someone already had a program to do this.
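
A minimal sketch of that conversion, assuming the annotation gives each transcript's exons as 1-based inclusive genomic intervals plus a strand. The function name `transcript_to_genome` and the example exon coordinates are hypothetical, not taken from any existing tool:

```python
def transcript_to_genome(tpos, exons, strand):
    """Map a 1-based transcript position to a 1-based genomic position.

    exons  : list of (start, end) genomic intervals, 1-based inclusive
    strand : '+' or '-'
    """
    if tpos < 1:
        raise ValueError("transcript positions are 1-based")
    # Walk the exons in transcript order: ascending genomic order on the
    # plus strand, descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = tpos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcript: 100 bp exon, 100 bp intron, 100 bp exon.
exons = [(100, 199), (300, 399)]
print(transcript_to_genome(101, exons, "+"))  # first base of the second exon
print(transcript_to_genome(1, exons, "-"))    # transcript start on the minus strand
```

The chromosome name is not handled here; it would simply be carried along from the transcript's annotation record.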

**Jon_Keats** · 06-02-2010, 10:29 PM

I'm working on the same issue; I guess you are talking about the Berger et al. paper? Assigning each read to the genome seems relatively easy with a relational database for the reads that map within an exon, so creating a modified SAM file with genome coordinates is straightforward. But if this is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how to split them, and then how not to double-count these split reads if you use them for an expression estimate?
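
One possible way to handle the split reads, sketched under assumptions: keep each read as a single record and describe the intron as a skipped region, which is what the `N` CIGAR operation in the SAM format is for. Because the read stays one record, it is counted once. The function names and coordinates below are hypothetical, exons are assumed to be 1-based inclusive, and only ungapped matches are handled. For minus-strand transcripts the read sequence and qualities would also need to be reverse-complemented, which is not shown.

```python
def interval_to_genomic_blocks(tstart, tend, exons, strand):
    """Split a transcript interval (1-based inclusive) into genomic blocks.

    Returns a list of (start, end) genomic intervals in ascending order,
    one block per exon that the read touches.
    """
    ordered = sorted(exons, reverse=(strand == "-"))
    blocks = []
    offset = 0  # transcript bases consumed by the exons already visited
    for start, end in ordered:
        length = end - start + 1
        # Part of the read that falls in this exon, in transcript coordinates.
        lo = max(tstart, offset + 1)
        hi = min(tend, offset + length)
        if lo <= hi:
            if strand == "+":
                blocks.append((start + lo - offset - 1, start + hi - offset - 1))
            else:
                blocks.append((end - (hi - offset) + 1, end - (lo - offset) + 1))
        offset += length
    return sorted(blocks)


def blocks_to_cigar(blocks):
    """Build a CIGAR string with M for aligned blocks and N for introns."""
    parts = []
    for i, (start, end) in enumerate(blocks):
        if i > 0:
            gap = start - blocks[i - 1][1] - 1
            parts.append(f"{gap}N")
        parts.append(f"{end - start + 1}M")
    return "".join(parts)


def count_reads(alignments):
    """Count each read ID once per transcript, however many blocks it has.

    alignments: iterable of (read_id, transcript_id) pairs.
    """
    seen = set()
    counts = {}
    for read_id, transcript_id in alignments:
        if (read_id, transcript_id) not in seen:
            seen.add((read_id, transcript_id))
            counts[transcript_id] = counts.get(transcript_id, 0) + 1
    return counts


# Hypothetical 20 bp read spanning the junction of a two-exon transcript.
exons = [(100, 199), (300, 399)]
blocks = interval_to_genomic_blocks(91, 110, exons, "+")
print(blocks, blocks_to_cigar(blocks))
```

If the two halves of a split read are instead written out as separate records, de-duplicating on read ID as in `count_reads` avoids counting the read twice for the same transcript.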

**kmcarr** · 06-03-2010, 05:06 AM

If you are into BioPerl there is a module, Bio::Coordinate::GeneMapper, which is designed to do transformations between coordinate systems like this.

Caveats:
- The documentation for this module is sparse.
- The module appears to contain a couple of bugs.
- You really have to grok the BioPerl object model.

**xinchen** · 06-03-2010, 07:59 AM

If you're using Ensembl transcripts, I think Ensembl somewhere stores the set of exons that go into making up each transcript, with corresponding genomic coordinates for exons, so you can probably just write a program to match the numbers there for every transcript.

Otherwise, you can always do your own alignment of the transcripts to the genome with a cDNA alignment program like sim4 or Splign.
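
For the annotation route, a rough sketch of collecting each transcript's exons from a GTF file. It assumes the standard nine tab-separated GTF columns with a `transcript_id "..."` attribute on every `exon` line and 1-based inclusive coordinates. The function name `load_exons` and the sample records are made up for illustration:

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def load_exons(gtf_lines):
    """Return {transcript_id: (chrom, strand, [(start, end), ...])}.

    gtf_lines: any iterable of GTF text lines (an open file works).
    Exon intervals are sorted by genomic start.
    """
    exons = defaultdict(list)
    location = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])  # chromosome, strand
    return {tid: (*location[tid], sorted(ivs)) for tid, ivs in exons.items()}


# Made-up records for a hypothetical transcript "TX1".
sample = [
    '#!example header\n',
    'chr1\tdemo\texon\t300\t399\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";\n',
    'chr1\tdemo\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";\n',
    'chr1\tdemo\tCDS\t120\t199\t.\t+\t0\tgene_id "G1"; transcript_id "TX1";\n',
]
print(load_exons(sample))
```

The resulting exon lists are the per-transcript lookup table needed to translate a transcript position into a genomic one.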

**thinkRNA** · 06-03-2010, 09:30 AM

Originally posted by Jon_Keats

I'm working on the same issue; I guess you are talking about the Berger et al. paper? Assigning each read to the genome seems relatively easy with a relational database for the reads that map within an exon, so creating a modified SAM file with genome coordinates is straightforward. But if this is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how to split them, and then how not to double-count these split reads if you use them for an expression estimate?

What is the title of this paper? Mapping the reads to the "transcriptome" is a very interesting methodology, and I am wondering why they need to convert back to the genome.

**golharam** · 06-03-2010, 10:14 AM

@thinkRNA: The paper is "Integrative analysis of the melanoma transcriptome". I've emailed Mike Berger 3 times with no response. I'm a bit annoyed.

I'll probably just write my own Perl script to do the conversion.

**mrawlins** · 06-03-2010, 01:32 PM

I'm not sure I would trust a transcriptome file, since the inaccuracies in the transcriptome annotation will propagate. The bioinformatics currently available cannot give a perfect transcriptome annotation, and the bias introduced by imperfect annotations may skew your experimental results.

If you have any capability to do the junction mapping and alternative splicing analysis yourself (i.e., mapping to the genome, not the transcriptome), I would go that route. If that's not an option, be sure your analysis includes a discussion of how the results are skewed by the inaccuracies of the transcriptome annotation.

**genomicist** · 05-20-2011, 07:55 AM

Hi golharam! Have you had any success in solving your question, i.e. mapping transcript alignments back to genome coordinates?

**golharam** · 05-20-2011, 08:55 AM

I never managed to reproduce the results in the paper. But I do see translocations in other NGS datasets. I used BWA to map the reads to the ENTIRE genome.

After some discussion here, I'm not convinced mapping to just the known transcriptome is the best approach as novel transcripts may be missed.

As far as mapping transcript coordinates to genomic coordinates, I wrote a Perl script that uses BioPerl to do this.

**mgogol** · 09-19-2011, 11:40 AM

Want to share your script? : ) I'm about to write the same thing. Maybe.

**rskr** · 09-19-2011, 02:49 PM

I think it is a good approach. There are fewer pseudogenes in the transcriptome, so the alignments are more accurate. Not to mention that splice boundaries are iffy at best with short reads.

Converting reads mapped from transcriptome back to genome