SEQanswers

05-18-2012, 11:25 AM   #1
pbluescript
Senior Member
Location: Boston
Join Date: Nov 2009
Posts: 224
Going from transcriptome to genome coordinates with a bam file

I'm venturing into RNA editing in mouse, and one of the most common ways to avoid false positives is to map to a transcriptome or a custom set of junction sequences and then map to the genome. That's the easy part.

Once I have the mapped reads for the transcriptome, does anyone know of a good tool/method to convert those coordinates to genome coordinates while updating CIGAR strings to show split reads?

05-22-2012, 01:04 PM   #2
cedance
Senior Member
Location: Germany
Join Date: Feb 2011
Posts: 108

What you are asking for is essentially another mapping, isn't it? First you would need to construct a map from the transcriptome to your genome (probably easier if a genome annotation is available), and then use that map to carry your reads over from transcriptome coordinates to genome coordinates. That's the idea I could think of. It certainly seems like a nice problem to invest some time in. I am not aware of any existing tools that do this. I'll try to work on it a bit and post back if I get anywhere.

Just one question: did you construct your transcriptome yourself (from an annotation file)? Or do you have a GFF file at all?

05-23-2012, 07:19 AM   #3
pbluescript
Senior Member
Location: Boston
Join Date: Nov 2009
Posts: 224

It's essentially a solved problem, since this is what mappers like TopHat do. However, my Python skills are not sufficient to figure out how the code works well enough to apply it to a separate BAM file. Any help you could provide would be great.
I'm using mm9 with a UCSC GTF.

05-23-2012, 09:44 AM   #4
golharam
Member
Location: Philadelphia, PA
Join Date: Dec 2009
Posts: 55

Use TopHat. You will need a GTF file of exons, CDS, etc. TopHat will map to the known transcriptome first and then map to the rest of the genome.

05-23-2012, 10:27 AM   #5
pbluescript
Senior Member
Location: Boston
Join Date: Nov 2009
Posts: 224

TopHat's output isn't sufficient for what I want to do. One reason is that a common filtering step for RNA edits is based on MAPQ, and the MAPQ values TopHat reports don't correlate with alignment quality.

05-23-2012, 10:37 AM   #6
cedance
Senior Member
Location: Germany
Join Date: Feb 2011
Posts: 108

Excuse the bad terminology. By "mapping" I didn't mean the usual sense of aligning reads to a genome, but a mapping in the "function" sense: an association, if you will.

To rephrase: you'll need an association from every coordinate of your transcriptome to the corresponding coordinate of your genome. Imagine a read from your transcriptome mapping that starts at "Chr1", position 1500, with the CIGAR string "80M". Imagine also that, had you mapped it to your reference genome, its CIGAR string would be "30M60N50M". This means that the read is spliced: it skips 60 bases of the genome at that position. To get from the first alignment to the second, the only way I can think of right now is to know that 1500-1529 of your transcriptome corresponds to 1500-1529 of your reference genome, whereas 1530-1579 of your transcriptome corresponds to 1590-1639 (since 1530 + 60 = 1590). Hence the need for an association of transcriptome to genome.
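
To make the arithmetic concrete, here is a rough Python sketch of that conversion. It is only an illustration under assumptions that are not from this thread: the function name `transcript_to_genome` and the exon-list format are made up, exon coordinates are 1-based and inclusive, and the transcript is on the plus strand (a minus-strand transcript would additionally need the CIGAR reversed and the read reverse-complemented).

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def transcript_to_genome(exons, t_start, cigar):
    """Convert a plus-strand transcript alignment to genome coordinates.

    exons   : hypothetical input, sorted list of (start, end) genome
              coordinates of the transcript's exons, 1-based inclusive
    t_start : 1-based start of the alignment within the spliced transcript
    cigar   : CIGAR string of the alignment against the transcript
    Returns (genome_start, genome_cigar).
    """
    # Find the exon that contains the alignment start.
    offset = t_start - 1
    idx = 0
    while idx < len(exons) and offset >= exons[idx][1] - exons[idx][0] + 1:
        offset -= exons[idx][1] - exons[idx][0] + 1
        idx += 1
    if idx == len(exons):
        raise ValueError("alignment starts beyond the end of the transcript")

    g_start = exons[idx][0] + offset
    pos = g_start  # next genome base to be consumed
    out = []       # list of [length, op], adjacent equal ops are merged

    def emit(length, op):
        if out and out[-1][1] == op:
            out[-1][0] += length
        else:
            out.append([length, op])

    for n, op in CIGAR_RE.findall(cigar):
        remaining = int(n)
        if op in "ISHP":  # these do not consume reference bases
            emit(remaining, op)
            continue
        while remaining > 0:
            if pos > exons[idx][1]:
                # Ran off the end of an exon: skip the intron with an N.
                idx += 1
                if idx == len(exons):
                    raise ValueError("alignment runs past the transcript end")
                if exons[idx][0] > pos:
                    emit(exons[idx][0] - pos, "N")
                pos = exons[idx][0]
            step = min(remaining, exons[idx][1] - pos + 1)
            emit(step, op)
            pos += step
            remaining -= step

    return g_start, "".join(f"{length}{op}" for length, op in out)


# The example from this post: an exon ending at 1529, a 60-base intron,
# and the next exon starting at 1590 (the exon ends 1 and 2000 are made up).
print(transcript_to_genome([(1, 1529), (1590, 2000)], 1500, "80M"))
# (1500, '30M60N50M')
```

The intron is inserted lazily, only when the next reference-consuming base falls past the exon end, so a read that ends exactly on an exon boundary does not get a trailing N.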

Going by this logic, if the GTF/GFF files for your transcriptome and genome share gene IDs (or you know which RNA ID in your transcriptome corresponds to which gene, and its coordinates), then it should probably be possible to establish this association. If I've confused you again or misunderstood the question entirely, excuse the mess!
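
As a starting point for that association, something like the following could collect the exon coordinates of each transcript from a GTF. This is a minimal sketch, not a tested tool: the function name `exons_by_transcript` is made up, it assumes standard 9-column tab-separated GTF lines with a `transcript_id "..."` attribute, and the sample lines at the bottom are invented for illustration.

```python
import re

# GTF attributes look like: gene_id "g1"; transcript_id "t1";
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines):
    """Return {transcript_id: (chrom, strand, [(start, end), ...])}.

    Exon coordinates are kept as in the GTF (1-based, inclusive) and
    sorted by genomic start.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: seqname, source, feature, start, end, score,
        # strand, frame, attributes. Only exon records are used.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(
            match.group(1), (fields[0], fields[6], [])
        )
        entry[2].append((int(fields[3]), int(fields[4])))
    for _chrom, _strand, exons in transcripts.values():
        exons.sort()
    return transcripts


# Invented sample records, not taken from a real annotation.
sample = [
    'chr1\texample\texon\t1590\t2000\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\texample\texon\t1\t1529\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\texample\tCDS\t100\t1529\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n',
]
print(exons_by_transcript(sample))
```

With a real file you would pass an open file handle instead of the list. The sorted exon list per transcript, together with the strand, is the information needed to translate a position within a spliced transcript back to the genome.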