SEQanswers





09-07-2010, 12:28 PM   #1
rongronghai
Junior Member
 
Location: wa

Join Date: Sep 2010
Posts: 2
RNA-seq & gene annotation

I have RNA-seq data and need read counts at the gene level. To get them with BEDTools I need a gene annotation file. I downloaded the hg18 annotation (group: Genes and Gene Prediction Tracks; track: UCSC Genes) from UCSC in BED format. However, each line in this file seems to represent an isoform rather than a gene: the two lines below have the same coordinates but different IDs.

chr7 50625258 50817758 uc003tpk.1 0 - 50628142 50767513 0 19
chr7 50625258 50817758 uc010kzb.1 0 - 50628142 50709814 0 18

What I actually need is one line per gene. How can I get that? Thanks!
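Each line of this UCSC track describes one transcript, and column 4 holds a transcript ID, so the BED file alone does not say which transcripts belong to the same gene. A minimal sketch, assuming you have obtained a transcript-to-gene mapping from some other source (the `mapping` dict and the gene name `GENE_A` below are hypothetical), that collapses each gene's transcripts into a single span from the smallest start to the largest end. Note that such a span also covers introns, so reads falling in introns would be counted too.

```python
def collapse_to_genes(bed_lines, tx_to_gene):
    """Collapse transcript-level BED lines to one span per gene.

    tx_to_gene: dict mapping transcript ID (BED column 4) to a gene ID.
    Transcripts missing from the mapping are kept as their own "gene".
    Returns a sorted list of (chrom, start, end, gene, strand) tuples.
    """
    spans = {}
    for line in bed_lines:
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name, strand = fields[3], fields[5]
        gene = tx_to_gene.get(name, name)
        key = (gene, chrom, strand)
        if key in spans:
            # Widen the gene span to cover this transcript as well.
            s, e = spans[key]
            spans[key] = (min(s, start), max(e, end))
        else:
            spans[key] = (start, end)
    return sorted(
        (chrom, s, e, gene, strand)
        for (gene, chrom, strand), (s, e) in spans.items()
    )


# The two lines from the post; the gene assignment is hypothetical.
bed = [
    "chr7 50625258 50817758 uc003tpk.1 0 - 50628142 50767513 0 19",
    "chr7 50625258 50817758 uc010kzb.1 0 - 50628142 50709814 0 18",
]
mapping = {"uc003tpk.1": "GENE_A", "uc010kzb.1": "GENE_A"}
print(collapse_to_genes(bed, mapping))
# [('chr7', 50625258, 50817758, 'GENE_A', '-')]
```

Without a mapping (an empty dict), each transcript stays on its own line, which is exactly the situation described above.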
09-07-2010, 01:29 PM   #2
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Use the mergeBed tool in BEDTools to give yourself a non-redundant list.
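Conceptually, mergeBed sorts the intervals and combines any that overlap or are book-ended into one interval. The following is a minimal pure-Python sketch of that idea for illustration only, not BEDTools' own implementation; coordinates are 0-based and half-open as in BED, and strand is ignored, as in a default (unstranded) merge. The intervals are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Coordinates are 0-based, half-open (BED convention). Strand is ignored,
    mirroring an unstranded merge.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical intervals: two overlapping transcripts plus a separate one.
example = [
    ("chr7", 50625258, 50817758),
    ("chr7", 50625258, 50709814),
    ("chr7", 51000000, 51000500),
]
print(merge_intervals(example))
# [('chr7', 50625258, 50817758), ('chr7', 51000000, 51000500)]
```

In practice you would sort the BED file and run mergeBed itself; the sketch only shows why isoforms sharing coordinates collapse into one line.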
09-07-2010, 02:22 PM   #3
rongronghai
Junior Member
 
Location: wa

Join Date: Sep 2010
Posts: 2

Quote:
Originally Posted by RockChalkJayhawk
Use the mergeBed tool in BEDTools to give yourself a non-redundant list.
Thanks for your reply! I have looked at mergeBed, but I am a little worried about the merged features it produces. It could merge isoforms that originate from different genes, right? If so, the resulting features may have no biological meaning.
09-07-2010, 03:19 PM   #4
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by rongronghai
Thanks for your reply! I have looked at mergeBed, but I am a little worried about the merged features it produces. It could merge isoforms that originate from different genes, right? If so, the resulting features may have no biological meaning.
Not if you use the strand option (-s), which only merges features that are on the same strand.
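A minimal sketch of a strand-aware merge, for illustration only (not BEDTools code): strand becomes part of the grouping key, so two genes that overlap on opposite strands stay separate. The intervals below are hypothetical. Genes that overlap on the same strand are still merged, so strandedness alone does not guarantee one merged feature per gene.

```python
def merge_stranded(intervals):
    """Merge overlapping or book-ended (chrom, start, end, strand) intervals,
    but only when they lie on the same chromosome and the same strand.

    Coordinates are 0-based, half-open (BED convention).
    """
    merged = []
    # Sorting by (chrom, strand, start, end) puts mergeable intervals next to each other.
    for chrom, start, end, strand in sorted(
        intervals, key=lambda iv: (iv[0], iv[3], iv[1], iv[2])
    ):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[3] == strand and start <= last[2]:
            merged[-1] = (chrom, last[1], max(last[2], end), strand)
        else:
            merged.append((chrom, start, end, strand))
    return merged


# Hypothetical antisense pair overlapping on opposite strands,
# plus a second "+" transcript overlapping the first one.
example = [
    ("chr1", 1000, 5000, "+"),
    ("chr1", 4000, 9000, "-"),
    ("chr1", 4500, 6000, "+"),
]
print(merge_stranded(example))
# [('chr1', 1000, 6000, '+'), ('chr1', 4000, 9000, '-')]
```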
09-08-2010, 02:35 AM   #5
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

You could also use htseq-count from the HTSeq package (its developer, Simon Anders, is active on this forum) with a GTF file, e.g. from Ensembl. I built a makeshift GTF file from the RefSeq genes with some amateur Perl scripts that take into account that different genes can overlap (even on the same strand!). There may be a dedicated conversion tool out there, or a way to download the UCSC genes in GTF format.
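Counting against a GTF avoids merging altogether because each exon carries a gene_id. As I understand htseq-count's default "union" mode, a read is assigned to a gene if the union of gene IDs over all bases it covers contains exactly one gene; otherwise it is counted as "no feature" (no gene) or "ambiguous" (several genes). The sketch below is a simplified re-implementation of that rule, not HTSeq code: it assumes exons as hypothetical (chrom, start, end, gene_id) tuples in 0-based half-open coordinates, represents a read as a list of aligned blocks (so spliced reads are supported), and ignores strand and alignment quality, which the real tool does take into account. The labels "no_feature" and "ambiguous" are illustrative.

```python
from collections import Counter


def genes_at(exons, chrom, pos):
    """Return the set of gene IDs whose exons cover 0-based position pos."""
    return {g for c, s, e, g in exons if c == chrom and s <= pos < e}


def assign_read(exons, chrom, blocks):
    """Union-mode style assignment of one read.

    blocks: list of (start, end) aligned segments, 0-based half-open.
    Returns a gene ID, "no_feature" or "ambiguous".
    """
    genes = set()
    for start, end in blocks:
        for pos in range(start, end):
            genes |= genes_at(exons, chrom, pos)
    if not genes:
        return "no_feature"
    if len(genes) > 1:
        return "ambiguous"
    return next(iter(genes))


def count_reads(exons, reads):
    """Tally assignments for reads given as (chrom, blocks) pairs."""
    return Counter(assign_read(exons, chrom, blocks) for chrom, blocks in reads)


# Hypothetical annotation: geneA and geneB overlap on chr1:150-200.
exons = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 300, "geneB"),
]
reads = [
    ("chr1", [(100, 140)]),  # only geneA
    ("chr1", [(160, 190)]),  # inside the overlap -> ambiguous
    ("chr1", [(250, 290)]),  # only geneB
    ("chr1", [(500, 540)]),  # no annotation -> no_feature
]
print(count_reads(exons, reads))
```

The per-base scan is deliberately naive and slow; a real counter would index the annotation (HTSeq uses genomic arrays) rather than loop over every exon for every base.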
All times are GMT -8.