Hi all,
I am new to Python and have just installed the HTSeq package.
My question is:
After reading in a GTF file with HTSeq.GFF_Reader, is there an efficient way
to extract certain lines from it?
I have three gene lists (high, medium and low expression),
each with the gene name in the first column and the RNA-seq count (generated by htseq-count) in the second.
One way to do it is:
import HTSeq

gtf_file = HTSeq.GFF_Reader(ifile)
genelist = open(highexpression)
for line in genelist:
    line_list = line.split()
    for feature in gtf_file:
        if feature.name == line_list[0]:  # basically check the gene_name
            pass  # do something
But this is really slow because of the nested loop: the whole GTF file is scanned again for every gene in the list.
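One common way around the nested loop is to read the gene names into a set first and then go through the features only once, testing each name against the set. The following is a minimal sketch of that idea, not a tested HTSeq recipe. `Feature` is a hypothetical stand-in so the example runs without a GTF file, and `read_gene_names` and `select_features` are illustrative helper names. With HTSeq, the features would presumably come from iterating over the GFF_Reader object, assuming `feature.name` holds the gene identifier as in the loop above.

```python
from collections import namedtuple

# Hypothetical stand-in for an HTSeq feature; only .name is used here.
Feature = namedtuple("Feature", ["name", "type"])


def read_gene_names(lines):
    """Collect the first column of every non-empty line into a set."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def select_features(features, wanted):
    """Yield the features whose name is in `wanted`, scanning features once."""
    for feature in features:
        if feature.name in wanted:  # set lookup instead of an inner loop
            yield feature


# Made-up example data: lines of "gene_name count" and a few features.
high = read_gene_names(["geneA 950", "geneC 720", ""])
features = [
    Feature("geneA", "exon"),
    Feature("geneB", "exon"),
    Feature("geneC", "exon"),
]
selected = list(select_features(features, high))
```

Because the set lookup takes roughly constant time, the cost grows with the number of features plus the number of genes, rather than with their product.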
I want to make a TSS plot like the one shown here: http://www-huber.embl.de/users/ander...c/tss.html#tss
but with the transcription start sites grouped by expression level (high, medium and low), so that the three curves can be plotted together.
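The grouping step could be sketched as follows: build one dictionary that maps each gene name to its expression level, then sort the features into three buckets in a single pass. This only illustrates the bookkeeping, not the plotting. `Feature`, `level_by_gene` and `group_by_level` are hypothetical names, the data is made up, and it is assumed that each gene appears in only one of the three lists.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an HTSeq feature; only .name is used for grouping.
Feature = namedtuple("Feature", ["name", "start"])


def level_by_gene(gene_lists):
    """Map gene name -> level, given {level: lines of 'gene_name count'}."""
    lookup = {}
    for level, lines in gene_lists.items():
        for line in lines:
            fields = line.split()
            if fields:
                lookup[fields[0]] = level
    return lookup


def group_by_level(features, lookup):
    """Put each feature into the bucket of its gene's expression level."""
    groups = defaultdict(list)
    for feature in features:
        level = lookup.get(feature.name)
        if level is not None:  # genes missing from all lists are skipped
            groups[level].append(feature)
    return groups


# Made-up example data.
lookup = level_by_gene({
    "high": ["geneA 950"],
    "medium": ["geneB 120"],
    "low": ["geneC 3"],
})
features = [
    Feature("geneA", 100),
    Feature("geneB", 200),
    Feature("geneC", 300),
    Feature("geneD", 400),
]
groups = group_by_level(features, lookup)
```

Each bucket can then be used to build its own coverage profile, giving one curve per expression level.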
-------------------------------------------------------------
One comment on the TSS plot: when Dr. Simon Anders (the package creator, thanks!) drew
the TSS plots, he used the start of exon 1 as the TSS, which is not correct.
The TSS is the transcription start site, whereas
the start of exon 1 is the translation start site.
To make a TSS plot, we need a GTF file that contains the 5' UTR information,
which can be obtained here: http://genomewiki.ucsc.edu/index.php..._or_gff_format