**GenoMax** · 01-12-2016, 03:59 AM

For reference, this question is cross-posted at: https://www.biostars.org/p/172385/

**westerman** · 01-12-2016, 07:38 AM

I'm not sure I agree with Aminm from Biostars. He said that you need corresponding lines (exon, UTR, etc.) in the GTF file, but I do not believe this is true. Rather, I am going with the oh-so-basic idea of bad formatting, in particular a lack of tabs, given the complaint that htseq-count gave you. Can you run:

Code:

head --lines=2  genes_hello.gtf | od -c

And send the output to us? Yes, I know that the chances of a simple formatting mistake are low, but I have hit such errors all too many times, so I always check formatting first.

**super0925** · 01-12-2016, 09:31 AM

As I recall, I added not only the exon line but also UTR lines, and it still failed. So I kept only the exon line and posted this thread.
I tried your command:

Code:

$ head --lines=2  henan.gtf | od -c
0000000   2   9                               p   r   o   t   e   i   n
0000020   _   c   o   d   i   n   g           e   x   o   n
0000040       2   8   8   2   0   8   2   5               2   8   8   3
0000060   4   9   4   4               .                               +
0000100                               .                               e
0000120   x   o   n   _   i   d       "   E   N   S   E   C   A   E   0
0000140   0   0   0   0   0   0   0   0   0   1   "   ;       e   x   o
0000160   n   _   n   u   m   b   e   r       "   1   "   ;       g   e
0000200   n   e   _   b   i   o   t   y   p   e       "   p   r   o   t
0000220   e   i   n   _   c   o   d   i   n   g   "   ;       g   e   n
0000240   e   _   i   d       "   E   N   S   E   C   A   G   0   0   0
0000260   0   0   0   9   9   9   9   9   "   ;       g   e   n   e   _
0000300   n   a   m   e       "   H   E   L   L   O   "   ;       g   e
0000320   n   e   _   s   o   u   r   c   e       "   e   n   s   e   m
0000340   b   l   "   ;       p   _   i   d       "   P   9   9   9   9
0000360   9   "   ;       t   r   a   n   s   c   r   i   p   t   _   i
0000400   d       "   E   N   S   E   C   A   T   0   0   0   0   0   0
0000420   9   9   9   9   9   "   ;       t   r   a   n   s   c   r   i
0000440   p   t   _   n   a   m   e       "   H   E   L   L   O   -   0
0000460   0   1   "   ;       t   r   a   n   s   c   r   i   p   t   _
0000500   s   o   u   r   c   e       "   e   n   s   e   m   b   l   "
0000520   ;       t   s   s   _   i   d       "   T   S   S   9   9   9
0000540   9   "   ;  \n   1  \t   p   r   o   t   e   i   n   _   c   o
0000560   d   i   n   g           U   T   R  \t   1   1   1   9   3  \t
0000600   1   1   2   0   9  \t   .  \t   +  \t   .  \t   g   e   n   e
0000620   _   b   i   o   t   y   p   e       "   p   r   o   t   e   i
0000640   n   _   c   o   d   i   n   g   "   ;       g   e   n   e   _
0000660   i   d       "   E   N   S   E   C   A   G   0   0   0   0   0
0000700   0   1   2   4   2   1   "   ;       g   e   n   e   _   n   a
0000720   m   e       "   S   Y   C   E   1   "   ;       g   e   n   e
0000740   _   s   o   u   r   c   e       "   e   n   s   e   m   b   l
0000760   "   ;       p   _   i   d       "   P   2   0   9   7   5   "
0001000   ;       t   r   a   n   s   c   r   i   p   t   _   i   d
0001020   "   E   N   S   E   C   A   T   0   0   0   0   0   0   1   3
0001040   0   0   4   "   ;       t   r   a   n   s   c   r   i   p   t
0001060   _   n   a   m   e       "   S   Y   C   E   1   -   2   0   1
0001100   "   ;       t   r   a   n   s   c   r   i   p   t   _   s   o
0001120   u   r   c   e       "   e   n   s   e   m   b   l   "   ;
0001140   t   s   s   _   i   d       "   T   S   S   1   0   1   3   "
0001160   ;  \n
0001162

Do you have any idea?

**westerman** · 01-12-2016, 10:09 AM

Exactly what I suspected. You did not separate the fields by tabs but rather by spaces. Compare the first line (no tabs) to the second line (tabs).
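
The tab check described above can also be automated. What follows is only a minimal sketch, assuming a POSIX shell with awk; the function names `check_gtf_tabs` and `fix_gtf_tabs` are hypothetical helpers, not part of htseq-count or any other tool. The first lists lines that do not split into exactly nine tab-separated fields. The second rewrites lines that contain no tab at all by joining the first eight space-separated columns with tabs and leaving the attribute column (which legitimately contains spaces) untouched. It assumes none of the first eight columns contains a space, and it deliberately leaves lines that mix tabs and spaces alone, so those still need a manual look.

```shell
# Hypothetical helper: list lines that do not split into exactly
# nine tab-separated fields (comment and empty lines are skipped).
check_gtf_tabs() {
  awk -F'\t' '!/^#/ && NF > 0 && NF != 9 {
    printf "line %d: %d tab-separated fields\n", NR, NF
  }' "$1"
}

# Hypothetical helper: for lines containing no tab at all, join the first
# eight space-separated columns with tabs and keep the rest (attributes) as is.
# Lines with fewer than nine columns are printed unchanged.
fix_gtf_tabs() {
  awk '
    /^#/ || /\t/ { print; next }
    {
      rest = $0; out = ""
      for (i = 1; i <= 8; i++) {
        sub(/^ +/, "", rest)
        if (!match(rest, / +/)) { print; next }
        out = out substr(rest, 1, RSTART - 1) "\t"
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }' "$1"
}
```

Running `check_gtf_tabs henan.gtf` should print nothing once every line splits into nine fields. The output of `fix_gtf_tabs` is best written to a new file and checked again before it replaces the original.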

**super0925** · 01-13-2016, 03:00 AM

You are brilliant! I will try it!

PS: So do I need to add the UTR, CDS, transcript, etc. lines? I don't think so. I only want to look at differential expression of this pseudo gene (I am sure there are reads mapped to it), not at CDS, isoforms, etc. What is your opinion?

**westerman** · 01-13-2016, 05:22 AM

I agree. Only add the pseudo gene. Just double check your tabs!
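
One way to avoid typing the tabs by hand is to let `printf` insert them. This is only a sketch: `gtf_line` is a hypothetical helper name, and the values in the usage note are copied from the `od -c` dump earlier in the thread, with the attribute list shortened for readability.

```shell
# Hypothetical helper: print one GTF line with its nine columns joined by real tabs.
# Arguments: seqname source feature start end score strand frame attributes
gtf_line() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$@"
}
```

For example, `gtf_line 29 protein_coding exon 28820825 28834944 . + . 'gene_id "ENSECAG00000099999"; transcript_id "ENSECAT00000099999";' >> henan.gtf` would append the pseudo gene's exon line. The helper expects exactly nine arguments, so the attribute column has to be quoted as a single string.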

**super0925** · 01-13-2016, 06:04 AM

Done! Thx!

How to add a pseudo gene into a GTF file?