**GenoMax** · 01-12-2016, 03:59 AM

For reference, cross-posted at: https://www.biostars.org/p/172385/

**westerman** · 01-12-2016, 07:38 AM

I'm not sure I agree with Aminm from Biostars. He said that you need corresponding lines (exon, UTR, etc.) in the GTF file, but I do not believe this is true. Rather, I am going with the oh-so-basic idea of bad formatting, in particular a lack of tabs, given the complaint that htseq-count gave you. Can you do a:

Code:

head --lines=2  genes_hello.gtf | od -c

And send that to us? Yes, I know that the chances of a simple formatting mistake are low, but I have had such errors all too many times, so I always check formatting first.
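
If you would rather not read `od -c` output by eye, here is a minimal sketch (my own, not something htseq-count provides) that flags every non-comment line that does not split into exactly 9 tab-separated fields, which is the column count GTF expects. The function name `check_gtf_tabs` is hypothetical; it only assumes a POSIX `awk`.

```shell
# Hypothetical helper: report GTF lines that do not have exactly 9 tab-separated fields.
# Lines starting with '#' are treated as comments and skipped.
check_gtf_tabs() {
  awk -F'\t' '
    /^#/    { next }    # skip header/comment lines
    NF != 9 { printf "line %d: %d tab-separated field(s)\n", NR, NF; bad = 1 }
    END     { exit bad }    # exit status 1 if any line was reported, else 0
  ' "$1"
}
```

For example, `check_gtf_tabs genes_hello.gtf` should print nothing and exit with status 0 when every line is properly tab-delimited; a line whose columns are separated only by spaces will be reported as having a single field.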

**super0925** · 01-12-2016, 09:31 AM

I remember adding not only exon lines but also UTR lines, and it still failed, so I kept only the exon line and posted this thread.
I tried your command:

Code:

$ head --lines=2  henan.gtf | od -c
0000000   2   9                               p   r   o   t   e   i   n
0000020   _   c   o   d   i   n   g           e   x   o   n
0000040       2   8   8   2   0   8   2   5               2   8   8   3
0000060   4   9   4   4               .                               +
0000100                               .                               e
0000120   x   o   n   _   i   d       "   E   N   S   E   C   A   E   0
0000140   0   0   0   0   0   0   0   0   0   1   "   ;       e   x   o
0000160   n   _   n   u   m   b   e   r       "   1   "   ;       g   e
0000200   n   e   _   b   i   o   t   y   p   e       "   p   r   o   t
0000220   e   i   n   _   c   o   d   i   n   g   "   ;       g   e   n
0000240   e   _   i   d       "   E   N   S   E   C   A   G   0   0   0
0000260   0   0   0   9   9   9   9   9   "   ;       g   e   n   e   _
0000300   n   a   m   e       "   H   E   L   L   O   "   ;       g   e
0000320   n   e   _   s   o   u   r   c   e       "   e   n   s   e   m
0000340   b   l   "   ;       p   _   i   d       "   P   9   9   9   9
0000360   9   "   ;       t   r   a   n   s   c   r   i   p   t   _   i
0000400   d       "   E   N   S   E   C   A   T   0   0   0   0   0   0
0000420   9   9   9   9   9   "   ;       t   r   a   n   s   c   r   i
0000440   p   t   _   n   a   m   e       "   H   E   L   L   O   -   0
0000460   0   1   "   ;       t   r   a   n   s   c   r   i   p   t   _
0000500   s   o   u   r   c   e       "   e   n   s   e   m   b   l   "
0000520   ;       t   s   s   _   i   d       "   T   S   S   9   9   9
0000540   9   "   ;  \n   1  \t   p   r   o   t   e   i   n   _   c   o
0000560   d   i   n   g           U   T   R  \t   1   1   1   9   3  \t
0000600   1   1   2   0   9  \t   .  \t   +  \t   .  \t   g   e   n   e
0000620   _   b   i   o   t   y   p   e       "   p   r   o   t   e   i
0000640   n   _   c   o   d   i   n   g   "   ;       g   e   n   e   _
0000660   i   d       "   E   N   S   E   C   A   G   0   0   0   0   0
0000700   0   1   2   4   2   1   "   ;       g   e   n   e   _   n   a
0000720   m   e       "   S   Y   C   E   1   "   ;       g   e   n   e
0000740   _   s   o   u   r   c   e       "   e   n   s   e   m   b   l
0000760   "   ;       p   _   i   d       "   P   2   0   9   7   5   "
0001000   ;       t   r   a   n   s   c   r   i   p   t   _   i   d
0001020   "   E   N   S   E   C   A   T   0   0   0   0   0   0   1   3
0001040   0   0   4   "   ;       t   r   a   n   s   c   r   i   p   t
0001060   _   n   a   m   e       "   S   Y   C   E   1   -   2   0   1
0001100   "   ;       t   r   a   n   s   c   r   i   p   t   _   s   o
0001120   u   r   c   e       "   e   n   s   e   m   b   l   "   ;
0001140   t   s   s   _   i   d       "   T   S   S   1   0   1   3   "
0001160   ;  \n
0001162

Do you have any idea?

**westerman** · 01-12-2016, 10:09 AM

Exactly what I suspected. You did not separate the fields by tabs but rather by spaces. Compare the first line (no tabs) to the second line (tabs, shown as \t in the od output).
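
If the spaces are the only problem, one way to repair the file is sketched below. It assumes, hypothetically, that the first 8 columns contain no internal whitespace and that lines have no leading whitespace; it re-joins those 8 columns with tabs and leaves the 9th (attribute) column, which legitimately contains spaces, untouched. `spaces_to_tabs_gtf` is a made-up name, and you should inspect the output before feeding it to htseq-count.

```shell
# Hypothetical helper: turn space-separated GTF columns 1-8 into tab-separated ones.
# Assumes columns 1-8 contain no internal whitespace and lines have no leading whitespace.
spaces_to_tabs_gtf() {
  awk '
    BEGIN  { OFS = "\t" }
    /^#/   { print; next }    # pass comment lines through unchanged
    NF < 9 { print; next }    # too few fields to fix: leave the line as-is
    {
      attr = $0
      # strip the first 8 whitespace-delimited fields, keeping the attribute column intact
      for (i = 1; i <= 8; i++) sub(/^[^ \t]+[ \t]+/, "", attr)
      print $1, $2, $3, $4, $5, $6, $7, $8, attr
    }
  ' "$1"
}
```

Usage would be something like `spaces_to_tabs_gtf henan.gtf > henan.tabs.gtf`. Lines that are already tab-delimited should come out unchanged.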

**super0925** · 01-13-2016, 03:00 AM

Brilliant you are! I will try it!

PS: So do I need to add the UTR, CDS, transcript, etc. lines? I don't think so. I only want to look at differential expression of this pseudogene (I am sure there are reads mapped to it), not at CDS, isoforms, etc. What is your opinion?

**westerman** · 01-13-2016, 05:22 AM

I agree. Only add the pseudogene. Just double-check your tabs!
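
Since only gene-level counts are needed, a single `exon` line per pseudogene should be enough. As far as I know, htseq-count defaults to `-t exon` and `-i gene_id`, so it only looks at the feature type and the `gene_id` attribute; a `transcript_id` is added anyway because the GTF format expects one. The sketch below builds the line with `printf`, so the tabs cannot silently turn into spaces. The function name, the `transcript_id` naming scheme, and the example IDs are hypothetical placeholders.

```shell
# Hypothetical helper: print one tab-separated GTF "exon" line for a pseudogene.
# Arguments: seqname source start end strand gene_id
# Score and frame are written as "."; transcript_id is derived as "<gene_id>.1" (an arbitrary choice).
pseudogene_gtf_line() {
  printf '%s\t%s\texon\t%s\t%s\t.\t%s\t.\tgene_id "%s"; transcript_id "%s.1";\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$6"
}
```

For example, with a made-up ID and the coordinates from the od output earlier in the thread: `pseudogene_gtf_line 29 pseudogene 28820825 28834944 + MYPSEUDO1 >> genes_hello.gtf`.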

**super0925** · 01-13-2016, 06:04 AM

Done! Thx!

How to add a pseudogene to a GTF file?