
**Simon Anders** · 02-06-2013, 08:58 AM

Could you grep for 'Medtr7g090810' in your GTF file and post the lines containing the term?

**syintel87** · 02-06-2013, 09:25 AM

Originally posted by Simon Anders

Could you grep for 'Medtr7g090810' in your GTF file and post the lines containing the term?

The attached file shows the lines containing 'Medtr7g090810'.
Thank you for sparing your precious time.

Attached Files

Mt_grep.jpg (80.1 KB, 45 views)

**Simon Anders** · 02-06-2013, 09:27 AM

It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.
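A quick way to check how widespread this is in an annotation file is to collect the strands seen for each gene ID. The following is only a minimal sketch in plain Python (no HTSeq needed); the function name `genes_on_both_strands` is made up for illustration, and it assumes a tab-separated GTF whose ninth column contains `gene_id "..."`.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "Medtr7g090810";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def genes_on_both_strands(gtf_lines):
    """Return a sorted list of gene IDs whose exon lines occur on more than one strand.

    Hypothetical helper, not part of HTSeq or DEXSeq.
    """
    strands = defaultdict(set)
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only well-formed "exon" lines are considered here.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            strands[match.group(1)].add(fields[6])
    return sorted(gene for gene, seen in strands.items() if len(seen) > 1)
```

Any iterable of lines works, so an open file handle can be passed directly, for example `genes_on_both_strands(open("annotation.gtf"))` with your own file name in place of the placeholder.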

**syintel87** · 02-06-2013, 09:52 AM

Originally posted by Simon Anders

It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.

1. So the annotation file has to be fixed?

2. When I made a small change to "dexseq_prepare_annotation.py" (using the transcript ID in place of the gene ID), it worked.

    exons = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
    for f in HTSeq.GFF_Reader( gtf_file ):
        if f.type != "exon":
            continue
        f.attr['transcript_id'] = f.attr['transcript_id'].replace( ":", "_" )
        exons[f.iv] += ( f.attr['transcript_id'], f.attr['transcript_id'] )

But the original GTF file contains both exon and CDS lines, so I am not sure whether this code is okay for my GTF file.

3. I have another GTF file. For this one I also made a small change (filtering on "CDS" in place of "exon"), and it worked.

    exons = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
    for f in HTSeq.GFF_Reader( gtf_file ):
        if f.type != "CDS":
            continue
        f.attr['transcript_id'] = f.attr['transcript_id'].replace( ":", "_" )
        CDS[f.iv] += ( f.attr['transcript_id'], f.attr['transcript_id'] )

But I am not sure whether this code is okay either.

4. Do you think the code I have modified is okay?
The attached file contains the original GTF, my dexseq_prepare_annotation.py, and the output GTF.

Thank you very much!

**syintel87** · 02-06-2013, 09:53 AM

Originally posted by Simon Anders

It seems HTSeq got confused because the same gene occurs on both the "+" and the "-" strand.

This is the attachment.

Attached Files

dexseq_edit.txt (5.9 KB, 45 views)

**Simon Anders** · 02-06-2013, 10:48 AM

No, you cannot change from gene ID to transcript ID, because there may be many genes with several overlapping transcripts, and they won't be handled correctly anymore.

You really should fix your GTF file: wherever the same gene ID is used for features on different strands, add something to the gene ID. If this is complicated, just add a "+" or "-" to all gene IDs.

BTW, dexseq_prepare only looks at "exon" lines and ignores "CDS" lines.

**syintel87** · 02-06-2013, 11:08 AM

Originally posted by Simon Anders

No, you cannot change from gene ID to transcript ID, because there may be many genes with several overlapping transcripts, and they won't be handled correctly anymore.

You really should fix your GTF file: wherever the same gene ID is used for features on different strands, add something to the gene ID. If this is complicated, just add a "+" or "-" to all gene IDs.

BTW, dexseq_prepare only looks at "exon" lines and ignores "CDS" lines.

1.
So you mean I have to replace every +/- with +?
(Or, alternatively, replace every +/- with -?)

2.
My Mhapla.gtf file has only exon lines. So do I need to fix my GTF file by replacing CDS with exon?

**Simon Anders** · 02-06-2013, 11:19 AM

1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.

Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.

2. No, why?
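The renaming described in point 1 can be scripted in a few lines. What follows is a rough sketch under stated assumptions, not a tested tool: `tag_gene_id_with_strand` is a hypothetical helper name, the input is assumed to be a tab-separated GTF with the strand in column 7 and a `gene_id "..."` attribute in column 9, and only the gene ID is changed (the transcript ID is left alone).

```python
import re

# Captures the gene_id attribute in three parts so the ID itself can be extended.
GENE_ID_RE = re.compile(r'(gene_id ")([^"]+)(")')


def tag_gene_id_with_strand(line):
    """Append the strand ("+" or "-") to the gene_id of one GTF line.

    Hypothetical helper; comment lines and lines without a +/- strand
    are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[6] not in ("+", "-"):
        return line
    strand = fields[6]
    # Rewrite only the first gene_id attribute on the line.
    fields[8] = GENE_ID_RE.sub(
        lambda m: m.group(1) + m.group(2) + strand + m.group(3),
        fields[8],
        count=1,
    )
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

To convert a whole file, apply the function to every line and write the results to a new file, keeping the original untouched. Every gene ID in downstream output will then carry the strand suffix.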

**syintel87** · 02-06-2013, 01:40 PM

Originally posted by Simon Anders

1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.

Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.

2. No, why?

1. Thank you. I'll try that. I obtained these gtf files from a member of my project group.

2. If Mhapla.gtf has only CDS lines, but dexseq_prepare only looks at "exon" lines and ignores "CDS" lines, then I guess the output might contain no lines at all.

**Simon Anders** · 02-06-2013, 01:49 PM

2. Sure, if it's this way round, you need to change the CDS lines to exon lines.
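For a file that really contains only CDS lines, the relabelling could look like the sketch below. `cds_to_exon` is a made-up helper name, and the sketch assumes a tab-separated GTF with the feature type in column 3. Two caveats: if a file already has exon lines, relabelling its CDS lines as well would add overlapping duplicates, and CDS features cover only the coding part of a transcript, so the resulting "exons" will not include UTRs.

```python
def cds_to_exon(line):
    """Relabel the feature type of a GTF line from "CDS" to "exon".

    Hypothetical helper; all other lines are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    # Column 3 (index 2) holds the feature type.
    if len(fields) >= 9 and fields[2] == "CDS":
        fields[2] = "exon"
    return "\t".join(fields)
```

As with any annotation fix, it is safer to write the converted lines to a new file and keep the original.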

**arkanion** · 11-06-2019, 11:38 PM

Originally posted by Simon Anders

1. No, change the gene ID from, say, "Medtr7g090810" to "Medtr7g090810+" and "Medtr7g090810-", depending on strand. This is assuming that you know a scripting language. I wouldn't want to do that manually.

Where did you get this strange GTF file from, anyway? Having the same gene name on both strands is a bug.

2. No, why?

I have the same problem. If you search the UCSC Genome Browser for a gene such as HIST2H3C, you will see two genes with that name, one on the + strand and one on the - strand. DEXSeq cannot deal with this, but it does happen in practice for anyone who uses a GTF file downloaded from a UCSC track.

[DEXSeq] prepare_annotation.py: exonic part starts too early!

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News