Seqanswers Leaderboard Ad

**11xinqi** · 10-02-2013, 06:24 AM

Did you get the answer about why this happen. Because I am in the same situation. Would you please explain why?

Originally posted by Lizex View Post

Hi All
I don't know whether somebody else has experience a similar problem.
I'm experiencing the following problem using the pre-compiled binary packages of Tophat-2.0.9 and Cufflinks-2.1.1 where Cuffdiff was reporting many genes and transcripts as having expression levels of zero or "nan" or "inf".
See output:
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_001554 XLOC_001554 - MDC001340.131:11788-12879 E0 E2 OK 10.7983 0 -inf nan 0.00025 0.0493497 yes
XLOC_001905 XLOC_001905 - MDC001635.618:38277-40655 E0 E2 OK 0 124.446 inf nan 5e-05 0.0183924 yes
XLOC_002511 XLOC_002511 - MDC002108.350:7225-9199 E0 E2 OK 0 32.3 inf nan 5e-05 0.0183924 yes
XLOC_003041 XLOC_003041 - MDC002536.231:5054-9739 E0 E2 OK 0 24.7014 inf nan 5e-05 0.0183924 yes
XLOC_003472 XLOC_003472 - MDC002955.278:7454-8285 E0 E2 OK 11.6936 0 -inf nan 0.00025 0.0493497 yes
XLOC_003551 XLOC_003551 - MDC003040.156:5-453 E0 E2 OK 117.279 0 -inf nan 0.0001 0.0300675 yes
XLOC_004619 XLOC_004619 - MDC004034.219:3633-4187 E0 E2 OK 64.2737 0 -inf nan 5e-05 0.0183924 yes
XLOC_005948 XLOC_005948 - MDC005198.374:10357-11692 E0 E2 OK 0 20.8865 inf nan 0.00025 0.0493497 yes
XLOC_006633 XLOC_006633 - MDC005799.555:2423-4474 E0 E2 OK 12.1785 0 -inf nan 0.00025 0.0493497 yes
XLOC_006750 XLOC_006750 - MDC005908.507:2483-3754 E0 E2 OK 19.815 0 -inf nan 0.0002 0.0446163 yes

Just a few of the DE_genes. See -inf and nan where I'm supposed to get value for log2(fold change). Any suggestions?

Thanks in advance.
Lizex

**chadn737** · 10-02-2013, 07:09 AM

This isn't a problem. What do you think you are going to get when you try to do division with zero?

**Lizex** · 10-02-2013, 08:25 AM

Originally posted by 11xinqi View Post

Did you get the answer about why this happen. Because I am in the same situation. Would you please explain why?

Hi 11xingi
Its not a problem. It happened because one of the conditions had zero and dividing by zero gives you "inf" for infinity.

**11xinqi** · 10-03-2013, 01:10 PM

Originally posted by Lizex View Post

Hi 11xingi
Its not a problem. It happened because one of the conditions had zero and dividing by zero gives you "inf" for infinity.

Hi, Lizex

Thanks for your reply. In my case, most of deferentially expressed genes doesn't have gene name which should be shown on third column following the test_id and gene_id. With the gene name, I can find the gene structure, like how many exons this gene have, in genes.gtf. Without the gene name, I can also find it in merged.gtf. But many of them that dont have gene name only have one exon. I have 3000 genes that express deferentially, but only 73 of them have gene name. Do you think that is normal.

Thant you very much

**Lizex** · 10-04-2013, 12:12 AM

Originally posted by 11xinqi View Post

Hi, Lizex

Thanks for your reply. In my case, most of deferentially expressed genes doesn't have gene name which should be shown on third column following the test_id and gene_id. With the gene name, I can find the gene structure, like how many exons this gene have, in genes.gtf. Without the gene name, I can also find it in merged.gtf. But many of them that dont have gene name only have one exon. I have 3000 genes that express deferentially, but only 73 of them have gene name. Do you think that is normal.

Thant you very much

Hi 11xinqi
I also got no name for the gene but this is what I do to get my transcripts annotated:
Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

Use a tool from bedtools to extract
bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
Run perl script: extractFromFasta.pl (see attachment)
usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

Use DuplicateFinder (java_app) to remove duplicates.

The results I get, I do a BLAST using Blast2Go to functionally annotate it.

In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
I'm looking at other means to get a higher number of differentially expressed genes.

Attached Files

extractFromFasta.pl (1.8 KB, 62 views)

**11xinqi** · 10-04-2013, 06:10 AM

Thank you very much, Lizex. You really help me a lot. Actually, I have only 75 genes out of 3000 with gene expression value. I am not sure, but it's a little weird I think. Do you think that's because the genome annotation is not good enough?

Thanks again. I will try the method you told me.

Originally posted by Lizex View Post

Hi 11xinqi
I also got no name for the gene but this is what I do to get my transcripts annotated:
Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

Use a tool from bedtools to extract
bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
Run perl script: extractFromFasta.pl (see attachment)
usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

Use DuplicateFinder (java_app) to remove duplicates.

The results I get, I do a BLAST using Blast2Go to functionally annotate it.

In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
I'm looking at other means to get a higher number of differentially expressed genes.

**Lizex** · 10-04-2013, 07:40 AM

Originally posted by 11xinqi View Post

Thank you very much, Lizex. You really help me a lot. Actually, I have only 75 genes out of 3000 with gene expression value. I am not sure, but it's a little weird I think. Do you think that's because the genome annotation is not good enough?

Thanks again. I will try the method you told me.

Yes I guess that could be the problem. In my case the annotation file is not good enough. I'm working on something. Send me an e-mail.

**11xinqi** · 10-24-2013, 07:14 AM

HI, Lizex

I extracted the DEG sequence based on the method you told me. But I met some problems. Would you like to explain for me?

1. The annatation files, ethier transcripts.gtf or merged.gtf, annotate genes exon by exon. For example, gene1 exon1 chr1: 1-500;
gene1 exon2 chr1: 2000-5000.
But in gene_exp.diff file, each gene only appear once with all exons, for eaxmple, gene1 chr1: 1-5000.

When I use this command-perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas-, all genes in gene_exp.diff with only one exon are extracted. But for genes with multiple exons like gene1 above, I can't get them. In my case, I have 3000 defferentially expressed genes. 2800 genes of them are single-exon genes of which I can the sequence based on the method you told me. The other 200 genes are multiple-exons gene.

2. The second question is that why so many differentially expressed genes only have one exon.

Do you have any idea about that.

Thanks

Originally posted by Lizex View Post

Hi 11xinqi
I also got no name for the gene but this is what I do to get my transcripts annotated:
Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

Use a tool from bedtools to extract
bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
Run perl script: extractFromFasta.pl (see attachment)
usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

Use DuplicateFinder (java_app) to remove duplicates.

The results I get, I do a BLAST using Blast2Go to functionally annotate it.

In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
I'm looking at other means to get a higher number of differentially expressed genes.

**11xinqi** · 10-24-2013, 07:20 AM

Hi chadn737

Thanks for you reply. I met some problems, would you please help me?

I got the differentially expressed genes from gene_exp.diff. Would you tell me how can I extract the sequences of these genes?

I got 3000 differentially expressed genes. 2800 genes of them are single-exon genes. The other 200 genes are multiple-exons gene. Do you know why so many differentially expressed genes only have one exon?

Originally posted by chadn737 View Post

This isn't a problem. What do you think you are going to get when you try to do division with zero?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 19 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 50 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Genes and transcripts have expression levels of zero or "nan or "inf"

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News