Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Genes and transcripts have expression levels of zero or "nan or "inf"

    Hi All
    I don't know whether somebody else has experience a similar problem.
    I'm experiencing the following problem using the pre-compiled binary packages of Tophat-2.0.9 and Cufflinks-2.1.1 where Cuffdiff was reporting many genes and transcripts as having expression levels of zero or "nan" or "inf".
    See output:
    test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
    XLOC_001554 XLOC_001554 - MDC001340.131:11788-12879 E0 E2 OK 10.7983 0 -inf nan 0.00025 0.0493497 yes
    XLOC_001905 XLOC_001905 - MDC001635.618:38277-40655 E0 E2 OK 0 124.446 inf nan 5e-05 0.0183924 yes
    XLOC_002511 XLOC_002511 - MDC002108.350:7225-9199 E0 E2 OK 0 32.3 inf nan 5e-05 0.0183924 yes
    XLOC_003041 XLOC_003041 - MDC002536.231:5054-9739 E0 E2 OK 0 24.7014 inf nan 5e-05 0.0183924 yes
    XLOC_003472 XLOC_003472 - MDC002955.278:7454-8285 E0 E2 OK 11.6936 0 -inf nan 0.00025 0.0493497 yes
    XLOC_003551 XLOC_003551 - MDC003040.156:5-453 E0 E2 OK 117.279 0 -inf nan 0.0001 0.0300675 yes
    XLOC_004619 XLOC_004619 - MDC004034.219:3633-4187 E0 E2 OK 64.2737 0 -inf nan 5e-05 0.0183924 yes
    XLOC_005948 XLOC_005948 - MDC005198.374:10357-11692 E0 E2 OK 0 20.8865 inf nan 0.00025 0.0493497 yes
    XLOC_006633 XLOC_006633 - MDC005799.555:2423-4474 E0 E2 OK 12.1785 0 -inf nan 0.00025 0.0493497 yes
    XLOC_006750 XLOC_006750 - MDC005908.507:2483-3754 E0 E2 OK 19.815 0 -inf nan 0.0002 0.0446163 yes

    Just a few of the DE_genes. See -inf and nan where I'm supposed to get value for log2(fold change). Any suggestions?

    Thanks in advance.
    Lizex

  • #2
    Did you get the answer about why this happen. Because I am in the same situation. Would you please explain why?
    Originally posted by Lizex View Post
    Hi All
    I don't know whether somebody else has experience a similar problem.
    I'm experiencing the following problem using the pre-compiled binary packages of Tophat-2.0.9 and Cufflinks-2.1.1 where Cuffdiff was reporting many genes and transcripts as having expression levels of zero or "nan" or "inf".
    See output:
    test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
    XLOC_001554 XLOC_001554 - MDC001340.131:11788-12879 E0 E2 OK 10.7983 0 -inf nan 0.00025 0.0493497 yes
    XLOC_001905 XLOC_001905 - MDC001635.618:38277-40655 E0 E2 OK 0 124.446 inf nan 5e-05 0.0183924 yes
    XLOC_002511 XLOC_002511 - MDC002108.350:7225-9199 E0 E2 OK 0 32.3 inf nan 5e-05 0.0183924 yes
    XLOC_003041 XLOC_003041 - MDC002536.231:5054-9739 E0 E2 OK 0 24.7014 inf nan 5e-05 0.0183924 yes
    XLOC_003472 XLOC_003472 - MDC002955.278:7454-8285 E0 E2 OK 11.6936 0 -inf nan 0.00025 0.0493497 yes
    XLOC_003551 XLOC_003551 - MDC003040.156:5-453 E0 E2 OK 117.279 0 -inf nan 0.0001 0.0300675 yes
    XLOC_004619 XLOC_004619 - MDC004034.219:3633-4187 E0 E2 OK 64.2737 0 -inf nan 5e-05 0.0183924 yes
    XLOC_005948 XLOC_005948 - MDC005198.374:10357-11692 E0 E2 OK 0 20.8865 inf nan 0.00025 0.0493497 yes
    XLOC_006633 XLOC_006633 - MDC005799.555:2423-4474 E0 E2 OK 12.1785 0 -inf nan 0.00025 0.0493497 yes
    XLOC_006750 XLOC_006750 - MDC005908.507:2483-3754 E0 E2 OK 19.815 0 -inf nan 0.0002 0.0446163 yes

    Just a few of the DE_genes. See -inf and nan where I'm supposed to get value for log2(fold change). Any suggestions?

    Thanks in advance.
    Lizex

    Comment


    • #3
      This isn't a problem. What do you think you are going to get when you try to do division with zero?

      Comment


      • #4
        Originally posted by 11xinqi View Post
        Did you get the answer about why this happen. Because I am in the same situation. Would you please explain why?
        Hi 11xingi
        Its not a problem. It happened because one of the conditions had zero and dividing by zero gives you "inf" for infinity.

        Comment


        • #5
          Originally posted by Lizex View Post
          Hi 11xingi
          Its not a problem. It happened because one of the conditions had zero and dividing by zero gives you "inf" for infinity.
          Hi, Lizex

          Thanks for your reply. In my case, most of deferentially expressed genes doesn't have gene name which should be shown on third column following the test_id and gene_id. With the gene name, I can find the gene structure, like how many exons this gene have, in genes.gtf. Without the gene name, I can also find it in merged.gtf. But many of them that dont have gene name only have one exon. I have 3000 genes that express deferentially, but only 73 of them have gene name. Do you think that is normal.


          Thant you very much

          Comment


          • #6
            Originally posted by 11xinqi View Post
            Hi, Lizex

            Thanks for your reply. In my case, most of deferentially expressed genes doesn't have gene name which should be shown on third column following the test_id and gene_id. With the gene name, I can find the gene structure, like how many exons this gene have, in genes.gtf. Without the gene name, I can also find it in merged.gtf. But many of them that dont have gene name only have one exon. I have 3000 genes that express deferentially, but only 73 of them have gene name. Do you think that is normal.


            Thant you very much
            Hi 11xinqi
            I also got no name for the gene but this is what I do to get my transcripts annotated:
            Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

            Use a tool from bedtools to extract
            bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

            Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
            Run perl script: extractFromFasta.pl (see attachment)
            usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

            Use DuplicateFinder (java_app) to remove duplicates.

            The results I get, I do a BLAST using Blast2Go to functionally annotate it.

            In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
            I'm looking at other means to get a higher number of differentially expressed genes.
            Attached Files

            Comment


            • #7
              Thank you very much, Lizex. You really help me a lot. Actually, I have only 75 genes out of 3000 with gene expression value. I am not sure, but it's a little weird I think. Do you think that's because the genome annotation is not good enough?

              Thanks again. I will try the method you told me.
              Originally posted by Lizex View Post
              Hi 11xinqi
              I also got no name for the gene but this is what I do to get my transcripts annotated:
              Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

              Use a tool from bedtools to extract
              bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

              Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
              Run perl script: extractFromFasta.pl (see attachment)
              usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

              Use DuplicateFinder (java_app) to remove duplicates.

              The results I get, I do a BLAST using Blast2Go to functionally annotate it.

              In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
              I'm looking at other means to get a higher number of differentially expressed genes.

              Comment


              • #8
                Originally posted by 11xinqi View Post
                Thank you very much, Lizex. You really help me a lot. Actually, I have only 75 genes out of 3000 with gene expression value. I am not sure, but it's a little weird I think. Do you think that's because the genome annotation is not good enough?

                Thanks again. I will try the method you told me.
                Yes I guess that could be the problem. In my case the annotation file is not good enough. I'm working on something. Send me an e-mail.

                Comment


                • #9
                  HI, Lizex

                  I extracted the DEG sequence based on the method you told me. But I met some problems. Would you like to explain for me?

                  1. The annatation files, ethier transcripts.gtf or merged.gtf, annotate genes exon by exon. For example, gene1 exon1 chr1: 1-500;
                  gene1 exon2 chr1: 2000-5000.
                  But in gene_exp.diff file, each gene only appear once with all exons, for eaxmple, gene1 chr1: 1-5000.

                  When I use this command-perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas-, all genes in gene_exp.diff with only one exon are extracted. But for genes with multiple exons like gene1 above, I can't get them. In my case, I have 3000 defferentially expressed genes. 2800 genes of them are single-exon genes of which I can the sequence based on the method you told me. The other 200 genes are multiple-exons gene.

                  2. The second question is that why so many differentially expressed genes only have one exon.

                  Do you have any idea about that.

                  Thanks

                  Originally posted by Lizex View Post
                  Hi 11xinqi
                  I also got no name for the gene but this is what I do to get my transcripts annotated:
                  Take the gene_exp.diff file open with TextEdit, save as text; use command: grep yes gene_exp.txt > DE_transcripts.txt to extract significant transcripts from the gene_exp.diff file. Open this file in Excel and take the fourth column which is the locus in the genome, make a a header_list.

                  Use a tool from bedtools to extract
                  bedtools getfasta -fi ref_genome.fasta -bed cuffmerge_out/transcripts.gtf -fo extracted_transcripts

                  Rename extracted_transcripts to pool.fa in DE folder. Headers in header_list was converted to fasta format (website) and saved in Textedit.
                  Run perl script: extractFromFasta.pl (see attachment)
                  usage: perl extractFromFasta.pl pool.fa list header_list.txt" > transcript.fas

                  Use DuplicateFinder (java_app) to remove duplicates.

                  The results I get, I do a BLAST using Blast2Go to functionally annotate it.

                  In my experiment where I also got a lot of "inf" I only got 34 (have an expression value) out of 3500 differentially expressed genes (significant genes). To me that's way to low. I don't know if your 73 are the only genes that have an expression value?
                  I'm looking at other means to get a higher number of differentially expressed genes.

                  Comment


                  • #10
                    Hi chadn737

                    Thanks for you reply. I met some problems, would you please help me?

                    I got the differentially expressed genes from gene_exp.diff. Would you tell me how can I extract the sequences of these genes?

                    I got 3000 differentially expressed genes. 2800 genes of them are single-exon genes. The other 200 genes are multiple-exons gene. Do you know why so many differentially expressed genes only have one exon?

                    Originally posted by chadn737 View Post
                    This isn't a problem. What do you think you are going to get when you try to do division with zero?

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    22 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    19 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    50 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X