Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • infinite fold change RNAseq

    Hi all,
    I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
    CTL CTL CTL HF HF HF
    gene1 144 0 0 0 0 0
    gene2 45 57 0 0 0 0

    Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
    Should I remove such cases and keep only genes like gene2?
    Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
    thanks,

  • #2
    Originally posted by biofreak View Post
    Hi all,
    I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
    CTL CTL CTL HF HF HF
    gene1 144 0 0 0 0 0
    gene2 45 57 0 0 0 0

    Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
    Should I remove such cases and keep only genes like gene2?
    The problem has little to do with the values being zero, you also get genes with values like '200, 1, 3; 2, 3, 2', i.e., one large value and all other values small.

    We recently changed how DESeq deals with this kind of data, so I have to give you two answers.

    for the released version of DESeq (v1.4.1): The data frame returned by 'nbinomTest' contains two columns called 'resVarA' and 'resVarB'. The values in their should be around 1 and for genes with unusually large estimated within-group variance ("variance outliers"), the value will be much higher. We recommend discounting such genes in the downstream analysis. See vignette for details

    for the development version (v1.5.23): We removed the 'resVar' columns as it turned out to be difficult to recommend a good cut-off value. Instead, the test is now informed about high estimated variance and becomes more conservative in this cases, i.e., genes as in your example should no longer reach a significant p value. Again, see the new (extensively revised) vignette for details.

    Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
    thanks,
    No need to do that, as DESeq skips such genes anyway.

    Simon

    Comment


    • #3
      thanks a lot Simon. This information is tremendously helpful.

      Comment


      • #4
        Hi Simon,
        I am using 1.4 version if DESeq. I tried filtering differentially expressed genes based on within group variance values from 'resVarA' and 'resVarB'. As you have mentioned, it is indeed difficult to determine a good cutoff level. I tried everything from very strict (smaller) cutoffs to the cutoff of even 20!
        But every time you think that you could have kept some genes (or otherwise).
        For example with cutoff of 20 (in either groupA or groupB), following are some of the
        discarded genes: (6 samples)

        Z-1 Z-2 Z-3 Z-4 Z-5 Z-6
        703 531 7 0 0 1
        158 328 2 4 0 1
        213 284 1 2 3 5
        186 240 4 6 6 2
        382 194 143 2634 951 4336
        229 119 105 2206 543 1837
        1646 120 67 129 167 159
        137 60 46 1147 321 990
        144 64 35 1195 273 955
        249 39 24 19 22 8
        210 83 40 1418 303 952
        46 63 28 417 178 704
        49 42 23 381 156 598
        498 831 42 247 115 88
        261 329 19 82 57 29
        72 29 26 471 137 506
        279 341 24 70 55 70
        101 51 34 364 182 850
        345 389 37 84 78 95

        My data is already sparse and I am worried that so many genes with good read counts are being discarded. (for good reasons ofcourse!)
        Based on your experience, do you have any number which people find reasonable to use as a cut off? or is it very specific to the data?
        thanks a lot in advance.
        Last edited by biofreak; 08-03-2011, 09:59 AM.

        Comment


        • #5
          Worries like yours were what prompted us to re-address this whole issue and make the changes to DESeq mentioned above. Maybe you should give the devel version a try. It should be stable now (and you can manually install it on top of a current release Bioc installation.)

          Comment


          • #6
            HELP Count Table for DESeq

            cat Join_counts_trimmed.txt | head

            1/2-SBSRNA4 13 12 12 48
            A1BG 12 2 7 12
            A1BG-AS1 8 3 12 12
            A1CF 2 3 0 1
            A2LD1 29 33 47 47
            A2M 21 36 19 6
            A2ML1 8 11 6 16
            A2MP1 0 0 1 0
            A4GALT 74 47 11 94
            A4GNT 0 0 0 0

            I would like to load into R for DESeq analysis. WHY IT doesnt show 5 columns?

            > countsTable <- read.table("/countstable/Join_counts_trimmed.txt", sep='\t', header=TRUE, row.names=1)
            > countsTable
            data frame with 0 columns and 23227 rows


            Then I tried differently

            > exampleFile = system.file( "/countstable/Join_counts.txt", package="DESeq" )
            > exampleFile
            [1] ""
            > countsTable <- read.delim( exampleFile, header=FALSE, stringsAsFactors=TRUE )

            Error in read.table(file = file, header = header, sep = sep, quote = quote, :
            no lines available in input
            In addition: Warning message:
            In file(file, "rt") :
            file("") only supports open = "w+" and open = "w+b": using the former


            I dont know how to load this joined.txt file?

            Thank you so much for any help

            Comment


            • #7
              I just opened the excel file and import counts_table.txt file and saved as csv file.


              > countsTable <- read.table("/countstable/count_table_last.csv", sep=',', header=FALSE, row.names=1)
              > head (countsTable) V2 V3 V4 V5
              1/2-SBSRNA4 13 12 12 48
              A1BG 12 2 7 12
              A1BG-AS1 8 3 12 12
              A1CF 2 3 0 1
              A2LD1 29 33 47 47
              A2M 21 36 19 6

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM
              • seqadmin
                Techniques and Challenges in Conservation Genomics
                by seqadmin



                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                Avian Conservation
                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                03-08-2024, 10:41 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 06:37 PM
              0 responses
              12 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 06:07 PM
              0 responses
              10 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-22-2024, 10:03 AM
              0 responses
              51 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-21-2024, 07:32 AM
              0 responses
              68 views
              0 likes
              Last Post seqadmin  
              Working...
              X