Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Metagenomic analysis with DESeq2

    I am trying to analyze some metagenomic data with DESeq2 and have some questions about the experimental design and whether my analysis is meaningful.

    For ten swine herds we gathered 30 feces samples, pooled them, extracted metagenomic DNA and sequenced it to approximately 70 mill. reads / dataset (2 x 100 bp, PE) on a HiSeq2500. We replicated the sampling, so we have two pools of 30 from each herd.

    We then used a BWA-based pipeline to map the reads to various databases including one with 2130 resistance genes.

    I've crated a resistance abundance matrix and tried to analyze the 20 samples with DeSeq2. My independent variables are the known antimicrobial consumption (separated by drug type) for the ten herds.

    To analyze whether gene abundances are explained by tetracycline use, I would use a design formula like this:

    formula(~ sample_site + tetracycline)

    However, because the consumption covariates (continuous), are identical for replicates at each sample_site (herd), I am given an error message:

    "Error in DESeqDataSet(se, design = design, ignoreRank) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    one or more variables or interaction terms in the design formula
    are linear combinations of the others and must be removed"

    Is my design inherently flawed or am I just using DESeq2 in a way that wasn't anticipated? If my IV was factorial, I can understand DESeq2 would complain it was unable to separate the effects from the two, but since "tetracycline" is continuous, shouldn't this work? Is this different from having two replicates from each time point in an RNASeq experiment?

    Adding a very small random number to each tetracycline number I can get it to work. I have a number of differentially abundant genes according to the Wald test... all of them increase in abundance as consumption increases.

    A simplified version of my design matrix looks like this:

    sample_name sample_site aminoglycoside tetracycline lib_size
    F10_IS_P_FIB 2014_10 0,029982363 1,280110685 60991062
    F10_ST_FIB 2014_10 0,029982363 1,280110685 60231460
    F1_IS_P_FIB 2014_1 0,15457294 0,171208676 62054282
    F1_ST_FIB 2014_1 0,15457294 0,171208676 70823944
    F2_IS_P_FIB 2014_2 0,588795386 0,892359485 59969860
    F2_ST_FIB 2014_2 0,588795386 0,892359485 62137964
    F3_IS_P_FIB 2014_3 0,902489538 0,101130753 59479678
    F3_ST_FIB 2014_3 0,902489538 0,101130753 73284776
    F4_IS_P_FIB 2014_4 0,633925771 2,051830423 60183764
    F4_ST_FIB 2014_4 0,633925771 2,051830423 76241940
    F5_IS_P_FIB 2014_5 0,076205716 1,02533471 60625410
    F5_ST_FIB 2014_5 0,076205716 1,02533471 67639338
    F6_IS_P_FIB 2014_6 0,573926613 2,683780065 61728500
    F6_ST_FIB 2014_6 0,573926613 2,683780065 60676694
    F7_IS_P_FIB 2014_7 0,077160494 0,346395483 59633136
    F7_ST_FIB 2014_7 0,077160494 0,346395483 55578210
    F8_IS_P_FIB 2014_8 0,142856031 0,242034969 60365050
    F8_ST_FIB 2014_8 0,142856031 0,242034969 77182448
    F9_IS_P_FIB 2014_9 0,290693702 2,488566048 65000142
    F9_ST_FIB 2014_9 0,290693702 2,488566048 79458424
    1) Does DESeq2 use the "sample_site" to estimate the dispersion within herds, as I'm hoping it does?
    2) Are there any DESeq2 parameters I should tweak for my specific application?
    3) If meaningful, can I tell DESeq2 to accept my design formula?

    Best regards,
    Patrick Munk

    sessionInfo()
    R version 3.1.1 (2014-07-10)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 LC_MONETARY=Danish_Denmark.1252
    [4] LC_NUMERIC=C LC_TIME=Danish_Denmark.1252

    attached base packages:
    [1] parallel stats4 stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] knitr_1.9 ez_4.2-2 DESeq2_1.6.1 corrplot_0.73
    [5] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3 GenomicRanges_1.18.1 GenomeInfoDb_1.2.0
    [9] IRanges_2.0.0 S4Vectors_0.4.0 BiocGenerics_0.12.0 phyloseq_1.9.15
    [13] pheatmap_1.0.2 scales_0.2.4 plyr_1.8.1 data.table_1.9.4
    [17] vegan_2.0-10 lattice_0.20-29 permute_0.8-3 gplots_2.14.2
    [21] RColorBrewer_1.0-5 ggplot2_1.0.0 reshape2_1.4

    loaded via a namespace (and not attached):
    [1] acepack_1.3-3.3 ade4_1.6-2 annotate_1.44.0 AnnotationDbi_1.28.0 ape_3.1-4
    [6] base64enc_0.1-2 BatchJobs_1.4 BBmisc_1.7 Biobase_2.26.0 BiocParallel_1.0.0
    [11] biom_0.3.12 Biostrings_2.34.0 bitops_1.0-6 brew_1.0-6 car_2.0-25
    [16] caTools_1.17.1 checkmate_1.5.0 chron_2.3-45 cluster_1.15.3 codetools_0.2-9
    [21] colorspace_1.2-4 compiler_3.1.1 DBI_0.3.1 digest_0.6.4 evaluate_0.5.5
    [26] fail_1.2 foreach_1.4.2 foreign_0.8-61 formatR_1.1 Formula_1.1-2
    [31] gdata_2.13.3 genefilter_1.48.1 geneplotter_1.44.0 grid_3.1.1 gtable_0.1.2
    [36] gtools_3.4.1 Hmisc_3.14-5 igraph_0.7.1 iterators_1.0.7 KernSmooth_2.23-13
    [41] labeling_0.3 latticeExtra_0.6-26 lme4_1.1-7 locfit_1.5-9.1 MASS_7.3-35
    [46] Matrix_1.1-4 mgcv_1.8-3 minqa_1.2.4 multtest_2.22.0 munsell_0.4.2
    [51] nlme_3.1-118 nloptr_1.0.4 nnet_7.3-8 pbkrtest_0.4-2 proto_0.3-10
    [56] quantreg_5.11 RJSONIO_1.3-0 rpart_4.1-8 RSQLite_0.11.4 sendmailR_1.2-1
    [61] SparseM_1.6 splines_3.1.1 stringr_0.6.2 survival_2.37-7 tools_3.1.1
    [66] XML_3.98-1.1 xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0

  • #2
    0) You likely used the wrong function to read in your sample table. Decimals should have a period in them, not a comma (i.e., you probably want read.delim2() rather than read.delim()).
    1) It uses everything in your design, so yes.
    3) The error will likely go away once you use read.delim2 rather than read.delim.

    Comment


    • #3
      Originally posted by dpryan View Post
      0) You likely used the wrong function to read in your sample table. Decimals should have a period in them, not a comma (i.e., you probably want read.delim2() rather than read.delim()).
      1) It uses everything in your design, so yes.
      3) The error will likely go away once you use read.delim2 rather than read.delim.
      Thank you for your response. It is however not the problem.
      colData(resdds) correctly identified my drug consumption columns as "numeric".

      The comma only showed up, because I exported the data to a European Excel version and removed some columns prior to pasting it here.

      To clarify, I can get it work if I add random small numbers to "tetracycline" or if I run only 10 samples (no replicates).

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM
      • seqadmin
        Techniques and Challenges in Conservation Genomics
        by seqadmin



        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

        Avian Conservation
        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
        03-08-2024, 10:41 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Yesterday, 06:37 PM
      0 responses
      11 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, Yesterday, 06:07 PM
      0 responses
      10 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-22-2024, 10:03 AM
      0 responses
      51 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 03-21-2024, 07:32 AM
      0 responses
      67 views
      0 likes
      Last Post seqadmin  
      Working...
      X