  • Normalization from different NGS runs

    Hello community, I would certainly appreciate some help here. Many thanks in advance.
    I have been looking into this subject, and everything I find refers to differential expression in RNA-Seq, which is a little different from what I am looking for. I am working with a virus and would like to assess the depth of coverage obtained by sequencing its genome from two different sources (the genomes are the same; the sources are different). Even though the two sources are quite different, the first step after RNA extraction is a OneStep RT-PCR in both cases. I mention this because, as you may have guessed already, I usually get pretty good coverage. The question is whether those coverages are comparable.
    I have 30 samples that were sequenced in different runs on a MiSeq platform. Let's say half of the samples come from one source and the other half from the other. Since I want to compare them, I would like to normalize the dataset. What I have in mind is to normalize by log2 instead of normalizing by the total number of reads, and then calculate the depth of coverage from the normalized read counts. Does that make sense? Should I go for DESeq or some other package? I think that in the end I will compare the results from different approaches, but I would like some comments and suggestions.
    Thanks once more.
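
As a minimal sketch of the idea in the question above, one option is to scale each sample's per-position depth by its number of mapped reads (depth per million mapped reads) and only then take log2. The depth matrix, sample names, read totals, and pseudocount below are made up for illustration; real depths would have to come from the actual alignments.

```r
# Hypothetical per-position depth for two samples (rows = genome positions).
depth <- cbind(
  sampleA = c(100, 200, 300, 400),
  sampleB = c(10, 20, 30, 40)
)

# Total mapped reads per sample (made-up numbers).
mapped_reads <- c(sampleA = 2e6, sampleB = 2e5)

# Depth per million mapped reads: divide each column by its library size.
depth_per_million <- sweep(depth, 2, mapped_reads / 1e6, "/")

# log2 transform with a pseudocount of 1 so zero-depth positions stay finite.
log2_depth <- log2(depth_per_million + 1)

print(depth_per_million)
print(colMeans(depth_per_million))
```

In this toy example the two samples differ tenfold in raw depth but end up with the same scaled depth, because the difference is entirely explained by library size. A log2 transform on its own would not remove that difference, since it only changes the scale of the values.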

  • #2
    Hello genferreri, I am combining RNA-seq data from three experiments generated on HiSeq and another Illumina platform to analyze gene expression patterns across different tissue types. I generated read counts with HTSeq by assigning previously mapped reads (in SAM or BAM format) to features in a GTF file (in my case the features were genes, but they don't have to be) and normalized the counts using both edgeR and DESeq2. If you have a GTF (or GFF) file with features that your reads can be assigned to and counted against, I don't see why you couldn't use edgeR or DESeq2 to normalize.

    I used the readDGE function in edgeR to collate the count files generated by HTSeq for all samples:
    MyDGEList <- readDGE(Count_files, path="./Count_Tables/",labels=sample_ids)

    Then, before normalizing with edgeR, I extracted the counts from MyDGEList to use with DESeq2:

    Counts <- MyDGEList$counts #make sure to remove last few rows containing metatags from HTSeq

    Then I filtered and normalized for edgeR using the following commands in R:

    keep <- rowSums(cpm(MyDGEList)>1) >= 2 #filter out lowly expressed genes
    MyDGEList <- MyDGEList[keep, , keep.lib.sizes=TRUE]
    MyDGEList <- calcNormFactors(MyDGEList) #normalize

    #Extract table of logCPM (log2 counts per million) by:
    log_cpm <- cpm(MyDGEList, prior.count=0.25, log=TRUE)
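
For intuition about what a logCPM table contains, here is a simplified base-R sketch: add a small prior count, divide by library size, scale to one million, and take log2. This is not edgeR's exact computation. As far as I know, `cpm()` also uses the normalization factors from `calcNormFactors()` and scales the prior count by library size, so its values will differ somewhat. The toy matrix and variable names are made up.

```r
# Toy count matrix (rows = features, columns = samples); values are made up.
counts <- matrix(
  c(10, 0, 90,
    100, 0, 900),
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("s1", "s2"))
)

# Library size = total counts per sample (here 100 and 1000).
lib_size <- colSums(counts)

# Small prior count so that zero counts do not produce log2(0) = -Inf.
prior_count <- 0.25

# Simplified logCPM: constant prior added to every count, then CPM, then log2.
cpm_simple <- sweep(counts + prior_count, 2, lib_size, "/") * 1e6
log_cpm_simple <- log2(cpm_simple)

print(round(log_cpm_simple, 2))
```

Features f1 and f3 have the same proportions in both samples, so their simplified logCPM values come out close to each other despite the tenfold difference in raw counts.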

    For DESeq2, I did the following:

    Count_Table <- DESeqDataSetFromMatrix(countData=Counts,colData=SS_Column_Info, design=~Tissue) #Make count table readable by DESeq

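    dds <- DESeq(Count_Table) #Run the DESeq2 analysis (size factors, dispersions, model fitting)

    ddsClean <- replaceOutliersWithTrimmedMean(dds) #Replace outlier counts with the trimmed mean

    dds <- DESeq(ddsClean) #Re-run DESeq after replacing outliers

    dds <- estimateSizeFactors(dds) #Size factors for normalization (DESeq() above has already estimated these)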

    #Two options for getting tables of transformed counts in DESeq2
    vsd <- vst(dds) #Variance stabilizing transformation of counts
    rld <- rlog(dds) #Regularized log transformation
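
The size factors used above for normalization are, by default, based on a median-of-ratios calculation. A minimal base-R sketch of that idea follows, assuming a made-up count matrix. It is meant for intuition only and is not a substitute for `estimateSizeFactors()`.

```r
# Toy count matrix (rows = features, columns = samples); values are made up.
counts <- cbind(
  s1 = c(10, 20, 30, 40),
  s2 = c(20, 40, 60, 80),
  s3 = c(5, 10, 15, 20)
)

# Log geometric mean of each feature across samples.
log_geo_mean <- rowMeans(log(counts))

# Use only features with a finite log geometric mean (no zero counts).
usable <- is.finite(log_geo_mean)

# Size factor per sample: median ratio of its counts to the geometric means.
size_factors <- apply(counts, 2, function(col) {
  exp(median((log(col) - log_geo_mean)[usable]))
})

# Normalized counts: divide each sample by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)  # approximately 1, 2, 0.5
print(normalized)
```

Here s2 was sequenced twice as deeply as s1 and s3 half as deeply, so after dividing by the size factors all three columns agree.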

    I hope this helps!
    Last edited by amhaan; 09-25-2018, 09:37 PM. Reason: typo in function


    • #3
      Thank you very much amhaan. I will go over it and compare the different outcomes.
