  • Diff expression: pooling reads of replicates/treatments for de-novo assembly

    I want to make pairwise comparisons of gene expression between tissue samples (same tissue, same species, different individuals) in 5 different treatments, with 2 biological and 2 technical (sequencing) replicates per treatment, i.e. 20 samples in total, sequenced as Illumina paired-end reads. Before the differential expression analysis, I have to assemble a transcriptome de novo.
    Ideally, I'd like a good tradeoff between recovering as many splice variants as possible and avoiding too many computational chimeras.
    Assuming unlimited computational resources, what would be the best strategy for pooling the samples to get a common set of transcripts on which to compare expression across treatments? I thought of pooling all 20 samples into a single assembly containing the transcripts expressed in every condition; I could then map each sample to this assembly and compare. How much coverage is too much, in terms of errors and chimeric sequences? My main concern is how this will affect the representation of isoforms from different treatments.
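A minimal sketch of the first step of the single-pool strategy, assuming each library is stored as a pair of uncompressed FASTQ files named `<sample>_R1.fastq` and `<sample>_R2.fastq` (a hypothetical naming scheme, not taken from the thread). What it illustrates is that the left and right files must be concatenated in the same sample order so that mates stay paired in the pooled input.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Sequence


def pool_paired_fastq(samples: Sequence[str], in_dir: Path, out_prefix: Path) -> tuple[Path, Path]:
    """Concatenate per-sample R1/R2 FASTQ files into one pooled R1/R2 pair.

    The file names <sample>_R1.fastq / <sample>_R2.fastq are a hypothetical convention.
    Both outputs are written in the same sample order, so the Nth record of the
    pooled R1 file is still the mate of the Nth record of the pooled R2 file.
    """
    out_r1 = out_prefix.with_name(out_prefix.name + "_R1.fastq")
    out_r2 = out_prefix.with_name(out_prefix.name + "_R2.fastq")
    with open(out_r1, "w") as pooled1, open(out_r2, "w") as pooled2:
        for sample in samples:
            with open(in_dir / f"{sample}_R1.fastq") as r1, open(in_dir / f"{sample}_R2.fastq") as r2:
                # Stream line by line; a length mismatch means the mates are out of sync.
                for line1, line2 in zip_longest(r1, r2):
                    if line1 is None or line2 is None:
                        raise ValueError(f"{sample}: R1 and R2 have different numbers of lines")
                    # Guarantee a trailing newline so the next sample starts on a fresh line.
                    pooled1.write(line1 if line1.endswith("\n") else line1 + "\n")
                    pooled2.write(line2 if line2.endswith("\n") else line2 + "\n")
    return out_r1, out_r2
```

For the design described above, the sample list could be built as `[f"T{t}_B{b}_S{s}" for t in range(1, 6) for b in (1, 2) for s in (1, 2)]`, which yields 20 hypothetical names (5 treatments x 2 biological x 2 technical replicates).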
    Is it more appropriate to make 5 different assemblies with 4 samples each and then collapse them with CD-HIT or a similar tool?
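CD-HIT-EST clusters sequences that exceed a chosen identity threshold, which is more than a few lines can reproduce. As a deliberately simplified stand-in (not CD-HIT's algorithm), the sketch below merges transcripts from several assemblies by dropping any sequence that occurs exactly, on either strand, inside a longer one that was already kept. It tolerates no mismatches, so it only illustrates the idea of collapsing per-treatment assemblies into one non-redundant set; the transcript names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def collapse_contained(seqs: dict[str, str]) -> dict[str, str]:
    """Keep a transcript only if it is not an exact substring, on either strand,
    of a longer (or identical) transcript kept earlier.

    Simplified stand-in for identity-based clustering: exact containment only,
    and quadratic in the number of sequences.
    """
    kept: dict[str, str] = {}
    # Longest first, so containers are kept before the fragments they contain.
    # sorted() is stable, so equal-length sequences keep their input order.
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        s = seq.upper()
        rc = revcomp(s)
        if any(s in k or rc in k for k in kept.values()):
            continue
        kept[name] = s
    return kept


merged = {
    "asm1_t1": "ACGTACGTAA",  # full-length transcript from assembly 1
    "asm3_t1": "TTACGTACGT",  # same transcript, opposite strand, assembly 3
    "asm2_t1": "CGTACG",      # fragment of asm1_t1 from assembly 2
    "asm4_t1": "GGGG",        # unrelated transcript
}
print(list(collapse_contained(merged)))  # ['asm1_t1', 'asm4_t1']
```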
    Thanks,

    Carlos

  • #2
    Combined assemblies are the way to go; a few programs even report in their output which reads formed which contig. Trinity does this, although I am only just starting to play with it. It looks promising.



    • #3
      Unlimited computational resources... I'm jealous.
      When you talk about 20 different samples, how many reads do you have in total? I think the Trinity homepage gives an estimate of how much memory is needed for a given number of reads.
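A back-of-the-envelope sketch of that memory estimate. The ratio used here (about 1 GB of RAM per million read pairs) is an assumed rule of thumb, not a figure quoted in this thread; check the current Trinity documentation before relying on it. The per-sample read counts in the example are hypothetical.

```python
def estimate_ram_gb(read_pairs_per_sample: list[int], gb_per_million_pairs: float = 1.0) -> float:
    """Rough RAM estimate for assembling all samples pooled together.

    gb_per_million_pairs is an assumed rule-of-thumb ratio, not a measured value.
    """
    total_pairs = sum(read_pairs_per_sample)
    return total_pairs / 1_000_000 * gb_per_million_pairs


# Hypothetical example: 20 libraries of 20 million read pairs each.
print(estimate_ram_gb([20_000_000] * 20))  # 400.0
```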

      The coverage depends on the expression of your genes and on the number of reads you obtained in each sequencing run. I find it hard to suggest parameters for that.
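The dependence on expression can be made concrete with the usual mean-depth formula C = N * L / T, where N is the number of reads drawn from a transcript, L the read length and T the transcript length. A sketch with hypothetical numbers:

```python
def mean_coverage(n_reads: int, read_length: int, transcript_length: int) -> float:
    """Mean per-base depth C = N * L / T of a single transcript."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    return n_reads * read_length / transcript_length


# Hypothetical 2 kb transcripts sequenced with 100 bp reads in the same run:
print(mean_coverage(1_000, 100, 2_000))    # weakly expressed: 50.0
print(mean_coverage(500_000, 100, 2_000))  # highly expressed: 25000.0
```

Because N scales with expression, transcripts from the same run can differ in depth by orders of magnitude.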

      I took a similar approach to such a problem. First I assembled with Trinity and then did the alignment with bwa. Trinity is in fact able to give you RPKM values for each assembled sequence, though the last time I looked at their SourceForge page they still admitted that one shouldn't trust these values. Unfortunately, at least parts of SourceForge seem to be down at the moment, so I can't look it up.
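For reference, RPKM (reads per kilobase of transcript per million mapped reads) is 10^9 * C / (N * L), where C is the number of reads mapped to a transcript, N the total number of mapped reads in the library and L the transcript length in bases. A minimal sketch, assuming `counts` covers every mapped read so that its sum equals N; this is a generic illustration, not Trinity's own code, and the transcript names are hypothetical.

```python
def rpkm(counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """RPKM = 10^9 * C / (N * L) for each transcript.

    counts:  reads mapped to each transcript (assumed to include every mapped read,
             so that N = sum(counts.values())).
    lengths: transcript lengths in bases.
    """
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads")
    return {t: 1e9 * c / (total_mapped * lengths[t]) for t, c in counts.items()}


# Hypothetical example: two transcripts with equal counts but different lengths.
print(rpkm({"tx_a": 500, "tx_b": 500}, {"tx_a": 1_000, "tx_b": 2_000}))
# {'tx_a': 500000.0, 'tx_b': 250000.0}
```

The longer transcript gets half the RPKM for the same count: on top of library-size normalization, RPKM normalizes for length.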



      • #4
        @CarlosVM
        How far have you got with your analyses? I am planning to analyze something similar: 4 different conditions, 3 individuals per condition and 3 samples per individual (different tissues), so 4 x 3 x 3 = 36 samples in total. I isolated RNA separately, but I am thinking of pooling the different tissues of each individual before sequencing, to have fewer libraries to sequence (while still sequencing from different tissues to get better transcriptome profiles). I also have to do a de novo assembly, and I'm planning to assemble just from the reads I sequence, plus some ESTs available online.
        My thinking was like yours: map each sample back to the assembly and compare the samples between conditions.
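Mapping each sample back to the shared assembly yields one table of read counts per sample; count-based tools such as edgeR and DESeq expect these merged into a single matrix with one row per transcript and one column per sample. A minimal sketch, with hypothetical sample and transcript names, that fills in zero wherever a sample has no reads for a transcript:

```python
def build_count_matrix(
    per_sample: dict[str, dict[str, int]],
) -> tuple[list[str], list[str], list[list[int]]]:
    """Merge per-sample {transcript: count} tables into rows=transcripts, columns=samples."""
    samples = list(per_sample)  # column order follows the input order
    transcripts = sorted({t for counts in per_sample.values() for t in counts})
    matrix = [[per_sample[s].get(t, 0) for s in samples] for t in transcripts]
    return transcripts, samples, matrix


def to_tsv(transcripts: list[str], samples: list[str], matrix: list[list[int]]) -> str:
    """Render the count matrix as tab-separated text with a header row."""
    lines = ["\t".join(["transcript", *samples])]
    for name, row in zip(transcripts, matrix):
        lines.append("\t".join([name, *map(str, row)]))
    return "\n".join(lines) + "\n"


# Hypothetical counts for two samples mapped to the same assembly.
counts = {
    "cond1_ind1": {"t1": 5, "t2": 3},
    "cond2_ind1": {"t2": 7, "t3": 1},
}
print(to_tsv(*build_count_matrix(counts)), end="")
```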

        Do you think pooling the tissues of each individual is a good idea, or would you just barcode each tissue separately?
        Do you have any advice for the analysis?

        We mainly use CLC Genomics Workbench for our analyses, but I have also used, e.g., the Trinity assembler. The main issue I have with it is that Trinity reports alternative transcripts, so it looks as if you have more transcripts than you really do, and I think that might be a problem for some follow-up analyses... or? Do you have any experience with RSEM, edgeR or DESeq? I'm just reading up on them.
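One common way to handle Trinity's alternative transcripts is to sum isoform-level counts per Trinity "gene" (component) before differential expression. The sketch below assumes Trinity-style identifiers in which the isoform is the final `_seqN` suffix (older releases, e.g. `comp10_c0_seq1`) or `_iN` suffix (newer releases, e.g. `TRINITY_DN7_c0_g1_i1`); check the naming your Trinity version actually produces. The counts are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Trinity naming: the isoform is the trailing _seqN or _iN suffix.
ISOFORM_SUFFIX = re.compile(r"_(?:seq|i)\d+$")


def gene_id(transcript_id: str) -> str:
    """Strip the assumed isoform suffix to get the Trinity gene/component id."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def collapse_to_genes(isoform_counts: dict[str, float]) -> dict[str, float]:
    """Sum isoform-level counts into gene-level counts."""
    genes: defaultdict[str, float] = defaultdict(float)
    for tid, count in isoform_counts.items():
        genes[gene_id(tid)] += count
    return dict(genes)


isoforms = {
    "comp10_c0_seq1": 12,
    "comp10_c0_seq2": 8,
    "comp11_c0_seq1": 5,
    "TRINITY_DN7_c0_g1_i1": 3,
    "TRINITY_DN7_c0_g1_i2": 4,
}
print(collapse_to_genes(isoforms))
# {'comp10_c0': 20.0, 'comp11_c0': 5.0, 'TRINITY_DN7_c0_g1': 7.0}
```

If the counts come from a quantifier that splits ambiguous reads among isoforms (RSEM's expected counts, for instance), summing them per gene keeps each read counted once.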

        Thanks in advance for any advice,
        please ask if something was not clearly explained.
        Ilona
