  • Comparative expression analysis across two species

    Hello everyone!

    I have the following situation/design that I could use some help figuring out the best way to analyze:

    I have two closely related species (non-model, no reference genome) that differ in their response to injury, and I want to compare gene expression along a 4-point time series and across both species. The basic experiment is to sample total RNA for RNA-seq at four timepoints in each of the two species: a baseline (t0) and 3 consecutive time points, t1 to t3.

    My conceptual pipeline so far is as follows:
    -Sequence total RNA using Illumina HiSeq 100PE
    -Pool and clean up reads for time points t0 to t3 for each species.
    -Assemble a transcriptome for each species using Trinity.

    Now comes the question. I can analyze differential gene expression for each species along timepoints using RSEM/edgeR. But what would be the best way to compare the same timepoints (say, t0 or t2) across species? It seems to me that I should somehow assemble a "consensus transcriptome". Any ideas on how to do it? My thought so far is to do a reciprocal BLAST search of both transcriptomes, use it to build a "join table", and then use that table to compare the results of the RSEM analyses made against the species-specific transcriptomes; however, I am not sure whether FPKM values obtained against different references are comparable.

    I appreciate any thoughts or ideas on this!

    Cheers!

    -Ed-
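
The reciprocal BLAST "join table" described in the post above could be sketched roughly as follows. This is only an illustration, assuming both searches were saved in BLAST tabular format (`-outfmt 6`, 12 tab-separated columns with the bit score last) and that "best hit" simply means the highest bit score per query. The function names and the toy transcript IDs are hypothetical.

```python
def best_hits(blast_tab):
    """Return {query: subject}, keeping the highest-bit-score hit per query.

    Assumes BLAST tabular text (-outfmt 6): 12 tab-separated columns,
    query ID first, subject ID second, bit score last.
    """
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (A, B) pairs that are each other's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


def _row(query, subject, bits):
    # Toy outfmt 6 row; only the IDs and the bit score matter here.
    return "\t".join([query, subject, "95.0", "500", "25", "0",
                      "1", "500", "1", "500", "1e-100", str(bits)])


# Hypothetical toy results: species A searched against B, and B against A.
A_VS_B = "\n".join([_row("spA_g1", "spB_g1", 900),
                    _row("spA_g1", "spB_g2", 300),
                    _row("spA_g2", "spB_g2", 250)])
B_VS_A = "\n".join([_row("spB_g1", "spA_g1", 880),
                    _row("spB_g2", "spA_g1", 310),
                    _row("spB_g2", "spA_g2", 240)])

JOIN_TABLE = reciprocal_best_hits(A_VS_B, B_VS_A)
print(JOIN_TABLE)
```

In this toy case only the `spA_g1`/`spB_g1` pair survives, because `spB_g2` prefers `spA_g1` over `spA_g2`. A real analysis would probably also filter on e-value, identity, or alignment coverage before calling two transcripts homologous.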

  • #2
    I think this is not a straightforward issue, and I'd have to think about it a bit more. For starters, however, I should warn you that when you pool RNA-seq data from more than one species, especially if they share many genes, tools like Trinity will in fact generate chimeric isoforms that merge parts of the two species' versions of those genes. I've done this myself by combining simulated RNA-seq reads from rat and human. Trinity happily created rat-human chimeric isoforms for commonly expressed genes. That's not helpful at all.

    Maybe a good starting point would be to generate two master assemblies (one for each species) and then attempt to identify homologous genes between the two so that you could align those for differential expression analysis. When you get to that point you will have to rely on some kind of length-normalized values (like FPKM), because it's unlikely that the common genes between the species will have the same lengths, which would throw off raw-count-based DE tools. There are no rules for this analysis as far as I know... but you could take a crack at it like this.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */



    • #3
      Thanks for your reply.

      You are correct that pooling reads from both species is not a good idea. I have separate assemblies for each species, and have used RSEM/edgeR to get expression levels for each independently. If I do a reciprocal blast between both assemblies, and then use that table to join the species-specific results standardized to FPKM, that should allow me to compare side by side, right?

      I tried running RSEM to map reads of one species against the transcriptome of the other, hoping that since they are very close, I would be able to map reasonably well, but that failed miserably, as the Bowtie mapping strategy only recognized almost perfect matches, so most of the reads were not mapped. So now I know that doesn't work.

      Ok, I will give it a try and let you know how it worked.

      Cheers!



      • #4
        BLAST is what I was thinking would be a good starting point. Keep in mind that Trinity does bundle transcripts into genes based on shared information. It may be best to try to match up the species at the gene level, which may mean keeping track of multiple transcripts per species that match up as a group. Then you can sum the FPKM values per sample per gene and make the comparisons. I have read that the TPM normalization may be an even better metric for this type of comparison. It is discussed very briefly in the RSEM paper.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */
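
The gene-level summing and TPM idea from post #4 might look something like the sketch below. It assumes per-transcript FPKM values for one sample are already in hand, and it uses the standard rescaling TPM_i = FPKM_i / sum(FPKM) * 1e6. The transcript IDs and the transcript-to-gene map are hypothetical placeholders, not real Trinity output.

```python
from collections import defaultdict


def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so they sum to one million."""
    total = sum(fpkm.values())
    if total == 0:
        return {t: 0.0 for t in fpkm}
    return {t: v / total * 1e6 for t, v in fpkm.items()}


def sum_by_gene(values, transcript_to_gene):
    """Sum per-transcript values into per-gene values."""
    genes = defaultdict(float)
    for transcript, value in values.items():
        genes[transcript_to_gene[transcript]] += value
    return dict(genes)


# Hypothetical sample: two isoforms of gene g1 and one isoform of g2.
FPKM = {"g1_i1": 10.0, "g1_i2": 30.0, "g2_i1": 60.0}
TRANSCRIPT_TO_GENE = {"g1_i1": "g1", "g1_i2": "g1", "g2_i1": "g2"}

TPM = fpkm_to_tpm(FPKM)
GENE_TPM = sum_by_gene(TPM, TRANSCRIPT_TO_GENE)
print(GENE_TPM)
```

Because TPM always sums to the same total within a sample, gene-level values are proportions of that sample's transcript pool, which is one reason it is often preferred over FPKM when samples were quantified against different references.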



        • #5
          This is definitely a hard one. I can see all sorts of problems comparing expression levels between genes that are not exactly the same. Yes, you could get a number for gene 1 and a number for gene 2, compare them, and draw a plot, but even a single bp difference between the genes could totally change the structure/function of the protein, and then you're comparing things that should not really be compared.

          I had a similar issue with comparing two different cultivars.

          Sdriscoll's suggestion is pretty solid. Differential expression between genes that are 100% identical in the coding sequence would be a solid start. Differential expression between genes that are similar but non-identical might require you to show that the differences don't make much difference to the structure/function of the protein.



          • #6
            If the intent is to compare response to injury in species 1 at time t1, and response to injury in species 2 at time t1, then you already have what you need.

            I'm assuming you have independent, non-injury controls for both species experiments? So you have differential expression for every time point in species 1 and differential expression for every time point in species 2. You could simply take those species-specific gene lists and use the estimated species-specific fold changes and a simple RankProduct analysis of homologous genes to look at relative injury response between species. Or compute z-scores for each species and compare those (or use any one of several other non-parametric approaches to compare two independent lists of things).

            I would not even try to directly compare relative expression estimates between the two species, as I see little value in that, at least not if the intent is to classify genomic injury response between two species. In that case, the comparison of interest is the species-specific response and how it differs between them. The actual difference in relative expression between the two species for any particular gene doesn't really tell you anything of value about injury response between them. Their response to injury is defined by the species-specific response of treatment relative to the respective species-specific controls.
            Last edited by mbblack; 10-30-2014, 07:00 AM.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.
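
The z-score and rank-based comparison suggested in post #6 could be sketched along these lines. This is a simplified illustration, not the full RankProduct method: it only computes the rank product statistic (geometric mean of ranks) without the permutation-based significance estimate, and it ignores ties. It assumes per-gene log2 fold changes (injury versus control) are available for each species, for example from edgeR, plus a list of homologous gene pairs. All gene IDs and values are hypothetical.

```python
import math
import statistics


def zscores(log2fc):
    """Standardize one species' log2 fold changes (needs >= 2 genes)."""
    values = list(log2fc.values())
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {g: (v - mean) / sd for g, v in log2fc.items()}


def ranks_desc(values):
    """Rank genes from most up-regulated (rank 1) downward; ties not handled."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {g: i + 1 for i, g in enumerate(ordered)}


def compare_responses(fc_a, fc_b, orthologs):
    """Compare species-specific responses for each homologous gene pair."""
    za, zb = zscores(fc_a), zscores(fc_b)
    # Rank within the homolog set only, so both rankings have equal length.
    ra = ranks_desc({a: fc_a[a] for a, _ in orthologs})
    rb = ranks_desc({b: fc_b[b] for _, b in orthologs})
    return [
        {
            "pair": (a, b),
            "z_diff": za[a] - zb[b],
            "rank_product": math.sqrt(ra[a] * rb[b]),
        }
        for a, b in orthologs
    ]


# Hypothetical log2 fold changes at one time point, per species.
FC_A = {"A1": 3.0, "A2": 0.0, "A3": -3.0}
FC_B = {"B1": 3.0, "B2": -3.0, "B3": 0.0}
ORTHOLOGS = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]

RESULTS = compare_responses(FC_A, FC_B, ORTHOLOGS)
for row in RESULTS:
    print(row)
```

A small rank product flags genes that respond strongly in the same direction in both species, while a large absolute `z_diff` flags genes whose response differs between them. Neither number requires comparing expression levels across the two assemblies directly.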



            • #7
              Excellent point. Solve the problem with good experimental design (i.e. use controls for each species at each time point). This is perfect. You can then quantify each species' injury response at each time point and compare those overall quantifications, without ever having to make direct comparisons between genes in the two species.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
