  • de novo transcriptome and differential expression

    Hello,

    We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using different methods, and merged the resulting contigs with CAP3.
    Now we want to check for differential expression among the 3 samples using the contigs we defined. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
    As a result, it is hard to map the reads uniquely to our contigs.
    Any suggestions on how to check for differential expression?

  • #2
    Originally posted by gfmgfm
    Hello,

    We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using different methods, and merged the resulting contigs with CAP3.
    Now we want to check for differential expression among the 3 samples using the contigs we defined. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
    As a result, it is hard to map the reads uniquely to our contigs.
    Any suggestions on how to check for differential expression?
    You can merge all 3 datasets and assemble them together. Then use the assembled contigs as a reference and re-map the reads from each dataset to it.

    • #3
      Thanks a lot for the reply!
      This is what we did. But now we are not sure how to map to the contigs as a reference. If we consider only unique tags, we get a very low percentage of uniquely aligned reads (probably because of some redundancy in the contigs, and maybe because of real different transcripts from the same locus).
      Any suggestions?

      • #4
        Hm, it depends a bit on what you want to do. You could either try to distribute multireads in proportion to the unique reads (which is a problem if the majority are multireads) or create a "non-redundant" reference (where you may end up sacrificing truly different transcripts from a gene). For the latter you would have to group your transcripts together based on similarity and assemble them. The TGI clustering tool may help you do this: http://compbio.dfci.harvard.edu/tgi/software/ .
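
A minimal sketch of the first option (sharing multireads out in proportion to unique reads), assuming the alignments have already been parsed into a mapping from read ID to the contigs it hits. The function name, the input layout, and the even-split fallback for reads whose targets have no unique reads are all illustrative assumptions, not something taken from a specific tool.

```python
from collections import defaultdict


def allocate_multireads(alignments):
    """Estimate per-contig read counts, sharing multireads by unique-read counts.

    alignments: dict mapping read id -> list of contig ids the read aligns to
                (hypothetical layout, assumed to be parsed from the aligner output).
    Returns a dict mapping contig id -> estimated (possibly fractional) count.
    """
    # First pass: count reads that hit exactly one contig.
    unique = defaultdict(float)
    for contigs in alignments.values():
        targets = set(contigs)
        if len(targets) == 1:
            unique[next(iter(targets))] += 1.0

    # Second pass: split each multiread across its targets.
    counts = defaultdict(float, unique)
    for contigs in alignments.values():
        targets = sorted(set(contigs))
        if len(targets) < 2:
            continue
        total = sum(unique[c] for c in targets)
        for c in targets:
            # Assumption: fall back to an even split when no target has unique reads.
            share = unique[c] / total if total > 0 else 1.0 / len(targets)
            counts[c] += share
    return dict(counts)
```

As the post notes, this only works well when unique reads are not a small minority: if most reads are multireads, the proportions are estimated from very little data and the even-split fallback dominates.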

        • #5
          Thanks a lot! The TGI clustering tool looks very interesting. I am trying to run it.

          • #6
            I am also going to be mapping short reads to assembled contigs from multiple samples. My strategy is to assemble the contigs together in Trinity, then map the reads to the contigs. I would assume that a clustering step would improve the quality of the data.

            One question: I have tissue from two different organisms in some samples, so I have two transcriptomes. Would clustering take transcripts from different organisms for the same genes and cluster those?

            • #7
              De novo transcriptome assembly.

              Hi all,

              I have a paired-end RNA-Seq TopHat run, so now I have to run Cufflinks on the results. I don't have a reference GTF file, but I do have the genome and transcriptome files. Can anyone please tell me how to create a reference transcript annotation file from the genome and transcriptome files?

              Thanking you in advance
              Regards
              Deepak.

              • #8
                Deepak, I suggest you post your question in a relevant thread. If you have a reference genome you are not doing de novo transcriptome assembly, and you are also not looking at differential gene expression unless you have multiple samples.

                • #9
                  Hi LizBent,

                  I guess this depends on the overlap between the 2 genomes you are analyzing. If there are very similar genes, I guess they might cluster together.

                  • #10
                    Have you thought about just using the average k-mer coverage from your original, pre-CAP3 assemblies? Even with the CAP3 assemblies you could use the log files to determine which sequences got merged, their lengths, and their average k-mer coverage, then compute a weighted average of the k-mer coverage for each CAP3-merged transcript.

                    Then you could go back through these averages and flag the merged transcripts whose source contigs show a relatively large variance in k-mer coverage. That could be a clue to either isoforms being merged or spurious merging.
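
A rough sketch of that bookkeeping, assuming the merge information (which source contigs went into each CAP3 contig, with their lengths and average k-mer coverage) has already been pulled out of the logs into plain Python objects. Parsing the logs is not shown. Weighting by contig length, the `SourceContig` class, and the `max_cv` cutoff of 0.5 are illustrative assumptions.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SourceContig:
    name: str
    length: int           # contig length in bases
    kmer_coverage: float  # average k-mer coverage reported by the assembler


def merged_coverage(sources):
    """Length-weighted mean and standard deviation of k-mer coverage."""
    total_len = sum(s.length for s in sources)
    if total_len == 0:
        raise ValueError("sources must have a positive total length")
    mean = sum(s.length * s.kmer_coverage for s in sources) / total_len
    var = sum(s.length * (s.kmer_coverage - mean) ** 2 for s in sources) / total_len
    return mean, sqrt(var)


def flag_suspicious(merged, max_cv=0.5):
    """Names of merged contigs whose coverage varies a lot between sources.

    merged: dict mapping merged contig name -> list of SourceContig.
    max_cv: hypothetical cutoff on the coefficient of variation (sd / mean).
    """
    flagged = []
    for name, sources in merged.items():
        mean, sd = merged_coverage(sources)
        if mean > 0 and sd / mean > max_cv:
            flagged.append(name)
    return flagged
```

A flagged contig is only a candidate for closer inspection: a large spread could reflect merged isoforms, a spurious merge, or simply uneven coverage along one transcript.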

                    I thought about using CAP3 with our transcriptome assemblies for things without a reference, but I just didn't trust it. What program are you using to assemble this, btw? I've noticed that while Trinity is very selective and maybe "under-assembles" some things, it's not very redundant, especially compared to the strategy taken by ABySS/trans-abyss.

                    You'll still hit similar downstream problems with estimating abundance, but it might be a little easier if you get rid of the redundancy earlier in the assembly process.

                    • #11
                      Originally posted by Wallysb01
                      I thought about using CAP3 with our transcriptome assemblies for things without a reference, but I just didn't trust it. What program are you using to assemble this, btw? I've noticed that while Trinity is very selective and maybe "under-assembles" some things, it's not very redundant, especially compared to the strategy taken by ABySS/trans-abyss.

                      You'll still hit similar downstream problems with estimating abundance, but it might be a little easier if you get rid of the redundancy earlier in the assembly process.
                      Hi, so far I've been testing Trinity for my assemblies, though I was also thinking of using the Rnnotator pipeline (JGI Galaxy server), which uses Velvet. I'm not sure I understand what you mean by "redundant". I'm new to all this, so would you mind explaining?

                      • #12
                        Originally posted by LizBent
                        Hi, so far I've been testing Trinity for my assemblies, though I was also thinking of using the Rnnotator pipeline (JGI Galaxy server), which uses Velvet. I'm not sure I understand what you mean by "redundant". I'm new to all this, so would you mind explaining?
                        Liz,

                        Differential coverage along your transcript and alternate splicing (plus the usual SNPs/indels) can lead to assemblers making several contigs out of the same gene. Sometimes they are alternate splice forms and sometimes it's just an assembly artifact. Usually assemblers have some sort of merging step to try to reduce this, but again, because of alternate splicing, you don't want to do this as aggressively as you can with genomic DNA.

                        In my experience Trinity does a pretty good job of giving you transcripts that are as complete as possible with minimal redundancy. However, that comes at the cost of completeness of the assembly as a whole. ABySS/trans-abyss does a very good job of just giving you everything, but it's kinda messy. I haven't used Velvet-based programs, so I can't speak to them.

                        If you don't have a reference genome, you're not done after assembly. I think you have to accept some attrition by doing things like extracting ORFs and only keeping long ones (or even only "complete" ones). You can also use a tool like CD-HIT to filter the contigs so that you keep only things that are <XX% similar, retaining the longest contig of each group. Plus, you can run BLAST and keep things that match up well with a closely related species. You could even filter your results to take only the best hit for each "reference" transcript, whatever you determine your reference to be.
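
As a rough illustration of the ORF-based filter, here is a small sketch that finds the longest complete ORF (ATG through the first in-frame stop codon) in all six reading frames and keeps only contigs above a length cutoff. The function names and the `min_orf_len=300` default are assumptions for illustration. It uses the standard stop codons and ignores ambiguous bases, so a dedicated ORF finder would be the better choice for real data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Length in bases (start codon through stop codon) of the longest complete ORF."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon of a new ORF in this frame
                elif start is not None and codon in STOPS:
                    best = max(best, i + 3 - start)
                    start = None  # look for the next ORF after this stop
    return best


def filter_by_orf(contigs, min_orf_len=300):
    """Keep contigs (dict id -> sequence) with a complete ORF of at least min_orf_len bases."""
    return {name: seq for name, seq in contigs.items()
            if longest_orf(seq) >= min_orf_len}
```

Requiring a complete ORF is the stricter of the two choices mentioned in the post: contigs that are truncated at either end are dropped even if they carry a long partial coding region.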

                        It all depends on what you want the output to look like. Would you rather have fewer, more complete, non-redundant contigs at the cost of losing alternate splicing and incomplete transcripts? Or do you want as much as possible, knowing you'll have to deal with redundancy?

                        • #13
                          Originally posted by gfmgfm
                          Hello,

                          We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using different methods, and merged the resulting contigs with CAP3.
                          Now we want to check for differential expression among the 3 samples using the contigs we defined. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
                          As a result, it is hard to map the reads uniquely to our contigs.
                          Any suggestions on how to check for differential expression?
                          We are having a similar experience. We de novo assembled a transcriptome that we are using as a "reference", but when we map reads to it we get so many multi-mapped reads that many transcripts we know are there (from RT-PCR, Northerns, in situs) do not even show up as present in our in silico analysis.

                          We have tried various methods of reducing redundancy in our reference, such as taking only the longest sequence from each cluster and using various contig assembly programs (CAP3 etc.). These help, but they do not seem to solve the problem completely.
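
For readers who want to try the "longest sequence from each cluster" reduction, a minimal sketch follows. It assumes the cluster assignments already exist as a mapping from contig ID to cluster ID (from whatever clustering tool was used). The function name, the input layout, and the choice to keep unclustered contigs as singletons are illustrative assumptions.

```python
def longest_per_cluster(sequences, cluster_of):
    """Keep only the longest sequence in each cluster.

    sequences:  dict mapping contig id -> sequence string.
    cluster_of: dict mapping contig id -> cluster id (hypothetical layout);
                contigs missing from it are treated as their own cluster.
    Ties are resolved in favour of the contig seen first.
    """
    best = {}
    for contig_id, seq in sequences.items():
        cluster = cluster_of.get(contig_id, ("singleton", contig_id))
        current = best.get(cluster)
        if current is None or len(seq) > len(sequences[current]):
            best[cluster] = contig_id
    return {cid: sequences[cid] for cid in best.values()}
```

This throws away every shorter member of a cluster, including genuine alternative isoforms, which is the trade-off discussed earlier in the thread.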

                          Since it has been some time since your original post, I was wondering what your experience has been with this issue.

                          How far did you take your elimination of redundant transcripts?
