
  • #16
    From what I've seen in the RSEM documentation/output examples, I will have to manually apply calculations to normalize triplicate samples, and to determine expression change between samples. I may need help in that area.
    I may still try to create a fake transcripts.gtf file for Cufflinks. A Newbler assembly using just 454 reads usually creates isogroups (genes) of which ~70% have a single transcript. These can easily be entered into the transcripts.gtf file using fake genomic coordinates. I suspect that around 30% of the multi-transcript isogroups are retrotransposons. The rest of the isogroups I can throw out if they are irrelevant genes (such as housekeeping genes), or I can align them to the best-matching rice transcripts (rice is a close relative with a sequenced genome) to determine where exons and overlaps potentially are. Does this sound rational?
    Last edited by blindtiger454; 02-03-2011, 08:42 PM.
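The dummy-coordinate idea for single-transcript isogroups could be sketched roughly as follows. This is only an illustration under stated assumptions: each contig is treated as its own reference sequence with one exon spanning its full length, and the function name, the input tuple layout, and the `newbler` source label are hypothetical, not taken from Cufflinks or Newbler.

```python
def dummy_gtf_lines(contigs, source="newbler"):
    """Yield one GTF exon line per single-transcript contig.

    contigs: iterable of (gene_id, transcript_id, length) tuples.
    This input layout is an assumption made for the sketch.
    """
    for gene_id, transcript_id, length in contigs:
        attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        # GTF has nine tab-separated fields; coordinates are 1-based, inclusive.
        # The contig itself stands in for the chromosome (dummy coordinates).
        yield "\t".join([
            transcript_id,  # seqname: the contig acts as its own reference
            source,
            "exon",
            "1",
            str(length),
            ".",            # score
            "+",            # strand (assumed; unknown for unannotated contigs)
            ".",            # frame
            attrs,
        ])


if __name__ == "__main__":
    example = [("isogroup00001", "isotig00001", 1500)]
    for line in dummy_gtf_lines(example):
        print(line)
```

Multi-transcript isogroups would need real exon boundaries, which is where the rice alignments mentioned above would come in.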

    Comment


    • #17
      I am also somewhat confused about how transcript overlaps will affect the program. I've seen instances where one transcript uses half an exon and another uses the full exon. Then there might be instances where a reverse-strand transcript overlaps a transcript on the other strand and they share a coding region (I'm sure it's extremely rare, maybe in cases of paralogs). I'm not sure how this affects the statistics, possibly with regard to sequence/exon length and number, and in instances where a read maps to more than one transcript. I read somewhere that many programs just throw out reads that map to more than one gene/transcript, or that they can't place unambiguously. RSEM tries to resolve this.
      For now, I am interested in gene expression differences, not transcript-level differences. Once candidate genes that show expression change are singled out, a fine-tuned pipeline can be devised to catch changes in isoform expression among these genes. I cringe saying this, though. I can think of many circumstances where, say in switchgrass, one isoform turns off and another increases expression during drought, and the only difference between the two is one small exon. I'm not sure this would be detected at the gene and/or isoform level in cuffdiff without overlapping transcript coordinates, correct?
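To give a feel for how multi-mapping reads can be shared between transcripts instead of being thrown out, here is a deliberately simplified expectation-maximization sketch. It is not RSEM's actual model: it ignores transcript length, read errors, and fragment orientation, and the function name and input layout are assumptions for illustration only.

```python
def em_abundances(reads, transcripts, iterations=100):
    """Estimate relative transcript abundances from ambiguous read mappings.

    reads: list of sets; each set holds the transcripts a read is compatible with.
    transcripts: list of all transcript names.
    Returns a dict of transcript -> estimated fraction of reads.
    """
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            if total == 0:
                continue
            for t in compatible:
                expected[t] += theta[t] / total
        assigned = sum(expected.values())
        if assigned == 0:
            break
        # M-step: new abundances are the normalized expected read counts.
        theta = {t: expected[t] / assigned for t in transcripts}
    return theta


if __name__ == "__main__":
    reads = [{"A"}, {"A"}, {"B"}, {"A", "B"}]
    print(em_abundances(reads, ["A", "B"]))
```

In the small example, the one ambiguous read ends up split in favor of transcript A because A has more uniquely mapped reads.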

      Comment


      • #18
        Hi, Adarob,

        After using TopHat to map human RNA-Seq reads to the genome and then Cufflinks for the transcript analysis, I checked the transcripts.expr file. There are 263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What cutoff should I use, 1.0 or 0.5? I have no idea.

        Thanks,

        Lahoman

        Comment


        • #19
          I don't think you can filter the results on FPKM with just a cutoff of 1.0 or 0.5. In the Cufflinks paper, they said that transcripts with FPKM >= 15 are considered moderately abundant. So I guess if you want to extract abundant transcripts you can use 15 or more. There is a caveat, though: some transcripts have low expression but are still meaningful in human.
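To see how many transcripts survive different cutoffs before committing to one, a small filter along these lines may help. It is a sketch that assumes the expression file is tab-delimited with a header row containing a column named `FPKM`; check the header of your own transcripts.expr file, since the column name here is an assumption.

```python
import csv


def filter_by_fpkm(lines, min_fpkm, column="FPKM"):
    """Return the rows whose FPKM is at least min_fpkm.

    lines: an open file or any iterable of tab-delimited text lines
    with a header row. The column name is an assumption.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if float(row[column]) >= min_fpkm]


if __name__ == "__main__":
    example = [
        "trans_id\tFPKM\n",
        "t1\t0.2\n",
        "t2\t1.5\n",
        "t3\t20.0\n",
    ]
    for cutoff in (0.5, 1.0, 15.0):
        kept = filter_by_fpkm(example, cutoff)
        print(cutoff, len(kept))
```

With a real file, pass the open file handle instead of the example list, e.g. `filter_by_fpkm(open("transcripts.expr"), 1.0)`.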

          Comment


          • #20
            Originally posted by adarob View Post
            Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.
            Hey adarob,
            I looked into RSEM, but I didn't find an option for checking differential expression between samples without a reference.
            Can you give an example?

            thanks

            Comment


            • #21
              The examples are in the rsem-prepare-reference section of the web documentation at

              There is an example where the dataset only consists of ESTs. After formatting the reference data set, you would issue the calculate commands, examples found at


              In our case, we are using plant sequences. A simple example of formatting and expression calculation would be as follows:

              rsem-prepare-reference --no-polyA --bowtie-path /home/bowtie plantESTs.fna plantTranscripts

              rsem-calculate-expression -p 4 --phred64-quals --bowtie-path /home/bowtie /home/solexa/controlLane1.fastq plantTranscripts controlLane1Output

              Hope this helps! Then you just parse the expression values out of the output files, put them into a matrix/table, and use that as input to the Bioconductor package of your choice. We are currently using DESeq.
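The parsing step could look something like the sketch below. It assumes each RSEM results file is tab-delimited with a header row and columns named `gene_id` and `expected_count`; those names match recent RSEM releases as far as I know, but older versions used a different layout, so treat them as assumptions and check your own output. Counts are rounded because DESeq expects integers.

```python
import csv


def read_counts(lines, id_column="gene_id", count_column="expected_count"):
    """Read one results file into {id: count}. Column names are assumptions."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column]: float(row[count_column]) for row in reader}


def build_matrix(samples):
    """Combine per-sample counts into a gene-by-sample table.

    samples: dict of sample name -> {gene_id: count}.
    Returns (gene_ids, sample_names, rows); genes missing from a sample get 0.
    """
    names = list(samples)
    gene_ids = sorted(set().union(*(s.keys() for s in samples.values())))
    rows = [[round(samples[n].get(g, 0.0)) for n in names] for g in gene_ids]
    return gene_ids, names, rows


if __name__ == "__main__":
    control = read_counts(["gene_id\texpected_count\n", "g1\t10.0\n", "g2\t3.7\n"])
    treated = read_counts(["gene_id\texpected_count\n", "g1\t25.2\n"])
    gene_ids, names, rows = build_matrix({"control": control, "treated": treated})
    print("\t".join(["gene_id"] + names))
    for gene, row in zip(gene_ids, rows):
        print("\t".join([gene] + [str(v) for v in row]))
```

The printed table can be saved as a tab-delimited file and read into R for DESeq or edgeR.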

              Comment


              • #22
                Originally posted by peromhc View Post
                I have been testing an approach where I use Tophat to align reads back to contigs (treating them like many, many small chromosomes, in essence) and then using Cufflinks-compare-diff to look at differential expression. This is in a novel transcriptome, i.e. no reference. Seems to work OK enough, although there may be some issue with the statistical detection of differences that I have yet to explore.
                Using this approach have you seen any issues with cufflinks assembly of transcripts from your mapped reads? We tried the same approach and found an inverse relationship between mapping coverage and cufflinks assembly. Still looking into why.

                Comment


                • #23
                  Using Cufflinks-cuffdiff in this situation, I think, unnecessarily complicates matters, particularly since Cufflinks looks for novel isoforms, but with a de novo assembled transcriptome you don't have genomic coordinates to aid in this.

                  If all you want is to know differential expression, then why not align reads back to your contigs using any aligner (bowtie, bwa, etc.)? Extract the number of reads mapping to each contig, and then use the raw counts (for each contig) to find differential expression between samples using DESeq or edgeR.
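Extracting per-contig read counts from an alignment can be sketched as below. This assumes the aligner output is in SAM text format and counts one hit per primary, mapped alignment record; the function name is hypothetical, and for paired-end data each mate would be counted separately unless you adapt it.

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count primary mapped alignments per reference sequence in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        counts[fields[2]] += 1  # field 3 of a SAM record is the reference name
    return counts


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:contig1\tLN:1000\n",
        "r1\t0\tcontig1\t10\t255\t36M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t16\tcontig1\t50\t255\t36M\t*\t0\t0\tACGT\tIIII\n",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
        "r4\t256\tcontig2\t5\t0\t36M\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(count_reads_per_contig(sam))
```

Running this once per sample gives the raw count columns that DESeq or edgeR take as input.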

                  Comment


                  • #24
                    Originally posted by chadn737 View Post
                    If all you want is to know differential expression, then why not align reads back to your contigs using any aligner (bowtie, bwa, etc.)? Extract the number of reads mapping to each contig, and then use the raw counts (for each contig) to find differential expression between samples using DESeq or edgeR.
                    Our "reference transcriptome" was assembled using reads combined across all time points of development. We wanted to be able to map reads from specific time points to the reference. Along with the expression information, we wanted to try to construct transcripts to potentially find isoform variants at different time points. Maybe this was not a good idea on our part?

                    We have since moved on to Bowtie -> RSEM. The data seem to make more sense now that we are not trying to assemble transcripts from mapped reads. However, I feel it would allay some of my doubts if I could see that mapped fragments could be assembled into decent transcripts.

                    Comment


                    • #25
                      Originally posted by tboothby View Post
                      Our "reference transcriptome" was assembled using reads combined across all time points of development. We wanted to be able to map reads from specific time points to the reference. Along with the expression information, we wanted to try to construct transcripts to potentially find isoform variants at different time points. Maybe this was not a good idea on our part?

                      We have since moved on to Bowtie -> RSEM. The data seem to make more sense now that we are not trying to assemble transcripts from mapped reads. However, I feel it would allay some of my doubts if I could see that mapped fragments could be assembled into decent transcripts.
                      Do you have a reference genome? I'm assuming not.

                      Comment


                      • #26
                        Originally posted by chadn737 View Post
                        Do you have a reference genome? I'm assuming not.
                        Unfortunately, we don't.

                        Comment
