**blindtiger454** · 02-03-2011, 08:39 PM

From what I've seen in the RSEM documentation and output examples, I will have to apply the calculations myself to normalize triplicate samples and to determine expression changes between samples. I may need help in that area.
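The thread doesn't spell out which calculations are needed. One common choice, and the normalization described for DESeq (mentioned later in this thread), is median-of-ratios scaling followed by a log2 fold change between condition means. The Python sketch below assumes a hypothetical input layout: a dict of per-sample raw count lists with genes in the same order in every sample. It is an illustration only, not a substitute for the statistical testing a package like DESeq performs.

```python
import math
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factors (the normalization described for DESeq).

    counts: dict mapping sample name -> list of raw counts, one per gene,
    with genes in the same order in every sample (hypothetical layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene across all samples; genes with a zero
    # count in any sample are skipped because their geometric mean is zero.
    log_geo = []
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        log_geo.append(mean(math.log(v) for v in vals) if min(vals) > 0 else None)
    factors = {}
    for s in samples:
        # A sample's factor is the median ratio of its counts to the
        # per-gene geometric means (computed in log space).
        ratios = [math.log(counts[s][g]) - lg
                  for g, lg in enumerate(log_geo) if lg is not None]
        factors[s] = math.exp(median(ratios))
    return factors


def log2_fold_changes(counts, group_a, group_b, pseudo=0.5):
    """Per gene: log2((mean normalized count in group_b + pseudo) /
    (mean normalized count in group_a + pseudo))."""
    sf = size_factors(counts)
    norm = {s: [c / sf[s] for c in counts[s]] for s in counts}
    n_genes = len(next(iter(counts.values())))
    return [
        math.log2((mean(norm[s][g] for s in group_b) + pseudo)
                  / (mean(norm[s][g] for s in group_a) + pseudo))
        for g in range(n_genes)
    ]
```

For example, if every count in one sample is exactly twice the corresponding count in another, their size factors differ by a factor of 2 and the normalized counts come out equal. The small pseudocount only keeps genes with zero counts from producing infinite fold changes.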
I may still try to create a fake transcripts.gtf file for Cufflinks. A Newbler assembly using only 454 reads usually produces isogroups (genes), ~70% of which contain a single transcript. These can easily be entered into the transcripts.gtf file using fake genomic coordinates. I also suspect that around 30% of the multi-transcript isogroups are retrotransposons. The rest of the isogroups I can either throw out if they are irrelevant genes (such as housekeeping genes), or align to the best-matching rice transcripts (rice is a close relative whose genome has been sequenced) to determine where exons and overlaps potentially are. Does this sound rational?
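A minimal sketch of the "fake coordinates" idea, assuming each single-transcript isogroup is written as its own reference sequence, so the reads must be aligned to the assembled transcripts rather than to a genome. The IDs, lengths, and the `newbler` source label are hypothetical; the fields follow the standard nine-column GTF layout with `gene_id` and `transcript_id` attributes.

```python
def fake_gtf_lines(transcripts, source="newbler"):
    """Build one GTF exon line per assembled transcript.

    transcripts: iterable of (transcript_id, gene_id, length_bp) tuples.
    Each transcript is treated as its own "chromosome" (seqname =
    transcript_id) spanning 1..length_bp on the + strand.
    """
    lines = []
    for transcript_id, gene_id, length_bp in transcripts:
        attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        fields = [transcript_id, source, "exon", "1", str(length_bp),
                  ".", "+", ".", attributes]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical Newbler-style IDs and lengths, for illustration only.
    example = [("isotig00001", "isogroup00001", 1520),
               ("isotig00002", "isogroup00002", 874)]
    print("\n".join(fake_gtf_lines(example)))
```

Because every transcript sits on its own seqname here, Cufflinks has no way to see exons shared between isoforms of a multi-transcript isogroup; those would still need real or aligned coordinates.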

**blindtiger454** · 02-04-2011, 11:48 AM

I am also somewhat confused about how transcript overlaps will affect the program. I've seen instances where one transcript uses half of an exon and another transcript uses the full exon. There might also be instances where a reverse-strand transcript overlaps a transcript on the other strand and the two share a coding region (I'm sure this is extremely rare, maybe in cases of paralogs). I'm not sure how this affects the statistics, possibly with regard to sequence/exon length and number, and to cases where a read maps to more than one transcript. I read somewhere that many programs simply throw out reads that map to more than one gene/transcript, or reads whose location cannot be resolved. RSEM tries to resolve this.
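To make the "RSEM tries to resolve this" point concrete, here is a heavily simplified expectation-maximization (EM) sketch: each ambiguous read is shared among the transcripts it hits in proportion to their current abundance estimates, and the estimates are then recomputed from those shares. This is an assumption-laden toy, not RSEM's model, which also accounts for transcript length, the fragment length distribution, and read quality.

```python
def em_abundances(read_hits, n_iter=200):
    """Estimate transcript proportions from reads that may hit several transcripts.

    read_hits: list of sets, one per read, naming the transcripts the read
    aligns to equally well. Simplified toy: lengths and qualities are ignored.
    """
    transcripts = sorted(set().union(*read_hits))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its transcripts in proportion
        # to the current abundance estimates.
        expected = dict.fromkeys(transcripts, 0.0)
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            if total == 0:
                continue
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: the new proportions are the expected shares of all reads.
        theta = {t: expected[t] / len(read_hits) for t in transcripts}
    return theta
```

With 6 reads unique to transcript A, 2 unique to B, and 4 that hit both, the estimates converge to 0.75 and 0.25, so the 4 shared reads are split 3:1 (9 reads to A, 3 to B) instead of being discarded.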
For now, I am interested in gene expression differences, not transcript-level differences. Once the candidate genes that show expression changes are singled out, a fine-tuned pipeline can be devised to catch changes in isoform expression among these genes. I cringe saying this, though. I can think of many circumstances, say in switchgrass, where one isoform turns off and another increases expression during drought, and the only difference between the two is one small exon. I'm not sure this would be detected at the gene and/or isoform level in Cuffdiff without overlapping transcript coordinates, correct?
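The concern about isoform switching can be shown with a little arithmetic. In this sketch (hypothetical gene and isoform names and counts), summing isoform counts to the gene level gives the same total in both conditions while the isoform fractions flip, so a purely gene-level comparison would see no change.

```python
from collections import defaultdict


def gene_totals(isoform_counts, isoform_to_gene):
    """Sum isoform-level counts into gene-level counts."""
    totals = defaultdict(float)
    for isoform, count in isoform_counts.items():
        totals[isoform_to_gene[isoform]] += count
    return dict(totals)


def isoform_fractions(isoform_counts, isoform_to_gene):
    """Fraction of each gene's total contributed by each isoform."""
    totals = gene_totals(isoform_counts, isoform_to_gene)
    fractions = {}
    for isoform, count in isoform_counts.items():
        total = totals[isoform_to_gene[isoform]]
        fractions[isoform] = count / total if total else 0.0
    return fractions


if __name__ == "__main__":
    # Hypothetical counts: isoform 1 turns off and isoform 2 turns on.
    gene_of = {"geneX.1": "geneX", "geneX.2": "geneX"}
    control = {"geneX.1": 90, "geneX.2": 10}
    drought = {"geneX.1": 10, "geneX.2": 90}
    print(gene_totals(control, gene_of), gene_totals(drought, gene_of))
    print(isoform_fractions(control, gene_of), isoform_fractions(drought, gene_of))
```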

**lahoman** · 02-08-2011, 12:50 PM

Hi, Adarob,

After using TopHat to map human RNA-Seq reads to the genome and then Cufflinks for the transcript analysis, I checked the transcripts.expr file. There are 263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What cutoff should I use, 1.0 or 0.5? I have no idea.

Thanks,

Lahoman

**whfwind** · 03-02-2011, 04:25 AM

I don't think you can filter the results on FPKM simply by using 1.0 or 0.5. In the Cufflinks paper, they say that transcripts with FPKM >= 15 are considered moderately abundant. So if you want to extract abundant transcripts, you could use a cutoff of 15 or more. There is an exception, though: some transcripts have low expression but are still meaningful in human.
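For reference, FPKM is fragments per kilobase of transcript per million mapped fragments, and a cutoff is just a threshold on that value. Cufflinks reports FPKM directly; the formula below is only the basic definition (Cufflinks' own estimate may apply further corrections, such as an effective transcript length), and the transcript IDs and values in the usage note are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments).
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def filter_by_fpkm(fpkm_by_transcript, min_fpkm):
    """Keep transcripts whose FPKM is at least min_fpkm (a user-chosen cutoff)."""
    return {t: v for t, v in fpkm_by_transcript.items() if v >= min_fpkm}
```

For example, 15 fragments on a 1,000 bp transcript in a library of 1,000,000 mapped fragments gives an FPKM of 15.0. Filtering `{"t1": 0.3, "t2": 2.0, "t3": 20.0}` at 1.0 keeps `t2` and `t3`; at 15 it keeps only `t3`.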

**papori** · 05-30-2011, 03:33 AM

Originally posted by adarob

Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.

Hey adarob,
I looked into RSEM, but I didn't find an option for checking differential expression between samples without a reference.
Can you give an example?

Thanks

**blindtiger454** · 05-30-2011, 06:31 PM

The examples are in the rsem-prepare-reference section of the web documentation at

rsem-prepare-reference

http://deweylab.biostat.wisc.edu/rsem/rsem-prepare-reference.html

There is an example where the dataset only consists of ESTs. After formatting the reference data set, you would issue the calculate commands, examples found at

rsem-calculate-expression

http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html

In our case, we are using plant sequences. A simple example of formatting the reference and calculating expression would be as follows:

rsem-prepare-reference --no-polyA --bowtie-path /home/bowtie plantESTs.fna plantTranscripts

rsem-calculate-expression -p 4 --phred64-quals --bowtie-path /home/bowtie /home/solexa/controlLane1.fastq plantTranscripts controlLane1Output

Hope this helps! Then you just parse the expression values out of the output files, put them into a matrix/table, and use that as input to the Bioconductor package of your choice. We are currently using DESeq.
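A minimal sketch of the parse-and-merge step, assuming each sample's output has been reduced to a tab-delimited file with a header line, an ID column, and a value column. The column indices are parameters because the exact RSEM output columns depend on the version (check the documentation for yours); the sample names and paths are hypothetical. IDs missing from a sample are filled with 0. DESeq expects integer counts, so RSEM's expected counts would typically be rounded before import.

```python
import csv


def read_expression(path, id_col=0, value_col=1, has_header=True):
    """Read one sample's {id: value} from a tab-delimited file."""
    values = {}
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header line
        for row in reader:
            if row:
                values[row[id_col]] = float(row[value_col])
    return values


def build_matrix(sample_files, **kwargs):
    """Merge per-sample files into (sample names, rows of [id, v1, v2, ...])."""
    per_sample = {s: read_expression(p, **kwargs) for s, p in sample_files.items()}
    samples = list(sample_files)
    ids = sorted(set().union(*per_sample.values()))
    rows = [[i] + [per_sample[s].get(i, 0.0) for s in samples] for i in ids]
    return samples, rows


def write_matrix(path, samples, rows):
    """Write the merged matrix as a tab-delimited table for import into R."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["id"] + samples)
        writer.writerows(rows)
```

Usage might look like `samples, rows = build_matrix({"controlLane1": "control1.tsv", "droughtLane1": "drought1.tsv"})` followed by `write_matrix("counts.tsv", samples, rows)`, with the file names standing in for your own.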

**tboothby** · 02-03-2012, 08:50 AM

Originally posted by peromhc

I have been testing an approach where I use TopHat to align reads back to contigs (essentially treating them like many small chromosomes) and then use Cufflinks, Cuffcompare, and Cuffdiff to look at differential expression. This is in a novel transcriptome, i.e. with no reference. It seems to work OK, although there may be some issues with the statistical detection of differences that I have yet to explore.

Using this approach, have you seen any issues with Cufflinks assembly of transcripts from your mapped reads? We tried the same approach and found an inverse relationship between mapping coverage and Cufflinks assembly. We are still looking into why.

**chadn737** · 02-03-2012, 09:15 AM

I think using Cufflinks/Cuffdiff in this situation unnecessarily complicates matters, particularly since Cufflinks looks for novel isoforms, and with a de novo assembled transcriptome you don't have genomic coordinates to aid in this.

If all you want is to know differential expression, then why not align reads back to your contigs using any aligner (Bowtie, BWA, etc.), extract the number of reads mapping to each contig, and then use the raw counts for each contig to test for differential expression between conditions with DESeq or edgeR?
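The "extract the number of reads mapping to each contig" step can be sketched by tallying the reference name (RNAME, column 3) of each primary alignment in a SAM file. This sketch assumes single-end reads: it skips header lines and records flagged as unmapped (0x4), secondary (0x100), or supplementary (0x800), so each read is counted at most once; for paired-end data it would count each mate separately. On an indexed BAM, `samtools idxstats` reports a similar per-reference tally of mapped reads.

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count primary, mapped alignments per reference sequence in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[fields[2]] += 1
    return counts
```

Usage on a file would be, for example, `with open("aligned.sam") as fh: counts = count_reads_per_contig(fh)` (hypothetical file name); the resulting per-contig counts, one column per sample, are the raw counts DESeq or edgeR take as input.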

**tboothby** · 02-03-2012, 09:32 AM

Originally posted by chadn737

If all you want is to know differential expression, then why not align reads back to your contigs using any aligner (Bowtie, BWA, etc.), extract the number of reads mapping to each contig, and then use the raw counts for each contig to test for differential expression between conditions with DESeq or edgeR?

Our "reference transcriptome" was assembled using reads combined across all developmental time points. We wanted to map reads from specific time points to the reference and, along with the expression information, try to construct transcripts to potentially find isoform variants at different time points. Maybe this was not a good idea on our part?

We have since moved on to Bowtie -> RSEM. The data seem to make more sense now that we are not trying to assemble transcripts from mapped reads. However, it would allay some of my doubts if I could see that the mapped fragments could be assembled into decent transcripts.

**chadn737** · 02-03-2012, 10:00 AM

Originally posted by tboothby

Our "reference transcriptome" was assembled using reads combined across all developmental time points. We wanted to map reads from specific time points to the reference and, along with the expression information, try to construct transcripts to potentially find isoform variants at different time points. Maybe this was not a good idea on our part?

We have since moved on to Bowtie -> RSEM. The data seem to make more sense now that we are not trying to assemble transcripts from mapped reads. However, it would allay some of my doubts if I could see that the mapped fragments could be assembled into decent transcripts.

Do you have a reference genome? I'm assuming not.

**tboothby** · 02-03-2012, 10:24 AM

Originally posted by chadn737

Do you have a reference genome? I'm assuming not.

Unfortunately, we don't.
