SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
RNA-Seq: DEB: A web interface for RNA-seq digital gene expression analysis. Newsbot! Literature Watch 6 02-06-2012 05:55 AM
RNA-Seq: Improving RNA-Seq expression estimates by correcting for fragment bias. Newsbot! Literature Watch 0 03-18-2011 03:00 AM
RNA-Seq: Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Expe Newsbot! Literature Watch 0 03-10-2011 04:00 AM
RNA-Seq: RNA-Seq Analysis of Gene Expression and Alternative Splicing by Double-Rando Newsbot! Literature Watch 0 03-03-2011 03:00 AM
RNA-Seq: From RNA-seq reads to differential expression results. Newsbot! Literature Watch 0 12-24-2010 03:13 AM

Reply
 
Thread Tools
Old 11-16-2010, 11:05 PM   #1
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219
Default Expression in RNA-seq

Hi everyone,

We recently sent some samples for RNA-sequencing (Solid 50bp strand-specific) and we realized a gene that we consider important is not covered with sequencing reads. There are some reads aligning to intronic sequences and one to exonic but it seems that this happened by accident. We analyzed the alignments with Tophat and Cufflinks and not surprisingly the FPKM for this gene is 0.
Does anyone know the chances that this can happen by technical difficulties (it's tumor tissue and normal tissue of that organ should have that gene expressed quite highly according to UCSC). Most other genes we've looked at had quite a high coverage.
We got ~100M reads.
Any suggestions?

Best regards
ulz_peter is offline   Reply With Quote
Old 11-17-2010, 07:52 AM   #2
mgogol
Senior Member
 
Location: Kansas City

Join Date: Mar 2008
Posts: 197
Default

The reads that would have aligned there could have been thrown out because they mapped too many places in the genome, is that possible?

You could test this by cutting up the gene and trying to align it to the genome. Tophat by default will throw out reads that map > 40 places.

If you think it's expressed, you could test it with qPCR too.
mgogol is offline   Reply With Quote
Old 11-17-2010, 12:53 PM   #3
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

I agree with mgogol. One possible explanation is mapping, although with 50mers you would think some part of the gene would be mappable. How big is this gene? How many exons?

Try a BLAT of some of the exons of this gene against the genome (do they hit everywhere?). You could also try mapping reads directly against a database of transcripts to see whether there are any matching reads (this would again suggest that the reason they are not coming through in your Tophat analysis is that they are ambiguous in the genome...). It would also allow you to identify a subset of reads to try different mapping approaches on and maybe eliminate the possibility of some technical issue(s) with your tophat run...
malachig is offline   Reply With Quote
Old 11-17-2010, 02:22 PM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Another possibility could be small exons, and then junctions come into play. I am not sure is tophat/cufflinks take a de novo approach that overcomes this?
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 11-17-2010, 02:32 PM   #5
mnkyboy
Member
 
Location: Seattle, WA

Join Date: Mar 2009
Posts: 87
Default

We have seen trimming down to 25 with bowtie and tophat can help with this.
mnkyboy is offline   Reply With Quote
Old 11-19-2010, 09:47 AM   #6
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Cool

One easy way to check if it is a mappability issue it to go to UCSC for hg19,
click the Mapability link under "Mapping and Sequence Tracks"
select the 50bp option,
look at your gene.


Otherwise, maybe you just found some biology in your experiment.
RockChalkJayhawk is offline   Reply With Quote
Old 11-19-2010, 11:41 AM   #7
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Tophat/cufflinks does try to predict novel exons and junctions. I'm not sure how well it handles small exons... Mapping small reads to the genome and then trying to find small exons is a challenging problem. A read that overlaps a small exon may require an alignment into three or more short blocks with potentially large gaps between (e.g. <exon>-intron-<exon>-intron-<exon>). Doing this with full length cDNAs (never mind short reads) can be a difficult. I suspect that most methods in use right now have fairly low sensitivity for short exons. One strategy to at least capture the short exons from known transcripts is to integrate alignment to transcripts and the genome (as advocated in Griffith et al.). This way, reads that hit short exons can be aligned to a transcript sequence without gaps which is much easier. This of course does not work for novel transcripts and relies on the accuracy and completion of transcript annotations for the species being analyzed.
malachig is offline   Reply With Quote
Old 11-19-2010, 12:12 PM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Exactly, and I think doing alignments to genome and transcripts raises an important question of which one to get priority. There would certainly be reads well aligned to both, then I suppose transcript alignment would get preference.

Similar issue comes when using a separate reads junction database or contamination database.. given equal alignments of a read to multiple datasets, which one should get preference
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 11-19-2010, 12:17 PM   #9
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Regarding the comment that the observation may simply be due to biology. Good point! Once you convince yourself that the observation is not due to an artifact of the analysis (mapability, small exons, etc.) you should definitely consider this.

One of the benefits of RNA-seq over microarrays (IMHO) is the excellent signal-to-noise ratio. I have observed many cases where a gene appears to be turned off in one condition and on in another condition (tissue type, treatment, etc.). Even in very deep RNA-seq libraries, i have been amazed to see just how few reads are reported for the 'off' state. And since in this case we have RNA-seq libraries (analyzed by the same method) for alternate conditions that do get covered by reads we can reassure ourselves that the lack of coverage is not due to some artifact of the analysis.

For example, consider this data set (ALEXA-seq by Griffith et al) consisting of four cell types (normal luminal breast, normal myepithelial breast, hESCs, and vHMECs). These libraries have ~150 million paired-end 75-mers each. The quality of these libraries was very high. After performing differential expression analysis we can find many examples where a particular gene has 0 reads (or just a few) in one of these conditions, and thousands in one or more of the others. And we can find 'off' genes for any of the four cell types. For example, CCL2 has 0 reads in the vHMECs library and 124,664 in the myoepithelial breast library. Similarly, COL17A1 has 776,907 reads in the vHMECs library but only 142 in the luminal epithelial breast tissue. You can explore further examples in this list of DE genes.
malachig is offline   Reply With Quote
Old 11-19-2010, 12:24 PM   #10
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Yes. A paired or comparison analysis really reduces a lot of sequencing and mapping biases, so differential expression comparisons are more or less accurate.

I really wish to give this trans-Abyss tool a try, seems like a one-stop solution to rna-seq analysis, but has a huge list of pre-reqs and config files to figure out
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:36 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO