As there is no consensus in the literature on this, I thought it might be worth trying to generate some discussion here. I am interested in finding quantitative changes in alternative splicing (AS) across different genotypes or treatments. I am only interested in events, not isoforms. Essentially there are three main areas to deal with:
1) What to count. Split reads are tempting but make up a very small portion of the total aligned reads, so using only them loses a lot of information and may suffer from more stochastic coverage. Read counts in exons or introns have been used, but what about reads that fall into both an exon and an intron (thinking about intron retention in plants)? If you remove these, detection of small sites, like alternative 5' or 3' splice sites, may suffer. Has anyone thought of using a per-nucleotide (nt) count approach?
2) How to normalize. How should one correct for different expression levels of genes (if you are only interested in finding differential AS)? How should one deal with different coverage across libraries? I've been comparing the number of reads in the exon/intron of interest to the total number of reads in the gene.
3) How to compare. What stats are people using? ANOVA has been used by Harr and Turner [1]. What about a binomial or negative binomial model? Would it be reasonable to use the DESeq package for this (give it read counts for all the exons/introns of one gene and find the ones that are differentially expressed, as opposed to giving it all the genes in a genome)?
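To make the three points above concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the function names, the half-open coordinate convention and the example counts are made up, and Fisher's exact test is just one simple stand-in for the "binomial" idea, not what Harr and Turner or DESeq do. It shows (1) counting nucleotides per feature so that a read spanning an exon/intron boundary contributes to both, (2) normalizing a feature's reads by the gene total, and (3) comparing feature vs. rest-of-gene reads between two conditions.

```python
from math import comb


def nt_overlap(blocks, feature):
    """Count aligned nucleotides of one read that fall inside a feature.

    blocks  : list of (start, end) half-open aligned blocks of the read
    feature : (start, end) half-open interval of an exon or intron
    """
    f_start, f_end = feature
    return sum(max(0, min(end, f_end) - max(start, f_start))
               for start, end in blocks)


def feature_fraction(feature_count, gene_count):
    """Fraction of a gene's reads that fall in the feature of interest."""
    if gene_count == 0:
        raise ValueError("gene has no reads")
    return feature_count / gene_count


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions; columns are (reads in feature, reads in rest of gene).
    Sums the probabilities of all tables no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x feature reads in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# A read aligned across an exon/intron boundary (hypothetical coordinates)
read_blocks = [(100, 150)]
exon, intron = (90, 120), (120, 200)
exon_nt = nt_overlap(read_blocks, exon)
intron_nt = nt_overlap(read_blocks, intron)

# Hypothetical read counts for one intron in two conditions
intron_a, gene_a = 30, 200
intron_b, gene_b = 10, 200
frac_a = feature_fraction(intron_a, gene_a)
frac_b = feature_fraction(intron_b, gene_b)
p_value = fisher_exact_two_sided(intron_a, gene_a - intron_a,
                                 intron_b, gene_b - intron_b)
print(exon_nt, intron_nt, frac_a, frac_b, p_value)
```

Two caveats with this sketch: nucleotide counts from the same read are not independent, so I would only feed read counts (not nt counts) into the test, and a test on pooled counts ignores variation between biological replicates, which is exactly where a negative binomial model might be the better choice.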
I know that is a lot of questions for one post, but you (might) know the fisherman's saying: "Good things come to those who bait".
1. Harr B, Turner LM. Genome-wide analysis of alternative splicing evolution among Mus subspecies. Molecular Ecology. 2010;19 Suppl 1:228-39. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20331782.