Hello all,
I'm wondering if anyone out there could demystify the inner workings of eXpress and edgeR for me, or suggest a better way to do this. Basically, I'm looking at strand-specific RNA-seq data and trying to identify cases where antisense transcripts might be involved in gene regulation. I'm looking across species and across tissues, and these are non-model organisms, so I'm having to use a reference transcriptome for all of the mapping. Isolating the reads that map to the + and - strands is easy enough, but what to do next is where I've started running around in circles.
What I'm currently testing is using eXpress to generate read counts for each transcript and then comparing those counts with something like edgeR. In the back of my head, though, I'm a bit worried that I'm somehow violating some correction/normalization factor in edgeR or using the wrong statistical test, since this is basically a test of the distribution of reads within some libraries as well as between libraries. It seems like as much of a contingency table or binomial test case as anything else, and I think that's what edgeR is doing, but I'm a little unsure whether it's 100% appropriate.
Other than that, there's the issue of whether I'm using the right "count" variable; perhaps the bias-corrected counts from eXpress would be a more accurate count estimate. Any thoughts there? Usually, those methods say to use the raw counts, but if you know there is bias in your mapping, shouldn't you use the "unbiased" counts instead?
Finally, what about the fact that transcript coverage may be very different with sense/antisense gene regulation? By this I mean that lncRNAs/miRNAs might match only a portion of the targeted transcript. Any thoughts on a good way to identify that kind of pattern? I'm dealing with hundreds of thousands of predicted transcripts here, so keep that in mind as well (i.e., visualizing each transcript in IGV is a no-go).
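To make concrete what I mean by a contingency table test, here is a minimal Python sketch of a Fisher's exact test on a single transcript's sense and antisense counts in two libraries. The function name and the example counts are made up for illustration, and this is not what eXpress or edgeR does internally; it also ignores replicates and library-size normalization, so it only shows the shape of the comparison I'm asking about.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for one transcript:
        a = sense reads in library A,  b = antisense reads in library A
        c = sense reads in library B,  d = antisense reads in library B
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x sense reads in library A,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Two-sided p-value: sum every table that is no more likely than the
    # observed one (small relative tolerance for floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Made-up counts for one transcript, purely illustrative.
    p_value = fisher_exact_2x2(90, 10, 10, 90)
    print(f"p = {p_value:.3g}")
```

Run per transcript, this would also need a multiple-testing correction across all the predicted transcripts, which is part of why I'm unsure it is the right framing.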
Thanks in advance for any thoughts and/or insight.