In some of my recent tests using Cufflinks for transcript/gene expression estimates with a GTF reference, I've seen an odd correlation between the number of fragments and the number of transcripts/genes listed as "FAIL". I've seen this in two situations: 1) the same sample using the first 10 million, 20 million, 30 million, etc. fragments, and 2) a set of new samples with various total fragment counts. What I've seen, and what can be seen in the figure, is that the number of "FAIL" transcripts/genes increases as the number of fragments increases. To me this seems counterintuitive and suggests some kind of error. The other worrisome issue is that, among genes with multiple known transcript variants, the "FAIL" transcripts/genes are biased toward highly expressed genes compared to the "OK" ones.
There was one other post about this issue, but I was wondering whether others have noticed this behavior. Is it version specific?
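In case anyone wants to compare against their own runs, here is a minimal sketch of how the status counts could be tallied from a Cufflinks tracking file. It assumes the tab-delimited `genes.fpkm_tracking` / `isoforms.fpkm_tracking` layout with an `FPKM_status` column; the function name `count_fpkm_status` and the three example rows are made up for illustration and are not taken from my data.

```python
import csv
import io
from collections import Counter


def count_fpkm_status(handle):
    """Tally the FPKM_status values (OK, FAIL, LOWDATA, HIDATA) in a tracking file.

    `handle` is any open text file object for a tab-delimited file whose
    header row contains an FPKM_status column (assumed layout).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["FPKM_status"] for row in reader)


# Made-up example rows, only to show the expected layout.
example = (
    "tracking_id\tFPKM\tFPKM_status\n"
    "geneA\t12.5\tOK\n"
    "geneB\t0\tFAIL\n"
    "geneC\t0\tFAIL\n"
)
counts = count_fpkm_status(io.StringIO(example))
print(counts["OK"], counts["FAIL"])

# For a real run, something like (path is hypothetical):
# with open("cufflinks_out/genes.fpkm_tracking") as fh:
#     print(count_fpkm_status(fh))
```

Running this on the output of each subsample (10 million, 20 million, ...) and comparing the "FAIL" counts would be one way to check whether the same trend shows up elsewhere.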