Hi All
I got some yeast RNA-IP-seq data from two conditions to analyze. I would like to do a differential expression analysis by comparing the counts of the IP samples directly (without using the Input samples).
I used HTSeq to generate count tables, and then continued with DESeq for the DE analysis.
It appears that 70-80% of the reads are ambiguous (i.e., they could have been assigned to more than one feature).
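For anyone who wants to reproduce the number: htseq-count appends special counters (`__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`) after the per-gene lines. A minimal sketch, assuming the default two-column tab-separated output, that takes the ambiguous share relative to reads that passed the quality/uniqueness filters (gene counts + `__no_feature` + `__ambiguous`). That denominator is my own choice, not something HTSeq reports.

```python
def ambiguous_fraction(counts_text: str) -> float:
    """Share of filter-passing reads that htseq-count labelled __ambiguous."""
    gene_total = 0
    special = {}
    for line in counts_text.strip().splitlines():
        name, value = line.rsplit("\t", 1)
        if name.startswith("__"):
            # Special counters appended by htseq-count at the end of the table.
            special[name] = int(value)
        else:
            gene_total += int(value)
    # Denominator (an assumption): reads assigned to a gene, to no feature, or ambiguous.
    considered = gene_total + special.get("__no_feature", 0) + special.get("__ambiguous", 0)
    return special.get("__ambiguous", 0) / considered if considered else 0.0
```

Running it on both the IP and the Input count tables gives directly comparable numbers.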
Has anyone seen this in yeast?
The sequencing is single-end and not strand-specific.
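Why strandedness matters here: in HTSeq's `union` mode a read is counted only if every feature it overlaps belongs to a single gene, and with non-stranded counting (`-s no`) features on both strands are considered. A read inside an overlap between two genes on opposite strands therefore touches both and ends up in `__ambiguous`. Below is a simplified sketch of that rule (ungapped reads, half-open coordinates); it is my own illustration, not HTSeq's code, and the gene names are hypothetical.

```python
from typing import List, Tuple

# (gene_id, start, end, strand) with half-open [start, end) coordinates
Feature = Tuple[str, int, int, str]


def assign_union(read_start: int, read_end: int, read_strand: str,
                 features: List[Feature], stranded: bool) -> str:
    """Simplified union-mode assignment for one ungapped read."""
    hits = set()
    for gene_id, start, end, strand in features:
        overlaps = start < read_end and read_start < end
        # Non-stranded counting ignores the feature strand entirely.
        strand_ok = (not stranded) or strand == read_strand
        if overlaps and strand_ok:
            hits.add(gene_id)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()
```

With two hypothetical genes, `GENE_A` on `+` at [100, 1400) and `GENE_B` on `-` at [1200, 2500), a `+` read at [1250, 1300) is `__ambiguous` when `stranded=False` but goes to `GENE_A` when `stranded=True`.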
After I removed duplicates, the proportion of ambiguous reads dropped to ~2%. Yeast genes are relatively short (1300 bp on average), so I suspect that the bulk of the duplicates are not PCR duplicates.
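To put a number on the "short genes" argument: for single-end data, duplicates are usually defined by identical 5' start position and strand, so a gene of length L offers only about 2L start slots. A back-of-the-envelope sketch, assuming reads start uniformly at random (real coverage is uneven, so actual duplicate rates would be higher) and ignoring read length at the gene ends:

```python
import math


def expected_duplicate_fraction(n_reads: int, gene_length: int,
                                both_strands: bool = True) -> float:
    """Expected share of reads flagged as duplicates purely by chance."""
    # Distinct (5' position, strand) slots a single-end read can start at.
    slots = gene_length * (2 if both_strands else 1)
    # Expected occupied slots: slots * (1 - (1 - 1/slots)^n), computed stably.
    expected_distinct = slots * -math.expm1(n_reads * math.log1p(-1.0 / slots))
    return (n_reads - expected_distinct) / n_reads
```

For a hypothetical 1300 bp gene with 10,000 reads this gives roughly 0.75, i.e. about three quarters of the reads would look like duplicates even without any PCR bias, whereas a gene with only 10 reads gives almost none.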
I checked the yeast gene database, and it seems that ~15% of the genes are transcribed from overlapping regions on opposite strands. That seems high to me, but it is still not enough to explain the huge number of reads falling within these genes.
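One way to get that percentage from an annotation is a per-chromosome sweep over gene intervals sorted by start, flagging every gene that overlaps a gene on the opposite strand. A minimal sketch, assuming the genes have already been parsed (e.g. from a GFF) into hypothetical `(gene_id, chrom, start, end, strand)` tuples with half-open coordinates:

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

# (gene_id, chrom, start, end, strand), half-open [start, end)
Gene = Tuple[str, str, int, int, str]


def antisense_overlapping(genes: Iterable[Gene]) -> Set[str]:
    """IDs of genes that overlap at least one gene on the opposite strand."""
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene[1]].append(gene)
    result: Set[str] = set()
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g[2])
        active: List[Gene] = []
        for gene in chrom_genes:
            gene_id, _, start, end, strand = gene
            # Keep only genes that start earlier and still extend past this start.
            active = [a for a in active if a[3] > start]
            for other_id, _, _, _, other_strand in active:
                if other_strand != strand:
                    result.add(gene_id)
                    result.add(other_id)
            active.append(gene)
    return result
```

Dividing `len(antisense_overlapping(genes))` by the number of genes gives the fraction; weighting each gene by its read count instead would show how much of the library actually falls in those overlaps.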
I also started to wonder whether this is related to some biological function of the protein we immunoprecipitated, so I checked the Input samples, and they also contain a high proportion of ambiguous reads.
Any thoughts?
Thanks
Mali