Alex234 08-30-2013 02:22 AM

Decreased transcription of short intergenic patches - a common artefact?
Hello, I've compared transcription between a WT and a KO ES cell line by RNA-Seq, using DESeq and a GTF file assembled with Cufflinks. I found a decrease in the transcription of many short intergenic regions (they only have XLOC IDs, and having checked in genome browsers there do not seem to be any genes there). So my questions are:

(a) Is this some sort of common artefact?

(b) How can I look at how the reads are actually assembled within each of those regions, and would that tell me whether they are actually being transcribed or if it's just some sort of artefact?

(c) When ranked by fold change (with a cut-off of p<0.05), these short sequences rank near the top of the downregulated genes, but when ranked by p-value they rank less highly. Any general thoughts on ranking DEGs by fold change vs. p-value?



rboettcher 08-30-2013 02:58 AM

Hi Alex,

just to be sure: did you use Cufflinks' FPKM values as input for DESeq? (DESeq expects raw read counts.)

(a) Did you check the quality of these reads? Did both conditions give equal output (number of reads, quality)?

(b) You can use IGV and load the GTF file from Cufflinks (you may have to convert it to BED first).

(c) Ranking by p-value is commonly done, but I would argue that, statistically speaking, ranking by FC makes more sense after applying a fixed p-value cut-off (such as 0.05 or 0.01). With a fixed cut-off, gene expression is either significantly different between the two conditions or it is not, so there is no situation where "gene A is more significant than gene B".
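
To make (c) concrete, here is a minimal sketch of "filter by a fixed p-value cut-off, then rank by fold change". The IDs and numbers are made up for illustration, and the tuple layout (id, log2 fold change, p-value) is just an assumption, not DESeq's actual output format.

```python
# Hypothetical results: (id, log2 fold change, p-value). Values are invented.
results = [
    ("XLOC_000001", -4.2, 0.030),
    ("GeneA", -1.1, 0.0001),
    ("XLOC_000002", -3.5, 0.200),
    ("GeneB", 2.0, 0.010),
]


def rank_by_fold_change(rows, alpha=0.05):
    """Keep rows with p-value below alpha, then sort by |log2FC|, largest first."""
    significant = [row for row in rows if row[2] < alpha]
    return sorted(significant, key=lambda row: abs(row[1]), reverse=True)


ranked = rank_by_fold_change(results)
for name, log2_fc, p_value in ranked:
    print(name, log2_fc, p_value)
```

Note that XLOC_000002 drops out despite its large fold change because it fails the cut-off, while GeneA, which has the smallest p-value, ends up last among the survivors.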

Alex234 08-30-2013 03:17 AM


(a) Yes, both conditions gave equal quality and quantity of reads, but I'm not sure how to check the quality of the reads - HTSeq converted them to 'counts' (1 count = 1 read?)

(b) Thanks, will try!

(c) Thanks, that's what I thought!

dpryan 08-30-2013 01:20 PM

In addition to what rboettcher said, have a look at where some of these intergenic regions are in relation to known/predicted repeat elements, such as IAPs or LINEs (there are RepeatMasker tracks available). I suspect that a lot of these are just these repeat elements. Depending on the nature of your KO lines, these may or may not be functionally meaningful (there's a lot of transcriptional noise that never amounts to much of anything).

Alex234 09-03-2013 02:14 AM

Thanks dpryan - which program are these repeatmasker tracks in?

dpryan 09-03-2013 03:10 AM

The easiest thing to do is to download them from the UCSC genome browser. A simple method is to use the table browser, where the repeatmasker information is under "variation and repeats".
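
Once you have the repeat annotations, checking which of your regions overlap a repeat is a simple interval comparison. Below is a rough sketch, assuming both your XLOC regions and the repeats have been reduced to BED-like (chrom, start, end, name) tuples with 0-based, half-open coordinates. All coordinates and the function names are made up for illustration; for real data a dedicated tool such as bedtools intersect would likely be the more practical choice.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_with_repeats(regions, repeats):
    """Map each region name to the names of repeats it overlaps on the same chromosome."""
    hits = {}
    for chrom, start, end, name in regions:
        hits[name] = [
            rep_name
            for rep_chrom, rep_start, rep_end, rep_name in repeats
            if rep_chrom == chrom and overlaps(start, end, rep_start, rep_end)
        ]
    return hits


# Invented coordinates, for illustration only.
regions = [
    ("chr1", 1000, 1500, "XLOC_000001"),
    ("chr1", 5000, 5400, "XLOC_000002"),
]
repeats = [
    ("chr1", 1200, 1800, "IAP"),
    ("chr1", 3000, 3500, "L1"),
    ("chr2", 1000, 1500, "L1"),
]

hits = annotate_with_repeats(regions, repeats)
for region_name, repeat_names in hits.items():
    print(region_name, repeat_names)
```

Regions that come back with an empty list do not overlap any annotated repeat in this toy set, so those would be the ones worth a closer look in the browser.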

