My colleagues and I are interested in the distribution of reads that fall within the 5'UTR, CDS and 3'UTR of a given transcript. We made a feature count table with HTSeq, using the --type=<feature type> argument to count one region at a time. For the CDS, the invocation looks like this:
```
python -m HTSeq.scripts.count --format bam --order pos --stranded yes --type CDS file.bam pathto/ucsc_refseq.gtf > CDS_readcount.txt
```

We did the same for the 5'UTR and 3'UTR. Since no --mode is given, these runs use the default mode, union.
My question is: what happens when a read spans both the 5'UTR and the CDS? My reading of the manual is that such a read is counted as ambiguous in union mode, and as 5'UTR in intersection-strict or intersection-nonempty mode. Our downstream analysis depends on this, so we need to be certain.
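To reason about this, it can help to write the three modes down as the set rules the HTSeq documentation gives. For each aligned base i, S(i) is the set of loaded features covering it. Union takes the union of all S(i). Intersection-strict takes the intersection of all S(i). Intersection-nonempty takes the intersection of the non-empty S(i). One feature in the result means the read is counted for it, none gives __no_feature, and more than one gives __ambiguous. The sketch below is a toy model, not HTSeq's code. It assumes that htseq-count only loads features of the requested --type, so in a `--type CDS` run the UTR bases of the read see no feature at all. The gene ID "GENE1" and the 3/5 base split are hypothetical.

```python
from functools import reduce

# Toy model of htseq-count's overlap modes, following the documented set
# rules. This is an illustration, not HTSeq's implementation.
def resolve(position_sets, mode):
    """Return the assigned feature ID, '__no_feature' or '__ambiguous'.

    position_sets: one set per aligned base of the read, holding the IDs of
    the loaded features (only those of the requested --type) covering it.
    """
    if mode == "union":
        s = set().union(*position_sets)
    elif mode == "intersection-strict":
        s = reduce(set.intersection, position_sets)
    elif mode == "intersection-nonempty":
        nonempty = [p for p in position_sets if p]
        s = reduce(set.intersection, nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not s:
        return "__no_feature"
    if len(s) > 1:
        return "__ambiguous"
    return next(iter(s))

# Hypothetical read: 3 bases in the 5'UTR and 5 bases in the CDS of GENE1.
utr_len, cds_len = 3, 5

# --type CDS run: only CDS features are loaded, so the UTR bases see nothing.
cds_run = [set() for _ in range(utr_len)] + [{"GENE1"} for _ in range(cds_len)]
# 5'UTR run: only 5'UTR features are loaded, so the CDS bases see nothing.
utr_run = [{"GENE1"} for _ in range(utr_len)] + [set() for _ in range(cds_len)]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, resolve(cds_run, mode), resolve(utr_run, mode))
# union GENE1 GENE1
# intersection-strict __no_feature __no_feature
# intersection-nonempty GENE1 GENE1
```

Under this model, a spanning read is not ambiguous in any mode when each region is counted in a separate run. In union and intersection-nonempty it lands on the gene in both the CDS table and the 5'UTR table. In intersection-strict it is __no_feature in both. Running htseq-count on a small BAM containing one such read would confirm whether the real tool behaves the same way.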