Hi All,
I have run:
tophat (w transcriptome index) -> cufflinks (-g & -M) -> cuffmerge -> cuffquant (-M) -> cuffnorm
on a set of >300 samples. In the resulting counts table I have duplicated entries for many genes (thousands).
For example:
$ grep MORN1 ./cuffnorm_out/genes.attr_table
XLOC_000065 - - XLOC_000065 MORN1 TSS126 1:2252451-2323157 -
XLOC_000067 - - XLOC_000067 MORN1 TSS129 1:2252451-2323157 -
XLOC_000068 - - XLOC_000068 MORN1 TSS130 1:2252451-2323157 -
XLOC_000069 - - XLOC_000069 MORN1 TSS131 1:2252451-2323157 -
XLOC_002153 - - XLOC_002153 MORN1,RP4-740C4.6,RP4-740C4.9 TSS5397,TSS5398,TSS5399,TSS5400 1:2252451-2323157 -
$ grep RP11-206L10.9 ./cuffnorm_out/genes.attr_table
XLOC_000013 - - XLOC_000013 RP11-206L10.9 TSS15 1:645707-762902 -
XLOC_000014 - - XLOC_000014 RP11-206L10.9 TSS16,TSS17 1:645707-762902 -
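To list every affected name rather than grepping one gene at a time, something like the following might work. It is only a sketch: it assumes genes.attr_table is tab-separated with the tracking ID in column 1 and gene_short_name (possibly a comma-separated list) in column 5, as the output above suggests. The function name `dup_gene_names` is made up for illustration.

```shell
# Print "<number of XLOC IDs><TAB><gene name>" for every gene name that is
# attached to more than one tracking ID.
# Assumption: tab-separated input, tracking ID in column 1,
# gene_short_name in column 5 (may hold several comma-separated names).
dup_gene_names() {
    awk -F'\t' '
        $5 != "" && $5 != "-" {
            n = split($5, names, ",")
            for (i = 1; i <= n; i++) {
                # count each (tracking ID, name) pair only once
                if (!seen[$1 SUBSEP names[i]]++) count[names[i]]++
            }
        }
        END {
            for (g in count) if (count[g] > 1) print count[g] "\t" g
        }
    ' "$1" | sort -k1,1nr -k2,2
}
```

Calling `dup_gene_names ./cuffnorm_out/genes.attr_table | head` would then show the names with the most XLOC entries first.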
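These genes have multiple transcripts with different transcription start sites, but several overlapping exons.
The duplicated genes are also present in the merged GTF resulting from the cuffmerge step. Shouldn't these have been merged, and do you have any idea why they were not?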
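To confirm this in the GTF itself, a sketch along these lines counts how many distinct gene_id values each gene_name maps to. It assumes a standard nine-column, tab-separated GTF whose attribute column carries `gene_id "..."` and `gene_name "..."` entries. The function name and the path in the usage note are placeholders, not something taken from my run.

```shell
# Print "<number of gene_ids><TAB><gene_name>" for every gene_name that is
# associated with more than one gene_id in a GTF file.
# Assumption: attributes live in column 9 as gene_id "X"; gene_name "Y"; ...
gtf_names_with_multiple_ids() {
    awk -F'\t' '
        {
            id = ""; name = ""
            # strip the 9-character prefix (gene_id ") and the closing quote
            if (match($9, /gene_id "[^"]+"/))
                id = substr($9, RSTART + 9, RLENGTH - 10)
            # strip the 11-character prefix (gene_name ") and the closing quote
            if (match($9, /gene_name "[^"]+"/))
                name = substr($9, RSTART + 11, RLENGTH - 12)
            # count each (gene_id, gene_name) pair only once
            if (id != "" && name != "" && !seen[id SUBSEP name]++) count[name]++
        }
        END {
            for (g in count) if (count[g] > 1) print count[g] "\t" g
        }
    ' "$1" | sort -k1,1nr -k2,2
}
```

For example, `gtf_names_with_multiple_ids path/to/merged.gtf | head`, where the path is whatever cuffmerge wrote in your setup.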
I've used tophat v. 2.0.12 (bowtie2 v.2.2.3), cufflinks v.2.2.1 with hg19 and a gencode GTF (v.19). No warnings or errors regarding duplicated IDs along the way.
Any ideas or comments are very welcome.
Thanks,
Bo