Hi Everyone,
1. I tried to use DEXSeq to look into differential exon usage. First, I tried to create my own ExonCountSet from aligned SAM files and an annotated GTF file (generated by dexseq_prepare_annotation.py from the same GTF I used for the TopHat alignment).
However, I got a very large _empty count. Part of the result is shown below:
ENSG00000000003:001 387
ENSG00000000003:002 270
ENSG00000000003:003 194
ENSG00000000003:004 79
ENSG00000000003:005 36
ENSG00000000003:006 59
ENSG00000000003:007 60
ENSG00000000003:008 32
ENSG00000000003:009 42
ENSG00000000003:010 32
ENSG00000000003:011 20
ENSG00000000003:012 25
ENSG00000000003:013 1
ENSG00000000003:014 9
ENSG00000000003:015 3
ENSG00000000003:016 0
ENSG00000000003:017 0
ENSG00000000003:018 0
ENSG00000000003:019 0
ENSG00000000005:001 0
ENSG00000000005:002 0
ENSG00000000005:003 20
ENSG00000000005:004 0
ENSG00000000005:005 0
ENSG00000000005:006 0
......
ENSG00000258370:001 0
ENSG00000258370:002 0
ENSG00000258371:001 0
ENSG00000258371:002 0
ENSG00000258372:001 0
ENSG00000258372:002 0
ENSG00000258372:003 0
ENSG00000258372:004 0
ENSG00000258372:005 0
ENSG00000258372:006 0
ENSG00000258372:007 0
ENSG00000258372:008 0
ENSG00000258372:009 0
ENSG00000258372:010 0
ENSG00000258372:011 0
ENSG00000258373:001 119
ENSG00000258374:001 0
ENSG00000258374:002 0
ENSG00000258375:001 0
_ambiguous 44367
_empty 140632650
_lowaqual 7773318
_notaligned 0
Does that mean it discarded 140 million reads? How can I improve this? When I use HTSeq to calculate gene-level counts, no_feature is around 70 million.
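For reference, here is a rough way to tally how many reads landed in exon bins versus the special counters. This is only a minimal Python sketch: the function name summarize_counts is illustrative and not part of DEXSeq or HTSeq, and it assumes each line of the count output is "name count" separated by whitespace, as in the excerpt above. The sample lines are copied from that excerpt, so the totals cover only those lines and not the full file.

```python
def summarize_counts(lines):
    """Split count lines into exon-bin reads and special counters.

    Assumes each useful line looks like "<name> <count>"; anything else
    (blank lines, the "......" elision) is skipped.
    """
    assigned = 0   # reads counted in exon bins (names like ENSG...:001)
    special = {}   # counters whose name starts with "_" (e.g. _empty)
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, count = parts[0], int(parts[1])
        if name.startswith("_"):
            special[name] = count
        else:
            assigned += count
    total = assigned + sum(special.values())
    return assigned, special, total


# A few lines taken from the output shown above (not the whole file).
sample = [
    "ENSG00000000003:001 387",
    "ENSG00000000003:002 270",
    "......",
    "_ambiguous 44367",
    "_empty 140632650",
    "_lowaqual 7773318",
    "_notaligned 0",
]

assigned, special, total = summarize_counts(sample)
print(f"reads in exon bins: {assigned}")
for name, count in sorted(special.items()):
    share = 100.0 * count / total if total else 0.0
    print(f"{name}: {count} ({share:.1f}% of all counted lines)")
```

Run on the complete count file instead of the sample, this would show what fraction of the reads ended up in _empty compared with the exon bins.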
2. I want to compare two cancer samples along with their normals, i.e.
Cancer 1/normal 1 (cancer 1 replicate/normal 1 replicate) versus Cancer 2/normal 2 (cancer 2 replicate/normal 2 replicate) ....
Can DEXSeq do this, or are there other tools that can?
Any suggestions are appreciated. Thanks!