Dear all, I have 2 RNA-seq libraries (40 bp, single-end) and a genome annotation, and I am interested in differential expression.
I ran:
1- Bowtie/DESeq at the gene level
2- TopHat/Cufflinks at the transcript level.
I got very different results, not only in quantification but also in the "direction" of the changes (see the example below). I was expecting differences, but not this much.
Which method do you think best suits the type of data I have?
Is it appropriate to run TopHat with 40 bp single-end reads?
The mean number of reads reported by DESeq does not account for transcript length; would this prevent comparing transcript quantification levels within a library?
Thanks in advance for any reply,
Bowtie (raw number of reads)
gene  Transcript  Transcript length  conditionA  conditionB
1     1           1590               297         242
2     2           198                0           0
3     3           2048               383         500
4     4           2034               283         109
5     5-a         788                86          137
5     5-b         1268
6     6           2087               303         640
7     7           1656               0           0
8     8           1809               316         335
9     9           761                0           0
10    10-a        735                658         386
10    10-b        524
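To make the transcript-length question concrete, here is a minimal sketch that divides the raw Bowtie counts above by transcript length in kilobases. It is not part of either pipeline, and the helper name `reads_per_kb` is made up for illustration. Only single-transcript genes with non-zero counts are included, since for genes 5 and 10 the count is per gene and it is not obvious which isoform length applies. This is not FPKM: scaling by the total mapped reads per library is left out because the library sizes are not given here.

```python
# Raw counts copied from the Bowtie table above: gene -> (length_bp, count_A, count_B).
# Genes 5 and 10 are left out: their counts are per gene, with two isoforms of different length.
raw_counts = {
    "1": (1590, 297, 242),
    "3": (2048, 383, 500),
    "4": (2034, 283, 109),
    "6": (2087, 303, 640),
    "8": (1809, 316, 335),
}

def reads_per_kb(count, length_bp):
    """Hypothetical helper: reads per kilobase of transcript (no library-size scaling)."""
    return count / (length_bp / 1000.0)

# gene -> (reads/kb in A, reads/kb in B)
per_kb = {
    gene: (reads_per_kb(a, length), reads_per_kb(b, length))
    for gene, (length, a, b) in raw_counts.items()
}

for gene, (a, b) in per_kb.items():
    print(f"gene {gene}: {a:.1f} reads/kb in A, {b:.1f} reads/kb in B")
```

Dividing by length changes how genes rank against each other within one library, but it leaves the A/B ratio of any single gene unchanged, because the same length divides both conditions.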
TopHat-Cufflinks (FPKM)
gene  Transcript  Transcript length  FPKM conditionA  FPKM conditionB  ln(fold_ch) AvB
1     1           1590               20.526           11.7229          0.560149
2     2           198                45.8285          0                1.79769e+308
3     3           2048               17.5533          9.35482          0.62935
4     4           2034               28.2751          9.71151          1.06867
5     5-a         788                6.67631          1.6504           1.39755
5     5-b         1268               32.5             4.01143          2.09209
6     6           2087               53.4758          3.36856          2.76474
7     7           1656               0.110199         0                1.79769e+308
8     8           1809               16.365           15.5165          0.0532368
9     9           761                2.85777          0                1.79769e+308
10    10-a        735                6.11169          3.07078          0.688272
10    10-b        524                818.778          1315.66          -0.474281
Bowtie-DESeq (mean reads)
gene  Transcript  Transcript length  MeanReads conditionA  MeanReads conditionB  Log2FC AvB
1     1           1590               284.9435568           252.2394288           -0.175228351
2     2           198                0                     0                     0
3     3           2048               367.4524655           521.1558445           0.503001955
4     4           2034               271.511874            113.6119741           -1.249561315
5     5-a         788                82.50890871           142.7967014           0.784028564
5     5-b         1268
6     6           2087               290.6999923           667.079481            1.195534401
7     7           1656               0                     0                     0
8     8           1809               303.1722692           349.1744158           0.203185051
9     9           761                0                     0                     0
10    10-a        735                631.2890922           402.332312            -0.648615343
10    10-b        524
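Before comparing the two fold-change columns, it may be worth checking that they share a log base and a sign convention. The following is a rough sketch using only the gene 4 values from the two tables above: it recomputes the log ratio from the per-condition columns and reports which orientation (A/B or B/A) is nearer to the reported value. The helper `closest_convention` is hypothetical, and the match for DESeq is only approximate, so treat the result as a sanity check rather than a statement about how either tool computes its column.

```python
import math

# Values copied from the tables above for gene 4:
# (value in A, value in B, reported log fold change).
cufflinks_gene4 = (28.2751, 9.71151, 1.06867)           # FPKM, natural log
deseq_gene4 = (271.511874, 113.6119741, -1.249561315)   # mean reads, log base 2

def closest_convention(a, b, reported, base):
    """Hypothetical helper: say whether `reported` is nearer log(A/B) or log(B/A)."""
    a_over_b = math.log(a / b, base)
    b_over_a = -a_over_b
    if abs(reported - a_over_b) <= abs(reported - b_over_a):
        return "A/B", a_over_b
    return "B/A", b_over_a

cuff_convention, cuff_value = closest_convention(*cufflinks_gene4, base=math.e)
deseq_convention, deseq_value = closest_convention(*deseq_gene4, base=2)

print(f"Cufflinks: reported {cufflinks_gene4[2]:.4f}, ln({cuff_convention}) = {cuff_value:.4f}")
print(f"DESeq:     reported {deseq_gene4[2]:.4f}, log2({deseq_convention}) = {deseq_value:.4f}")

# To put both on one scale, convert the Cufflinks value from ln(A/B) to log2(B/A).
cuff_as_log2_b_over_a = -cufflinks_gene4[2] / math.log(2)
print(f"Cufflinks as log2(B/A): {cuff_as_log2_b_over_a:.4f}")
```

For gene 4 the Cufflinks column lines up with ln(A/B) and the DESeq column is close to log2(B/A), so the opposite signs on that row appear to come from the convention. That does not account for rows such as gene 6, where the per-condition values themselves point in opposite directions (FPKM higher in A, mean reads higher in B).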