Hi all, I have been playing around with the very useful tool FluxSimulator to generate simulated RNA-Seq reads, and I have two questions.
(1) I always get a large fraction of polyA reads (around 35%-38%), which seems too high (please see the bottom of this post for the parameters I used). When I changed the parameter "polyA scale" (POLYA_SCALE in the PAR file) from the default 300 to 30, I got few polyA reads. So I am a bit confused: 300 should be a reasonable value, so why are so many polyA reads generated?
(2) My second question is about calculating the ground-truth RPKM/FPKM from the simulated reads. Let M be the total number of reads and n the number of reads originating from a transcript t. Is RPKM = (10^9 * n) / (length(t) * M)? The polyA reads should presumably be removed from the calculation. I wonder if this is the right way to get the "TRUE" transcript expression values.
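To make the formula in (2) concrete, here is a minimal sketch of the calculation I have in mind. The function name `rpkm` and the two dictionaries are hypothetical and are not part of FluxSimulator. The sketch assumes the polyA reads have already been dropped, so that M is simply the sum of the remaining per-transcript counts.

```python
def rpkm(read_counts, lengths):
    """Compute RPKM per transcript as (10^9 * n) / (length(t) * M).

    read_counts: transcript id -> number of reads n from that transcript
                 (assumed to exclude polyA reads already)
    lengths:     transcript id -> transcript length in bases
    """
    # M is the total number of reads kept in the calculation.
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no reads to normalise by")
    return {
        tid: (1e9 * n) / (lengths[tid] * total_reads)
        for tid, n in read_counts.items()
    }


# Toy numbers, made up purely for illustration.
counts = {"t1": 100, "t2": 300}
lens = {"t1": 1000, "t2": 2000}
values = rpkm(counts, lens)
```

With these toy numbers M = 400, so t1 gets 10^9 * 100 / (1000 * 400) = 250000 and t2 gets 10^9 * 300 / (2000 * 400) = 375000. For paired-end data (FPKM), n and M would count fragments instead of individual reads.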
Thank you!
**********PAR file************************
REF_FILE_NAME /Users/z/Downloads/FluxSimulator/chr1/another/another.gtf
PRO_FILE_NAME /Users/z/Downloads/FluxSimulator/chr1/another/another.pro
LIB_FILE_NAME /Users/z/Downloads/FluxSimulator/chr1/another/another.lib
SEQ_FILE_NAME /Users/z/Downloads/FluxSimulator/chr1/another/another.bed
GEN_DIR /Users/zhaohao/Downloads/FluxSimulator/chr1
TMP_DIR /var/folders/fY/fY+RKWXZHrqzw9LIcH2+3U+++TU/-Tmp-
NB_MOLECULES 500000
EXPRESSION_K -0.6
EXPRESSION_X0 5.0E7
EXPRESSION_X1 9500.0
TSS_MEAN 25.0
POLYA_SHAPE 2.0
POLYA_SCALE 30.0
RT_MIN 500
RT_MAX 5500
RT_PRIMER RANDOM
FRAGMENTATION NO
FRAG_B4_RT NO
FRAG_MODE PHYSICAL
FRAG_LAMBDA 900.0
FRAG_THRESHOLD 0.1
FILTERING NO
LOAD_CODING NO
LOAD_NONCODING YES
FILT_MIN 200
FILT_MAX 250
READ_NUMBER 587916
READ_LENGTH 75
PAIRED_END YES