Dear all,
I am new to RNA-seq data analysis. I am currently in charge of analyzing some human NGS samples (single-end) in a disease vs. control comparison. I have 10 BAM files (biological replicates) from tophat, each about 4 GB in size.
I followed the tophat-cufflinks-cuffcompare-cuffdiff pipeline (using the hg19 reference) to find genes that are differentially expressed between the experimental and control conditions.
However, I'm stuck at the final cuffdiff step because the program consistently fails for lack of memory. I always get a 'bad-alloc' error when I run cuffdiff to compare my 10 samples using the downloaded hg19 reference (from Ensembl).
I'm running 64-bit Ubuntu Linux on a Xeon(R) X5450 at 3.00 GHz with 8 cores and 8 GB of RAM.
I wonder whether there is an alternative way to get around this memory problem when running cuffdiff. I was thinking of using cuffmerge to merge all the samples within each group and then comparing the two resulting merged GTF files (experiment vs. control), but I suspect that merging several transcripts.gtf files will mask the biological variation captured by these replicates.
Can anyone suggest how to resolve the memory problem in cuffdiff, or another way to find differentially expressed transcripts?
Since I already have cufflinks result files for each sample, can I just use the FPKM value from each sample's genes.fpkm_tracking file as the gene expression value and apply traditional statistical methods to identify differentially expressed genes between the two groups (e.g. multiple t-tests, SAM analysis, etc.)?
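To make that last idea concrete, here is a rough sketch of what I have in mind, not something I have validated. It assumes each genes.fpkm_tracking file is tab-delimited with `gene_id` and `FPKM` columns, and that the gene IDs are comparable across samples (which I think only holds if every cufflinks run used the same reference annotation). The function names, the log2(FPKM + 1) transform, and the use of an exact permutation p-value in place of a t-distribution p-value are my own choices for illustration. It does not include any multiple-testing correction, and whether this is statistically sound for FPKM values is exactly what I am asking.

```python
import csv
import math
from itertools import combinations
from statistics import mean, variance


def read_fpkm(lines):
    """Map gene_id -> FPKM from one genes.fpkm_tracking table.

    `lines` is an open file or any iterable of tab-delimited lines
    whose header contains 'gene_id' and 'FPKM' (assumed layout).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["gene_id"]: float(row["FPKM"]) for row in reader}


def welch_t(a, b):
    """Welch's t statistic; returns 0.0 when both groups have zero variance."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def permutation_p(a, b):
    """Exact two-sided permutation p-value for |Welch t| over all relabelings."""
    pooled = a + b
    observed = abs(welch_t(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        x = [pooled[i] for i in idx]
        y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so the mirrored labeling counts as a tie.
        if abs(welch_t(x, y)) >= observed - 1e-12:
            hits += 1
    return hits / total


def compare_groups(control, disease):
    """Per-gene (t, p) for genes present in every sample.

    control, disease: lists of gene_id -> FPKM dicts, one dict per sample,
    each group needing at least two samples.
    """
    shared = set.intersection(*(set(s) for s in control + disease))
    results = {}
    for gene in sorted(shared):
        a = [math.log2(s[gene] + 1.0) for s in control]
        b = [math.log2(s[gene] + 1.0) for s in disease]
        # Positive t means higher expression in the disease group.
        results[gene] = (welch_t(b, a), permutation_p(a, b))
    return results
```

Each file would be loaded with something like `read_fpkm(open(path))`, with the per-sample dictionaries collected into one list per condition and passed to `compare_groups`. With only a few replicates per group the permutation p-value cannot get very small, so I am not sure how useful it would be after correcting for thousands of genes.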
Thanks a lot