kumardeep 05-10-2012 01:59 AM

Query regarding the cuffdiff command to get differential gene expression
I am using RNA-seq data to study differential gene expression. For that I took raw files from SRA (NCBI) for human samples from one experiment (control: 3 files, each ~1 GB; test: 3 files, each ~1 GB).
(1) I downloaded corresponding *.sra files.
(2) After converting all 6 files to fastq and filtering the reads, I mapped them with Bowtie (bowtie2) onto the reference genome hg19.fa from the UCSC browser.
The command I used is:
bowtie2 -x <hg19_index> -U *.fastq_filtered -S *.sam
(3) Next I used samtools to convert the *.sam files to *.bam files with this command:
samtools view -bS *.sam >*.bam
(4) The next command is:
samtools sort *.bam *.sorted
(5) Step 4 gave me *.sorted.bam files for all 6 input files.
(6) Then I used cufflinks to get RPKM values. The command is:
cufflinks *.sorted.bam
(7) My questions are:
(a) Do I have to run the command in step 6 for all 6 *.sorted.bam files (3 for the control and 3 for the test sample)?
(b) How do I use cuffdiff to finally get the quantification of each gene with relative RPKM values?
(c) As far as I understand, gene names are not available in these files, so how do I tag the RPKM values with the corresponding gene names (i.e. which RPKM value belongs to which gene)?
(d) How can I tell whether a given gene is overexpressed or underexpressed in the test sample compared to the control sample?
(e) How do I average the replicates (I have 3 control replicates and 3 test replicates)? I do not understand the cuffdiff usage, which reads:

cuffdiff [options]* <transcripts.gtf> <sample1_replicate1.sam[,...,sample1_replicateM]> <sample2_replicate1.sam[,...,sample2_replicateM.sam]>... [sampleN.sam_replicate1.sam[,...,sample2_replicateM.sam]]

(f) Is it necessary to run the cuffcompare command in this case?

Thanks in advance
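
For reference, the workflow described above can be sketched as a dry run that only prints the commands, so it can be inspected without bowtie2, samtools or cuffdiff installed. This is a minimal sketch, not a definitive pipeline: the sample names, the index name hg19_index, the annotation file genes.gtf and the output directory diff_out are all hypothetical placeholders. It assumes that cuffdiff is given a reference annotation GTF plus the sorted BAM files, with the replicates of each condition joined by commas, which matches the usage line quoted in the post.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# All file names below (sample names, hg19_index, genes.gtf, diff_out)
# are hypothetical placeholders, not taken from the original post.

CONTROL="control1 control2 control3"
TEST="test1 test2 test3"

# Print the per-sample alignment and conversion steps (steps 2 to 4).
# "samtools sort in.bam out_prefix" is the older samtools syntax used in
# the post; newer releases expect "samtools sort -o out.bam in.bam".
print_align_cmds() {
    s=$1
    echo "bowtie2 -x hg19_index -U $s.fastq_filtered -S $s.sam"
    echo "samtools view -bS $s.sam > $s.bam"
    echo "samtools sort $s.bam $s.sorted"
}

# Join sample names into "a.sorted.bam,b.sorted.bam,...".
join_bams() {
    out=""
    for s in "$@"; do
        out="${out:+$out,}$s.sorted.bam"
    done
    echo "$out"
}

# Print a single cuffdiff call: one comma-separated replicate list
# per condition, control first, test second.
print_cuffdiff_cmd() {
    echo "cuffdiff -o diff_out -L control,test genes.gtf $(join_bams $CONTROL) $(join_bams $TEST)"
}

for s in $CONTROL $TEST; do
    print_align_cmds "$s"
done
print_cuffdiff_cmd
```

Under these assumptions the replicates are not averaged by hand; each condition is passed as one comma-separated list and cuffdiff handles the replicates itself. When a reference GTF is supplied this way, a separate cuffcompare run should not be needed for the differential expression test.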

chknbio 05-24-2012 10:07 AM

I am also looking for an answer to (d) in your post. Did you find one? Or does anyone know how to determine and output the genes that are upregulated or downregulated compared to the control or other samples in the experiment?
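
One possible way to approach (d), as a sketch only: cuffdiff writes a tab-separated file gene_exp.diff to its output directory, and the sketch below assumes that file has a header row with columns named gene, log2(fold_change) and significant (the names may differ between Cufflinks versions, so check the header of your own file). It also assumes the control was listed first on the cuffdiff command line, so that a positive log2 fold change means higher expression in the test sample. The function name classify_genes and the path diff_out/gene_exp.diff are hypothetical.

```shell
#!/bin/sh
# Sketch: list significant genes from a cuffdiff gene_exp.diff file and
# label each as "up" or "down" in sample_2 relative to sample_1.
# Assumed column names: gene, log2(fold_change), significant.

classify_genes() {
    awk -F'\t' '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["significant"]) == "yes" {
            lfc = $(col["log2(fold_change)"])
            # A leading minus sign (including "-inf") means lower in sample_2.
            dir = (lfc ~ /^-/) ? "down" : "up"
            print $(col["gene"]) "\t" lfc "\t" dir
        }
    ' "$1"
}

# Hypothetical usage: only runs if the file exists.
if [ -f diff_out/gene_exp.diff ]; then
    classify_genes diff_out/gene_exp.diff
fi
```

The gene column is where gene names would appear, provided the annotation GTF given to cuffdiff carries them; that is an assumption about the annotation file, not something stated in the thread.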


