Hi
I am using RNA-seq data for differential gene expression analysis. I took the raw files from SRA (NCBI) for human samples from one experiment (control: 3 files, ~1 GB each; test: 3 files, ~1 GB each).
(1) I downloaded the corresponding *.sra files.
(2) After converting all 6 files to FASTQ and filtering the reads, I mapped them with Bowtie2 onto the reference genome hg19.fa from the UCSC browser.
The command I used is:
bowtie2 -S *.sam -x reference.build -U *.fastq_filtered
(3) Next I used samtools to convert the *.sam files to *.bam files with this command:
samtools view -bS *.sam >*.bam
(4) The next command is:
samtools sort *.bam *.sorted
(5) Step 4 gave me *.sorted.bam files for all 6 input files.
(6) Then I used cufflinks to get RPKM values. The command is:
cufflinks *.sorted.bam
(7) My questions are:
(a) Do I have to run the command from step 6 on all 6 *.sorted.bam files (3 for the control and 3 for the test sample)?
(b) How do I use cuffdiff to get the final quantification of each gene with relative RPKM values?
(c) As far as I understand, gene names are not available in these files, so how do I attach gene names to the corresponding RPKM values (i.e. know which gene each RPKM value belongs to)?
(d) How can I tell whether a given gene is overexpressed or underexpressed in the test sample compared to the control sample?
(e) How do I average the replicates (I have 3 control replicates and 3 test replicates)? I do not understand the cuffdiff usage line, which reads:
cuffdiff [options]* <transcripts.gtf> <sample1_replicate1.sam[,...,sample1_replicateM]> <sample2_replicate1.sam[,...,sample2_replicateM.sam]>... [sampleN.sam_replicate1.sam[,...,sample2_replicateM.sam]]
(f) Is it necessary to run cuffcompare in this case?
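For reference, here is a rough sketch of how steps (2) to (6) might be run once per sample, and how the cuffdiff call from (e) could group the replicates with commas. It is only a dry run that prints the commands. The sample names (control_1 ... test_3), the annotation file genes.gtf, the output directory names and the labels passed to -L are all assumptions for illustration and are not taken from the actual data.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them, so it works
# without bowtie2/samtools/cufflinks installed. Drop the "echo" to run them.

INDEX="reference.build"   # bowtie2 index prefix, as in step (2)
GTF="genes.gtf"           # hypothetical hg19 gene annotation file

# Hypothetical sample names; one <name>.fastq_filtered file is assumed per sample.
CONTROLS="control_1 control_2 control_3"
TESTS="test_1 test_2 test_3"

# Print the commands of steps (2)-(6) for a single sample.
per_sample_commands() {
    s=$1
    echo "bowtie2 -x $INDEX -U $s.fastq_filtered -S $s.sam"
    echo "samtools view -bS $s.sam > $s.bam"
    # Legacy samtools 0.1.x syntax, as in step (4);
    # samtools 1.x expects: samtools sort -o out.bam in.bam
    echo "samtools sort $s.bam $s.sorted"
    # -G quantifies against the given annotation; -o sets the output directory
    echo "cufflinks -G $GTF -o cufflinks_$s $s.sorted.bam"
}

# Join the sorted BAM names of one condition with commas,
# which is how cuffdiff expects replicates of a condition to be listed.
bam_list() {
    list=""
    for s in "$@"; do
        list="${list:+$list,}$s.sorted.bam"
    done
    echo "$list"
}

# Print one cuffdiff call: conditions separated by spaces, replicates by commas.
cuffdiff_command() {
    echo "cuffdiff -o cuffdiff_out -L control,test $GTF $(bam_list $CONTROLS) $(bam_list $TESTS)"
}

for s in $CONTROLS $TESTS; do
    per_sample_commands "$s"
done
cuffdiff_command
```

In this sketch each replicate is processed separately up to the sorted BAM, and the replicates are only combined in the single cuffdiff call, where commas separate replicates and a space separates the two conditions.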
Thanks in advance
kumardeep