I would like to analyze publicly available GRO-seq data, located here: http://trace.ncbi.nlm.nih.gov/Traces...tudy=SRP013239.
There are 8 experiments, corresponding to 4 time points with 2 biological replicates each. However, each experiment has 1 to 4 SRA run files associated with it. My current understanding is that all of the SRA runs for an experiment can be combined to represent the entire transcriptome for that sample. Is this correct? If not, what is the right way to interpret them?
The second question is about running TopHat on these files. Assuming my understanding above is correct, I have thought of three ways of processing the files:
1. Run tophat separately for each timepoint with a comma-separated list of fastq files (1 bam file generated per timepoint)
2. Run tophat separately for each timepoint with the wildcard *.fastq (1 bam file generated per timepoint)
3. Run tophat separately for each fastq file and cuffmerge the files from a common timepoint
What is the correct way of using TopHat here? Options 1 and 2 give ostensibly different results.
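To make option 1 concrete, here is a minimal Python sketch of how the call could be assembled: runs are grouped by sample, and each sample's fastq files are joined into a single comma-separated argument. The file names, sample labels, index name, and output directories are hypothetical placeholders, not the real names from SRP013239. As far as I understand the TopHat usage line, an unquoted *.fastq is expanded by the shell into space-separated arguments instead, which TopHat may not read as one list of reads; that might account for the difference between options 1 and 2, but I have not confirmed it.

```python
from collections import defaultdict


def group_runs_by_sample(run_to_sample):
    """Return {sample: sorted list of fastq files belonging to that sample}."""
    groups = defaultdict(list)
    for fastq, sample in run_to_sample.items():
        groups[sample].append(fastq)
    return {sample: sorted(files) for sample, files in groups.items()}


def tophat_command(index, fastq_files, out_dir):
    """Build an option-1 style call: all runs of one sample as ONE comma-separated argument."""
    return ["tophat", "-o", out_dir, index, ",".join(fastq_files)]


# Hypothetical run-to-sample assignment (placeholder names only).
runs = {
    "run_A.fastq": "t0_rep1",
    "run_B.fastq": "t0_rep1",
    "run_C.fastq": "t1_rep1",
}

grouped = group_runs_by_sample(runs)
commands = {
    sample: tophat_command("genome_index", files, "tophat_" + sample)
    for sample, files in grouped.items()
}

# Print the commands for inspection; nothing is executed here.
for sample in sorted(commands):
    print(" ".join(commands[sample]))
```

Each command list could then be passed to subprocess.run once the printed commands look right.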
Thanks for the help!