12-29-2012, 05:47 AM   #1
asking for tips about tophat

I have been learning tophat since last week. I would really appreciate any tips.

1. Let me describe one example first.
timepoint1: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all data come from plant1)
timepoint2: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all data come from plant2)
timepoint3: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all data come from plant3)

I am thinking of running the commands below.
"tophat -o [output] -G [gff] [reference] t1_lane1.fastq",
"tophat -o [output] -G [gff] [reference] t1_lane2.fastq",
"tophat -o [output] -G [gff] [reference] t1_lane3.fastq",
"tophat -o [output] -G [gff] [reference] t1_lane4.fastq",
"tophat -o [output] -G [gff] [reference] t2_lane1.fastq",
"tophat -o [output] -G [gff] [reference] t2_lane2.fastq",
"tophat -o [output] -G [gff] [reference] t2_lane3.fastq",
"tophat -o [output] -G [gff] [reference] t2_lane4.fastq",
"tophat -o [output] -G [gff] [reference] t3_lane1.fastq",
"tophat -o [output] -G [gff] [reference] t3_lane2.fastq",
"tophat -o [output] -G [gff] [reference] t3_lane3.fastq",
"tophat -o [output] -G [gff] [reference] t3_lane4.fastq".

As a next step, I am going to run cufflinks in order to assemble
t1_lane1, t1_lane2, t1_lane3, t1_lane4 into timepoint1,
t2_lane1, t2_lane2, t2_lane3, t2_lane4 into timepoint2,
t3_lane1, t3_lane2, t3_lane3, t3_lane4 into timepoint3.

As a final step, I am going to run cuffdiff to see the differential expression across different timepoints.

Do you think I understand the workflow of tophat, cufflinks and cuffdiff correctly?

2. According to the tophat manual, the command line looks like "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq".
I am confused about when multiple read files should be put together in one command line.
- When is "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq" used?
- When is "tophat -o [output] -G [gff] [reference] read1.fastq", ..., "tophat -o [output] -G [gff] [reference] readN.fastq" used?
It would be really helpful if you could give a specific experimental design to make this clear.
syintel87
12-29-2012, 04:35 PM   #2

If each of t1, t2, and t3 is really just one sample that got sequenced in multiple lanes, then it's more correct to run tophat once per timepoint with all of that timepoint's lanes, e.g. for t1:
tophat -o [output] -G [gff] [reference] t1_lane1.fastq,t1_lane2.fastq,t1_lane3.fastq,t1_lane4.fastq
then run cufflinks on the single accepted_hits.bam that tophat makes.
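
As a rough sketch of the pooled approach above (not a tested pipeline): the loop below only prints the commands it would run, one tophat call per timepoint with the four lanes joined by commas, followed by cufflinks on that run's accepted_hits.bam. The names genome_index, genes.gff, the *_tophat and *_cufflinks output directories, and the helper join_commas are all placeholders assumed for illustration; they do not come from the thread.

```shell
#!/bin/sh
# Dry run: prints commands instead of executing them.
# Assumed placeholders (not from the thread): genome_index, genes.gff,
# and the *_tophat / *_cufflinks directory names.

# Hypothetical helper: join all arguments with commas.
join_commas() {
    joined=""
    for item in "$@"; do
        joined="${joined:+$joined,}$item"
    done
    printf '%s\n' "$joined"
}

for t in t1 t2 t3; do
    reads=$(join_commas "${t}_lane1.fastq" "${t}_lane2.fastq" \
                        "${t}_lane3.fastq" "${t}_lane4.fastq")
    # One alignment per timepoint, with all four lanes pooled.
    echo tophat -o "${t}_tophat" -G genes.gff genome_index "$reads"
    # Cufflinks then runs on the single BAM that tophat produced.
    echo cufflinks -o "${t}_cufflinks" -G genes.gff "${t}_tophat/accepted_hits.bam"
done
```

Giving each timepoint its own output directory keeps one run from overwriting another's accepted_hits.bam. Removing the leading echo on each line would run the commands for real.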

First off, there's no good way that I'm aware of to run cufflinks on multiple alignments and get a single set of transcript abundances. Secondly, as much as we would love it to be true, the true between-sample variance will never be just the sampling noise, so separate lanes of one sample are not a substitute for replicates. Ideally you would have true replicates, but if not, I don't know whether cuffdiff would be over-confident if you gave it subsamples of the same sample.
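
For the final comparison step, a hedged sketch: with one pooled alignment per timepoint, cuffdiff would get one BAM per condition, and if true replicates existed, their BAMs would be comma-separated within each condition. The command is only built and printed, not run. The names merged.gtf (for example, the output of cuffmerge), cuffdiff_out, and the *_tophat directories are placeholders assumed for illustration.

```shell
#!/bin/sh
# Dry run: builds and prints the cuffdiff command instead of executing it.
# Assumed placeholders (not from the thread): merged.gtf, cuffdiff_out,
# and the *_tophat directory names.

gtf=merged.gtf
t1_bam=t1_tophat/accepted_hits.bam
t2_bam=t2_tophat/accepted_hits.bam
t3_bam=t3_tophat/accepted_hits.bam

# One BAM per condition; -L assigns a label to each condition, in order.
cmd="cuffdiff -o cuffdiff_out -L t1,t2,t3 $gtf $t1_bam $t2_bam $t3_bam"
echo "$cmd"

# With true replicates, each condition would instead be a comma-separated
# list of BAMs, e.g. t1_rep1.bam,t1_rep2.bam t2_rep1.bam,t2_rep2.bam
```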
rflrob
12-30-2012, 05:34 AM   #3
Thank you!

Dear rflrob,
Thank you very much!
Your explanation has really helped to accelerate my understanding.

For the last several days, I was really confused about the concepts of pooling datasets, assembling, making links, merging, comparing, etc. (how to merge the four lanes, when to merge the four lanes, what cufflinks assembles, at which step different timepoints would be differentially analyzed, etc.)
This confusion may be due to just reading manuals without lab experience.

Anyhow, thank you again!
syintel87


All times are GMT -8.