  • asking for tips about tophat

    I have been learning tophat since last week. I would really appreciate any tips.



    1.
    Let me describe one example first.
    timepoint1: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of the data come from plant1)
    timepoint2: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of the data come from plant2)
    timepoint3: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of the data come from plant3)

    I am thinking of running the commands below.
    "tophat -o [output] -G [gff] [reference] t1_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane4.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane4.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane4.fastq".

    As a next step, I am going to run cufflinks in order to assemble
    t1_lane1, t1_lane2, t1_lane3, t1_lane4 into timepoint1,
    t2_lane1, t2_lane2, t2_lane3, t2_lane4 into timepoint2,
    t3_lane1, t3_lane2, t3_lane3, t3_lane4 into timepoint3.

    As a final step, I am going to run cuffdiff to see the differential expression across different timepoints.

    Do you think I understand the workflow of tophat, cufflinks and cuffdiff correctly?



    2. According to the tophat manual, the command line looks like "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq".
    I am confused about when multiple read files should be combined in one command line.
    - When is "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq" used?
    - When is "tophat -o [output] -G [gff] [reference] read1.fastq", ..., "tophat -o [output] -G [gff] [reference] readN.fastq" used?
    It would be really helpful if you could describe a specific experimental design for each case to make this clear.

  • #2
    If each of t1, t2, and t3 is really a single sample that was sequenced across multiple lanes, then it's more correct to pool the lanes in one run per timepoint, e.g. for t1:
    tophat -o [output] -G [gff] [reference] t1_lane1.fastq,t1_lane2.fastq,t1_lane3.fastq,t1_lane4.fastq
    then run cufflinks on the single accepted_hits.bam that tophat makes.
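
A minimal sketch of that pooled approach as a shell script. It is a dry run: it only prints one tophat command and one cufflinks command per timepoint instead of executing them. The read file names follow the pattern used in the question; the index name `reference`, the file `annotation.gff`, and the output directory names are hypothetical placeholders, not values given in the thread.

```shell
#!/bin/sh
# Dry-run sketch: print one pooled tophat command and one cufflinks command
# per timepoint. All file and directory names are placeholders (assumptions).
set -eu

gff="annotation.gff"   # hypothetical annotation file
index="reference"      # hypothetical Bowtie index prefix

# Join the arguments with commas: a b c -> a,b,c
join_by_comma() (
    IFS=,
    echo "$*"
)

print_commands() {
    for t in t1 t2 t3; do
        reads=$(join_by_comma "${t}_lane1.fastq" "${t}_lane2.fastq" \
                              "${t}_lane3.fastq" "${t}_lane4.fastq")
        # One alignment per timepoint, with all four lanes pooled.
        echo "tophat -o ${t}_tophat -G ${gff} ${index} ${reads}"
        # Assemble from the single accepted_hits.bam that tophat writes.
        echo "cufflinks -o ${t}_cufflinks ${t}_tophat/accepted_hits.bam"
    done
}

print_commands
```

Once the printed commands look right, they could be pasted into a terminal or the `echo` wrappers could be dropped to run them directly.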

    First off, there's no good way that I'm aware of to run cufflinks on multiple alignments and get a single set of transcript abundances. Secondly, as much as we would love it to be true, the true between-sample variance will never just be the sampling noise. Ideally you would have true replicates, but if not, I don't know whether cuffdiff would be over-confident if you gave it subsamples of the same sample.
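
For the comparison across timepoints, cuffdiff takes a transcript annotation followed by one alignment argument per condition. A dry-run sketch that only prints the command, assuming one pooled alignment per timepoint in hypothetical directories `t1_tophat`, `t2_tophat`, and `t3_tophat`, a hypothetical transcript file `annotation.gtf`, and a hypothetical output directory `cuffdiff_out`:

```shell
#!/bin/sh
# Dry-run sketch: print a cuffdiff command comparing several timepoints.
# Directory and file names are placeholders (assumptions).
set -eu

gtf="annotation.gtf"   # hypothetical transcript annotation

print_cuffdiff() {
    bams=""
    for t in "$@"; do
        # One space-separated argument per condition (no replicates here).
        bams="${bams} ${t}_tophat/accepted_hits.bam"
    done
    # Condition labels are passed as one comma-separated list.
    labels=$(IFS=,; echo "$*")
    echo "cuffdiff -o cuffdiff_out -L ${labels} ${gtf}${bams}"
}

print_cuffdiff t1 t2 t3
```

With true biological replicates, each condition's argument would instead be a comma-separated list of that condition's alignment files, which gives cuffdiff a between-sample variance to estimate rather than only sampling noise.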



    • #3
      Thank you!

      Dear rflrob,
      Thank you very much!
      Your explanation has really helped to accelerate my understanding.

      For the last several days, I was really confused about the concepts of pooling datasets, assembling, making links, merging, comparing, etc. (how to merge the four lanes, when to merge the four lanes, what cufflinks assembles, at which step different timepoints would be differentially analyzed, etc.)
      This confusion may be due to just reading manuals without any lab experience.

      Anyhow thank you again!
