
  • asking for tips about tophat

    I have been learning tophat since last week. I would really appreciate any tips.



    1. Let me describe one example first.
    timepoint1: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of data come from plant1.)
    timepoint2: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of data come from plant2.)
    timepoint3: lane1.fastq, lane2.fastq, lane3.fastq, lane4.fastq (all of data come from plant3.)

    I am thinking of running the commands below.
    "tophat -o [output] -G [gff] [reference] t1_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t1_lane4.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t2_lane4.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane1.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane2.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane3.fastq",
    "tophat -o [output] -G [gff] [reference] t3_lane4.fastq".

    As a next step, I am going to run cufflinks in order to assemble
    t1_lane1, t1_lane2, t1_lane3, t1_lane4 into timepoint1,
    t2_lane1, t2_lane2, t2_lane3, t2_lane4 into timepoint2,
    t3_lane1, t3_lane2, t3_lane3, t3_lane4 into timepoint3.

    As a final step, I am going to run cuffdiff to see the differential expression across different timepoints.

    Do you think I understand the workflow of tophat, cufflinks and cuffdiff correctly?



    2. According to the manual of tophat, the command line looks like "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq".
    I am confused about when multiple read files should be put together in one command line.
    - When is "tophat -o [output] -G [gff] [reference] read1.fastq,read2.fastq,...,readN.fastq" used?
    - When is "tophat -o [output] -G [gff] [reference] read1.fastq", ..., "tophat -o [output] -G [gff] [reference] readN.fastq" used?
    It would be really helpful if you could give a specific experimental design for each case to make this clear.

  • #2
    If the lanes for each timepoint are really just the same sample that got sequenced in multiple lanes, then it's more correct to do (for t1, and likewise for t2 and t3):
    tophat -o [output] -G [gff] [reference] t1_lane1.fastq,t1_lane2.fastq,t1_lane3.fastq,t1_lane4.fastq
    then run cufflinks on the single accepted_hits.bam that tophat makes.
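
    For anyone scripting this, here is a minimal sketch of the pooled-lane approach: one tophat call per timepoint, with that timepoint's lane files joined by commas. It only builds and prints the commands; it does not run tophat. The file naming follows the question, while the output directory names (t1_tophat and so on) and the helper function are assumptions made up for illustration, and [reference] and [gff] are left as placeholders.

```python
# Sketch only: build one pooled tophat command per timepoint.
# Assumed (hypothetical) names: output dirs "<timepoint>_tophat";
# lane files "<timepoint>_lane<N>.fastq" as written in the question.

def tophat_command(timepoint, lanes, index="[reference]", gff="[gff]"):
    """Return the argument list for one tophat run pooling all lanes of a sample."""
    # tophat accepts several read files as a single comma-separated argument
    reads = ",".join(f"{timepoint}_lane{lane}.fastq" for lane in lanes)
    return ["tophat", "-o", f"{timepoint}_tophat", "-G", gff, index, reads]

# Three timepoints, four lanes each, so three commands instead of twelve
commands = [tophat_command(tp, range(1, 5)) for tp in ("t1", "t2", "t3")]

for cmd in commands:
    print(" ".join(cmd))
```

    Giving each timepoint its own output directory keeps the three accepted_hits.bam files from overwriting each other.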

    First off, there's no good way that I'm aware of to run cufflinks on multiple alignments and get a single set of transcript abundances. Secondly, as much as we would love it to be true, the true between-sample variance will never just be the sampling noise. Ideally you would have true replicates, but if not, I don't know whether cuffdiff would be over-confident if you gave it subsamples of the same sample.
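
    As a rough sketch of the downstream steps under that setup: cufflinks is run once per timepoint on its single accepted_hits.bam, and cuffdiff then takes one BAM per condition. The directory names (t1_tophat, t1_cufflinks, cuffdiff_out) and both helper functions are hypothetical, and [gff] stays a placeholder. With only one BAM per condition there are no replicates, so the caveat above about variance still applies.

```python
# Sketch only: build cufflinks and cuffdiff commands for per-timepoint alignments.
# Assumed (hypothetical) layout: each timepoint was aligned into "<timepoint>_tophat/".

def cufflinks_command(timepoint, gff="[gff]"):
    """One cufflinks run per timepoint, on that timepoint's single BAM."""
    bam = f"{timepoint}_tophat/accepted_hits.bam"
    return ["cufflinks", "-o", f"{timepoint}_cufflinks", "-G", gff, bam]

def cuffdiff_command(timepoints, gff="[gff]"):
    """One cuffdiff run comparing all timepoints; each condition is one BAM."""
    bams = [f"{tp}_tophat/accepted_hits.bam" for tp in timepoints]
    labels = ",".join(timepoints)  # -L gives one label per condition
    return ["cuffdiff", "-o", "cuffdiff_out", "-L", labels, gff] + bams

timepoints = ["t1", "t2", "t3"]
for tp in timepoints:
    print(" ".join(cufflinks_command(tp)))
print(" ".join(cuffdiff_command(timepoints)))
```

    If true biological replicates were available, each condition's argument to cuffdiff would instead be a comma-separated list of that condition's replicate BAMs.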



    • #3
      Thank you!

      Dear rflrob,
      Thank you very much!
      Your explanation has been really helpful in accelerating my understanding.

      For the last several days, I was really confused about the concepts of pooling datasets, assembling, making links, merging, comparing, etc. (how to merge the four lanes, when to merge the four lanes, what cufflinks assembles, at which step different timepoints would be differentially analyzed, etc.)
      This confusion may be due to just reading manuals without any lab experience.

      Anyhow thank you again!
