
  • How to run Tophat with annotation file

    Hi, I want to map single-end short reads to the genome.

    First, I map the reads to the genome with bowtie.

    Second, I want to map the unaligned reads using an annotation file. I downloaded the GTF file from UCSC and ran:

    tophat -G ~/GFF3/ce6-sangerGene.gtf /ws190_index/C.elegans.ws190.dna unaligned.txt
    but it failed.

    Could anyone help me solve this problem?

  • #2
    Well the basic options you've set look OK. You're going to need to show us the errors you got so we can see what failed.

    One thing to note is that tophat is very picky about the exact formatting of the GTF files it will accept. I've tried a few GTF files from Ensembl and about half of them required some manual editing before tophat would accept them. You can test your GTF file by running the gtf_juncs program which comes with tophat. You simply run:

    gtf_juncs [your gtf file]

    You may see some (or possibly lots of) warnings, but if you then see output like:

    X 54671454 54675019 +
    X 54675146 54676441 +
    X 54676552 54686385 +
    X 54686464 54689586 +
    X 54689694 54690560 +

    ...then your GTF file is OK. If you get an error like:

    Error: duplicate GFF ID 'ENSMUST00000127664' (or exons too far apart)!

    ...then tophat can't process your GTF file and will abort if you try to use it (though I don't think tophat shows you the error that gtf_juncs reports).
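If you capture the gtf_juncs output to a file, a small script can tell you which of the two cases above you are in. This is only a rough sketch: it assumes junction lines look like the five-line example above (chromosome, start, end, strand) and that failures start with "Error". The function name `summarize_gtf_juncs` and the sample strings are made up for illustration and are not part of tophat.

```python
import re

# A junction line as shown in the post: chromosome, start, end, strand.
JUNCTION_RE = re.compile(r"^\S+\s+\d+\s+\d+\s+[+-]$")


def summarize_gtf_juncs(output_text):
    """Count junction lines and collect error lines from captured gtf_juncs output."""
    junctions = 0
    errors = []
    for line in output_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Error"):
            errors.append(line)
        elif JUNCTION_RE.match(line):
            junctions += 1
    return junctions, errors


# Sample inputs copied from the examples in this post (hypothetical capture).
sample_ok = """X 54671454 54675019 +
X 54675146 54676441 +
"""
sample_bad = "Error: duplicate GFF ID 'ENSMUST00000127664' (or exons too far apart)!"

print(summarize_gtf_juncs(sample_ok))
print(summarize_gtf_juncs(sample_bad))
```

A junction count of zero with no error lines would also be worth a closer look, since it would mean gtf_juncs found nothing usable in the file.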

    Hope this helps



    • #3
      This is the output from TopHat:

      [Wed Sep 7 10:52:44 2011] Beginning TopHat run (v1.3.1)
      -----------------------------------------------
      [Wed Sep 7 10:52:44 2011] Preparing output location ./tophat_out/
      [Wed Sep 7 10:52:44 2011] Checking for Bowtie index files
      [Wed Sep 7 10:52:44 2011] Checking for reference FASTA file
      [Wed Sep 7 10:52:44 2011] Checking for Bowtie
      Bowtie version: 0.12.7.0
      [Wed Sep 7 10:52:44 2011] Checking for Samtools
      Samtools Version: 0.1.8
      [Wed Sep 7 10:52:44 2011] Generating SAM header for /home/wgf/bowtie/ws190_index/C.elegans.ws190.dna
      [Wed Sep 7 10:52:45 2011] Preparing reads
      format: fasta
      [Wed Sep 7 10:52:45 2011] Reading known junctions from GTF file
      Left reads: min. length=33, count=2951216
      Warning: you have only one segment per read
      we strongly recommend that you decrease --segment-length to about half the read length because TopHat will work better with multiple segments
      [Wed Sep 7 10:53:46 2011] Mapping left_kept_reads against C.elegans.ws190.dna with Bowtie
      [Wed Sep 7 11:08:01 2011] Processing bowtie hits
      [Wed Sep 7 11:08:35 2011] Retrieving sequences for splices
      [Wed Sep 7 11:08:57 2011] Indexing splices
      Warning: Empty input file
      Error: No unambiguous stretches of characters in the input. Aborting...
      Command: /home/wgf/bin/bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
      [FAILED]
      Error: Splice sequence indexing failed with err =1
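Going by the log above, the step that fails is the bowtie-build call on ./tophat_out/tmp/segment_juncs.fa, and the "Empty input file" warning suggests that file held no splice sequences. To pull just the warning, error and command lines out of a long run log, something like the sketch below may help. The helper name `flag_log_lines` and its prefix list are assumptions for illustration, not anything TopHat provides, and the excerpt is copied from the log in this post.

```python
def flag_log_lines(log_text, prefixes=("Warning", "Error", "Command", "[FAILED]")):
    """Return the log lines that start with one of the given prefixes."""
    flagged = []
    for line in log_text.splitlines():
        stripped = line.strip()
        # str.startswith accepts a tuple of prefixes.
        if stripped.startswith(prefixes):
            flagged.append(stripped)
    return flagged


# Excerpt copied from the TopHat log above.
log_excerpt = """[Wed Sep 7 11:08:57 2011] Indexing splices
Warning: Empty input file
Error: No unambiguous stretches of characters in the input. Aborting...
Command: /home/wgf/bin/bowtie-build ./tophat_out/tmp/segment_juncs.fa ./tophat_out/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err =1
"""

for line in flag_log_lines(log_excerpt):
    print(line)
```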
