  • tophat running problem

    Hi all,

    I'm trying to analyze human mRNA-seq data with TopHat (v1.0.12).
    I got proper output from a preliminary dataset (the first 100K reads from each .fq file), but the run fails on the real dataset (each file contains ~17M reads).

    I would appreciate any help you could give me with this.

    Thanks in advance.

    -Yasu

    ### preliminary_test ###

    $ tophat -r 10 -p 8 -o tophat_hg19_test hg19 s_1_1.head4000000.fq,s_6_1.head4000000.fq,s_7_1.head4000000.fq s_1_2.head4000000.fq,s_6_2.head4000000.fq,s_7_2.head4000000.fq

    [Mon Feb 8 13:31:03 2010] Beginning TopHat run (v1.0.12)
    -----------------------------------------------
    [Mon Feb 8 13:31:03 2010] Preparing output location tophat_hg19_test/
    [Mon Feb 8 13:31:03 2010] Checking for Bowtie index files
    [Mon Feb 8 13:31:03 2010] Checking for reference FASTA file
    [Mon Feb 8 13:31:03 2010] Checking for Bowtie
    Bowtie version: 0.11.3.0
    [Mon Feb 8 13:31:03 2010] Checking reads
    seed length: 43bp
    format: fastq
    quality scale: --phred33-quals
    [Mon Feb 8 13:31:51 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 13:34:24 2010] Joining segment hits
    [Mon Feb 8 13:34:59 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 13:37:30 2010] Joining segment hits
    [Mon Feb 8 13:38:04 2010] Searching for junctions via segment mapping
    [Mon Feb 8 13:44:59 2010] Retrieving sequences for splices
    [Mon Feb 8 13:46:42 2010] Indexing splices
    [Mon Feb 8 13:47:58 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Feb 8 13:48:47 2010] Joining segment hits
    [Mon Feb 8 13:49:26 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Feb 8 13:50:15 2010] Joining segment hits
    [Mon Feb 8 13:50:52 2010] Reporting output tracks
    -----------------------------------------------
    Run complete [00:33:58 elapsed]


    ### real_data ###

    tophat -r 10 -p 8 -o tophat_hg19 hg19 s_1_1.fq,s_6_1.fq,s_7_1.fq s_1_2.fq,s_6_2.fq,s_7_2.fq

    [Mon Feb 8 14:24:47 2010] Beginning TopHat run (v1.0.12)
    -----------------------------------------------
    [Mon Feb 8 14:24:47 2010] Preparing output location tophat_hg19/
    [Mon Feb 8 14:24:47 2010] Checking for Bowtie index files
    [Mon Feb 8 14:24:47 2010] Checking for reference FASTA file
    [Mon Feb 8 14:24:47 2010] Checking for Bowtie
    Bowtie version: 0.11.3.0
    [Mon Feb 8 14:24:47 2010] Checking reads
    seed length: 43bp
    format: fastq
    quality scale: --phred33-quals
    [Mon Feb 8 14:39:23 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 15:24:40 2010] Joining segment hits
    [Mon Feb 8 15:35:38 2010] Mapping reads against hg19 with Bowtie
    [Mon Feb 8 16:18:44 2010] Joining segment hits
    [Mon Feb 8 16:18:44 2010] Searching for junctions via segment mapping
    Warning: junction database is empty!
    [Mon Feb 8 18:01:42 2010] Joining segment hits
    [Mon Feb 8 18:11:38 2010] Joining segment hits
    [Mon Feb 8 18:11:38 2010] Reporting output tracks
    [FAILED]
    Error: Report generation failed with err = 1
    Traceback (most recent call last):
    File "/bin/tophat", line 1518, in ?
    sys.exit(main())
    File "/bin/tophat", line 1490, in main
    params.gff_annotation)
    File "/bin/tophat", line 936, in compile_reports
    exit(1)
    TypeError: 'str' object is not callable
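
    A note on the traceback above: the final TypeError is probably a secondary symptom, not the root cause. The "in ?" frames suggest an old Python 2 interpreter, where the built-in name `exit` was, as far as I know, bound to a help string and not to a callable, so `exit(1)` itself fails. The sketch below simulates that with a local variable (an assumption for illustration, not TopHat's actual code) and contrasts it with `sys.exit`, which works the same way on every Python version.

```python
import sys


def call_string_exit():
    """Simulate calling `exit(1)` when `exit` is only a string."""
    # Hypothetical stand-in for the old interpreter behaviour.
    exit = "Use Ctrl-D (i.e. EOF) to exit."
    try:
        exit(1)  # a str is not callable, so this raises TypeError
    except TypeError as err:
        return str(err)


def call_sys_exit():
    """sys.exit raises SystemExit carrying the requested status code."""
    try:
        sys.exit(1)
    except SystemExit as err:
        return err.code


if __name__ == "__main__":
    print(call_string_exit())
    print(call_sys_exit())
```

    Either way, the step that actually failed is "Reporting output tracks", so the message in the report log is the more useful lead.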

  • #2
    Here is the reports.log file from real_data (the failed run).

    ### Real_data (reports.log) ###

    tophat_reports v1.0.12
    ---------------------------------------
    Error: cannot open map file for reading

    #####################

    Comparing the run.log files from preliminary_test (the successful run) and real_data (the failed run), it looks like "/bin/segment_juncs" is not working properly.

    Can somebody help?

    Thanks,

    -Yasu


    • #3
      The fact that TopHat thinks the seed length is 43bp is concerning. The default is 25, and it shouldn't be different unless you specified --segment-length, which you didn't. TopHat currently requires that FASTQ files have records where all of the nucleotides for each read appear on a single line. Same goes for the quality strings - all the quality characters need to be on one line. This is a limitation I haven't had time to fix yet. Can you verify that your FASTQ file is formatted this way?
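
      The single-line requirement described above can be checked with a short script. This is a minimal sketch, not part of TopHat: the function name `check_fastq` is made up for illustration, and it assumes plain (uncompressed) FASTQ with exactly four lines per record. It streams the file, so it should cope with files of many millions of reads.

```python
from itertools import zip_longest


def check_fastq(handle):
    """Check that every record is exactly 4 lines: @header, sequence, +, qualities.

    Returns (read_lengths, problems): the set of sequence lengths seen and a
    list of human-readable problem descriptions (empty if none were found).
    """
    problems = []
    read_lengths = set()
    lines = (line.rstrip("\r\n") for line in handle)
    # Passing the same iterator four times groups consecutive lines in fours.
    groups = zip_longest(lines, lines, lines, lines)
    for number, (header, seq, plus, qual) in enumerate(groups, start=1):
        if qual is None:
            problems.append("record %d is incomplete (fewer than 4 lines)" % number)
            break
        if not header.startswith("@"):
            problems.append("record %d: header does not start with '@'" % number)
        if not plus.startswith("+"):
            problems.append("record %d: third line does not start with '+'" % number)
        if len(seq) != len(qual):
            problems.append("record %d: sequence and quality lengths differ" % number)
        read_lengths.add(len(seq))
    return read_lengths, problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:
        lengths, found = check_fastq(fh)
    print("read lengths seen:", sorted(lengths))
    for problem in found[:20]:
        print(problem)
```

      Running it on each input file (for example `python check_fastq.py s_1_1.fq`, where the script name is arbitrary) lists the read lengths present and the first few malformed records, if any. A sequence or quality string wrapped over several lines shows up as a header or "+" line in the wrong position.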


      • #4
        Thanks for your kind help!!

        My FASTQ file looks like this. I omitted the sequence and position ID from the line after "+". Could that cause the problem?

        -Yasu

        ###########
        @HWI-EAS368:1:1:9:316#0/1
        CTGGATGATAACATTCCAGAAGATGACTCAGGTGTCCCCACCC
        +
        BB66AB9ACBB@BCBAAA><BBBAAB?BBB@@@BA?B@BB@AB
        @HWI-EAS368:1:1:9:424#0/1
        CTCCCTGCCAGATATCGAGGAGGTGAAAGACCAGAGCAGGAAC
        +
        BCBBB>?>@CBABBB;A877??.:<<B@;@@?=>?A>?6AAA?
        @HWI-EAS368:1:1:9:1060#0/1
        TGGATGGTTCAGGATAATCACCTGAGCAGTGAAGCCAGCTGCT
        +
        BBBBB=?@BBB?=@A9AA@CAA><<@:5>7?=A?A@?=A???@
        @HWI-EAS368:1:1:9:410#0/1
        CGGAGGCGGAGGCTTGGGTGCGTTCAAGATTCAGCTTCACCCG
        +
        AA9AAA=:A7'=7=?4+366=AA@:A>999B:=2,=>1014>7
        @HWI-EAS368:1:1:9:807#0/1
        CGAACATTTCTGGCCCCCAAGTGTCAGCCCATTCACGTAAAAA
        +
        BBBBBBC@@<:;6BC>:@2<B=BBB@7;BB=:C@799:BBB?%
        @HWI-EAS368:1:1:9:405#0/1
        TGTAAAGCCTGAAACAGCTGCCTGTGTGGGACTGAGATGCAGG
        +
        ?=>B=AA@AB@AAB?88>@@@BB>?B>A<=>?A81<-<@@B@@


        • #5
          I added '--segment-length 25', but the output is still as follows:

          [Tue Feb 9 16:47:40 2010] Checking reads
          seed length: 43bp
          format: fastq
          quality scale: --phred33-quals

          Did I do something wrong?

          -Yasu
