Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • tophat fails with --no-discordant

    I'm trying to run tophat2.0.10 on some PE RNA-seq samples. However, tophat_reports fails at the reporting output tracks step with the error below. If I run tophat without the --no-discordant option then it runs fine. Any ideas? Thanks

    [2014-01-07 21:09:50] Joining segment hits
    [2014-01-07 21:14:07] Reporting output tracks
    [FAILED]
    Error running /N/hd01/jake/Mason/programs/tophat-2.0.10/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ --max-multihits 1 --max-seg-multihits 10 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --library-type fr-firststrand --sam-header ./tophat_out/tmp/genome_genome.bwt.samheader.sam --report-mixed-alignments --samtools=/N/u/jake/Mason/programs/samtools-0.1.19/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /N/dc2/projects/RNA/jake/UCSC_mm10_bowtie/Sequence/Bowtie2Index/genome.fa ./tophat_out/junctions.bed ./tophat_out/insertions.bed ./tophat_out/deletions.bed ./tophat_out/fusions.out ./tophat_out/tmp/accepted_hits ./tophat_out/tmp/left_kept_reads.mapped.bam,./tophat_out/tmp/left_kept_reads.candidates ./tophat_out/tmp/left_kept_reads.bam ./tophat_out/tmp/right_kept_reads.mapped.bam,./tophat_out/tmp/right_kept_reads.candidates ./tophat_out/tmp/right_kept_reads.bam
    Loaded 147930 junctions

  • #2
    I don't think tophat_report has the '--no-discordant' option.

    Comment


    • #3
      Sorry I run tophat2 with the --no-discordant option, but it seems to fail when it has to call tophat_reports.

      Comment


      • #4
        I've had problems with tophat dying at various points near the end of the run, but a few times I was able to recover with the -R option. Made my day.

        One thing I noticed was missing from the error message (which I think is the tophat.log file) was the --no-discordant option itself, in the full command string. Maybe it's a bug? Seems like it should be in there. Does it show up in the run.log file?

        Comment


        • #5
          I tried resume and it didn't work, but I'm retrying with more memory. The --no-discordant option shows up for tophat in the run log, but not for tophat_reports.

          Comment


          • #6
            Looks like someone found a solution, if you want to edit the code to patch it yourself:

            http://seqanswers.com/forums/showthr...t=24205&page=2

            This is the thread that Alex Williams referenced in the google groups thread:

            https://groups.google.com/forum/#!ms...8/UtLS0XcWKnUJ

            Comment


            • #7
              Thanks, took a look at that, but I'm not sure where to make those edits.

              Comment


              • #8
                (I had some spare time waiting on some jobs to finish on the cluster…)

                Try this. I had to rename it as tophat.txt to upload it. Having no idea what your experience level is with this kind of thing (I'm no expert…yet), I'll go verbose:

                Unzip and copy this to your cluster (or have an admin do it).
                You can compare it to the original to see what changes I've made
                Code:
                diff /path/to/original/tophat tophat    #modified version is the 2nd one
                1427c1427
                <         bowtie_sam_header_filename = tmp_dir + idx_prefix.split('/')[-1]
                ---
                >         bowtie_sam_header_filename = tmp_dir + str(idx_prefix).split('/')[-1]
                2114c2114
                <     bwt_idx_name = bwt_idx_prefix.split('/')[-1]
                ---
                >     bwt_idx_name = str(bwt_idx_prefix).split('/')[-1]
                Then replace the original with this one and make it executable
                Code:
                chmod 777 tophat    #or whatever permissions are appropriate
                You should be able to resume the run, according to the other thread.

                caveat emptor: I haven't tried it myself…
                Attached Files

                Comment


                • #9
                  So I tried that and it did not fix the problem. Right now my sample is split into multiple fastq.gz files so I tried running each file separately. All of them ran fine except for one pair, which failed at the same step, so I think this pair was holding up the entire run. However, I tried mapping this same pair with STAR and it mapped fine so I don't think the file is completely corrupted. Any ideas? Thanks

                  Comment


                  • #10
                    Hmm...running out of ideas. How much data does this one file represent, if you were to omit it completely? Or, you could downsample this one file and maybe get lucky enough to remove the reads that are causing the problematic alignments. I hate losing data, but I also hate debugging and waiting on software to finish. Depends on your needs.

                    You could use split to break it down, run the pieces through, and omit the chunk(s) that are causing problems:
                    Code:
                    zcat wonky.fq.gz | split -l 4000000 -    #4M lines = 1M reads = 500k pairs if interleaved
                    or something like that. You could do this iteratively on the bad segment to minimize data loss.

                    Comment


                    • #11
                      So someone else suggested I try an older version of tophat. I ran my files with 2.0.8b and 2.0.9 and everything ran fine so I guess I will just use that.

                      Comment


                      • #12
                        How to deal with discordant alignments?

                        I've had exactly the same problem as everyone else on this thread. Tophat 2.0.9 output fails when running large, paired-end datasets with --no discordant set to yes (ie, to not report discordant alignments). However, it runs fine when Tophat is set to report discordant alignments. After much troubleshooting and thread browsing, it seems like this can only be a bug in Tophat. Short of waiting for a bug fix, does anyone know of an easy way to filter out discordant alignments from the acceptedhits.bam file post hoc?

                        Comment


                        • #13
                          You can use BBMap instead, which is faster and more accurate than Tophat anyway.

                          (index)
                          bbmap.sh ref=reference.fasta -Xmx29g

                          (map)
                          bbmap.sh in1=reads1.fq in2=reads2.fq outm=mapped.sam outu1=unmapped1.fq outu2=unmapped2.fq -Xmx29g po=t maxindel=100000 xstag=unstranded intronlen=10 ambig=random

                          The "-Xmx29g" indicates how much memory java is allowed to use; set that at ~85% of your physical memory.

                          The "outm" and "outu" flags indicate streams for mapped and unmapped reads, respectively. "po=t" means "paired only = true" which tells it to consider unpaired reads as unmapped, so they will go to the outu rather than outm streams. This should be analgous to Tophat's "--no-discordant" flag. The outu streams are optional and you can leave them off. "maxindel" tells it the longest expected intron length to look for; "intronlen" tells it the minimum intron length (deletions at least that length get the cigar symbol 'N' rather than 'D'). If your data is stranded, you can set the xstag to "firststrand" or "secondstrand" instead of unstranded.

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Current Approaches to Protein Sequencing
                            by seqadmin


                            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                            04-04-2024, 04:25 PM
                          • seqadmin
                            Strategies for Sequencing Challenging Samples
                            by seqadmin


                            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                            03-22-2024, 06:39 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 04-11-2024, 12:08 PM
                          0 responses
                          30 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 10:19 PM
                          0 responses
                          32 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-10-2024, 09:21 AM
                          0 responses
                          28 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 04-04-2024, 09:00 AM
                          0 responses
                          53 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X