  • Tophat fails with CIGAR error but read does not exist

    Hello all,

    I am using the latest versions of Tophat, Samtools and Bowtie2.

    I have a set of 8 samples that I am trying to map with Tophat. Every sample maps without issue except one. This sample repeatedly fails with the following error at the tophat-reports stage:

    [2015-06-15 13:42:44] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)



    [2015-06-15 13:42:50] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)

    [2015-06-15 13:42:57] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)

    [2015-06-15 13:43:03] Mapping right_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)

    [2015-06-15 13:43:10] Joining segment hits

    [2015-06-15 13:43:49] Reporting output tracks

    [FAILED]

    Error running /lustre4/home/markravinet/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 5000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 5000 --min-segment-intron 50 --max-segment-intron 5000 --read-mismatches 36 --read-gap-length 24 --read-edit-dist 50 --read-realign-edit-dist 51 --max-insertion-length 12 --max-deletion-length 12 -z gzip -p10 --inner-dist-mean 400 --inner-dist-std-dev 40 --gtf-annotations /home/markravinet/data/ninespine/gac_transcript/Gac_genes.gff --gtf-juncs ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/Gac_genes.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/gac_edit2_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/lustre4/home/markravinet/local/bin/samtools_0.1.18 --bowtie2-max-penalty 3 --bowtie2-min-penalty 1 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 3 --bowtie2-ref-gap-cont 2 /home/markravinet/data/ninespine/gac_edit2.fa ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/junctions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/insertions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/deletions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/fusions.out ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/accepted_hits ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g_um.mapped.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g_um.candidates ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.bam 
./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g_um.mapped.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g_um.candidates ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.bam

    Error: CIGAR and sequence length are inconsistent!(TCAAGACAGGATCTTCTCTTCCAGTCTCCTCTGGTTCTACAGACAGATATTCCTGTTATTTAAGAGGATCTTATGGTCTCCGTTCTCCTGCAGTGTGTAAG)
    This seems to be a relatively common samtools/Tophat error, and the simplest solution seems to be to delete the problem read. However, I can't do that because I can't find it! The read is not present in any of the original fastq files, the mapped reads or the accepted hits file. The accepted hits file gives an EOF error, presumably because it was truncated when the error occurred, so I cannot trace where the read comes from.

    Can anyone shed light on this issue?
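
    For reference, the consistency check behind this error message can be sketched as follows. This is a minimal Python illustration, not the actual samtools or Tophat code, and the function names are hypothetical. It relies on the SAM specification's rule that the M, I, S, = and X operations consume read bases, so their lengths should add up to the length of the read sequence.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the read length implied by a CIGAR string, or None for '*'."""
    if cigar == "*":
        return None
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than <length><op> pairs.
    if "".join(length + op for length, op in tokens) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return sum(int(length) for length, op in tokens if op in QUERY_CONSUMING_OPS)


def is_consistent(cigar, seq):
    """True if the CIGAR-implied length matches len(seq), or either is '*'."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)
```

    A spliced alignment such as 50M200N50M implies a 100-base read, because N (the intron) does not consume read bases; a record pairing that CIGAR with a sequence of any other length would be flagged as inconsistent.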

  • #2
    Try looking for the reverse-complement of that sequence. Or... you can download BBDuk (part of BBTools) and do this:

    Code:
    bbduk.sh in=reads.fq outm=bad.fq outu=good.fq k=100 mm=f literal=TCAAGACAGGATCTTCTCTTCCAGTCTCCTCTGGTTCTACAGACAGATATTCCTGTTATTTAAGAGGATCTTATGGTCTCCGTTCTCCTGCAGTGTGTAAG
    BBDuk does not actually support 100-mers; it only goes up to 31-mers. With k=100, that command will instead require 70 consecutive matching 31-mers (100 - 31 + 1 = 70). It automatically looks for both the forward and reverse-complement of kmers.
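
    If you would rather check by hand first, the reverse-complement search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
# Translation table for complementing DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_read(fastq_lines, target):
    """Yield (read_name, orientation) for records whose sequence contains
    `target` on either strand. Assumes four lines per FASTQ record."""
    targets = {
        "forward": target,
        "reverse-complement": reverse_complement(target),
    }
    name = None
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            name = line  # header line, e.g. "@read1"
        elif i % 4 == 1:
            for orientation, query in targets.items():
                if query in line:
                    yield name, orientation
```

    To run it on a file, pass an open handle, for example `find_read(open("reads.fq"), "TCAAG...")` with the full sequence from the error message, or use `gzip.open(path, "rt")` for compressed input.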

    However - I encourage you to give BBMap a try; it's faster and more sensitive than Tophat.
    Last edited by Brian Bushnell; 06-16-2015, 12:24 PM.

  • #3
    Hi Brian, thanks very much for your response. BBDuk worked perfectly and Tophat completed without issue. BBMap seems like a great tool and I will certainly give it a try.
