  • Tophat fails with CIGAR error but read does not exist

    Hello all,

    I am using the latest versions of Tophat, Samtools and Bowtie2.

    I have a set of 8 samples that I am trying to map with Tophat. Every sample maps without issue except one, which repeatedly fails with the following error at the tophat_reports stage:

    [2015-06-15 13:42:44] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
    [2015-06-15 13:42:50] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
    [2015-06-15 13:42:57] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
    [2015-06-15 13:43:03] Mapping right_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
    [2015-06-15 13:43:10] Joining segment hits
    [2015-06-15 13:43:49] Reporting output tracks
    [FAILED]

    Error running /lustre4/home/markravinet/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 5000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 5000 --min-segment-intron 50 --max-segment-intron 5000 --read-mismatches 36 --read-gap-length 24 --read-edit-dist 50 --read-realign-edit-dist 51 --max-insertion-length 12 --max-deletion-length 12 -z gzip -p10 --inner-dist-mean 400 --inner-dist-std-dev 40 --gtf-annotations /home/markravinet/data/ninespine/gac_transcript/Gac_genes.gff --gtf-juncs ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/Gac_genes.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/gac_edit2_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/lustre4/home/markravinet/local/bin/samtools_0.1.18 --bowtie2-max-penalty 3 --bowtie2-min-penalty 1 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 3 --bowtie2-ref-gap-cont 2 /home/markravinet/data/ninespine/gac_edit2.fa ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/junctions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/insertions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/deletions.bed ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/fusions.out ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/accepted_hits ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g_um.mapped.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.m2g_um.candidates ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/left_kept_reads.bam 
./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g_um.mapped.bam,./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.m2g_um.candidates ./tophat_out/ppun/P_pun_Biwa2014_m_3_paired/tmp/right_kept_reads.bam

    Error: CIGAR and sequence length are inconsistent!(TCAAGACAGGATCTTCTCTTCCAGTCTCCTCTGGTTCTACAGACAGATATTCCTGTTATTTAAGAGGATCTTATGGTCTCCGTTCTCCTGCAGTGTGTAAG)

    This seems to be a relatively common samtools/Tophat error, and the simplest solution seems to be to delete the problem read. However, I can't do that because... I can't find it! The read is not present in any of the original FASTQ files, in the mapped reads or in the accepted hits file. Reading the accepted hits file gives an EOF error, presumably because it was truncated when the error occurred, so I cannot track down the source of the problem.

    Can anyone shed light on this issue?
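    For context: the SAM format requires the lengths of the CIGAR operations that consume read bases (M, I, S, = and X) to add up to the length of the SEQ field, while D, N, H and P do not count. The error above reports a record that breaks this rule. The following shell function is a minimal sketch of that check in awk; it is not samtools' or Tophat's own code, and the name check_cigar_seq is hypothetical. It assumes tab-separated SAM text on stdin or in files and prints each offending read with both lengths.

```shell
# Hypothetical helper: report SAM records whose CIGAR query length
# does not equal the length of SEQ. Reads SAM text from files or stdin.
check_cigar_seq() {
  awk -F '\t' '
    /^@/ { next }                          # skip SAM header lines
    $6 == "*" || $10 == "*" { next }       # no CIGAR or no SEQ to compare
    {
      cigar = $6; qlen = 0
      # Walk the CIGAR one <length><op> token at a time.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MIS=X]/) qlen += n      # only M, I, S, =, X consume read bases
        cigar = substr(cigar, RLENGTH + 1)
      }
      # Leftover text means a malformed CIGAR; otherwise compare the lengths.
      if (cigar != "" || qlen != length($10))
        printf "%s\tCIGAR=%s\tCIGAR_len=%d\tSEQ_len=%d\n", $1, $6, qlen, length($10)
    }' "$@"
}
```

    For example, a 101-base read with CIGAR 100M would be reported with CIGAR_len=100 and SEQ_len=101, while 50M200N51M passes because N skips reference bases without consuming read bases.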

  • #2
    Try looking for the reverse complement of that sequence. Or... you can download BBDuk (part of BBTools) and do this:

    Code:
    bbduk.sh in=reads.fq outm=bad.fq outu=good.fq k=100 mm=f literal=TCAAGACAGGATCTTCTCTTCCAGTCTCCTCTGGTTCTACAGACAGATATTCCTGTTATTTAAGAGGATCTTATGGTCTCCGTTCTCCTGCAGTGTGTAAG
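    BBDuk does not actually support 100-mers; it only goes up to 31-mers. However, with k=100 that command will require 70 consecutive matching 31-mers (100 - 31 + 1 = 70), which together span 100 bases, and mm=f turns off middle-base masking so the match must be exact. BBDuk automatically looks for both the forward and the reverse-complement orientation of each kmer.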
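    If BBTools is not at hand, the same idea (search each read for the literal and its reverse complement, then split the reads into matched and unmatched files) can be sketched in plain shell and awk. This is a simplified stand-in, not BBDuk's kmer algorithm: it requires the entire literal, or its reverse complement, to appear as an exact substring of the read, whereas the k=100 BBDuk command accepts any read sharing a 100-base exact match with it. The name split_by_literal is hypothetical, and the sketch assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper: split a 4-line-per-record, uncompressed FASTQ into
# reads that contain LITERAL (or its reverse complement) and reads that do not.
# Usage: split_by_literal IN.fq MATCHED.fq UNMATCHED.fq LITERAL
# Prints the number of matched reads.
split_by_literal() {
  : > "$2"; : > "$3"   # create both outputs even if one stays empty
  awk -v lit="$4" -v hit="$2" -v miss="$3" '
    # Reverse complement; characters other than A/C/G/T are kept as-is.
    function revcomp(s,    i, j, c, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        j = index("ACGT", c)
        r = r (j ? substr("TGCA", j, 1) : c)
      }
      return r
    }
    BEGIN { lit = toupper(lit); rc = revcomp(lit); n = 0 }
    { rec[NR % 4] = $0 }                   # 1=header, 2=seq, 3=plus, 0=quality
    NR % 4 == 0 {
      seq = toupper(rec[2])
      out = (index(seq, lit) || index(seq, rc)) ? hit : miss
      printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0] > out
      if (out == hit) n++
    }
    END { print n }
  ' "$1"
}
```

    Called as split_by_literal reads.fq bad.fq good.fq SEQUENCE, it mirrors the in=, outm= and outu= arguments of the BBDuk command. It processes one file at a time, so for paired-end data both mates of a matched read would still need to be removed to keep the two files in sync.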

    However, I encourage you to give BBMap a try; it's faster and more sensitive than Tophat.
    Last edited by Brian Bushnell; 06-16-2015, 12:24 PM.



    • #3
      Hi Brian, thanks very much for your response. BBDuk worked perfectly and Tophat completed without issue. BBMap seems like a great tool and I will certainly give it a try.
