  • tophat 2.0.0 CIGAR and sequence length are inconsistent

    We get the following error while running TopHat 2.0.0 on a sequence file. The STAR aligner works fine on the same file. Here is the relevant part of the log:

    [2013-04-12 10:40:14] Preparing reads
    left reads: min. length=101, max. length=101, 183210621 kept reads (230841 discarded)
    ....
    Line 171124311, sequence length 74 vs 101 from CIGAR
    Parse error at line 171124311: CIGAR and sequence length are inconsistent
    [FAILED]
    Error running bowtie:

    The offending read looks fine to me!

    @D8CXHXP1:240:C0DBWACXX:5:1208:19908:68173 1:N:0:
    CGAGGCTGGCATTTGTGTTCTCTTAGGGTCTGCATAACATCTGTCCAGGATCTTCTGGCTTTCATATTCTCTGGTGAGAAGTCTGGTGTAATTCTAATAGG
    +
    CCCFFFFFHGHHHJJGIIGIJJJJJJIJFGGIIJJJJIJJJJJEHJJJJEIIJJJJJJJIJJJJJJJJJJJJHHH?EEFFFFFFEECECDEEFEEEEEDDD


    Any idea why this might be happening?
    We are running:
    TopHat v2.0.0
    Bowtie2 version 2.0.0-beta5
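One way to double-check that a read really "looks fine" is to confirm that its sequence and quality strings have the same length across the whole FASTQ file. The following is only a minimal sketch under stated assumptions: it assumes plain four-line FASTQ records (no wrapped sequences), and the function name `fastq_length_mismatches` and the file name `reads.fastq` are hypothetical. It does not reproduce anything TopHat does internally, and a clean result here would not rule out a problem further along in the pipeline.

```python
def fastq_length_mismatches(lines):
    """Yield (record_number, header, seq_len, qual_len) for every
    four-line FASTQ record whose sequence and quality lengths differ."""
    record = []
    number = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            number += 1
            header, seq, _plus, qual = record
            if len(seq) != len(qual):
                yield number, header, len(seq), len(qual)
            record = []


if __name__ == "__main__":
    import os

    path = "reads.fastq"  # hypothetical file name, replace with your own
    if os.path.exists(path):
        with open(path) as handle:
            for number, header, seq_len, qual_len in fastq_length_mismatches(handle):
                print(f"record {number} {header}: seq={seq_len} qual={qual_len}")
```

If this prints nothing, the input records are at least internally consistent, which would point towards the aligner output rather than the raw reads.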

  • #2
    You appear to be running older versions of both programs. Is there any possibility you can upgrade both (TopHat is now at v2.0.8 and Bowtie2 at v2.1.0) and see if the error goes away?

    • #3
      I already have the upgraded versions, but another sample from the same batch worked fine with the older versions I used. I will give the upgrade a try too.
      -best
      -Lax

      • #4
        I am having the exact same issue. Has anyone else seen or resolved this? In my case, the offending mapped read looks as follows:

        59 69 * 0 255 * * 0 0 CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGTCTTTGCCCGCGCGTGAGACTCCA @?@BD@DDB?B;CCCF3C3AAF>AEFFB?C0?D@<)99?FFFI@;;/'=B;5(6>;>B################## ZN:Z:SRR821602.59

        and the output is:

        [2013-07-18 13:34:28] Beginning TopHat run (v2.0.8b)
        -----------------------------------------------
        [2013-07-18 13:34:28] Checking for Bowtie
        Bowtie version: 2.1.0.0
        [2013-07-18 13:34:28] Checking for Samtools
        Samtools version: 0.1.19.0
        [2013-07-18 13:34:28] Checking for Bowtie index files
        [2013-07-18 13:34:28] Checking for Bowtie index files
        [2013-07-18 13:34:28] Checking for reference FASTA file
        [2013-07-18 13:34:28] Generating SAM header for /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
        format: fastq
        quality scale: phred33 (default)
        [2013-07-18 13:35:31] Reading known junctions from GTF file
        [2013-07-18 13:35:37] Preparing reads
        left reads: min. length=76, max. length=76, 2500 kept reads (0 discarded)
        right reads: min. length=76, max. length=76, 2498 kept reads (2 discarded)
        [2013-07-18 13:35:37] Using pre-built transcriptome index..
        [2013-07-18 13:35:41] Mapping left_kept_reads to transcriptome genome with Bowtie2
        [2013-07-18 13:35:49] Mapping right_kept_reads to transcriptome genome with Bowtie2
        [2013-07-18 13:35:56] Reporting output tracks
        [FAILED]
        Error running /home/unix/dylkot/bin/tophat-2.0.8b.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir /broad/hptmp/dkotliar/GTEX/mapped/debug/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 5 --read-gap-length 2 --read-edit-dist 7 --read-realign-edit-dist 8 --max-insertion-length 3 --max-deletion-length 3 -z gzip --inner-dist-mean 67 --inner-dist-std-dev 231 --gtf-annotations /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.gff --gtf-juncs /home/unix/dylkot/mapped/debug/tmp/genome.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header /home/unix/dylkot/mapped/debug/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/home/unix/dylkot/bin/samtools-0.1.19/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.fa /home/unix/dylkot/mapped/debug/junctions.bed /home/unix/dylkot/mapped/debug/insertions.bed /home/unix/dylkot/mapped/debug/deletions.bed /home/unix/dylkot/mapped/debug/fusions.out /home/unix/dylkot/mapped/debug/tmp/accepted_hits /home/unix/dylkot/mapped/debug/tmp/left_kept_reads.m2g.bam /home/unix/dylkot/mapped/debug/tmp/left_kept_reads.bam /home/unix/dylkot/mapped/debug/tmp/right_kept_reads.m2g.bam //home/unix/dylkot/mapped/debug/tmp/right_kept_reads.bam
        Error: CIGAR and sequence length are inconsistent!(CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGTCTTTGCCCGCGCGTGAGACTCCA)

        Any idea what could be going on or how I can best get around this problem?
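For anyone trying to locate the offending records themselves: the error message says the read length implied by the CIGAR string does not match the length of the sequence column. Under the SAM specification, the operations M, I, S, = and X consume read bases, so their counts should sum to the sequence length. The sketch below checks that for tab-separated SAM lines. The function names are hypothetical, and this is not TopHat's own check, just an independent way to look for such records in a SAM file.

```python
import re

# CIGAR operations that consume bases of the read, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the read length implied by a CIGAR string, or None for '*'."""
    if cigar == "*":
        return None
    ops = CIGAR_OP.findall(cigar)
    # Reject strings containing anything the pattern did not account for.
    if "".join(count + op for count, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(count) for count, op in ops if op in QUERY_CONSUMING)


def is_consistent(sam_line):
    """True if CIGAR and SEQ agree, or if either is '*' (not available)."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    implied = cigar_query_length(cigar)
    if implied is None or seq == "*":
        return True
    return implied == len(seq)
```

Note that the record quoted above has `*` in the CIGAR column, which this sketch treats as "no CIGAR available" rather than as an inconsistency, so a check like this would only flag records that carry an actual CIGAR string.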
