  • tophat 2.0.0 CIGAR and sequence length are inconsistent

    We get the following error while running TopHat 2.0.0 on a sequence file. The STAR aligner works fine on the same file. Here is the relevant part of the log:

    [2013-04-12 10:40:14] Preparing reads
    left reads: min. length=101, max. length=101, 183210621 kept reads (230841 discarded)
    ....
    Line 171124311, sequence length 74 vs 101 from CIGAR
    Parse error at line 171124311: CIGAR and sequence length are inconsistent
    [FAILED]
    Error running bowtie:

    The offending read looks fine to me!

    @D8CXHXP1:240:C0DBWACXX:5:1208:19908:68173 1:N:0:
    CGAGGCTGGCATTTGTGTTCTCTTAGGGTCTGCATAACATCTGTCCAGGATCTTCTGGCTTTCATATTCTCTGGTGAGAAGTCTGGTGTAATTCTAATAGG
    +
    CCCFFFFFHGHHHJJGIIGIJJJJJJIJFGGIIJJJJIJJJJJEHJJJJEIIJJJJJJJIJJJJJJJJJJJJHHH?EEFFFFFFEECECDEEFEEEEEDDD


    Any idea why this might be happening?
    We are running:
    TopHat v2.0.0
    Bowtie2 version 2.0.0-beta5
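
One quick way to rule out a malformed input record is to confirm that the sequence and quality lines of the read have the same length, and that this length matches the 101 bases reported in the log. Below is a minimal Python sketch, assuming plain 4-line FASTQ records. `fastq_lengths` is a hypothetical helper name, and the toy record in the usage lines is illustrative, not the read posted above.

```python
def fastq_lengths(record):
    """Return (sequence length, quality length) for one 4-line FASTQ record."""
    lines = record.strip().splitlines()
    # A well-formed record has a '@' header, the bases, a '+' separator, then qualities.
    if len(lines) != 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
        raise ValueError("not a 4-line FASTQ record")
    return len(lines[1].strip()), len(lines[3].strip())


# Toy record (assumption, for illustration only).
seq_len, qual_len = fastq_lengths("@toy_read\nACGTACGT\n+\nIIIIIIII\n")
print(seq_len, qual_len, seq_len == qual_len)
```

If both lengths agree with the read length in the log, the input record itself is probably not the problem, and the mismatch is more likely introduced somewhere between read preparation and alignment.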

  • #2
    You appear to be running older versions of both programs. Is there any possibility you can upgrade both (TopHat is now at v2.0.8 and Bowtie2 at v2.1.0) and see if the error goes away?


    • #3
      I already have the upgraded versions, but another sample from the same batch worked fine with the older versions I used. I will give it a try too.
      -best
      -Lax


      • #4
        I am having the exact same issue. Has anyone else seen or resolved this? In my case, the offending mapped read looks as follows:

        59 69 * 0 255 * * 0 0 CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGTCTTTGCCCGCGCGTGAGACTCCA @?@BD@DDB?B;CCCF3C3AAF>AEFFB?C0?D@<)99?FFFI@;;/'=B;5(6>;>B################## ZN:Z:SRR821602.59

        and the output is:

        [2013-07-18 13:34:28] Beginning TopHat run (v2.0.8b)
        -----------------------------------------------
        [2013-07-18 13:34:28] Checking for Bowtie
        Bowtie version: 2.1.0.0
        [2013-07-18 13:34:28] Checking for Samtools
        Samtools version: 0.1.19.0
        [2013-07-18 13:34:28] Checking for Bowtie index files
        [2013-07-18 13:34:28] Checking for Bowtie index files
        [2013-07-18 13:34:28] Checking for reference FASTA file
        [2013-07-18 13:34:28] Generating SAM header for /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
        format: fastq
        quality scale: phred33 (default)
        [2013-07-18 13:35:31] Reading known junctions from GTF file
        [2013-07-18 13:35:37] Preparing reads
        left reads: min. length=76, max. length=76, 2500 kept reads (0 discarded)
        right reads: min. length=76, max. length=76, 2498 kept reads (2 discarded)
        [2013-07-18 13:35:37] Using pre-built transcriptome index..
        [2013-07-18 13:35:41] Mapping left_kept_reads to transcriptome genome with Bowtie2
        [2013-07-18 13:35:49] Mapping right_kept_reads to transcriptome genome with Bowtie2
        [2013-07-18 13:35:56] Reporting output tracks
        [FAILED]
        Error running /home/unix/dylkot/bin/tophat-2.0.8b.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir /broad/hptmp/dkotliar/GTEX/mapped/debug/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 5 --read-gap-length 2 --read-edit-dist 7 --read-realign-edit-dist 8 --max-insertion-length 3 --max-deletion-length 3 -z gzip --inner-dist-mean 67 --inner-dist-std-dev 231 --gtf-annotations /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.gff --gtf-juncs /home/unix/dylkot/mapped/debug/tmp/genome.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header /home/unix/dylkot/mapped/debug/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/home/unix/dylkot/bin/samtools-0.1.19/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/unix/dylkot/genomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.fa /home/unix/dylkot/mapped/debug/junctions.bed /home/unix/dylkot/mapped/debug/insertions.bed /home/unix/dylkot/mapped/debug/deletions.bed /home/unix/dylkot/mapped/debug/fusions.out /home/unix/dylkot/mapped/debug/tmp/accepted_hits /home/unix/dylkot/mapped/debug/tmp/left_kept_reads.m2g.bam /home/unix/dylkot/mapped/debug/tmp/left_kept_reads.bam /home/unix/dylkot/mapped/debug/tmp/right_kept_reads.m2g.bam //home/unix/dylkot/mapped/debug/tmp/right_kept_reads.bam
        Error: CIGAR and sequence length are inconsistent!(CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGTCTTTGCCCGCGCGTGAGACTCCA)

        Any idea what could be going on or how I can best get around this problem?
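
For reference, the consistency rule behind this error message can be checked independently of TopHat. Under the SAM specification, the CIGAR operations M, I, S, = and X consume read bases, so their lengths should sum to the length of SEQ, and a CIGAR of `*` means no alignment is available. Below is a minimal Python sketch under those rules. The function names and the toy records are assumptions for illustration, and this is not TopHat's own check.

```python
import re

# CIGAR operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Read length implied by a CIGAR string, or None when CIGAR is '*'."""
    if cigar == "*":
        return None
    ops = CIGAR_RE.findall(cigar)
    # Reject strings that contain anything other than <length><op> pairs.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def is_consistent(sam_line):
    """True when SEQ length matches the CIGAR, or when either is unavailable."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    expected = cigar_query_length(cigar)
    return expected is None or seq == "*" or expected == len(seq)


# Toy tab-separated records (assumptions, not taken from the failing run).
good = "r1\t0\tchr1\t100\t255\t3M1I2M\t*\t0\t0\tACGTAC\tIIIIII"
bad = "r2\t0\tchr1\t100\t255\t101M\t*\t0\t0\tACGT\tIIII"
unmapped = "r3\t69\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII"
print(is_consistent(good), is_consistent(bad), is_consistent(unmapped))
```

Note that the record quoted above has flag 69 (paired, read unmapped, first in pair) and a CIGAR of `*`, so it would pass this particular check. That suggests the inconsistency may come from another record for the same read, or from how the tool handles unmapped reads, rather than from the line shown.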
