  • Tophat v1.1.4 potential error with sam to bam conversion?

    Using tophat v1.1.4, I have run into some issues with running tophat on the Illumina Human Body Map data. Here is some information from the log files from the run.

    ---
    Last few lines of output from the tophat run:

    [Sat Jan 8 02:13:21 2011] Mapping reads against segment_juncs with Bowtie
    [Sat Jan 8 02:16:29 2011] Joining segment hits
    [Sat Jan 8 02:22:00 2011] Reporting output tracks
    Error: could not convert to BAM with samtools
    ---


    ---
    Last line from run.log:

    samtools view -S -b ./trim_to_75_trim_s_8_sequence_end1_tophat_out//tmp/accepted_hits.sam > ./trim_to_75_trim_s_8_sequence_end1_tophat_out//tmp/file9j1TKZ
    ---


    ---
    Error message From accepted_hits_sam_to_bam.log:

    [samopen] SAM header is present: 25 sequences.
    Parse error at line 1129818: CIGAR and sequence length are inconsistent
    ---

    Any idea what is going on and/or how I should go about solving it?

    Edit:
    Another thing I want to point out is that I saw that there was an accepted_hits.sam file in the tmp folder that remained after the tophat runs failed to complete. I tried to run this in Cufflinks instead and was getting errors that the sam file was not ordered correctly. I am posting this information in case that can help with understanding what might be happening with this issue.
    Last edited by jb2; 01-11-2011, 05:15 PM.

  • #2
    Hi jb2,

    I have also run into this problem from time to time. The latest version of TopHat often reports erroneous read alignments, and the failure in SAM-to-BAM conversion seems to be caused by erroneous CIGAR strings.

    In my case, I mapped 76 bp single-end reads to a reference sequence, and the error in SAM-to-BAM conversion occurred at the following line of the tmp/accepted_hits.sam file. The CIGAR string implies a read length of 536,870,957 bp.

    XXXXX:3:15:3749:16676#0 0 chr1 21668580 3 28M47N536870907M43N22M * 0 0 GGCGTGTATTTGGGTTGAAGTTAAGCAACTGGTTCATGGACTGTG GGGGGGGGGGGGGGGGGGGGGGGDFFFEEAF=FDEEDE?DDAEBE NM:i:1 XS:A:- NH:i:2 CC:Z:= CP:i:21668580

    So, in my analysis, I removed the lines with erroneous CIGAR strings from the SAM file by checking for discrepancies between the length implied by the CIGAR string and the read length. After that, I manually converted the SAM file to BAM and sorted the BAM file with SAMtools.
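The length check described above can be sketched as follows. This is an illustrative Python version, not the poster's own script; the names `cigar_query_length` and `is_consistent` are made up for this example. It assumes the usual SAM convention that only the M, I, S, = and X operations consume read bases, so skipped regions (N) and deletions (D) do not count toward the read length.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (SAM convention).
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Return the read length implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar)
               if op in QUERY_CONSUMING)


def is_consistent(sam_line):
    """True if the CIGAR-implied length equals len(SEQ).

    Records whose CIGAR or SEQ is '*' cannot be compared and are accepted.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)


# The record quoted above, rebuilt with tab-separated fields.
record = "\t".join([
    "XXXXX:3:15:3749:16676#0", "0", "chr1", "21668580", "3",
    "28M47N536870907M43N22M", "*", "0", "0",
    "GGCGTGTATTTGGGTTGAAGTTAAGCAACTGGTTCATGGACTGTG",
    "GGGGGGGGGGGGGGGGGGGGGGGDFFFEEAF=FDEEDE?DDAEBE",
    "NM:i:1", "XS:A:-", "NH:i:2", "CC:Z:=", "CP:i:21668580",
])

print(cigar_query_length("28M47N536870907M43N22M"))  # 28 + 536870907 + 22
print(is_consistent(record))                          # False: SEQ is far shorter
```

The three M operations sum to 536,870,957, which matches the length quoted in the post, while the sequence in the record is only a few dozen bases long, so the record fails the check.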


    • #3
      Could this be an issue with the samtools version? I'll have to check the exact version, but I know our recent version of TopHat did not work with the most recent samtools release, because TopHat would not recognize a version number containing a letter, such as the current samtools-0.1.12a.


      • #4
        Originally posted by luxmare
        Hi jb2,

        I have also run into this problem from time to time. The latest version of TopHat often reports erroneous read alignments, and the failure in SAM-to-BAM conversion seems to be caused by erroneous CIGAR strings.

        In my case, I mapped 76 bp single-end reads to a reference sequence, and the error in SAM-to-BAM conversion occurred at the following line of the tmp/accepted_hits.sam file. The CIGAR string implies a read length of 536,870,957 bp.

        So, in my analysis, I removed the lines with erroneous CIGAR strings from the SAM file by checking for discrepancies between the length implied by the CIGAR string and the read length. After that, I manually converted the SAM file to BAM and sorted the BAM file with SAMtools.

        Thanks for your help on this. Is there a quick tool for removing lines with problematic CIGAR strings? I'm sure I could throw a script together in Perl pretty quickly otherwise, but I'm just curious.


        • #5
          To remove the erroneous lines from the TopHat SAM file, I wrote a Perl script myself, but it is not fast. I would also like to have a quick tool for this.

          Could this be an issue with the samtools version? I'll have to check the exact version, but I know our recent version of TopHat did not work with the most recent samtools release, because TopHat would not recognize a version number containing a letter, such as the current samtools-0.1.12a.
          On our system, too, the latest version of TopHat (v1.1.4) does not work with the latest version of SAMtools (v0.1.12a). We may have to use an older version of SAMtools (v0.1.11).


          • #6
            Hi, this is an old thread but I am still finding the same error with the newest versions.

            I am using TopHat 1.3.1, Bowtie 0.12.7, Cufflinks 1.1.0 and samtools 0.1.16, and I still need to convert accepted_hits.bam to SAM format; otherwise I get the following error:

            SAM error on line 9454: CIGAR op has zero length


            Is there a compatibility issue?


            Thanks

            Dave


            • #7
              CIGAR and Sequence length are inconsistent

              Dear all,

              I am trying to convert a SAM file to a BAM file, and I am pretty new to this conversion. I got the following error: CIGAR and sequence length are inconsistent.
              Is there a quick tool for removing lines with problematic CIGAR strings? My SAM file is almost 30 GB, and doing it with a Perl script could be time-consuming. Also, if I remove these problematic lines, do I have to take care of the header lines at the beginning of the file, or can I simply remove them?
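On the header question, a minimal sketch of a line-by-line filter is shown below, again in Python and again hypothetical (`keep_line` and `filter_sam` are names invented for this example, and the demo records are made up). It assumes header lines are the ones starting with `@` and writes them through unchanged, since they carry the reference sequence list that the BAM conversion relies on. Only alignment records whose CIGAR-implied length disagrees with the sequence length are dropped.

```python
import io
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def keep_line(line):
    """Decide whether one SAM line should be written to the output."""
    if line.startswith("@"):
        return True  # header line: always kept unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # fewer than the 11 mandatory SAM columns
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    # Only M, I, S, = and X consume bases of the read.
    implied = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")
    return implied == len(seq)


def filter_sam(src, dst):
    """Copy src to dst one line at a time; return how many records were dropped."""
    dropped = 0
    for line in src:
        if keep_line(line):
            dst.write(line)
        else:
            dropped += 1
    return dropped


# Small made-up input: one header line, one good record, one bad record.
demo = io.StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "good\t0\tchr1\t1\t3\t4M\t*\t0\t0\tACGT\tIIII\n"
    "bad\t0\tchr1\t1\t3\t9M\t*\t0\t0\tACGT\tIIII\n"
)
out = io.StringIO()
dropped = filter_sam(demo, out)
print(dropped)
print(out.getvalue(), end="")
```

Because the filter reads one line at a time, memory use does not grow with file size, so the same `filter_sam` function could be called with `sys.stdin` and `sys.stdout` to stream a 30 GB file. Whether this is faster than an equivalent Perl script has not been measured here.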
