SEQanswers

01-11-2011, 04:03 PM #1
jb2
Member
Location: Boston, MA
Join Date: Jun 2010
Posts: 25
Tophat v1.1.4 potential error with sam to bam conversion?

Using TopHat v1.1.4, I have run into some issues running TopHat on the Illumina Human Body Map data. Here is some information from the log files of the run.

---
Last few lines of output from the tophat run:

[Sat Jan 8 02:13:21 2011] Mapping reads against segment_juncs with Bowtie
[Sat Jan 8 02:16:29 2011] Joining segment hits
[Sat Jan 8 02:22:00 2011] Reporting output tracks
Error: could not convert to BAM with samtools
---


---
Last line from run.log:

samtools view -S -b ./trim_to_75_trim_s_8_sequence_end1_tophat_out//tmp/accepted_hits.sam > ./trim_to_75_trim_s_8_sequence_end1_tophat_out//tmp/file9j1TKZ
---


---
Error message from accepted_hits_sam_to_bam.log:

[samopen] SAM header is present: 25 sequences.
Parse error at line 1129818: CIGAR and sequence length are inconsistent
---

Any idea what is going on and/or how I should go about solving it?

Edit:
Another thing I want to point out: an accepted_hits.sam file remained in the tmp folder after the TopHat runs failed to complete. I tried running Cufflinks on it instead and got errors saying the SAM file was not ordered correctly. I am posting this in case it helps in understanding what is happening here.
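Since Cufflinks complained about ordering, one way to confirm the leftover SAM file really is unsorted is to look for the first record that goes backwards. The Python below is a minimal diagnostic sketch, not anything shipped with TopHat or Cufflinks. It assumes samtools-style coordinate order (references in the order of the @SQ header lines, then ascending POS, with unmapped `*` records last); Cufflinks may order reference names differently, so treat a hit as a hint rather than proof. `first_unsorted_line` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Optional, Tuple


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based line number of the first alignment record that is
    out of coordinate order, or None if every record is in order.

    Assumed order: references in @SQ header order, then ascending POS;
    unmapped records (RNAME '*') after all mapped ones.
    """
    ref_rank: Dict[str, int] = {}
    prev_key: Optional[Tuple[int, int]] = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("@"):
            # Record the order in which reference sequences are declared.
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        ref_rank[field[3:]] = len(ref_rank)
            continue
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            rank = len(ref_rank)  # unmapped reads sort after every reference
        elif rname in ref_rank:
            rank = ref_rank[rname]
        else:
            raise ValueError(f"line {lineno}: reference {rname!r} has no @SQ line")
        key = (rank, pos)
        if prev_key is not None and key < prev_key:
            return lineno
        prev_key = key
    return None
```

It accepts any iterable of lines, so an open file handle can be passed directly and the file is read one line at a time.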

Last edited by jb2; 01-11-2011 at 04:15 PM.
01-11-2011, 08:15 PM #2
luxmare
Member
Location: Japan
Join Date: Feb 2009
Posts: 10

Hi jb2,

I have sometimes run into the same problem. The latest version of TopHat often reports erroneous read alignments, and the failure in the sam to bam conversion seems to be caused by these erroneous CIGAR strings.

In my case, I mapped 76 bp single-end reads to a reference sequence, and the error in the sam->bam conversion occurred at the following line of the tmp/accepted_hits.sam file. The CIGAR string implies that this read is 536,870,957 bp long (28 + 536,870,907 + 22 bases from its M operations; the N operations are skipped reference, not read bases).

Quote:
XXXXX:3:15:3749:16676#0 0 chr1 21668580 3 28M47N536870907M43N22M * 0 0 GGCGTGTATTTGGGTTGAAGTTAAGCAACTGGTTCATGGACTGTG GGGGGGGGGGGGGGGGGGGGGGGDFFFEEAF=FDEEDE?DDAEBE NM:i:1 XS:A:- NH:i:2 CC:Z:= CP:i:21668580

So, in my analysis, I removed lines with such erroneous CIGAR strings from the SAM file by checking for discrepancies between the length implied by the CIGAR and the read length. After that, I manually converted the SAM file to BAM and sorted the BAM file with SAMtools.
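The length check luxmare describes can be sketched in Python as below. This is not luxmare's Perl script; it is a hypothetical re-implementation that assumes tab-separated SAM text. Per the SAM specification, only the M, I, S, = and X operations consume read bases, so the CIGAR quoted above sums to 536,870,957 while the read itself is far shorter. Header lines (starting with `@`) are passed through unchanged, since samtools needs the @SQ lines, and records whose CIGAR or SEQ is `*` are kept because there is nothing to compare. The function names are illustrative only.

```python
import re
from typing import Iterable, Iterator

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume read (query) bases; D, N, H and P do not.
QUERY_CONSUMING = frozenset("MIS=X")


def cigar_query_length(cigar: str) -> int:
    """Return the number of read bases implied by a CIGAR string."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in CIGAR_OP_RE.findall(cigar) if op in QUERY_CONSUMING)


def record_is_consistent(line: str) -> bool:
    """True if the CIGAR length matches len(SEQ), or if either field is '*'."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        return False  # truncated record: treat as bad
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True
    try:
        return cigar_query_length(cigar) == len(seq)
    except ValueError:
        return False


def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged plus only the consistent alignment records."""
    for line in lines:
        if line.startswith("@") or record_is_consistent(line):
            yield line
```

Because `filter_sam` is a generator, it streams a file of any size with flat memory use (for example `sys.stdout.writelines(filter_sam(sys.stdin))` after `import sys`); the output can then go through `samtools view -S -b` as in the TopHat log.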
01-12-2011, 07:11 AM #3
Jon_Keats
Senior Member
Location: Phoenix, AZ
Join Date: Mar 2010
Posts: 279

Could this be an issue with the samtools version? I'll have to check the exact version, but I know our recent version of TopHat did not work with the most recent samtools version, because TopHat would not recognize a version number containing a letter, such as the current samtools-0.1.12a.
01-12-2011, 01:55 PM #4
jb2
Member
Location: Boston, MA
Join Date: Jun 2010
Posts: 25

Quote:
Originally Posted by luxmare
So, in my analysis, I removed lines with such erroneous CIGAR strings from the SAM file by checking for discrepancies between the length implied by the CIGAR and the read length. After that, I manually converted the SAM file to BAM and sorted the BAM file with SAMtools.
Thanks for your help on this. Is there a quick tool for removing lines with problematic CIGAR strings? Otherwise I'm sure I could throw a Perl script together pretty quickly, but I'm just curious.
01-12-2011, 02:37 PM #5
luxmare
Member
Location: Japan
Join Date: Feb 2009
Posts: 10

To remove erroneous lines from the TopHat SAM file, I wrote a Perl script myself, but it's not a quick tool. I would also like a quick tool for that.

Quote:
Could this be an issue with the samtools version? I'll have to check the exact version, but I know our recent version of TopHat did not work with the most recent samtools version, because TopHat would not recognize a version number containing a letter, such as the current samtools-0.1.12a.
On our system too, the latest version of TopHat (v1.1.4) does not work with the latest version of SAMtools (v0.1.12a). We may have to use an older version of SAMtools (v0.1.11).
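Both posters describe TopHat v1.1.4 failing to recognize the letter-suffixed version string 0.1.12a. The sketch below is not TopHat's code and makes no claim about how TopHat actually checks versions; it only illustrates, under that assumption, how a version check can tolerate a trailing letter by parsing the numeric part and the suffix separately. `parse_samtools_version` is a hypothetical name.

```python
import re
from typing import Tuple

# Three dot-separated integers, optionally followed by lowercase letters.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)([a-z]*)")


def parse_samtools_version(text: str) -> Tuple[int, int, int, str]:
    """Parse a version such as '0.1.12a' into (0, 1, 12, 'a')."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch, suffix = m.groups()
    return int(major), int(minor), int(patch), suffix
```

Comparing the numeric prefix as a tuple of integers, e.g. `parse_samtools_version("0.1.12a")[:3] >= (0, 1, 7)`, avoids both the letter problem and the string-comparison pitfall where "0.1.12" sorts before "0.1.7".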
09-15-2011, 12:04 AM #6
dnusol
Senior Member
Location: Spain
Join Date: Jul 2009
Posts: 133

Hi, this is an old thread but I am still finding the same error with the newest versions.

I am using TopHat 1.3.1, Bowtie 0.12.7, Cufflinks 1.1.0 and samtools 0.1.16, and I still need to convert accepted_hits.bam to SAM format; otherwise I get this error:

SAM error on line 9454: CIGAR op has zero length


Is there a compatibility issue?
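To see which records trigger the "CIGAR op has zero length" error, one can scan the SAM text for operations such as `0N`. A minimal sketch, assuming SAM text input (for a BAM, e.g. the output of `samtools view -h`); the helper names are hypothetical, and the code only reports line numbers rather than repairing records.

```python
import re
from typing import Iterable, List

CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def zero_length_ops(cigar: str) -> List[str]:
    """Return the operations in a CIGAR string whose length is 0, e.g. ['0N']."""
    return [n + op for n, op in CIGAR_OP_RE.findall(cigar) if int(n) == 0]


def lines_with_zero_length_ops(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of alignment records with a zero-length CIGAR op."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line, no CIGAR field
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > 5 and zero_length_ops(fields[5]):
            bad.append(lineno)
    return bad
```

Line numbers count header lines too, so they can be compared directly with the line number in the error message if that tool counts the same way.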


Thanks

Dave
11-17-2011, 12:52 AM #7
pd
Member
Location: Leuven, Belgium
Join Date: Jan 2010
Posts: 17
CIGAR and Sequence length are inconsistent

Dear all,

I am trying to convert a SAM file to a BAM file. I am pretty new to this conversion, and I got the following error: CIGAR and SEQUENCE length are inconsistent.
Is there a quick tool for removing lines with problematic CIGAR strings? My SAM file is almost 30 GB, so doing it with a Perl script could be time-consuming. And even if I remove these problematic lines, do I have to take care of the header lines (at the beginning of the file), or can I simply remove them?

Tags
bam, sam, samtools, tophat, tophat cigar bug

All times are GMT -8.