
**robs** · 06-22-2010, 04:56 PM

I forgot to mention that I am using bwa-0.5.8.

**robs** · 06-22-2010, 09:31 PM

One more question:
Does it make sense to have an insertion directly after a soft-clipped region?

For example, I get the CIGAR string 156S15I264M13I189M.

Any help is appreciated. :-)
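To make the question concrete, here is a minimal Python sketch (helper names are hypothetical, not part of BWA) that parses a CIGAR string and counts the read and reference bases it covers. It follows the SAM specification's rules: M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases. It also flags alignments whose aligned part starts or ends with an indel. In the example above, the 15I directly after 156S aligns nothing to the reference, so it could just as well have been folded into the soft clip (171S).

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read (SEQ) bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples; reject malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def query_length(ops):
    return sum(n for n, op in ops if op in QUERY_OPS)


def reference_length(ops):
    return sum(n for n, op in ops if op in REF_OPS)


def indel_at_alignment_end(ops):
    """True if, after skipping soft/hard clips, the first or last op is I or D."""
    i, j = 0, len(ops)
    while i < j and ops[i][1] in "SH":
        i += 1
    while j > i and ops[j - 1][1] in "SH":
        j -= 1
    return i < j and (ops[i][1] in "ID" or ops[j - 1][1] in "ID")


ops = parse_cigar("156S15I264M13I189M")
print(query_length(ops), reference_length(ops), indel_at_alignment_end(ops))
# prints: 637 453 True
```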

**lh3** · 06-23-2010, 05:19 PM

Internally, your reference sequences are concatenated, so bwasw may map a read across the junction of two adjacent sequences. bwasw detects and fixes the CIGAR afterwards, but unfortunately this does not always work properly. This will be fixed in the future. Because of this flaw, bwasw does not work well with many very short references. I am sorry.

Perhaps you could post your references and one read, so I can fix it when I have time. Thanks.
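A toy model may help show why many short references are a problem. If all sequences share one coordinate space, any hit whose reference interval crosses a boundary has to be trimmed or split afterwards, and the more (and shorter) the sequences, the more often that happens. The class below is a hypothetical illustration of that idea only, not how BWA actually stores its index.

```python
from bisect import bisect_right


class ConcatenatedReference:
    """Hypothetical model: several sequences laid end to end in one
    coordinate space. Illustrates the idea only; not BWA's data structure."""

    def __init__(self, names_and_lengths):
        self.names = []
        self.starts = []  # 0-based start of each sequence in the joint space
        offset = 0
        for name, length in names_and_lengths:
            self.names.append(name)
            self.starts.append(offset)
            offset += length
        self.total = offset

    def _index(self, pos):
        if not 0 <= pos < self.total:
            raise IndexError(f"position {pos} outside concatenated reference")
        return bisect_right(self.starts, pos) - 1

    def locate(self, pos):
        """Map a joint 0-based coordinate to (sequence name, 0-based offset)."""
        k = self._index(pos)
        return self.names[k], pos - self.starts[k]

    def spans_junction(self, start, ref_len):
        """True if the interval [start, start + ref_len) touches two
        different sequences, i.e. the hit crosses a junction."""
        return self._index(start) != self._index(start + ref_len - 1)


ref = ConcatenatedReference([("contigA", 100), ("contigB", 50)])
print(ref.locate(120))             # ('contigB', 20)
print(ref.spans_junction(90, 20))  # True: bases 90..109 cover both contigs
```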

**robs** · 06-23-2010, 05:57 PM

Thanks for the explanation. :-)

The sequences were aligned to the unique sequences in the Watson genome (http://www.ncbi.nlm.nih.gov/sites/en...m=ABKV01000000). There are two example sequences in my first post. If you need more, I can send you either a SAM file with lots of weird-looking CIGAR strings or just a FASTA file. I would be happy to get this problem solved, so I can use BWA-SW. :-)

Any chance that this will be fixed in the next few weeks, or should I take a look at the source code myself?
Is there a function to get the actual alignment instead of the CIGAR string? Maybe this would help to figure out the problem faster.
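If no such function is available, the gapped alignment can be rebuilt from the CIGAR, the read, and the reference sequence starting at the alignment position. Below is a minimal Python sketch; `render_alignment` is a hypothetical helper that assumes `ref` covers the whole aligned region and that `ref_start` is the 0-based alignment start (SAM's POS minus one).

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def render_alignment(cigar, ref, ref_start, query):
    """Return three text rows (reference, match markers, read) for a CIGAR.

    ref_start is 0-based. Soft-clipped read bases are skipped, hard clips and
    padding are ignored, and each N (skipped region) is drawn as one '~'.
    """
    r, q = ref_start, 0
    top, mid, bot = [], [], []
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op in "M=X":  # aligned bases: mark identical positions with '|'
            for a, b in zip(ref[r:r + n], query[q:q + n]):
                top.append(a)
                bot.append(b)
                mid.append("|" if a == b else " ")
            r += n
            q += n
        elif op == "I":  # bases present in the read only
            top.append("-" * n)
            mid.append(" " * n)
            bot.append(query[q:q + n])
            q += n
        elif op == "D":  # bases present in the reference only
            top.append(ref[r:r + n])
            mid.append(" " * n)
            bot.append("-" * n)
            r += n
        elif op == "N":
            top.append("~")
            mid.append(" ")
            bot.append("~")
            r += n
        elif op == "S":
            q += n
    return "".join(top), "".join(mid), "".join(bot)


for row in render_alignment("4M2I3M", "ACGTACG", 0, "ACGTGGACG"):
    print(row)
```

Printing the three rows one above the other gives a quick visual check of where insertions, deletions and clips fall relative to the reference.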

**lh3** · 06-23-2010, 06:12 PM

I am afraid that I cannot fix that soon. If you would like to have a look, you may check the fix_cigar() function in bwtsw2_aux.c. This function trims the part of the CIGAR that extends beyond the reference sequence. It is moderately standalone.

Thank you.
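For anyone who wants to experiment, below is a minimal Python sketch of the idea lh3 describes: trimming the part of an alignment that runs past the end of a reference sequence. It is not a translation of fix_cigar(); the helper name and its behaviour are assumptions. Read bases falling past `ref_end` become a trailing soft clip, so the CIGAR still accounts for every base in SEQ, and an indel or skip left at the end of the aligned part is folded away. Only the right-hand end is handled; hits that start before a sequence would need the mirror-image treatment.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def trim_past_reference_end(cigar, start, ref_end):
    """Soft-clip whatever part of an alignment runs past ref_end.

    start: 0-based leftmost reference position of the alignment.
    ref_end: 0-based, exclusive end of the sequence the hit starts in.
    The number of read bases the CIGAR accounts for is left unchanged.
    """
    kept, clip, tail_hard, pos = [], 0, 0, start
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op == "H" and kept:      # trailing hard clip: re-added at the end
            tail_hard += n
            continue
        if pos >= ref_end:          # beyond the reference: nothing is aligned
            if op in "MIS=X":
                clip += n           # read bases become soft-clipped
            continue                # D/N/P past the end are simply dropped
        if op in "MDN=X" and pos + n > ref_end:
            room = ref_end - pos    # bases that still fit on this sequence
            if op in "M=X":
                kept.append((room, op))
                clip += n - room
            pos = ref_end           # a crossing D/N is cut off entirely
            continue
        if op in "MDN=X":
            pos += n
        kept.append((n, op))
    # Do not end the aligned part on an indel, a skip or an old soft clip.
    while kept and kept[-1][1] in "IDNS":
        n, op = kept.pop()
        if op in "IS":
            clip += n
    if clip:
        kept.append((clip, "S"))
    if tail_hard:
        kept.append((tail_hard, "H"))
    return "".join(f"{n}{op}" for n, op in kept)


print(trim_past_reference_end("10M5I20M", 0, 25))  # 10M5I15M5S
```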

**glacierbird** · 07-13-2010, 12:14 AM

I have the same CIGAR problem when using TopHat:

tophat -p 4 --solexa1.3-quals -o directory index fastq

Several lines cause an error message when I convert .sam to .bam:

samtools view -bS -t index.fa.fai -o tmp.bam accepted_hits.sam

Parse error at line 2103: CIGAR and sequence length are inconsistent

Line 2103 looks like this:

HWI-EAS210R:5:55:1018:1930#0 0 scaffold00002 428427 1 36M92N36M473N536870911M * 0 0 CTCATCTCTCATCTCAGAGAGGTCCTCCAGGAGCAGGAATATAAGTTGGATCAAGTACGCAATCATCTTCA 992;93?5B=<B=;7;?;:988:146.4429;50?;;17@=A@6=7::<8?(=4+45535659'5525:74 NM:i:0 XS:A:+ NS:i:0
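samtools rejects this line because a CIGAR must account for exactly as many read bases as there are in SEQ. Only M, I, S, = and X consume read bases, so 36M92N36M473N536870911M claims 36 + 36 + 536870911 = 536870983 bases for a 72-base read. (536870911 happens to be 2^29 - 1, so it looks more like an overflowed or placeholder value than a real length.) A minimal Python check for a single tab-separated SAM line, with hypothetical helper names:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_read_bases(cigar):
    """Number of SEQ bases a CIGAR accounts for (M, I, S, = and X ops)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MIS=X")


def check_record(sam_line):
    """Return a description of what is wrong with a SAM alignment line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return f"expected at least 11 tab-separated fields, got {len(fields)}"
    cigar, seq, qual = fields[5], fields[9], fields[10]
    if cigar != "*" and seq != "*" and cigar_read_bases(cigar) != len(seq):
        return (f"CIGAR {cigar} covers {cigar_read_bases(cigar)} read bases, "
                f"but SEQ has {len(seq)}")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        return f"SEQ has {len(seq)} bases, but QUAL has {len(qual)}"
    return None
```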

**glacierbird** · 07-13-2010, 12:15 AM

I have tried deleting 536870911M, but it still doesn't work.

I have no clue where this 536870911 comes from.

**robs** · 07-13-2010, 03:24 PM

The problem is not solved by removing the last term of the CIGAR string, since then you will miss part of the alignment. You could calculate the length of the query and replace the last M value with the difference between the query length and the length covered by the rest of the CIGAR.

I am playing around with the BWA code to fix the problem for BWA. Not sure about Tophat, though.

I assume the problem also comes from the concatenated references, or simply from miscalculating the CIGAR string when crossing sequence boundaries.

Please keep us updated if you find a solution.
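The repair suggested above can be sketched as follows. `repair_last_match` is a hypothetical helper, and it assumes the final M is the only corrupted operation, which nothing in the output guarantees. If the rest of the CIGAR already covers the whole read, as in 36M92N36M473N536870911M with 72-base reads, the last M is dropped together with any N or D left dangling at the end. This only makes the record parseable; whether the repaired alignment is actually right is a separate question, so discarding such reads may be the safer choice.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = "MIS=X"  # operations that consume read bases


def repair_last_match(cigar, read_length):
    """Resize the final M so the CIGAR covers exactly read_length bases."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    m_positions = [i for i, (_, op) in enumerate(ops) if op == "M"]
    if not m_positions:
        raise ValueError(f"no M operation in {cigar!r}")
    last = m_positions[-1]
    covered = sum(n for i, (n, op) in enumerate(ops)
                  if op in QUERY_OPS and i != last)
    needed = read_length - covered
    if needed < 0:
        raise ValueError(f"{cigar!r} is too long even without its last M")
    if needed > 0:
        ops[last] = (needed, "M")
    else:
        del ops[last]
        # An N or D at the very end of a CIGAR no longer describes anything.
        while ops and ops[-1][1] in "ND":
            ops.pop()
    return "".join(f"{n}{op}" for n, op in ops)


print(repair_last_match("36M92N36M473N536870911M", 72))  # 36M92N36M
```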

**glacierbird** · 07-14-2010, 04:28 AM

Hi,

Originally posted by robs:

> The problem is not solved by removing the last term of the cigar string, since then you will miss part of the alignment. You could calculate the length of the query and replace the last M value with the difference of cigar and query length.

What do you mean by query length? I suppose query length = read length. If so, every read is 72 bp long.

From this CIGAR, "36M92N36M473N536870911M", 36M+36M=72, which is why I want to delete 536870911M completely.

> I am playing around with the BWA code to fix the problem for BWA. Not sure about Tophat, though.

I think the error is not produced by TopHat itself; it actually happens when TopHat runs Bowtie.

> I assume the problem comes from the concatenated references as well or simply by miscalculating the cigar string when going over the boundaries.

> Please keep us updated if you find a solution.

I did modify the reference FASTA file, because of samtools:
samtools faidx fasta_file.fa
[fai_build_core] line length exceeds 65535 in sequence

I used Bio:seqIO to reformat the reference FASTA file with the default line wrapping of 60 bp.

I am not sure whether that's the reason for the error message.

**robs** · 07-14-2010, 10:33 AM

> 36M+36M=72, that's why I want to completely delete 536870911M

Then, I think, you also need to remove the N part in front of it. Just give it a try.

> I did modify the reference fasta file because of samtools.
> samtools faidx fasta_file.fa
> [fai_build_core] line length exceeds 65535 in sequence

> I used Bio:seqIO to reform the reference fasta file with defaults line wrapping of 60bp.

> I am not sure that's the reason for the error msg.

The line wrapping should not make a difference, since the sequences are converted into internal file formats when the database or index is generated. Maybe this is a restriction of the C code that parses the lines, but I am not sure.
The only way the reformatting might have changed anything is if your file contained DOS/Windows line breaks instead of Unix ones and the program does not check for them.
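The reformatting discussed here amounts to something like the following minimal Python sketch (a stand-in for the SeqIO route, with hypothetical helper names). It rewraps every sequence to a fixed width and strips carriage returns left over from DOS/Windows line breaks, so that within each record every sequence line except the last has the same length, which is the layout `samtools faidx` expects.

```python
def _wrapped(seq, width):
    """Yield seq in newline-terminated chunks of at most width characters."""
    for i in range(0, len(seq), width):
        yield seq[i:i + width] + "\n"


def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence rewrapped to `width` columns.

    Trailing carriage returns and newlines are stripped from each input line,
    so DOS/Windows line breaks never end up inside the sequence.
    """
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            yield from _wrapped("".join(chunks), width)  # flush previous record
            chunks = []
            yield line + "\n"
        elif line.strip():
            chunks.append(line.strip())
    yield from _wrapped("".join(chunks), width)
```

For example, `with open("fasta_file.fa") as src, open("fasta_file.60.fa", "w", newline="\n") as dst: dst.writelines(rewrap_fasta(src))` writes a rewrapped copy with Unix line endings (the output file name is a placeholder).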

**robs** · 08-05-2010, 04:49 PM

Originally posted by lh3:

> I am afraid that I cannot fix that soon. If you would like to have a look, you may check the fix_cigar() function in bwtsw2_aux.c. This function trims the part of cigar standing out of the reference sequence. It is moderately standalone.

I made some changes and fixes to several files related to the BWA-SW part and was wondering if you could give me permission to commit them to your SourceForge SVN. Please send me a private message if you need more details about the changes. Thanks!

**Guidobot** · 03-21-2011, 03:26 PM

Originally posted by glacierbird:

> I have the same problem with CIGAR by using TOPHAT
>
> tophat -p 4 --solexa1.3-quals -o directory index fastq
>
> there are several lines cause error msg when I want to convert .sam to .bam
>
> samtools view -bS -t index.fa.fai -o tmp.bam accepted_hits.sam
>
> Parse error at line 2103: CIGAR and sequence length are inconsistent
>
> And this line 2103 looks like this:
>
> HWI-EAS210R:5:55:1018:1930#0 0 scaffold00002 428427 1 36M92N36M473N536870911M * 0 0 CTCATCTCTCATCTCAGAGAGGTCCTCCAGGAGCAGGAATATAAGTTGGATCAAGTACGCAATCATCTTCA 992;93?5B=<B=;7;?;:988:146.4429;50?;;17@=A@6=7::<8?(=4+45535659'5525:74 NM:i:0 XS:A:+ NS:i:0

I'm having a very similar issue with my SAM file produced by a BWA mapping.
Whenever the output contains a more complex CIGAR string, the quality string is corrupted or missing, resulting in a "sequence and quality are inconsistent" error when I try to convert to BAM.

I've been removing the offending alignment lines, but this means repeatedly editing a large file (500 MB), which takes considerable time (e.g. using head/tail).
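Instead of repeatedly cutting lines out with head/tail, a single streaming pass can separate broken records from good ones. A minimal Python sketch with hypothetical helper names: header lines are copied through, and an alignment line is rejected if SEQ and QUAL differ in length or if the CIGAR does not cover exactly the bases in SEQ.

```python
import io
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def is_consistent(line):
    """Check one SAM alignment line for the two errors seen in this thread."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False
    cigar, seq, qual = fields[5], fields[9], fields[10]
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        return False  # "sequence and quality are inconsistent"
    if cigar != "*" and seq != "*":
        covered = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                      if op in "MIS=X")
        if covered != len(seq):
            return False  # "CIGAR and sequence length are inconsistent"
    return True


def filter_sam(src, keep, reject):
    """Copy headers and good records to keep, bad records to reject.

    Returns the number of rejected lines. Works on any file-like objects,
    reading one line at a time, so large files are never held in memory.
    """
    dropped = 0
    for line in src:
        if line.startswith("@") or is_consistent(line):
            keep.write(line)
        else:
            reject.write(line)
            dropped += 1
    return dropped


sam = io.StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tII\n"
)
good, bad = io.StringIO(), io.StringIO()
print(filter_sam(sam, good, bad))  # 1
```

On real files, `with open("mapped.sam") as src, open("clean.sam", "w") as keep, open("rejected.sam", "w") as reject: filter_sam(src, keep, reject)` does the same in one pass (file names are placeholders); the rejected lines stay available for inspection.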

**lukas1848** · 01-13-2012, 05:07 AM

Does anyone know whether this issue has been resolved in newer versions of BWA?


CIGAR string from BWA-SW output incorrect?
