SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Tophat2 read-gap-length and read-mismatches max acceptable values ElizabethRoss RNA Sequencing 0 10-13-2014 03:38 PM
CIGAR and Sequence length incosistent bnfoguy Bioinformatics 8 02-12-2014 04:33 AM
Tophat: options -N --read-edit-dist --read-gap-length Pradhaun Bioinformatics 0 01-04-2013 07:58 AM
picard error: Mismatch between read length and quals length writing read shawpa Bioinformatics 0 08-20-2012 05:52 AM
bowtie2 read length in score-min okyoon Bioinformatics 0 04-09-2012 08:53 AM

Reply
 
Thread Tools
Old 05-24-2015, 10:55 PM   #1
wisense
Member
 
Location: Shanghai,China

Join Date: Sep 2011
Posts: 30
Default CIGAR inconsist with read length. Does Bowtie2 reedit read sequence?

Hi, guys

I used bowtie2 to align paired-end reads to hg19 genome, and got the sam format output. Then when I tried to convert the sam into bam, I found some error information:
Code:
[samopen] SAM header is present: 25 sequences.
Line 817574, sequence length 124 vs 125 from CIGAR
Parse error at line 817574: CIGAR and sequence length are inconsistent
So, I looked into the sam file, and found the line 817574 is like:
Code:
HISEQ04:185:C62CTANXX:3:1101:8017:19749	99	chr2	184686606	42	125M	=	184686727	246	TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGAATGGTAAGAAAGGATTATTA	BBBBBFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFBFBFFFFFFFFFFBFFFFFFFFFFFFBFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFBFFFF<F7F<FFFF<BF<F	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:NM:i:0	MD:Z:125	YS:i:0	YT:Z:CP
The CIGAR is 125M, which means the read is 125bp and each base is match/mismatch, while the read sequence is only 124bp. As I know, my paired-end reads are 125bp in sequencing. I think bowtie2 has reedited the sequence of the read, so I compared the sequence in the original fastq file and the bowtie2 alignment:
Code:
Seq in Fastq:
TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGGCACGATGAAGTTAAGGCATAG
Seq in Bowtie2 alignment:
TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGAATGGTAAGAAAGGATTATTA
It's quite strange the tail bases of the sequence in the alignment are quite different from the fastq file. Somebody know how do this happen? Does Bowtie2 reedit the sequence when it align the read to the reference?

If you have any idea, please reply me, thanks~
wisense

Last edited by wisense; 05-24-2015 at 11:03 PM.
wisense is offline   Reply With Quote
Old 05-25-2015, 07:52 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

It sounds like the sam file got corrupted somehow; I'd suggest re-mappping. And, perhaps, upgrading to the latest version first in case that was a version-specific bug.
Brian Bushnell is offline   Reply With Quote
Old 05-25-2015, 05:28 PM   #3
wisense
Member
 
Location: Shanghai,China

Join Date: Sep 2011
Posts: 30
Default

Quote:
Originally Posted by Brian Bushnell View Post
It sounds like the sam file got corrupted somehow; I'd suggest re-mappping. And, perhaps, upgrading to the latest version first in case that was a version-specific bug.
Hi, Brian
Thanks for your reply.
My Bowtie 2 version is 2.2.4, I will try the latest version 2.2.5 to see if there's any difference. By the way, there're about 33437 alignments in the sam file their sequence lengths are not 125bp, maybe ranging from 80-120 bp. Quite strange to me.
wisense is offline   Reply With Quote
Old 05-26-2015, 09:45 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

It sounds to me like two processes ran at the same time, one with 125bp reads and one with variable-length reads, and were writing to the same file... just a guess. I suggest you confirm the read count and length distribution of what you think was the input fastq - should be the same as the output sam. The BBMap package has a tool called readlength.sh that you can use to do that quickly.

Last edited by Brian Bushnell; 05-26-2015 at 09:48 AM.
Brian Bushnell is offline   Reply With Quote
Old 05-26-2015, 05:21 PM   #5
wisense
Member
 
Location: Shanghai,China

Join Date: Sep 2011
Posts: 30
Default

Quote:
Originally Posted by Brian Bushnell View Post
It sounds to me like two processes ran at the same time, one with 125bp reads and one with variable-length reads, and were writing to the same file... just a guess. I suggest you confirm the read count and length distribution of what you think was the input fastq - should be the same as the output sam. The BBMap package has a tool called readlength.sh that you can use to do that quickly.
Before bowtie2 alignment, I used FastQC to analyze the paired-end reads and both read1 and read2 fastq are 125bp, not with variable-length. BTW, I have used the latest version of bowtie2 (v2.2.5) to re-align the reads to the genome, there's no such error in the alignment. Perhaps, this is a version-specific bug of bowtie2. Thanks for your good suggestion.
wisense is offline   Reply With Quote
Old 05-26-2015, 11:35 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,478
Default

This is either a very very obscure bug or, more likely, Brian guessed correctly that you have two processes overwriting each other's output (or something along those lines). You might just rerun the alignment.
dpryan is offline   Reply With Quote
Reply

Tags
bam, bowtie2, cigar, sam

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:25 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO