SEQanswers

Old 02-11-2010, 07:24 AM   #1
wimufi
Junior Member
 
Location: maryland, usa

Join Date: Sep 2009
Posts: 4
unmapped reads in Bowtie causing problems in SAMtools?

Hi, I am having some trouble running some SOLiD data through bowtie. I've converted from csfasta/qual files to fastq using solid2fastq (v0.6.3c) and get this:

@424_1953_1910
T23311000033011331110003320301300033203123032220022
+
<99>=9=::=7<8;:5,,4/<<77@37-52=5.):.7$91450)=4:%&:


I know the first base is the adaptor (primer) base, and I assume the resulting one-character difference in length between the sequence and quality lines is to be expected.
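
The length difference is expected for color-space FASTQ: the sequence line carries the primer base plus one color per quality value, so it is one character longer than the quality line. A minimal Python sketch of that check on the record above; the function name `colorspace_lengths_ok` is made up for illustration and is not part of any tool mentioned here:

```python
def colorspace_lengths_ok(seq: str, qual: str) -> bool:
    """Return True if seq is one primer base followed by len(qual) colors."""
    primer, colors = seq[:1], seq[1:]
    # The primer base (e.g. 'T') is not sequenced, so it has no quality value.
    return primer.isalpha() and len(colors) == len(qual)


# Record copied from the post above.
seq = "T23311000033011331110003320301300033203123032220022"
qual = "<99>=9=::=7<8;:5,,4/<<77@37-52=5.):.7$91450)=4:%&:"

print(len(seq), len(qual), colorspace_lengths_ok(seq, qual))
```

If this check fails on the input FASTQ, the problem is upstream of the aligner; if it passes, as it does here, the mismatch is introduced later.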

Then I run through bowtie (v0.12.2) using just the -S -C options. But when I run 'samtools import' on the resulting SAM file, I get:

Parse error at line 96: sequence and quality are inconsistent

Only the unmapped reads cause this problem; the mapped ones are OK. Here's an example:

424_1953_1812 4 * 0 0 * * 0 0 TAGGACAAGAGCATACTCTGCTAGCAAAATCTAGATGCCAGATCTGGAG 948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+^@ XM:i:0

The '^@' is present in the unmapped reads but not the mapped ones.
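
'^@' is how many terminals and editors display a NUL byte (0x00). A rough Python sketch for inspecting such a record follows. It assumes the '^@' in the post was a single NUL byte at the end of the QUAL field, and `inspect_sam_record` is a hypothetical helper name:

```python
def inspect_sam_record(line: str) -> dict:
    """Report SEQ/QUAL lengths and any non-printable characters in QUAL."""
    fields = line.rstrip("\n").split("\t")
    seq, qual = fields[9], fields[10]
    # Valid SAM quality characters are printable ASCII, 33..126.
    bad = [(i, hex(ord(c))) for i, c in enumerate(qual) if not 33 <= ord(c) <= 126]
    # A parser that treats NUL as a string terminator only sees this many.
    visible = len(qual.split("\x00", 1)[0])
    return {
        "seq_len": len(seq),
        "qual_len": len(qual),
        "qual_len_before_nul": visible,
        "bad_qual_chars": bad,
    }


# The unmapped record from the post, with '^@' written as \x00 (assumption).
record = "\t".join([
    "424_1953_1812", "4", "*", "0", "0", "*", "*", "0", "0",
    "TAGGACAAGAGCATACTCTGCTAGCAAAATCTAGATGCCAGATCTGGAG",
    "948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+\x00",
    "XM:i:0",
])
report = inspect_sam_record(record)
print(report)
```

Under that assumption the record has 49 bases but only 48 real quality characters, which would be enough to trigger a "sequence and quality are inconsistent" error in a parser that stops at the NUL.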

So, (a) is this a bug in bowtie or samtools, and (b) is there a way to suppress the unmapped reads in bowtie's SAM output, which would work around this problem?

Thanks!

Will
Old 02-11-2010, 08:00 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
I would use the conversion script that ships with bowtie (is there one?) rather than the one from BFAST. The BFAST conversion script was designed for BFAST, and I have not tested it with BWA, bowtie, etc.

It looks like bowtie keeps the adaptor sequence in the base space representation. This is incorrect since it is not part of the DNA fragment being sequenced. You should send a bug report to the bowtie authors.

Last edited by nilshomer; 02-11-2010 at 09:25 PM. Reason: spelling
Old 02-11-2010, 08:21 AM   #3
wimufi
Junior Member
 
Location: maryland, usa

Join Date: Sep 2009
Posts: 4

FYI, the unmapped read in question looks like this in the fastq file:

@424_1953_1812
T03022010020210301313213021000031302032110203132202
+
=;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+

and looks like this in the csfasta/qual file:

>424_1953_1812_F3
T03022010020210301313213021000031302032110203132202

>424_1953_1812_F3
28 26 24 19 23 26 27 27 25 19 25 29 25 27 27 26 29 23 25 26 25 28 20 25 29 27 16 26 26 27 24 20 15 23 24 25 17 17 14 23 25 18 21 26 17 16 24 23 26 10

So I agree, the adaptor sequence looks like it is retained in the SAM file. There's no fastq conversion tool with Bowtie as far as I know. I know there were some trimming bugs in versions 0.12.0 and 0.12.1, so maybe some remain. Will report it, thanks.

Will
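
For reference, the FASTQ quality line above is what you get by encoding the numeric values from the qual file as Phred+33 ASCII, so the conversion itself preserved all 50 quality values. A small Python sketch of that check (the helper name `phred_to_ascii` is illustrative only):

```python
def phred_to_ascii(quals, offset=33):
    """Encode integer Phred scores as an ASCII quality string (Phred+33 by default)."""
    return "".join(chr(q + offset) for q in quals)


# Values copied from the qual file entry for 424_1953_1812_F3 above.
qual_line = (
    "28 26 24 19 23 26 27 27 25 19 25 29 25 27 27 26 29 23 25 26 "
    "25 28 20 25 29 27 16 26 26 27 24 20 15 23 24 25 17 17 14 23 "
    "25 18 21 26 17 16 24 23 26 10"
)
# Quality line copied from the FASTQ record above.
fastq_qual = "=;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+"

converted = phred_to_ascii(int(q) for q in qual_line.split())
print(converted == fastq_qual)
```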
Old 02-11-2010, 08:35 AM   #4
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Hi Will,

Nils is right that the problem is coming from mixing BFAST's tools with Bowtie's. The " 2" at the end of the color sequence seems to be BFAST-specific, and Bowtie doesn't know what to do with it. Please use e.g. Galaxy to convert your reads, as is recommended in the manual.

Re: "looks like bowtie keeps the adaptor sequence in the base space representation" - Bowtie trims the primer base automatically along with the first color. See manual for details. What are you seeing that makes you think otherwise?

Thanks,
Ben
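
The trimming described in this post can be sketched as below. This only illustrates the length arithmetic and is not Bowtie's code: the function name is invented, and dropping exactly one quality value together with the first color is an assumption.

```python
def trim_primer_and_first_color(seq: str, qual: str):
    """Drop the primer base and the first color, plus the first color's quality.

    seq  : primer base followed by one color per quality value
    qual : one quality character per color
    """
    # seq[0] is the primer base, seq[1] the first color (assumed trimmed together).
    return seq[2:], qual[1:]


# Read 424_1953_1812 as posted in this thread.
seq = "T03022010020210301313213021000031302032110203132202"
qual = "=;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+"

colors, quals = trim_primer_and_first_color(seq, qual)
print(len(seq), len(qual), "->", len(colors), len(quals))
```

Trimmed this way, sequence and quality stay the same length, so a record where the two differ suggests the quality string was trimmed by a different amount than the sequence.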
Old 02-11-2010, 09:00 AM   #5
wimufi
Junior Member
 
Location: maryland, usa

Join Date: Sep 2009
Posts: 4

For some odd reason the SeqAnswers formatting is screwing up the stuff I've been posting. There's no space in this fastq sequence between the last 0 and 2 (despite what it looks like below...).

@424_1953_1812
T03022010020210301313213021000031302032110203132202
+
=;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+


But I will try Galaxy, too, thanks.

Will

Last edited by wimufi; 02-11-2010 at 09:08 AM.
Old 02-11-2010, 09:55 AM   #6
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

" [ code ] [ /code ] " tags are your friend when formatting is to be preserved.

Code:
@424_1953_1812
T03022010020210301313213021000031302032110203132202
+
=;948;<<:4:>:<<;>8:;:=5:><1;;<95089:22/8:36;2198;+
Old 02-11-2010, 01:04 PM   #7
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

This is a bug in bowtie: it seems to trim the adaptor T and the first color, and likewise the first two quals, which leaves the read and the quality string with different lengths (the adaptor T has no quality value since it is not sequenced).

You can get around this by removing unaligned reads:
awk '$2 != 4 {print $0}' reads.sam > aligned_reads_only.sam

It is nice that you can get unaligned reads in a new fastq (to align with BFAST...) but it would be good to have an option to report only aligned reads as well to save space.
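
The awk one-liner compares FLAG to 4 exactly, which is enough for single-end output like this. FLAG is a bit field, though, and with paired-end data an unmapped read also carries other bits (for example 77), so testing bit 0x4 is more general. A minimal Python sketch of the same filter; `drop_unmapped` is a hypothetical name, and header lines are passed through unchanged:

```python
import io


def drop_unmapped(sam_in, sam_out) -> int:
    """Copy header lines and mapped records; return the number of records kept."""
    kept = 0
    for line in sam_in:
        if line.startswith("@"):          # header line, keep as-is
            sam_out.write(line)
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & 0x4:                    # bit 0x4 set: read is unmapped
            continue
        sam_out.write(line)
        kept += 1
    return kept


# Tiny made-up example input.
sam_text = (
    "@SQ\tSN:chr1\tLN:100\n"
    "r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r4\t16\tchr1\t9\t37\t4M\t*\t0\t0\tACGT\tIIII\n"
)
out = io.StringIO()
kept = drop_unmapped(io.StringIO(sam_text), out)
print(kept)
```

For real files, pass open file handles instead of the `io.StringIO` objects used here.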
Old 02-11-2010, 09:26 PM   #8
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by Chipper
It is nice that you can get unaligned reads in a new fastq (to align with BFAST...) but it would be good to have an option to report only aligned reads as well to save space.
You might as well just align with BFAST from the start.
Old 03-21-2011, 03:16 PM   #9
Guidobot
Junior Member
 
Location: San Diego

Join Date: Jan 2011
Posts: 8

I am hitting the original error when converting a SAM file produced by BWA to BAM:
Code:
SAM header is present: 1 sequences.
Parse error at line 9829: sequence and quality are inconsistent
Aborted
(Note, the result of color2fasta uses ACGT to encode colors.)

The problem appears to be a bug in BWA that occurs after an atypical CIGAR string is output, e.g. "2S3M2D10M2I26M". For such lines the quality string was partially or completely missing.

To proceed, I simply removed the offending read lines, i.e. deleted line 9829 (using head/tail/cat).
However, I wouldn't recommend this solution, as you have to repeat the cycle for each re-attempt of 'samtools view ...'.
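
One way to avoid the repeat-and-retry cycle is to scan the SAM once and list every record whose SEQ and QUAL lengths differ. A minimal Python sketch, assuming a tab-delimited SAM; `find_inconsistent` is an illustrative name, not a samtools feature:

```python
def find_inconsistent(sam_lines):
    """Yield (line_number, seq_len, qual_len) for records whose lengths differ."""
    for n, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):               # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Treat a missing column as an empty string so truncated lines are reported.
        seq = fields[9] if len(fields) > 9 else ""
        qual = fields[10] if len(fields) > 10 else ""
        if qual == "*":                         # SAM allows '*' when QUAL is not stored
            continue
        if len(seq) != len(qual):
            yield n, len(seq), len(qual)


# Made-up example: line 3 has a truncated quality string.
lines = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t5\t37\t4M\t*\t0\t0\tACGT\tII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
]
problems = list(find_inconsistent(lines))
print(problems)
```

With a real file, iterate over `open("reads.sam")` (path hypothetical) instead of the list; the line numbers reported are 1-based and count header lines, like the numbers in the samtools error message.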
Old 09-29-2011, 06:30 PM   #10
alig
Member
 
Location: adelaide

Join Date: Sep 2008
Posts: 43
Parse error at line x, sequence and quality are inconsistent

Hi,

Error as follows:

[samopen] SAM header is present: 84 sequences.
Parse error at line 86: sequence and quality are inconsistent

There have been a few people coming across this error when trying to convert a SAM file to a BAM file, but from searching there doesn't seem to be a good solution yet.

I originally ran bwa aln on my SOLiD paired end reads with -q 0 & did not get this error.
But from the alignment we realised that we need to do read trimming, so then ran the bwa aln command using -q 20
Unfortunately this means I get the sequence and quality are inconsistent error & cannot progress.
My sam file is very large, 12.5 Gb gzipped so it's not feasible to just remove the offending line, & I have a feeling that there will be many more lines with this error.

Can someone help, please?

Thanks alig
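
For a file this size, a streaming filter avoids both manual editing and decompressing to disk. The sketch below is a workaround under stated assumptions, not a fix for whatever produces the bad records: it reads a gzipped SAM line by line and writes every header line and every record whose SEQ and QUAL lengths agree. The function names and file names are hypothetical, and with paired-end data dropping one mate leaves its partner's mate information stale.

```python
import gzip
import os
import tempfile


def lengths_consistent(line: str) -> bool:
    """True if the record has SEQ and QUAL columns of equal length (or QUAL is '*')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False
    seq, qual = fields[9], fields[10]
    return qual == "*" or len(seq) == len(qual)


def filter_sam_gz(src: str, dst: str) -> int:
    """Stream src (gzipped SAM) to dst, dropping inconsistent records; return count dropped."""
    dropped = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if line.startswith("@") or lengths_consistent(line):
                fout.write(line)
            else:
                dropped += 1
    return dropped


# Self-contained demo on a tiny made-up file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.sam.gz")
    dst = os.path.join(tmp, "out.sam.gz")
    with gzip.open(src, "wt") as f:
        f.write("@SQ\tSN:chr1\tLN:100\n")
        f.write("r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n")
        f.write("r2\t0\tchr1\t5\t37\t4M\t*\t0\t0\tACGT\tII\n")
    n_dropped = filter_sam_gz(src, dst)
    with gzip.open(dst, "rt") as f:
        kept_lines = f.read().splitlines()

print(n_dropped, len(kept_lines))
```

It is worth looking at the dropped count before trusting the result: if a large fraction of reads is affected, removing them hides the problem rather than working around it.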
Tags
solid bowtie sam samtools
