SEQanswers

Old 01-30-2012, 02:56 AM   #1
Rubal7
Member
 
Location: Europe

Join Date: Jan 2012
Posts: 15
What's causing malformed reads?

Hello everyone,

This is my first post here, so please excuse any etiquette mistakes. I'm working through a GATK pipeline for sequence data from multiple individuals. I have reached the local indel realignment phase, and midway through realignment (the target locator has already run) I get an error message that kills the process:

ERROR MESSAGE: SAM/BAM file SAMFileReader{..file path} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: T_SOLEXA-GA02:6:9:1538:8018 [1 bases] [0 quals]

I have found a way around this using -filterMBQ, which skips the malformed reads, but I am curious about the underlying cause of the problem. Is it most likely that something I did incorrectly earlier in the pipeline, such as a file formatting step, created a mismatch between bases and base qualities? Or can these mismatches occur at low frequency as a normal part of the sequencing process? The fact that a malformed-read filter exists makes me think they can just occur 'naturally', but I have no idea why.

If anyone has thoughts on or experience with this problem, I'd really appreciate hearing from you. I'm apprehensive about moving on with the pipeline without understanding the root of the problem.

Best,

Rubal7
Old 01-30-2012, 03:47 AM   #2
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219

Looks pretty strange: it found a read with only one base and no associated quality. Do you do any kind of adaptor sequence removal or quality trimming? Anyway, I've never seen that error...
Old 01-30-2012, 04:25 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Along the same line of inquiry as ulz_peter, have a look in the SAM/BAM file you used as input to see if the original read is malformed or if this is being introduced along the way. It's odd for a read to be only 1 base long.
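
A minimal sketch of that check, assuming the BAM has been dumped to SAM text first (for example with `samtools view`) and the lines are fed to the function. The function name `seq_qual_mismatches` is made up for illustration, and counting a lone `*` in QUAL as zero qualities is an assumption chosen to mirror the "[1 bases] [0 quals]" wording of the error, not a statement of how GATK counts internally.

```python
def seq_qual_mismatches(sam_lines):
    """Yield (read_name, n_bases, n_quals) for records whose SEQ and QUAL lengths differ."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has 11 mandatory fields; skip anything shorter
        name, seq, qual = fields[0], fields[9], fields[10]
        # Assumption: a lone "*" means "not stored", so it counts as length 0.
        n_bases = 0 if seq == "*" else len(seq)
        n_quals = 0 if qual == "*" else len(qual)
        if n_bases != n_quals:
            yield name, n_bases, n_quals


# Two hand-written records: a 1-base read with QUAL "*" and a normal 4-base read.
example = [
    "@SQ\tSN:chr7\tLN:159138663\n",
    "read_a\t528\tchr7\t111016499\t0\t1M\t*\t0\t0\tC\t*\tXT:A:R\n",
    "read_b\t0\tchr7\t111016600\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for name, n_bases, n_quals in seq_qual_mismatches(example):
    print(f"{name}\t[{n_bases} bases]\t[{n_quals} quals]")
```

Running the same function over the file from each pipeline stage (raw alignment, after trimming, after sorting and so on) should show at which step the offending records first appear.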
Old 01-30-2012, 04:42 AM   #4
Rubal7
Member
 
Location: Europe

Join Date: Jan 2012
Posts: 15

Thanks guys, checking both these things now
Old 01-30-2012, 07:01 AM   #5
Rubal7
Member
 
Location: Europe

Join Date: Jan 2012
Posts: 15

The offending read:
T_SOLEXA-GA01_r:6:9:1538:8018 528 chr7 111016499 0 1M * 0 0 C * XT:A:R NM:i:0 XN:i:1 X0:f:1.36217e+08 XM:i:0 XO:i:0 XG:i:0 MD:A:1 RG:Z:NR_49w XI:Z:AACTCCG YI:Z:.--/-2/ ZQ:A:L
Old 01-30-2012, 07:57 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I'm not surprised that the "doesn't pass QC" flag is set on that read. A * by itself in the QUAL field like that would normally mean "no quality stored", which GATK would indeed treat as a malformed record, since the base is present. However, a single * is ambiguous in this case, since it's also a valid Phred+33 quality character (ASCII 42, i.e. Q9, a crappy base call).
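
To make both points concrete, here is a small sketch that decodes the FLAG value 528 from the offending record and shows what a `*` would mean if read as a Phred+33 character. The bit meanings follow the SAM specification; the helper names `decode_flag` and `phred33` are just illustrative.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def phred33(char):
    """Interpret a single QUAL character as a Phred+33 quality score."""
    return ord(char) - 33


print(decode_flag(528))  # ['reverse strand', 'fails QC']
print(phred33("*"))      # 9
```

So 528 is 512 + 16, and a one-character QUAL of `*` can be read either as "not stored" or as a single Q9 call.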

Frankly, you'd be better off removing such short reads, since their mapping is going to be totally unreliable and they won't contribute anything meaningful to your results. Presumably whatever program you're using to do the adaptor trimming is capable of not returning reads below a certain size.
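
If the trimming program cannot do this, a rough sketch of a length filter over SAM text could look like the following. The function name `drop_short_reads` and the default cutoff of 20 bases are arbitrary assumptions for illustration, not recommended values, and the filter looks at each record on its own, so for paired-end data it can leave a mate without its partner.

```python
def drop_short_reads(sam_lines, min_length=20):
    """Yield header lines plus alignment records whose SEQ has at least min_length bases."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header untouched
            continue
        fields = line.rstrip("\n").split("\t")
        seq = fields[9] if len(fields) > 9 else "*"
        # Records with SEQ "*" carry no stored sequence, so their length
        # cannot be judged here; they are passed through unchanged.
        if seq == "*" or len(seq) >= min_length:
            yield line


example = [
    "@SQ\tSN:chr7\tLN:159138663\n",
    "read_a\t528\tchr7\t111016499\t0\t1M\t*\t0\t0\tC\t*\n",
    "read_b\t0\tchr7\t111016600\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
# With a cutoff of 4, the 1-base read is dropped and the 4-base read is kept.
for line in drop_short_reads(example, min_length=4):
    print(line, end="")
```

Filtering at the trimming stage, before alignment, is still the cleaner option, since the aligner then never sees the short reads at all.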
Old 01-30-2012, 09:43 AM   #7
Rubal7
Member
 
Location: Europe

Join Date: Jan 2012
Posts: 15

Thanks, I'll probably remove short reads as you suggest, since they are likely to do more harm than good!
Old 11-29-2012, 01:17 AM   #8
jfb
Junior Member
 
Location: SF bay area

Join Date: Nov 2011
Posts: 7

I too am seeing this error using GATK (v1.4-5-g253a07f) during indel realignment. I had never encountered it until today: 24 out of 28 files processed fine, but 4 of them fail prematurely with a 'malformed' BAM error on entries that are supposedly missing the quality scores but have between 30 and 68 bases.
All times are GMT -8.