SEQanswers

12-21-2011, 09:23 PM   #1
oiiio
Senior Member
 
Location: USA

Join Date: Jan 2011
Posts: 104
BWA sampe shows extremely large insert size

I have an Illumina-sequenced genome that I am mapping as paired-end reads. The other 4 parts that I merge with it to create the fully mapped genome all look normal. At the sampe step of BWA for this last part of the genome, I noticed it was taking an EXTREMELY long time. Here is the output concerning isize:


[infer_isize] (25, 50, 75) percentile: (30225, 52418, 70153)
[infer_isize] low and high boundaries: 45 and 150009 for estimating avg and std
[infer_isize] inferred external isize from 129 pairs: 49974.163 +/- 26814.967
[infer_isize] skewness: -0.056; kurtosis: -0.892; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 199065 (5.56 sigma)

The inferred size is definitely not what it should be (~250). Has anyone had this kind of problem before? Is the sequencing data for this sample usable?
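The numbers in this log can be cross-checked by hand. A minimal sketch, assuming (from the printed values only, not from the BWA source) that the high boundary is the 75th percentile plus twice the interquartile range, and that the maximum insert size is the mean plus the reported number of sigmas. The function name `parse_infer_isize` is hypothetical:

```python
import re

# The [infer_isize] lines quoted in the post above.
LOG = """\
[infer_isize] (25, 50, 75) percentile: (30225, 52418, 70153)
[infer_isize] low and high boundaries: 45 and 150009 for estimating avg and std
[infer_isize] inferred external isize from 129 pairs: 49974.163 +/- 26814.967
[infer_isize] inferred maximum insert size: 199065 (5.56 sigma)
"""

def parse_infer_isize(log):
    """Extract the key numbers from bwa sampe's [infer_isize] stderr lines."""
    p25, p50, p75 = map(int, re.search(
        r"percentile: \((\d+), (\d+), (\d+)\)", log).groups())
    low, high = map(int, re.search(
        r"boundaries: (\d+) and (\d+)", log).groups())
    m = re.search(r"from (\d+) pairs: ([\d.]+) \+/- ([\d.]+)", log)
    pairs, avg, std = int(m.group(1)), float(m.group(2)), float(m.group(3))
    m = re.search(r"maximum insert size: (\d+) \(([\d.]+) sigma\)", log)
    return {
        "p25": p25, "p50": p50, "p75": p75,
        "low": low, "high": high,
        "pairs": pairs, "avg": avg, "std": std,
        "max_isize": int(m.group(1)), "n_sigma": float(m.group(2)),
    }

stats = parse_infer_isize(LOG)
iqr = stats["p75"] - stats["p25"]
# Assumed rule: high boundary = p75 + 2 * IQR.
computed_high = stats["p75"] + 2 * iqr
# Assumed rule: maximum insert size = avg + n_sigma * std.
computed_max = stats["avg"] + stats["n_sigma"] * stats["std"]

print(computed_high, stats["high"])
print(round(computed_max), stats["max_isize"])
```

Both derived values agree with the log, so the estimator appears to be behaving consistently. The input is what looks wrong: only 129 pairs were used, and their median separation is about 52 kb rather than ~250.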

Last edited by oiiio; 12-21-2011 at 09:30 PM. Reason: typo
12-21-2011, 09:45 PM   #2
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350

Which BWA version do you use?
__________________
Marco
12-22-2011, 12:44 PM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

bwa assumes that the first read of fastq 1 is the mate of the first read of fastq 2 and so on. If that's not true, that could explain what you see.

So you could start by running bwa samse on each, and spot-checking the first few lines of each sam, to see if the reads are paired up properly.

I think I saw that error once when I misnamed one of the sai files. So I'd double check that the command line is calling for the right two sai files.
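The pairing check can also be done directly on the two FASTQ files before any mapping. A rough sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines) and read names that differ at most by a trailing /1 or /2; the function names are made up for illustration:

```python
from itertools import zip_longest

def read_names(lines):
    """Yield the read name from each FASTQ header (first line of every 4-line record)."""
    for i, line in enumerate(lines):
        if i % 4 != 0 or not line.strip():
            continue
        name = line.strip().split()[0].lstrip("@")
        # Drop a trailing /1 or /2 mate suffix if present.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name

def first_mismatch(lines1, lines2):
    """Return (record_index, name1, name2) for the first out-of-sync record, or None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None

# Usage (file names are placeholders):
# with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2:
#     print(first_mismatch(f1, f2))
```

A result of None means the two files stay in sync to the end. A mismatch with one name equal to None means one file has more records than the other.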
12-22-2011, 03:05 PM   #4
oiiio
Senior Member
 
Location: USA

Join Date: Jan 2011
Posts: 104

I use the latest version of BWA, and I'll give samse a shot now.
12-23-2011, 11:28 AM   #5
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

Sometimes insert sizes receive extremely high estimates for certain datasets - you can use the -A parameter to disable isize estimation altogether and enforce your own, IIRC.

And I believe case-sensitivity matters, i.e. -a is specified in the manual, but -A works.
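For illustration, a sampe call along these lines might be assembled as below. This is only a sketch: it assumes -A disables the insert size estimate and -a INT sets the maximum insert size, as this post suggests, so check the usage text printed by running `bwa sampe` with no arguments on your version first. All file names and the helper `sampe_command` are placeholders:

```python
import shlex

def sampe_command(ref, sai1, sai2, fq1, fq2, max_isize=500):
    """Build a bwa sampe argument list with insert size estimation turned off.

    Assumes -A disables estimation and -a gives the maximum insert size.
    """
    return ["bwa", "sampe", "-A", "-a", str(max_isize),
            ref, sai1, sai2, fq1, fq2]

cmd = sampe_command("ref.fa", "part5_1.sai", "part5_2.sai",
                    "part5_1.fastq", "part5_2.fastq", max_isize=500)
# SAM output goes to stdout, so redirect it to a file when running.
print(shlex.join(cmd) + " > part5.sam")
```

Printing the command before running it also makes it easy to confirm that the two .sai files and the two FASTQ files are listed in matching order.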
12-23-2011, 03:34 PM   #6
oiiio
Senior Member
 
Location: USA

Join Date: Jan 2011
Posts: 104

Interestingly, the single end BAMs seemed alright. I'll try -A as well. Still quite suspicious about the integrity of the mapping though...
12-24-2011, 01:29 PM   #7
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Try another test with just the first 200 or so fastq entries from each read, making single end and paired end .bams. An early error in a fastq file could throw everything off from that point on.
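Cutting the test subset is simple if the files use plain 4-line FASTQ records, which is an assumption here. A small sketch, with a hypothetical helper name and placeholder file names:

```python
from itertools import islice

def head_fastq(lines, n_records=200):
    """Return the first n_records FASTQ records (4 lines each) as a list of lines."""
    return list(islice(lines, 4 * n_records))

# Usage (file names are placeholders); take the same number from each mate file:
# for stem in ("reads_1", "reads_2"):
#     with open(stem + ".fastq") as src, open(stem + ".head.fastq", "w") as dst:
#         dst.writelines(head_fastq(src, 200))
```

Taking the same record count from both files keeps the subset paired, as long as the full files were in the same order to begin with.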

If the distances you work out manually from the single end bams look right and don't agree with the crazy ones from the paired end .bam, the most likely problem is just a brain fart on your part, where you have mistyped one of the file names. So double-check those.
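Working out those distances by hand gets tedious past a few reads, so here is a rough sketch of the same comparison on two single-end SAM outputs. It assumes both files use identical read names for mates, ignores indels and clipping when estimating where a read ends, and skips pairs that are unmapped or on different reference sequences. The function names are illustrative only:

```python
def sam_positions(lines):
    """Map read name -> (reference name, 1-based start, read length) for mapped reads."""
    out = {}
    for line in lines:
        if line.startswith("@"):        # header line
            continue
        f = line.rstrip("\n").split("\t")
        name, flag, rname, pos, seq = f[0], int(f[1]), f[2], int(f[3]), f[9]
        if flag & 4:                    # 0x4 = read unmapped
            continue
        out[name] = (rname, pos, len(seq))
    return out

def pair_distances(sam1_lines, sam2_lines):
    """Outer distance spanned by both mates, for reads mapped to the same reference."""
    a, b = sam_positions(sam1_lines), sam_positions(sam2_lines)
    dists = {}
    for name in a.keys() & b.keys():
        (r1, p1, l1), (r2, p2, l2) = a[name], b[name]
        if r1 != r2:
            continue
        # Approximate span: leftmost start to rightmost end (read length used as-is).
        dists[name] = max(p1 + l1, p2 + l2) - min(p1, p2)
    return dists
```

If most of these values sit near the expected ~250 while sampe reports tens of thousands, the reads themselves are probably fine and the paired-end command line is the first thing to re-check.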
12-26-2011, 01:22 PM   #8
oiiio
Senior Member
 
Location: USA

Join Date: Jan 2011
Posts: 104

@swbarnes2

What I found using this method was that there is actually a gradual increase in the detected insert size rather than an abrupt jump. I didn't notice any obvious FASTQ errors in this range, and the BAM is running now (although taking a horrifically long time due to the failed insert size detection).