SEQanswers

11-15-2011, 06:34 AM   #1
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62
Insert size != Fragment size?

Hi All,

This is a very simple question, or should be, but there seems to be some confusion out there.

My understanding is that insert size is the number of bases between paired-end reads (example: 300 bp fragments, 2 x 75 bp reads -> insert size 150). However, looking through different threads here and information elsewhere, the fragment is sometimes referred to as the insert as well, making insert size == fragment size.

I wonder whether different programs (BWA, Picard's CollectInsertSizeMetrics, BFAST, samtools, etc.) have different definitions of insert size, which might make things messy. Right now I am especially interested in Picard's definition.

Any ideas?

Thanks,
Boel
11-15-2011, 06:42 AM   #2
cw11
Member
 
Location: Massachusetts

Join Date: Sep 2011
Posts: 12

I'm not sure about Picard specifically, but I found a thread that discusses insert size here, which seems to suggest that the insert size is the stretch of sequence between the adapters (so in your example, 300 would be correct). However, within the Tuxedo suite (TopHat/Cufflinks/Bowtie), TopHat takes a --mate-inner-dist option, defined as the fragment length minus the combined length of the two reads.
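To make the arithmetic concrete, here is a minimal sketch of the "inner distance" definition, using the numbers from the first post. The function name `inner_distance` is made up for illustration and does not come from any of the tools mentioned.

```python
def inner_distance(fragment_length: int, read1_length: int, read2_length: int) -> int:
    """Bases between the two reads of a pair.

    Hypothetical helper: fragment length minus the combined read lengths.
    A negative result means the two reads overlap.
    """
    return fragment_length - (read1_length + read2_length)


# Example from the thread: 300 bp fragments sequenced as 2 x 75 bp.
print(inner_distance(300, 75, 75))    # 150

# Same library, longer reads: the fragment is unchanged, the inner distance shrinks.
print(inner_distance(300, 100, 100))  # 100
```

Note that only the inner distance depends on read length; the fragment length is a property of the library itself.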
11-15-2011, 06:46 AM   #3
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Yes, I've seen both definitions used by different software. I know Bowtie uses insert size == fragment size for example.

Personally I like this definition because you could sequence the same library with different read lengths, or you could have variable length reads (Ion Torrent) or you could trim your 3' read tips. Each step would vary the insert size, but the fragment size would remain constant.

I think part of the difficulty comes from the difference between Illumina paired-end protocols (e.g. bidirectional sequencing) where insert size is always related to fragment size and the long mate-pair/jumping protocols, where the insert size relates instead to the sizing step (e.g. 8kb gel-cut) and is independent of fragment length.
11-15-2011, 07:37 AM   #4
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62

According to samtools help (which includes Picard):
"For Illumina paired-end data, the inferred insert size would be the difference between the 5' positions of the two reads." This translates to 300 bases in my previous example, since nucleotides are added in the 5' to 3' direction.

Thanks for replying!
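For anyone who wants to check this by hand, the quoted definition can be sketched roughly as follows. This assumes 1-based leftmost alignment positions, a forward/reverse pair on the same reference, ungapped alignments, and an inclusive count of bases (hence the +1). The function names and coordinates are invented for illustration and are not the samtools or Picard implementation.

```python
def five_prime_position(leftmost_pos: int, aligned_length: int, is_reverse: bool) -> int:
    """Reference coordinate of a read's 5' end (1-based, ungapped alignment assumed)."""
    if is_reverse:
        # A reverse-strand read has its 5' end at its rightmost aligned base.
        return leftmost_pos + aligned_length - 1
    return leftmost_pos


def inferred_insert_size(fwd_pos: int, fwd_len: int, rev_pos: int, rev_len: int) -> int:
    """Span between the 5' ends of a forward/reverse pair, counted inclusively."""
    fwd_5p = five_prime_position(fwd_pos, fwd_len, is_reverse=False)
    rev_5p = five_prime_position(rev_pos, rev_len, is_reverse=True)
    return rev_5p - fwd_5p + 1


# Made-up coordinates: a 300 bp fragment covering positions 1001..1300, read as 2 x 75 bp.
print(inferred_insert_size(fwd_pos=1001, fwd_len=75, rev_pos=1226, rev_len=75))  # 300
```

Under these assumptions the result equals the fragment length, not the 150 bases between the reads.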
11-15-2011, 08:11 AM   #5
cw11
Member
 
Location: Massachusetts

Join Date: Sep 2011
Posts: 12

Yup - Glad you found an answer!
12-12-2013, 06:46 AM   #6
jfostel
Junior Member
 
Location: Boston, MA

Join Date: Aug 2010
Posts: 7

Whether insert size = fragment size does vary from tool to tool; I would specifically look it up for whatever you're using.

Regardless, the total adaptor-insert-adaptor length is useful as the best predictor of a library's amplification behavior (both in qPCR QC and on the flowcell). For example, you wouldn't want to pool together two libraries with identical 200bp inserts but very different adaptor + index lengths (unless it was acceptable for the majority of the reads to come from the smaller construct).
12-12-2013, 08:28 AM   #7
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 249

Don't bother, use a different term. You will be misunderstood. When you mean fragment size, say "fragment size"; when you mean the distance between the pairs, say "distance between the pairs".