SEQanswers

Old 07-31-2011, 10:53 PM   #1
louis7781x
Member
 
Location: taipei

Join Date: Oct 2010
Posts: 74
Default The insert-size in paired-end data

Hi,

I have a question about the term "insert size".


Suppose the library looks like this:

|-----75----|------------------------100-----------------|-----75-----|

Both paired-end reads are 75-mers. In this example, is the insert size 100 or 250?

I am confusing it with the fragment size.

Thanks

Best regards!

Last edited by louis7781x; 08-01-2011 at 12:51 AM.
Old 07-31-2011, 11:58 PM   #2
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 585
Default

The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).
Old 08-01-2011, 12:57 AM   #3
louis7781x
Member
 
Location: taipei

Join Date: Oct 2010
Posts: 74
Default

Quote:
Originally Posted by fkrueger View Post
The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).
Hi, is the adapter sequenced too? I mean, does the raw data contain adapter sequence?
Old 08-01-2011, 01:02 AM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 585
Default

Normally sequencing starts right after the adapter but does not include adapter sequence.
Old 08-01-2011, 05:17 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,123
Default

Quote:
Originally Posted by louis7781x View Post
Hi,

I have a question about term "insert-size"


If |-----75----|------------------------100-----------------|-----75-----|

paired-end data both 75 mer,In this example, The insert-size is 100 or 250?

I am confused with fragment size

Thanks

Best regard!
In your example I would say the insert size is 250bp. But as fkrueger noted above, there is more than one way to describe things. When the wet lab sends data to me, they report the library fragment size, which includes the ligated Illumina adapters; continuing with your example, and taking the ~120bp of adapter fkrueger mentioned, the fragment size would have been about 370bp. Certain software may use different measurements. For example, TopHat requests the mate inner distance, the length between the two sequence reads, which in your example is 100bp.

The lesson is to be very clear about what is being asked or reported.
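
To make the three measurements concrete, here is a minimal Python sketch of the arithmetic discussed so far. The class and field names are hypothetical, and the 120bp adapter total is only the rough figure quoted earlier in this thread; real adapter lengths depend on the kit and indexes, and a different value shifts the fragment size accordingly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedEndLibrary:
    """Hypothetical container for the lengths discussed in this thread."""
    read_length: int      # length of each read (same for both mates)
    inner_distance: int   # unsequenced gap between the two reads
    adapter_total: int    # both adapters combined (assumed ~120 bp here)

    @property
    def insert_size(self) -> int:
        # Sequence between the adapters: both reads plus the gap.
        return 2 * self.read_length + self.inner_distance

    @property
    def fragment_size(self) -> int:
        # What the wet lab sizes on a gel or Bioanalyzer: insert plus adapters.
        return self.insert_size + self.adapter_total


# The example from the original question, with an assumed 120 bp of adapter.
lib = PairedEndLibrary(read_length=75, inner_distance=100, adapter_total=120)
print("mate inner distance:", lib.inner_distance)
print("insert size:", lib.insert_size)
print("library fragment size:", lib.fragment_size)
```

Which of the three numbers a given program wants has to be checked in its own documentation; the sketch only shows how they relate to each other.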
Old 09-23-2013, 09:09 PM   #6
arkilis
Senior Member
 
Location: Australia

Join Date: Jul 2013
Posts: 119
Wink

Quote:
Originally Posted by louis7781x View Post
Hi,

I have a question about term "insert-size"


If |-----75----|------------------------100-----------------|-----75-----|

paired-end data both 75 mer,In this example, The insert-size is 100 or 250?

I am confused with fragment size

Thanks

Best regard!
As far as I know, it is 250.

(I originally wrote 150, which was a typo. Sorry.)

Last edited by arkilis; 09-24-2013 at 03:44 PM.
Old 09-24-2013, 03:53 AM   #7
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

As noted, when most analysis programs ask for an insert size, they are referring to the size of your fragment with the adapters excluded, 250bp in your case. However, some programs use the term insert size to mean the gap distance between the 3' ends of the two reads (assuming standard forward/reverse orientation), which in your case is 100bp. Most programs are documented well enough to state which meaning they intend when they say insert size, but you shouldn't assume that the two are interchangeable. The term pair-distance is also used and, just like insert size, has been taken to mean either the size of the fragment minus the adapters (250bp) or the gap distance (100bp).

For assemblies, interchanging the two values won't cause huge problems, but for read mapping methods where you want to look for insertions/deletions or splice variation, inputting the correct value can be very important.
Old 09-24-2013, 04:12 AM   #8
Yue Xu
Member
 
Location: china

Join Date: Jun 2013
Posts: 16
Default Expression quintiles

Sorry, I have recently been studying transcript assembly. Can you tell me what "expression quintiles" means? Thank you very much.
Old 09-24-2013, 04:13 AM   #9
Yue Xu
Member
 
Location: china

Join Date: Jun 2013
Posts: 16
Default

Quote:
Originally Posted by Yue Xu View Post
Sorry, I am recently study some about transcription assembly. Can you tell me the meaning of Expression quintiles? Thank you very much.
Oh, sorry, I posted this in the wrong thread.
Old 09-24-2013, 04:21 AM   #10
thomasblomquist
Member
 
Location: Ohio

Join Date: Jul 2012
Posts: 68
Default

P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


The Insert/TargetFragment region needs to be shorter than the combined read length of the sequencing kit you're using. For example, if you use a 2 x 100 PE kit and you require at least 20 bases of overlap between Read 1 and Read 2, your insert fragments cannot be larger than 180 bases in length.

As stated above, the P5 --- Index/Barcode1 --- Read 1 Primer and Read 2 Primer --- Index/Barcode2 --- P7 segments add about 120-130 bases of length onto your insert fragment (depending on the size of the index barcodes and the type of read 1 and read 2 primers you have chosen).

Keep in mind that others, myself included, have observed that the efficiency and success rate of the clustering step are significantly reduced when a final library template molecule is <250 or >800 bases. Thus, make sure the sum of the lengths falls within this range if possible. Quantitation between deletion/insertion alleles that straddle these upper and lower limits cannot be trusted for reproducibility between different library preps (just an FYI, personal observation).

-Tom
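
The length arithmetic in this post can be sketched in a few lines of Python. The function names are hypothetical, the 120 base adapter total is the low end of the 120-130 range given above, and the 250-800 base window is the poster's personal observation rather than a documented limit.

```python
def max_insert_for_overlap(read_length: int, min_overlap: int) -> int:
    """Largest insert for which two reads of read_length still overlap
    by at least min_overlap bases."""
    return 2 * read_length - min_overlap


def library_length(insert_size: int, adapter_total: int = 120) -> int:
    """Final template length: insert plus both adapter/index/primer ends.
    adapter_total is an assumed value (120-130 bases per this post)."""
    return insert_size + adapter_total


def within_reported_range(length: int, low: int = 250, high: int = 800) -> bool:
    """True if the template length falls inside the 250-800 base window
    that the poster reports as clustering reliably (anecdotal)."""
    return low <= length <= high


# 2 x 100 PE kit with at least 20 bases of overlap required.
insert = max_insert_for_overlap(read_length=100, min_overlap=20)
template = library_length(insert)
print(insert, template, within_reported_range(template))
```

If overlap is not needed, the first function simply does not apply and the insert can exceed twice the read length.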
Old 09-24-2013, 09:45 AM   #11
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,123
Default

Quote:
Originally Posted by thomasblomquist View Post
P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


The Insert/TargetFragment region needs to be less than the size of the base length sequencing kit you're using. For example if you use a 2 x 100 PE kit, and you require at least 20 bases of overlap from Read 1 and Read 2, your insert fragments cannot be larger than 180 bases in length.
It is not required that the two reads overlap. For most applications you do not, in fact, want them to overlap, and thus want an insert size larger than 2x the read length.

Quote:
Keep in mind that others have reported/observed, and myself included, that the efficiency and success rate of the clustering step is significantly reduced when a final library template molecule is <250 or >800 bases. Thus, make sure the sum of the lengths falls between these ranges if possible.
-Tom
Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library I would say that fragments ≤ 150bp cluster and amplify efficiently as hell.
Old 09-24-2013, 10:21 AM   #12
thomasblomquist
Member
 
Location: Ohio

Join Date: Jul 2012
Posts: 68
Default

Quote:
Originally Posted by kmcarr View Post
It is not required that the two reads overlap. For most applications you do not, in fact want them to overlap and thus want an insert size larger than 2x read length.
Correct, I left out the "if you need overlap" qualifier.

Quote:
Originally Posted by kmcarr View Post
Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library I would say that fragments ≤ 150bp cluster and amplify efficiently as hell.
LMAO. Yes, they do indeed cluster. I think, and I'm just surmising here, that the adapter dimers (ssDNA) heterodimerize with actual target template (dsDNA). My evidence for this is that in my amplicon libraries, when I stop the PCR prep in early cycles, just as the target-size peak is starting to show up on the Bioanalyzer DNA chip electropherogram, and then size-extract that target peak, I get virtually no primer/adapter dimers sequenced. However, as the target peak begins to reach plateau in PCR, the dimer peak starts to diminish a bit, and my thought is that the adapter dimer is non-specifically annealing to other target-specific templates. These complexes electrophorese on the Bioanalyzer at or around the target-specific size and, in a non-denaturing size-based extraction, will be pulled into the final library. In these latter cases, where the PCR-based library prep overshoots the cycle number, I see a ton of adapter or read 1/2 dimer products formed.

As for ligation-type approaches, my assumption is that it is probably fairly easy to accidentally denature and reanneal a complex library afterwards, so that the adapter/read primer dimers get heterodimerized with other large complexes.

The key then is to pull out the ssDNA that is the target length. PAGE purification? But yield tends to be too low.

Thus, I tend to aim for a minimal number of PCR cycles and to keep the prepped library cool to minimize this issue.

Good point to bring up! :-)

-Tom
Old 10-01-2013, 08:16 AM   #13
mohiuddinbdfh
Junior Member
 
Location: Canada

Join Date: Jun 2013
Posts: 2
Default

Hi,
I am a newbie in metagenomics. I just sequenced my soil DNA samples on an Illumina HiSeq2000 (2x151 bp). Now I need to assemble my sequences, and to do that I need the insert size, i.e. the minimum and maximum distance between the sequences. I asked the sequencing facility about this, but they sent me the Bioanalyzer result, which looks complicated to me. I attached the Bioanalyzer result here. I would appreciate it if anyone could explain this Bioanalyzer result.

Thanks
Old 10-11-2014, 12:19 AM   #14
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 489
Default

Is this a case of running 2x250 on a MiSeq but getting 150-350 PE reads?

http://www.ncbi.nlm.nih.gov/sra/?term=SRR1145846
Old 02-05-2017, 09:36 PM   #15
sunguk
Junior Member
 
Location: Korea

Join Date: Nov 2016
Posts: 1
Default

Quote:
Originally Posted by mohiuddinbdfh View Post
Hi,
I am a newbie in metagenomics. I just sequenced my soil DNA samples through Illumina HiSeq2000 (2X151 bp). Now I need to assemble my sequences and for doing that I need the insert size, the minimum and maximum distance between the sequences. I asked the sequencing facility about this but they send me the bioanalyzer result which looks complicated to me. I attached the bioanalyzer result here. I will appreciate if anyone can explain this bioanalyzer result.

Thanks
It may show the size of your DNA before sequencing. By looking at the DNA sizes, we can check for contaminants. They then fragment the DNA and sequence it, and later you get the sequencing data.
By the way, your data looks strange.
And this result has nothing to do with the Illumina library insert size.
Generally, the insert size can be 180-350 bp.
You had better BLAST both reads with the same ID and check the insert size manually.

Last edited by sunguk; 02-06-2017 at 01:15 AM.
Old 02-06-2017, 08:42 AM   #16
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,197
Default

Quote:
Originally Posted by sunguk View Post
It may mean the size of your DNA before sequencing. By observing the sizes of DNAs, we can check contaminants. And they will fragment DNAs and sequence them. Later you will get the sequencing data.
By the way, your data looks strange.
And this result does not have nothing with Illumina library insertion data.
Generally, the insertion size can be 180-350 bp.
You better BLAST both sequences of the same id and manually check the insertion size.
What is strange about it?

The 35bp and ~10.3kb peaks are the spike-in size standards that Bioanalyzers use for DNA high sensitivity chips.

Normally TruSeq "PCR-Free" libraries will produce a range of product sizes similar to this. Of course, the longer products will fail to produce sequenceable clusters. They seem to be unable to compete with the shorter products.

Of course, the pure programmers probably aren't reading this sub-forum. But to any who are: please don't require insert sizes to run your assemblers/mappers! If your algorithms really need that information, figure it out! The Bioanalyzer results don't give you an accurate assessment of the insert sizes, for the reasons I describe above. So asking the lab what the average insert size was is not terribly useful!

--
Phillip
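
One way to "figure it out" from the data is to read the observed template length (TLEN, column 9) from aligned read pairs in SAM format. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the SAM records are made-up toy data, and TLEN from a spliced RNA-seq alignment would also span introns, so it is most meaningful for genomic alignments.

```python
import statistics


def template_lengths(sam_lines):
    """Yield one observed template length per properly paired read pair."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        proper_pair = bool(flag & 0x2)
        secondary_or_supplementary = bool(flag & 0x900)
        # Only the leftmost mate carries a positive TLEN, so keeping
        # tlen > 0 counts each pair exactly once.
        if proper_pair and not secondary_or_supplementary and tlen > 0:
            yield tlen


def summarize(lengths):
    """Return simple summary statistics for a list of template lengths."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no properly paired reads found")
    return {
        "pairs": len(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }


def inner_distance(template_length, read_length):
    """Gap between the two reads, for tools that ask for that instead."""
    return template_length - 2 * read_length


# Toy SAM records (made up): three pairs of 75 bp reads.
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t75M\t=\t275\t250\t*\t*",
    "r1\t147\tchr1\t275\t60\t75M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t500\t60\t75M\t=\t665\t240\t*\t*",
    "r2\t147\tchr1\t665\t60\t75M\t=\t500\t-240\t*\t*",
    "r3\t99\tchr1\t1000\t60\t75M\t=\t1185\t260\t*\t*",
    "r3\t147\tchr1\t1185\t60\t75M\t=\t1000\t-260\t*\t*",
]

lengths = list(template_lengths(toy_sam))
print(summarize(lengths))
```

In practice the lines would come from aligning a subset of the reads to a reference or a draft assembly, for example from the text output of `samtools view`, rather than from a hard-coded list.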