#1 |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]()
Hello, I recently did an Illumina HiSeq 2x150 bp metagenomic sequencing run and have some questions.
1> I got my sequencing report back from the sequencing center. It says the average insert size is about 600 bp, which I take to mean that the majority of fragments prepared for sequencing are around 600 bp. I am confused about this. After I got my fastq files back (R1 and R2), I first merged the paired ends, and >60% of the reads joined successfully. I don't understand how this could happen. Since the method only sequences 150 bp from each end and the fragment is 600 bp, there should be no overlap (150 x 2 = 300 bp << 600 bp). Why can I still get so many reads joined? If I want to join more paired-end reads, the fragment size should be designed to be less than 300 bp, right?
2> The report also says "300 cycles using the HiSeq system". This is straightforward: I suppose R1 and R2 are 150 cycles each. Each cycle adds one nucleotide, so 150 cycles give 150 bp. The sequencing center says they can also do a maximum of 500 cycles, which means 2x250 bp sequencing. I was wondering why they don't run more cycles, such as 1000 cycles, so we could get 2x500 bp and longer reads. Which factors restrict Illumina read lengths? From the report, it seems we could just increase the cycles to get longer reads.
Thanks,
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
Use BBMap to estimate insert sizes. There are two methods described here. That estimate of 600 bp is clearly wrong, since otherwise you would not have been able to merge the R1/R2 reads.
2x250 is the maximum supported length on the HiSeq 2500, and 2x300 on the MiSeq. One can't get longer read lengths with currently available Illumina sequencing kits. One could do asymmetric runs (e.g. 1x600 bp), but that is not generally recommended because of the drop in quality you are bound to see towards the end of such runs.
Last edited by GenoMax; 03-07-2017 at 09:25 AM.
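As a rough illustration of estimating insert sizes from the data itself: when a pair merges, the merged read spans the whole insert, so the length distribution of the merged reads approximates the insert-size distribution of the mergeable fraction. Below is a minimal Python sketch, assuming a plain 4-line-per-record FASTQ of already merged reads. The function names are hypothetical and this is not how BBMap itself is implemented.

```python
import io
from collections import Counter
from statistics import median


def merged_length_histogram(handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    hist = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            hist[len(line.strip())] += 1
    return hist


def summarize(hist):
    """Basic statistics over a {length: count} histogram."""
    lengths = sorted(hist.elements())
    return {
        "n": len(lengths),
        "median": median(lengths),
        "min": lengths[0],
        "max": lengths[-1],
    }


# Tiny made-up example standing in for a real merged-reads file.
example = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    "@r3\nACGTACGTACGTAC\n+\nIIIIIIIIIIIIII\n"
)
hist = merged_length_histogram(example)
print(summarize(hist))
```

Note that this only describes pairs that merged. Inserts too long to overlap are invisible to it, and sizing those needs mapping against a reference.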
#3 |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]()
Hi GenoMax,
Yes, I know the bioinformatics tool BBMap. According to their report, they determined the library size using an Agilent 2100 Bioanalyzer. I have never used a Bioanalyzer; I would guess it is an instrument that makes a physical measurement (not a bioinformatics tool). Are you suggesting that their report or measurement is wrong, and that I should use bioinformatics tools to check it? Is it common for the Bioanalyzer to give a wrong number? So I am correct, right? To join 2x150 bp reads, most inserts should be less than 300 bp.
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
BBMap is going to give you an absolute answer by actually using the data that is there. There is no ambiguity involved. It will work with or without a reference. The only case where it won't work is if you have reads that don't merge and you don't have a reference available.
If you are able to join the PE reads, then there are some inserts that are smaller than 300 bp. While your library may have had fragments in the 600 bp range, if there were any of a smaller size (as indicated by tails on Bioanalyzer traces; you don't get an absolute answer from the Bioanalyzer, AFAIK), then those fragments will preferentially bind and form clusters.
Last edited by GenoMax; 03-07-2017 at 10:10 AM.
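The arithmetic behind "inserts smaller than 300 bp" can be sketched in a few lines of Python. The function names and the `min_overlap` threshold of 10 bp are assumptions for illustration only; real merging tools apply their own overlap criteria.

```python
def expected_overlap(insert_size, read_length=150):
    """Bases shared by R1 and R2; a negative value is an unsequenced gap."""
    return 2 * read_length - insert_size


def can_merge(insert_size, read_length=150, min_overlap=10):
    # min_overlap is an assumed threshold, not taken from any specific tool.
    return expected_overlap(insert_size, read_length) >= min_overlap


for size in (200, 290, 300, 600):
    print(size, expected_overlap(size), can_merge(size))
```

With 2x150 bp reads, a 600 bp insert leaves a 300 bp gap in the middle, while a 200 bp insert overlaps by 100 bp, which is why only the short tail of the library can merge.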
#5 |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]()
Hi Genomax,
Thanks. What you said makes me think the sequencing center sent me a wrong report. They might have meant the largest fragment. It doesn't make sense for them to build such large fragments. 2x150 bp can only sequence 300 bp at most. If they build a library with a 600 bp insert size, there are 300 bp gaps in the middle, and the coverage won't be very good. Thanks,
#6 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]() Quote:
Choice of insert sizes depends on what you are trying to do. If you have a reference available then making the libraries so the two ends do not overlap makes sense since you can sample a larger region. If you must have the entire region covered by the two reads (i.e. reads need to overlap) then you would want to make inserts smaller. Which of these two cases were you wanting to do? |
|
#7 | |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]() Quote:
|
|
#8 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]()
What kind of samples are these and what will you be doing with them (assembly?) downstream?
|
#9 | |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]() Quote:
PS, I don't understand the part of your previous post that says "If you have a reference available then making the libraries so the two ends do not overlap makes sense since you can sample a larger region". Just curious. I don't work with model organisms, so normally there is no reference database. But suppose they chose 2x150 bp, have a reference database, and use 600 bp inserts: you can only sequence 150 bp from either end, so you still get no information about the 300 bp in the middle of the fragment. Why would they build a library with larger fragments?
|
#10 |
Member
Location: CA Join Date: Jul 2013
Posts: 74
|
![]()
Are you sure they subtracted the adapter length from the fragment sizes to get the insert sizes (meaning, are you sure they're reporting insert size from the bioanalyzer?)? If the fragments themselves are an average of 600bp with a fairly wide distribution, it wouldn't be surprising if 60% of your reads merged with 150bp PE.
That said, we've (very rarely) had libraries that gave drastically different results between bioanalyzer, fragment analyzer, and tapestation, with the empirical insert size distributions determined after sequencing not agreeing with any of them. |
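To make the fragment-versus-insert distinction concrete, here is a minimal Python sketch. It assumes a combined adapter length of 120 bp, a normally distributed insert size, and a 10 bp minimum overlap; all three are illustrative assumptions rather than values from the report, and the function names are hypothetical.

```python
from statistics import NormalDist


def insert_from_fragment(fragment_size, adapter_total=120):
    # adapter_total is an assumed combined length of both adapters; check your kit.
    return fragment_size - adapter_total


def fraction_mergeable(mean_fragment, sd, read_length=150,
                       min_overlap=10, adapter_total=120):
    """Fraction of inserts short enough to overlap, under a normal model."""
    mean_insert = insert_from_fragment(mean_fragment, adapter_total)
    longest_mergeable = 2 * read_length - min_overlap
    return NormalDist(mean_insert, sd).cdf(longest_mergeable)


# Try a few assumed spreads around a 600 bp mean fragment size.
for sd in (50, 100, 200):
    print(sd, round(fraction_mergeable(600, sd), 3))
```

Under these assumptions the mergeable fraction of the library stays below one half however wide the spread, so a higher merge rate in the actual data would point to something this simple model leaves out, such as shorter fragments clustering preferentially.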
#11 | ||
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
|
![]() Quote:
Quote:
|
||
#12 |
Senior Member
Location: Cambridge Join Date: Sep 2010
Posts: 116
|
![]()
Also, do not forget that clustering efficiency depends on insert size.
Basically, although your library has 600 bp fragments, they cluster less efficiently (~10x?) than the 200-300 bp fragments present in the sample. As a result, one gets a peak on the FLASH histogram in the area that is ~1/3x, on the rising side of the bell curve produced by the Bioanalyzer. (You get enrichment of the smaller fragments during the clustering stage.)
PS: with the latest iteration of the Illumina instruments (HiSeq 4000/NovaSeq) they seem to continue to support libraries with up to 350 bp insert size. Shorter inserts give you smaller and brighter clusters/wells, and are less likely to be long enough to jump to neighbouring wells, so they can be sequenced at higher densities. As a result, we get at most 2x150 bp supported on the HiSeq 4000/NovaSeq. If you need 2x250, stick with the HiSeq 2500 or MiSeq.
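The enrichment effect can be illustrated with a toy reweighting in Python. The efficiency curve below is purely hypothetical: it is set to 1.0 up to 300 bp and falls log-linearly to 0.1 at 600 bp, only so that it matches the rough "~10x" figure above. The library composition is made up as well.

```python
def relative_efficiency(size, full_until=300, tenfold_at=600):
    """Hypothetical curve: 1.0 up to full_until, 0.1 at tenfold_at."""
    if size <= full_until:
        return 1.0
    return 10 ** (-(size - full_until) / (tenfold_at - full_until))


def clustered_distribution(library):
    """Reweight {size: fraction} by relative efficiency and renormalise."""
    weighted = {s: f * relative_efficiency(s) for s, f in library.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}


library = {250: 0.1, 400: 0.3, 600: 0.6}  # made-up library composition
clustered = clustered_distribution(library)
for size, frac in clustered.items():
    print(size, round(frac, 3))
```

Even in this toy case, the share of the shortest fragments among clusters is several times their share in the library, which is the kind of shift that would move the sequenced size distribution below the Bioanalyzer peak.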
#13 |
Member
Location: florida Join Date: Jan 2013
Posts: 67
|
![]()
For our soil samples, the assembled reads normally account for ~50% of the original reads. BTW, our data is >10 Gb per sample.
|
#14 | |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]()
Hi, thanks. Can you explain more about the clustering stage? I don't know many details about the HiSeq. By clustering stage, do you mean a step of library building, or bridge amplification?
Quote:
|
|
#15 |
Senior Member
Location: Cambridge Join Date: Sep 2010
Posts: 116
|
![]()
Clustering means bridge amplification for pre-ExAmp (non-patterned) flowcells: in situ PCR on the lawn of oligos on the flow cell surface. It follows similar rules/laws to a regular PCR, only the product stays in situ, forming a forest of DNA strands.
For ExAmp chemistry (patterned flowcells), clustering means cluster formation using isothermal amplification. (In theory only in the occupied nanowell; in practice, especially at low loading concentrations, a few neighbours may join in too...) Have a read about ExAmp & HiSeq 4000: http://core-genomics.blogspot.co.uk/...d-to-know.html
#16 | |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]() Quote:
|
|
#17 |
Senior Member
Location: California Join Date: Jul 2014
Posts: 198
|
![]()
Out of curiosity, why are you joining the read pairs? A lot of the metagenomics software out there now supports paired end reads as input. The metaSPAdes assembler @GenoMax mentioned requires paired end data IIRC.
|
#18 |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]()
Hey, I did try metaSPAdes; less than 1% of total reads assembled. A lot of people have tried an alternative approach: join the paired ends to get longer reads, but don't assemble them. Then they use the long merged reads for BLAST or other annotation.
|
#19 |
Senior Member
Location: California Join Date: Jul 2014
Posts: 198
|
![]()
Would something like kraken or CLARK not be helpful? Are you trying to assemble and annotate de novo genomes? Or trying to figure out the microbial composition and functional content? I guess my point is you would discard ~40% of your data in the joining process, which may not be necessary depending on your task of interest.
|
#20 | |
Senior Member
Location: US Join Date: Apr 2013
Posts: 222
|
![]() Quote:
|
|