SEQanswers

04-18-2018, 09:48 AM   #1
Reid
Junior Member
 
Location: Flagstaff, Arizona, USA

Join Date: Mar 2015
Posts: 5
comparing sequencing runs across Illumina machines

Hello SEQanswers community,

I am preparing to conduct an RRBS study of a conifer (with an estimated ~30 Gb genome), and I would like to first complete a pilot study with very few samples on a MiSeq (due to the lower cost of a MiSeq run) to plan how many experimental samples I can eventually assess in one run of a HiSeq4000 or NextSeq500. I am targeting 103,000 loci of interest across the genome, and would like to obtain 60x coverage.
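The planning arithmetic behind this question can be sketched in a few lines of Python. This is a minimal sketch with hypothetical helper names; it assumes every usable read (or read pair) lands on one target locus, coverage is uniform across loci, and a placeholder `usable_fraction` discounts PhiX spike-in, unmapped reads and uneven pooling. Real RRBS libraries are uneven, so treat the result as an upper bound.

```python
# Hypothetical helpers for pilot-to-production planning (names are illustrative).

def reads_per_sample(n_loci, mean_coverage):
    """Reads (or read pairs) needed for the requested mean coverage per locus.

    Assumes every usable read lands on one target locus and coverage is uniform.
    """
    return n_loci * mean_coverage


def samples_per_run(reads_per_run, n_loci, mean_coverage, usable_fraction=0.8):
    """Whole samples that fit in one run or lane.

    usable_fraction is an assumed discount for PhiX spike-in, unmapped or
    off-target reads and uneven pooling; 0.8 is a placeholder, not a measured value.
    """
    usable_reads = int(reads_per_run * usable_fraction)
    return usable_reads // reads_per_sample(n_loci, mean_coverage)


if __name__ == "__main__":
    need = reads_per_sample(103_000, 60)
    print(f"reads per sample: {need:,}")
    for platform, yield_reads in [("HiSeq4000 lane", 300_000_000),
                                  ("NextSeq500 run", 400_000_000)]:
        print(platform, samples_per_run(yield_reads, 103_000, 60))
```

With 103,000 loci at 60x, each sample needs about 6.18M reads. Using the ~300M (HiSeq4000 lane) and ~400M (NextSeq500 run) yields quoted later in the thread and the assumed 80% usable fraction, this gives 38 and 51 samples respectively.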

I have heard mixed opinions on the advisability of the cross-platform planning approach I am considering, so I am interested to know if others have found that type of approach helpful, and/or can provide reasoning to help guide this decision.

Thanks very much in advance for your input.
04-19-2018, 03:18 AM   #2
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,166

First of all, standard RRBS using MspI may not be a good approach for plants, as RRBS interrogates the methylation status of C in the CpG context, which is most relevant for mammals.

In a pilot RRBS you will be interested in knowing the number of restriction fragments in a sample library to estimate sequencing requirements. This can be done on any platform. Choice of sequencer for the large-scale experiment will depend on sample number and cost. I don't know about the iSeq, but other Illumina platforms can sequence bisulfite-converted DNA equally well.
04-19-2018, 08:26 AM   #3
Reid
Junior Member
 
Location: Flagstaff, Arizona, USA

Join Date: Mar 2015
Posts: 5

Thanks for your response, nucacidhunter.

What I gathered from your response is that the pilot test I propose using a MiSeq will be useful mainly for providing an estimate of the number of restriction fragments I am actually attempting to sequence, given that I currently only have theoretical estimates via in-silico digests. In that case, it makes sense that there should be no problem in using a MiSeq to assess the number of restriction fragments I achieve through my double digest. Knowing fragment number will then allow me to calculate the number of samples to assess in parallel on a higher throughput Illumina machine. Thanks for that clarification.

Regarding the first portion of your reply:

While plants do maintain a number of methylated sequence contexts, CpG appears to remain an important sequence context for studies of differential methylation in plants.

See:
Gugger, P.F., Fitz-Gibbon, S., Pellegrini, M. and Sork, V.L., 2016. Species-wide patterns of DNA methylation variation in Quercus lobata and their association with climate gradients. Molecular Ecology, 25(8), pp.1665-1680.

That said, I am using HindIII and TaqαI for my double digests, as MspI is sensitive to certain methylation contexts present in the conifer genome I study.
04-19-2018, 02:20 PM   #4
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,166

I should correct that: by fragment numbers I meant fragments that are flanked by both restriction enzymes and fall within the size-selection window. With a 4-cutter/6-cutter combination in a 30 Gb genome I would guess there will be a lot more restriction fragments than 100k. Thanks for the reference as well.

Assuming bisulfite conversion will be after adapter ligation, I wonder how you get around the high cost of methylated adapters.
04-25-2018, 09:25 AM   #5
Reid
Junior Member
 
Location: Flagstaff, Arizona, USA

Join Date: Mar 2015
Posts: 5

nucacidhunter:

My in-silico digest (simRAD in R) returned only ~100k fragments that meet a size-selection criterion of ~150-550 bp, which I'll target for my sequencing runs. That criterion is surely why there were only 100k fragments instead of millions.
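The fragment definition used in this thread (flanked by both enzymes and inside the size window) can be sketched in Python for anyone cross-checking a simRAD result. This is a minimal sketch with hypothetical function names, not simRAD's algorithm: it scans only the top strand (both recognition sites are palindromic), uses the HindIII (A^AGCTT) and TaqI/TaqαI (T^CGA) cut positions, ignores methylation sensitivity and overhang length, and measures fragment length between top-strand cut positions.

```python
import re

# Recognition sequence and top-strand cut offset for each enzyme (assumed inputs).
# HindIII cuts A^AGCTT and TaqI/TaqalphaI cuts T^CGA; both sites are palindromic,
# so scanning the top strand finds every site.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),
    "TaqI": ("TCGA", 1),
}


def cut_positions(seq, enzymes=ENZYMES):
    """Return a sorted list of (position, enzyme) for every cut in seq."""
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in enzymes.items():
        # Zero-width lookahead so overlapping matches are not skipped.
        for m in re.finditer(f"(?={site})", seq):
            cuts.append((m.start() + offset, name))
    return sorted(cuts)


def ddrad_fragments(seq, min_len=150, max_len=550, enzymes=ENZYMES):
    """Fragments bounded by two different enzymes and inside the size window."""
    cuts = cut_positions(seq, enzymes)
    kept = []
    for (start, e1), (end, e2) in zip(cuts, cuts[1:]):
        length = end - start
        if e1 != e2 and min_len <= length <= max_len:
            kept.append((start, end, e1, e2))
    return kept
```

Summing `len(ddrad_fragments(seq))` over every chromosome or scaffold gives an in-silico fragment count; the window arguments should match the planned size selection.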

To answer your question, I do not know of an alternative to using methylated adapters for this kind of work, so that is precisely what I've chosen to use...for now.

I'd like to raise another aspect of my initial question on this thread: I have inexpensive and easy access to a NextSeq500, but have the sense that the HiSeq4000 might be the best platform for my full sequencing effort once this pilot study is complete. Given that the numbers of reads obtained from one NextSeq500 run (~400M) and one HiSeq4000 lane (~300M) are roughly comparable, what other factors should be considered in deciding which platform would be best for a ddRADseq approach?
04-25-2018, 01:04 PM   #6
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 442

I am very surprised that you see just 100k fragments for a 6-cutter plus 4-cutter and a size selection of 150-550 bp. The 4-cutter will mostly cut every 150-350 bp, so your size selection will include many if not most of the 6-cutter sites, as it is likely they will have a 4-cutter site in the 150-550 bp flanking sequence. A 6-cutter will cut roughly every 4 kb, so I would expect >5M sites. Even with a skewed GC composition in the genome compared to the cut sites, it is hard to imagine going from 5M sites to 100k.
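This back-of-envelope estimate can be made explicit. The Python sketch below (hypothetical function names) assumes random sequence with equal base frequencies, so an n-base site occurs about every 4^n bp, and treats the distance from a 6-cutter site to the nearest 4-cutter site as geometric. It ignores 4-cutter/4-cutter fragments and composition bias; real genomes can deviate a lot, for example because the TaqI site (TCGA) contains a CpG dinucleotide, which tends to be depleted in genomes with heavy cytosine methylation.

```python
# Back-of-envelope ddRAD fragment estimate under a random-sequence model.
# Real genomes deviate (base composition, CpG depletion, repeats).

def expected_site_spacing(site_len):
    """Mean spacing in bp between occurrences of a site_len-base site (4**n)."""
    return 4 ** site_len


def expected_ddrad_fragments(genome_size, rare_len=6, common_len=4,
                             min_len=150, max_len=550):
    """Expected rare-cutter/common-cutter fragments inside the size window."""
    n_rare_sites = genome_size / expected_site_spacing(rare_len)
    p = 1 / expected_site_spacing(common_len)  # per-base chance of a common-cutter site
    # Chance the nearest common-cutter site on one flank lies within the window,
    # treating the distance as geometric.
    p_in_window = (1 - p) ** min_len - (1 - p) ** max_len
    # Each rare-cutter site has two flanks, each a potential fragment.
    return n_rare_sites * 2 * p_in_window


if __name__ == "__main__":
    print(f"{expected_ddrad_fragments(30e9):,.0f}")
```

For a 30 Gb genome this returns roughly 6.4 million fragments, in line with the >5M figure above, so a large gap between this estimate and a simulation is worth investigating.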

We like the HiSeq4000 compared to the NextSeq because of the better quality data. For an RRBS study, though, will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library? In that case the error doesn't matter much. But the HiSeq is cheaper per nucleotide where we are (https://gc3f.uoregon.edu/illumina-sequencing): compare $1,700 for a HiSeq4000 lane of 150 bp reads to $2,800 for a comparable NextSeq lane. They run a ton of RADseq and ddRAD libraries as well.
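A per-read cost comparison makes the price difference concrete. This Python sketch (hypothetical function names) assumes the 2018 prices quoted above and the approximate yields mentioned earlier in the thread (~300M reads per HiSeq4000 lane, ~400M per NextSeq run); whether those yields and prices refer to single- or paired-end runs is not stated, so treat the numbers as rough.

```python
# Illustrative cost comparison; prices and yields are the approximate 2018
# figures quoted in this thread, not current list prices.

def cost_per_million_reads(price_usd, reads):
    """Sequencing cost per million reads for one lane or run."""
    return price_usd / (reads / 1e6)


def cost_per_sample(price_usd, reads, reads_needed):
    """Sequencing cost for one sample that needs `reads_needed` reads."""
    return cost_per_million_reads(price_usd, reads) * reads_needed / 1e6


if __name__ == "__main__":
    options = {"HiSeq4000 lane": (1700, 300e6), "NextSeq lane": (2800, 400e6)}
    for name, (price, reads) in options.items():
        print(f"{name}: ${cost_per_million_reads(price, reads):.2f} per M reads, "
              f"${cost_per_sample(price, reads, 6.18e6):.2f} per sample")
```

That works out to about $5.67 versus $7.00 per million reads, or roughly $35 versus $43 per sample at the ~6.18M reads implied by 103,000 loci at 60x.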
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
04-25-2018, 01:43 PM   #7
Reid
Junior Member
 
Location: Flagstaff, Arizona, USA

Join Date: Mar 2015
Posts: 5

Thanks for your reply, SNPsaurus.

I've actually been in touch with U of O core center personnel regarding this project, and if I go HiSeq, that's where I'll send my samples.

Your question is pertinent to my decision regarding which platform to use: "will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library?"

I'd like to obtain sequence data of high enough quality, coverage, and length to accomplish the following objectives:
1) detect variation in methylation status among loci
2) search for CG/CHH/CHG etc. sequence contexts surrounding or underlying differentially methylated loci
3) call SNPs and BLAST differentially methylated regions against the closest reference genomes available for my non-model study organism (which, for now, will have to be another species of white pine) to infer rates of differential methylation in different functional genomic regions
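For objective 2, each cytosine can be assigned to its CG, CHG or CHH context (H = A, C or T) from a reference or consensus sequence. A minimal Python sketch; the function name is hypothetical, and a G on the top strand is treated as a C on the bottom strand.

```python
# Hypothetical helper: classify the sequence context of a cytosine.
# H stands for A, C or T (any base except G).

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cytosine_context(seq, i):
    """Return 'CG', 'CHG', 'CHH', or None for position i of seq.

    A top-strand C is classified from the two bases 3' of it. A top-strand G
    is a C on the bottom strand, so the two bases 5' of it are reverse complemented
    to give that C's 3' neighbours.
    """
    seq = seq.upper()
    if seq[i] == "C":
        nxt = seq[i + 1:i + 3]
    elif seq[i] == "G":
        nxt = seq[max(0, i - 2):i].translate(COMPLEMENT)[::-1]
    else:
        return None  # not a cytosine on either strand
    if not nxt or nxt[0] == "N":
        return None  # context runs off the sequence or is unknown
    if nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or nxt[1] == "N":
        return None
    return "CHG" if nxt[1] == "G" else "CHH"
```

Applying this to every C and G in a differentially methylated locus gives the context breakdown; positions near the ends of a sequence return None because their context is incomplete.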

I've found published studies that achieved the above goals using 100-cycle single-end HiSeq2000 runs, and I'm honestly unclear as to whether paired-end sequencing and/or longer reads would better enable me to achieve my objectives. The only reason I am considering the NextSeq is that I have access to that machine for only the price of the reagents, making it a bit more convenient than the HiSeq. So, if the machine is appropriate for my needs, I would opt for the NextSeq...but not at the cost of forgoing any of my research objectives.

I certainly welcome any insights on these points.
04-25-2018, 02:13 PM   #8
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 442

We got good results with both NextSeq and HiSeq4000 runs, so I think you are safe either way. Given the size of the genome, I'd want as much sequence information at each locus as possible to help improve mapping accuracy. I can't quite remember how the NextSeq does these days with invariant nucleotides (the cut site). It used to be an issue and required lots of PhiX. I think they fixed it to some extent, but I'd be careful if running the lane yourself and look into it.
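The invariant-nucleotide problem is low base diversity at the start of the reads, where every ddRAD read begins with the same cut-site remnant; a PhiX spike-in adds diversity at those cycles. A minimal Python sketch of checking per-cycle diversity; the "AGCTT" prefix, the 0.8 threshold and the function names are illustrative assumptions, and the diverse reads mixed in afterwards merely stand in for PhiX.

```python
from collections import Counter


def per_cycle_composition(reads, n_cycles):
    """Fraction of A, C, G and T observed at each cycle (read position)."""
    composition = []
    for pos in range(n_cycles):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        composition.append({b: counts[b] / total for b in "ACGT"} if total else {})
    return composition


def low_diversity_cycles(reads, n_cycles, max_fraction=0.8):
    """Cycles where one base exceeds max_fraction (an assumed threshold)."""
    return [
        pos
        for pos, fracs in enumerate(per_cycle_composition(reads, n_cycles))
        if fracs and max(fracs.values()) > max_fraction
    ]


if __name__ == "__main__":
    # Hypothetical ddRAD reads that all start with the same cut-site remnant.
    ddrad = ["AGCTTACG", "AGCTTGCA", "AGCTTTTT", "AGCTTCAG"]
    # Diverse reads standing in for a PhiX spike-in (illustrative only).
    spike = ["CATGACGT", "GTCAGTCA", "TACGTAGC", "CGATCGAT"]
    print(low_diversity_cycles(ddrad, 8))          # [0, 1, 2, 3, 4]
    print(low_diversity_cycles(ddrad + spike, 8))  # []
```

In this toy example the first five cycles are monochromatic until the diverse reads are mixed in; on a real run the same check can be applied to the first reads of a pilot library.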

Will you pair the methylation-sensitive and methylation-insensitive libraries for every sample? Given ddRAD's propensity for locus dropout (from size-selection variation and from SNPs in the cut sites and locus sequence), you will need a control library for each sample, and it should be in the same size-selection run.
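The reason for a per-sample control can be sketched as a presence/absence scoring rule: a locus only says something about methylation if it was recovered in that sample's methylation-insensitive library. This is a hypothetical Python sketch, not an established pipeline; the function names, category labels and `min_count` threshold are assumptions.

```python
# Hypothetical presence/absence scoring against a per-sample control library.
# Counts are reads per locus; min_count is an assumed detection threshold.

def classify_locus(control_count, sensitive_count, min_count=5):
    """Score one locus for one sample.

    - 'uninformative': too few reads in the methylation-insensitive control,
      so the locus dropped out for technical reasons and cannot be scored
    - 'present': recovered in the methylation-sensitive library
    - 'absent': seen in the control but not the sensitive library, i.e. a
      candidate methylated site (technical dropout is still possible)
    """
    if control_count < min_count:
        return "uninformative"
    if sensitive_count >= min_count:
        return "present"
    return "absent"


def score_sample(control, sensitive, min_count=5):
    """control and sensitive map locus IDs to read counts for the SAME sample."""
    return {
        locus: classify_locus(n, sensitive.get(locus, 0), min_count)
        for locus, n in control.items()
    }
```

Without the control, a locus missing because of size-selection variation or a cut-site SNP would be indistinguishable from a methylated one, which is the dropout problem described above.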

Can you share details on how you got the 100k sites from the simulation? That one still has me wondering.

Tags
cross-platform comparison, illumina, rrbs

All times are GMT -8.