  • Reid
    Junior Member
    • Mar 2015
    • 5

    comparing sequencing runs across Illumina machines

    Hello SEQanswers community,

    I am preparing to conduct an RRBS study of a conifer (with an estimated ~30 Gb genome), and I would like to first complete a pilot study with very few samples on a MiSeq (because a MiSeq run costs less) to plan how many experimental samples I can eventually assess in one run of a HiSeq4000 or NextSeq500. I am targeting 103,000 loci of interest across the genome, and would like to obtain 60x coverage.

    I have heard mixed opinions on the advisability of the cross-platform planning approach I am considering, so I am interested to know if others have found that type of approach helpful, and/or can provide reasoning to help guide this decision.

    Thanks very much in advance for your input.
  • nucacidhunter
    Jafar Jabbari
    • Jan 2013
    • 1250

    #2
    First of all, standard RRBS using MspI may not be a good approach for plants, as RRBS interrogates the methylation status of C in the CpG context, which is most relevant for mammals.

    In a pilot RRBS you will be interested in knowing the number of restriction fragments in a sample library to estimate the sequencing requirement. This can be done on any platform. The choice of sequencer for the large-scale experiment will depend on sample number and cost. I don’t know about the iSeq, but other Illumina platforms can sequence bisulfite-converted DNA equally well.


    • Reid
      Junior Member
      • Mar 2015
      • 5

      #3
      Thanks for your response, nucacidhunter.

      What I gathered from your response is that the pilot test I propose using a MiSeq will be useful mainly for providing an estimate of the number of restriction fragments I am actually attempting to sequence, given that I currently only have theoretical estimates via in-silico digests. In that case, it makes sense that there should be no problem in using a MiSeq to assess the number of restriction fragments I achieve through my double digest. Knowing fragment number will then allow me to calculate the number of samples to assess in parallel on a higher throughput Illumina machine. Thanks for that clarification.

      Regarding the first portion of your reply:

      While plants do maintain a number of methylated sequence contexts, CpG appears to remain an important sequence context for studies of differential methylation in plants.

      See:
      Gugger, P.F., Fitz‐Gibbon, S., Pellegrini, M. and Sork, V.L., 2016. Species‐wide patterns of DNA methylation variation in Quercus lobata and their association with climate gradients. Molecular Ecology, 25(8), pp. 1665-1680.

      That said, I am using HindIII and Taq(alpha)I for my double digests, as MspI is sensitive to certain methylation contexts present in the conifer genome I study.


      • nucacidhunter
        Jafar Jabbari
        • Jan 2013
        • 1250

        #4
        I should correct that: by fragment numbers I meant fragments that are flanked by both restriction enzyme (RE) sites and fall within the size selection window. With a 4-cutter plus 6-cutter combination in a 30 Gb genome, I guess there will be a lot more restriction fragments than 100k. Thanks for the reference as well.

        Assuming bisulfite conversion will be done after adapter ligation, I wonder how you get around the high cost of methylated adapters.


        • Reid
          Junior Member
          • Mar 2015
          • 5

          #5
          nucacidhunter:

          My in-silico digest (SimRAD in R) returned only ~100k fragments that meet a size-selection criterion of ~150-550 bp, which I'll target for my sequencing runs. That criterion is surely why there were only 100k fragments instead of millions.

          To answer your question, I do not know of an alternative to using methylated adapters for this kind of work, so that is precisely what I've chosen to use...for now.

          I'd like to raise another aspect of my initial question on this thread: I have inexpensive and easy access to a NextSeq500, but have the sense that the HiSeq4000 might be the best platform for my full sequencing effort once this pilot study is complete. Given that the read yields of one NextSeq500 run (~400M) and one lane of HiSeq4000 (~300M) are roughly comparable, what other factors should I consider when deciding which platform would be best for a ddRADseq approach?
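
For readers following the planning arithmetic in this thread, here is a minimal Python sketch of how the per-sample read requirement and the number of samples per run might be estimated. It uses only figures quoted in the thread (103,000 loci, 60x coverage, ~400M reads per NextSeq500 run, ~300M reads per HiSeq4000 lane). The function names and the `usable_fraction` parameter are illustrative assumptions, and the calculation assumes one read per locus per unit of coverage with reads spread evenly across loci and samples, which real libraries will not achieve.

```python
import math

LOCI = 103_000   # target loci, as stated in the first post
COVERAGE = 60    # desired reads per locus, as stated in the first post

# Approximate read yields quoted in this thread (not vendor specifications).
RUN_READS = {
    "NextSeq500 run": 400_000_000,
    "HiSeq4000 lane": 300_000_000,
}


def reads_per_sample(loci: int, coverage: int) -> int:
    """Reads needed for one sample if every locus gets exactly `coverage` reads."""
    return loci * coverage


def samples_per_run(total_reads: int, loci: int, coverage: int,
                    usable_fraction: float = 1.0) -> int:
    """Whole samples that fit in one run.

    `usable_fraction` is a hypothetical knob for reads lost to PhiX spike-in,
    adapter dimers, failed filters, or off-target fragments.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_reads = total_reads * usable_fraction
    return math.floor(usable_reads / reads_per_sample(loci, coverage))


for name, reads in RUN_READS.items():
    ideal = samples_per_run(reads, LOCI, COVERAGE)
    padded = samples_per_run(reads, LOCI, COVERAGE, usable_fraction=0.7)
    print(f"{name}: {ideal} samples (ideal), {padded} samples (70% usable)")
```

If the pilot run shows that the real number of fragments in the size window is larger than the in-silico estimate, replacing `LOCI` with the observed count gives the revised multiplexing level.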


          • SNPsaurus
            Registered Vendor
            • May 2013
            • 525

            #6
            I am very surprised that you see just 100k fragments for a 6-cutter plus a 4-cutter and a size selection of 150-550 bp. The 4-cutter will cut primarily every 150-350 bp, so your size selection will include many if not most of the 6-cutter sites, as it is likely they will have a 4-cutter site in the 150-550 bp flanking sequence. A 6-cutter will cut about every 4 kb, so I would expect >5M sites. Even with a skewed GC composition in the genome compared to the cut sites, it is hard to imagine going from 5M sites to 100k.
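
The expectation above can be checked with a back-of-the-envelope calculation. The Python sketch below is not the SimRAD simulation discussed in this thread. It is a simplified model that assumes equal base frequencies (25% each), independent positions, and no effect of methylation on cutting. Under those assumptions a site of length k occurs with probability 0.25^k per position, and the distance from a 6-cutter site to the nearest 4-cutter site on one side is geometrically distributed. All function names are illustrative.

```python
GENOME_SIZE = 30_000_000_000   # ~30 Gb, as stated in the first post
SIZE_MIN, SIZE_MAX = 150, 550  # size-selection window in bp, from post #5


def site_probability(site_length: int) -> float:
    """Per-position probability of a given site under equal base frequencies."""
    return 0.25 ** site_length


def expected_sites(genome_size: int, site_length: int) -> float:
    """Expected number of recognition sites in the genome."""
    return genome_size * site_probability(site_length)


def prob_nearest_site_in_window(site_length: int, lo: int, hi: int) -> float:
    """P(lo <= D <= hi), where D is the geometric distance to the nearest site."""
    q = 1.0 - site_probability(site_length)
    # For a geometric variable, P(D >= d) = q ** (d - 1).
    return q ** (lo - 1) - q ** hi


six_cutter_sites = expected_sites(GENOME_SIZE, 6)
window_prob = prob_nearest_site_in_window(4, SIZE_MIN, SIZE_MAX)
# Each 6-cutter site has two flanks; this ignores a second 6-cutter site
# falling between the first one and the nearest 4-cutter site.
expected_fragments = six_cutter_sites * 2 * window_prob

print(f"Mean 4-cutter spacing: {1 / site_probability(4):.0f} bp")
print(f"Mean 6-cutter spacing: {1 / site_probability(6):.0f} bp")
print(f"Expected 6-cutter sites: {six_cutter_sites:,.0f}")
print(f"P(flank in {SIZE_MIN}-{SIZE_MAX} bp): {window_prob:.2f}")
print(f"Expected double-digest fragments in window: {expected_fragments:,.0f}")
```

Under these assumptions the estimate comes out at several million fragments rather than ~100k, consistent with the ">5M sites" figure. Skewed base composition, repeats, and methylation sensitivity would all shift the real number.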

            We like the HiSeq4000 compared to the NextSeq because of the better quality data. For an RRBS study, though, will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library? In that case the error doesn't matter much. But the HiSeq is cheaper per nucleotide where we are (https://gc3f.uoregon.edu/illumina-sequencing): compare $1,700 for a HiSeq4000 lane of 150 bp to $2,800 for a comparable NextSeq lane. They run a ton of RADseq and ddRAD libraries as well.
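
As a rough illustration of the price comparison, the Python sketch below turns the two quoted prices into a cost per million reads. It combines numbers from different posts, which is an assumption: the prices come from this post, while the read yields (~300M per HiSeq4000 lane, ~400M per NextSeq500 run) are the approximate figures quoted in post #5, and the "comparable NextSeq lane" is treated here as one NextSeq500 run. Pricing at any particular core facility or for in-house reagent-only runs will differ.

```python
# Prices quoted in this post; read yields quoted in post #5 (both approximate).
PLATFORMS = {
    "HiSeq4000 lane": {"price_usd": 1700, "reads": 300_000_000},
    "NextSeq500 run": {"price_usd": 2800, "reads": 400_000_000},
}


def cost_per_million_reads(price_usd: float, reads: int) -> float:
    """Price divided by read yield, expressed per one million reads."""
    return price_usd / (reads / 1_000_000)


costs = {
    name: cost_per_million_reads(spec["price_usd"], spec["reads"])
    for name, spec in PLATFORMS.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per million reads")
```

Multiplying the per-million figure by the reads needed per sample gives a per-sample sequencing cost that can be compared across platforms.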
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


            • Reid
              Junior Member
              • Mar 2015
              • 5

              #7
              Thanks for your reply, SNPsaurus.

              I've actually been in touch with U of O core center personnel regarding this project, and if I go HiSeq, that's where I'll send my samples.

              Your question is pertinent to my decision regarding which platform to use: "will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library?"

              I'd like to obtain sequence data of high enough quality, coverage, and length to accomplish the following objectives:
              1) detect variation in methylation status among loci
              2) search for CG/CHH/CHG etc. sequence contexts surrounding or underlying differentially methylated loci
              3) call SNPs and BLAST differentially methylated regions against the closest reference genomes available for my non-model study organism (which, for now, will have to be another species of white pine) to infer rates of differential methylation in different functional genomic regions

              I've found published studies that achieved the above goals using 100-cycle single-end HiSeq2000 runs, and I'm honestly unclear as to whether paired-end sequencing and/or longer reads would better enable me to achieve my objectives. The only reason I am considering NextSeq is that I have access to that machine for only the price of the reagents, making it a bit more convenient than HiSeq. So, if the machine is appropriate for my needs, I would opt for NextSeq...but not at the cost of forgoing any of my research objectives.

              I certainly welcome any insights on these points.


              • SNPsaurus
                Registered Vendor
                • May 2013
                • 525

                #8
                We got good results with both NextSeq and HiSeq4000 runs, so I think you are safe either way. Given the size of the genome, I'd want as much sequence information as possible at each locus to help improve the mapping accuracy. I can't quite remember how the NextSeq does these days with invariant nucleotides (the cut site). It used to be an issue and required lots of PhiX. I think they fixed it to some extent, but if you are running the lane yourself I'd be careful and look into it.

                Will you pair the methylation-sensitive and insensitive libraries for every sample? Given ddRAD's propensity for locus dropout (from size selection variation and SNPs in the cut sites and locus sequence), you will need to have a control library for each sample, and it should be in the same size selection run.

                Can you share details on how you got the 100k sites from the simulation? That one still has me wondering.
