
  • comparing sequencing runs across Illumina machines

    Hello SEQanswers community,

    I am preparing to conduct an RRBS study of a conifer (with an estimated ~30 Gb genome), and I would like to first complete a pilot study with very few samples on a MiSeq (due to the lower cost of a MiSeq run) to plan how many experimental samples I can eventually assess in one run of a HiSeq4000 or NextSeq500. I am targeting 103,000 loci of interest across the genome and would like to obtain 60x coverage.

    I have heard mixed opinions on the advisability of the cross-platform planning approach I am considering, so I am interested to know if others have found that type of approach helpful, and/or can provide reasoning to help guide this decision.

    Thanks very much in advance for your input.

  • #2
    First of all, standard RRBS using MspI may not be a good approach for plants, as RRBS interrogates the methylation status of C in the CpG context, which is most relevant for mammals.

    In a pilot RRBS you will be interested in knowing the number of restriction fragments in a sample library in order to estimate the sequencing requirement. This can be done on any platform. The choice of sequencer for the large-scale experiment will depend on sample number and cost. I don’t know about the iSeq, but the other Illumina platforms can sequence bisulfite-converted DNA equally well.



    • #3
      Thanks for your response, nucacidhunter.

      What I gathered from your response is that the pilot test I propose using a MiSeq will be useful mainly for providing an estimate of the number of restriction fragments I am actually attempting to sequence, given that I currently only have theoretical estimates via in-silico digests. In that case, it makes sense that there should be no problem in using a MiSeq to assess the number of restriction fragments I achieve through my double digest. Knowing fragment number will then allow me to calculate the number of samples to assess in parallel on a higher throughput Illumina machine. Thanks for that clarification.

      Regarding the first portion of your reply:

      While plants do maintain a number of methylated sequence contexts, CpG appears to remain an important sequence context for studies of differential methylation in plants.

      See:
      Gugger, P.F., Fitz‐Gibbon, S., Pellegrini, M. and Sork, V.L., 2016. Species‐wide patterns of DNA methylation variation in Quercus lobata and their association with climate gradients. Molecular Ecology, 25(8), pp.1665-1680.

      That said, I am using HindIII and Taq(alpha)I for my double digests, as MspI is sensitive to certain methylation contexts present in the conifer genome I study.



      • #4
        I should clarify that by fragment numbers I meant fragments that are flanked by both REs and fall within the size-selection window. With a 4-cutter plus 6-cutter digest of a 30 Gb genome, I guess there will be a lot more restriction fragments than 100k. Thanks for the reference as well.
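The fragment definition above (flanked by one site of each enzyme and inside the size window) can be sketched in a few lines of Python. This is only an illustration and not the SimRAD workflow used later in the thread: the function names are hypothetical, the recognition sequences assumed are AAGCTT (HindIII) and TCGA (TaqαI), fragment length is approximated as the distance between recognition-site starts, and the demo runs on a uniformly random sequence, which a repeat-rich conifer genome will not resemble.

```python
import random
import re


def cut_positions(seq, site):
    """Return the start index of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, site_a, site_b, min_len=150, max_len=550):
    """Count fragments flanked by one site of each enzyme within the size window.

    Fragment length is approximated as the distance between the starts of two
    adjacent recognition sites; exact cut offsets and overhangs are ignored.
    """
    cuts = sorted(
        [(p, "A") for p in cut_positions(seq, site_a)]
        + [(p, "B") for p in cut_positions(seq, site_b)]
    )
    kept = 0
    # Adjacent sites in the combined, sorted list bound one digest fragment.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            kept += 1
    return kept


if __name__ == "__main__":
    # Assumption: 2 Mb of uniformly random sequence as a stand-in genome.
    random.seed(1)
    toy_genome = "".join(random.choice("ACGT") for _ in range(2_000_000))
    n = double_digest(toy_genome, "AAGCTT", "TCGA")
    print(f"fragments in window per Mb (random sequence): {n / 2:.0f}")
```

Scaling the per-megabase count to a real genome is only meaningful if the base composition and repeat content are comparable, so a figure from this toy model should be treated as an order-of-magnitude check at best.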

        Assuming bisulfite conversion will come after adapter ligation, I wonder how you get around the high cost of methylated adapters.



        • #5
          nucacidhunter:

          My in-silico digest (SimRAD in R) returned only ~100k fragments that meet a size-selection criterion of ~150-550 bp, which I'll target for my sequencing runs. That criterion is surely why there were only 100k fragments instead of millions.

          To answer your question, I do not know of an alternative to using methylated adapters for this kind of work, so that is precisely what I've chosen to use...for now.

          I'd like to raise another aspect of my initial question on this thread: I have inexpensive & easy access to a NextSeq500, but have the sense that the HiSeq4000 might be the best platform for my full sequencing effort once this pilot study is complete. Given that the number of reads obtained from one NextSeq500 run (~400M) is roughly comparable to that from one lane of a HiSeq4000 (~300M), what other factors should be considered in deciding which platform would be best for a ddRADseq approach?
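For planning purposes, the read counts quoted above can be turned into a rough samples-per-run estimate. The sketch below is a back-of-the-envelope calculation using the thread's own figures (103,000 loci, 60x coverage, ~400M reads per NextSeq500 run, ~300M reads per HiSeq4000 lane). The function names and the `usable_fraction` parameter are hypothetical; the latter stands in for losses such as phiX spike-in, off-target reads, and uneven coverage, none of which are quantified in this thread.

```python
import math


def reads_per_sample(n_loci, coverage):
    """Reads needed per sample if every read lands on a target locus."""
    return n_loci * coverage


def samples_per_run(run_reads, n_loci, coverage, usable_fraction=1.0):
    """Whole samples that fit in one run (or lane) at the target coverage.

    usable_fraction is an assumed multiplier (0-1) for reads that actually
    contribute to target coverage; 1.0 is the optimistic upper bound.
    """
    usable = run_reads * usable_fraction
    return math.floor(usable / reads_per_sample(n_loci, coverage))


if __name__ == "__main__":
    loci, cov = 103_000, 60
    print(reads_per_sample(loci, cov))              # 6180000
    print(samples_per_run(400_000_000, loci, cov))  # 64 (NextSeq500 run)
    print(samples_per_run(300_000_000, loci, cov))  # 48 (HiSeq4000 lane)
```

These are upper bounds. If the pilot shows many more fragments in the size window than the in-silico estimate, the samples-per-run figure drops in proportion.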



          • #6
            I am very surprised that you see just 100k fragments for a 6-cutter plus 4-cutter and a size selection of 150-550 bp. The 4-cutter will cut primarily every 150-350 bp, so your size selection will include many if not most of the 6-cutter sites, as it is likely they will have a 4-cutter site in the 150-550 bp flanking sequence. A 6-cutter will cut about every 4 kb, so I would expect >5M sites. Even with a skewed GC composition in the genome compared to the cut sites, it is hard to imagine going from 5M sites to 100k.
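The arithmetic behind that expectation can be written out as a rough check. This sketch assumes a uniform, independent base composition (each base with probability 1/4), which is a simplification; the function names are hypothetical, and the geometric model for the distance to the nearest 4-cutter site ignores methylation sensitivity, repeats, and intervening 6-cutter sites.

```python
def expected_spacing(site_len, alphabet=4):
    """Mean distance in bp between occurrences of a fixed recognition site."""
    return alphabet ** site_len


def expected_sites(genome_bp, site_len):
    """Expected number of recognition sites in a genome of the given size."""
    return genome_bp / expected_spacing(site_len)


def frac_nearest_in_window(site_len, lo, hi):
    """Probability that the nearest site on one side lies lo..hi bp away.

    Models the distance as geometric with per-base probability 1/4**site_len.
    """
    q = 1 - 1 / expected_spacing(site_len)
    return q ** (lo - 1) - q ** hi


if __name__ == "__main__":
    genome_bp = 30e9  # ~30 Gb, the estimate given at the top of the thread
    print(expected_spacing(4))  # 256
    print(expected_spacing(6))  # 4096
    print(f"{expected_sites(genome_bp, 6) / 1e6:.1f} M 6-cutter sites")
    print(f"{frac_nearest_in_window(4, 150, 550):.2f} of flanks in window")
```

Under these assumptions a 30 Gb genome carries on the order of 7 million 6-cutter sites, and a large share of their flanks end at a 4-cutter site inside the 150-550 bp window, which is the basis for expecting millions of fragments rather than 100k.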

            We prefer the HiSeq4000 to the NextSeq because of the better-quality data. For an RRBS study, though, will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library? In that case the error rate doesn't matter much. But the HiSeq is cheaper per nucleotide where we are (https://gc3f.uoregon.edu/illumina-sequencing): compare $1,700 for a HiSeq4000 lane of 150 bp to $2,800 for a comparable NextSeq lane. They run a ton of RADseq and ddRAD libraries as well.
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



            • #7
              Thanks for your reply, SNPsaurus.

              I've actually been in touch with U of O core center personnel regarding this project, and if I go HiSeq, that's where I'll send my samples.

              Your question is pertinent to my decision regarding which platform to use: "will you just do short reads to tag the loci and see if they are present in the methylation-sensitive library?"

              I'd like to obtain sequence data of high enough quality, coverage, and length to accomplish the following objectives:
              1) detect variation in methylation status among loci
              2) search for CG/CHH/CHG etc. sequence contexts surrounding or underlying differentially methylated loci
              3) call SNPs and blast differentially methylated regions to the closest reference genomes available for my non-model study organism (which, for now, will have to be another species of white pine) to infer rates of differential methylation in different functional genomic regions

              I've found published studies that achieved the above goals using 100-cycle single-end HiSeq2000 runs, and I'm honestly unclear as to whether paired-end sequencing and/or longer reads would better enable me to achieve my objectives. The only reason I am considering the NextSeq is that I have access to that machine for only the price of the reagents, making it a bit more convenient than the HiSeq. So, if the machine is appropriate for my needs, I would opt for the NextSeq...but not at the cost of forgoing any of my research objectives.

              I certainly welcome any insights on these points.



              • #8
                We got good results with both NextSeq and HiSeq4000 runs, so I think you are safe either way. Given the size of the genome, I'd want as much sequence information as possible at each locus to help improve the mapping accuracy. I can't quite remember how the NextSeq does these days with invariant nucleotides (the cut site). It used to be an issue and required lots of phiX. I think they fixed it to some extent, but I'd be careful if running the lane yourself and would look into it.

                Will you pair the methylation-sensitive and insensitive libraries for every sample? Given ddRAD's propensity for locus dropout (from size-selection variation and SNPs in the cut sites and locus sequence), you will need a control library for each sample, and it should be in the same size-selection run.

                Can you share details on how you got the 100k sites from the simulation? That one still has me wondering.
