  • Considerations for a pilot study: HiSeq SR100 vs. MiSeq PE300

    I am involved in a pilot experiment to examine the metagenomes of two environmental samples. This is largely exploratory at this point, with the intent of cataloging the genomic potential of the samples. Based on preliminary 16S clone libraries, approximately half of the community appears to be composed of two OTUs.

    Our budget is quite limited, and at this point I can do one lane of HiSeq 100 bp single-end sequencing. If it would be very beneficial, that could be stretched to 150 bp single-end, but I realize this will reduce the total number of reads from 180-200 million to ~140-150 million.

    Another possibility that I am unfamiliar with is a MiSeq 2x300 run. Should this be a consideration?

    Given the relatively low complexity, how likely is the assembly of a draft bacterial genome?

    Thanks for your comments!

  • #2
    You will only get around 25M reads out of a 2x300 run on the MiSeq.

    Are you attempting to assemble the genome of a species that falls outside of the 2 major OTUs?



    • #3
      Originally posted by anth
      Our budget is quite limited, and at this point I can do one lane of HiSeq 100 bp single-end sequencing. If it would be very beneficial, that could be stretched to 150 bp single-end, but I realize this will reduce the total number of reads from 180-200 million to ~140-150 million.
      I'm not sure about this. Our lab sequences mainly 2x150bp on HiSeq, and to the best of my knowledge that does not reduce the read count. There's no reason why it should, unless the machine is deliberately underloaded for longer reads, or you throw away all the reads with low-quality tails (which you shouldn't; they should be trimmed instead).
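      Trimming rather than discarding can be sketched as a 3' quality trim. This minimal Python sketch uses a running-sum rule similar in spirit to the trimming described for `bwa aln -q`; the helper name `trim_3prime` and the Q20 threshold are hypothetical choices for illustration, and real trimmers differ in their details and defaults.

```python
def trim_3prime(quals, threshold=20):
    """Return how many bases to keep after trimming a low-quality 3' tail.

    Hypothetical helper: walk inward from the 3' end, accumulating
    (threshold - q); the cut falls where this running sum peaks.
    Reads whose last base already meets the threshold are kept whole.
    """
    if not quals or quals[-1] >= threshold:
        return len(quals)
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:  # good bases now outweigh the bad tail; stop
            break
        if running > best:
            best = running
            cut = i
    return cut


# A 100 bp read with a good body and a poor 15-base tail keeps its first
# 85 bases instead of being thrown away entirely.
read_quals = [35] * 85 + [8] * 15
print(trim_3prime(read_quals))
```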

      I think you need to look at the quality profiles of the expected output. If you can get 150bp reads with high quality out to near the end (say, avg Q20 at pos 140) then you should absolutely get 150bp reads; they would give you higher total coverage and much higher kmer coverage at a K of say 63. If the 150bp reads would be low quality then go with 100bp.
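      To make the k-mer argument concrete: a read of length L contains L - k + 1 k-mers, so at k = 63 a 100 bp read yields 38 k-mers and a 150 bp read yields 88. The sketch below multiplies that by mid-range read counts from the first post (190M and 145M reads, chosen here for illustration only).

```python
def kmers_per_read(read_len, k):
    """Number of k-mers in one read of length read_len (0 if read_len < k)."""
    return max(read_len - k + 1, 0)


def total_kmers(n_reads, read_len, k):
    """Total k-mer observations from n_reads reads of equal length."""
    return n_reads * kmers_per_read(read_len, k)


K = 63
# Mid-range read counts from the original post (assumed for illustration).
sr100 = total_kmers(190_000_000, 100, K)  # 190M reads x 38 k-mers each
sr150 = total_kmers(145_000_000, 150, K)  # 145M reads x 88 k-mers each

print(f"1x100: {sr100:.3e} k-mers, {190_000_000 * 100 / 1e9:.2f} Gb")
print(f"1x150: {sr150:.3e} k-mers, {145_000_000 * 150 / 1e9:.2f} Gb")
print(f"k-mer ratio 150/100: {sr150 / sr100:.2f}")
```

Under these assumptions, total bases rise by only about 14% (21.75 vs 19 Gb), but k-mer observations rise by about 77%, which is why the gain at a large K is much bigger than the raw yield suggests.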

      Another possibility that I am unfamiliar with is a MiSeq 2x300 run. Should this be a consideration?
      That depends on the complexity of the metagenome. Longer reads (and paired reads) are great for metagenomes, but MiSeq yield is much lower. High-complexity metagenomes need extremely high coverage (billions of reads). One MiSeq run might work fine for an artificial community with only a couple dozen organisms.

      Given the relatively low complexity, how likely is the assembly of a draft bacterial genome?
      It's likely that with the correct procedure you can recover a decent assembly of the #1 most abundant organism. Beyond that it's really hard to say without actual numbers.
      Last edited by Brian Bushnell; 05-07-2014, 09:42 AM.
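      One rough way to put numbers on this is to estimate per-organism coverage as total bases × relative abundance ÷ genome size. The 5 Mb genome size and the 25% abundance per dominant OTU (an even split of the "approximately half" from the first post) are hypothetical placeholders; 16S clone frequencies and rRNA copy-number differences can make real DNA abundances differ substantially.

```python
def organism_coverage(total_bases, rel_abundance, genome_size):
    """Expected mean fold-coverage of one organism in a metagenome."""
    return total_bases * rel_abundance / genome_size


GENOME_SIZE = 5_000_000  # hypothetical 5 Mb genome
ABUNDANCE = 0.25         # hypothetical: half the community split evenly over 2 OTUs

for label, total in [("HiSeq 1x100, 18 Gb", 18e9), ("MiSeq 2x300, 13.75 Gb", 13.75e9)]:
    cov = organism_coverage(total, ABUNDANCE, GENOME_SIZE)
    print(f"{label}: ~{cov:.0f}x per dominant OTU")

# A hypothetical rare member at 0.5% abundance with the same genome size:
rare = organism_coverage(18e9, 0.005, GENOME_SIZE)
print(f"0.5% member on 18 Gb: ~{rare:.0f}x")
```

With these placeholder numbers, depth is unlikely to be the limiting factor for the two dominant OTUs, while a member at 0.5% abundance would sit near 18x.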



      • #4
        Thanks for your replies. As for aims, the primary interest is in gene inventory / genomic potential, and we would be thrilled to be able to make a reasonable draft assembly of the two most common OTUs. At this point, assembly of the less-well-represented OTUs is not a priority.

        By my math, with the MiSeq run (in an ideal world), we should get ~13.75 gigabases of data, given 550 bp fragments and 25 million reads. A 100 bp SR lane on a HiSeq should yield about 18 gigabases.
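        The yield comparison above is read-count × length arithmetic. This minimal sketch reproduces it using the read counts quoted in this thread and the post's own accounting of ~550 unique bases per MiSeq fragment (raw 2x300 output would be 600 bases per pair, with the overlap counted twice).

```python
def yield_gb(n_reads, bases_per_read):
    """Sequence yield in gigabases."""
    return n_reads * bases_per_read / 1e9


hiseq_sr100 = (yield_gb(180e6, 100), yield_gb(200e6, 100))  # 18-20 Gb
hiseq_sr150 = (yield_gb(140e6, 150), yield_gb(150e6, 150))  # 21-22.5 Gb
miseq_unique = yield_gb(25e6, 550)   # one 550 bp fragment per pair: 13.75 Gb
miseq_raw = yield_gb(25e6, 2 * 300)  # both reads, overlap counted twice: 15 Gb

# Shortfall of the MiSeq run relative to each end of the HiSeq SR100 range.
for hiseq in hiseq_sr100:
    print(f"vs {hiseq:.0f} Gb: {1 - miseq_unique / hiseq:.0%} less data")
```

Depending on whether the lane delivers 180M or 200M reads, the shortfall works out to roughly 24-31%, consistent with the ~30% estimate in this post.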

        As such, given our budget constraints and the apparent benefit of longer reads, even at a hit of ~30% in total sequence data, I am planning on going ahead with the MiSeq 2x300. Billions of reads on any platform would be out of budget at this point.

        A new question:
        Which fragment size would be best for this project?


        I am not completely sure about the appropriate fragment size: one that gives enough overlap to join the read pairs into mini-contigs, without sacrificing an excessive number of bases in the process. Is a 550 bp fragment size (size-selected using a Pippin Prep) a good compromise? Or would a bit shorter be better, taking into account the quality falloff towards the end of the reads? I do understand that some fraction of read pairs won't be joinable, given that the size selection is really the peak of a distribution, and given read-quality considerations.

        Thanks again!



        • #5
          If you want to merge the reads, aim for a peak closer to a 450-500bp insert. The last 50bp of 2x300 MiSeq reads can be very low quality. But merging does not always help; it depends on which assembler you use.

          If you choose to not merge the reads, and just use them as pairs, then the longer the better, for scaffolding! I'm told that Illumina's max for fragment libraries is around 800bp. So, I would aim for either 450-500bp or 800bp, and nothing else. Remember that your reads may end up as an effective 2x250bp after quality-trimming. Also remember that it's more likely for a library to come out unexpectedly short than unexpectedly long.

          P.S. A shorter insert size of, say, 450bp is not necessarily a bad thing even if your reads are high quality out to 300bp. The more they overlap, the more reads will successfully merge, and the more accurate the merges will be. So going from a 550bp target to a 500bp target, for example, will probably INCREASE the total number of bases in merged reads. Dropping to 450 may slightly decrease it, but the increased accuracy may make that worthwhile. Particularly in low-complexity areas, you will not be able to merge reads successfully without a good amount of overlap. 2x150bp libraries with a 270bp insert, for example, only target a 30bp overlap, and often less than 40% of the reads get successfully merged.
          Last edited by Brian Bushnell; 05-12-2014, 03:35 PM. Reason: Changed suggestion from 500bp to 450-500bp.
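          The overlap of a pair is simply 2 × read length − insert size, which is why effective 2x250 reads on a 500bp insert barely touch. This sketch also estimates what fraction of pairs reach a minimum overlap if insert sizes were normally distributed; the normal model, the 50bp standard deviation and the 20bp minimum overlap are hypothetical assumptions, not how any particular merger decides.

```python
from statistics import NormalDist


def overlap(insert, read_len):
    """Bases shared by R1 and R2 of a pair (negative means a gap)."""
    return 2 * read_len - insert


def mergeable_fraction(mean_insert, sd_insert, read_len, min_overlap):
    """Fraction of pairs whose overlap is at least min_overlap,
    assuming (hypothetically) normally distributed insert sizes."""
    max_insert = 2 * read_len - min_overlap
    return NormalDist(mean_insert, sd_insert).cdf(max_insert)


print(overlap(270, 150))  # 2x150 with a 270 bp insert -> 30
print(overlap(500, 250))  # 2x300 trimmed to an effective 2x250 -> 0

# Effective 2x250 reads, hypothetical 50 bp insert SD and 20 bp minimum overlap.
for target in (450, 500, 550):
    frac = mergeable_fraction(target, 50, 250, 20)
    print(f"target {target} bp: {frac:.0%} of pairs overlap by >= 20 bp")
```

This model is deliberately crude: real mergers also weigh mismatches and base qualities within the overlap, so the percentages are only useful for comparing insert-size targets against each other.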



          • #6
            I will just add that if paired reads are not merging (because of a longer insert or low quality at the 3’ end of the reads), this will be compensated by other overlapping reads, whose 5’ ends will cover that region, given that the library fragments are random.
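            The random-placement argument can be checked with a small simulation. This sketch counts only the first `hq_len` (5') bases of each read as usable, then compares the uncovered fraction of a circular genome with the Poisson expectation exp(-c), where c is the depth contributed by high-quality bases alone. The genome size, pair count, `hq_len` and insert size are hypothetical values, and the helper `uncovered_fraction` is illustrative only.

```python
import math
import random


def uncovered_fraction(genome_len, n_pairs, hq_len, insert, seed=1):
    """Simulate random fragments on a circular genome and return the
    fraction of positions not covered by any high-quality read base.

    Only the first hq_len bases (the 5' end) of each read count; the
    low-quality 3' tails are treated as unusable.
    """
    rng = random.Random(seed)
    covered = bytearray(genome_len)
    for _ in range(n_pairs):
        start = rng.randrange(genome_len)
        end = start + insert - 1
        for i in range(hq_len):
            covered[(start + i) % genome_len] = 1  # R1 reads rightward from start
            covered[(end - i) % genome_len] = 1    # R2 reads leftward from end
    return 1 - sum(covered) / genome_len


G, PAIRS, HQ, INSERT = 1_000_000, 6_000, 250, 800
depth_hq = 2 * PAIRS * HQ / G  # depth from high-quality bases only
simulated = uncovered_fraction(G, PAIRS, HQ, INSERT)
expected = math.exp(-depth_hq)
print(f"high-quality depth {depth_hq:.1f}x: simulated {simulated:.4f}, Poisson {expected:.4f}")
```

Even at only 3x high-quality depth, about 5% of positions lack a high-quality base, and exp(-c) shrinks rapidly as depth grows, so weak tails on non-merging pairs cost little at the depths discussed in this thread.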



            • #7
              Wow, thanks for all of the helpful replies!

              I have one further question: some of our samples are quite low in concentration (as low as 10 ng/ul in 50 ul, i.e. 0.5 ug total). What would be the best approach for fragmentation, size selection, and library construction, given that we intend to construct a 500 bp insert library for a MiSeq 2x300 run?

              The core facility has some doubts about making a TruSeq DNA library with BluePippin automated size selection from less than 2-3 ug of starting material. Is there a better option I should be looking at? Perhaps a TruSeq Nano kit? It's my understanding that Nextera library prep cannot achieve as tight a fragment distribution as TruSeq... Decisions, decisions.

              Thanks again, this really is a wonderful community.



              • #8
                500 ng of DNA is plenty of input for current library prep kits. There are two options here, and you have mentioned both.

                Nextera requires only 50 ng of input and produces libraries with a 150-1500 bp size distribution. The trick here would be to clean up that library with 0.5X beads, which will cut at around 500 bp; given the higher clustering efficiency of smaller fragments, the average insert size would be around 550 bp, with inserts spanning up to 950 bp. Another option with Nextera would be to run a few tagmentation reactions and pool them, followed by a precise size selection on a Pippin instrument, to get a diverse library. This may require some development, since tagmented fragments may not migrate like dsDNA in a gel because of their single-stranded ends.

                The other option would be the TruSeq Nano kit. DNA can be sheared to 500 bp with a Covaris, size-selected with a Pippin instead of beads, and then taken through the rest of the Nano protocol. Assuming that 5-10% of the input is recovered by size selection, there will be 25-50 ng of DNA going into library prep, which is enough to produce a diverse library.
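                The input arithmetic in this exchange can be checked quickly: 10 ng/ul × 50 ul is 500 ng, a 5-10% recovery from size selection leaves 25-50 ng, and converting that mass to a molecule count shows why it is still ample. This sketch assumes roughly 650 g/mol per base pair of dsDNA (some references use 660) and a uniform 500 bp fragment length; both are simplifications.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650   # approximate mass of one dsDNA base pair (assumed)


def dsdna_molecules(mass_ng, frag_len_bp):
    """Approximate number of dsDNA molecules in mass_ng of fragments."""
    moles = (mass_ng * 1e-9) / (frag_len_bp * GRAMS_PER_MOL_PER_BP)
    return moles * AVOGADRO


total_ng = 10 * 50  # 10 ng/ul in 50 ul = 500 ng
for recovery in (0.05, 0.10):
    recovered_ng = total_ng * recovery
    n = dsdna_molecules(recovered_ng, 500)
    print(f"{recovery:.0%} recovery: {recovered_ng:.0f} ng, ~{n:.1e} fragments of 500 bp")
```

Tens of billions of template molecules is far more than the ~25M clusters of a single MiSeq run can sample, which is one way to see why 25-50 ng is enough for a diverse library.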
