
  • #16
    Originally posted by cement_head View Post
    PacBio's quality is atrocious - maybe worse than Ion Torrent
    The idea with synthetic long reads is to fill gaps in de novo assemblies, which mostly result from repeated sequences throughout the genome. Here, long PacBio reads would be advantageous in a hybrid assembly because they span such regions, and their sequencing errors can be corrected with good-quality short reads. PacBio quality does not seem to be that bad either, as plenty of publications have produced good bacterial genome assemblies from PacBio reads alone.
    Last edited by nucacidhunter; 09-04-2014, 08:23 AM.

    Comment


    • #17
      Comparative numbers for pricing

      Here is what I was able to come up with, someone please correct me if I am off.

      Illumina is saying that the process takes in 30-100 Gb of short reads and puts out at least 600 Mb of long reads. If I'm reading it right, reagents are around $1000.

      [URL="http://omicsomics.blogspot.com/2013/03/pacbio-back-of-envelope-numbers.html"]This post[/URL] says 100 Mb per PacBio SMRT cell.

      So the numbers don't seem that far apart to me. I think we are paying about 3-4 times as much for a lane of HiSeq as the prices I have seen quoted for PacBio (we haven't used it yet).

      So I am not convinced cost is the deciding factor; possible PCR bias might push one towards PacBio, while error rate would push one towards Illumina (0.01% for the long reads).

      Comment


      • #18
        Originally posted by cement_head View Post
        PacBio's quality is atrocious - maybe worse than Ion Torrent
        I think if you were to look only at individual reads from a PacBio run you might come up with the assessment of 'atrocious'. A 15% error rate will do that! However, as the error model appears to be almost perfectly stochastic, aligning all the reads together can yield a consensus sequence with a very low error rate.
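        The stochastic-error argument can be put in rough numbers. A minimal sketch, under the simplifying (and not quite realistic) assumption that errors are independent substitutions resolved by a simple majority vote (real PacBio errors are mostly indels, which the actual consensus callers handle differently):

```python
from math import comb

def consensus_error(p, depth):
    # Probability that a simple majority vote over `depth` independent
    # reads, each wrong with probability p, calls the wrong base.
    # Sum the binomial tail where more than half the reads are wrong.
    return sum(comb(depth, k) * p ** k * (1 - p) ** (depth - k)
               for k in range(depth // 2 + 1, depth + 1))

for d in (1, 5, 15, 31):
    print(f"depth {d:2d}: consensus error ~{consensus_error(0.15, d):.2e}")
```

        Even with a 15% per-read error rate, 15-fold coverage already pushes this idealized consensus error below 0.1%, which is why the per-read rate is so misleading on its own.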

        In general, the community seems to have come around to PacBio's way of thinking - given enough reads, PacBio produces better data. The problem they have is cost - it's a very expensive machine and has a much higher "per Gb" cost (not perhaps the best measure as you could argue 1Gb of PacBio sequence is 'worth more' than 1Gb of HiSeq sequence - but the raw difference is large enough that it still matters).
        AllSeq - The Sequencing Marketplace
        [email protected]
        www.AllSeq.com

        Comment


        • #19
          Moleculo and PacBio reads are not really comparable for assembly. PacBio will allow you to sequence and span repetitive areas; Moleculo won't, because the reads from such areas will get misassembled with collapsed repeats.

          For other purposes, like phasing and looking for structural variations, they are probably similarly useful. Moleculo may in fact be better for phasing due to the lower error rate.

          Comment


          • #20
            Originally posted by AllSeq View Post
            The problem they have is cost - it's a very expensive machine and has a much higher "per Gb" cost (not perhaps the best measure as you could argue 1Gb of PacBio sequence is 'worth more' than 1Gb of HiSeq sequence - but the raw difference is large enough that it still matters).
            Right. For some applications (pulling together contigs, say) a single PacBio long read is worth a lot more. But for others, you may need 10 Gb of PacBio to compare to 1 Gb of HiSeq, since you need to collapse the PacBio reads to get a consensus with similar quality.
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

            Comment


            • #21
              A recently published paper http://www.plosone.org/article/info:...l.pone.0106689 (a modified version of the earlier preprint http://biorxiv.org/content/biorxiv/e...01834.full.pdf ), co-authored by an employee of the kit supplier, makes some interesting and somewhat contradictory points, such as:

              "1- Our study demonstrates that TruSeq synthetic long-reads enable accurate assembly of complex, highly-repetitive TE sequences.

              2- We also observed relatively uniform coverage across both the euchromatic and heterochromatic portions of the autosomes, with an expected reduced coverage of the heterochromatin. This observation is explained by the fact that heterochromatin is generally more repetitive and therefore more difficult to assemble into synthetic long-reads from underlying short read data.

              3- Despite general uniformity in synthetic long-read coverage, we identified important biases resulting in coverage gaps and reductions in synthetic long-read coverage in repeat-dense regions with relatively low average GC content."

              And they seem to acknowledge that there might be better options:
              "4- By directly sequencing long molecules, these third-generation technologies will likely outperform TruSeq synthetic long-reads in certain capacities, such as assembly contiguity enabled by homogeneous genome coverage. Indeed, preliminary results from the assembly of a different y; cn, bw, sp substrain of D. melanogaster using corrected PacBio data achieved an N50 contig length of 15.3 Mbp and closed two of the remaining gaps in the euchromatin of the Release 5 reference sequence [63]. While not yet systematically assessed, it is likely that PacBio long-reads will also help resolve high-identity repeats, though current raw error rates may be limiting."

              Moleculo was very interesting when it was unveiled, but technology has moved on and other platforms can outperform it in read length, in sequencing highly repetitive regions, and in cost, though for phasing, as Brian mentioned, it still seems to be a good option.
              Last edited by nucacidhunter; 09-05-2014, 03:00 AM. Reason: Spelling correction

              Comment


              • #22
                As to cost, Illumina is advertising $1000 human genomes at 30X coverage, while the first human genome sequenced by PacBio, at 54X coverage, cost around $60k-80k. The price difference is still considerable. Nonetheless, when talking about long reads we should compare PacBio to Moleculo reads, not short reads. Moleculo reads are probably much more costly than short reads, as you need substantial short-read coverage to achieve the local assembly.

                Comment


                • #23
                  Originally posted by SNPsaurus View Post
                  Right. For some applications (pulling together contigs, say) a single PacBio long read is worth a lot more. But for others, you may need 10 Gb of PacBio to compare to 1 Gb of HiSeq, since you need to collapse the PacBio reads to get a consensus with similar quality.
                  True, but I'm not sure the ratio is 10:1. In fact, I'd claim that the ratio gets to 1:1 pretty quickly. I'm not sure where the cutover point is, but wouldn't you rather have 30X PacBio coverage than 30X HiSeq coverage?
                  AllSeq - The Sequencing Marketplace
                  [email protected]
                  www.AllSeq.com

                  Comment


                  • #24
                    Originally posted by Brian Bushnell View Post
                    Moleculo and PacBio reads are not really comparable for assembly. PacBio will allow you to sequence and span repetitive areas; Moleculo won't, because the reads from such areas will get misassembled with collapsed repeats.

                    For other purposes, like phasing and looking for structural variations, they are probably similarly useful. Moleculo may in fact be better for phasing due to the lower error rate.
                    This may depend somewhat on the characteristics of the repeat; in fact, the main point of the Drosophila paper I referenced above is their success in assembling repetitive transposable elements.

                    Comment


                    • #25
                      I think it is useful to focus on what happens during TruSeq synthetic read library construction. Basically two steps:
                      (1) Fragment the genome into ~10 kb chunks and attach special adapters. Assay the number of these amplicons very accurately and dilute them so that about 300 can be put into each well of a 384-well plate.
                      (2) Do long-range PCR on each well and then create normal Illumina libraries using Nextera on the amplified DNA.

                      So then you can see that each well becomes a separate assembly of ~300 ten-kb fragments, i.e. 3 megabases (in 10 kb segments). At what coverage? I'm guessing the intention is that each 384-well plate of libraries goes into a rapid-chemistry flowcell, which would yield about 50 billion bases of raw sequence. If all went well, you end up with 3 megabases of fragments per well; 3 x 384 = 1152 megabases. So, nearly 50x coverage on your ~1.1 billion bases of fragments.
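                      That arithmetic is easy to check. A sketch using the assumed numbers above (384 wells, ~300 fragments of ~10 kb per well, ~50 Gb of raw sequence from a rapid-run flowcell; all of these are guesses from the post, not published specs):

```python
WELLS = 384
FRAGS_PER_WELL = 300
FRAG_LEN_BP = 10_000   # ~10 kb fragments
RAW_BASES = 50e9       # ~50 Gb raw sequence, assumed per plate

per_well_target = FRAGS_PER_WELL * FRAG_LEN_BP   # bases of fragments per well
total_target = WELLS * per_well_target           # fragment bases per plate
coverage = RAW_BASES / total_target              # raw coverage of the fragments
print(f"{per_well_target / 1e6:.0f} Mb/well, "
      f"{total_target / 1e9:.2f} Gb of fragments, ~{coverage:.0f}x coverage")
```

                      That works out to roughly 43x on the fragments, consistent with the "nearly 50x" estimate.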

                      How about repetitive DNA? The trick here is that you have reduced the portion of the genome represented in each well to 3 million bases in ~10 kb fragments. If your organism has a 3 billion base genome, each well represents a random 0.1% sample of that genome. A short repetitive segment could still cause assembly issues if two or more copies of it landed in a single well, and for highly repetitive elements (>1000 copies/genome) you can still have problems. But, realistically, these will be much rarer than in a full genome shotgun situation.
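                      The collision risk can be approximated with a Poisson model. A sketch, assuming each well samples an independent random ~3 Mb (0.1%) of a 3 Gb genome and that repeat copies land independently (real fragmentation and clustering will deviate from this somewhat):

```python
from math import exp

SAMPLED_FRACTION = 3e6 / 3e9   # ~3 Mb of fragments per well / 3 Gb genome

def p_collision(copies, f=SAMPLED_FRACTION):
    # Poisson approximation: with `copies` repeat copies in the genome,
    # the expected number per well is lam = copies * f, and the chance
    # of seeing two or more (an assembly-confusing collision) is
    # 1 - P(0) - P(1).
    lam = copies * f
    return 1 - exp(-lam) * (1 + lam)

for n in (100, 1_000, 10_000):
    print(f"{n:6d} copies/genome: P(>=2 in a well) ~{p_collision(n):.4f}")
```

                      At ~100 copies per genome a collision is rare (under 1%), but by ~1000 copies roughly a quarter of wells see two or more copies, which matches the >1000 copies/genome trouble threshold above.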

                      But you could just dilute your initial library that you intend to long-range-PCR amplify down farther -- say to 30 copies/well. Then you could avoid repetitive DNA issues better. Of course then your coverage in 10kb reads drops 10X.

                      About bias: one should expect this to be an issue. You are chaining together two processes, long-range PCR and Nextera library construction, each of which is known to introduce some bias, so the combination will probably show some as well.

                      Another way to go would be to skip the long-range PCR entirely: add about 1-10 ng of BAC DNA for your species of interest (if available) to each well and do only the Nextera library construction. Maybe with a little tinkering you could get the insert sizes of the Nextera libraries big enough to benefit from a MiSeq 600-cycle run? You could probably do that for less than $5K. Not too shabby: a whole plate of BACs sequenced for less than $15 each, at least for the shotgun phase.

                      --
                      Phillip

                      Comment


                      • #26
                        Originally posted by cliffbeall View Post
                        This may depend somewhat on the characteristics of the repeat, in fact the main point of that Drosophila paper I referenced above is success assembling repetitive transposable elements.
                        I think the authors negate the title of their own paper; I quoted a few of those points in my previous post. One should also bear in mind that the Drosophila genome is relatively small and its repetitive regions are shorter than in some other genomes, such as plants. For example, try assembling or mapping the rice data set from BaseSpace to see how weak the approach is at representing repeat regions.

                        Comment
