  • sequencing a sample containing ONLY sequence repeats

    What is the best NGS approach (if one even exists) for producing a precise sequencing report from a sample containing ONLY sequence repeats? The sample holds trinucleotide repeats of various lengths (e.g. 90, 150, 270 bases), so all lengths are multiples of 3. The goal is to get the distribution of each length in the sample. Is this a case for single-molecule technologies such as Helicos and PacBio, or can Illumina help as well?
    Thanks.

  • #2
    Helicos won't help; it has very short read lengths.

    I think PacBio is your best bet. It wouldn't work well for single nucleotide repeats, but if you have di/tri/tetra or higher I'm guessing it would work. 454 would be a candidate as well, though you would need to pay attention to any issues in the PCR step.


    • #3
      If you are going to sequence "de novo" you will need to keep in mind PacBio's high error rate (10-15%).

      If you have a reference sequence available, then Illumina would likely work as well.
      Last edited by GenoMax; 08-31-2011, 11:18 AM. Reason: add info


      • #4
        Yeah, PacBio's high error rate, plus its reliance on assembling subreads of the single molecule, makes it a poor fit for this specific scenario because of the repeats. Regarding 454, as krobison suggested, the PCR step can ruin the sample, because the whole point is to have an accurate assessment of the distribution of each of the lengths.
        In a way, I'm starting to rethink the Illumina option. Using GAII single reads I could get reliable reads of up to 150 bases; maybe that's a partial solution.


        • #5
          If you're only interested in the repeat number for a relatively small number of loci and samples, there's an old-fashioned technique called 'Sanger sequencing' :-).

          Seriously, unless the full repeat array is longer than the read length, Illumina sequencing should work. A couple of PCR-free protocols have been published by (IIRC) Wash U. and the Broad Institute, so you can avoid amplification problems.


          • #6
            I still think PacBio would be an interesting experiment, though for any of these you'll want to have some reference sequences of known length & see how accurately you can measure them. If PacBio mostly drops out single nucleotides, then in a di or trinucleotide repeat array you could detect those & infer them -- if you know the array should be pure AT repeats and you see ATATATTATATAT, then it is reasonable to infer that one A was dropped. At some frequency in my scenario above the real answer will be a spurious T insertion, so my solution would cause a miscount by one repeat. For tri and higher, that shouldn't be a problem -- the probability of the correct two nucleotides being falsely called in a row is low.

            It is also a question of what precision you require. If being off by one repeat unit isn't a problem, then this approach would definitely work.
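
To make the inference above concrete, here is a minimal Python sketch. It is not from the thread, and the function name `candidate_unit_counts` is hypothetical. It assumes the read differs from the true array only by a small number of single-base indels, and it picks the unit count(s) whose expected length is nearest to the observed array length. A dinucleotide array that is off by one base produces a tie, which is the miscount-by-one risk described above; a trinucleotide array that is off by one base does not.

```python
def candidate_unit_counts(array_len, unit_len):
    """Return the repeat-unit count(s) nearest to an observed array length.

    Assumption (not from the thread): errors are rare single-base indels,
    so the true length is the nearest multiple of unit_len. A tie returns
    both neighbouring counts because the data cannot distinguish them.
    """
    lower = array_len // unit_len
    remainder = array_len % unit_len
    if remainder == 0:
        return [lower]
    if 2 * remainder < unit_len:
        return [lower]          # closer to the lower multiple
    if 2 * remainder > unit_len:
        return [lower + 1]      # closer to the upper multiple
    return [lower, lower + 1]   # exactly halfway: ambiguous


# The dinucleotide example from the post: 13 bases could be 6 or 7 AT units.
print(candidate_unit_counts(len("ATATATTATATAT"), 2))
# A 70-unit trinucleotide array with one base dropped (209 bases).
print(candidate_unit_counts(209, 3))
```

Rounding to the nearest multiple is the simplest possible rule; it ignores base identity entirely, so a real analysis would want to check it against reference arrays of known length, as suggested above.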


            • #7
              Added info:
              We can include some of the non-CAG flanking regions on both sides of the DNA segments.
              I'm not sure whether this widens the range of possible solutions. Maybe GA2 PE becomes more relevant now, and PacBio too, because we have flanking regions. But we still have the main CAG-repeat region, which is the one we want to count reliably.
              ------
              As for the question of whether a resolution of +/- one triplet is acceptable: I am not sure. That would mean, for example, that a few 270 b segments are counted as 273 b segments or vice versa. Maybe it's fine, but I'd have to check.


              • #8
                For accurate repeat counts, the read has to span the entire repeated sequence AND include the non-repeat flanking sequences on both sides. Paired-end reads won't help unless you know your insert size to base-pair resolution. Here's an example for clarity (assuming Illumina PE-151bp sequencing):

                Sample: 5'-10bp unique - 70 copies CAG - 10bp unique-3'

                Read 1 sequence: 5'-10bp unique-47 copies CAG
                Read 2 sequence: 3'-10bp unique-47 copies CAG

                Any CAG can align with any other, so the only information from this sequence is that you have at least 47 copies.
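
The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two 10 bp flank sequences are invented, `count_repeats` is a hypothetical helper, and reads are treated as error-free. The function counts CAG units after the left flank and reports whether the right flank follows immediately, i.e. whether the count is exact or only a lower bound.

```python
def count_repeats(read, left_flank, right_flank, unit="CAG"):
    """Count consecutive `unit` copies directly after `left_flank`.

    Returns (count, is_exact). is_exact is True only when the read also
    contains `right_flank` immediately after the last unit, meaning the
    read spans the whole array. Otherwise the count is a lower bound.
    """
    start = read.find(left_flank)
    if start == -1:
        return 0, False
    pos = start + len(left_flank)
    count = 0
    while read.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count, read.startswith(right_flank, pos)


# Invented 10 bp flanks, for illustration only.
LEFT = "ACGTTGCATG"
RIGHT = "TTGACCGTAA"
sample = LEFT + "CAG" * 70 + RIGHT

read1 = sample[:151]               # a 151 bp read starting at the 5' end
print(count_repeats(read1, LEFT, RIGHT))    # lower bound only
print(count_repeats(sample, LEFT, RIGHT))   # a read spanning both flanks
```

The 151 bp read holds the 10 bp flank plus (151 - 10) / 3 = 47 complete CAG units, which matches the "at least 47 copies" conclusion.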


                • #9
                  Yes indeed. As you said, the read has to reliably span the entire sequence.
                  So now I'm back to PacBio vs. GA2 single reads (but those are limited to 150 b).
                  If I can get PacBio to run with a very low error rate, maybe I'll get a reliable count.


                  • #10
                    PacBio is the way to go for trinucleotide repeats. As mentioned already, other methodologies will be limited by short read length even if they can get past the problems that result from PCR-amplifying your repeats.
                    The high single-read error rate is a genuine concern, but with your insert size you should easily get multiple passes over each molecule, giving much higher quality in the single-molecule consensus.
                    What is your goal? I.e., how many high-quality reads of your repeats would you consider a success?
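
One hedged way to picture the single-molecule consensus idea: if each pass over a molecule yields its own repeat count, the most frequent count can be taken as the call for that molecule. The sketch below is a plain majority vote over hypothetical per-pass counts. It is not how PacBio's own consensus software works, and `consensus_count` and `min_passes` are made-up names.

```python
from collections import Counter


def consensus_count(pass_counts, min_passes=3):
    """Majority-vote repeat count for one molecule.

    pass_counts: repeat counts measured on each pass over the same molecule.
    Returns None when there are too few passes or when the two most
    frequent counts are tied, since no call can be made in those cases.
    """
    if len(pass_counts) < min_passes:
        return None
    (top, top_n), *rest = Counter(pass_counts).most_common(2)
    if rest and rest[0][1] == top_n:
        return None
    return top


# Hypothetical counts from five passes over one molecule.
print(consensus_count([70, 70, 69, 70, 71]))
```

Requiring a minimum number of passes and refusing to call ties trades yield for accuracy, which fits a question where the exact repeat count matters.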


                    • #11
                      Hi Loomis,
                      Thanks for the reply. The goal is to get an exact measure of the proportion of each sequence length in the sample, which contains these trinucleotide-repeat segments ranging from 200 to 500 bases, including the non-repeat flanking regions.


                      • #12
                        I see. PacBio might not be able to answer the proportion question. The RS currently uses a passive loading approach for the SMRT Cell, so smaller molecules have a competitive advantage.
                        In other words, you would have no problem sequencing 500 bp of repeats, but if you put in a sample containing 200-500 bp inserts, your distribution of reads would be skewed towards the smaller insert sizes.
                        Unfortunately I don't see a way to use current NGS platforms to answer your question...
                        Have you tried something like capillary electrophoresis, or PAGE? That could give you an accurate distribution in bp and then you could use PacBio sequencing to verify the exact repeat length of the major bands or sizing standards...
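
To illustrate why a length-dependent loading bias breaks the proportion measurement, here is a small Python sketch. The weighting function is purely an assumption for illustration (loading efficiency inversely proportional to insert length); it is not a measured property of the instrument, and `observed_proportions` is a hypothetical name.

```python
def observed_proportions(true_props, loading_weight):
    """Skew true length proportions by a per-length loading weight.

    true_props: dict mapping insert length (bp) to its true fraction.
    loading_weight: function giving the relative loading efficiency of a
    length. The observed fraction is proportional to true fraction times
    weight, renormalised so the fractions sum to 1.
    """
    weighted = {length: p * loading_weight(length)
                for length, p in true_props.items()}
    total = sum(weighted.values())
    return {length: w / total for length, w in weighted.items()}


# Equal true amounts of 200 bp and 500 bp inserts, with an ASSUMED
# weight of 1/length (illustrative only).
true_mix = {200: 0.5, 500: 0.5}
observed = observed_proportions(true_mix, lambda length: 1.0 / length)
print(observed)
```

Under this assumed weight the 200 bp inserts end up over-represented among the reads even though the sample is a 50/50 mix, which is the kind of skew described above.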
