  • Illumina Paired End Reads General Question

    Hi Everyone,

    I'm curious as to how the Illumina machine knows how to "pair" paired-end reads.

    I've taken a look here: http://seqanswers.com/forums/showthread.php?t=21
    and am still confused.

    I am clear on how it would work at step 5. There are exactly 2 sequences there, with one the reverse of the other by construction. Thus when you start going down both sequences, it's like you're starting at opposite ends.

    What confuses me is step 6: once you get clusters, how does the Illumina machine know which two sequences are still "paired"? Or are the pairings so "dense" that the Illumina machine can always find the two strings in the pair?

    Apologies if this is a silly question!

  • #2
    The instrument keeps a record of the location of each cluster on an X-Y grid in the .locs file. You can also get the position of the cluster from the header line of each read in the FASTQ file:
    @M00677:25:000000000-A36WV:1:1101:12297:2478 2:N:0:1
    The clusters don't move enough during the read 2 turnaround for the instrument to lose track of them.
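
As an illustration of the header line quoted above, here is a minimal Python sketch that splits it into fields. It assumes the colon-separated layout used by recent Illumina software (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control bits, index). The names `ReadId`, `parse_header` and `cluster_key` are made up for this example, and the read 1 header is constructed by hand to show that both reads of a pair would share the same cluster coordinates.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int        # 1 or 2
    filtered: str    # "Y" if the read was flagged by the filter, else "N"
    control: int
    index: str


def parse_header(header: str) -> ReadId:
    """Split a FASTQ header into its fields (assumed layout, see above)."""
    location, _, flags = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, filtered, control, index = flags.split(":")
    return ReadId(instrument, int(run), flowcell, int(lane), int(tile),
                  int(x), int(y), int(read), filtered, int(control), index)


def cluster_key(rid: ReadId) -> tuple:
    """Fields that identify the cluster, ignoring which read it is."""
    return rid[:7]


# Header taken from the post above (this one is the read 2 record).
r2 = parse_header("@M00677:25:000000000-A36WV:1:1101:12297:2478 2:N:0:1")
# Hypothetical read 1 header for the same cluster, written for illustration.
r1 = parse_header("@M00677:25:000000000-A36WV:1:1101:12297:2478 1:N:0:1")

print(r2.tile, r2.x, r2.y, r2.read)          # 1101 12297 2478 2
print(cluster_key(r1) == cluster_key(r2))    # True
```

Under these assumptions, matching read 1 to read 2 is just a lookup on the shared key: same lane, tile, x and y.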



    • #3
      Hm, perhaps I should have been more careful when I said cluster.

      Let's suppose for the moment that we are in step 5, so there are only two strings, at (x1,y1) and (x2,y2). Does the machine already know that these strings are at these positions before the next round of denaturation, binding to the flow cell and building of the bridges? If so, then yes, it can keep track of them, and during each round it builds its positional database. But wouldn't that mean that somehow the two strings have to be closer together than a random new string of DNA that might attach nearby?

      Thanks again!



      • #4
        Read 1 and read 2 are two completely separate reads. During cluster generation, all of the read 2 strands are stripped off of the flow cell. Therefore, all sequences in the cluster are single-stranded reverse complements of the read 1 sequence. After the first read is complete, the read 2 strand is re-synthesized and the read 1 strands are stripped off, leaving only the reverse complement of the read 2 strand for sequencing.

        EDIT: I guess I'm still not answering your questions clearly. Sequencing does not occur at cluster generation. Sequencing begins once all of the clusters are already generated. Since they are attached to the flow cell, they do not move around anymore during the sequencing phase.
        Last edited by kcchan; 02-28-2013, 03:58 PM.



        • #5
          [Removed -- No longer relevant, explanation below].
          Last edited by skiguy; 02-28-2013, 05:39 PM.



          • #6
            Wow, I'm thoroughly confused by what you're trying to explain. But I think where you're getting confused is the clusters generated during read 2. In this step, no new template is being added. The complementary sequences are generated from the strands that are already in the cluster. There are chemical modifications in the flow cell that prevent the cluster from growing as it did during the first phase, because only read 2 sequences can be generated (the new read 2 strands cannot go back and generate a new read 1 strand and grow the cluster). This keeps the positions of the clusters intact so that they can be tracked by the instrument.



            • #7
              Apologies for the confusion! Perhaps it's the use of the word "cluster"? During the first phase, all the read 1 strands are grown (is this what you are calling a cluster?). During the second phase, all the read 2 strands are grown. How does it know which strand from phase 1 to match up with which strand from phase 2? Are the phase 1 and phase 2 strands always close enough together?



              • #8
                During cluster generation, single-stranded template is added to the flow cell and binds randomly somewhere on it. The strand is then amplified using bridge PCR, in which both read 1 and read 2 strands are made. The amplified strands form clusters, and all of them contain sequences that are identical to, or the reverse complement of, the original template sequence. Just before sequencing, all read 2 strands are washed away by a chemical reaction which cleaves the read 2 end of the adapter at the base of the flow cell. As a result, the only strands which remain in the template cluster are identical copies of the read 1 template.

                Sequencing primers are then added to the template and read 1 sequencing begins.

                After read 1 sequencing, the bases that were added during sequencing are denatured and removed, leaving the flow cell looking something like it did just before sequencing (with clusters of read 1 template). A chemical reaction then occurs to reverse the modification which cleaved the read 2 sequences. The read 1 template strands are then used to re-generate the read 2 sequences through a process similar to the initial cluster generation.

                After a few cycles of amplification, the flow cell would look similar to how it did right after cluster generation, with both read 1 and read 2 strands present. This time, however, a chemical reaction cleaves the read 1 sequences at the base of the flow cell and all read 1 templates are washed away. The remaining read 2 sequences then go through sequencing just like the first read.
                Last edited by kcchan; 02-28-2013, 05:29 PM.
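
The relationship between the two reads described above can be sketched in a few lines of Python. This is a simplified illustration rather than a model of the chemistry: it ignores adapters, primers and which physical strand is attached at each step, and only shows that read 1 comes from one end of the fragment while read 2 comes from the other end as a reverse complement. The fragment sequence and the function names are invented for the example.

```python
from typing import Tuple

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_length: int) -> Tuple[str, str]:
    """Simulate the two reads obtained from a single cluster.

    Read 1 is taken from the start of the fragment; read 2 is taken from
    the start of its reverse complement, i.e. from the opposite end.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ACGTTGCAAGGCTTAC"  # made-up library fragment
read1, read2 = paired_reads(fragment, 5)
print(read1)  # ACGTT
print(read2)  # GTAAG
```

Both reads come from the same fragment at the same spot on the flow cell, so no searching for a partner is needed.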



                • #9
                  Ah! I see what you are saying now! Thank you!

                  Within each cluster, we are looking at the exact same piece of DNA.

                  My initial concern was: what if, for some "cluster", there were several different pieces of DNA that just bound near each other on the flow cell? But I imagine that when they take the photos, the color would be a jumble and that point on the flow cell would not be sequenced. Thus the ONLY spots that are sequenced are clusters that are made up of the exact same strand.



                  • #10
                    Correct. One cluster is a group of identical sequences derived from a single strand of the library that was initially loaded. During the initial input, the DNA is added at very low concentrations, usually less than 10 picomolar. This ensures that the DNA molecules that bind to the flow cell are spread far apart from each other and do not inhibit one another during cluster generation.

                    If by chance two clusters somehow converge upon one another, it will be detected by the instrument because there will be two intense signals at that cluster for each cycle. When this happens, the software will mark that cluster and filter it out from the final data.
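
The post does not name the filter, so the following is only a rough Python sketch of how such a check could work. It assumes a purity measure in the style of Illumina's chastity filter: the brightest channel intensity divided by the sum of the two brightest, evaluated over the early cycles. The threshold of 0.6, the 25-cycle window, the one-failure allowance and the intensity values are all assumptions for illustration, not values taken from this thread.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """Keep a cluster unless too many early cycles show a mixed signal.

    `cycles` is a list of per-cycle intensity tuples, one value per base
    channel. All parameter defaults are illustrative assumptions.
    """
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# One dominant channel per cycle: looks like a single clean cluster.
clean = [(900, 50, 30, 20)] * 25
# Two strong channels per cycle: looks like two overlapping clusters.
mixed = [(500, 450, 30, 20)] * 25

print(passes_filter(clean))  # True
print(passes_filter(mixed))  # False
```

A cluster with two comparably bright channels in most cycles scores close to 0.5 and is dropped, which matches the "two intense signals" case described above.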

