  • what is a paired-end read?

    When I read papers, I find that paired-end reads and single-end reads are mentioned many times. But what is a paired-end read? I am not very clear on this.
    For example:
    1 1 119 395 GAAGAGGAGATAAATAAAACTCAAAATACAGCTGAA
    1 1 852 893 GTTATTAATATTATTGATGTATTCATCTTTTCTTTT
    1 1 814 900 GTTAAAGCATTAAGAAAAGATGTACTTGCAAAATGC
    1 1 241 454 GGTGGAAGAGATGTCATTGGAGAAGCCCAAACAGGT
    1 1 759 899 GTGTGCTTTTTGAATGAGTAGGTATTGTAATTAGCT
    1 1 123 438 GAAAGCCAAACTTTTCATAAAAGCCTTCCTTGCCAT
    These reads were generated by Solexa. Are they paired-end reads?
    Thanks

  • #2
    The term 'paired ends' refers to the two ends of the same DNA molecule. So you can sequence one end, then turn it around and sequence the other end. The two sequences you get are 'paired end reads'. Sometimes they're called 'mate pairs' (but with Illumina technology, I think what they call 'mate pair' and 'paired end' methodology is different). Is that what you want to know?

    • #3
      Originally posted by ScottC View Post
      The term 'paired ends' refers to the two ends of the same DNA molecule. So you can sequence one end, then turn it around and sequence the other end. The two sequences you get are 'paired end reads'. Sometimes they're called 'mate pairs' (but with Illumina technology, I think what they call 'mate pair' and 'paired end' methodology is different). Is that what you want to know?
      Thank you. If two reads are paired ends, will one read be the complement of the other? SSAKE's readme says that TGGCTCACCCCTGTAATCCCAGCACT:CTCCCAGGTTCAAGCGATTCTCCTGC consists of two paired reads, but I can't find any relation between the two reads.
      Last edited by biocc; 08-20-2008, 06:54 PM.

      • #4
        I hope I understand what you're asking, and that my answers are not too basic...

        No, the reads won't be complementary unless you're sequencing very short molecules so that a read from each end simply sequences the other strand. Generally, though, the molecule is longer, so you get the read from one end of the molecule and the read from the other end on the other strand. You don't know what the sequence is in the central section of the molecule because the reads are not long enough to span all the way across the molecule. So basically, you have no way of knowing, just by looking at two sequences, whether they're pairs or not.
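
To make this concrete, here is a minimal sketch (not from the posts above). It assumes the usual Illumina-style layout, in which read 1 is the start of the fragment and read 2 is the reverse complement of its far end. The 60 bp fragment and the 10 bp read length are made up for illustration.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_reads(fragment, read_len):
    """Read 1 comes from the left end; read 2 from the right end, other strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2

# Hypothetical 60 bp fragment (sequence invented for this example).
fragment = "GAAGAGGAGATAAATAAAACTCAAAATACAGCTGAAGTTATTAATATTATTGATGTATTC"

# Fragment longer than the reads: the two reads cover different regions.
long_r1, long_r2 = paired_reads(fragment, 10)

# Fragment exactly as long as the read: both reads cover the same bases.
short_r1, short_r2 = paired_reads(fragment[:10], 10)

print(long_r2 == reverse_complement(long_r1))    # False: no visible relation
print(short_r2 == reverse_complement(short_r1))  # True: reads are complementary
```

Only the very short molecule gives reads that are reverse complements of each other; for the longer one, nothing in the two sequences themselves tells you they are a pair.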

        • #5
          Paired end (mate pair) sequencing explanation

          biocc,

          "paired end" or "mate pair" refers to how the library is made, and then how it is sequenced. Both are methodologies that, in addition to the sequence information, give you information about the physical distance between the two reads in your genome.

          For example, you shear up some genomic DNA, and cut a region out at ~500bp. Then you prepare your library, and sequence 35bp from each end of each molecule. Now you have three pieces of information:

          --the tag 1 sequence
          --the tag 2 sequence
          --that they were 500bp ± (some) apart in your genome

          This gives you the ability to map to a reference (or assemble de novo, for that matter) using that distance information. It helps dramatically to resolve larger structural rearrangements (insertions, deletions, inversions), as well as helping to assemble across repetitive regions.

          Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
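
The rule in the paragraph above can be written down as a rough sketch. The function name, the 500 bp expected distance and the 100 bp tolerance are assumptions for illustration; real callers estimate the distance distribution from the data and require several supporting pairs.

```python
def classify_pair(mapped_distance, expected=500, tolerance=100):
    """Classify one read pair by its distance on the reference.

    The pair is always ~`expected` bp apart in the sequenced genome, so a
    larger distance on the reference means the sample is missing sequence
    (deletion), and a smaller one means the sample has extra sequence
    (insertion). `tolerance` is an assumed cutoff, not a standard value.
    """
    if mapped_distance > expected + tolerance:
        return "deletion in sample"
    if mapped_distance < expected - tolerance:
        return "insertion in sample"
    return "concordant"

for distance in (1000, 100, 520):
    print(distance, classify_pair(distance))
```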

          Mapping over repeats is similar...if one read is unmappable because it falls in a very repetitive region (eg. LINE, LTR, SINE), but the other is unique, you can again use that distance information to map both reads. The first read would likely come from the repeat that is ~500bp away from your unique second read.
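
A toy version of that repeat rescue might look like the following. It assumes you already have the candidate positions of the repetitive read and the position of its uniquely mapped mate; the function name, coordinates and tolerance are hypothetical.

```python
def rescue_repeat_read(candidates, mate_pos, expected=500, tolerance=100):
    """Pick the candidate position most consistent with the library distance.

    candidates: positions where the repetitive read could map.
    mate_pos:   position of the uniquely mapped mate.
    Returns the best candidate, or None if none is within `tolerance`
    of the expected distance.
    """
    def deviation(pos):
        # How far this placement is from the expected pair distance.
        return abs(abs(pos - mate_pos) - expected)

    best = min(candidates, key=deviation, default=None)
    if best is None or deviation(best) > tolerance:
        return None
    return best

# The repeat copy at 1200 is 490 bp from the mate at 1690, so it wins.
print(rescue_repeat_read([1200, 48000, 910500], mate_pos=1690))
```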

          Hope that helps. It's a weird concept at first, but very useful for all types of sequencing. It's been around at some levels since the days of shotgun sequencing.

          And lastly, the terminology between "paired end" and "mate pair" is typically that "paired end" refers to sequencing both ends of the same molecule, while "mate pair" (in ABI's case) refers to sequencing only two tags (made by Type IIS restriction enzymes a la SAGE) from the ends of a typically much larger molecule. I could be wrong here though...

          • #6
            Originally posted by ECO View Post
            Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
            I was browsing through the old posts and found this quite useful. But isn't it a deletion when the distance is 100bp and an insertion if the reads are 1kb away?

            • #7
              I was browsing through the old posts and found this quite useful. But isn't it a deletion when the distance is 100bp and an insertion if the reads are 1kb away?
              No. The original message was correct. Your confusion may be with which genome is seeing the insertion or deletion. Let me try to explain it.

              The reads, which come from your sequence, are ~500 bases apart. They are always ~500 bases apart in your sample. That is a biological fact, assuming that you did the laboratory work correctly.

              If you map your reads onto the reference genome and find that they are ~500 bases apart then you know that there is no insertion or deletion -- or at least no single indel event.

              If you map your reads on the reference and find that they are 100 bases apart then you have to think -- how did those 100 bases become the biological 500 bp between the reads? Either your genome had an insertion compared to the reference, or the reference had a deletion compared to your sequence. The original post said:
              Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
              Which is just the same as I wrote.

              As I said in my first paragraph, the confusion may be arising from which genome you are talking about. As I was writing this message my mind kept flipping back and forth between the genomes. Usually what I want to know about is my genome -- did it have an insertion or deletion? But the mapping is done to the reference genome, and it is there that we find smaller or larger pairing distances ... these are the inverse of what biologically happened to my sequence.

              • #8
                I get it. Thanks

                • #9
                  What is the alignment difference between a single end and paired end read?

                  Hi, I'm currently working on an alignment program (bwa). To verify functionality, I need to run tests with paired-end reads. I know how paired-end reads are made, but how would you make a sample paired-end read from a reference genome? For a single-end read I just take any random 35 bp sequence, but what do I do for a paired-end read?
                  Thanks,
                  Matt

                  • #10
                    Hi mrwong05,

                    A synthetic read generator would be a very useful tool, so I'll try to describe what we see with real-world samples as best I can (and hopefully not air too much of our dirty laundry in public).

                    First some homemade terminology so you know what I mean: a paired-end run consists of two reads, 1 and its partner 2, and an unsequenced linker in the middle, L. The read distance is 1+L+2.

                    When we do 2x 50 bp paired-end runs on a GAIIx using the current gel purification step, we get read distances that vary by about 100 bp in a nice tight bell-shaped curve starting between 160-200 bp. So the first thing to bear in mind is that L is not fixed within or between runs. Either way this group accounts for >99.99% of paired-end reads in an assembly. Because of the way fragments are generated for sequencing, 1 and 2 can align either F-B or B-F.

                    If you want to be more realistic, there is always a tiny proportion of reads (<0.1%) that align with much longer read distances, some of which is due to bioinformatics but some of which is real and simply reflects biology. Likewise a tiny proportion of reads at all read distances will be F-F or B-B. There also often appears to be a tiny proportion of reads that come out overlapping, where the read distance is close to a read length, i.e. 1+L+2 is, in this case, ~50-100. I have no idea of the prevalence of such reads but you can often find them if you look. Lastly, if it's not going to be part of the assembler, end trimming and quality trimming can often mean that 1 and 2 are different lengths and that a substantial number of reads from a paired-end run end up with no partner at all.
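
As a starting point, a bare-bones generator along these lines could look like the sketch below. It is not the BFAST or MAQ generator; the function name, the mean of 200 bp and the standard deviation of 15 bp are placeholder assumptions. It draws the read distance 1+L+2 from a bell-shaped distribution, picks the strand at random (F-B or B-F), and ignores the rarer cases (F-F/B-B pairs, very long distances, trimming, sequencing errors).

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pair(reference, read_len=50, mean_dist=200.0, sd_dist=15.0, rng=random):
    """Simulate one error-free read pair from `reference`.

    Returns (read1, read2, start, dist, strand), where `dist` is the total
    read distance 1+L+2 and `start` is the leftmost reference coordinate.
    mean_dist and sd_dist are assumed values, not measured ones.
    """
    # Draw the read distance and keep it between one read length
    # (fully overlapping reads) and the reference length.
    dist = int(round(rng.gauss(mean_dist, sd_dist)))
    dist = max(read_len, min(dist, len(reference)))
    start = rng.randrange(len(reference) - dist + 1)
    fragment = reference[start:start + dist]
    # Half of the fragments are sequenced from the reverse strand.
    strand = "+"
    if rng.random() < 0.5:
        fragment = reverse_complement(fragment)
        strand = "-"
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2, start, dist, strand

# Usage: a random 2 kb "reference" and 100 simulated pairs.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(2000))
pairs = [simulate_pair(reference, rng=rng) for _ in range(100)]
print(pairs[0])
```

Because the true position and strand of every pair are returned, the aligner's output can be checked directly against them.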

                    I hope this is helpful. Please let me know how you get on with the read generator I would be very interested in using it to verify our sample analysis.

                    The_Roads

                    • #11
                      For example, both BFAST and MAQ have read generators, with BFAST having a paired end read generator for ABI and SOLiD data. Most aligner authors have their own read generators to validate and benchmark their aligners.

                      • #12
                        Thanks nilshomer, don't ask don't get, I should have come to SEQanswers earlier.

                        Anyone know of any other paired-end read generators?

                        Are there any with which you can model read errors, duplicate removal, etc.? Or is this getting beyond their function?

                        • #13
                          Originally posted by The_Roads View Post
                          Thanks nilshomer, don't ask don't get, I should have come to SEQanswers earlier.

                          Anyone know of any other paired-end read generators?

                          Are there any with which you can model read errors, duplicate removal etc? or is this getting beyond their function.
                          Both come freely in the BFAST or MAQ distribution (I haven't checked other aligners).

                          I know the one I wrote (BFAST) models read errors for both Illumina and SOLiD, as well as SNPs and indels.

                          Why do you worry about duplicate removal? Duplicates can be frequent in practice in some cases.

                          • #14
                            I am trying to quantify rare variants in deep coverage of small templates. I am not a statistician/bioinformatics pro, but as far as I can see duplicate removal will introduce a bias that will enrich for rare variants, both real and introduced.

                            Aside from library prep and pipeline issues which introduce their own biases, are there any assemblers that are designed for this type of assembly as opposed to large ref seq low coverage (<100x) assemblies?

                            • #15
                              Originally posted by The_Roads View Post
                              I am trying to quantify rare variants in deep coverage of small templates. I am not a statistician/bioinformatics pro but as far as i can see duplicate removal will introduce a bias that will enrich for rare variants both real and introduced.

                              Aside from library prep and pipeline issues which introduce their own biases, are there any assemblers that are designed for this type of assembly as opposed to large ref seq low coverage (<100x) assemblies?
                              Have you tried Velvet or ABySS? You can give either program the expected coverage, and in my experience they will work fine.
