SEQanswers

Old 08-20-2008, 12:24 AM   #1
biocc
Member
 
Location: beijing

Join Date: Jul 2008
Posts: 35
Default what is a paired-end read?

When I read papers, I see paired-end reads and single-end reads mentioned many times. But what is a paired-end read? It is not clear to me.
For example:
1 1 119 395 GAAGAGGAGATAAATAAAACTCAAAATACAGCTGAA
1 1 852 893 GTTATTAATATTATTGATGTATTCATCTTTTCTTTT
1 1 814 900 GTTAAAGCATTAAGAAAAGATGTACTTGCAAAATGC
1 1 241 454 GGTGGAAGAGATGTCATTGGAGAAGCCCAAACAGGT
1 1 759 899 GTGTGCTTTTTGAATGAGTAGGTATTGTAATTAGCT
1 1 123 438 GAAAGCCAAACTTTTCATAAAAGCCTTCCTTGCCAT
These were generated by Solexa. Are they paired-end reads?
Thanks
Old 08-20-2008, 05:08 PM   #2
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 219
Default

The term 'paired ends' refers to the two ends of the same DNA molecule. So you can sequence one end, then turn it around and sequence the other end. The two sequences you get are 'paired end reads'. Sometimes they're called 'mate pairs' (but with Illumina technology, I think what they call 'mate pair' and 'paired end' methodology is different). Is that what you want to know?
Old 08-20-2008, 06:41 PM   #3
biocc
Member
 
Location: beijing

Join Date: Jul 2008
Posts: 35
Default

Quote:
Originally Posted by ScottC View Post
The term 'paired ends' refers to the two ends of the same DNA molecule. So you can sequence one end, then turn it around and sequence the other end. The two sequences you get are 'paired end reads'. Sometimes they're called 'mate pairs' (but with Illumina technology, I think what they call 'mate pair' and 'paired end' methodology is different). Is that what you want to know?
Thank you. If two reads are paired ends, will one read be the complement of the other? SSAKE's readme says that TGGCTCACCCCTGTAATCCCAGCACT:CTCCCAGGTTCAAGCGATTCTCCTGC consists of two paired reads, but I can't find any relationship between the two reads.

Last edited by biocc; 08-20-2008 at 06:54 PM.
Old 08-20-2008, 09:40 PM   #4
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 219
Default

I hope I understand what you're asking, and that my answers are not too basic...

No, the reads won't be complementary unless you're sequencing very short molecules so that a read from each end simply sequences the other strand. Generally, though, the molecule is longer, so you get the read from one end of the molecule and the read from the other end on the other strand. You don't know what the sequence is in the central section of the molecule because the reads are not long enough to span all the way across the molecule. So basically, you have no way of knowing, just by looking at two sequences, whether they're pairs or not.
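The point about complementarity can be sketched in a few lines of Python. This is only an illustration: the fragment sequence, the 10 bp read length, and the function names are made up, and it assumes read 1 is taken from the start of one strand and read 2 from the start of the opposite strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def paired_end_reads(fragment, read_len):
    """Read `read_len` bases from each end of one fragment.

    Read 1 is the start of the top strand; read 2 is the start of the
    bottom strand, i.e. the reverse complement of the fragment's far end.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


# Hypothetical 40 bp fragment, sequenced with 10 bp reads.
fragment = "GAAGAGGAGATAAATAAAACTCAAAATACAGCTGAAGTTA"
r1, r2 = paired_end_reads(fragment, 10)
print(r1, r2)
print(r1 == reverse_complement(r2))  # False: the reads cover different ends

# Only when the fragment is as short as the read do both reads cover
# the same bases, one from each strand.
s1, s2 = paired_end_reads(fragment[:10], 10)
print(s1 == reverse_complement(s2))  # True
```

With a 40 bp fragment the two reads share no bases, so nothing in the sequences themselves marks them as a pair; the pairing has to be carried by the read names or file layout.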
Old 08-20-2008, 11:11 PM   #5
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,287
Default Paired end (mate pair) sequencing explanation

biocc,

"paired end" or "mate pair" refers to how the library is made, and then how it is sequenced. Both are methodologies that, in addition to the sequence information, give you information about the physical distance between the two reads in your genome.

For example, you shear up some genomic DNA, and cut a region out at ~500bp. Then you prepare your library, and sequence 35bp from each end of each molecule. Now you have three pieces of information:

--the tag 1 sequence
--the tag 2 sequence
--that they were roughly 500bp apart in your genome

This gives you the ability to map to a reference (or assemble de novo, for that matter) using that distance information. It helps dramatically to resolve larger structural rearrangements (insertions, deletions, inversions), as well as helping to assemble across repetitive regions.

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
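As a rough illustration of that reasoning, the Python sketch below compares a mapped distance with the library size. The function name, the 500bp default, and the 100bp tolerance are assumptions chosen to match the example above, not values from any particular tool; real callers work from the observed distance distribution of the whole library.

```python
def classify_pair(mapped_distance, expected_distance=500, tolerance=100):
    """Compare the distance between mapped reads with the library size.

    Returns a call about the *sequenced* genome relative to the reference,
    plus the approximate event size. `tolerance` is an arbitrary
    illustrative cutoff, not a recommended value.
    """
    difference = mapped_distance - expected_distance
    if difference > tolerance:
        # Reads sit further apart on the reference than in the sample:
        # the sample is missing sequence that the reference has.
        return "deletion in sample", difference
    if difference < -tolerance:
        # Reads sit closer together on the reference than in the sample:
        # the sample carries extra sequence between them.
        return "insertion in sample", -difference
    return "concordant", 0


for distance in (500, 1000, 100):
    print(distance, classify_pair(distance))
```

The sign convention is the part that is easy to get backwards: the distance is measured on the reference, while the call describes the sequenced genome.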

Mapping over repeats is similar...if one read is unmappable because it falls in a very repetitive region (eg. LINE, LTR, SINE), but the other is unique, you can again use that distance information to map both reads. The first read would likely come from the repeat that is ~500bp away from your unique second read.
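A minimal sketch of that idea in Python, assuming you already have every candidate position for the repetitive read and a single position for its unique mate. The function name, positions, and tolerance are hypothetical, and strand and orientation are ignored for brevity.

```python
def place_repeat_read(unique_pos, candidate_positions,
                      expected_distance=500, tolerance=100):
    """Pick the repeat copy consistent with the uniquely mapped mate.

    `candidate_positions` are all reference positions where the repetitive
    read aligns equally well. Returns the single candidate lying within
    `tolerance` of `expected_distance` from the unique mate, or None if
    zero or several candidates qualify.
    """
    consistent = [
        pos for pos in candidate_positions
        if abs(abs(pos - unique_pos) - expected_distance) <= tolerance
    ]
    return consistent[0] if len(consistent) == 1 else None


# Hypothetical positions: the unique read maps at 10,000 and the repeat
# read matches three copies of the same element.
print(place_repeat_read(10_000, [2_300, 10_480, 57_900]))  # 10480
```

Returning None when more than one copy fits is a deliberately conservative choice; two repeat copies within one library length of the unique read cannot be told apart by distance alone.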

Hope that helps. It's a weird concept at first, but very useful for all types of sequencing. It's been around at some levels since the days of shotgun sequencing.

And lastly, the terminology between "paired end" and "mate pair" is typically that "paired end" refers to sequencing both ends of the same molecule, while "mate pair" (in ABI's case) refers to sequencing only two tags (made by Type IIS restriction enzymes a la SAGE) from the ends of a typically much larger molecule. I could be wrong here though...
Old 06-09-2009, 05:13 AM   #6
Melissa
Senior Member
 
Location: Canada

Join Date: Aug 2008
Posts: 101
Default

Quote:
Originally Posted by ECO View Post
Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
I was browsing through the old posts and found this quite useful. But isn't it a deletion when the distance is 100bp and an insertion if the reads are 1kb apart?
Old 06-09-2009, 06:03 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 886
Default

Quote:
I was browsing through the old posts and found this quite useful. But isn't it a deletion when the distance is 100bp and an insertion if the reads are 1kb apart?
No. The original message was correct. Your confusion may be with which genome is seeing the insertion or deletion. Let me try to explain it.

The reads, which come from your sequence, are ~500 bases apart. They are always 500 bases apart. That is a biological fact, assuming that you did the laboratory work correctly.

If you map your reads onto the reference genome and find that they are ~500 bases apart then you know that there is no insertion or deletion -- or at least no single indel event.

If you map your reads on the reference and find that they are 100 bases apart then you have to think -- how did those 100 bases become the 500 bases that biologically separate the reads? Either your genome had an insertion compared to the reference, or the reference had a deletion compared to your sequence. The original post said:
Quote:
Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
Which is just the same as I wrote.

As I said in my first paragraph, the confusion may be arising from which genome you are talking about. As I was writing this message my mind kept flipping back and forth between the genomes. Usually what I want to know about is my genome -- did it have an insertion or deletion? But the mapping is done to the reference genome, and it is there that we find smaller or larger pairing distances ... these are the inverse of what biologically happened to my sequence.
Old 06-09-2009, 06:11 AM   #8
Melissa
Senior Member
 
Location: Canada

Join Date: Aug 2008
Posts: 101
Default

I get it. Thanks
Old 07-13-2009, 12:27 PM   #9
mrwong05
Junior Member
 
Location: San Diego

Join Date: Jul 2009
Posts: 1
Default What is the alignment difference between a single end and paired end read?

Hi, I'm currently working on an alignment program (bwa), and to verify functionality I need to run tests with paired end reads. I know how paired end reads are made, but how would you make a sample paired end read from a reference genome? For a single end read I just take any random 35 bp sequence, but what do I do for a paired end read?
Thanks,
Matt
Old 07-15-2009, 12:20 PM   #10
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 32
Default

Hi mrwong05,

A synthetic read generator would be a very useful tool, so I'll try and describe what we see with real world samples as best I can (and hopefully not air too much of our dirty laundry in public).

First, some homemade terminology so you know what I mean: a paired-end run consists of two reads, 1 and its partner 2, and an unsequenced linker in the middle, L. The read distance is 1+L+2.

When we do 2x 50 bp paired-end runs on a GAIIx using the current gel purification step, we get read distances that vary by about 100 bp in a nice tight bell-shaped curve starting between 160 and 200 bp. So the first thing to bear in mind is that L is not fixed within or between runs. Either way, this group accounts for >99.99% of paired-end reads in an assembly. Because of the way fragments are generated for sequencing, 1 and 2 can align either F-B or B-F.

If you want to be more realistic, there is always a tiny proportion of reads (<0.1%) that align with much longer read distances, some of which is due to bioinformatics but some of which is real and simply reflects biology. Likewise, a tiny proportion of reads at all read distances will be F-F or B-B. There also often appears to be a tiny proportion of reads that come out overlapping, where the read distance is about the same as a read length, i.e. 1+L+2 is, in this case, ~50-100. I have no idea of the prevalence of such reads, but you can often find them if you look. Lastly, if it's not going to be part of the assembler, end trimming and quality trimming can often mean that 1 and 2 are different lengths and that a substantial number of reads from a paired-end run end up with no partner at all.
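For the common case only (F-B or B-F pairs with a bell-shaped read distance), a generator could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the function names, the mean of 230 bp and standard deviation of 17 bp (picked so that most distances fall in a window about 100 bp wide starting near 180 bp), and the random reference. It does not model sequencing errors, the rare long or F-F/B-B pairs, or trimming.

```python
import random


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def simulate_pair(reference, read_len=50, mean_dist=230, sd_dist=17, rng=random):
    """Draw one synthetic read pair from `reference`.

    The read distance (1+L+2) is drawn from a normal distribution and
    clamped to lie between one read length and the reference length, so
    the reference must be at least `read_len` long.
    Returns (read1, read2, start, distance, orientation).
    """
    distance = int(round(rng.gauss(mean_dist, sd_dist)))
    distance = max(read_len, min(distance, len(reference)))
    start = rng.randrange(len(reference) - distance + 1)
    fragment = reference[start:start + distance]
    # Half of the fragments are sequenced starting from the other strand,
    # which gives the F-B / B-F mix.
    orientation = "F-B"
    if rng.random() < 0.5:
        fragment = reverse_complement(fragment)
        orientation = "B-F"
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2, start, distance, orientation


rng = random.Random(0)  # fixed seed so runs are repeatable
reference = "".join(rng.choice("ACGT") for _ in range(5000))
for _ in range(3):
    r1, r2, start, distance, orientation = simulate_pair(reference, rng=rng)
    print(start, distance, orientation, r1[:10], r2[:10])
```

Because each pair comes back with its true start, distance, and orientation, the aligner's output can be checked against known answers.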

I hope this is helpful. Please let me know how you get on with the read generator; I would be very interested in using it to verify our sample analysis.

The_Roads
Old 07-15-2009, 01:28 PM   #11
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283
Default

For example, both BFAST and MAQ have read generators, with BFAST having a paired end read generator for ABI and SOLiD data. Most aligner authors have their own read generators to validate and benchmark their aligners.
Old 07-15-2009, 02:43 PM   #12
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 32
Default

Thanks nilshomer, don't ask don't get, I should have come to SEQanswers earlier.

Anyone know of any other paired-end read generators?

Are there any with which you can model read errors, duplicate removal, etc.? Or is this getting beyond their function?
Old 07-15-2009, 03:56 PM   #13
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283
Default

Quote:
Originally Posted by The_Roads View Post
Thanks nilshomer, don't ask don't get, I should have come to SEQanswers earlier.

Anyone know of any other paired-end read generators?

Are there any with which you can model read errors, duplicate removal, etc.? Or is this getting beyond their function?
Both come freely in the BFAST or MAQ distribution (I haven't checked other aligners).

I know the one I wrote (BFAST) models read errors for both Illumina and SOLiD, as well as SNPs and indels.

Why do you worry about duplicate removal? This can be frequent in practice in some cases.
Old 07-16-2009, 07:59 AM   #14
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 32
Default

I am trying to quantify rare variants in deep coverage of small templates. I am not a statistician/bioinformatics pro, but as far as I can see duplicate removal will introduce a bias that will enrich for rare variants, both real and introduced.

Aside from library prep and pipeline issues which introduce their own biases, are there any assemblers that are designed for this type of assembly as opposed to large ref seq low coverage (<100x) assemblies?
Old 07-16-2009, 08:28 AM   #15
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283
Default

Quote:
Originally Posted by The_Roads View Post
I am trying to quantify rare variants in deep coverage of small templates. I am not a statistician/bioinformatics pro, but as far as I can see duplicate removal will introduce a bias that will enrich for rare variants, both real and introduced.

Aside from library prep and pipeline issues which introduce their own biases, are there any assemblers that are designed for this type of assembly as opposed to large ref seq low coverage (<100x) assemblies?
Have you tried Velvet or ABySS? You can give either program the expected coverage and they will work fine, in my experience.
Old 07-16-2009, 01:12 PM   #16
The_Roads
Member
 
Location: USA

Join Date: May 2009
Posts: 32
Default

No, not yet. I'm going to have a run through with them as soon as I can. Thanks for the info.
Old 07-22-2009, 07:31 AM   #17
polivares
Member
 
Location: Manchester, UK

Join Date: Jan 2009
Posts: 29
Default

Hi all,
I am still confused about the difference between paired ends and mate pairs. Is it just that you call a pair of sequences (ends) from the same molecule mates?

Tnx
Old 07-22-2009, 07:44 AM   #18
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,287
Default

Quote:
Originally Posted by polivares View Post
Hi all,
I am still confused about the difference between paired ends and mate pairs. Is it just that you call a pair of sequences (ends) from the same molecule mates?

Tnx
Hey...Nice last name!
Old 07-22-2009, 07:53 AM   #19
polivares
Member
 
Location: Manchester, UK

Join Date: Jan 2009
Posts: 29
Default

Quote:
Originally Posted by ECO View Post
Hey...Nice last name!
That means?
Old 07-22-2009, 08:28 AM   #20
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,287
Default

Quote:
Originally Posted by polivares View Post
That means?
Olivares is my surname as well. Haven't met many before.
All times are GMT -8.