SEQanswers

Old 12-02-2009, 02:24 AM   #21
yvan.wenger
Member
 
Location: Switzerland

Join Date: Aug 2009
Posts: 28
Default paired-end and mate pairs

Paired-end and mate pairs. These two denominations refer to slightly different library preparations... correct?

Do we usually have the choice between the two? What are the practical differences between them? Anyone compared possible biases for example?

Best,

Yvan
Old 12-02-2009, 12:32 PM   #22
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246
Default

Illumina uses "paired end" for its original library preparation method, in which each end of the same molecule is sequenced. Because of the way the cluster generation technology works, it is limited to an inter-pair distance of ~300 bp (200-600 bp).

Illumina refers to "mate pairs" as sequences derived from their newer library prep method which is designed to provide paired sequences separated by a greater distance (between about 2 and 10kb). This method still actually only sequences the ends of ~400bp molecules, but this template is derived from both ends of a 2-10kb fragment that has had the middle section cut out and the 'internal' ends ligated in the middle. Basically, you take your 2-10kb random fragments, biotinylate the end, circularise them, shear the circles to ~400bp, capture biotinylated molecules, and then sequence those (they go into what is essentially a standard 'paired end' sample prep procedure).
Old 03-07-2010, 09:16 PM   #23
thondeboer
Member
 
Location: Bay Area

Join Date: Jan 2009
Posts: 24
Default Will original insert size of library be in SAM file header for Mate-paired sequences?

Would the original insert size for the library be listed in any resulting sequence alignment files (SAM/BAM, for instance) for mate-paired reads (typically 2-5 kbp), or would it list the insert size of the paired-end reads (typically ~500 bp)? This would be important for downstream structural variant analysis, I imagine...

I know that SAM/BAM is not (yet) produced natively by Illumina or SOLiD systems, but do any translators take care of this?

Thanks!
__________________
Thon
__________________________________
Thon de Boer, Ph.D.
Director of Product Management, Software
Strand Life Sciences
548 Market Street, Suite 82804
San Francisco, CA 94104, USA
[email protected]
www.strandls.com
Pioneers in Discovery Research Informatics
_______________________________________
Old 03-07-2010, 09:30 PM   #24
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by thondeboer View Post
Would the original insert size for the library be listed in any resulting sequence alignment files (SAM/BAM, for instance) for mate-paired reads (typically 2-5 kbp), or would it list the insert size of the paired-end reads (typically ~500 bp)? This would be important for downstream structural variant analysis, I imagine...

I know that SAM/BAM is not (yet) produced natively by Illumina or SOLiD systems, but do any translators take care of this?

Thanks!
Take a look at the SAM specification. The insert size of an alignment (ISIZE) is defined as the distance between the 5' ends of the reads after mapping. The PI field of the @RG line in the header allows reads with the same expected insert size to be identified.

So to answer your question, if a read has its two ends 2-5Kbp or 500bp apart, ISIZE should be set accordingly. Optionally, the predicted size can be inferred from PI/RG in the header (when available).
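
To make that definition concrete, here is a minimal Python sketch of the signed 5'-to-5' distance for a pair mapped to the same reference. The function names and arguments are hypothetical, positions are 1-based leftmost mapping coordinates as in the SAM POS column, and it assumes each read aligns over its full length with no clipping or indels. Real aligners may report a value that differs by one base, and later versions of the specification define the field differently, so treat this only as an illustration of the idea.

```python
def five_prime(pos, read_len, is_reverse):
    """1-based reference coordinate of a read's 5' end.

    pos is the leftmost mapped base. Assumes an ungapped, unclipped
    alignment of read_len bases (an assumption of this sketch).
    """
    return pos + read_len - 1 if is_reverse else pos


def isize(pos, read_len, is_reverse, mate_pos, mate_len, mate_is_reverse):
    """Signed distance from this read's 5' end to its mate's 5' end."""
    mate_end = five_prime(mate_pos, mate_len, mate_is_reverse)
    own_end = five_prime(pos, read_len, is_reverse)
    return mate_end - own_end


# A 75 bp forward read at position 1 whose reverse-strand mate starts at 226.
print(isize(1, 75, False, 226, 75, True))   # 299
print(isize(226, 75, True, 1, 75, False))   # -299
```

The sign simply tells you whether the mate lies to the right (positive) or to the left (negative) of the current read.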
Old 05-03-2010, 06:45 PM   #25
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Cool

I think paired end refers to sequencing from the ENDS of a DNA fragment. If you're doing 75-bp paired-end reads with an insert size of 300 bp, then the machine will sequence positions 1-75 from one end and 300-226 from the other (for simplicity, omitting the adapters).
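
The arithmetic can be sketched in a few lines of Python. The function name is hypothetical and the coordinates are 1-based along the fragment with adapters ignored. Counting 75 bases inclusively back from position 300 ends at position 226.

```python
def paired_end_coords(insert_size, read_len):
    """Return (read1, read2) as (first_base, last_base) in sequencing order.

    Coordinates are 1-based along the fragment, adapters ignored.
    Read 2 is sequenced from the opposite end, so it runs backwards.
    """
    if read_len > insert_size:
        raise ValueError("read length exceeds insert size")
    read1 = (1, read_len)
    read2 = (insert_size, insert_size - read_len + 1)
    return read1, read2


print(paired_end_coords(300, 75))  # ((1, 75), (300, 226))
```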

Mate pair requires a completely different protocol and is typically over longer distances such as 2-5kb. If you want to sequence a 5kb mate pair library, then 5kb fragments of DNA are isolated on the gel, the ends are biotinylated, the fragment is circularized and sheared. So now when you select using streptavidin, you'll get the fragment that has the ENDS of the original 5kb fragment. This fragment is then sequenced.
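
As a rough way to see what circularising and shearing does to read orientation, here is a toy Python simulation. It is only a sketch under simplifying assumptions: the genome is a random string, the biotin label and adapters are ignored, the junction sits exactly in the middle of the captured piece (real shearing does not guarantee that), and the names `mate_pair_reads`, `keep` and `read_len` are made up for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair_reads(fragment, keep=200, read_len=50):
    """Simulate reads from a circularised, sheared, captured fragment.

    keep is the number of bases retained on each side of the junction
    (a hypothetical value giving a ~400 bp template).
    """
    # Circularising joins the fragment's 3' end to its 5' end; the
    # captured piece spans that junction.
    junction_piece = fragment[-keep:] + fragment[:keep]
    read1 = junction_piece[:read_len]            # read from one end of the piece
    read2 = revcomp(junction_piece[-read_len:])  # read from the other end
    return read1, read2


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(6000))
fragment = genome[500:5500]  # a 5 kb fragment

read1, read2 = mate_pair_reads(fragment)
pos_fwd = genome.find(read1)           # read1 matches the forward strand
pos_rev = genome.find(revcomp(read2))  # read2 matches the reverse strand
print(pos_fwd, pos_rev)  # 5300 650
```

In this toy case the forward-strand read maps near the far end of the original 5 kb fragment and the reverse-strand read maps near its start, so the two reads point away from each other and sit several kb apart, even though the sequenced template was only about 400 bp.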

Mate pair is more relevant in genome assembly, especially for covering repetitive sequences. Paired end can be used for anything - RNA, DNA.

You can get more information on the Illumina site:
http://www.illumina.com/technology/p...ing_assay.ilmn

http://www.illumina.com/technology/m...ing_assay.ilmn

I'm not sure how 454 or other methods define these terms
Old 05-04-2010, 05:24 AM   #26
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Mate pair can also cover longer distances & is therefore more sensitive for detecting rearrangements / structural variation.

A third entry in this general category is what Pacific Biosciences calls strobe reads and Helicos calls dark fill. Essentially, sequence is read, then there is a gap of probabilistic length, then more sequence is read. This can be repeated many times to give a series of sequence islands separated by gaps with a constrained length distribution.
Old 05-12-2010, 02:03 PM   #27
plichel
Junior Member
 
Location: Germany

Join Date: Mar 2010
Posts: 9
Default

I am not very familiar with all the biotechnology, so want to ask:
Is it possible to resolve which pair of reads comes from one unique molecule? (For instance, does the sequencer track this information somehow?) I mean, there are usually a lot of reads, and in diploid systems there may be two 'versions' of the reads.
Old 06-07-2010, 01:34 AM   #28
hege
Junior Member
 
Location: Singapore

Join Date: Jun 2010
Posts: 1
Default

I understand that a normal read pair can be aligned with either F-B or B-F orientation. What I'm not too sure about is the importance of read1 and read2: does it mean anything if the mapped position of read1 is greater than that of read2?
Old 06-07-2010, 11:42 AM   #29
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by hege View Post
I understand that a normal read pair can be aligned with either F-B or B-F orientation. What I'm not too sure about is the importance of read1 and read2: does it mean anything if the mapped position of read1 is greater than that of read2?
Assuming the alignment is correct, that may indicate that a structural variation happened at that location (you want to check what other alignments at that location tell you).
__________________
-drd
Old 06-08-2010, 06:19 AM   #30
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Quote:
Originally Posted by plichel View Post
I am not very familiar with all the biotechnology, so want to ask:
Is it possible to resolve which pair of reads come from the one, unique molecule ? (For instance, that the sequencer tracks this information somehow.)
Yes. In any of these schemes the sequencer keeps track of the physical location in the flowcell of the first read for each polony (spot of DNA) and then matches it with the later read(s) taken from that same location.

For ligation technologies (Polonator & SOLiD), the reverse reads are really the same technology. For Illumina, there are some clever molecular acrobatics to replace the originally sequenced molecule with its reverse complement to do paired end sequencing. For mate pairs, I believe they just strip the old extended primer & anneal a new primer. Helicos proposed just adding unlabeled bases for a time period (hence the term "dark fill") and PacBio shuts off their laser and keeps extending (hence "strobe sequencing" is an apropos term).

Complete Genomics used a 4-part read structure & may have gone to even higher number of mates.

Keeping track of the location of polonies or individual DNA molecules is one of the core tricks to all these technologies.
Old 06-08-2010, 08:36 AM   #31
jdrum00
Member
 
Location: Houston, TX, USA

Join Date: Dec 2009
Posts: 16
Default

Thanks to everyone who contributes to these basic threads. I know they may get tedious for the old-timers, but they're vital for newer folks. Even if the same info is technically available at reference sites, it's often easier to understand when the answer is to a specific, basic question, rather than organized as vendor documentation or generalized teaching material!
Old 09-28-2010, 12:47 AM   #32
coswaters
Junior Member
 
Location: shanghai

Join Date: Sep 2010
Posts: 2
Default

scaffold and contig helps a lot~ thanks a lot~
Old 01-17-2011, 06:38 AM   #33
fedor5002
Junior Member
 
Location: Michigan

Join Date: Jan 2011
Posts: 1
Default About single-end read

For a double-stranded DNA molecule, are the single-end reads generated by sequencing from both 3' ends of the two strands of DNA?
Old 02-18-2011, 12:30 PM   #34
karve
Member
 
Location: Colorado

Join Date: Feb 2011
Posts: 12
Default

This helped, but not enough! I think I get it, but then questions pop up. This mates-in-a-pair, paired-read stuff comes up when I look at the specification for the SAM format output from the bowtie program. There are flag values that are returned to indicate if the aligned read is one of a pair and/or the first one in a pair and/or the second one in a pair, and so on.

But how does it, meaning bowtie, know? Am I right in thinking that it is specified in the input record? I can see the .fastq input format has a placeholder for this, so I'm guessing that other input formats, e.g. SRA, also have it? And if they don't, then there's no way for bowtie (or other algorithms) to derive it?

Seeing as this is all equipment-specific, I'm going to need to look at some videos that describe the front parts of this entire operation. Any ideas where? I was hoping to just pick it up and work with it from the point of view of sequences of characters, an IT perspective, but I guess not.
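
For the flag values mentioned above, a small Python sketch of how the bits of the SAM FLAG column can be decoded is shown here. The bit values come from the SAM specification; the dictionary, the wording of the descriptions and the function name are illustrative only. As far as I know, the aligner does not derive pairing from the sequences themselves: it is told which reads are mates by how the input is supplied (bowtie takes the two mate files through its -1 and -2 options) and then records that in the flag.

```python
# Bit values from the FLAG column of the SAM specification.
FLAG_BITS = {
    0x1: "read is paired",
    0x2: "mapped in a proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first read in pair",
    0x80: "second read in pair",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(99))
# ['read is paired', 'mapped in a proper pair',
#  'mate on reverse strand', 'first read in pair']
```

A flag of 99 is 64 + 32 + 2 + 1, which is why those four descriptions come back.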
Old 06-28-2011, 02:06 PM   #35
edilana.gomes
Junior Member
 
Location: Braga

Join Date: Mar 2011
Posts: 1
Default What is the difference between mate pairs, paired end and single end?

Hi!!
Can anyone clarify the difference between mate-pair, paired-end and single-end reads?

Thanks.
Old 06-28-2011, 03:02 PM   #36
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246
Default

Quote:
Originally Posted by edilana.gomes View Post
Hi!!
Can anyone clarify the difference between mate pair, pair end and single end reads?

Thanks.
Hi,

Mate-pairs and paired-end reads have been covered. Single end reads are just that... a single read from one end of each sheared DNA fragment.

Scott.
Old 08-18-2011, 10:15 AM   #37
raonyguimaraes
Member
 
Location: Belo Horizonte - Brazil

Join Date: Jun 2010
Posts: 38
Default

Hello,

Suppose I have two reads from an exome in FASTQ format. How can I determine whether they are paired-end, single-end or mate-pair?
Old 10-14-2011, 10:12 PM   #38
arkal
advancing one byte at a time!
 
Location: Bangalore, India

Join Date: Jun 2011
Posts: 56
Default

Quote:
Originally Posted by raonyguimaraes View Post
Hello,

Suppose I have two reads from an exome in FASTQ format. How can I determine whether they are paired-end, single-end or mate-pair?
I could be wrong, but I think you can get your answer by decoding the name of the FASTQ read, i.e. the string following the @ symbol at the start of each record.

-A
Old 10-15-2011, 11:39 PM   #39
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by raonyguimaraes View Post
Hello,

Suppose I have two reads from an exome in FASTQ format. How can I determine whether they are paired-end, single-end or mate-pair?
Paired-end or mate-pair reads have to have a means of knowing which two reads go together. Usually, this is by the name. In Illumina at least, reads are normally named by their coordinates on the flowcell. So if you don't have two reads with the same coordinates, you've got single end.
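
A minimal Python sketch of that name check follows. It assumes older Illumina-style headers in which mates share a name and differ only by a trailing /1 or /2; the example names and the helper names `base_name` and `classify` are made up for illustration. Note that names alone cannot tell paired-end from mate-pair data, since both have two reads per name.

```python
def base_name(header):
    """Strip the leading '@' and a trailing /1 or /2 mate suffix.

    Assumes names shaped like '@MACHINE:lane:tile:x:y/1' (illustrative).
    """
    name = header.lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def classify(headers):
    """Split read names into those that have a mate and those that do not."""
    counts = {}
    for header in headers:
        key = base_name(header)
        counts[key] = counts.get(key, 0) + 1
    paired = sorted(k for k, n in counts.items() if n == 2)
    single = sorted(k for k, n in counts.items() if n == 1)
    return paired, single


headers = ["@HWI:1:1:100:200/1", "@HWI:1:1:100:200/2", "@HWI:1:1:300:400/1"]
print(classify(headers))
# (['HWI:1:1:100:200'], ['HWI:1:1:300:400'])
```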

Paired ends run towards each other, and are about 100-500 bp apart. Mate pairs run away from each other, and tend to be a few kb apart, but I believe they are sometimes contaminated with ordinary paired-end data.
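
Once the reads are mapped, the "towards" versus "away" distinction can be checked with something like the Python sketch below. The function name and the FR/RF labels are just one common shorthand, positions are leftmost mapping coordinates, and the sketch ignores edge cases such as mates that overlap almost completely.

```python
def pair_orientation(pos1, reverse1, pos2, reverse2):
    """Classify a mapped pair by the strand of its leftmost read.

    pos1/pos2 are leftmost mapping positions; reverse1/reverse2 are True
    for reads aligned to the reverse strand.
    Returns 'FR' (reads point towards each other, paired-end style),
    'RF' (reads point away from each other, mate-pair style),
    or 'same-strand'.
    """
    if reverse1 == reverse2:
        return "same-strand"
    # Look at the strand of whichever read maps further left.
    left_is_reverse = reverse1 if pos1 <= pos2 else reverse2
    return "RF" if left_is_reverse else "FR"


print(pair_orientation(100, False, 325, True))   # FR
print(pair_orientation(650, True, 5300, False))  # RF
```

Combining this with the distance between the two reads gives a reasonable first guess at the library type, and FR pairs a few hundred bp apart inside a mate-pair run would be the paired-end contamination mentioned above.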
Old 11-03-2011, 05:04 PM   #40
ywlim
Junior Member
 
Location: California

Join Date: Jul 2011
Posts: 6
Default

I have some questions about aligning paired-end sequencing reads. I am using the BWA sampe function to align my paired-end reads and it worked, but surprisingly almost all reads are being paired with reads on different chromosomes, resulting in a lot of "improper reads". I don't understand why BWA did that, and I wonder if it was because I used the command "bwa sampe -a 15000 -A" to force bwa not to run Smith-Waterman alignment for unmapped reads.

Also, if paired-end reads share the same x and y coordinates, which are given in the first line of each FASTQ record, why doesn't bwa just pair them up by their coordinates? That seems like the most straightforward way to find the right pair to me.