#1 |
Member
Location: India Join Date: Jul 2011
Posts: 14
Hi,
I have a slightly weird question about paired-end reads. I will try to explain as best I can.
For simplicity, let's assume that the read length is just 3 base pairs. Let the DNA fragment being read have the sequence AGCTAAGGTCG. With paired-end reads, my understanding is that we will read the first three (AGC) and last three (TCG) bases of this sequence, with the middle section (TAAGG) unknown.
With the common data formats used to represent paired-end reads (FASTQ etc.), how is the pair represented? Are the two reads shown as AGC and TCG (both reads running left to right on the original sequence) or as AGC and GCT - the "left" read running from left to right and the "right" read running from right to left, presumably the direction in which the two reads were extracted?
I guess what I am asking is: Is there a directionality to the reads? Are all reads represented in the same "direction" relative to the genome from which they were extracted? Does this apply to the two reads of a paired-end read?
Please let me know if I am not making any sense at all :-)
Last edited by ashwatha; 07-31-2011 at 09:54 PM. Reason: grammar
#2 | |
Senior Member
Location: Stuttgart, Germany Join Date: Apr 2010
Posts: 192
Yes, there is. Normally it is indicated for each read through a '+' or '-', W(atson)/C(rick), or F(orward)/R(everse), so you can distinguish between reads from both strands.
To answer your question about how one is able to find mate pairs in the sequence file: usually in the FASTQ file there is a flag at the end of the header line (normally '/1' or '/2') which indicates whether it is a 'front' or an 'end' read. Coming back to your example, it should look like this:
>Read1 more headerinfo /1
AGC
>Read2 more headerinfo /2
TCG
A nice overview of all such stuff can be found at http://en.wikipedia.org/wiki/FASTQ_format, for instance.
Hope that helps,
best
phil
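To make the '/1' and '/2' convention concrete, here is a minimal Python sketch that groups reads into pairs by stripping that suffix from the header. It assumes both mates share the same read name and differ only in the trailing '/1' or '/2'; the function names and the example read names (frag1, frag2) are made up for illustration. Newer Illumina headers put the read number in a separate field after a space instead, which this sketch does not handle.

```python
from collections import defaultdict


def split_mate_suffix(header):
    """Return (name, mate) for 'name/1' or 'name/2'; mate is None if absent."""
    header = header.strip()
    if header.endswith("/1") or header.endswith("/2"):
        return header[:-2].rstrip(), int(header[-1])
    return header, None


def pair_reads(records):
    """Group (header, sequence) tuples by read name.

    Returns a dict such as {name: {1: seq_of_mate_1, 2: seq_of_mate_2}}.
    Reads without a /1 or /2 suffix are skipped in this sketch.
    """
    pairs = defaultdict(dict)
    for header, seq in records:
        name, mate = split_mate_suffix(header)
        if mate is not None:
            pairs[name][mate] = seq
    return dict(pairs)


# Hypothetical input: frag1 has both mates, frag2 is missing its second mate.
records = [("frag1/1", "AGC"), ("frag1/2", "CGA"), ("frag2/1", "TTG")]
pairs = pair_reads(records)
complete = {name: mates for name, mates in pairs.items() if len(mates) == 2}
print(complete)
```

Keeping incomplete pairs in `pairs` and filtering them into `complete` afterwards makes it easy to see which reads lost their mate.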
#3 | ||
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
So actually, in your example, your reads would be AGC and CGA. Most alignment programs would report them both in the forward direction, and have a tag in there to tell you that the read is rev comped where appropriate. |
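The AGC / CGA result can be checked with a few lines of Python. This is only a sketch of the idea described above: read 1 is taken from the start of the fragment as-is, and read 2 is the reverse complement of the fragment's end. The function names are made up for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_len):
    """Return (read1, read2) as they would appear in the two FASTQ files."""
    read1 = fragment[:read_len]                        # read from the left end
    read2 = reverse_complement(fragment[-read_len:])   # read from the opposite strand
    return read1, read2


fragment = "AGCTAAGGTCG"
read1, read2 = paired_end_reads(fragment, 3)
print(read1, read2)  # AGC CGA
```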
#4 |
Member
Location: India Join Date: Jul 2011
Posts: 14
Hi Phil and swbarnes,
thanks for the info - very helpful. |
#5 | |
Member
Location: Beijing Join Date: Jul 2011
Posts: 74
#6 |
Registered Vendor
Location: New Zealand Join Date: Jun 2011
Posts: 9
I thought maybe a little example would help (using the RTG Investigator tool chain, of course).
I grabbed a bit of the sequence above and manually made two reads: two 10-mers, the first forward from the beginning of the sequence and the second reverse complemented from the end of the sequence. That is, I grabbed the last 10-mer (CATGATCCAT) and reverse complemented it to get ATGGATCATG.
$ cat template.fasta
>test
ATACATCCTCATTTCTCACTAATTTATTTCTGTTAAAATATTAAAACTAACATGATCCAT
$ cat reads.fasta
>read1
ATACATCCTC
>read2
ATGGATCATG
$ rtg format -o t template.fasta
Run a single-end mapping run:
$ rtg map -o o -i reads.fasta -F fasta -t t
$ zcat o/alignments.sam.gz | grep -v "@"
0 0 test 1 37 10= * 0 0 ATACATCCTC * AS:i:0 NM:i:0 IH:i:1 NH:i:1
1 16 test 51 37 10= * 0 0 CATGATCCAT * AS:i:0 NM:i:0 IH:i:1 NH:i:1
The second column of the SAM file is a bit field: the bit 0x10 (which equals 16 in decimal) is set if the read maps to the reverse strand. The SAM file contains the read in the forward direction (same as the template sequence), but this extra flag allows you to determine the direction.
In the paired-end world this may look like:
$ cat left.fasta
>read
ATACATCCTC
$ cat right.fasta
>read
ATGGATCATG
Then run a paired-end mapping run:
$ rtg map -o o -l left.fasta -r right.fasta -F fasta -t t
$ zcat o/mated.sam.gz | grep -v "@"
0 99 test 1 55 10= = 51 60 ATACATCCTC * AS:i:0 NM:i:0 MQ:i:255 XA:i:0 IH:i:1 NH:i:1
0 147 test 51 55 10= = 1 -60 CATGATCCAT * AS:i:0 NM:i:0 MQ:i:255 XA:i:0 IH:i:1 NH:i:1
The second column is harder to decode now. 99 and 147 mean mapped in the correct orientation and with the correct insert size. For a breakdown of the two codes see http://ppotato.wordpress.com/2010/08...-paired-reads/
Hope this helps.
cheers
Stu
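The flag values in the second column (0, 16, 99, 147) can be unpacked bit by bit. Below is a small Python sketch that does this. The bit values follow the SAM specification; the short label strings are my own naming, not official field names.

```python
# (bit, label) pairs for the SAM FLAG field; labels are informal.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


for flag in (0, 16, 99, 147):
    print(flag, decode_flag(flag))
```

So 99 = 1 + 2 + 32 + 64 is the first read of a properly paired pair whose mate is on the reverse strand, and 147 = 1 + 2 + 16 + 128 is the second read, itself on the reverse strand.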
#7 | |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
The plus is just a placeholder. In the old days, before FASTQ files routinely had several million individual entries per file, the name of the read was rewritten after the + sign. Once FASTQ files started having millions of 40-mers and their 40-character quality scores, repeating the read name made each read 25% bigger than it had to be, so now no one writes anything after that plus sign.
And if you do a standard paired-end read, then yes, the reads should point in at each other. I think that with mate-pair reads, which are a more complex prep intended to greatly increase the genomic distance between the two ends, the reads point outward, but I might be mistaken on that point.
If you have paired-end reads that don't point in at each other, then you have inaccurate reads, or an inaccurate reference as compared to the sample.
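To show where the plus line sits in a record, here is a minimal Python sketch of a FASTQ reader. It assumes the common layout of exactly four lines per record (no line-wrapped sequences and no blank lines), and it accepts the plus line with or without the repeated read name. The function name and the example records are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # Line 1 starts with '@'; line 3 starts with '+' and may repeat the name.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\n"), seq, qual


# Two hypothetical records: a bare '+' line, and the older style with the name repeated.
example = "@read1/1\nAGC\n+\nIII\n@read1/2\nCGA\n+read1/2\nIII\n"
records = list(read_fastq(io.StringIO(example)))
print(records)
```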
#8 |
Junior Member
Location: TX Join Date: Feb 2012
Posts: 2
So for Illumina paired-end data, read 1 and read 2 do not denote forward and reverse, right?
#9 |
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
Yes, that is my understanding as well. Paired-end reads are "innie" and mate pairs are "outie." Sanger paired ends are generated from a completely different process (sequencing the ends of BAC clones) and the result is that those paired ends are "outie." This leads to a lot of confusion when using a mix of technologies, or using software that expects your paired ends in a certain orientation.
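For anyone checking which orientation their mapped pairs have, here is a rough Python sketch that classifies a pair as "innie" or "outie" from the mapped positions and strands of the two reads. It is a simplification: it assumes both reads map to the same reference sequence, it compares only leftmost mapping positions, and it treats overlapping reads that start at the same position as "innie". The function name and labels are made up for illustration.

```python
def pair_orientation(pos1, is_reverse1, pos2, is_reverse2):
    """Classify a mapped read pair by relative orientation.

    pos1, pos2: leftmost mapping positions of the two reads.
    is_reverse1, is_reverse2: True if the read maps to the reverse strand.
    """
    if is_reverse1 == is_reverse2:
        return "same-strand"
    # Work out which read is on the forward strand and which on the reverse.
    if is_reverse1:
        forward_pos, reverse_pos = pos2, pos1
    else:
        forward_pos, reverse_pos = pos1, pos2
    # Forward read to the left of the reverse read: the reads point at each other.
    return "innie" if forward_pos <= reverse_pos else "outie"


# The pair from the mapping example earlier in the thread: positions 1 (+) and 51 (-).
print(pair_orientation(1, False, 51, True))
```

In terms of the SAM flag, `is_reverse` corresponds to bit 0x10 of each read.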
#10 | |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
The only exception would be if you were doing something like a library of vectors with various insert sequences, and you wanted to know all the insert sequences. One could do PCR around those inserts, and put adaptor sequences on those PCR primers, and then adaptor 1 would be fixed at one point in the vector, and adaptor 2 would be fixed at the other end. But if you are just randomly cutting DNA, then half of read 1 will be in one orientation, half will be in the other. Same with read 2.
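The "half and half" point can be illustrated with a toy simulation in Python. This is a sketch under stated assumptions, not a model of real library prep: fragments are cut at random positions from a random genome, and each fragment is sequenced from either strand with equal probability. All names and parameters (genome length, fragment length, read length, seed) are arbitrary choices for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(genome, frag_len, read_len, rng):
    """Cut one random fragment and return (read1, strand1, read2, strand2)."""
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]
    # Assumption: the adaptors attach in either orientation with equal probability.
    read1_strand = rng.choice("+-")
    if read1_strand == "-":
        fragment = reverse_complement(fragment)
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    read2_strand = "-" if read1_strand == "+" else "+"
    return read1, read1_strand, read2, read2_strand


rng = random.Random(42)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(1000))
pairs = [simulate_pair(genome, 200, 25, rng) for _ in range(10000)]
forward_fraction = sum(1 for p in pairs if p[1] == "+") / len(pairs)
print(round(forward_fraction, 2))
```

Read 1 ends up on the forward strand in roughly half of the pairs, and read 2 is always on the opposite strand from its mate.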
#11 | |
Junior Member
Location: TX Join Date: Feb 2012
Posts: 2
#12 | |||
Member
Location: Vancouver, BC Join Date: May 2011
Posts: 55
Are you sure about this? Because I have two paired FASTQ files from a MiSeq machine and here is the read pair:
Read Pair 1:
Quote:
Quote:
>Read1 more headerinfo /1
AGC
>Read2 more headerinfo /2
CGA
Perhaps I am mistaken?
#13 |
Senior Member
Location: Stuttgart, Germany Join Date: Apr 2010
Posts: 192
As swbarnes2 stated above, the reads are just as you said. I just wanted to point out that it is going from left to right, and therefore didn't mention that it is actually also reverse complemented. So your second read should always be the reverse complement of the locus the 'first' read maps to.
Maybe this http://www.illumina.com/technology/p...ing_assay.ilmn helps to clarify things for good.
#14 |
Junior Member
Location: Philippines Join Date: Mar 2017
Posts: 3
Hi. I was given raw reads by a service provider but there were no Left or Right reads. Is there any way that I could revert back to separate R and L?
#15 |
Junior Member
Location: Philippines Join Date: Mar 2017
Posts: 3
Please ignore my question. I already found the paired reads. Thanks.
Tags |
paired end read, sequencing |