
**sphil** · 08-02-2011, 06:30 AM

Originally posted by ashwatha View Post

Is there a directionality to the reads?

Yes, there is. It is normally indicated for each read by a '+' or '-', W(atson)/C(rick), or F(orward)/R(everse), so you can distinguish between reads from the two strands.

Originally posted by ashwatha View Post

Are all reads represented in the same "direction" as related to the genome from which they were extracted? Does this apply to the two pairs of a paired end read?

No, all reads are considered to be written from left to right. The strand flag should make clear which strand the read originated from.
To answer your question about how to find mate pairs in the sequence file: usually in the fastq file there is a flag at the end of the header line (normally '/1' or '/2') which indicates whether it is a 'front' or an 'end' read. Coming back to your example, it should look like this:
>Read1 more headerinfo /1
AGC
>Read2 more headerinfo /2
TCG

A nice overview of all this can be found at http://en.wikipedia.org/wiki/FASTQ_format , for instance.
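As a rough illustration of how the two mates can be told apart from the header alone, here is a minimal Python sketch. The function name `split_mate` is made up for this example, and it only assumes the two header styles that appear in this thread: the older trailing '/1' or '/2', and the newer Illumina style where the mate number is the first field after the space (e.g. `1:N:0:CGATGT`). Other header layouts are not handled.

```python
import re

def split_mate(header):
    """Return (read_name, mate_number) for a FASTQ header line.

    mate_number is 1 or 2, or None if neither known style matches.
    """
    text = header.lstrip("@").strip()
    name, _, comment = text.partition(" ")

    # Newer Illumina style: "<name> <mate>:<filtered>:<control>:<index>"
    m = re.match(r"^([12]):[YN]:\d+:", comment)
    if m:
        return name, int(m.group(1))

    # Older style: the header ends in "/1" or "/2"
    m = re.search(r"^(.*\S)\s*/([12])$", text)
    if m:
        return m.group(1), int(m.group(2))

    return text, None


print(split_mate("@read7/2"))
print(split_mate("@I-HWUSI-EAS1826:5:70N3AAAXX_FL:8:4:16707:8219 2:N:0:CGATGT"))
```

Two reads belong to the same pair when the returned names are equal and the mate numbers are 1 and 2.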

hope that helps,

best

phil

**swbarnes2** · 08-02-2011, 08:30 AM

Originally posted by ashwatha View Post

Hi,

I have a slightly weird question about paired end reads. I will try to explain as best as I can:

For simplicity, let's assume that the read length is just 3 base pairs. Let the DNA fragment being read have the sequence AGCTAAGGTCG.

With paired end reads, my understanding is that we will read the first three (AGC) and last three (TCG) bases of this sequence, with the middle section (TAAGG) unknown.

With the common data formats used to represent paired end reads (FastQ etc), how is the pair represented? Are the two pairs shown as AGC and TCG (both reads running left to right on the original sequence) or as AGC and GCT - the "left" read running from left to right and the "right" read running from right to left, presumably the direction in which the two reads were extracted?

I guess what I am asking is: Is there a directionality to the reads? Are all reads represented in the same "direction" as related to the genome from which they were extracted? Does this apply to the two pairs of a paired end read?

Please let me know if I am not making any sense at all :-)

Here's a real example from a Staph aureus run we did a few weeks ago. The first is from read 1, the second is from read 2.

@I-HWUSI-EAS1826:5:70N3AAAXX_FL:8:4:16707:8219 1:N:0:CGATGT
ATACATCCTCATTTCTCACTAATTTATTTCTGTTAAAATATTAAAACTAACATGATCCAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

@I-HWUSI-EAS1826:5:70N3AAAXX_FL:8:4:16707:8219 2:N:0:CGATGT
AATTACAGCGAAGGATTTATTAGAAAATATGCAAGCGTAGTAAATATTGAACCTAACCAA
+
IIIIIIIIIIIIIIIHIHIIIHIIIIIIIIIHIIIIIHIIIIIIIHIIIIIIIIIIGIII

If you blast those, you'll see that they run in opposite directions, and towards each other, as a proper paired end pair of reads should.

So actually, in your example, your reads would be AGC and CGA. Most alignment programs would report them both in the forward direction, and have a tag in there to tell you that the read is rev comped where appropriate.
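To make the AGC / CGA result concrete, here is a small Python sketch of the toy example from the original question. It assumes an idealised standard paired-end read: read 1 is the first few bases of the fragment, and read 2 is the reverse complement of the last few bases. The function names are just for illustration.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Return (read1, read2) for an idealised paired-end read of a fragment."""
    read1 = fragment[:read_length]                        # read from the left end
    read2 = reverse_complement(fragment[-read_length:])   # read inward from the right end
    return read1, read2


print(paired_end_reads("AGCTAAGGTCG", 3))  # ('AGC', 'CGA')
```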

**ashwatha** · 08-02-2011, 07:31 PM

Hi Phil and swbarnes,

thanks for the info - very helpful.

**chenyao** · 08-03-2011, 05:31 PM

Originally posted by swbarnes2 View Post

Here's a real example from a Staph aureus run we did a few weeks ago. The first is from read 1, the second is from read 2.

If you blast those, you'll see that they run in opposite directions, and towards each other, as a proper paired end pair of reads should.

So actually, in your example, your reads would be AGC and CGA. Most alignment programs would report them both in the forward direction, and have a tag in there to tell you that the read is rev comped where appropriate.

I don't get it. Do paired-end reads have to come from opposite directions (one is "+", the other is "-")? If so, why does your example show both reads as "+"?

**Stuart Inglis** · 08-03-2011, 06:41 PM

A little forward/reverse and paired end example

I thought maybe a little example would help (using the RTG Investigator tool chain, of course).

I grabbed a bit of the sequence above and manually made two reads. Two 10-mers, the first forward from the beginning of the sequence and second reverse complement from the end of the sequence. e.g. I grabbed the last 10-mer (CATGATCCAT) and reverse complemented it to get ATGGATCATG.

$ cat template.fasta
>test
ATACATCCTCATTTCTCACTAATTTATTTCTGTTAAAATATTAAAACTAACATGATCCAT

$ cat reads.fasta
>read1
ATACATCCTC
>read2
ATGGATCATG

$ rtg format -o t template.fasta

Run a single end mapping run:

$ rtg map -o o -i reads.fasta -F fasta -t t
$ zcat o/alignments.sam.gz | grep -v "@"
0 0 test 1 37 10= * 0 0 ATACATCCTC * AS:i:0 NM:i:0 IH:i:1 NH:i:1
1 16 test 51 37 10= * 0 0 CATGATCCAT * AS:i:0 NM:i:0 IH:i:1 NH:i:1

The second column of the SAM file is the flag field: bit 0x10 (16 in decimal) is set if the read maps to the reverse strand.

The SAM file contains the read in the forward direction (same as the template sequence), but this extra flag allows you to determine the direction.

In the paired end world this may look like:

$ cat left.fasta
>read
ATACATCCTC

$ cat right.fasta
>read
ATGGATCATG

Then run a paired-end mapping run:

$ rtg map -o o -l left.fasta -r right.fasta -F fasta -t t
$ zcat o/mated.sam.gz | grep -v "@"
0 99 test 1 55 10= = 51 60 ATACATCCTC * AS:i:0 NM:i:0 MQ:i:255 XA:i:0 IH:i:1 NH:i:1
0 147 test 51 55 10= = 1 -60 CATGATCCAT * AS:i:0 NM:i:0 MQ:i:255 XA:i:0 IH:i:1 NH:i:1

The second column is harder to decode now. Flags 99 and 147 mean the pair is mapped in the correct orientation and with the correct insert size. For a breakdown of the two codes see http://ppotato.wordpress.com/2010/08...-paired-reads/
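The flag values 16, 99 and 147 above can also be decoded by hand, since each is just a sum of bits. Here is a minimal Python sketch; the bit values follow the SAM specification, while the short names and the function `decode_flag` are made up for this example.

```python
# (bit value, short description) for each SAM flag bit
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the names of all bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (16, 99, 147):
    print(flag, decode_flag(flag))
```

So 99 = 1 + 2 + 32 + 64 (paired, proper pair, mate on the reverse strand, first in pair) and 147 = 1 + 2 + 16 + 128 (paired, proper pair, this read on the reverse strand, second in pair).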

Hope this helps.

cheers
Stu

**swbarnes2** · 08-03-2011, 09:46 PM

Originally posted by chenyao View Post

I don't get it. Do paired-end reads have to come from opposite directions (one is "+", the other is "-")? If so, why does your example show both reads as "+"?

It's a fastq file; it hasn't been mapped. The software that made it has no idea whether the read is in the forward or reverse direction. It doesn't even know what reference I want to align it to.

The plus is just a place holder. In the old days, before fastqs routinely had several million individual entries per file, the name of the read was rewritten after the + sign. Once fastqs started having millions of 40-mers and their 40 character quality scores, repeating the read name made each read 25% bigger than it had to be, so now, no one writes anything after that plus sign.
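A short Python sketch may make the role of the '+' line clearer. It assumes the simple four-lines-per-record layout (no wrapped sequence lines), and the function name `read_fastq` and the toy records are made up for illustration. Whatever follows the '+' is ignored, so both the old style (name repeated) and the current style (bare '+') parse the same way.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = (next(it, "").rstrip("\n") for _ in range(3))
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        # The '+' line is only a separator; it says nothing about strand.
        yield header[1:], seq, qual


new_style = ["@r1 1:N:0:X\n", "AGC\n", "+\n", "III\n"]
old_style = ["@r1 1:N:0:X\n", "AGC\n", "+r1 1:N:0:X\n", "III\n"]
print(list(read_fastq(new_style)) == list(read_fastq(old_style)))  # True
```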

And if you do a standard paired end read, then yes, the reads should point in at each other. I think that with mate paired reads, which are a more complex prep intended to greatly increase the genomic distance between the two ends, the reads point outward, but I might be mistaken on that point.

If you have paired end reads that don't point in at each other, then you have inaccurate reads, or an inaccurate reference as compared to the sample.

**Arthur123** · 02-14-2012, 08:21 PM

Originally posted by swbarnes2 View Post

It's a fastq file; it hasn't been mapped. The software that made it has no idea whether the read is in the forward or reverse direction. It doesn't even know what reference I want to align it to.

So for Illumina paired-end data, read 1 and read 2 do not denote forward and reverse, right?

**SES** · 02-15-2012, 08:30 AM

Originally posted by swbarnes2 View Post

I think that with mate paired reads, which are a more complex prep intended to greatly increase the genomic distance between the two ends, the reads point outward, but I might be mistaken on that point.

Yes, that is my understanding as well. Paired-end reads are "innie" and mate pairs are "outie." Sanger paired ends are generated from a completely different process (sequencing the ends of BAC clones) and the result is that those paired ends are "outie." This leads to a lot of confusion when using a mix of technologies, or using software that expects your paired ends in a certain orientation.
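For anyone who needs to check which orientation their mapped pairs have, here is a simplified Python sketch. It assumes you already have, for each mate, the leftmost mapped position and whether it mapped to the reverse strand (e.g. from SAM flag bit 0x10). The function name and labels are made up for this example, and it ignores edge cases such as overlapping mates.

```python
def pair_orientation(pos1, reverse1, pos2, reverse2):
    """Classify a mapped pair as 'innie', 'outie' or 'same-strand'.

    pos1/pos2: leftmost mapped positions of the two mates.
    reverse1/reverse2: True if that mate mapped to the reverse strand.
    """
    if reverse1 == reverse2:
        return "same-strand"
    forward_pos = pos2 if reverse1 else pos1
    reverse_pos = pos1 if reverse1 else pos2
    # Innie: the forward-strand mate sits to the left, pointing at its partner.
    return "innie" if forward_pos <= reverse_pos else "outie"


# The pair from the RTG example earlier in the thread: position 1 forward, position 51 reverse
print(pair_orientation(1, False, 51, True))  # innie
```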

**swbarnes2** · 02-15-2012, 09:34 AM

Originally posted by Arthur123 View Post

So for Illumina paired-end data, read 1 and read 2 do not denote forward and reverse, right?

The enzymes putting the adaptors on the piece of DNA have no idea which way your particular reference is oriented, and have no way of distinguishing which end of the DNA corresponds to the "forward" sequence. They are just molecules.

The only exception would be if you were doing something like a library of vectors with various insert sequences, and you wanted to know all the insert sequences. One could do PCR around those inserts, and put adaptor sequences on those PCR primers, and then adaptor 1 would be fixed at one point in the vector, and adaptor 2 would be fixed at the other end.

But if you are just randomly cutting DNA, then half of read 1 will be in one orientation, half will be in the other. Same with read 2.

**Arthur123** · 02-15-2012, 12:14 PM

Originally posted by swbarnes2 View Post

The enzymes putting the adaptors on the piece of DNA have no idea which way your particular reference is oriented, and have no way of distinguishing which end of the DNA corresponds to the "forward" sequence. They are just molecules.

The only exception would be if you were doing something like a library of vectors with various insert sequences, and you wanted to know all the insert sequences. One could do PCR around those inserts, and put adaptor sequences on those PCR primers, and then adaptor 1 would be fixed at one point in the vector, and adaptor 2 would be fixed at the other end.

But if you are just randomly cutting DNA, then half of read 1 will be in one orientation, half will be in the other. Same with read 2.

Thanks! You are awesome!

**fongchun** · 07-18-2013, 02:58 PM

Originally posted by sphil View Post

No, all reads are considered to be written from left to right. The strand flag should make clear which strand the read originated from.
To answer your question about how to find mate pairs in the sequence file: usually in the fastq file there is a flag at the end of the header line (normally '/1' or '/2') which indicates whether it is a 'front' or an 'end' read. Coming back to your example, it should look like this:
>Read1 more headerinfo /1
AGC
>Read2 more headerinfo /2
TCG
A nice overview of all this can be found at http://en.wikipedia.org/wiki/FASTQ_format , for instance.

hope that helps,

best

phil

Hi,

Are you sure about this? Because I have two paired fastq files from a MiSeq machine and here is the read pair:

Read Pair 1:

@M00569:20:000000000-A3EGF:1:1101:14488:1761 1:N:0:1
ACAGAATGTAAGCTTTCTAACTCATAAAACTCTTTCTGGAGGTCTGTAATTTTCTGCATAGGATCTTCATAAATCTGTTCTGAAAGTCTTATCTTTTGCTCTCTTCCTTTCTGCTGCATAAATCCATTTTCTTCTTCTTGCCTTGTTAGCA
+
>>>334DBDB55EGGGGG65FGGBG5555FGHHHHHHFFBA?EFGFHEFGHHHHHBFBHBBB3FGHHHFHFBBFGHBFHHHE5E3BFGHH5GGHHHHFDHGFHHHHHHHHHHFFHBG3F43EFGHFHHHHHHFHHHHHHBFGHF3GGF4F4

Read Pair 2:

@M00569:20:000000000-A3EGF:1:1101:14488:1761 2:N:0:1
NNGGGATGCTAATAGAGGATTATATTTATGAATCTTTAGTAGAAGACACGTACAATGGATCGGTAGATGGCAGTCTGCTAACAAGGCAAGAAGAAGAAAATGGATTTATGCAGCAGAAAGGAAGAGAGCAAAAGATAAGACTTTCAGAACA
+
##1111>1>D33B331111BFBGBGHHFHBFFFGGGHC1FB2B21CFBFCHFG?1FBB1FF//EA/AFDBG0EGGHFFHFFFFBGEFA0C00C10>BCCBGBB1FGHFGFGFFFF0C01CE0CAAG0>GHCBFFAHFFHEHHGHHBB2FF0

Here the second read of the pair is actually the reverse complement of the reference human sequence at that locus. So in the example that was given, I would have thought it would be:

>Read1 more headerinfo /1
AGC
>Read2 more headerinfo /2
CGA

Perhaps I am mistaken?

**sphil** · 07-19-2013, 12:15 AM

As swbarnes2 stated above, the reads are just as you said. I only wanted to point out that they are written from left to right, and therefore didn't mention that the second read is actually also reverse-complemented. So your second read should always be reverse-complemented relative to the strand the 'first' read maps to.

Maybe this http://www.illumina.com/technology/p...ing_assay.ilmn helps to clarify things for good.

Originally posted by swbarnes2 View Post

So actually, in your example, your reads would be AGC and CGA. Most alignment programs would report them both in the forward direction, and have a tag in there to tell you that the read is rev comped where appropriate.

**serrano.gus** · 07-23-2017, 04:14 AM

Hi. I was given raw reads by a service provider but there were no Left or Right reads. Is there any way that I could revert back to separate R and L?

**serrano.gus** · 07-23-2017, 04:22 AM

Please ignore my question. I already found the paired reads. Thanks.

Question about paired end reads