
**lh3** · 05-04-2012, 04:37 AM

In the 2nd example, the orientation is wrong.

**dg13** · 05-16-2012, 02:52 AM

Hi Heng,
Thanks for your quick reply. I'll expand on it a little for anybody else who is confused (I know I had to think a bit before I understood). There are two important points: first, we map to the + strand of the reference genome, and second, sequencing is always in the 5' -> 3' direction.

In example 2, the read mapping to the + strand of the reference (in this case the first of the pair) is located at a higher genomic position (793910) than the read mapping to the - strand of the reference (793838). This should never happen: if PE sequencing works as it is supposed to, we should always have one read from the + strand (reading 5' -> 3') and one from the - strand (again reading 5' -> 3'). The selection of reference strand is arbitrary, so the + strand read can be sequenced first or second, but the start of the + strand read will always map to a lower genomic position than the - strand read, because in correctly oriented reads sequencing is always in the 5' -> 3' direction, i.e.

R1----->

+ 5' ---------------- 3'
- 3' ---------------- 5'

               <----- R2

or

R2----->

+ 5' ---------------- 3'
- 3' ---------------- 5'

               <----- R1
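To make the rule in the diagrams concrete, here is a minimal Python sketch. It is not what BWA does internally; the `Read` tuple and the `pair_orientation` function are hypothetical names made up for illustration, and it only compares leftmost mapped positions, so it ignores edge cases such as heavily overlapping reads.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one mapped read of a pair."""
    pos: int          # leftmost mapped position on the reference
    is_reverse: bool  # True if the read maps to the - strand


def pair_orientation(a: Read, b: Read) -> str:
    """Classify a pair as 'FR' (expected), 'RF', or 'same-strand'.

    'FR' means the + strand read starts at a lower position than the
    - strand read, as in the diagrams above. Which read is R1 or R2
    does not matter.
    """
    if a.is_reverse == b.is_reverse:
        return "same-strand"
    fwd, rev = (b, a) if a.is_reverse else (a, b)
    return "FR" if fwd.pos <= rev.pos else "RF"


# Example 2 from this thread: + strand read at 793910, - strand read at 793838.
example2 = pair_orientation(Read(793910, False), Read(793838, True))
print(example2)
```

For example 2 this reports `RF`, i.e. the + strand read sits downstream of the - strand read, which is the wrong orientation Heng pointed out.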

In example 2 of the first post, the + strand read maps to a higher genomic position in the reference than the - strand read. It's not clear to me how this can happen, but perhaps it's possible if the reads are in a parallel orientation rather than opposing. In these cases, BWA computes the overlap rather than the insert size, and flags examples like this as improper pairs.

However, there are other categories of error which also get flagged as improper, for example insert sizes more than 3 SDs from the mean (I think this is correct - Heng?). Usually flagging these large inserts is the right thing to do: library prep involves a size selection step, so a large "insert" indicates poor mapping of one or both reads. In the special case of mapping PE RNA-seq reads to the genome (which you might want to do to identify novel transcripts etc.), you can easily get apparently large "inserts" in legitimate read pairs, because of introns between two exons. You want to keep those pairs, so care is warranted when using the improper pair filter in QC. I'm sure this is redundant info for many of you, but I thought I'd post it for anyone who is curious.
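For anyone who wants to check this on their own alignments, here is a rough Python sketch of the two checks discussed: reading the proper-pair bit from the SAM FLAG field, and spotting insert sizes far from the mean. The flag bits (0x2 proper pair, 0x10 read on reverse strand) are from the SAM format; the function names and the plain mean/SD cutoff are my own assumptions for illustration and are not BWA's actual estimator.

```python
import statistics

# Bits of the SAM FLAG field (from the SAM specification).
PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
READ_REVERSE = 0x10  # this read maps to the - strand


def is_proper_pair(flag: int) -> bool:
    """True if the aligner marked the pair as properly paired."""
    return bool(flag & PROPER_PAIR)


def strand(flag: int) -> str:
    """Return '-' if the read maps to the reverse strand, else '+'."""
    return "-" if flag & READ_REVERSE else "+"


def insert_outliers(tlens, n_sd=3.0):
    """Return absolute insert sizes more than n_sd SDs from the mean.

    tlens are values from the SAM TLEN column; zeros (no insert size
    available) are skipped. This is a naive illustration, assuming a
    simple mean and sample standard deviation.
    """
    sizes = [abs(t) for t in tlens if t != 0]
    mean = statistics.mean(sizes)
    sd = statistics.stdev(sizes)
    return [s for s in sizes if abs(s - mean) > n_sd * sd]


print(is_proper_pair(99), strand(99))    # flag 99: proper pair, + strand
print(is_proper_pair(65), strand(147))   # flag 65: not proper; flag 147: - strand
print(insert_outliers([200] * 20 + [5000]))
```

With RNA-seq reads mapped to a genome, a pair caught by a cutoff like this may simply span an intron, so treat the output as a list of pairs to inspect rather than pairs to discard.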

**jp.** · 07-30-2013, 09:17 PM

Hi dg13,
Which commands did you use to get example 1?
I am having trouble finding the real insert size in my RNA-seq data.
Looking forward to your reply.
Thank you.

BWA insert size calculation in paired end sequencing
