Hi,
Let me start off by saying that I am thoroughly confused and I hope someone can help me make sense of my data.
I have stranded RNA-seq data which I believe was sequenced using the Illumina dUTP method.
This thread is a great resource for understanding the rationale behind stranded paired-end sequencing, and I believe I have understood it. http://seqanswers.com/forums/showthr...pt+orientation
My question relates to the inconsistency I am seeing when looking at my data in IGV.
I have aligned my reads using STAR, with --readFilesIn R1.fastq R2.fastq.
Since I'm interested in the reads corresponding to the negative transcript, I then filtered the aligned reads on the SAM flags 147 and 99.
To summarise my current understanding:
R1(+)R2(-) with positive insert size should be denoted as F1R2 - from the negative transcript - should have SAM flags R1(99)/R2(147)
R1(-)R2(+) with negative insert size should be denoted as F2R1 - from the positive transcript - should have SAM flags R1(83)/R2(163)
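As a sanity check on the flag values above, here is a minimal Python sketch that decodes the relevant SAM flag bits. The bit values come from the SAM specification; the function name `decode_flag` is made up for illustration and is not part of STAR, IGV or samtools.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
READ_REVERSE = 0x10
MATE_REVERSE = 0x20
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def decode_flag(flag):
    """Return (read label, read strand, mate strand) for a paired-end SAM flag.

    Hypothetical helper for illustration only.
    """
    if flag & FIRST_IN_PAIR:
        read = "R1"
    elif flag & SECOND_IN_PAIR:
        read = "R2"
    else:
        read = "?"
    strand = "-" if flag & READ_REVERSE else "+"
    mate_strand = "-" if flag & MATE_REVERSE else "+"
    return read, strand, mate_strand


for flag in (99, 147, 83, 163):
    print(flag, decode_flag(flag))
```

Under these definitions, 99 decodes to R1 on (+) with its mate on (-), and 83 decodes to R1 on (-) with its mate on (+), which matches the summary above.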
However, when I subsequently look at only those reads with flags 99 or 147 in IGV, I notice the following:
The pair orientation is F1R2 for all my read pairs; however, some R1 reads are mapped to the (-) strand and the R2 reads to the (+) strand, with a negative insert size (which makes me believe they should be denoted F2R1). I am also seeing R1 reads mapped to the (+) strand and the R2 reads mapped to the (-) strand, with a positive insert size (which makes me believe they should be denoted F1R2).
I believe I should only be seeing the F2R1 pair orientation among the reads with flags 99 or 147, so there seems to be an inconsistency somewhere.
Could someone please help me make sense of this? Most likely I have misunderstood the use of STAR, IGV or the SAM flags. Or perhaps the library is not what I think it is.
Any help would be greatly appreciated.
Some additional info:
- The genome I'm investigating is very small (5000 bp) and the transcripts on both strands will overlap.
- When I count the number of reads for each of the SAM flags mentioned, I get the following:
99 = 1440
147 = 1440
83 = 1113334
163 = 1113334
- I'm expecting to see more reads from the positive than from the negative transcripts
- When summing the number of reads with flags 99, 147, 83 and 163, the total is about 99% of the uniquely aligned reads.
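
The per-flag counts listed above can be reproduced from a SAM text file with something like the following minimal sketch. It assumes plain SAM input (for example, a BAM file converted to SAM text beforehand); the function name `count_flags` and the toy records are made up for illustration and are not real data.

```python
from collections import Counter


def count_flags(sam_lines):
    """Count alignments per SAM flag, skipping header lines that start with '@'."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1  # column 2 of a SAM record is the FLAG
    return counts


# Toy records (made up, truncated to the mandatory columns).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "pairA\t99\tgenome\t100\t255\t4M\t=\t200\t104\tACGT\tIIII",
    "pairA\t147\tgenome\t200\t255\t4M\t=\t100\t-104\tACGT\tIIII",
    "pairB\t83\tgenome\t300\t255\t4M\t=\t250\t-54\tACGT\tIIII",
    "pairB\t163\tgenome\t250\t255\t4M\t=\t300\t54\tACGT\tIIII",
]

flag_counts = count_flags(toy_sam)
for flag in (99, 147, 83, 163):
    print(flag, "=", flag_counts[flag])
```

Counting on the exact flag value (rather than on individual bits) keeps the comparison with the four numbers above direct, at the cost of ignoring reads that carry additional bits such as secondary or duplicate marks.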