Hello all,
I have a similar problem: I am trying to determine the inner distance between my paired-end reads, and this forum thread has already helped me a lot.
My reads are Illumina HiSeq RNA-seq paired-end reads. The lab told me that the inner distance between the mates is 100-150 bp, since the reads are 100 bp long and the library was size-selected for 300-350 bp.
I have mapped some reads with TopHat using mean inner distances of 125 and 150 and SDs of 25 and 50, and I got fairly good results in my opinion. In one sample with 35 million reads and a mean inner distance of 150 (-r 150), 27 million reads were mapped, and 90% of these were properly paired. With -r 125, 26 million were mapped (88% properly paired).
However, I was uncertain about the inner distance, so I followed the advice given in this thread and elsewhere. I mapped some samples to the transcriptome with Bowtie, converted the resulting SAM files to BAM with samtools, and sorted them. I then used both the samtools script given above and the Picard tools InsertSizeMetrics, which gave a mean insert size of 150 and an SD of ~30. This would mean that my mean inner distance is 150 - 2*100 = -50. When I look manually at the coordinates in the Bowtie-produced BAM file, the coordinates of the two mates differ by 15-115 bp. This also points to a mean inner distance of about -50, since the mates seem to overlap somewhat.
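To make the arithmetic explicit, here is a minimal sketch of how I am converting insert size to inner distance. The function name is just something I made up for illustration, and it assumes both mates have the same read length:

```python
def inner_distance(insert_size, read_length):
    """Inner distance = insert (fragment) size minus the length of both mates.

    Hypothetical helper for illustration; assumes both mates are read_length bp.
    A negative result means the two mates overlap.
    """
    return insert_size - 2 * read_length


# Picard/samtools estimate: mean insert size 150, reads of 100 bp
print(inner_distance(150, 100))

# Lab's size selection of 300-350 bp, reads of 100 bp
print(inner_distance(300, 100), inner_distance(350, 100))
```

With the measured insert size of 150 this gives -50, while the lab's 300-350 bp size selection gives the 100-150 they reported, so the two estimates differ in the insert size itself, not in the conversion.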
All is well so far, but when I tried mapping some samples with my newly found mean inner distance of -50, the number of mapped reads decreased from 2.5 million to 2.3 million in one sample, and from 26-27 million to 26 million in the sample above. That is not much, but the biggest difference was in the properly paired reads reported by flagstat: before it was 90%, but with a mean inner distance of -50 it was only 57%!
To summarize:
- The lab technician reports an inner distance between the paired ends of 100-150.
- The Picard and samtools workflows, with Bowtie mapping to the transcriptome, report the distance as -50, but with different SD values. Which is correct?
- When mapping with -r -50 instead of -r 150, I get slightly fewer mapped reads and a major decrease in properly paired reads (using SD=30 from the Picard output).
This has left me confused, and I would appreciate any help and tips on why I get these results and on what the true mean inner distance and standard deviation are.
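In case it helps with cross-checking the SD values, below is a rough sketch of how the mean and SD of the insert size could be computed directly from SAM records. It is not the samtools script from this thread or what Picard does internally. It simply reads the TLEN field (column 9) of properly paired records (flag bit 0x2) and keeps the positive values so each pair is counted once. The function name and the example records are made up for illustration, and no outlier trimming is done, which I assume could be one reason different tools report different SDs.

```python
import statistics


def tlen_stats(sam_lines):
    """Return (mean, sample SD) of TLEN over properly paired SAM records.

    Hypothetical helper: skips header lines, requires flag bit 0x2
    (properly paired), and keeps only positive TLEN so that each pair
    contributes once. No outlier filtering is applied.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    return statistics.mean(sizes), statistics.stdev(sizes)


# Made-up records (only the first nine SAM columns) to show the input shape
example = [
    "@SQ\tSN:tx1\tLN:2000",
    "r1\t99\ttx1\t100\t255\t100M\t=\t120\t120",
    "r1\t147\ttx1\t120\t255\t100M\t=\t100\t-120",
    "r2\t99\ttx1\t300\t255\t100M\t=\t350\t150",
    "r2\t147\ttx1\t350\t255\t100M\t=\t300\t-150",
    "r3\t99\ttx1\t500\t255\t100M\t=\t580\t180",
    "r3\t147\ttx1\t580\t255\t100M\t=\t500\t-180",
]
mean_size, sd_size = tlen_stats(example)
print(mean_size, sd_size)
```

On a real file the lines could come from the text output of `samtools view` on the sorted BAM.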