
**kopi-o** · 11-13-2011, 11:33 AM

FWIW, I don't really think the "properly paired" statistic is meaningful in this context, because of the intron issue that you discuss. I assume TopHat uses the mate inner distance in a meaningful way internally, but I don't care much about the percentage of properly paired reads reported by, e.g., samtools flagstat. After all, much of the transcriptome is spliced.
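For reference, the "properly paired" figure comes from the 0x2 bit of each record's SAM FLAG. Below is a minimal Python sketch of that percentage, assuming we count only primary records (neither 0x100 secondary nor 0x800 supplementary) that were paired in sequencing (0x1). It roughly mirrors the flagstat line but is not the samtools implementation, and `proper_pair_fraction` is a hypothetical helper.

```python
# SAM FLAG bits used here (values from the SAM format specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def proper_pair_fraction(flags):
    """Return properly-paired / paired-in-sequencing over primary records."""
    paired = proper = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # skip non-primary alignments
        if flag & PAIRED:
            paired += 1
            if flag & PROPER_PAIR:
                proper += 1
    return proper / paired if paired else 0.0


if __name__ == "__main__":
    # Flags quoted in this thread: 99/147 (paper) and 65/129 (re-alignment).
    example = [99, 147, 65, 129]
    print(f"{proper_pair_fraction(example):.0%}")  # prints 50%
```

Because the test is a bitwise check on 0x2 rather than a match on exact flag values, it also counts proper pairs whose other bits (e.g. duplicate) happen to be set.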

**rnaseek** · 11-13-2011, 12:17 PM

kopi-o's suggestion to check the percentage of properly paired reads will help you find out whether your analysis is OK. It does help to take the fragment size distribution from the library prep and use it to compute the mean inner distance. For the library prep we used (one sample), a small inner distance (20-100) gave fewer properly paired reads, while for longer distances (>100) the percentage reached close to 90%.
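To turn a library-prep fragment size into TopHat's `-r/--mate-inner-dist` value, subtract both read lengths; the TopHat manual's own example is 300 bp fragments with 50 bp reads, giving `-r 200`. A small sketch that does this for a list of fragment sizes and also reports the spread (a candidate for `--mate-std-dev`), assuming the sizes exclude adapter sequence and both mates have the same length. The helper name and the sample sizes are illustrative, not real data.

```python
import statistics


def inner_distance_stats(fragment_sizes, read_length):
    """Return (mean, sd) of the mate inner distance.

    Assumes inner distance = fragment size - 2 * read length, i.e. the
    fragment sizes exclude adapters and both mates have the same length.
    """
    inner = [size - 2 * read_length for size in fragment_sizes]
    mean = statistics.mean(inner)
    # Subtracting a constant shifts the mean but leaves the spread unchanged.
    sd = statistics.stdev(inner) if len(inner) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    # Illustrative fragment sizes (bp) for 50 bp reads.
    mean, sd = inner_distance_stats([280, 300, 320], read_length=50)
    print(f"-r {mean:.0f} --mate-std-dev {sd:.0f}")  # prints -r 200 --mate-std-dev 20
```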

**ozs2006** · 11-27-2011, 07:35 AM

related question

Hello all,

I understand that in TopHat 1.3.3 there is no need to specify the inner distance.
I used data from a recently published paper, in which all reads in the repository were marked as properly paired (flags 99 or 147).
However, when I re-aligned them using TopHat 1.3.3, they mapped to the same positions (hg19) but were not marked as properly paired (they got flags 129 or 65).
My question is: why don't I get the same flags?

For example:
The paper's alignment:
HWUSI-EAS371_0021:3:28:19038:18734#0/ 147 chrM 16261 255 60M = 16162 -159 CCCCTCACCCACTAGGATATCAACAAACCTACCCACCCTTAACAGTACATAGCACATAAA EEEE?EAEEEFFFGGGGGGBGGFGGECEEC<CEE@GGGGGGGGGGGGGGGGGGGGGGGGG NM:i:2 XS:A:+

My re-alignment (using tophat 1.3.3):
HWUSI-EAS371_0021:3:28:19038:18734#0 129 chrM 16261 255 60M = 16162 -159 CCCCTCACCCACTAGGATATCAACAAACCTACCCACCCTTAACAGTACATAGCACATAAA EEEE?EAEEEFFFGGGGGGBGGFGGECEEC<CEE@GGGGGGGGGGGGGGGGGGGGGGGGG NM:i:2 NH:i:1
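Decoding the two FLAG values for this read shows exactly what changed. The sketch below lists the bits set in a SAM FLAG, using the bit meanings from the SAM specification; `decode_flag` is a hypothetical helper, not part of any library.

```python
# SAM FLAG bit meanings, as listed in the SAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for flag in (147, 129):
        print(flag, decode_flag(flag))
    # Bits present in the paper's flag but missing after re-alignment:
    print(decode_flag(147 & ~129))  # prints ['proper pair', 'reverse strand']
```

So 147 and 129 differ only in 0x2 (proper pair) and 0x10 (reverse strand). Since 129 also lacks 0x20 (mate reverse strand), the re-alignment placed both this mate and its partner on the forward strand.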

Thanks in advance,
Oz Solomon

**rnaseek** · 11-27-2011, 08:23 AM

Interesting. I believe many things could potentially cause this. I am a bit unclear on how you did the "re-alignment". Did you align the whole dataset again, or only the reads with flags 99 or 147? If the latter, did you make sure to include the mate sequences as well? I think flags 83, 99, 147, and 163 will give you all the properly paired reads (each pair twice, actually).
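The "twice" follows from how mate records mirror each other: the two records of a mapped pair swap the first/second-in-pair bits (0x40/0x80) and the read/mate-strand bits (0x10/0x20), so a properly paired fragment appears as either 99 + 147 or 83 + 163. A small hypothetical helper that derives the expected mate flag, keeping only the pairing-related bits (an assumption: per-record bits such as duplicate or QC-fail are ignored):

```python
def expected_mate_flag(flag):
    """Predict the FLAG of the mate record from the pairing-related bits.

    Keeps 0x1 (paired) and 0x2 (proper pair) and swaps the bits that
    describe "this read" vs "its mate": 0x4/0x8 (unmapped), 0x10/0x20
    (reverse strand) and 0x40/0x80 (first/second in pair). Other bits
    (secondary, duplicate, ...) are per record and are ignored here.
    """
    mate = flag & (0x1 | 0x2)
    for own, other in ((0x4, 0x8), (0x10, 0x20), (0x40, 0x80)):
        if flag & own:
            mate |= other
        if flag & other:
            mate |= own
    return mate


if __name__ == "__main__":
    # Prints 99 <-> 147, 147 <-> 99, 83 <-> 163, 163 <-> 83.
    for flag in (99, 147, 83, 163):
        print(flag, "<->", expected_mate_flag(flag))
```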

**ozs2006** · 11-27-2011, 10:40 AM

Thanks for the quick reply

As you noted, it is very strange, because all the publicly available reads are flagged as 99 or 147, and I used all of them.

**ozs2006** · 11-27-2011, 11:10 AM

I created the fastq files from the sam files of the publication (using awk) and then ran tophat.

1. awk:

awk '{if($2==99) print "@" $1 "\n" $10 "\n" "+\n" $11}' > sample_1.fq
awk '{if($2==147) print "@" $1 "\n" $10 "\n" "+\n" $11}' > sample_2.fq

2. The TopHat command I used:

/tophat-1.3.3.Linux_x86_64/tophat -p 8 --min-anchor-length 15 --splice-mismatches 0 --keep-tmp --GTF /data/pipeline_in/Genomes/Human_GRCh37/Homo_sapiens.GRCh37.64.gtf /data/pipeline_in/Genomes/Human_GRCh37/Index/hg19 sample_1.fq sample_2.fq
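One likely (but unconfirmed) explanation lies in the awk step, which copies SAM columns 10 (SEQ) and 11 (QUAL) verbatim. Per the SAM specification, reads with the 0x10 (reverse strand) bit, which 147 includes, store SEQ reverse-complemented and QUAL reversed relative to the read as sequenced. sample_2.fq would then hold mate 2 in the opposite orientation, so on re-alignment both mates land on the forward strand, which matches the 65/129 flags and would not count as a proper pair. Below is a minimal Python sketch of a SAM-to-FASTQ conversion that undoes this. The function names are hypothetical, and only A/C/G/T/N bases are complemented.

```python
# Complement table for plain A/C/G/T/N bases (upper and lower case);
# other IUPAC codes are passed through unchanged in this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sam_record_to_fastq(line):
    """Turn one SAM alignment line into a FASTQ record in sequencing orientation.

    SAM stores SEQ reverse-complemented and QUAL reversed for reads with the
    0x10 (reverse strand) bit, so both are flipped back here.
    """
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:
        seq, qual = revcomp(seq), qual[::-1]
    return f"@{qname}\n{seq}\n+\n{qual}\n"


def split_pairs(sam_lines):
    """Yield (mate_number, fastq_record) for primary paired alignments."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        flag = int(line.split("\t", 2)[1])
        if flag & (0x100 | 0x800):
            continue  # secondary or supplementary alignment
        if flag & 0x40:
            yield 1, sam_record_to_fastq(line)
        elif flag & 0x80:
            yield 2, sam_record_to_fastq(line)


def convert(sam_lines, out1, out2):
    """Write mate-1 records to out1 and mate-2 records to out2 (text streams)."""
    for mate, record in split_pairs(sam_lines):
        (out1 if mate == 1 else out2).write(record)
```

To use it, call `convert` with the open SAM file and two output files opened for writing (e.g. the hypothetical `paper.sam`, `sample_1.fq`, `sample_2.fq`). The sketch keeps records in SAM order. If the SAM is coordinate-sorted, the two output files will generally not list mates in the same order, and Bowtie-based aligners expect the Nth read of file 1 to pair with the Nth read of file 2, so sorting the SAM by read name first is a sensible precaution.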

Inner distance value for TopHat / Proper mapping with RNA-Seq PE data

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News