**maize** · 12-04-2012, 06:36 PM

htseq

Hi all,
Another interesting thing I tried was treating the paired-end reads as "single-end" data and running them through TopHat and HTSeq. With single-end data I don't have to struggle with the issue above. Below is an example table from htseq-count. The end of the table is the counting summary, to which I added "on_feature", "total", and "on_feature(%)". Counting the reads as pairs does increase the on-feature ratio, by about 8 percentage points (59.2% vs. 51.2%). The per-gene counts for paired-end vs. single-end are also interesting to compare.

| gene | paired-end | single-end |
|---|---:|---:|
| GRMZM2G061626 | 7 | 8 |
| GRMZM2G061629 | 13 | 24 |
| GRMZM2G061655 | 2 | 2 |
| GRMZM2G061662 | 61 | 111 |
| GRMZM2G061663 | 55 | 93 |
| GRMZM2G061672 | 128 | 185 |
| GRMZM2G061681 | 202 | 375 |
| GRMZM2G061684 | 20 | 38 |
| GRMZM2G061695 | 12 | 19 |
| GRMZM2G061700 | 19 | 20 |
| GRMZM2G061702 | 74 | 113 |
| no_feature | 225430 | 352813 |
| ambiguous | 95104 | 161366 |
| too_low_aQual | 0 | 0 |
| not_aligned | 0 | 0 |
| alignment_not_unique | 3325479 | 8191346 |
| total | 8939003 | 17842485 |
| on_feature | 5292990 | 9136960 |
| on_feature (fraction of total) | 0.592123081 | 0.512090104 |
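
The three added rows follow from the standard htseq-count ones: total is the sum of all gene rows plus the special counters, and on_feature(%) is on_feature / total (a fraction, despite the % label). For the paired-end column, 5292990 + 225430 + 95104 + 0 + 0 + 3325479 = 8939003, and 5292990 / 8939003 ≈ 0.592. A minimal sketch of that arithmetic, assuming tab-separated `name<TAB>count` lines as htseq-count writes them; `summarize` and the `ALL_GENES` placeholder row are hypothetical, used only because the post shows a handful of genes rather than the full list.

```python
# Special counters htseq-count appends to its output. Newer releases prefix
# them with "__" (e.g. "__no_feature"); older ones, like the run above, do not.
SPECIAL_COUNTERS = {"no_feature", "ambiguous", "too_low_aQual",
                    "not_aligned", "alignment_not_unique"}


def summarize(lines):
    """Return (on_feature, total, on_feature / total) for htseq-count output lines."""
    on_feature = 0
    special = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.rsplit("\t", 1)
        if name.lstrip("_") in SPECIAL_COUNTERS:
            special += int(count)      # no_feature, ambiguous, ...
        else:
            on_feature += int(count)   # ordinary gene row
    total = on_feature + special
    fraction = on_feature / total if total else 0.0
    return on_feature, total, fraction


# Hypothetical input: one placeholder row carries the summed gene counts.
paired_end = [
    "ALL_GENES\t5292990",
    "no_feature\t225430",
    "ambiguous\t95104",
    "too_low_aQual\t0",
    "not_aligned\t0",
    "alignment_not_unique\t3325479",
]
print(summarize(paired_end))
```

With a real file the same function can take an open file handle directly, since it only iterates over lines.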

**sdriscoll** · 12-05-2012, 11:50 PM

An interesting observation by the developers of RSEM (http://www.biomedcentral.com/1471-2105/12/323) is that, for a given sequencing budget, gene-level expression estimates actually come out better from short single-end reads (think 50 bp reads) than from long paired-end reads. Naturally, paired-end reads provide much better evidence of isoform structure than single-end reads. In my own past tests, aligning only the left-side reads of paired data versus aligning the pairs made a negligible difference to the expression estimates. Aligning just the left side of each pair isn't much different from what happens with single-end reads anyway: in both cases you're aligning a read from one end of a fragment.

To address your original question, I'm not sure whether HTSeq does anything about those random unmated alignments. I know it needs the two mates of a pair to be next to each other in the file in order to count properly (each pair should count as 1, not 2). If you have sorted the alignments by read name, then whenever both mates of an unmated pair actually aligned, their records would appear next to each other in the SAM file. That isn't the case in what you posted, so my guess is that HTSeq will count that 3rd alignment towards whatever feature it aligned to. If both ends aligned but didn't pair, I'm not sure what it would do; it might detect that the same read name aligned to different features and throw it out.
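
To make the "pairs must be adjacent" point concrete, here is a simplified, hypothetical model of counting each fragment once from name-sorted alignments. It is not HTSeq's code: `count_fragments` and the `(read_name, feature)` record shape are assumptions, and real htseq-count also applies overlap modes, strandedness, and quality/NH filters. It does show why sorting matters and one plausible handling of mates that land on different genes.

```python
from collections import Counter
from itertools import groupby


def count_fragments(records):
    """Count each read name (fragment) once.

    records: iterable of (read_name, feature) tuples, where feature is a gene ID
    or None for no overlap. Input must already be sorted by read_name, because
    groupby only merges *adjacent* records; unsorted input counts a pair twice.
    """
    counts = Counter()
    for _name, group in groupby(records, key=lambda rec: rec[0]):
        # Union of features hit by every alignment sharing this read name
        features = {feature for _, feature in group if feature is not None}
        if not features:
            counts["no_feature"] += 1
        elif len(features) == 1:
            counts[next(iter(features))] += 1
        else:
            # Mates (or extra alignments) hit different genes
            counts["ambiguous"] += 1
    return counts


records = [
    ("read1", "geneA"), ("read1", "geneA"),  # proper pair, same gene
    ("read2", "geneA"), ("read2", "geneB"),  # mates on different genes
    ("read3", None), ("read3", None),        # pair outside any gene
    ("read4", "geneB"),                      # unmated single alignment
]
print(count_fragments(records))
```

Feeding the same records in coordinate order instead of name order would split each pair into two groups, which is exactly the double counting described above.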

I think the test in your second post demonstrates the improved alignment confidence of paired-end reads. It's interesting, however, that if you compare differential expression between the paired-end alignments and the single-end alignments of the same paired data, the results are similar if not identical. The improved alignment accuracy is still desirable for other types of analysis, such as splicing or mutation detection.
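
Short of a full differential-expression run, a quick sanity check on how closely two count tables agree is the correlation of their log counts. The sketch below uses the eleven genes from maize's table; `compare` and the log2(count + 1) transform are illustrative choices, not part of either poster's workflow. Since single-end mode counts each mate separately while paired-end mode counts a pair once, single-end counts of up to roughly twice the paired-end value are expected when both mates land in the same gene.

```python
import math
import statistics

# Per-gene counts from maize's table: (paired-end, single-end)
COUNTS = {
    "GRMZM2G061626": (7, 8),
    "GRMZM2G061629": (13, 24),
    "GRMZM2G061655": (2, 2),
    "GRMZM2G061662": (61, 111),
    "GRMZM2G061663": (55, 93),
    "GRMZM2G061672": (128, 185),
    "GRMZM2G061681": (202, 375),
    "GRMZM2G061684": (20, 38),
    "GRMZM2G061695": (12, 19),
    "GRMZM2G061700": (19, 20),
    "GRMZM2G061702": (74, 113),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare(counts):
    """Return (log2 count correlation, median single/paired ratio)."""
    pe = [math.log2(p + 1) for p, _ in counts.values()]
    se = [math.log2(s + 1) for _, s in counts.values()]
    ratios = [s / p for p, s in counts.values() if p > 0]
    return pearson(pe, se), statistics.median(ratios)


r, median_ratio = compare(COUNTS)
print(f"log2 Pearson r = {r:.3f}, median single/paired ratio = {median_ratio:.2f}")
```

Eleven genes are far too few to judge anything; on a full table, the same comparison gives a first impression before any DE tool is run.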

Htseq counting

Comment

Comment

Latest Articles

ad_right_rmr

News