Seqanswers Leaderboard Ad

**wjeck** · 05-17-2011, 05:48 AM

I would guess (and this is a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, splice reads are being counted as covering the intron, which is def not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?

**dariober** · 05-17-2011, 06:36 AM

Originally posted by wjeck View Post

I would guess (and this is a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, splice reads are being counted as covering the intron, which is def not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?

Well thought wjeck, thanks so much! Here's below the picture from IGV.

The plot I first showed is from mpileup. I thought samtools mpileup would have been aware of gaps (splice-junctions) as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I executed. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!?)

samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

Dario

Attached Files

igv.jpg (34.1 KB, 1461 views)

**zorph** · 05-17-2011, 06:50 AM

I have have some RNA-seq data and i have been to talks involving RNA-seq data and it seems that introns seem to be retained.

When I talked to a guy from PacBio presenting his data (where he had retained introns) he stated that this could be due to pre-mRNA being included in your sample or it could be a signal that these introns might be doing more (i.e. playing a regulatory role).

It's unknown as of now but I've seen 3 different sets of data and they all have intron retention :-/.

**pbluescript** · 05-17-2011, 06:57 AM

It is a tricky question and it is something a lot of people are seeing, but you almost have to take it on a case-by-case basis to figure out what might be going on.
In addition to the pre-mRNA and retained intron possibilities, some of the reads may map to multiple places or map to repeat regions of the genome. I place less trust in those types of reads.

**wjeck** · 05-17-2011, 07:33 AM

Originally posted by dariober View Post

Well thought wjeck, thanks so much! Here's below the picture from IGV.

The plot I first showed is from mpileup. I thought samtools mpileup would have been aware of gaps (splice-junctions) as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I executed. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!?)

samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

Dario

Yep, that's your problem. I don't know how to fix this because I haven't been working with RNA, but we had this problem when we were trying to use coverage to do indel detection (in a very naive way). We just ended up taking another route. Could you go through the SAM file and break the reads with introns into two separate records? It's a hack, but it might work.
--will

**Richard Finney** · 05-17-2011, 07:50 AM

This picture, via Cancer Genome Workbench (cgwb.nci.nih.gov) shows rnaseq reads hitting introns. The wiggle plot show logarithmic coverage for a TCGA rnaseq samples. It may be immature rna or intronic control RNAexpression as others have theorized. This intronic rna expression appears quite common in many samples and most of the reads, in this case, appear to be "uniquely" mappable.

Attached Files

intron.jpg (33.1 KB, 311 views)

**nilshomer** · 05-17-2011, 09:06 AM

Could you show a SAM record that is being improperly counted? Curiosity here...

**steven** · 05-19-2011, 02:34 AM

What I usually observe from public RNA-seq data is that about 10-15% of the non-spliced reads do overlap annotated introns. It can be much higher, depending on the library preparation. It also depends on the reference annotation (for instance Refseq is more restrictive than Ensembl).

Possible reasons:
- "background noise"
- "pervasive transcription"
- partially processed mRNAs
- genomic contamination
- functional non coding RNAs
- new transcripts / splicing isoforms
- degradation products (they can be polyAdenylated and even re-capped)

**dariober** · 05-19-2011, 06:01 AM

Originally posted by nilshomer View Post

Could you show a SAM record that is being improperly counted? Curiosity here...

Hi nilshomer - Thank you and everyone else for discussion,

Here some lines from the SAM and pileup output just at the border between the first exon-intron. The CIGAR string of the reads spanning the intron have a 586 bp insertion, which is the size of the intron. These insertions are counted in the pileup file.

For the first time I realize that the insertion is coded as '>' or '<' in the pileup column reporting the bases. So a 'corrected' pileup should have at each position the read count minus the count of '>' and '<', right?

Any comments welcome!

All the best

Code:

EBRI093151_0001:7:65:657:1096#TCTCCC	73	7	27658852	255	35M	*	0	0	TGCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAG	BCCCCBCBBCAAABCBA>@@BBCC@:<;A?0?@;>	NM:i:0	NH:i:1
EBRI093151_0001:7:10:722:420#TCTCTT	163	7	27658853	255	35M	=	27658864	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	AA+>BC@CACBBBBBBB;)@CBBAC>@1?&=;%=3	NM:i:1	NH:i:1
EBRI093151_0001:7:100:1536:385#CCCCCC	97	7	27658853	255	35M	=	27659701	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	BAAA5=>A9AB>?>-372?29BB?.':?A?)3)@2	NM:i:1	NH:i:1
EBRI093151_0001:7:105:1158:856#TTAACT	163	7	27658856	255	31M586N4M	=	27659472	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTTC	<C;?9?BCBC6B6BC>>CBACCCA8@1=@AC>C=*	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:72:996:1595#TCTCTA	83	7	27658856	255	35M	=	27658780	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTGC	C4BBBA;BBBABBBBB@6@BB?BBBBCCABBBC@B	NM:i:2	NH:i:1
EBRI093151_0001:7:21:730:103#ACCTCC	163	7	27658859	255	28M586N7M	=	27659486	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	ACCCC2CBBBBA@ABB@1><A@A1>;%=>=B7;)7	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:42:94:608#TCCCCC	161	7	27658859	255	28M586N7M	=	27659700	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	@B@CC?CBAAA?5>ACCA>9??A8@7;>3>B=?57	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:117:1539:645#CCTCTC	97	7	27658859	255	28M586N7M	=	27659706	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	BCB@A=AA;5?@BBBB=49?<>>=4628,=<A3##	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:117:341:1130#CTTACC	147	7	27658862	255	25M586N10M	=	27658795	0	GGTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTG	C;=ABCC@@@>BCAAACC>??=BBC9>?ABBA=;@	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:86:263:1733#TTTCTA	73	7	27658863	255	24M586N11M	*	0	0	GTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGG	BBAB?A;1<B?B=82:92;:79785=72;648594	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:10:722:420#TCTCTT	83	7	27658864	255	23M586N12M	=	27658853	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	@C@ACCB?BA4ABCB@ABBBBBBBBAC;@BBBCCB	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:58:1684:1481#TATCTA	83	7	27658864	255	23M586N12M	=	27658781	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	ABA8CBB:7<ABC>CBAA@BBBBA=ABBCACABCB	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:73:1052:643#CCCCCA	97	7	27658866	255	21M586N14M	=	27659703	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	BAA?A@A=BA?@>;=A=??=?4?=?:3;317>::5	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:89:1185:1493#CCCCCT	99	7	27658866	255	21M586N14M	=	27659486	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	@AA?=A==A@2=<+B8(<@1@1@9?>1>=4=-.=:	NM:i:0	XS:A:+	NH:i:1
EBRI093151_0001:7:65:504:840#CCTCTT	99	7	27658867	255	20M586N15M	=	27659469	0	TCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCCC	BABA<B?BB*@0;>69A@=97=BAB<6A?=9>=7@	NM:i:0	XS:A:+	NH:i:1


7	27658852	N	75	TTtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTttt	4AA>A?>CABA>=AAA69BA>B@=94>A@A@CB:A8ABABBAB@BA>B@BBAA7B=B:A8B>9@B?=AB@@?CB>
7	27658853	N	76	GGgGgGgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGggg^~G	9AB=ABBCABCBABAC:@CB@ABBA:BACBB>C:A@BCBC=BBBA?@ABB@B?0>:B4C@B>>BB@?=B@:BBBAA
7	27658854	N	76	CCcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC	1=@A@ABCABBAAAB;99BA>BB=B>?@BBCAC@CCACBC?BCB=A?BCBC<B8@AA?CBBB@@BBB=B;B?ABBA
7	27658855	N	76	A$AaAaAaaaaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaGaaaaaaaaaaaaaaaAAaaAaaaA	3A>>3B<:=B<B49=0=0A?@@8-;;?@B7BA@;>?:AA>=?9:.:9=C?B=<657=8?<;?8;=676;64A8=?+
7	27658856	N	77	CcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC^~C^~c	=@AAACB@BAB?BB;B/BB?BA=A:BBB@CCCAB:@CCC>CCC@A>CCBC4B5C>BABA:BBABBB<?@B?B?B><C
7	27658857	N	77	TtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	?B<A?@?ABCB9AB>A>BA;A@A<.@?C9BB>8A3?B?B;ABA>?@?CA>=A4>5A9@A>;?:@??>BAAA:?<BC4
7	27658858	N	77	T$t$TtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	>A:?B@@<BCB:ABBB?A28@@?@?AAB%;BC=A<3BB=;?>8=<;?C>A=?=>:@0>A@>A>=6?ABB?A:>9C;B
7	27658859	N	76	C$cCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccCCc^~C	@AACCAB@@@BB=B@C<=BBAB=ABA=BCCBCA@BC??CBB>A:BACB<A;C<BBCBBBB5@BC7ABBA@BB@?BA
7	27658860	N	75	g$GgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG	BBB@BBA8@BB;BABB99BABAAB:B?CCACBBCB@BCBBAA>?BBB@B>B>@;BBB?AAC@@:CBABB>BC9BC
7	27658861	N	74	A$a$a$aaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaA	3BBBBB3=B@2@<BB>=AAA.?52A@BAACCBABCBCBBABA>@BB8B@BAB?;BBBBABBA(>B==C?BA?AC
7	27658862	N	72	g$g$g$G$g$ggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG^~g	@B@9@AB>A@CBABCBBA?A:BA>BBCBACBCBCBA@BA@BAB9BBBAA?CBBAB=BBA<CBAACBACB;CC
7	27658863	N	67	g$g$G$GGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgGg	@@ACABB74AAB?>B9B8CC=CB>CACBBA8AB?4CBABA>AAA2CBA;@ABA@=CA?BBB<BCBC;
7	27658864	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttCTttTtttTTtTt^~t^~t	?)CB:BBBB(3499ACC/CB@BBB@CCC>BBCCBA1B?C7B7BBAB@>AB>(>@B4CC>BBB2=@A
7	27658865	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTttt	@2CB=9B>B1ABBB=>B/C>?>B+2CBB@BCBCBC5?BB?ABBB@B>BB>>6@AABCBABCBCACB
7	27658866	N	67	A$A$a$aaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaAaaa^~A	@ABBB>BAC:A>B@=CB;ACBCA>;CAC=BCABBA2?BCBA@B@AB@AAB@7=BBACBBB6ABB@A@
7	27658867	N	67	t$t$t$t$t$t$TTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTtttT^~T^~T^~t	BBBB@B5BAC>CCB?CABBBCA?@ABB@BCC>1BABBB7B@[email protected]?BBBBCA8ABA>
7	27658868	N	63	C$C$C$C$c$cccccccccccccccccccccCcccccccccccccccCCccCcccCCcCcccCCCc^~C^~c	48>@AACCBCCACC>A?=ABBABAB@,B@BAAABABAABACB4?@B0C?BB6BBCCCAA@AA;
7	27658869	N	58	g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$gggggGgggggggggggggggGGggGgggGGgGgggGGGgGg	BABBBBBAB?A?BBABABCC?2BABBA@BCBB>BCBB8BBB8CBABBBB@CB?BBA@=
7	27658870	N	45	g$g$g$g$g$GgagggggggggggggGGggGgggGGgGgggGGGgGg^~g^~g^~g	BBBBB5@>BABA@B?A@ACBB.@B:@BBC;CBA@BB=AB;A)6=:
7	27658871	N	40	C$c$c$cccccccccccccCCccCcccCCcCcccCCCcCcccc	.BBCBABBB8?BBCCB>:AA9CCB)>B@@?:A<9>A@7A?
7	27658872	N	38	c$c$c$c$c$c$c$ccccccCCccCcccCCcCcccCCCcCcccc^~c	BBABBB>AABCCB9@@C6CCB@>@A>B7=BA=BB&;18
7	27658873	N	34	c$a$c$c$ccCCccCcccCCcCcccCCCcCccccc^~C^~C^~c	BBBBCB-ABC=CCACC6BBA<=?B;B@6;>?BB#
7	27658874	N	30	c$a$CCccCcccCCcCcccCCCcCcccccCCc	BB8BBB=CC>BB@BC4AABBBA(<@?=CC#
7	27658875	N	29	C$C$ccCcccCCcCcccCCCcCcccccCCc^~C	::BC8CCCBAB@AAB@BA@B3>A==CB#B
7	27658876	N	27	g$a$AaaaAAaAaaaAAAaAaaaaaAAaA	BB.CCCACB1ABC2*+@@+ABBBCB#B
7	27658877	N	30	GgggGGgGgggGGGgGgggggGGgG^~g^~g^~g^~g^~g	1ACCCC?>AC>=@@=@,=B;BC@#B9A42:
7	27658878	N	32	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaa^~a^~a	1CCC>CB<CBC<0AB3<>A=?A9#B>B:=;>#
7	27658879	N	37	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaaaa^~a^~a^~a^~a^~a	1ACB@ABAC@B+;B@@*=A??B>#A?A9A<B62<+9>
7	27658880	N	42	G$g$g$gGGgGgggGGGgGgggggGGgGgggggggggggg^~g^~g^~g^~g^~g	/>BC18B@>AAB>B?B>A>A>BA#B=B@?A?7;;:8A2:;>?
7	27658881	N	39	gGGgGgggGGGgGgggggGGgGggggggggggggggggg	B?@BA?BA86C9A5=B1=B>;AAB<BAA3=<33<2A7;?
7	27658882	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	B&1C1?B@(9<B=@<AA?B=5@AC8A?@3=;;<B1@;@A
7	27658883	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	C==C>=BB<AB;B3?A?AA>3B@CB@AB97?:;37?<;@
7	27658884	N	39	g$GGgGgggGGGgGgggggGGgGggggggggggggggggg	B;@A;BBB@@@>?7AA3AC=/B>B?A@=05=1;=7=>=@
7	27658885	N	38	AAaAaaaAAAaAgaaaaAAaAaaaaaaaaaaaaaaaaa	%AB%BBB1=BAA(9AAAC=4A@B=B>A7>=6:@0?<=B
7	27658886	N	38	GGgGgggGGGgGgggggGGgGggggggggggggggggg	=CB=CBB@9B6A@8B;:C?4B?B8?;A555:8=-?5A<
7	27658887	N	38	T$>t><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	3>B>9BA17A0>7A?AAB?:.:@?B>@>8&;:96A??0
7	27658888	N	37	>t><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>C>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
7	27658889	N	37	>g><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>@>9BA17A0>3A?AAB?:.:@?B>@>8&;:96A??0
7	27658890	N	37	>c$><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>B>9BA17A0><A?AAB?:.:@?B>@>8&;:96A??0
7	27658891	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658892	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658893	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658894	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658895	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658896	N	36	>><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658897	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658898	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658899	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658900	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658901	N	36	>><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
7	27658902	N	36	>><<<>>><>g$<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
7	27658903	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
7	27658904	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
7	27658905	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
7	27658906	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0

**arecht** · 09-01-2011, 09:50 AM

Hi dariober, I was wondering if this problem of removing the gaps in the pileup step has ever been resolved for you? I ran into the same issue. Any tip would be greatly appreciated!

**chrisbala** · 09-14-2011, 01:16 PM

has anyone seen any discussion of the influence of intronic reads on transcript prediction? Or I suppose, mention of how people are dealing with this? I've raised this issue elsewhere and continue to be plagued by it... i really can't find any automated solution for dealing with this. it seems we have to get down to business and manually annotate genes... fun!

**malachig** · 09-15-2011, 10:46 AM

The ALEXA-seq and Trans-ABySS manuscripts discuss the implications of intronic mapping reads for measuring expression and transcript assembly respectively. Due to all of the possible sources of sequences that will map to introns (nicely summarized by Steven above), it is important to consider whether the signal you see for an exon is really above that of the surrounding areas. RNA-seq is not a zero noise data type.

. You will always have some amount of contaminating genomic DNA and unprocessed RNA (hnRNA). There are molecular strategies for reducing these (e.g. DNAseI treatment, cytoplasmic RNA isolation, polyA+ selection, etc.) but there will always be some. So, you expect signal across the genome in an RNA-seq library for purely molecular reasons and analytical sources add on top of this (e.g. multi-mapping reads). You can actually see hints of the difference between intergenic noise and intronic noise by examining the data. Intronic noise is expected to be correlated with gene expression level (more expression results in more unprocessed RNA when you take a snapshot of the transcriptome). This is exactly what we see in our RNA-seq libraries. This complicates the interpretation of whether an intron is really being retained as part of an alternative isoform. As others have mentioned, local context is important when interpreting RNA-seq data.

For all of these reasons, one can argue that exon-exon junctions might be a useful metric for evaluating RNA-seq library quality. The reason being that exon-exon sequences from real transcripts for the most part do not occur in the genome (and therefore not in unprocessed transcripts either). We also routinely evaluate exon-to-intron and exon-to-intergenic signal, among many other metrics. For example here is a report for a:
Human Breast vHMECS RNA-seq library.

Full disclosure. I am an author on the two papers linked above. Perhaps others can link to others that discuss this topic!

**cjp** · 11-25-2011, 05:22 AM

Maybe HTseq-count deals with counting spliced reads:

From this page:

http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

"Make sure to use a splicing-aware aligner such as TopHat. HTSeq-count makes full use of the information in the CIGAR field."

Chris

**adameur** · 11-28-2011, 02:29 PM

Here is an answer to why there are so many intronic reads: it's because RNA-seq measures ongoing transcription!

We have recently published a paper in Nat Struct Mol Biol where this is described. See also this thread for more info.

Topics	Statistics	Last Post
Bacterial Timeline Study Suggests Oxygen Use Preceded Photosynthesis by seqadmin Started by seqadmin, Today, 12:59 PM	0 responses 6 views 0 reactions	Last Post by seqadmin Today, 12:59 PM
New Software Simplifies 3D Gene Expression Mapping by seqadmin Started by seqadmin, Yesterday, 10:17 AM	0 responses 8 views 0 reactions	Last Post by seqadmin Yesterday, 10:17 AM
AI Tool Creates High-Resolution 3D Maps of the Mouse Brain by seqadmin Started by seqadmin, 03-20-2025, 05:03 AM	0 responses 49 views 0 reactions	Last Post by seqadmin 03-20-2025, 05:03 AM
Studying Microbial Gene Transfer with RNA Barcoding by seqadmin Started by seqadmin, 03-19-2025, 07:27 AM	0 responses 60 views 0 reactions	Last Post by seqadmin 03-19-2025, 07:27 AM

Seqanswers Leaderboard Ad

Intron coverage in RNAseq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News