Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Intron coverage in RNAseq

    Hi All,

    Two questions about RNAseq coverage:

    1) Introns: I find that introns get quite a lot of reads mapping to them compared to the exons. Is this expected or is it a sign of something that went wrong? (I would expect introns to be almost invisible).

    2) Read distribution: The read count varies a lot in the exons while it seems quite uniform in the introns. Is there any explanation for this?

    I attach here a plot of read count vs position (mpileup) in two samples for a gene as an example of what I mean (at a glance, other genes show similar patterns).

    Reads & alignments: c.ca 30 million reads, 35 bp pair-end aligned with Tophat.

    I realize these issues must have been discussed a number of times but I couldn't find any threads/papers.

    Any and all comments welcome!
    Dario

    Attached Files

  • #2
    I would guess (and this is a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, splice reads are being counted as covering the intron, which is def not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?

    Comment


    • #3
      Originally posted by wjeck View Post
      I would guess (and this is a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, splice reads are being counted as covering the intron, which is def not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?
      Well thought wjeck, thanks so much! Here's below the picture from IGV.

      The plot I first showed is from mpileup. I thought samtools mpileup would have been aware of gaps (splice-junctions) as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I executed. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!?)

      samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

      Dario

      Attached Files

      Comment


      • #4
        I have have some RNA-seq data and i have been to talks involving RNA-seq data and it seems that introns seem to be retained.

        When I talked to a guy from PacBio presenting his data (where he had retained introns) he stated that this could be due to pre-mRNA being included in your sample or it could be a signal that these introns might be doing more (i.e. playing a regulatory role).

        It's unknown as of now but I've seen 3 different sets of data and they all have intron retention :-/.

        Comment


        • #5
          It is a tricky question and it is something a lot of people are seeing, but you almost have to take it on a case-by-case basis to figure out what might be going on.
          In addition to the pre-mRNA and retained intron possibilities, some of the reads may map to multiple places or map to repeat regions of the genome. I place less trust in those types of reads.

          Comment


          • #6
            Originally posted by dariober View Post
            Well thought wjeck, thanks so much! Here's below the picture from IGV.

            The plot I first showed is from mpileup. I thought samtools mpileup would have been aware of gaps (splice-junctions) as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I executed. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!?)

            samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

            Dario


            Yep, that's your problem. I don't know how to fix this because I haven't been working with RNA, but we had this problem when we were trying to use coverage to do indel detection (in a very naive way). We just ended up taking another route. Could you go through the SAM file and break the reads with introns into two separate records? It's a hack, but it might work.
            --will

            Comment


            • #7
              This picture, via Cancer Genome Workbench (cgwb.nci.nih.gov) shows rnaseq reads hitting introns. The wiggle plot show logarithmic coverage for a TCGA rnaseq samples. It may be immature rna or intronic control RNAexpression as others have theorized. This intronic rna expression appears quite common in many samples and most of the reads, in this case, appear to be "uniquely" mappable.

              Attached Files
              Last edited by Richard Finney; 05-17-2011, 09:08 AM. Reason: gramerr

              Comment


              • #8
                Could you show a SAM record that is being improperly counted? Curiosity here...

                Comment


                • #9
                  What I usually observe from public RNA-seq data is that about 10-15% of the non-spliced reads do overlap annotated introns. It can be much higher, depending on the library preparation. It also depends on the reference annotation (for instance Refseq is more restrictive than Ensembl).

                  Possible reasons:
                  - "background noise"
                  - "pervasive transcription"
                  - partially processed mRNAs
                  - genomic contamination
                  - functional non coding RNAs
                  - new transcripts / splicing isoforms
                  - degradation products (they can be polyAdenylated and even re-capped)
                  Last edited by steven; 05-19-2011, 02:36 AM.

                  Comment


                  • #10
                    Originally posted by nilshomer View Post
                    Could you show a SAM record that is being improperly counted? Curiosity here...
                    Hi nilshomer - Thank you and everyone else for discussion,

                    Here some lines from the SAM and pileup output just at the border between the first exon-intron. The CIGAR string of the reads spanning the intron have a 586 bp insertion, which is the size of the intron. These insertions are counted in the pileup file.

                    For the first time I realize that the insertion is coded as '>' or '<' in the pileup column reporting the bases. So a 'corrected' pileup should have at each position the read count minus the count of '>' and '<', right?

                    Any comments welcome!

                    All the best

                    Code:
                    EBRI093151_0001:7:65:657:1096#TCTCCC	73	7	27658852	255	35M	*	0	0	TGCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAG	BCCCCBCBBCAAABCBA>@@BBCC@:<;A?0?@;>	NM:i:0	NH:i:1
                    EBRI093151_0001:7:10:722:420#TCTCTT	163	7	27658853	255	35M	=	27658864	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	AA+>BC@CACBBBBBBB;)@CBBAC>@1?&=;%=3	NM:i:1	NH:i:1
                    EBRI093151_0001:7:100:1536:385#CCCCCC	97	7	27658853	255	35M	=	27659701	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	BAAA5=>A9AB>?>-372?29BB?.':?A?)3)@2	NM:i:1	NH:i:1
                    EBRI093151_0001:7:105:1158:856#TTAACT	163	7	27658856	255	31M586N4M	=	27659472	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTTC	<C;?9?BCBC6B6BC>>CBACCCA8@1=@AC>C=*	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:72:996:1595#TCTCTA	83	7	27658856	255	35M	=	27658780	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTGC	C4BBBA;BBBABBBBB@6@BB?BBBBCCABBBC@B	NM:i:2	NH:i:1
                    EBRI093151_0001:7:21:730:103#ACCTCC	163	7	27658859	255	28M586N7M	=	27659486	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	ACCCC2CBBBBA@ABB@1><A@A1>;%=>=B7;)7	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:42:94:608#TCCCCC	161	7	27658859	255	28M586N7M	=	27659700	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	@B@CC?CBAAA?5>ACCA>9??A8@7;>3>B=?57	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:117:1539:645#CCTCTC	97	7	27658859	255	28M586N7M	=	27659706	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	BCB@A=AA;5?@BBBB=49?<>>=4628,=<A3##	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:117:341:1130#CTTACC	147	7	27658862	255	25M586N10M	=	27658795	0	GGTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTG	C;=ABCC@@@>BCAAACC>??=BBC9>?ABBA=;@	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:86:263:1733#TTTCTA	73	7	27658863	255	24M586N11M	*	0	0	GTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGG	BBAB?A;1<B?B=82:92;:79785=72;648594	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:10:722:420#TCTCTT	83	7	27658864	255	23M586N12M	=	27658853	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	@C@ACCB?BA4ABCB@ABBBBBBBBAC;@BBBCCB	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:58:1684:1481#TATCTA	83	7	27658864	255	23M586N12M	=	27658781	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	ABA8CBB:7<ABC>CBAA@BBBBA=ABBCACABCB	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:73:1052:643#CCCCCA	97	7	27658866	255	21M586N14M	=	27659703	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	BAA?A@A=BA?@>;=A=??=?4?=?:3;317>::5	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:89:1185:1493#CCCCCT	99	7	27658866	255	21M586N14M	=	27659486	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	@AA?=A==A@2=<+B8(<@1@1@9?>1>=4=-.=:	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:65:504:840#CCTCTT	99	7	27658867	255	20M586N15M	=	27659469	0	TCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCCC	BABA<B?BB*@0;>69A@=97=BAB<6A?=9>=7@	NM:i:0	XS:A:+	NH:i:1
                    
                    
                    7	27658852	N	75	TTtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTttt	4AA>A?>CABA>=AAA69BA>B@=94>A@A@CB:A8ABABBAB@BA>B@BBAA7B=B:A8B>9@B?=AB@@?CB>
                    7	27658853	N	76	GGgGgGgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGggg^~G	9AB=ABBCABCBABAC:@CB@ABBA:BACBB>C:A@BCBC=BBBA?@ABB@B?0>:B4C@B>>BB@?=B@:BBBAA
                    7	27658854	N	76	CCcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC	1=@A@ABCABBAAAB;99BA>BB=B>?@BBCAC@CCACBC?BCB=A?BCBC<B8@AA?CBBB@@BBB=B;B?ABBA
                    7	27658855	N	76	A$AaAaAaaaaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaGaaaaaaaaaaaaaaaAAaaAaaaA	3A>>3B<:=B<B49=0=0A?@@8-;;?@B7BA@;>?:AA>=?9:.:9=C?B=<657=8?<;?8;=676;64A8=?+
                    7	27658856	N	77	CcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC^~C^~c	=@AAACB@BAB?BB;B/BB?BA=A:BBB@CCCAB:@CCC>CCC@A>CCBC4B5C>BABA:BBABBB<?@B?B?B><C
                    7	27658857	N	77	TtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	?B<A?@?ABCB9AB>A>BA;A@A<.@?C9BB>8A3?B?B;ABA>?@?CA>=A4>5A9@A>;?:@??>BAAA:?<BC4
                    7	27658858	N	77	T$t$TtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	>A:?B@@<BCB:ABBB?A28@@?@?AAB%;BC=A<3BB=;?>8=<;?C>A=?=>:@0>A@>A>=6?ABB?A:>9C;B
                    7	27658859	N	76	C$cCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccCCc^~C	@AACCAB@@@BB=B@C<=BBAB=ABA=BCCBCA@BC??CBB>A:BACB<A;C<BBCBBBB5@BC7ABBA@BB@?BA
                    7	27658860	N	75	g$GgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG	BBB@BBA8@BB;BABB99BABAAB:B?CCACBBCB@BCBBAA>?BBB@B>B>@;BBB?AAC@@:CBABB>BC9BC
                    7	27658861	N	74	A$a$a$aaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaA	3BBBBB3=B@2@<BB>=AAA.?52A@BAACCBABCBCBBABA>@BB8B@BAB?;BBBBABBA(>B==C?BA?AC
                    7	27658862	N	72	g$g$g$G$g$ggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG^~g	@B@9@AB>A@CBABCBBA?A:BA>BBCBACBCBCBA@BA@BAB9BBBAA?CBBAB=BBA<CBAACBACB;CC
                    7	27658863	N	67	g$g$G$GGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgGg	@@ACABB74AAB?>B9B8CC=CB>CACBBA8AB?4CBABA>AAA2CBA;@ABA@=CA?BBB<BCBC;
                    7	27658864	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttCTttTtttTTtTt^~t^~t	?)CB:BBBB(3499ACC/CB@BBB@CCC>BBCCBA1B?C7B7BBAB@>AB>(>@B4CC>BBB2=@A
                    7	27658865	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTttt	@2CB=9B>B1ABBB=>B/C>?>B+2CBB@BCBCBC5?BB?ABBB@B>BB>>6@AABCBABCBCACB
                    7	27658866	N	67	A$A$a$aaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaAaaa^~A	@ABBB>BAC:A>B@=CB;ACBCA>;CAC=BCABBA2?BCBA@B@AB@AAB@7=BBACBBB6ABB@A@
                    7	27658867	N	67	t$t$t$t$t$t$TTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTtttT^~T^~T^~t	BBBB@B5BAC>CCB?CABBBCA?@ABB@BCC>1BABBB7B@[email protected]?BBBBCA8ABA>
                    7	27658868	N	63	C$C$C$C$c$cccccccccccccccccccccCcccccccccccccccCCccCcccCCcCcccCCCc^~C^~c	48>@AACCBCCACC>A?=ABBABAB@,B@BAAABABAABACB4?@B0C?BB6BBCCCAA@AA;
                    7	27658869	N	58	g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$gggggGgggggggggggggggGGggGgggGGgGgggGGGgGg	BABBBBBAB?A?BBABABCC?2BABBA@BCBB>BCBB8BBB8CBABBBB@CB?BBA@=
                    7	27658870	N	45	g$g$g$g$g$GgagggggggggggggGGggGgggGGgGgggGGGgGg^~g^~g^~g	BBBBB5@>BABA@B?A@ACBB.@B:@BBC;CBA@BB=AB;A)6=:
                    7	27658871	N	40	C$c$c$cccccccccccccCCccCcccCCcCcccCCCcCcccc	.BBCBABBB8?BBCCB>:AA9CCB)>B@@?:A<9>A@7A?
                    7	27658872	N	38	c$c$c$c$c$c$c$ccccccCCccCcccCCcCcccCCCcCcccc^~c	BBABBB>AABCCB9@@C6CCB@>@A>B7=BA=BB&;18
                    7	27658873	N	34	c$a$c$c$ccCCccCcccCCcCcccCCCcCccccc^~C^~C^~c	BBBBCB-ABC=CCACC6BBA<=?B;B@6;>?BB#
                    7	27658874	N	30	c$a$CCccCcccCCcCcccCCCcCcccccCCc	BB8BBB=CC>BB@BC4AABBBA(<@?=CC#
                    7	27658875	N	29	C$C$ccCcccCCcCcccCCCcCcccccCCc^~C	::BC8CCCBAB@AAB@BA@B3>A==CB#B
                    7	27658876	N	27	g$a$AaaaAAaAaaaAAAaAaaaaaAAaA	BB.CCCACB1ABC2*+@@+ABBBCB#B
                    7	27658877	N	30	GgggGGgGgggGGGgGgggggGGgG^~g^~g^~g^~g^~g	1ACCCC?>AC>=@@=@,=B;BC@#B9A42:
                    7	27658878	N	32	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaa^~a^~a	1CCC>CB<CBC<0AB3<>A=?A9#B>B:=;>#
                    7	27658879	N	37	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaaaa^~a^~a^~a^~a^~a	1ACB@ABAC@B+;B@@*=A??B>#A?A9A<B62<+9>
                    7	27658880	N	42	G$g$g$gGGgGgggGGGgGgggggGGgGgggggggggggg^~g^~g^~g^~g^~g	/>BC18B@>AAB>B?B>A>A>BA#B=B@?A?7;;:8A2:;>?
                    7	27658881	N	39	gGGgGgggGGGgGgggggGGgGggggggggggggggggg	B?@BA?BA86C9A5=B1=B>;AAB<BAA3=<33<2A7;?
                    7	27658882	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	B&1C1?B@(9<B=@<AA?B=5@AC8A?@3=;;<B1@;@A
                    7	27658883	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	C==C>=BB<AB;B3?A?AA>3B@CB@AB97?:;37?<;@
                    7	27658884	N	39	g$GGgGgggGGGgGgggggGGgGggggggggggggggggg	B;@A;BBB@@@>?7AA3AC=/B>B?A@=05=1;=7=>=@
                    7	27658885	N	38	AAaAaaaAAAaAgaaaaAAaAaaaaaaaaaaaaaaaaa	%AB%BBB1=BAA(9AAAC=4A@B=B>A7>=6:@0?<=B
                    7	27658886	N	38	GGgGgggGGGgGgggggGGgGggggggggggggggggg	=CB=CBB@9B6A@8B;:C?4B?B8?;A555:8=-?5A<
                    7	27658887	N	38	T$>t><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	3>B>9BA17A0>7A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658888	N	37	>t><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>C>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658889	N	37	>g><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>@>9BA17A0>3A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658890	N	37	>c$><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>B>9BA17A0><A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658891	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658892	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658893	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658894	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658895	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658896	N	36	>><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658897	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658898	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658899	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658900	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658901	N	36	>><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658902	N	36	>><<<>>><>g$<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658903	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658904	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658905	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658906	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0

                    Comment


                    • #11
                      Hi dariober, I was wondering if this problem of removing the gaps in the pileup step has ever been resolved for you? I ran into the same issue. Any tip would be greatly appreciated!

                      Comment


                      • #12
                        has anyone seen any discussion of the influence of intronic reads on transcript prediction? Or I suppose, mention of how people are dealing with this? I've raised this issue elsewhere and continue to be plagued by it... i really can't find any automated solution for dealing with this. it seems we have to get down to business and manually annotate genes... fun!

                        Comment


                        • #13
                          The ALEXA-seq and Trans-ABySS manuscripts discuss the implications of intronic mapping reads for measuring expression and transcript assembly respectively. Due to all of the possible sources of sequences that will map to introns (nicely summarized by Steven above), it is important to consider whether the signal you see for an exon is really above that of the surrounding areas. RNA-seq is not a zero noise data type. . You will always have some amount of contaminating genomic DNA and unprocessed RNA (hnRNA). There are molecular strategies for reducing these (e.g. DNAseI treatment, cytoplasmic RNA isolation, polyA+ selection, etc.) but there will always be some. So, you expect signal across the genome in an RNA-seq library for purely molecular reasons and analytical sources add on top of this (e.g. multi-mapping reads). You can actually see hints of the difference between intergenic noise and intronic noise by examining the data. Intronic noise is expected to be correlated with gene expression level (more expression results in more unprocessed RNA when you take a snapshot of the transcriptome). This is exactly what we see in our RNA-seq libraries. This complicates the interpretation of whether an intron is really being retained as part of an alternative isoform. As others have mentioned, local context is important when interpreting RNA-seq data.

                          For all of these reasons, one can argue that exon-exon junctions might be a useful metric for evaluating RNA-seq library quality. The reason being that exon-exon sequences from real transcripts for the most part do not occur in the genome (and therefore not in unprocessed transcripts either). We also routinely evaluate exon-to-intron and exon-to-intergenic signal, among many other metrics. For example here is a report for a:
                          Human Breast vHMECS RNA-seq library.

                          Full disclosure. I am an author on the two papers linked above. Perhaps others can link to others that discuss this topic!

                          Comment


                          • #14
                            Maybe HTseq-count deals with counting spliced reads:

                            From this page:




                            "Make sure to use a splicing-aware aligner such as TopHat. HTSeq-count makes full use of the information in the CIGAR field."

                            Chris

                            Comment


                            • #15
                              Here is an answer to why there are so many intronic reads: it's because RNA-seq measures ongoing transcription!

                              We have recently published a paper in Nat Struct Mol Biol where this is described. See also this thread for more info.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Advancing Precision Medicine for Rare Diseases in Children
                                by seqadmin




                                Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                                12-16-2024, 07:57 AM
                              • seqadmin
                                Recent Advances in Sequencing Technologies
                                by seqadmin



                                Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                                Long-Read Sequencing
                                Long-read sequencing has seen remarkable advancements,...
                                12-02-2024, 01:49 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 12-17-2024, 10:28 AM
                              0 responses
                              33 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 12-13-2024, 08:24 AM
                              0 responses
                              48 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 12-12-2024, 07:41 AM
                              0 responses
                              34 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 12-11-2024, 07:45 AM
                              0 responses
                              46 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X