  • dariober
    Senior Member
    • May 2010
    • 311

    Intron coverage in RNAseq

    Hi All,

    Two questions about RNAseq coverage:

    1) Introns: I find that introns get quite a lot of reads mapping to them compared to the exons. Is this expected or is it a sign of something that went wrong? (I would expect introns to be almost invisible).

    2) Read distribution: The read count varies a lot in the exons while it seems quite uniform in the introns. Is there any explanation for this?

    I attach here a plot of read count vs position (mpileup) in two samples for a gene as an example of what I mean (at a glance, other genes show similar patterns).

    Reads & alignments: ca. 30 million reads, 35 bp paired-end, aligned with TopHat.

    I realize these issues must have been discussed a number of times but I couldn't find any threads/papers.

    Any and all comments welcome!
    Dario

  • wjeck
    Member
    • Mar 2009
    • 39

    #2
    I would guess (and this is only a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, spliced reads are being counted as covering the intron, which is definitely not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?
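To make the point concrete: in SAM, an `N` in the CIGAR string marks a skipped reference region (an intron, for RNA-seq), so only the `M`, `=` and `X` operations place read bases on the reference. The Python sketch below converts a POS/CIGAR pair into the reference intervals a read actually covers. `aligned_blocks` is an illustrative helper, not a samtools or pysam function, and it assumes the 1-based POS used in SAM text.

```python
import re

# CIGAR operations that place read bases on reference positions.
COVERING_OPS = "M=X"
# Operations that consume reference positions without read bases:
# D (deletion) and N (skipped region, i.e. an intron in RNA-seq).
GAP_OPS = "DN"


def aligned_blocks(pos, cigar):
    """Return the 1-based inclusive (start, end) reference intervals a read
    actually covers. `pos` is the SAM POS field, `cigar` the CIGAR string.
    The read is split at N and D operations; I, S, H and P do not move along
    the reference, so they never create a gap."""
    blocks = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in COVERING_OPS:
            if blocks and blocks[-1][1] == ref - 1:
                # Adjacent covering ops (e.g. 10M2I10M or 5=1X5=) extend the block.
                blocks[-1] = (blocks[-1][0], ref + length - 1)
            else:
                blocks.append((ref, ref + length - 1))
            ref += length
        elif op in GAP_OPS:
            ref += length
    return blocks


print(aligned_blocks(27658856, "31M586N4M"))
# [(27658856, 27658886), (27659473, 27659476)]
```

For the TopHat record `31M586N4M` at POS 27658856 that appears later in this thread, the 586 positions between the two blocks (27658887 to 27659472) are the intron and should contribute no coverage.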


    • dariober
      Senior Member
      • May 2010
      • 311

      #3
      Originally posted by wjeck
      I would guess (and this is only a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, spliced reads are being counted as covering the intron, which is definitely not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?
      Good thinking, wjeck, thanks so much! Below is the picture from IGV.

      The plot I first showed is from mpileup. I thought samtools mpileup would be aware of gaps (splice junctions), as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I ran. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!)

      samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

      Dario



      • zorph
        Member
        • May 2010
        • 40

        #4
        I have some RNA-seq data, and I have been to talks involving RNA-seq data, and it seems that introns are retained.

        When I talked to a guy from PacBio presenting his data (where he had retained introns), he said that this could be due to pre-mRNA being included in the sample, or it could be a sign that these introns are doing more (i.e. playing a regulatory role).

        It's unknown as of now but I've seen 3 different sets of data and they all have intron retention :-/.


        • pbluescript
          Senior Member
          • Nov 2009
          • 224

          #5
          It is a tricky question and it is something a lot of people are seeing, but you almost have to take it on a case-by-case basis to figure out what might be going on.
          In addition to the pre-mRNA and retained intron possibilities, some of the reads may map to multiple places or map to repeat regions of the genome. I place less trust in those types of reads.
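If you want to act on this, one option is to keep only reads reported as aligning to a single location. The SAM `NH:i` tag holds the number of reported alignments for a read, and TopHat writes it (the records posted later in this thread carry `NH:i:1`). A minimal Python sketch over plain SAM text lines; treating a record without an `NH` tag as unique is an assumption you may want to change for other aligners.

```python
def optional_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onwards) of a SAM
    record into a dict; integer-typed ('i') values are converted to int."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def is_unique(sam_line):
    """True if the record reports exactly one alignment for the read (NH:i:1).
    Assumption: a record without an NH tag is treated as unique."""
    return optional_tags(sam_line).get("NH", 1) == 1


def filter_unique(lines):
    """Yield header lines and uniquely aligned records, dropping multi-mappers."""
    for line in lines:
        if line.startswith("@") or is_unique(line):
            yield line
```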


          • wjeck
            Member
            • Mar 2009
            • 39

            #6
            Originally posted by dariober
            Good thinking, wjeck, thanks so much! Below is the picture from IGV.

            The plot I first showed is from mpileup. I thought samtools mpileup would be aware of gaps (splice junctions), as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I ran. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!)

            samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup

            Dario


            Yep, that's your problem. I don't know how to fix this because I haven't been working with RNA, but we had the same problem when we were trying to use coverage to do indel detection (in a very naive way). We just ended up taking another route. Could you go through the SAM file and break each read that spans an intron into two separate records? It's a hack, but it might work.
            --will
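A rough sketch of that hack in Python, for illustration only: `split_at_introns` is a hypothetical helper, not a samtools feature. It rewrites POS, CIGAR, SEQ and QUAL for each exonic piece of a spliced record and copies every other column unchanged, so the pieces share a QNAME and their mate fields are no longer accurate. That is probably tolerable for a coverage calculation, but not for tools that validate pairing.

```python
import re


def split_at_introns(sam_line):
    """Split one SAM record at every N (skipped region) in its CIGAR and
    return one SAM line per exonic piece. Records without N come back unchanged.
    POS, CIGAR, SEQ and QUAL are rewritten per piece; QNAME, FLAG, mate fields
    and optional tags are copied as-is (so mate info is no longer consistent)."""
    fields = sam_line.rstrip("\n").split("\t")
    pos, cigar, seq, qual = int(fields[3]), fields[5], fields[9], fields[10]
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    if not any(op == "N" for _, op in ops):
        return [sam_line.rstrip("\n")]

    pieces = []
    ref, query = pos, 0              # current reference position / read offset
    start_ref, start_query = pos, 0  # where the current piece begins
    piece_ops = []

    def emit_piece():
        piece = list(fields)
        piece[3] = str(start_ref)
        piece[5] = "".join(f"{n}{op}" for n, op in piece_ops)
        piece[9] = seq if seq == "*" else seq[start_query:query]
        piece[10] = qual if qual == "*" else qual[start_query:query]
        pieces.append("\t".join(piece))

    for n, op in ops:
        if op == "N":
            emit_piece()                  # close the piece before the intron
            ref += n                      # jump over the intron
            start_ref, start_query, piece_ops = ref, query, []
            continue
        piece_ops.append((n, op))
        if op in "MIS=X":                 # consumes read bases (SEQ)
            query += n
        if op in "MD=X":                  # consumes reference bases
            ref += n
    emit_piece()                          # last piece after the final intron
    return pieces
```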


            • Richard Finney
              Senior Member
              • Feb 2009
              • 701

              #7
              This picture, via Cancer Genome Workbench (cgwb.nci.nih.gov), shows RNA-seq reads hitting introns. The wiggle plot shows logarithmic coverage for a TCGA RNA-seq sample. It may be immature RNA or intronic control RNA expression, as others have theorized. This intronic RNA expression appears quite common in many samples, and most of the reads, in this case, appear to be "uniquely" mappable.

              Last edited by Richard Finney; 05-17-2011, 09:08 AM. Reason: gramerr


              • nilshomer
                Nils Homer
                • Nov 2008
                • 1283

                #8
                Could you show a SAM record that is being improperly counted? Curiosity here...


                • steven
                  Senior Member
                  • Aug 2009
                  • 269

                  #9
                  What I usually observe in public RNA-seq data is that about 10-15% of the non-spliced reads overlap annotated introns. It can be much higher, depending on the library preparation. It also depends on the reference annotation (for instance, RefSeq is more restrictive than Ensembl).

                  Possible reasons:
                  - "background noise"
                  - "pervasive transcription"
                  - partially processed mRNAs
                  - genomic contamination
                  - functional non coding RNAs
                  - new transcripts / splicing isoforms
                  - degradation products (they can be polyadenylated and even re-capped)
                  Last edited by steven; 05-19-2011, 02:36 AM.
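To get a comparable number for your own library, one can count the unspliced reads (no `N` in the CIGAR) and ask how many overlap an annotated intron. The Python sketch below assumes reads as `(chrom, pos, cigar)` tuples and introns as 1-based inclusive `(chrom, start, end)` intervals taken from whichever annotation you trust; as noted above, the answer will shift between RefSeq and Ensembl. It scans the intron list linearly for clarity, so for a whole genome you would want an interval index instead.

```python
import re


def reference_length(cigar):
    """Number of reference positions spanned by a CIGAR (M, D, N, = and X)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")


def intronic_fraction(reads, introns):
    """Fraction of unspliced reads that overlap at least one intron.
    reads: iterable of (chrom, pos, cigar); introns: list of (chrom, start, end).
    All coordinates are 1-based and inclusive."""
    unspliced = overlapping = 0
    for chrom, pos, cigar in reads:
        if "N" in cigar:
            continue  # spliced reads are left out of this metric
        unspliced += 1
        end = pos + reference_length(cigar) - 1
        if any(c == chrom and start <= end and pos <= stop
               for c, start, stop in introns):
            overlapping += 1
    return overlapping / unspliced if unspliced else 0.0
```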


                  • dariober
                    Senior Member
                    • May 2010
                    • 311

                    #10
                    Originally posted by nilshomer
                    Could you show a SAM record that is being improperly counted? Curiosity here...
                    Hi nilshomer, and thank you and everyone else for the discussion.

                    Here are some lines from the SAM and pileup output just at the border between the first exon and intron. The CIGAR strings of the reads spanning the intron contain a 586 bp skipped region (the N operation, e.g. 31M586N4M), which is the size of the intron. These skipped positions are counted in the pileup file.

                    For the first time I realize that the skip is coded as '>' or '<' in the pileup column reporting the bases. So a 'corrected' pileup should have, at each position, the read count minus the count of '>' and '<', right?

                    Any comments welcome!

                    All the best

                    Code:
                    EBRI093151_0001:7:65:657:1096#TCTCCC	73	7	27658852	255	35M	*	0	0	TGCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAG	BCCCCBCBBCAAABCBA>@@BBCC@:<;A?0?@;>	NM:i:0	NH:i:1
                    EBRI093151_0001:7:10:722:420#TCTCTT	163	7	27658853	255	35M	=	27658864	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	AA+>BC@CACBBBBBBB;)@CBBAC>@1?&=;%=3	NM:i:1	NH:i:1
                    EBRI093151_0001:7:100:1536:385#CCCCCC	97	7	27658853	255	35M	=	27659701	0	GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT	BAAA5=>A9AB>?>-372?29BB?.':?A?)3)@2	NM:i:1	NH:i:1
                    EBRI093151_0001:7:105:1158:856#TTAACT	163	7	27658856	255	31M586N4M	=	27659472	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTTC	<C;?9?BCBC6B6BC>>CBACCCA8@1=@AC>C=*	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:72:996:1595#TCTCTA	83	7	27658856	255	35M	=	27658780	0	CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTGC	C4BBBA;BBBABBBBB@6@BB?BBBBCCABBBC@B	NM:i:2	NH:i:1
                    EBRI093151_0001:7:21:730:103#ACCTCC	163	7	27658859	255	28M586N7M	=	27659486	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	ACCCC2CBBBBA@ABB@1><A@A1>;%=>=B7;)7	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:42:94:608#TCCCCC	161	7	27658859	255	28M586N7M	=	27659700	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	@B@CC?CBAAA?5>ACCA>9??A8@7;>3>B=?57	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:117:1539:645#CCTCTC	97	7	27658859	255	28M586N7M	=	27659706	0	CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG	BCB@A=AA;5?@BBBB=49?<>>=4628,=<A3##	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:117:341:1130#CTTACC	147	7	27658862	255	25M586N10M	=	27658795	0	GGTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTG	C;=ABCC@@@>BCAAACC>??=BBC9>?ABBA=;@	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:86:263:1733#TTTCTA	73	7	27658863	255	24M586N11M	*	0	0	GTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGG	BBAB?A;1<B?B=82:92;:79785=72;648594	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:10:722:420#TCTCTT	83	7	27658864	255	23M586N12M	=	27658853	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	@C@ACCB?BA4ABCB@ABBBBBBBBAC;@BBBCCB	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:58:1684:1481#TATCTA	83	7	27658864	255	23M586N12M	=	27658781	0	TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC	ABA8CBB:7<ABC>CBAA@BBBBA=ABBCACABCB	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:73:1052:643#CCCCCA	97	7	27658866	255	21M586N14M	=	27659703	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	BAA?A@A=BA?@>;=A=??=?4?=?:3;317>::5	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:89:1185:1493#CCCCCT	99	7	27658866	255	21M586N14M	=	27659486	0	ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC	@AA?=A==A@2=<+B8(<@1@1@9?>1>=4=-.=:	NM:i:0	XS:A:+	NH:i:1
                    EBRI093151_0001:7:65:504:840#CCTCTT	99	7	27658867	255	20M586N15M	=	27659469	0	TCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCCC	BABA<B?BB*@0;>69A@=97=BAB<6A?=9>=7@	NM:i:0	XS:A:+	NH:i:1
                    
                    
                    7	27658852	N	75	TTtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTttt	4AA>A?>CABA>=AAA69BA>B@=94>A@A@CB:A8ABABBAB@BA>B@BBAA7B=B:A8B>9@B?=AB@@?CB>
                    7	27658853	N	76	GGgGgGgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGggg^~G	9AB=ABBCABCBABAC:@CB@ABBA:BACBB>C:A@BCBC=BBBA?@ABB@B?0>:B4C@B>>BB@?=B@:BBBAA
                    7	27658854	N	76	CCcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC	1=@A@ABCABBAAAB;99BA>BB=B>?@BBCAC@CCACBC?BCB=A?BCBC<B8@AA?CBBB@@BBB=B;B?ABBA
                    7	27658855	N	76	A$AaAaAaaaaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaGaaaaaaaaaaaaaaaAAaaAaaaA	3A>>3B<:=B<B49=0=0A?@@8-;;?@B7BA@;>?:AA>=?9:.:9=C?B=<657=8?<;?8;=676;64A8=?+
                    7	27658856	N	77	CcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC^~C^~c	=@AAACB@BAB?BB;B/BB?BA=A:BBB@CCCAB:@CCC>CCC@A>CCBC4B5C>BABA:BBABBB<?@B?B?B><C
                    7	27658857	N	77	TtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	?B<A?@?ABCB9AB>A>BA;A@A<.@?C9BB>8A3?B?B;ABA>?@?CA>=A4>5A9@A>;?:@??>BAAA:?<BC4
                    7	27658858	N	77	T$t$TtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt	>A:?B@@<BCB:ABBB?A28@@?@?AAB%;BC=A<3BB=;?>8=<;?C>A=?=>:@0>A@>A>=6?ABB?A:>9C;B
                    7	27658859	N	76	C$cCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccCCc^~C	@AACCAB@@@BB=B@C<=BBAB=ABA=BCCBCA@BC??CBB>A:BACB<A;C<BBCBBBB5@BC7ABBA@BB@?BA
                    7	27658860	N	75	g$GgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG	BBB@BBA8@BB;BABB99BABAAB:B?CCACBBCB@BCBBAA>?BBB@B>B>@;BBB?AAC@@:CBABB>BC9BC
                    7	27658861	N	74	A$a$a$aaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaA	3BBBBB3=B@2@<BB>=AAA.?52A@BAACCBABCBCBBABA>@BB8B@BAB?;BBBBABBA(>B==C?BA?AC
                    7	27658862	N	72	g$g$g$G$g$ggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG^~g	@B@9@AB>A@CBABCBBA?A:BA>BBCBACBCBCBA@BA@BAB9BBBAA?CBBAB=BBA<CBAACBACB;CC
                    7	27658863	N	67	g$g$G$GGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgGg	@@ACABB74AAB?>B9B8CC=CB>CACBBA8AB?4CBABA>AAA2CBA;@ABA@=CA?BBB<BCBC;
                    7	27658864	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttCTttTtttTTtTt^~t^~t	?)CB:BBBB(3499ACC/CB@BBB@CCC>BBCCBA1B?C7B7BBAB@>AB>(>@B4CC>BBB2=@A
                    7	27658865	N	66	TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTttt	@2CB=9B>B1ABBB=>B/C>?>B+2CBB@BCBCBC5?BB?ABBB@B>BB>>6@AABCBABCBCACB
                    7	27658866	N	67	A$A$a$aaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaAaaa^~A	@ABBB>BAC:A>B@=CB;ACBCA>;CAC=BCABBA2?BCBA@B@AB@AAB@7=BBACBBB6ABB@A@
                    7	27658867	N	67	t$t$t$t$t$t$TTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTtttT^~T^~T^~t	BBBB@B5BAC>CCB?CABBBCA?@ABB@BCC>1BABBB7B@[email protected]?BBBBCA8ABA>
                    7	27658868	N	63	C$C$C$C$c$cccccccccccccccccccccCcccccccccccccccCCccCcccCCcCcccCCCc^~C^~c	48>@AACCBCCACC>A?=ABBABAB@,B@BAAABABAABACB4?@B0C?BB6BBCCCAA@AA;
                    7	27658869	N	58	g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$gggggGgggggggggggggggGGggGgggGGgGgggGGGgGg	BABBBBBAB?A?BBABABCC?2BABBA@BCBB>BCBB8BBB8CBABBBB@CB?BBA@=
                    7	27658870	N	45	g$g$g$g$g$GgagggggggggggggGGggGgggGGgGgggGGGgGg^~g^~g^~g	BBBBB5@>BABA@B?A@ACBB.@B:@BBC;CBA@BB=AB;A)6=:
                    7	27658871	N	40	C$c$c$cccccccccccccCCccCcccCCcCcccCCCcCcccc	.BBCBABBB8?BBCCB>:AA9CCB)>B@@?:A<9>A@7A?
                    7	27658872	N	38	c$c$c$c$c$c$c$ccccccCCccCcccCCcCcccCCCcCcccc^~c	BBABBB>AABCCB9@@C6CCB@>@A>B7=BA=BB&;18
                    7	27658873	N	34	c$a$c$c$ccCCccCcccCCcCcccCCCcCccccc^~C^~C^~c	BBBBCB-ABC=CCACC6BBA<=?B;B@6;>?BB#
                    7	27658874	N	30	c$a$CCccCcccCCcCcccCCCcCcccccCCc	BB8BBB=CC>BB@BC4AABBBA(<@?=CC#
                    7	27658875	N	29	C$C$ccCcccCCcCcccCCCcCcccccCCc^~C	::BC8CCCBAB@AAB@BA@B3>A==CB#B
                    7	27658876	N	27	g$a$AaaaAAaAaaaAAAaAaaaaaAAaA	BB.CCCACB1ABC2*+@@+ABBBCB#B
                    7	27658877	N	30	GgggGGgGgggGGGgGgggggGGgG^~g^~g^~g^~g^~g	1ACCCC?>AC>=@@=@,=B;BC@#B9A42:
                    7	27658878	N	32	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaa^~a^~a	1CCC>CB<CBC<0AB3<>A=?A9#B>B:=;>#
                    7	27658879	N	37	AaaaAAaAaaaAAAaAaaaaaAAaAaaaaaaa^~a^~a^~a^~a^~a	1ACB@ABAC@B+;B@@*=A??B>#A?A9A<B62<+9>
                    7	27658880	N	42	G$g$g$gGGgGgggGGGgGgggggGGgGgggggggggggg^~g^~g^~g^~g^~g	/>BC18B@>AAB>B?B>A>A>BA#B=B@?A?7;;:8A2:;>?
                    7	27658881	N	39	gGGgGgggGGGgGgggggGGgGggggggggggggggggg	B?@BA?BA86C9A5=B1=B>;AAB<BAA3=<33<2A7;?
                    7	27658882	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	B&1C1?B@(9<B=@<AA?B=5@AC8A?@3=;;<B1@;@A
                    7	27658883	N	39	aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa	C==C>=BB<AB;B3?A?AA>3B@CB@AB97?:;37?<;@
                    7	27658884	N	39	g$GGgGgggGGGgGgggggGGgGggggggggggggggggg	B;@A;BBB@@@>?7AA3AC=/B>B?A@=05=1;=7=>=@
                    7	27658885	N	38	AAaAaaaAAAaAgaaaaAAaAaaaaaaaaaaaaaaaaa	%AB%BBB1=BAA(9AAAC=4A@B=B>A7>=6:@0?<=B
                    7	27658886	N	38	GGgGgggGGGgGgggggGGgGggggggggggggggggg	=CB=CBB@9B6A@8B;:C?4B?B8?;A555:8=-?5A<
                    7	27658887	N	38	T$>t><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	3>B>9BA17A0>7A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658888	N	37	>t><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>C>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658889	N	37	>g><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>@>9BA17A0>3A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658890	N	37	>c$><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>B>9BA17A0><A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658891	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658892	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658893	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658894	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658895	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658896	N	36	>><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658897	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658898	N	36	>><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658899	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658900	N	36	>><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658901	N	36	>><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658902	N	36	>><<<>>><>g$<<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0
                    7	27658903	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658904	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658905	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
                    7	27658906	N	35	>><<<>>><><<<<>><><<<<<<<<<<<<<<<<<	>>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0
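Dario's correction (depth minus the number of `>` and `<`) is the right idea, but counting those characters directly can go wrong: in the pileup base column a read start is written as `^` followed by one mapping-quality character, which can itself be `>` or `<`, and indels appear as `+2TT` or `-1G` with bases that belong to no read at this position. A minimal Python sketch that walks the column token by token, assuming the standard samtools pileup encoding (`*` for a deleted base, with newer samtools versions also using `#` on the reverse strand, and `>`/`<` for a reference skip):

```python
def parse_pileup_bases(bases):
    """Count the reads in a samtools pileup base column by category.
    Returns a dict with 'bases' (reads with a base here), 'skips' (reads
    spanning an N gap, written '>' or '<') and 'deletions' ('*' or '#')."""
    counts = {"bases": 0, "skips": 0, "deletions": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2            # read-start marker plus its mapping-quality char
            continue
        if ch == "$":
            i += 1            # read-end marker, attached to the previous read
            continue
        if ch in "+-":
            # indel: sign, length digits, then that many inserted/deleted bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if ch in "><":
            counts["skips"] += 1
        elif ch in "*#":
            counts["deletions"] += 1
        else:
            counts["bases"] += 1  # '.', ',' or A/C/G/T/N in either case
        i += 1
    return counts
```

For a read-based depth that ignores introns, use `bases + deletions` (equivalently, the depth column minus `skips`). In the posted row for position 27658887, for example, only the `T`, `t` and `g` entries are real bases; every other entry is a `>` or `<`.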


                    • arecht
                      Junior Member
                      • Aug 2009
                      • 3

                      #11
                      Hi dariober, I was wondering if this problem of removing the gaps in the pileup step has ever been resolved for you? I ran into the same issue. Any tip would be greatly appreciated!


                      • chrisbala
                        Member
                        • Jan 2010
                        • 82

                        #12
                        Has anyone seen any discussion of the influence of intronic reads on transcript prediction? Or, I suppose, mention of how people are dealing with this? I've raised this issue elsewhere and continue to be plagued by it... I really can't find any automated solution for dealing with this. It seems we have to get down to business and manually annotate genes... fun!


                        • malachig
                          Senior Member
                          • Aug 2010
                          • 117

                          #13
                          The ALEXA-seq and Trans-ABySS manuscripts discuss the implications of intronic mapping reads for measuring expression and for transcript assembly, respectively. Due to all of the possible sources of sequences that will map to introns (nicely summarized by Steven above), it is important to consider whether the signal you see for an exon is really above that of the surrounding areas. RNA-seq is not a zero-noise data type. You will always have some amount of contaminating genomic DNA and unprocessed RNA (hnRNA). There are molecular strategies for reducing these (e.g. DNase I treatment, cytoplasmic RNA isolation, polyA+ selection, etc.), but there will always be some. So, you expect signal across the genome in an RNA-seq library for purely molecular reasons, and analytical sources add on top of this (e.g. multi-mapping reads). You can actually see hints of the difference between intergenic noise and intronic noise by examining the data. Intronic noise is expected to be correlated with gene expression level (more expression results in more unprocessed RNA when you take a snapshot of the transcriptome). This is exactly what we see in our RNA-seq libraries. This complicates the interpretation of whether an intron is really being retained as part of an alternative isoform. As others have mentioned, local context is important when interpreting RNA-seq data.
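One way to check whether an exon's signal stands above its surroundings is to compare mean per-base depth over a gene's exons with that over its introns. The Python sketch below assumes you already have a per-position depth dictionary (for instance from pileup counts with reference skips removed) and 1-based inclusive intervals; the function names are illustrative, and positions absent from the dictionary count as zero depth.

```python
def mean_depth(depth, intervals):
    """Mean depth over 1-based inclusive (start, end) intervals.
    Positions missing from the `depth` dict count as zero coverage."""
    total = length = 0
    for start, end in intervals:
        for p in range(start, end + 1):
            total += depth.get(p, 0)
        length += end - start + 1
    return total / length if length else 0.0


def exon_intron_ratio(depth, exons, introns):
    """Ratio of mean exonic depth to mean intronic depth for one gene.
    Returns infinity when the introns have no reads at all."""
    intronic = mean_depth(depth, introns)
    exonic = mean_depth(depth, exons)
    return exonic / intronic if intronic > 0 else float("inf")
```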

                          For all of these reasons, one can argue that exon-exon junctions might be a useful metric for evaluating RNA-seq library quality, because exon-exon sequences from real transcripts for the most part do not occur in the genome (and therefore not in unprocessed transcripts either). We also routinely evaluate exon-to-intron and exon-to-intergenic signal, among many other metrics. For example, here is a report for a Human Breast vHMECS RNA-seq library.
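A crude version of the junction-based metric can be taken from the alignments alone: the fraction of primary, mapped alignments whose CIGAR contains an `N`. The Python sketch below reads plain SAM lines; skipping unmapped (flag 0x4), secondary (0x100) and supplementary (0x800) records is an assumption about what should count as a read, and with short reads (35 bp in this thread) the fraction will tend to be lower than with longer ones.

```python
def spliced_fraction(sam_lines):
    """Fraction of primary mapped alignments whose CIGAR contains N."""
    mapped = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary or supplementary
        mapped += 1
        if "N" in cigar:
            spliced += 1
    return spliced / mapped if mapped else 0.0
```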

                          Full disclosure: I am an author on the two papers linked above. Perhaps others can link to other papers that discuss this topic!


                          • cjp
                            Member
                            • Jun 2011
                            • 58

                            #14
                            Maybe HTSeq-count deals with counting spliced reads. From this page:

                            "Make sure to use a splicing-aware aligner such as TopHat. HTSeq-count makes full use of the information in the CIGAR field."

                            Chris
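To illustrate what using the CIGAR field means for counting: a spliced read should be assigned by the exonic blocks it actually covers, not by its full span, which includes the intron. The Python sketch below uses a deliberately simple rule (a read counts for a gene only if its blocks overlap exons of exactly one gene); it is not HTSeq-count's implementation, whose `union`, `intersection-strict` and `intersection-nonempty` modes resolve overlaps more carefully. The `__no_feature` and `__ambiguous` bucket names are borrowed from HTSeq-count's output.

```python
import re
from collections import Counter


def blocks(pos, cigar):
    """1-based inclusive reference intervals covered by read bases.
    N (intron) and D (deletion) both end a block."""
    out, ref = [], pos
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op in "M=X":
            out.append((ref, ref + n - 1))
        if op in "MDN=X":
            ref += n
    return out


def count_reads(reads, exons):
    """reads: iterable of (pos, cigar) on one chromosome and strand;
    exons: dict gene -> list of (start, end). Returns per-gene counts plus
    '__no_feature' and '__ambiguous' buckets."""
    counts = Counter()
    for pos, cigar in reads:
        read_blocks = blocks(pos, cigar)
        hit = {gene
               for gene, intervals in exons.items()
               for start, end in intervals
               for b_start, b_end in read_blocks
               if b_start <= end and start <= b_end}
        if not hit:
            counts["__no_feature"] += 1
        elif len(hit) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hit.pop()] += 1
    return counts
```

With exons at 100-130 and 717-760 for one gene and a second gene at 400-450 inside its intron, a read `31M586N4M` at POS 100 is counted only for the first gene, whereas a rule based on the read's full span would call it ambiguous.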


                            • adameur
                              Member
                              • Nov 2009
                              • 23

                              #15
                              Here is an answer to why there are so many intronic reads: it's because RNA-seq measures ongoing transcription!

                              We have recently published a paper in Nat Struct Mol Biol where this is described. See also this thread for more info.
