![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
who coined RNAseq? RNAseq as an alignment first approach only | brachysclereid | Bioinformatics | 3 | 01-10-2012 12:17 PM |
how do you find a possible intron on a gene? | bbsinfo | Illumina/Solexa | 2 | 12-30-2011 10:05 AM |
"coverage" of introns, intergenic regions for RNASEQ | PFS | Bioinformatics | 2 | 09-07-2011 01:51 PM |
coverage RNAseq | Trudy | Bioinformatics | 1 | 06-22-2011 08:27 AM |
Looking for an aligner allowing intron gap | superligang | Bioinformatics | 6 | 04-12-2010 11:20 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]()
Hi All,
Two questions about RNAseq coverage: 1) Introns: I find that introns get quite a lot of reads mapping to them compared to the exons. Is this expected or is it a sign of something that went wrong? (I would expect introns to be almost invisible). 2) Read distribution: The read count varies a lot in the exons while it seems quite uniform in the introns. Is there any explanation for this? I attach here a plot of read count vs position (mpileup) in two samples for a gene as an example of what I mean (at a glance, other genes show similar patterns). Reads & alignments: c.ca 30 million reads, 35 bp pair-end aligned with Tophat. I realize these issues must have been discussed a number of times but I couldn't find any threads/papers. Any and all comments welcome! Dario |
![]() |
![]() |
![]() |
#2 |
Member
Location: Chapel Hill, NC Join Date: Mar 2009
Posts: 39
|
![]()
I would guess (and this is a guess) that you are including the "gap" part of your spliced reads in the coverage. That is, splice reads are being counted as covering the intron, which is def not what you want. Have you looked at your mappings in something like IGV, and does it show a ton of reads in your intronic regions?
|
![]() |
![]() |
![]() |
#3 | |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]() Quote:
The plot I first showed is from mpileup. I thought samtools mpileup would have been aware of gaps (splice-junctions) as they are encoded in the CIGAR string, right? Instead it seems to count coverage where there should be gaps. This is the samtools command I executed. Am I doing something wrong? (Unless I'm missing something here, it surprises me that samtools counts reads where there are gaps!?) samtools mpileup -r 7:27658519-27661277 accepted_hits.bam > tnf.pileup Dario |
|
![]() |
![]() |
![]() |
#4 |
Member
Location: FL Join Date: May 2010
Posts: 40
|
![]()
I have have some RNA-seq data and i have been to talks involving RNA-seq data and it seems that introns seem to be retained.
When I talked to a guy from PacBio presenting his data (where he had retained introns) he stated that this could be due to pre-mRNA being included in your sample or it could be a signal that these introns might be doing more (i.e. playing a regulatory role). It's unknown as of now but I've seen 3 different sets of data and they all have intron retention :-/. |
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 224
|
![]()
It is a tricky question and it is something a lot of people are seeing, but you almost have to take it on a case-by-case basis to figure out what might be going on.
In addition to the pre-mRNA and retained intron possibilities, some of the reads may map to multiple places or map to repeat regions of the genome. I place less trust in those types of reads. |
![]() |
![]() |
![]() |
#6 | |
Member
Location: Chapel Hill, NC Join Date: Mar 2009
Posts: 39
|
![]() Quote:
Yep, that's your problem. I don't know how to fix this because I haven't been working with RNA, but we had this problem when we were trying to use coverage to do indel detection (in a very naive way). We just ended up taking another route. Could you go through the SAM file and break the reads with introns into two separate records? It's a hack, but it might work. --will |
|
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
This picture, via Cancer Genome Workbench (cgwb.nci.nih.gov) shows rnaseq reads hitting introns. The wiggle plot show logarithmic coverage for a TCGA rnaseq samples. It may be immature rna or intronic control RNAexpression as others have theorized. This intronic rna expression appears quite common in many samples and most of the reads, in this case, appear to be "uniquely" mappable.
![]() Last edited by Richard Finney; 05-17-2011 at 09:08 AM. Reason: gramerr |
![]() |
![]() |
![]() |
#8 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]()
Could you show a SAM record that is being improperly counted? Curiosity here...
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Southern France Join Date: Aug 2009
Posts: 269
|
![]()
What I usually observe from public RNA-seq data is that about 10-15% of the non-spliced reads do overlap annotated introns. It can be much higher, depending on the library preparation. It also depends on the reference annotation (for instance Refseq is more restrictive than Ensembl).
Possible reasons: - "background noise" - "pervasive transcription" - partially processed mRNAs - genomic contamination - functional non coding RNAs - new transcripts / splicing isoforms - degradation products (they can be polyAdenylated and even re-capped) Last edited by steven; 05-19-2011 at 02:36 AM. |
![]() |
![]() |
![]() |
#10 | |
Senior Member
Location: Cambridge, UK Join Date: May 2010
Posts: 311
|
![]() Quote:
Here some lines from the SAM and pileup output just at the border between the first exon-intron. The CIGAR string of the reads spanning the intron have a 586 bp insertion, which is the size of the intron. These insertions are counted in the pileup file. For the first time I realize that the insertion is coded as '>' or '<' in the pileup column reporting the bases. So a 'corrected' pileup should have at each position the read count minus the count of '>' and '<', right? Any comments welcome! All the best Code:
EBRI093151_0001:7:65:657:1096#TCTCCC 73 7 27658852 255 35M * 0 0 TGCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAG BCCCCBCBBCAAABCBA>@@BBCC@:<;A?0?@;> NM:i:0 NH:i:1 EBRI093151_0001:7:10:722:420#TCTCTT 163 7 27658853 255 35M = 27658864 0 GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT AA+>BC@CACBBBBBBB;)@CBBAC>@1?&=;%=3 NM:i:1 NH:i:1 EBRI093151_0001:7:100:1536:385#CCCCCC 97 7 27658853 255 35M = 27659701 0 GCACTTCGAGGTTATCGGCCCCCAGAAGGAAGAGT BAAA5=>A9AB>?>-372?29BB?.':?A?)3)@2 NM:i:1 NH:i:1 EBRI093151_0001:7:105:1158:856#TTAACT 163 7 27658856 255 31M586N4M = 27659472 0 CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTTC <C;?9?BCBC6B6BC>>CBACCCA8@1=@AC>C=* NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:72:996:1595#TCTCTA 83 7 27658856 255 35M = 27658780 0 CTTCGAGGTTATCGGCCCCCAGAAGGAAGAGTTGC C4BBBA;BBBABBBBB@6@BB?BBBBCCABBBC@B NM:i:2 NH:i:1 EBRI093151_0001:7:21:730:103#ACCTCC 163 7 27658859 255 28M586N7M = 27659486 0 CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG ACCCC2CBBBBA@ABB@1><A@A1>;%=>=B7;)7 NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:42:94:608#TCCCCC 161 7 27658859 255 28M586N7M = 27659700 0 CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG @B@CC?CBAAA?5>ACCA>9??A8@7;>3>B=?57 NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:117:1539:645#CCTCTC 97 7 27658859 255 28M586N7M = 27659706 0 CGAGGTTATCGGCCCCCAGAAGGAAGAGTTTCCAG BCB@A=AA;5?@BBBB=49?<>>=4628,=<A3## NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:117:341:1130#CTTACC 147 7 27658862 255 25M586N10M = 27658795 0 GGTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTG C;=ABCC@@@>BCAAACC>??=BBC9>?ABBA=;@ NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:86:263:1733#TTTCTA 73 7 27658863 255 24M586N11M * 0 0 GTTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGG BBAB?A;1<B?B=82:92;:79785=72;648594 NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:10:722:420#TCTCTT 83 7 27658864 255 23M586N12M = 27658853 0 TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC @C@ACCB?BA4ABCB@ABBBBBBBBAC;@BBBCCB NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:58:1684:1481#TATCTA 83 7 27658864 255 23M586N12M = 27658781 0 TTATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGC ABA8CBB:7<ABC>CBAA@BBBBA=ABBCACABCB NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:73:1052:643#CCCCCA 97 7 27658866 255 21M586N14M = 27659703 0 ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC BAA?A@A=BA?@>;=A=??=?4?=?:3;317>::5 NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:89:1185:1493#CCCCCT 99 7 27658866 255 21M586N14M = 27659486 0 ATCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCC @AA?=A==A@2=<+B8(<@1@1@9?>1>=4=-.=: NM:i:0 XS:A:+ NH:i:1 EBRI093151_0001:7:65:504:840#CCTCTT 99 7 27658867 255 20M586N15M = 27659469 0 TCGGCCCCCAGAAGGAAGAGTTTCCAGCTGGCCCC BABA<B?BB*@0;>69A@=97=BAB<6A?=9>=7@ NM:i:0 XS:A:+ NH:i:1 7 27658852 N 75 TTtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTttt 4AA>A?>CABA>=AAA69BA>B@=94>A@A@CB:A8ABABBAB@BA>B@BBAA7B=B:A8B>9@B?=AB@@?CB> 7 27658853 N 76 GGgGgGgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGggg^~G 9AB=ABBCABCBABAC:@CB@ABBA:BACBB>C:A@BCBC=BBBA?@ABB@B?0>:B4C@B>>BB@?=B@:BBBAA 7 27658854 N 76 CCcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC 1=@A@ABCABBAAAB;99BA>BB=B>?@BBCAC@CCACBC?BCB=A?BCBC<B8@AA?CBBB@@BBB=B;B?ABBA 7 27658855 N 76 A$AaAaAaaaaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaGaaaaaaaaaaaaaaaAAaaAaaaA 3A>>3B<:=B<B49=0=0A?@@8-;;?@B7BA@;>?:AA>=?9:.:9=C?B=<657=8?<;?8;=676;64A8=?+ 7 27658856 N 77 CcCcCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccC^~C^~c =@AAACB@BAB?BB;B/BB?BA=A:BBB@CCCAB:@CCC>CCC@A>CCBC4B5C>BABA:BBABBB<?@B?B?B><C 7 27658857 N 77 TtTtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt ?B<A?@?ABCB9AB>A>BA;A@A<.@?C9BB>8A3?B?B;ABA>?@?CA>=A4>5A9@A>;?:@??>BAAA:?<BC4 7 27658858 N 77 T$t$TtTtttttTtttTTTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTt >A:?B@@<BCB:ABBB?A28@@?@?AAB%;BC=A<3BB=;?>8=<;?C>A=?=>:@0>A@>A>=6?ABB?A:>9C;B 7 27658859 N 76 C$cCcccccCcccCCCcccccccCCCCccccccccccccccccccccccCcccccccccccccccCCccCcccCCc^~C @AACCAB@@@BB=B@C<=BBAB=ABA=BCCBCA@BC??CBB>A:BACB<A;C<BBCBBBB5@BC7ABBA@BB@?BA 7 27658860 N 75 g$GgggggGgggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG BBB@BBA8@BB;BABB99BABAAB:B?CCACBBCB@BCBBAA>?BBB@B>B>@;BBB?AAC@@:CBABB>BC9BC 7 27658861 N 74 A$a$a$aaaAaaaAAAaaaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaA 3BBBBB3=B@2@<BB>=AAA.?52A@BAACCBABCBCBBABA>@BB8B@BAB?;BBBBABBA(>B==C?BA?AC 7 27658862 N 72 g$g$g$G$g$ggGGGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgG^~g @B@9@AB>A@CBABCBBA?A:BA>BBCBACBCBCBA@BA@BAB9BBBAA?CBBAB=BBA<CBAACBACB;CC 7 27658863 N 67 g$g$G$GGgggggggGGGGggggggggggggggggggggggGgggggggggggggggGGggGgggGGgGg @@ACABB74AAB?>B9B8CC=CB>CACBBA8AB?4CBABA>AAA2CBA;@ABA@=CA?BBB<BCBC; 7 27658864 N 66 TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttCTttTtttTTtTt^~t^~t ?)CB:BBBB(3499ACC/CB@BBB@CCC>BBCCBA1B?C7B7BBAB@>AB>(>@B4CC>BBB2=@A 7 27658865 N 66 TTtttttttTTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTttt @2CB=9B>B1ABBB=>B/C>?>B+2CBB@BCBCBC5?BB?ABBB@B>BB>>6@AABCBABCBCACB 7 27658866 N 67 A$A$a$aaaaaaAAAAaaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaAAaaAaaaAAaAaaa^~A @ABBB>BAC:A>B@=CB;ACBCA>;CAC=BCABBA2?BCBA@B@AB@AAB@7=BBACBBB6ABB@A@ 7 27658867 N 67 t$t$t$t$t$t$TTTTttttttttttttttttttttttTtttttttttttttttTTttTtttTTtTtttT^~T^~T^~t BBBB@B5BAC>CCB?CABBBCA?@ABB@BCC>1BABBB7B@B@ABBBA.ABA9CA?BBBBCA8ABA> 7 27658868 N 63 C$C$C$C$c$cccccccccccccccccccccCcccccccccccccccCCccCcccCCcCcccCCCc^~C^~c 48>@AACCBCCACC>A?=ABBABAB@,B@BAAABABAABACB4?@B0C?BB6BBCCCAA@AA; 7 27658869 N 58 g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$g$gggggGgggggggggggggggGGggGgggGGgGgggGGGgGg BABBBBBAB?A?BBABABCC?2BABBA@BCBB>BCBB8BBB8CBABBBB@CB?BBA@= 7 27658870 N 45 g$g$g$g$g$GgagggggggggggggGGggGgggGGgGgggGGGgGg^~g^~g^~g BBBBB5@>BABA@B?A@ACBB.@B:@BBC;CBA@BB=AB;A)6=: 7 27658871 N 40 C$c$c$cccccccccccccCCccCcccCCcCcccCCCcCcccc .BBCBABBB8?BBCCB>:AA9CCB)>B@@?:A<9>A@7A? 7 27658872 N 38 c$c$c$c$c$c$c$ccccccCCccCcccCCcCcccCCCcCcccc^~c BBABBB>AABCCB9@@C6CCB@>@A>B7=BA=BB&;18 7 27658873 N 34 c$a$c$c$ccCCccCcccCCcCcccCCCcCccccc^~C^~C^~c BBBBCB-ABC=CCACC6BBA<=?B;B@6;>?BB# 7 27658874 N 30 c$a$CCccCcccCCcCcccCCCcCcccccCCc BB8BBB=CC>BB@BC4AABBBA(<@?=CC# 7 27658875 N 29 C$C$ccCcccCCcCcccCCCcCcccccCCc^~C ::BC8CCCBAB@AAB@BA@B3>A==CB#B 7 27658876 N 27 g$a$AaaaAAaAaaaAAAaAaaaaaAAaA BB.CCCACB1ABC2*+@@+ABBBCB#B 7 27658877 N 30 GgggGGgGgggGGGgGgggggGGgG^~g^~g^~g^~g^~g 1ACCCC?>AC>=@@=@,=B;BC@#B9A42: 7 27658878 N 32 AaaaAAaAaaaAAAaAaaaaaAAaAaaaaa^~a^~a 1CCC>CB<CBC<0AB3<>A=?A9#B>B:=;># 7 27658879 N 37 AaaaAAaAaaaAAAaAaaaaaAAaAaaaaaaa^~a^~a^~a^~a^~a 1ACB@ABAC@B+;B@@*=A??B>#A?A9A<B62<+9> 7 27658880 N 42 G$g$g$gGGgGgggGGGgGgggggGGgGgggggggggggg^~g^~g^~g^~g^~g />BC18B@>AAB>B?B>A>A>BA#B=B@?A?7;;:8A2:;>? 7 27658881 N 39 gGGgGgggGGGgGgggggGGgGggggggggggggggggg B?@BA?BA86C9A5=B1=B>;AAB<BAA3=<33<2A7;? 7 27658882 N 39 aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa B&1C1?B@(9<B=@<AA?B=5@AC8A?@3=;;<B1@;@A 7 27658883 N 39 aAAaAaaaAAAaAaaaaaAAaAaaaaaaaaaaaaaaaaa C==C>=BB<AB;B3?A?AA>3B@CB@AB97?:;37?<;@ 7 27658884 N 39 g$GGgGgggGGGgGgggggGGgGggggggggggggggggg B;@A;BBB@@@>?7AA3AC=/B>B?A@=05=1;=7=>=@ 7 27658885 N 38 AAaAaaaAAAaAgaaaaAAaAaaaaaaaaaaaaaaaaa %AB%BBB1=BAA(9AAAC=4A@B=B>A7>=6:@0?<=B 7 27658886 N 38 GGgGgggGGGgGgggggGGgGggggggggggggggggg =CB=CBB@9B6A@8B;:C?4B?B8?;A555:8=-?5A< 7 27658887 N 38 T$>t><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< 3>B>9BA17A0>7A?AAB?:.:@?B>@>8&;:96A??0 7 27658888 N 37 >t><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<< >C>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0 7 27658889 N 37 >g><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< >@>9BA17A0>3A?AAB?:.:@?B>@>8&;:96A??0 7 27658890 N 37 >c$><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<< >B>9BA17A0><A?AAB?:.:@?B>@>8&;:96A??0 7 27658891 N 36 >><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658892 N 36 >><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658893 N 36 >><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658894 N 36 >><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658895 N 36 >><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658896 N 36 >><<<>>><>t<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658897 N 36 >><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658898 N 36 >><<<>>><>g<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658899 N 36 >><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658900 N 36 >><<<>>><>c<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658901 N 36 >><<<>>><>a<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>BA?AAB?:.:@?B>@>8&;:96A??0 7 27658902 N 36 >><<<>>><>g$<<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>AA?AAB?:.:@?B>@>8&;:96A??0 7 27658903 N 35 >><<<>>><><<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0 7 27658904 N 35 >><<<>>><><<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0 7 27658905 N 35 >><<<>>><><<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0 7 27658906 N 35 >><<<>>><><<<<>><><<<<<<<<<<<<<<<<< >>9BA17A0>A?AAB?:.:@?B>@>8&;:96A??0 |
|
![]() |
![]() |
![]() |
#11 |
Junior Member
Location: Santa Cruz Join Date: Aug 2009
Posts: 3
|
![]()
Hi dariober, I was wondering if this problem of removing the gaps in the pileup step has ever been resolved for you? I ran into the same issue. Any tip would be greatly appreciated!
|
![]() |
![]() |
![]() |
#12 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
|
![]()
has anyone seen any discussion of the influence of intronic reads on transcript prediction? Or I suppose, mention of how people are dealing with this? I've raised this issue elsewhere and continue to be plagued by it... i really can't find any automated solution for dealing with this. it seems we have to get down to business and manually annotate genes... fun!
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: WashU Join Date: Aug 2010
Posts: 117
|
![]()
The ALEXA-seq and Trans-ABySS manuscripts discuss the implications of intronic mapping reads for measuring expression and transcript assembly respectively. Due to all of the possible sources of sequences that will map to introns (nicely summarized by Steven above), it is important to consider whether the signal you see for an exon is really above that of the surrounding areas. RNA-seq is not a zero noise data type.
![]() For all of these reasons, one can argue that exon-exon junctions might be a useful metric for evaluating RNA-seq library quality. The reason being that exon-exon sequences from real transcripts for the most part do not occur in the genome (and therefore not in unprocessed transcripts either). We also routinely evaluate exon-to-intron and exon-to-intergenic signal, among many other metrics. For example here is a report for a: Human Breast vHMECS RNA-seq library. Full disclosure. I am an author on the two papers linked above. Perhaps others can link to others that discuss this topic! |
![]() |
![]() |
![]() |
#14 |
Member
Location: Cambridge, United Kingdom Join Date: Jun 2011
Posts: 58
|
![]()
Maybe HTseq-count deals with counting spliced reads:
From this page: http://www-huber.embl.de/users/ander...doc/count.html "Make sure to use a splicing-aware aligner such as TopHat. HTSeq-count makes full use of the information in the CIGAR field." Chris |
![]() |
![]() |
![]() |
#15 |
Member
Location: Uppsala, Sweden Join Date: Nov 2009
Posts: 23
|
![]()
Here is an answer to why there are so many intronic reads: it's because RNA-seq measures ongoing transcription!
We have recently published a paper in Nat Struct Mol Biol where this is described. See also this thread for more info. |
![]() |
![]() |
![]() |
#16 |
Member
Location: Cambridge, United Kingdom Join Date: Jun 2011
Posts: 58
|
![]()
Hi adameur,
I'd thought to mention your paper in my previous answer as well! Do you know how much it may affect mRNA-seq studies - where people try to select processed RNA's from total RNA using poly-A and related lab methods? Chris |
![]() |
![]() |
![]() |
#17 |
Member
Location: Uppsala, Sweden Join Date: Nov 2009
Posts: 23
|
![]()
Hi Chris,
From what I have seen the choice of lab method makes a big difference.. PolyA+ gives a much higher proportion of reads mapping to exons, which is good if you're mainly interested in expression of mature mRNAs and splice variants. Total RNA on the other hand makes it possible to also study other types of events, like nascent transcript formation and co-transcriptional splicing. The drawbacks are that you need quite a lot of reads to study the intronic expression and that there are no established analysis tools. Adam |
![]() |
![]() |
![]() |
#18 |
Member
Location: Texas Join Date: May 2009
Posts: 32
|
![]()
a dumb question. How did you make that figure in the first post of this thread? which program you used?
|
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
Just to clarify: We have two things going on here.
(i) The coverage curve shown in post #1, produced with samtools, disagrees from the one shown in post #3, produced, for the same data, with IGC. This points to an interesting and worrying bug in samtools and explains most of the coverage in the introns. (ii) There are still reads left in the introns. They are most likely really there, and many people have seen them in RNA-Seq data. The thread has probably mentioned most of the suggestions the literature has discussed recently about what this means. |
![]() |
![]() |
![]() |
#20 |
Member
Location: Cambridge, United Kingdom Join Date: Jun 2011
Posts: 58
|
![]()
Yes is a two-issue thread!
The samtools thing is that mpileup adds '>' and '<' symbols where the cigar part of an alignment is an N. You can use "samtools depth" and that gives me 0 over introns. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|