SEQanswers

09-17-2013, 04:02 AM   #1
tirypr77
Junior Member
 
Location: Tübingen, Germany

Join Date: Sep 2013
Posts: 1
High level of novel exons/introns in cuffcompare data

Hello, all.

I am new to the forums and the whole of bioinformatics (I've been at it two weeks), but I have done a good deal of reading and have been playing around with the tophat->cufflinks pipeline.

Currently, I have RNA-seq libraries constructed from the pineal glands of 3 aged patients. I am attempting to identify novel transcripts in this relatively small library, after which I will move up to a larger library.

However, after assembling the transcripts with cufflinks (using the latest Ensembl human annotation as my reference for the RABT assembly), running cuffcompare to compare the pooled assembly back to the same Ensembl annotation reports a very high percentage of novel features. Specifically, 47.4% of exons and 25.8% of introns are identified as novel, as are 82.4% of loci.

Now, I am fairly certain these numbers cannot be correct. I recognize that we expect to find some new annotations, but this seems ludicrously high. I was wondering:

1) What could account for this very high report of novel transcripts? Could it just be lousy coverage resulting in many sparse transcripts being 'false positives'? I know that we did not have large amounts of RNA from these pineal glands (they're small, of course). If it's the data that is indeed the problem, how could I demonstrate this fact?

2) Do you have any suggestions for improving this method of identifying novel transcripts? I had hoped to use Cuffcompare's 'j' class code to look at possible novel isoforms, but at this point I am getting, with very few exceptions, either exact matches to the reference annotation (code =) or totally unknown transcripts (code u). I had an idea to run the cufflinks assembly with the reference annotation supplied as both the reference and the mask. I think I will try that and see how it works until I get a better idea.
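One way to probe whether the 'u' transcripts are low-coverage noise is to compare the coverage column across class codes in cuffcompare's .tmap output. The following is only a sketch under assumptions: it expects a tab-separated .tmap table with `class_code` and `cov` columns (populated when cuffcompare is run on a cufflinks transcripts.gtf), and the sample rows are invented purely for illustration, not taken from the pineal data.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_cov_by_class(tmap_lines):
    """Return {class_code: (n_transcripts, median_cov)} for a tab-separated .tmap table."""
    reader = csv.DictReader(tmap_lines, delimiter="\t")
    cov_by_code = defaultdict(list)
    for row in reader:
        cov_by_code[row["class_code"]].append(float(row["cov"]))
    return {code: (len(vals), median(vals)) for code, vals in cov_by_code.items()}


# Hypothetical rows (illustration only); a real .tmap file has more columns,
# which DictReader simply carries along.
SAMPLE_TMAP = "\n".join(
    "\t".join(fields)
    for fields in [
        ["class_code", "cuff_id", "cov"],
        ["=", "CUFF.1.1", "40.0"],
        ["=", "CUFF.2.1", "60.0"],
        ["u", "CUFF.3.1", "1.0"],
        ["u", "CUFF.4.1", "2.0"],
        ["u", "CUFF.5.1", "3.0"],
    ]
)

summary = median_cov_by_class(io.StringIO(SAMPLE_TMAP))
for code, (n, med) in sorted(summary.items()):
    print(f"class {code}: n={n}, median cov={med}")
```

If the 'u' class shows a much lower median coverage than the '=' class, that would be consistent with (though not proof of) sparse, fragmentary assemblies inflating the novel counts. For a real file, pass an open file handle instead of the `io.StringIO` wrapper.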

Hopefully this is enough information. I look forward to any advice the sages of this forum can give.

-RP
09-17-2013, 06:28 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Have you looked at the location of some of the predictions in IGV or another browser? You might get a better idea of why these metrics are so high by doing so.
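To make that browsing easier, the novel predictions can be pulled out as BED intervals and loaded as a track in IGV. This is a rough sketch, assuming the cuffcompare combined GTF carries a `class_code` attribute on each exon line; the function name and the sample lines are hypothetical and only illustrate the parsing.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def novel_transcripts_to_bed(gtf_lines, wanted_codes=("u",)):
    """Collapse exons of transcripts with a wanted class code into one BED line each."""
    spans = {}  # transcript_id -> (chrom, start, end, strand), 1-based inclusive
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("class_code") not in wanted_codes:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        tid = attrs["transcript_id"]
        if tid in spans:
            _, old_start, old_end, _ = spans[tid]
            start, end = min(old_start, start), max(old_end, end)
        spans[tid] = (chrom, start, end, strand)
    # BED uses 0-based, half-open coordinates, hence start - 1.
    return [
        f"{chrom}\t{start - 1}\t{end}\t{tid}\t0\t{strand}"
        for tid, (chrom, start, end, strand) in spans.items()
    ]


# Hypothetical GTF lines (illustration only).
SAMPLE_GTF = [
    'chr1\tCufflinks\texon\t1000\t1200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; class_code "u";',
    'chr1\tCufflinks\texon\t1500\t1800\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; class_code "u";',
    'chr1\tCufflinks\texon\t5000\t5300\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; class_code "=";',
]

bed = novel_transcripts_to_bed(SAMPLE_GTF)
print("\n".join(bed))
```

Loading the resulting BED alongside the alignments and the reference annotation should make it quicker to see whether the 'u' loci are short, single-exon fragments or plausible multi-exon transcripts.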
04-15-2014, 04:36 AM   #3
am@i
Member
 
Location: lucknow

Join Date: Dec 2013
Posts: 13

Hello Devon,

How can I identify novel transcripts when I run cuffcompare? These are the commands I used:

tophat -o output arabidopsis.fa file1_R1.fq file1_R2.fq
cufflinks -o output accepted_hits.bam
cuffmerge -s arabidopsis.fa assemblies.txt
(assemblies.txt lists transcripts_1.gtf ... transcripts_n.gtf)
cuffcompare -s arabidopsis.fa -r known_annotation.gtf merged.gtf

When I run the cuffcompare command, I do not get any FPKM values in the output file. Could anyone suggest how I can identify novel transcripts?

The output file:
ref_gene_id  ref_id       class_code  cuff_gene_id  cuff_id         FMI  FPKM      FPKM_conf_lo  FPKM_conf_hi  cov       len   major_iso_id    ref_match_len
ANAC001      AT1G01010.1  =           XLOC_000001   TCONS_00000002  0    0.000000  0.000000      0.000000      0.000000  1694  TCONS_00000002  1688
ANAC001      AT1G01010.1  j           XLOC_000001   TCONS_00000001  0    0.000000  0.000000      0.000000      0.000000  1674  TCONS_00000002  1688
DCL1         AT1G01040.1  j           XLOC_000002   TCONS_00000004  0    0.000000  0.000000      0.000000      0.000000  6611  TCONS_00000004  6251
DCL1         AT1G01040.1  =           XLOC_000002   TCONS_00000003  0    0.000000  0.000000      0.000000      0.000000  6251  TCONS_00000004  6251
DCL1         AT1G01040.2  =           XLOC_000002   TCONS_00000005  0    0.000000  0.000000      0.000000      0.000000  5984  TCONS_00000004  5877
AT1G01073    AT1G01073.1  =           XLOC_000003   TCONS_00000006  0    0.000000  0.000000      0.000000      0.000000  111   TCONS_00000006  111
IQD18        AT1G01110.2  =           XLOC_000004   TCONS_00000007  0    0.000000  0.000000      0.000000      0.000000  1782  TCONS_00000007  1782
AT1G01115    AT1G01115.1  =           XLOC_000005   TCONS_00000008  0    0.000000  0.000000      0.000000      0.000000  117   TCONS_00000008  117
GIF2         AT1G01160.1  =           XLOC_000006   TCONS_00000009  0    0.000000  0.000000      0.000000      0.000000  1045  TCONS_00000010  1045
GIF2         AT1G01160.2  =           XLOC_000006   TCONS_00000010  0    0.000000  0.000000      0.000000      0.000000  1129  TCONS_00000010  1129
AT1G01180    AT1G01180.1  =           XLOC_000007   TCONS_00000011  0    0.000000  0.000000      0.000000      0.000000  1176  TCONS_00000011  1176
MIR165A      AT1G01183.1  x           XLOC_000008   TCONS_00000012  0    0.000000  0.000000      0.000000      0.000000  651   TCONS_00000012  101
F6F3.2       AT1G01210.1  =           XLOC_000009   TCONS_00000013  0    0.000000  0.000000      0.000000      0.000000  616   TCONS_00000013  616
FKGP         AT1G01220.1  =           XLOC_000010   TCONS_00000014  0    0.000000  0.000000      0.000000      0.000000  3532  TCONS_00000014  3532
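In a table like this, novelty is indicated by the class_code column rather than by FPKM: 'j' marks a potentially novel isoform that shares at least one splice junction with a reference transcript, and 'u' marks an unknown, intergenic transcript. A minimal sketch of filtering on that column follows, using four rows copied from the output above. The function name is hypothetical. The zero FPKM values are plausibly because merged.gtf from cuffmerge carries no expression estimates, but that is an assumption worth checking against the input GTF.

```python
def select_by_class_code(tmap_lines, codes=("j", "u")):
    """Return (cuff_id, ref_id, class_code) for rows whose class code is in `codes`."""
    lines = iter(tmap_lines)
    header = next(lines).split()
    picked = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, fields))
        if row["class_code"] in codes:
            picked.append((row["cuff_id"], row["ref_id"], row["class_code"]))
    return picked


# Header plus four rows taken from the posted output.
EXCERPT = """\
ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
ANAC001 AT1G01010.1 = XLOC_000001 TCONS_00000002 0 0.000000 0.000000 0.000000 0.000000 1694 TCONS_00000002 1688
ANAC001 AT1G01010.1 j XLOC_000001 TCONS_00000001 0 0.000000 0.000000 0.000000 0.000000 1674 TCONS_00000002 1688
DCL1 AT1G01040.1 j XLOC_000002 TCONS_00000004 0 0.000000 0.000000 0.000000 0.000000 6611 TCONS_00000004 6251
MIR165A AT1G01183.1 x XLOC_000008 TCONS_00000012 0 0.000000 0.000000 0.000000 0.000000 651 TCONS_00000012 101
""".splitlines()

candidates = select_by_class_code(EXCERPT)
for cuff_id, ref_id, code in candidates:
    print(cuff_id, ref_id, code)
```

Splitting on whitespace works here because none of the fields contain spaces. For a full file, pass the open file handle in place of `EXCERPT`.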

Last edited by am@i; 04-15-2014 at 05:27 AM.

Tags
cuffcompare, cufflinks, novel transcripts, rna seq, transcript assembly
