
**kwatts59** · 03-14-2014, 05:21 PM

Here is some additional info

Below is the run.log file:
/usr/local/bioinfo/cufflinks-2.1.1.Linux_x86_64/cuffmerge -o /rice2/cuffmerge4 /rice2/cuffmerge4/transcripts_gtf.list
gtf_to_sam -F /rice2/control/7961X1/cufflinks3/transcripts.gtf /rice2/cuffmerge4/tmp/gtf2sam_file4OrNLa
gtf_to_sam -F /rice2/aba_treated/7961X2/cufflinks3/transcripts.gtf /rice2/cuffmerge4/tmp/gtf2sam_file4kSObI
gtf_to_sam -F /rice2/ga_treated/8198X1/cufflinks3/transcripts.gtf /rice2/cuffmerge4/tmp/gtf2sam_file4z0eHq
gtf_to_sam -F /rice2/aba_ga_treated/8198X2/cufflinks3/transcripts.gtf /rice2/cuffmerge4/tmp/gtf2sam_fileOQfcsk
sort -k 3,3 -k 4,4n --temporary-directory=/rice2/cuffmerge4/tmp/ /rice2/cuffmerge4/tmp/gtf2sam_file4OrNLa /rice2/cuffmerge4/tmp/gtf2sam_file4kSObI /rice2/cuffmerge4/tmp/gtf2sam_file4z0eHq /rice2/cuffmerge4/tmp/gtf2sam_fileOQfcsk > /rice2/cuffmerge4/tmp/mergeSam_fileUgYASo
cufflinks -o /rice2/cuffmerge4/ -F 0.05 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 1 /rice2/cuffmerge4/tmp/mergeSam_fileUgYASo
cuffcompare -o tmp_meta_asm -C -G /rice2/cuffmerge4//transcripts.gtf
cuffcompare -o tmp_meta_asm -C -G /rice2/cuffmerge4//merged.gtf

Only a single output file is generated and it is named "merged.gtf".
Below are the first 10 lines of the merged.gtf file. Note that the third column says "exon" throughout the entire file.
chr01 Cufflinks exon 3354 3616 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.2.1"; tss_id "TSS1";
chr01 Cufflinks exon 4357 4458 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.2.1"; tss_id "TSS1";
chr01 Cufflinks exon 7133 7944 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.4.1"; tss_id "TSS2";
chr01 Cufflinks exon 8028 8156 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; oId "CUFF.4.1"; tss_id "TSS2";
chr01 Cufflinks exon 27054 27292 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "1"; oId "CUFF.25.1"; tss_id "TSS3";
chr01 Cufflinks exon 27370 27894 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "2"; oId "CUFF.25.1"; tss_id "TSS3";
chr01 Cufflinks exon 29682 29976 . + . gene_id "XLOC_000004"; transcript_id "TCONS_00000004"; exon_number "1"; oId "CUFF.27.1"; tss_id "TSS4";
chr01 Cufflinks exon 30146 30400 . + . gene_id "XLOC_000004"; transcript_id "TCONS_00000004"; exon_number "2"; oId "CUFF.27.1"; tss_id "TSS4";
chr01 Cufflinks exon 32716 32908 . + . gene_id "XLOC_000005"; transcript_id "TCONS_00000005"; exon_number "1"; oId "CUFF.34.1"; tss_id "TSS5";
chr01 Cufflinks exon 33277 34486 . + . gene_id "XLOC_000005"; transcript_id "TCONS_00000005"; exon_number "2"; oId "CUFF.34.1"; tss_id "TSS5";

If anybody can help me, your efforts will be appreciated.
Thanks in advance.

**csmatyi** · 04-12-2014, 12:05 PM

novel INDELs, splice variants, transcripts

Dear everybody,

I have been running the tophat->cufflinks->cuffcompare software in order to find new INDELs, splice variants, and transcripts. This is a screenshot of my cuffcompare results:

ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
NM_011541 NM_011541 p CUFF.2 CUFF.2.1 100 0.359342 0.150146 0.540524 2.236101 1081 CUFF.2.1 2671
NM_001177795 NM_001177795 p CUFF.3 CUFF.3.1 100 0.498016 0.298153 0.689480 3.034472 1742 CUFF.3.1 2125
NM_001177658 NM_001177658 = CUFF.1 CUFF.1.1 100 19.493129 17.552255 21.434003 115.930347 947 CUFF.1.1 4201
NM_008866 NM_008866 = CUFF.4 CUFF.4.1 100 11.250214 9.844004 11.981710 68.637326 2460 CUFF.4.1 2433
- - u CUFF.6 CUFF.6.1 100 1.115727 0.931503 1.305515 6.851548 4600 CUFF.6.1 -
NM_021374 NM_021374 = CUFF.5 CUFF.5.1 100 1.345565 0.996552 1.666366 7.926568 1987 CUFF.5.1 1778
NM_001159750 NM_001159750 = CUFF.7 CUFF.7.1 19 4.022620 3.373178 4.662116 24.433896 2545 CUFF.7.2 2668
NM_011541 NM_011541 = CUFF.7 CUFF.7.2 100 21.482097 20.103894 22.897623 130.484947 2548 CUFF.7.2 2671
NM_133826 NM_133826 = CUFF.8 CUFF.8.1 100 23.449073 22.067371 24.844060 143.647096 2621 CUFF.8.1 1976
- - u CUFF.9 CUFF.9.1 100 0.425803 0.172667 0.656136 2.649674 940 CUFF.9.1 -
- - u CUFF.10 CUFF.10.1 100 0.539862 0.335615 0.727167 3.101017 1741 CUFF.10.1 -
- - u CUFF.11 CUFF.11.1 100 0.729683 0.292007 1.119361 3.973062 667 CUFF.11.1 -
- - u CUFF.12 CUFF.12.1 100 5.273319 1.347881 4.043642 25.459617 289 CUFF.12.1 -
NM_021511 NM_021511 = CUFF.13 CUFF.13.1 100 11.054199 9.417533 11.546153 66.036162 2013 CUFF.13.1 2048
NM_183028 NM_183028 = CUFF.14 CUFF.14.1 100 13.063534 12.150132 13.717891 79.539141 5135 CUFF.14.1 5232
NM_009826 NM_009826 j CUFF.15 CUFF.15.1 89 8.605083 7.931097 9.289027 52.128971 5126 CUFF.15.2 7046
NM_009826 NM_009826 = CUFF.15 CUFF.15.2 100 9.674768 9.110064 10.277111 58.609051 7493 CUFF.15.2 7046
NM_177547 NM_177547 c CUFF.17 CUFF.17.1 100 0.817438 0.497571 1.172846 4.754981 1091 CUFF.17.1 5408
NR_024067 NR_024067 j CUFF.18 CUFF.18.1 18 3.811516 3.105650 4.526064 20.487205 1526 CUFF.18.1 407
NR_024067 NR_024067 = CUFF.18 CUFF.18.2 100 21.203930 16.631886 25.800746 113.972831 329 CUFF.18.1 407
NM_001285425 NM_001285425 j CUFF.19 CUFF.19.1 100 0.867241 0.668385 1.069417 5.237056 4063 CUFF.19.2 3746
NM_001285425 NM_001285425 j CUFF.19 CUFF.19.2 79 0.682889 0.490159 0.872317 4.123800 4085 CUFF.19.2 3746

Now, I understand that in the 3rd column those genes with a letter j represent new transcripts. Do these include:

a) new INDELs?
b) new splice variants?
c) new transcripts?

One of my big questions is, if a new gene (one marked with j) has a Refseq id, such as NM_001285425, then why is it a new gene? I mean, if it already has a Refseq id, then why is it new? Doesn't it count as already having been discovered?

Thanks!

**blancha** · 04-13-2014, 08:03 PM

kwatts59

First, if a reference annotation for your organism already exists, add it to the cuffmerge run so that you can distinguish known transcripts from novel ones. If you gave cuffmerge a reference annotation, you will be able to recognize the novel transcripts by their gene id, which will always start with XLOC_. Known genes will be identified by the gene id specified in the reference annotation.

Second, the GTF file created by cuffmerge is merely an annotation file that identifies the transcripts (transcript_id) and their exons (exon_number). To quantify transcript abundance, run cuffdiff with the GTF file created by cuffmerge.
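As a rough illustration of the first point, the sketch below assembles a cuffmerge command line that includes a reference annotation. It assumes the `-g` (reference GTF), `-s` (reference sequence), `-o` and `-p` options of cuffmerge; the annotation and genome file names are hypothetical placeholders, and the list file and output directory are the ones from the run.log posted above.

```python
import shlex


def cuffmerge_command(gtf_list, out_dir, ref_gtf=None, ref_fasta=None, threads=1):
    """Build the argument list for a cuffmerge run (nothing is executed here)."""
    cmd = ["cuffmerge", "-o", out_dir, "-p", str(threads)]
    if ref_gtf:
        # Reference annotation, so merged transcripts can be compared to known ones.
        cmd += ["-g", ref_gtf]
    if ref_fasta:
        # Genome sequence in FASTA format.
        cmd += ["-s", ref_fasta]
    cmd.append(gtf_list)  # text file listing one transcripts.gtf per line
    return cmd


cmd = cuffmerge_command(
    "/rice2/cuffmerge4/transcripts_gtf.list",
    "/rice2/cuffmerge4",
    ref_gtf="rice_annotation.gtf",  # hypothetical file name
    ref_fasta="rice_genome.fa",     # hypothetical file name
)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to run it
```

The command is only printed here, so it can be checked before anything is run.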

**Wallysb01** · 04-13-2014, 09:05 PM

That gtf is showing genes and transcripts. You should read about the gtf format. There are many guides, here’s one: http://cufflinks.cbcb.umd.edu/gff.html

Basically, in each exon line, the transcript and gene are specified in the last column by the 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";' bit, which sets up the parent/child relationship between the exon, transcript and gene.
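A minimal sketch of how that parent/child relationship can be recovered from the exon lines, using three lines from the merged.gtf excerpt posted earlier in the thread. The function names are made up for illustration, and the sketch assumes tab-separated columns as in a real GTF file.

```python
import re
from collections import defaultdict

# Three exon lines from the merged.gtf excerpt above (tab-separated).
GTF_LINES = [
    'chr01\tCufflinks\texon\t3354\t3616\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.2.1"; tss_id "TSS1";',
    'chr01\tCufflinks\texon\t4357\t4458\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.2.1"; tss_id "TSS1";',
    'chr01\tCufflinks\texon\t7133\t7944\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.4.1"; tss_id "TSS2";',
]

# Matches one `key "value";` pair in the ninth (attribute) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def group_exons(lines):
    """Return {gene_id: {transcript_id: [(start, end), ...]}} from exon lines."""
    genes = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        exon = (int(cols[3]), int(cols[4]))
        genes[attrs["gene_id"]][attrs["transcript_id"]].append(exon)
    return genes


genes = group_exons(GTF_LINES)
for gene_id, transcripts in genes.items():
    for transcript_id, exons in transcripts.items():
        print(gene_id, transcript_id, exons)
```

So although every line says "exon" in the third column, the genes and transcripts are still fully described by the attributes.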

As a general comment, I would say cufflinks is very loose in finding novel transcripts when you run the whole RABT mode pipeline. So, I’d suggest setting some strict parameters.

**am@i** · 04-15-2014, 04:47 AM

novel transcript in cuffcompare data

hello, all

How can I identify novel transcripts when I run cuffcompare?
tophat -o output arabidopsis.fa file1_R1.fq file1_R2.fq
cufflinks -o output accepted_hits.bam
cuffmerge -s arabidopsis.fa assemblies.txt
assemblies.txt(transcripts_1.gtf........transcripts_n.gtf)
cuffcompare -s arabidopsis.fa -r known_annotation.gtf merged.gtf

When I run these commands, I don't get any FPKM values in the output file. Can anyone suggest how I can identify novel transcripts?
This is the output file (cuff_compare.merged.gtf.tmap):
ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
ANAC001 AT1G01010.1 = XLOC_000001 TCONS_00000002 0 0.000000 0.000000 0.000000 0.000000 1694 TCONS_00000002 1688
ANAC001 AT1G01010.1 j XLOC_000001 TCONS_00000001 0 0.000000 0.000000 0.000000 0.000000 1674 TCONS_00000002 1688
DCL1 AT1G01040.1 j XLOC_000002 TCONS_00000004 0 0.000000 0.000000 0.000000 0.000000 6611 TCONS_00000004 6251
DCL1 AT1G01040.1 = XLOC_000002 TCONS_00000003 0 0.000000 0.000000 0.000000 0.000000 6251 TCONS_00000004 6251
DCL1 AT1G01040.2 = XLOC_000002 TCONS_00000005 0 0.000000 0.000000 0.000000 0.000000 5984 TCONS_00000004 5877
AT1G01073 AT1G01073.1 = XLOC_000003 TCONS_00000006 0 0.000000 0.000000 0.000000 0.000000 111 TCONS_00000006 111
IQD18 AT1G01110.2 = XLOC_000004 TCONS_00000007 0 0.000000 0.000000 0.000000 0.000000 1782 TCONS_00000007 1782
AT1G01115 AT1G01115.1 = XLOC_000005 TCONS_00000008 0 0.000000 0.000000 0.000000 0.000000 117 TCONS_00000008 117
GIF2 AT1G01160.1 = XLOC_000006 TCONS_00000009 0 0.000000 0.000000 0.000000 0.000000 1045 TCONS_00000010 1045
GIF2 AT1G01160.2 = XLOC_000006 TCONS_00000010 0 0.000000 0.000000 0.000000 0.000000 1129 TCONS_00000010 1129
AT1G01180 AT1G01180.1 = XLOC_000007 TCONS_00000011 0 0.000000 0.000000 0.000000 0.000000 1176 TCONS_00000011 1176
MIR165A AT1G01183.1 x XLOC_000008 TCONS_00000012 0 0.000000 0.000000 0.000000 0.000000 651 TCONS_00000012 101
F6F3.2 AT1G01210.1 = XLOC_000009 TCONS_00000013 0 0.000000 0.000000 0.000000 0.000000 616 TCONS_00000013 616
FKGP AT1G01220.1 = XLOC_000010 TCONS_00000014 0 0.000000 0.000000 0.000000 0.000000 3532 TCONS_00000014 3532

**blancha** · 04-15-2014, 06:04 AM

@am@i: As far as I can tell, there are no novel transcripts in the output you've posted. They all have a ref_gene_id, meaning that all the transcripts you've posted were found in your reference annotation file. I've been wrong before, though.

**am@i** · 04-15-2014, 11:35 PM

Hello Everyone,
I have been running the tophat->cufflinks->cuffcompare software in order to find novel transcripts. This is part of my cuffcompare results:

ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
AT1G01070 AT1G01070.1 = CUFF.3 CUFF.3.1 100 11.658709 8.811293 12.76999 25.330956 1334 CUFF.3.1 1311
NGA3 AT1G01030.1 o CUFF.1 CUFF.1.1 100 1.210378 0.680911 1.733229 2.87049 1376 CUFF.1.1 1905
LHY AT1G01060.3 j CUFF.4 CUFF.4.1 100 3.877988 3.027325 4.70032 8.757048 2318 CUFF.4.1 2517
LHY AT1G01060.3 j CUFF.4 CUFF.4.2 19 0.745358 0.295704 1.225058 1.683125 2196 CUFF.4.1 2517
ARV1 AT1G01020.1 c CUFF.2 CUFF.2.1 100 55.954548 13.748855 27.162373 98.898929 254 CUFF.2.1 1623
ARV1 AT1G01020.1 j CUFF.5 CUFF.5.1 46 3.093444 1.312381 4.874558 6.489805 634 CUFF.5.2 1623
ARV1 AT1G01020.1 j CUFF.5 CUFF.5.2 52 3.5225 1.734063 5.359831 7.389933 720 CUFF.5.2 1623
ARV1 AT1G01020.2 c CUFF.5 CUFF.5.3 100 6.796436 4.118432 9.41356 14.258397 614 CUFF.5.2 1085
ANAC001 AT1G01010.1 j CUFF.6 CUFF.6.1 24 3.075754 1.940899 4.185063 6.97969 1584 CUFF.6.1 1688
ATRAD51D AT1G07745.1 j CUFF.604 CUFF.604.2 53 2.043285 1.104131 2.918061 4.845783 1080 CUFF.604.1 1188
F24B9.13 AT1G07750.1 = CUFF.605 CUFF.605.1 100 27.163205 24.111159 30.21525 63.156172 1296 CUFF.605.1 1414
RPS15A AT1G07770.1 p CUFF.606 CUFF.606.1 100 19.492055 9.645981 17.653966 44.82586 468 CUFF.606.1 725
RPS15A AT1G07770.1 = CUFF.611 CUFF.611.1 74 64.794364 39.888684 57.433706 149.649577 568 CUFF.611.1 725
ATMC8 AT1G16420.1 = CUFF.1258 CUFF.1258.1 100 4.306088 2.441968 5.567687 9.482726 872 CUFF.1258.1 1338
AT1G16515 AT1G16515.1 o CUFF.1259 CUFF.1259.1 100 9.753342 4.017728 10.646979 22.304586 424 CUFF.1259.1 265
F3O9.31 AT1G16510.1 c CUFF.1261 CUFF.1261.1 100 8.44923 4.4527 9.623577 18.571708 593 CUFF.1261.1 872
AT1G16480 AT1G16480.1 c CUFF.1260 CUFF.1260.1 100 1.567716 0.589045 2.473987 3.532044 723 CUFF.1260.1 2814
F3O9.30 AT1G16500.1 = CUFF.1262 CUFF.1262.1 100 27.439232 19.789247 28.908715 64.270531 934 CUFF.1262.1 1005
AT1G16550 AT1G16550.1 o CUFF.1265 CUFF.1265.1 100 2.576989 1.439353 3.640717 5.378117 1006 CUFF.1265.1 2303
F3O9.32 AT1G16520.1 j CUFF.1263 CUFF.1263.1 57 3.932072 2.676381 5.152034 9.010123 1273 CUFF.1263.1 1291
F3O9.32 AT1G16520.1 c CUFF.1263 CUFF.1263.2 100 6.897696 2.557242 9.245414 15.805685 433 CUFF.1263.1 1291
SIR3 AT1G16540.1 j CUFF.1267 CUFF.1267.1 91 2.995379 2.221049 3.786707 6.445242 2519 CUFF.1267.2 2758
SIR3 AT1G16540.1 j CUFF.1267 CUFF.1267.2 100 3.278397 2.452575 4.087624 7.05422 2576 CUFF.1267.2 2758
AT1G16489 AT1G16489.1 j CUFF.1264 CUFF.1264.1 100 2.208782 1.080485 3.349502 5.177359 968 CUFF.1266.1 412
AT1G16489 AT1G16489.1 e CUFF.1266 CUFF.1266.1 100 1.80018 0.957649 2.633535 4.219601 1247 CUFF.1266.1 412
SR45 AT1G16610.1 o CUFF.1269 CUFF.1269.1 100 114.428505 11.655641 26.449338 162.070939 190 CUFF.1269.1 1560
- - u CUFF.1668 CUFF.1668.1 100 8.522216 1.780679 6.825938 15.61758 287 CUFF.1668.1 -

How can I identify novel transcripts from my output files?
Thank you for your help,
Amrita

**csmatyi** · 04-17-2014, 12:43 PM

The ones marked with j in the 3rd column are new transcripts, according to the cuffcompare manual.

**blancha** · 04-17-2014, 12:49 PM

csmatyi's answer is better than my previous answer.
Here is the full list of class codes from the manual.

---

Class Codes

If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
Priority Code Description
1 = Complete match of intron chain
2 c Contained
3 j Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4 e Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
5 i A transfrag falling entirely within a reference intron
6 o Generic exonic overlap with a reference transcript
7 p Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8 r Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
9 u Unknown, intergenic transcript
10 x Exonic overlap with reference on the opposite strand
11 s An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
12 . (.tracking file only, indicates multiple classifications)


http://cufflinks.cbcb.umd.edu/manual.html#class_codes
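A small sketch of how the class codes listed above could be used to pull candidate rows out of a .tmap file. Treating `j` and `u` as the "candidate novel" codes is an assumption made for illustration, not something the manual prescribes. The sample rows are taken from the results posted earlier in the thread, and the function name is made up.

```python
import csv
import io

# Assumption for illustration: treat potentially novel isoforms (j) and
# unknown intergenic transcripts (u) as the candidates of interest.
CANDIDATE_CODES = {"j", "u"}

HEADER = ("ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM "
          "FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len")
ROWS = [
    "AT1G01070 AT1G01070.1 = CUFF.3 CUFF.3.1 100 11.658709 8.811293 12.76999 25.330956 1334 CUFF.3.1 1311",
    "LHY AT1G01060.3 j CUFF.4 CUFF.4.1 100 3.877988 3.027325 4.70032 8.757048 2318 CUFF.4.1 2517",
    "- - u CUFF.1668 CUFF.1668.1 100 8.522216 1.780679 6.825938 15.61758 287 CUFF.1668.1 -",
]
# Real .tmap files are tab-separated; rebuild that layout from the pasted rows.
SAMPLE_TMAP = "\n".join("\t".join(line.split()) for line in [HEADER] + ROWS) + "\n"


def candidate_novel(handle, codes=CANDIDATE_CODES):
    """Return the tmap rows whose class_code is in `codes`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["class_code"] in codes]


hits = candidate_novel(io.StringIO(SAMPLE_TMAP))
for row in hits:
    print(row["cuff_id"], row["class_code"], row["ref_id"], row["FPKM"])
```

For a real file, pass `open("cuff_in.tmap", newline="")` instead of the `io.StringIO` sample. A row selected this way is only a candidate; the class code alone does not confirm a novel isoform.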

**am@i** · 04-17-2014, 10:07 PM

Thanks for the reply!

I have another question.

When I compare the assembled transcripts with a reference annotation using cuffcompare, one of the output files is cuff_in.tmap. The class code column in this file shows the relationship between the Cufflinks transcripts and the reference transcripts. When the class code is j, the transcript is a potentially novel isoform, but how can we validate that it is actually a novel isoform?


Finding novel genes using Cufflinks/Cuffmerge
