Dear All,
I recently started using cufflinks to assemble transcripts from my RNA-seq data. To do a Reference Annotation Based Transcript (RABT) assembly, I supplied cufflinks with an hg19 annotation GTF file downloaded from GENCODE. The following command line was used on several samples:
Code:
cufflinks -p 16 -g gencode.v10.annotation.gtf -M gencode.v10.annotation.rRNA.gtf -b hg19.fa -u -N -o output sorted.bam
All of the runs ended with an almost identical error message. The only difference was the ID inside the single quotes, which was different for 3 samples.
The error:
Code:
[23:28:52] Loading reference annotation and sequence.
Error: duplicate GFF ID 'ENST00000361547.2' encountered!
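Before changing the annotation, it may help to check whether the flagged IDs are really repeated in the GTF at all. Below is a minimal Python sketch, assuming the standard tab-separated GTF layout. The function name is hypothetical, and the rule it applies (the same transcript_id on more than one chromosome/strand, or in more than one contiguous run of lines) is only a guess at what "duplicate" could mean, not the rule cufflinks' parser actually uses.

```python
import re
import sys
from collections import defaultdict

# transcript_id "..." inside the GTF attribute column (column 9)
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def find_repeated_transcript_ids(lines):
    """Return {transcript_id: reason} for IDs that look repeated.

    Heuristic (an assumption, not cufflinks' own rule): flag an ID whose
    records lie on more than one chromosome/strand, or that form more than
    one contiguous run of lines in the file.
    """
    locations = defaultdict(set)  # transcript_id -> {(seqname, strand)}
    runs = defaultdict(int)       # transcript_id -> number of contiguous runs
    previous = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TID_RE.search(fields[8])
        tid = match.group(1) if match else None  # gene rows have no transcript_id
        if tid is not None:
            locations[tid].add((fields[0], fields[6]))
            if tid != previous:
                runs[tid] += 1
        previous = tid
    flagged = {}
    for tid, locs in locations.items():
        if len(locs) > 1:
            where = ", ".join(sorted(chrom + strand for chrom, strand in locs))
            flagged[tid] = "on several chromosomes/strands: " + where
        elif runs[tid] > 1:
            flagged[tid] = f"{runs[tid]} separate runs of lines"
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python find_repeated_ids.py gencode.v10.annotation.gtf
    with open(sys.argv[1]) as handle:
        for tid, reason in sorted(find_repeated_transcript_ids(handle).items()):
            print(tid, reason, sep="\t")
```

If ENST00000361547.2 is not reported, the ID is not repeated in either of these two ways, and the parser is presumably objecting to something else in its records.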
I found that many transcript IDs from the annotation file appear in the cufflinks output transcripts.gtf with non-zero FPKM and coverage values, so I guess the 'duplicate ID' errors didn't stop the run. But could they affect the assembly result in other ways? (For example, ENST00000361547 being ignored and novel transcripts being identified at the same locus instead.)
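To see whether ENST00000361547 was actually dropped, one can look it up in the existing output without rerunning anything. A minimal sketch, assuming the tab-separated transcripts.gtf written by cufflinks has `transcript` rows carrying `transcript_id` and `FPKM` attributes; the helper names and the script name are hypothetical. The coordinates used in the example come from the `transcript` row of ENST00000361547.2 in the annotation excerpt further down this post.

```python
import re
import sys

# key "value" pairs in the attribute column; unquoted ones (e.g. level 2) are skipped
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column9):
    """Turn 'key "value"; key2 "value2";' into a dict (a repeated key keeps its last value)."""
    return dict(ATTR_RE.findall(column9))


def transcript_rows(lines):
    """Yield (seqname, start, end, attributes) for each 'transcript' row."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        yield fields[0], int(fields[3]), int(fields[4]), parse_attributes(fields[8])


def check_locus(lines, ref_id, seqname, start, end):
    """Return (fpkms, others) for one reference transcript.

    fpkms:  FPKM values of output transcripts named ref_id (with or without
            a version suffix such as '.2').
    others: IDs of other output transcripts overlapping seqname:start-end,
            e.g. novel models assembled at the same locus.
    """
    fpkms, others = [], []
    for chrom, s, e, attrs in transcript_rows(lines):
        tid = attrs.get("transcript_id", "")
        if tid == ref_id or tid.startswith(ref_id + "."):
            fpkms.append(attrs.get("FPKM"))
        elif chrom == seqname and s <= end and e >= start:
            others.append(tid)
    return fpkms, others


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_locus.py output/transcripts.gtf
    # coordinates: the ENST00000361547.2 'transcript' row of the annotation
    with open(sys.argv[1]) as handle:
        fpkms, others = check_locus(handle, "ENST00000361547", "chr1", 26126667, 26144713)
    print("ENST00000361547 FPKM:", ", ".join(map(str, fpkms)) or "not in output")
    print("overlapping transcripts:", ", ".join(others) or "none")
```

The same check can be repeated for the other two IDs from the error messages by swapping in their IDs and coordinates.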
I noticed that someone else amended the annotation file to get around the problem. But each cufflinks run can take more than a week to finish, so I hope somebody can suggest a quicker fix that doesn't require rerunning cufflinks from the very beginning. Many thanks.
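The post does not say how that annotation was amended, so the following is only one possible variant, written as a sketch. It assumes (unverified here) that cufflinks needs just the `exon` rows of each reference transcript, and writes a copy of the GTF without the other feature types, such as the `Selenocysteine`, `CDS` and `UTR` rows. Whether this clears the duplicate-ID error is untested, the output file name is hypothetical, and an amended annotation would still have to go through a fresh cufflinks run.

```python
import sys

# Assumption: only these feature types are needed by cufflinks (-g); untested.
DEFAULT_KEEP = frozenset({"exon"})


def filter_gtf(lines, keep_features=DEFAULT_KEEP):
    """Yield comment lines plus records whose column-3 feature type is kept."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.split("\t")
        if len(fields) >= 9 and fields[2] in keep_features:
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g. python filter_gtf.py gencode.v10.annotation.gtf gencode.v10.exons_only.gtf
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_gtf(src))
```

Passing a larger set, e.g. `filter_gtf(src, {"exon", "CDS", "start_codon", "stop_codon"})`, keeps the coding annotation while still dropping the other rows.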
Here are the records for ENST00000361547 in the GTF annotation file:
Code:
chr1 HAVANA transcript 26126667 26144713 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA Selenocysteine 26128584 26128586 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA Selenocysteine 26139280 26139282 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26126667 26126904 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26126722 26126904 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA start_codon 26126722 26126724 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26127534 26127651 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26127534 26127651 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26128507 26128608 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26128507 26128608 . + 2 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26131633 26131766 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26131633 26131766 . + 2 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26135071 26135280 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26135071 26135280 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26135517 26135641 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26135517 26135641 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26136174 26136311 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26136174 26136311 . + 1 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26137945 26138026 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26137945 26138026 . + 1 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26138182 26138370 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26138182 26138370 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26139178 26139283 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26139178 26139283 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26140372 26140484 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26140372 26140484 . + 2 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26140568 26140669 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26140568 26140669 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA exon 26142039 26144713 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA CDS 26142039 26142206 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA stop_codon 26142207 26142209 . + 0 gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA UTR 26126667 26126721 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
chr1 HAVANA UTR 26142207 26144713 . + . gene_id "ENSG00000162430.12"; transcript_id "ENST00000361547.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "SEPN1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "SEPN1-001"; level 2; tag "CCDS"; tag "seleno"; ccdsid "CCDS41282.1"; havana_gene "OTTHUMG00000007375.2"; havana_transcript "OTTHUMT00000019314.1";
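Besides the usual exon, CDS, UTR, start_codon and stop_codon rows, the excerpt above contains two `Selenocysteine` rows, a less common feature type. To compare the three flagged transcripts for anything unusual they share, a small tally of the column-3 feature types per transcript_id may help. This is a sketch assuming tab-separated input; the function and script names are hypothetical.

```python
import re
import sys
from collections import Counter

# transcript_id "..." inside the GTF attribute column (column 9)
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def feature_counts(lines, transcript_ids):
    """Return {transcript_id: Counter of column-3 feature types} for the given IDs."""
    wanted = set(transcript_ids)
    counts = {tid: Counter() for tid in wanted}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TID_RE.search(fields[8])
        if match and match.group(1) in wanted:
            counts[match.group(1)][fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g. python feature_counts.py gencode.v10.annotation.gtf ENST00000361547.2 <other flagged IDs>
    with open(sys.argv[1]) as handle:
        for tid, counter in sorted(feature_counts(handle, sys.argv[2:]).items()):
            print(tid, dict(sorted(counter.items())), sep="\t")
```

For ENST00000361547.2, the excerpt above amounts to 1 transcript, 13 exon, 13 CDS, 2 UTR, 2 Selenocysteine, 1 start_codon and 1 stop_codon rows, so this is the pattern to compare the other two flagged IDs against.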