WGS upload prep - table2asn

igwill


12-23-2019, 06:31 PM

Hi,

I've wrapped up an assembly and will soon be uploading a new genome and its annotations to NCBI, but I'm having a little trouble getting everything packaged properly for GenBank.

I have a single .fsa file and a single .gff3 file with my genome information that I'm trying to run through the table2asn_GFF tool, but I'm getting some errors.

I'm running table2asn_GFF like so:

Code:

./linux64.table2asn_GFF -i myassembly.fsa -t mytemplate.sbt -J -c w -euk -locus-tag-prefix GQ602 -M n -Z -f myannotations.gff -outdir output_dir

I get an error about my protein IDs, and I'm not sure why:

Code:

FEATURE_COUNT: CDS: 7455 present
FEATURE_COUNT: gene: 7455 present
FEATURE_COUNT: mRNA: 7455 present
FATAL: MISSING_PROTEIN_ID: 7455 proteins have invalid IDs.

A bit of my gff:

Code:

##gff-version 3
##sequence-region scaffold_01 1 5595695
scaffold_01	FGDB	gene	7249	9339	.	+	.	ID=Ophcf2|00001|gene
scaffold_01	FGDB	mRNA	7249	9339	.	+	.	ID=Ophcf2|00001;Parent=Ophcf2|00001|gene;proteinId=Ophcf2|00001;Name=Ophcf2|00001
scaffold_01	FGDB	exon	7249	7255	.	+	.	ID=Ophcf2|00001|exon1;Parent=Ophcf2|00001
scaffold_01	FGDB	exon	7334	9339	.	+	.	ID=Ophcf2|00001|exon2;Parent=Ophcf2|00001
scaffold_01	FGDB	CDS	7249	7255	.	+	0	ID=Ophcf2|00001|CDS;Parent=Ophcf2|00001
scaffold_01	FGDB	CDS	7334	9339	.	+	2	ID=Ophcf2|00001|CDS;Parent=Ophcf2|00001

Perhaps something to do with my mRNA ID and proteinId being the same?
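
To narrow that down, here is a rough sketch (Python, standard library only) that tallies which attribute keys each feature type carries and how many IDs contain a `|`. The function names are my own, and whether table2asn_GFF actually objects to either of these things is an assumption I haven't confirmed; the script only reports what is in the file.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff(lines):
    """Return (attribute-key counts per feature type, count of IDs containing '|')."""
    keys_by_type = {}
    piped_ids = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature row
        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        keys_by_type.setdefault(ftype, Counter()).update(attrs.keys())
        if "|" in attrs.get("ID", ""):
            piped_ids[ftype] += 1
    return keys_by_type, piped_ids


# The first gene model from the GFF excerpt above, with tab-separated columns.
sample = [
    "##gff-version 3",
    "scaffold_01\tFGDB\tgene\t7249\t9339\t.\t+\t.\tID=Ophcf2|00001|gene",
    "scaffold_01\tFGDB\tmRNA\t7249\t9339\t.\t+\t.\t"
    "ID=Ophcf2|00001;Parent=Ophcf2|00001|gene;proteinId=Ophcf2|00001;Name=Ophcf2|00001",
    "scaffold_01\tFGDB\texon\t7249\t7255\t.\t+\t.\tID=Ophcf2|00001|exon1;Parent=Ophcf2|00001",
    "scaffold_01\tFGDB\texon\t7334\t9339\t.\t+\t.\tID=Ophcf2|00001|exon2;Parent=Ophcf2|00001",
    "scaffold_01\tFGDB\tCDS\t7249\t7255\t.\t+\t0\tID=Ophcf2|00001|CDS;Parent=Ophcf2|00001",
    "scaffold_01\tFGDB\tCDS\t7334\t9339\t.\t+\t2\tID=Ophcf2|00001|CDS;Parent=Ophcf2|00001",
]

keys_by_type, piped_ids = summarize_gff(sample)
for ftype, keys in keys_by_type.items():
    print(ftype, sorted(keys), "IDs with '|':", piped_ids[ftype])
```

For the whole file, the same function can be fed an open file handle, e.g. `summarize_gff(open("myannotations.gff"))`. On the excerpt it shows that `proteinId` sits on the mRNA rows only, the CDS rows carry just `ID` and `Parent`, and every ID contains a `|`.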

I do plan to add product=*** to my CDSs, but only once I can get this first version to work.

(I've also tried poking around with GAG, but I'm getting some errors there that I've yet to fully understand; that's another topic.)

A nudge in the right direction would be greatly appreciated, thanks!