Hi all,
I need to download some random samples of protein sequences in FASTA format.
So where can I find the list of all the protein sequence accession IDs (for all proteins) on NCBI?
I have downloaded gene2accession from ftp://ftp.ncbi.nih.gov/gene/DATA/
and GbAccList.0304.2012 from ftp://ftp.ncbi.nih.gov/genbank/livel...t.0304.2012.gz .
But I'm totally confused about the difference between the two.
From ftp://ftp.ncbi.nlm.nih.gov/genbank/l...bank.livelists:
Protein accessions can be easily distinguished from nucleotide accessions because they have a three-letter prefix, followed by five digits. The remaining accessions are nucleotide accessions, in either a one-letter/five-digit format or a two-letter/six-digit format.
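The format rule quoted above can be sketched as a small shell check. This is only an illustration: `classify_accession` is a hypothetical helper name, and it encodes just the three formats the quoted text mentions, so anything else is reported as unknown.

```shell
# Hypothetical helper: classify one accession using only the formats
# described in the quoted text above.
classify_accession() {
  case "$1" in
    # three letters followed by five digits -> protein
    [A-Za-z][A-Za-z][A-Za-z][0-9][0-9][0-9][0-9][0-9])
      echo protein ;;
    # one letter followed by five digits -> nucleotide
    [A-Za-z][0-9][0-9][0-9][0-9][0-9])
      echo nucleotide ;;
    # two letters followed by six digits -> nucleotide
    [A-Za-z][A-Za-z][0-9][0-9][0-9][0-9][0-9][0-9])
      echo nucleotide ;;
    # anything else is outside the quoted rule
    *)
      echo unknown ;;
  esac
}
```

For example, `classify_accession EBA53284` prints `protein`.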
So I tried this:
Code:
cat GbAccList.0304.2012 | sed -n '/^[[:alpha:]][[:alpha:]][[:alpha:]][[:digit:]]/p' | head
OUTPUT
Code:
EBA53284,1,134307104
EBA53285,1,134307105
EBA53286,1,134307106
EBA53287,1,134307107
EBA53283,1,134307103
EBA53288,1,134307109
EBA53289,1,134307110
EBA53290,1,134307111
EBA53291,1,134307113
EBA53292,1,134307114
But all of these give this error on NCBI (Protein):
"Database is not supported: protein"
So where are the protein sequence IDs then?
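For the random-sampling part, here is a minimal sketch of how a sample of IDs could be drawn from the list. It rests on two assumptions: that every line has the form accession,version,third-field as in the output shown above, and that joining the first two fields as accession.version gives a usable identifier. `sample_protein_ids` is a hypothetical helper name, and the sketch does not address the "Database is not supported" error.

```shell
# Hypothetical helper: print up to N randomly chosen protein IDs
# (accession.version) from a list with lines like "EBA53284,1,134307104".
# Uses reservoir sampling so the whole file never has to sit in memory.
sample_protein_ids() {
  list_file=$1
  n=$2
  awk -F, -v n="$n" '
    BEGIN { srand() }
    # keep only lines starting with three letters + five digits + comma
    /^[A-Za-z][A-Za-z][A-Za-z][0-9][0-9][0-9][0-9][0-9],/ {
      seen++
      id = $1 "." $2
      if (seen <= n) {
        pick[seen] = id              # fill the reservoir first
      } else {
        j = int(rand() * seen) + 1   # uniform in 1..seen
        if (j <= n) pick[j] = id     # replace with probability n/seen
      }
    }
    END {
      m = (seen < n) ? seen : n
      for (i = 1; i <= m; i++) print pick[i]
    }
  ' "$list_file"
}
```

Usage would look like `sample_protein_ids GbAccList.0304.2012 100 > sample_ids.txt`. Note that `srand()` seeds from the clock, so two runs within the same second may return the same sample.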