| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Extract gene sequences from gff3 file and reference fasta | JonB | Bioinformatics | 1 | 07-15-2014 01:13 AM |
| Annotate diff file with Entrez gene ID | Parharn | Bioinformatics | 2 | 03-06-2014 10:13 AM |
| fasta file manipulation- combining sequences by gene rather than species | gevielr | Bioinformatics | 2 | 11-28-2013 04:12 PM |
| Question: Searching FASTA file for specific IDs | aw90 | Bioinformatics | 1 | 07-19-2013 04:14 AM |
| Extract only sequence ids from fasta file with makeblastdb | angeloulivieri | Bioinformatics | 13 | 07-30-2012 03:41 AM |
#1
Member
Location: Urumqi
Join Date: Jul 2014
Posts: 58
I want to get a FASTA file of protein sequences for a given list of Entrez Gene IDs, shown below:
Code:
kurban@kurban-X550VC:~/Desktop$ more Triboliumcastaneum_tf_id.txt
100141790 100142111 100142176 100142203 100142308 654967 655070 655772 655998
#2
Senior Member
Location: Bethesda
Join Date: Feb 2009
Posts: 700
Here's a hint:
Use the efetch utility. Example for mRNA (this only illustrates the efetch syntax: efetch reads these numbers as nuccore UIDs, not Gene IDs, so it will not return the mRNAs of these genes):
Code:
wget "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=100141790,100142111,100142176,100142203,100142308,654967,655070,655772,655998&rettype=fasta&retmode=text" -O out
Getting the protein is the hard part. Full solution:
Code:
echo -e "100141790\n100142111\n100142176\n100142203\n100142308\n654967\n655070\n655772\n655998" |
while read G; do
  # elink: list the protein UIDs linked to Gene ID ${G}
  curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=protein&id=${G}" |
    grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 |
    while read S; do
      # efetch: download protein ${S} as FASTA
      curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta"
    done
done
from Pierre Lindenbaum's post at Biostars: https://www.biostars.org/p/52652/
Note that there are multiple isoforms, so one Gene ID can return several protein sequences.
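A minimal sketch of the same elink-then-efetch approach, restructured into shell functions so the linked protein UIDs are fetched in comma-separated batches instead of one efetch request per sequence. The function names, the batch size of 200 and the 0.34 s pause (meant to stay under NCBI's limit of roughly three requests per second without an API key) are my own assumptions, not part of the original post. The XML parsing reuses the grep/cut chain from the full solution, so it inherits that chain's assumption that each `<Id>` sits on the line right after its `<Link>`.

```shell
#!/usr/bin/env bash
# Sketch only: function names, batch size (200) and the 0.34 s pause are
# illustrative choices, not part of the original post.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Read elink XML on stdin and print each linked UID once.
# Same grep/cut chain as the forum post (assumes <Id> is on the line after
# <Link>); awk drops duplicates, e.g. a protein listed in two link sets.
parse_link_ids() {
  grep -A 1 "<Link>" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 |
    awk '!seen[$0]++'
}

# Print the protein UIDs linked to Entrez Gene ID $1.
linked_protein_ids() {
  curl -s "${EUTILS}/elink.fcgi?dbfrom=gene&db=protein&id=$1" | parse_link_ids
}

# Read one UID per line on stdin; print comma-separated batches of up to $1 UIDs.
batch_ids() {
  awk -v n="${1:-200}" '
    NF     { ids = ids (c ? "," : "") $1; c++ }
    c == n { print ids; ids = ""; c = 0 }
    END    { if (c) print ids }'
}

# Read comma-separated UID batches on stdin; print protein FASTA for each batch.
fetch_protein_fasta() {
  local batch
  while read -r batch; do
    [ -n "$batch" ] || continue
    curl -s "${EUTILS}/efetch.fcgi?db=protein&id=${batch}&rettype=fasta&retmode=text"
    sleep 0.34  # stay under ~3 requests/second (no API key)
  done
}

# Read Entrez Gene IDs (one per line) on stdin; print FASTA for all linked proteins.
genes_to_protein_fasta() {
  local gene
  while read -r gene; do
    [ -n "$gene" ] || continue
    linked_protein_ids "$gene"
    sleep 0.34
  done | batch_ids 200 | fetch_protein_fasta
}
```

For instance, `genes_to_protein_fasta < gene_ids.txt > proteins.fa`, where `gene_ids.txt` is a hypothetical file with one Gene ID per line. Batching means a few hundred linked proteins need only a handful of efetch calls; the per-gene elink calls remain, so run time still grows with the number of genes.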
#3
Member
Location: Urumqi
Join Date: Jul 2014
Posts: 58
Thanks @Richard,
The command really works like a charm, but I want to extract 519 sequences in total, so how should I change the format of my file?
Code:
100141790 100142111 100142176 100142203 100142308 654967 655070 655772 655998
Sorry, I am new at this.
Last edited by kurban910; 07-03-2015 at 10:50 AM.
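A minimal sketch of one answer, assuming the IDs in the file may be separated by any mix of spaces, tabs or newlines: the `while read G` loop in the full solution only needs one ID per line on its standard input, so instead of reformatting the file by hand you can normalize it on the fly and pipe it in where the `echo -e "..."` line was. `normalize_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper name): print the Gene IDs in file $1 one per line,
# whether they are separated by spaces, tabs or newlines.
normalize_ids() {
  # tr -s turns every run of whitespace (including Windows \r) into one newline;
  # awk drops the blank line left by leading whitespace and any repeated IDs.
  tr -s '[:space:]' '\n' < "$1" | awk 'NF && !seen[$0]++'
}
```

Then `normalize_ids Triboliumcastaneum_tf_id.txt | wc -l` should report the number of distinct IDs (519 according to the post), and `normalize_ids Triboliumcastaneum_tf_id.txt | while read G; do ...; done > proteins.fa` runs the same loop body over every ID.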
Tags: entrez gene ids