Seqanswers Leaderboard Ad

**maubp** · 11-29-2012, 07:48 AM

Do you know a scripting or programming language? Both BioPerl and Biopython (and likely other libraries too) have EMBL parsers that could help, although in this case you could manage without a full parser.
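To illustrate the "without a full parser" route, here is a minimal line-based sketch in R. It assumes the goal is the GO cross-references (the exact fields wanted are not stated here) and that the file follows the UniProt-style flat-file layout: a two-letter line code, three spaces, then data, with `//` closing each record. The helper name `extract_go` and the sample records are made up for illustration.

```r
# Hypothetical helper: collect the primary accession and the "DR   GO;" lines
# of each record, relying only on line prefixes (no full parser).
extract_go <- function(lines) {
  # A record ends at a "//" line, so the line after it starts a new record
  record_id <- cumsum(c(0, head(lines == "//", -1)))
  records <- split(lines, record_id)
  out <- lapply(records, function(rec) {
    ac <- grep("^AC   ", rec, value = TRUE)
    go <- grep("^DR   GO;", rec, value = TRUE)
    if (length(ac) == 0) return(NULL)  # skip fragments without an accession
    # Keep only the first accession listed on the first AC line
    acc <- sub(";.*$", "", sub("^AC   ", "", ac[1]))
    data.frame(accession = rep(acc, length(go)),
               go_line = sub("^DR   ", "", go),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, unname(out))
}

# Made-up sample records (not real entries); with a real file use readLines()
sample_lines <- c(
  "ID   TEST1_HUMAN             Reviewed;         100 AA.",
  "AC   P00001; Q00002;",
  "DR   GO; GO:0005737; C:cytoplasm; IDA:UniProtKB.",
  "DR   GO; GO:0003677; F:DNA binding; IEA:InterPro.",
  "DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.",
  "//",
  "ID   TEST2_HUMAN             Reviewed;         200 AA.",
  "AC   P00003;",
  "DR   GO; GO:0006355; P:regulation of transcription; IEA:InterPro.",
  "//"
)
res <- extract_go(sample_lines)
print(res)
```

Matching on line prefixes is enough here because each GO cross-reference sits on a single `DR` line; fields that wrap across several lines would need more care, and that is where a real parser earns its keep.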

**jiaco** · 11-30-2012, 06:02 AM

biomaRt (R/Bioconductor): http://www.bioconductor.org/packages...l/biomaRt.html

Code:

library( biomaRt )

# mart names change over time; listMarts() shows what is currently available
uniprot = useMart( "unimart" )
uniprot = useDataset( "uniprot", uniprot )

# browse these two data frames to find other filters (what you search by)
# and attributes (what you retrieve)
filters = listFilters( uniprot )
attributes = listAttributes( uniprot )

useFilter = c( "accession" )
useAttributes = c( "accession", "gene_name", "go_id", "go_name" )

query = "P41932"
df = getBM( mart=uniprot, values=query, filters=useFilter, attributes=useAttributes )

# start with the gene name, then append one "GO; id; name;" entry per row
s = sprintf( "%s", df[1,2] )
# seq_len() skips the loop when no rows come back (1:0 would iterate over 1 and 0)
for( i in seq_len( nrow( df ) ) ) {
        s = sprintf( "%s,GO; %s; %s;", s, df[i,3], df[i,4] )
}
print( s )
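The same string can be built without an explicit loop. The sketch below uses a mock data frame in place of a live `getBM()` result, so it runs offline; the column names mirror `useAttributes`, every value is a made-up placeholder, and `format_go_line` is a hypothetical helper name.

```r
# Mock of a single-accession getBM() result; every value is a made-up placeholder.
df <- data.frame(
  accession = c("ACC1", "ACC1"),
  gene_name = c("geneA", "geneA"),
  go_id     = c("GO:fake001", "GO:fake002"),
  go_name   = c("term one", "term two"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: gene name followed by one ",GO; id; name;" entry per row.
format_go_line <- function(df) {
  if (nrow(df) == 0) return(NA_character_)  # nothing came back for this query
  entries <- paste0(",GO; ", df$go_id, "; ", df$go_name, ";", collapse = "")
  paste0(df$gene_name[1], entries)
}

s <- format_go_line(df)
empty <- format_go_line(df[0, ])  # zero-row input gives NA instead of a bogus line
cat(s, "\n")
```

`paste0()` with `collapse = ""` is vectorized over the rows, so the result has the same shape as the loop's output while handling the empty case explicitly.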

If you have a text file full of accessions and want output with 1 gene per line:

Code:

# assume 1st column is accession
query = read.table( "queryfile.txt", stringsAsFactors=FALSE )
query = as.character( query[,1] )

mdf = getBM( mart=uniprot, values=query, filters=useFilter, attributes=useAttributes )

uniqueAccs = unique( sort( as.character( mdf[,1] ) ) )
outvec = vector( mode="character", length=0 )
for( acc in uniqueAccs ) {
        # keep only the rows for this accession
        df = mdf[ mdf[,1] == acc, ]
        s = sprintf( "%s", df[1,2] )
        for( i in seq_len( nrow( df ) ) ) {
                s = sprintf( "%s,GO; %s; %s;", s, df[i,3], df[i,4] )
        }
        outvec = c( outvec, s )
}
# one line per accession, without quotes or row/column labels
write.table( outvec, "myoutfile.txt", quote=FALSE, row.names=FALSE, col.names=FALSE )

(The snippet that reads queryfile.txt depends on the preamble from the first one: the library() call, the uniprot mart, useFilter and useAttributes.)
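For larger query lists, the per-accession grouping can also be done with `split()` and `vapply()` rather than growing a vector inside a loop. This is only a sketch on a mock data frame standing in for a multi-accession `getBM()` result: the values are made-up placeholders, `collapse_go` is a hypothetical helper, and the output goes to a temporary file instead of myoutfile.txt.

```r
# Mock of a multi-accession getBM() result; every value is a made-up placeholder.
mdf <- data.frame(
  accession = c("ACC2", "ACC1", "ACC2"),
  gene_name = c("geneB", "geneA", "geneB"),
  go_id     = c("GO:fake003", "GO:fake001", "GO:fake002"),
  go_name   = c("term three", "term one", "term two"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one accession's rows into a single output line.
collapse_go <- function(df) {
  paste0(df$gene_name[1],
         paste0(",GO; ", df$go_id, "; ", df$go_name, ";", collapse = ""))
}

# split() groups the rows by accession (groups come out in sorted order);
# vapply() returns exactly one string per group.
outvec <- vapply(split(mdf, mdf$accession), collapse_go, character(1))

# writeLines() writes one element per line and ignores the vector's names
outfile <- tempfile(fileext = ".txt")
writeLines(outvec, outfile)
```

`vapply()` with `character(1)` fails loudly if a group ever yields something other than a single string, which the append-in-a-loop version would not catch.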

EDIT: I realize I did not answer your question, but this will get the job done without any need to download EMBL files.

Extracting information from EMBL flat file

Comment

Comment

Latest Articles

ad_right_rmr

News