05-24-2011, 08:04 AM   #1
Location: EU

Join Date: Sep 2010
Posts: 24
transcriptome -> predicted peptide database

Can anyone recommend a pipeline to process transcriptome data into a predicted tryptic peptide database?

That is, we want to do some LC-MS/MS and iTRAQ-MS/MS with our organism, but the only sequence data available for it are from our own 454/Illumina sequencing.

Seqasaurus
05-24-2011, 01:17 PM   #2
Senior Member
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

What software are you planning to use to analyze the mass spec data: Mascot, X!Tandem (GPM)? Do you have the transcriptome data assembled into contigs?

We have used contigs assembled from 454 cDNA data directly in Mascot; you don't need to do anything else. Just load the FASTA file containing your contigs (as DNA sequence) and Mascot takes care of the rest: translation in all six frames and spectral prediction based on the experimental parameters provided (e.g. digestion method). Of course Mascot is a pricey commercial product, but it is worth it if you are doing lots of proteomics (no, I'm not associated with them). The Global Proteome Machine (GPM) is a free (as in speech and beer) alternative to Mascot. I haven't worked with X!Tandem/GPM for quite some time, but I imagine it could handle this type of reference file as easily as Mascot.
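For anyone who wants to see what "translation in all six frames" amounts to, here is a minimal Python sketch. It is not how Mascot does it internally (that is not documented in this thread); it assumes the standard genetic code, treats any codon containing an ambiguous base as "X", and the function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated sequences."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each contig yields six protein strings, with stop codons shown as "*"; a real pipeline would usually split on the stops and discard very short fragments before searching.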
kmcarr
07-14-2011, 09:47 PM   #3
Location: Bay Area

Join Date: May 2011
Posts: 28

I am not sure exactly what you are looking for. It wouldn't be hard to write code to do the six-frame translation, but you might need more than that depending on which search algorithm you are using. As kmcarr mentioned, some search algorithms take FASTA files just fine, even DNA sequence. Others require additional files.

Mascot is a fine search engine, but if you need a free alternative, OMSSA works quite well and in our hands gives similar results to Mascot. It does require additional files for searching, but they can be generated from a FASTA file. You probably also need to make a concatenated target/decoy database so you can accurately determine FDR. If you are using OMSSA, you could use COMPASS, which can make all of the required files for OMSSA searching, including the target-decoy database. It also has tools for doing FDR filtering and iTRAQ quantitation. Full disclosure: I was involved in developing COMPASS. If you go that route and need help, send me a message.
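To make the concatenated target/decoy idea concrete, here is a rough Python sketch that appends a reversed copy of every protein entry to a FASTA database. This is only one common way to build decoys and is not meant to describe what COMPASS actually does; the `DECOY_` prefix and the function names are assumptions chosen for the example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def target_decoy_records(records, decoy_prefix="DECOY_"):
    """Yield all target entries, then one reversed decoy entry per target.

    The prefix is hypothetical; use whatever tag your search engine and
    FDR tool expect for decoy hits.
    """
    records = list(records)
    for header, seq in records:
        yield header, seq
    for header, seq in records:
        yield decoy_prefix + header, seq[::-1]


def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">contig1\nMKTAYR\n>contig2\nPEPTIDEK\n"
    combined = target_decoy_records(read_fasta(example.splitlines()))
    print(format_fasta(combined), end="")
```

Because the decoys have the same length and amino acid composition as the targets, hits to them give an estimate of how many target hits are random matches.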

Since you are dealing with transcriptome data, the six-frame translation approach seems reasonable. But it's definitely a bad idea with whole-genome data: your search space will be large, and you will end up getting far fewer IDs at a fixed false discovery rate. Just something to be aware of.
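As a rough illustration of what "IDs at a fixed false discovery rate" means, the sketch below counts how many target peptide-spectrum matches survive a target-decoy FDR cutoff. It assumes the simple decoys/targets estimate and that a higher score is better; real tools differ in the exact formula and in how they handle tied scores, and the function name is hypothetical.

```python
def accepted_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at the most permissive passing cutoff.

    psms: iterable of (score, is_decoy) tuples, higher score = better match.
    FDR at a cutoff is estimated as (#decoys above cutoff) / (#targets above
    cutoff), which is an assumption of this sketch.
    """
    targets = decoys = 0
    accepted = 0
    for _score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count seen while the estimate passes.
        if targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


if __name__ == "__main__":
    # Toy scores only, not real search results.
    toy = [(50, False), (45, False), (40, False),
           (35, True), (30, False), (25, True)]
    print(accepted_at_fdr(toy, max_fdr=0.25))
```

With a bigger database, random (decoy) matches tend to reach higher scores, so the cutoff that satisfies the same FDR moves up and fewer target hits pass.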
dphansti
07-14-2011, 11:55 PM   #4
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

The EMBOSS suite can do 6-frame translation:

As well as tryptic digest predictions:

Both of these programs accept multi-sequence FASTA input. EMBOSS is very easy to install on a Debian/Ubuntu-like system (e.g. install the 'emboss-explorer' package, then visit http://localhost/emboss-explorer/). There are also a few places that have a publicly accessible EMBOSS installation.
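If you would rather script the digest step yourself, here is a small Python sketch of an in silico tryptic digest. It does not call EMBOSS and may not match its output exactly; it assumes the usual simplified rule (cleave after K or R unless the next residue is P), and the function names and defaults are made up for the example.

```python
def cleavage_fragments(protein):
    """Split a protein after every K or R that is not followed by P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def tryptic_peptides(protein, missed_cleavages=0, min_length=1):
    """List peptides allowing up to `missed_cleavages` skipped cut sites."""
    fragments = cleavage_fragments(protein)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_peptides("MKTAYRPGKAAR", missed_cleavages=1):
        print(peptide)
```

Translated frames usually contain stop characters ("*"), so in practice you would split each frame on stops first and digest the resulting open reading frames separately.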
gringer
07-15-2011, 05:55 AM   #5
Senior Member
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Hi Seqasaurus,

What I want to add is that you also seem to need to assemble the reads de novo. All your needs can be met with publicly available tools, depending on your in-house bioinformatics capabilities and project timeline. Commercial tools can also help if you want the results faster; they usually come with technical support.

Best regards,
DZhang
