SEQanswers

Old 10-12-2010, 09:47 AM   #1
vincebrown
Junior Member
 
Location: Baltimore

Join Date: Jul 2010
Posts: 6
Default Finding exon-exon junction

Hi,

I have a list of around 1000 peptides and I want to find which of them might have come from a part of a gene that spans an exon-exon junction.

I shall be thankful if somebody can help me figure this out as I am not sure of a precise way to perform this task.

Vince
Old 10-12-2010, 11:05 AM   #2
liux
Member
 
Location: Midwest

Join Date: Mar 2009
Posts: 30
Default

Try wise2: http://www.ebi.ac.uk/Tools/Wise2/index.html
Old 10-12-2010, 11:23 AM   #3
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Another option is to compare them to a database of peptides corresponding to exon-exon junctions. For example, these are made available as part of a recent publication here:

ALEXA-Seq downloads

For human genes, the files are available for two versions of the genome here: hg18 and hg19

Each known or hypothetical junction corresponds to an Ensembl exon.
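
As a rough illustration of what such a junction database contains, here is a minimal sketch. It is not the ALEXA-Seq code: the function names, the toy protein, and the exon lengths are all made up. It assumes you already have a protein sequence plus the coding length (in nucleotides) of each of its exons, cuts a window around every junction, and reports whether a peptide match actually crosses the boundary.

```python
def junction_windows(protein, exon_lengths, flank=8):
    """Return (junction_number, window, cut_nt) for each exon-exon junction.

    protein: translation of the spliced coding sequence.
    exon_lengths: coding length of each exon in nucleotides, in order.
    flank: residues kept on either side of the junction (arbitrary choice).
    cut_nt: junction position in nucleotides, relative to the window start.
    """
    entries = []
    boundary_nt = 0
    for number, length in enumerate(exon_lengths[:-1], start=1):
        boundary_nt += length
        residue = boundary_nt // 3  # first residue touched by the next exon
        start = max(0, residue - flank)
        end = min(len(protein), residue + flank)
        entries.append((number, protein[start:end], boundary_nt - 3 * start))
    return entries


def spans_junction(peptide, window, cut_nt):
    """True if some exact match of the peptide covers both sides of the cut."""
    start = window.find(peptide)
    while start != -1:
        if 3 * start < cut_nt < 3 * (start + len(peptide)):
            return True
        start = window.find(peptide, start + 1)
    return False


# Toy example: a 22-residue protein encoded by two exons of 30 and 36 nt.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
for number, window, cut_nt in junction_windows(protein, [30, 36], flank=5):
    for peptide in ("QRQI", "IAKQ"):
        print(number, peptide, spans_junction(peptide, window, cut_nt))
```

Working in nucleotide coordinates means a codon split by the intron is handled the same way as a junction that falls cleanly between two codons.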
Old 10-12-2010, 12:47 PM   #4
vincebrown
Junior Member
 
Location: Baltimore

Join Date: Jul 2010
Posts: 6
Default

Hi,

Thanks for the information. I forgot to mention that these are sequences from a fungus, Pichia pastoris.

The genomic information is available at NCBI. I am planning to download the NCBI genome and try the algorithm on it.

Do you think this is the correct way of doing it for this specific organism?

Thanks,
Vince
Old 10-12-2010, 01:51 PM   #5
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Yes, the option suggested by liux should work for you then. If you are only concerned with identifying matches to known genes, you could also compare your list of peptides to the known ORFeome of your species (say using blastp). Or perhaps a six-frame translation of the transcriptome (say using tblastn) if you do not want to figure out the actual ORFs.
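
To show what the six-frame idea amounts to, here is a minimal sketch, assuming the standard genetic code and exact matching only. It is not how tblastn works internally (BLAST tolerates mismatches and scores alignments); the function names and the toy transcript are invented for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
# Assumption: the standard code is appropriate for the organism in question.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate from position 0, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_hits(peptide, transcript):
    """Return (strand, frame) pairs whose translation contains the peptide."""
    forward = transcript.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", forward), ("-", reverse)):
        for frame in range(3):
            if peptide in translate(seq[frame:]):
                hits.append((strand, frame))
    return hits


# Toy transcript: two leading bases, then codons for M K T A Y and a stop.
print(six_frame_hits("KTAY", "CCATGAAAACCGCGTATTAA"))
```

Because the transcript is already spliced, a junction peptide matches here as one contiguous stretch, so this kind of search tells you that a peptide belongs to a gene but not, by itself, whether it crosses a junction.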
Old 10-12-2010, 03:59 PM   #6
vincebrown
Junior Member
 
Location: Baltimore

Join Date: Jul 2010
Posts: 6
Default

Hi,

Thanks for being patient; I am a beginner in bioinformatics analysis.

All I am concerned with is this: I have a set of peptides and I would like to know if they came from an exon-exon junction, which would mean that a splicing event took place, as the peptides were covered by more than one gene.

If I use tblastn for this, can the results distinguish between the peptides
that came entirely from one known gene and the ones that came from a junction?

Liux's reply and yours both seem quite promising, but if a BLAST search can solve the problem I would prefer that. My mentor did not indicate specific tools for this.

Thanks again,
Vince
Old 10-12-2010, 04:42 PM   #7
litc
Member
 
Location: China

Join Date: Oct 2010
Posts: 24
Default

Quote:
Originally Posted by liux
I agree with that; GeneWise can do that job.
Old 10-12-2010, 04:44 PM   #8
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

When one says 'splicing event' it is usually understood to mean the joining of exons of a single gene. This is how a pre-messenger RNA becomes a mature messenger RNA. There are certainly peptides that correspond to the junctions of adjacent exons in a gene.

Based on your last post, it sounds like you are talking about something else, because you refer to peptides from "more than one gene". This is an important distinction because it influences the analysis approach you would take. Splicing can occur between different genes by a process called 'trans-splicing', although this is much less well understood than constitutive splicing and alternative splicing. You may also be referring to a 'fusion gene'. These occur when the genome itself has been rearranged. For example, if a rearrangement happens and the break point is within an intron, you can get a fused gene (some people call them chimeras) where exons from two different genes are spliced together into a novel fusion transcript. Detecting these is practically an entire field of next-generation sequencing analysis. If that is what you are trying to detect, then the analysis approach would have to be altered.

Perhaps we should back up slightly to understand the goal more clearly. What is the nature of your data? How was it generated?

I would also suggest that you quickly read up on RNA splicing, trans-splicing, and fusion genes. Which of these (if any) are you interested in?
Old 10-12-2010, 05:52 PM   #9
vincebrown
Junior Member
 
Location: Baltimore

Join Date: Jul 2010
Posts: 6
Default

@malachig

Yes, I should step back a little and try to focus on the goal. I have been asked to find out, from a list of peptides obtained by mass spec, whether these peptides span more than one exon, i.e., whether they come from exon-exon junctions. I may have confused this with alternative splicing.

Can you suggest a method just to check which ones are likely to span more than one exon?
Old 10-12-2010, 06:25 PM   #10
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

Sounds like the original suggestion of wise2 would be the easiest. If you don't like wise2 or would like another option, any gapped aligner that accepts protein sequence and aligns it to genomic DNA should work. For example, Exonerate "will allow introns in the alignment, but also allow frameshifts, and exon phase changes when a codon is split by an intron". Instructions for using Exonerate for this purpose are here. You can obtain the genome sequence from various places, including www.pichiagenome.org and bioinformatics.psb.ugent.be

If the alignment is reported as a single block, then the peptide likely does not span a junction. If you get a nice gapped alignment, and the boundaries look like valid splice sites, then you probably have a junction peptide. There is a caveat though. Gapped aligners require a reasonable amount of sequence on both sides of the junction to create an accurate gapped alignment. If your peptides are very short it may make this task difficult.
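
The decision rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the block coordinates are supposed to have been extracted already from whatever aligner you use, the gene is on the forward strand, canonical introns start with GT and end with AG, and the `min_flank_nt` threshold is an arbitrary value, not a recommendation from any tool.

```python
def classify_alignment(genome, blocks, min_flank_nt=9):
    """Classify a peptide-to-genome alignment from its aligned blocks.

    genome: forward-strand genomic sequence.
    blocks: (start, end) pairs, 0-based half-open, sorted by position,
            one per contiguous aligned segment.
    min_flank_nt: shortest block length trusted next to a gap (hypothetical).
    """
    if len(blocks) == 1:
        return "single block"
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        intron = genome[left_end:right_start].upper()
        # Canonical forward-strand intron: GT at the 5' end, AG at the 3' end.
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return "gapped, non-canonical boundaries"
    if any(end - start < min_flank_nt for start, end in blocks):
        return "gapped, canonical splice sites, short flank"
    return "likely junction peptide"


# Toy genome: 12 nt of exon, a 13 nt intron, then 12 nt of exon.
genome = "A" * 12 + "GTAAGTTTTTCAG" + "C" * 12
print(classify_alignment(genome, [(0, 12)]))
print(classify_alignment(genome, [(0, 12), (25, 37)]))
```

The "short flank" category is there for the caveat about short peptides: a gapped alignment with only a couple of residues on one side deserves a second look rather than automatic acceptance.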

Also, if P. pastoris is like other members of the Saccharomycetaceae family (such as baker's yeast), it may have a relatively simple transcriptome. Many genes may consist of only a single exon, and only a subset may actually have multiple exons. So peptides corresponding to junctions might be rare for that reason as well. I'm not familiar with this species, but presumably the pertinent information is readily available in the genome paper for P. pastoris.
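
If a gene annotation in GFF3 format is available for your genome, you can get a quick feel for how common multi-exon genes are. The sketch below is an assumption-laden illustration: it counts CDS lines per `Parent` attribute, the function names are made up, and the three annotation rows are toy data, not real P. pastoris features.

```python
from collections import Counter


def cds_segments_per_transcript(gff3_text):
    """Count CDS lines per Parent in GFF3 text (nine tab-separated columns)."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attributes.get("Parent")
        if parent:
            counts[parent] += 1
    return counts


def multi_exon_fraction(counts):
    """Fraction of transcripts whose coding region has more than one segment."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)


# Toy annotation: transcript t1 has two CDS segments, t2 has one.
rows = [
    ["chr1", "toy", "CDS", "1", "300", ".", "+", "0", "ID=c1;Parent=t1"],
    ["chr1", "toy", "CDS", "401", "500", ".", "+", "0", "ID=c2;Parent=t1"],
    ["chr1", "toy", "CDS", "900", "1200", ".", "+", "0", "ID=c3;Parent=t2"],
]
toy_gff3 = "##gff-version 3\n" + "\n".join("\t".join(r) for r in rows)
print(multi_exon_fraction(cds_segments_per_transcript(toy_gff3)))
```

A low fraction would mean that only a small share of your peptides could come from junctions in the first place.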
Old 11-16-2010, 04:08 PM   #11
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Default

I just noticed an alternative tool called 'ProSplign' that might serve the same function as wise2 for this problem. It has recently become available (the manuscript is still in preparation, according to the website). From the website:

Quote:
ProSplign is a utility for computing the alignment of proteins to genomic nucleotide sequence. This alignment can include eukaryotic splicing. At the heart of the program is a global alignment algorithm that specifically accounts for introns and splice signals. It is due to this algorithm that ProSplign is accurate in determining splice sites and tolerant to sequencing errors.

ProSplign uses BLAST hits to identify possible locations of genes and their duplications on genomic sequences and then to speed up the core dynamic programming.

ProSplign was developed with the following goals in mind:

* Accuracy in determining splice signals
* Recognition of short exons and non-consensus splices where feasible
* Ability to identify and separate multiple compartments typically representing gene copying events

ProSplign is used to compute transcript alignments as a part of the NCBI Genome Annotation Pipeline.

Reference: ProSplign - Protein to Genomic Alignment Tool. B. Kiryutin, A. Souvorov, T. Tatusova. Manuscript in preparation