SEQanswers

12-02-2016, 06:06 AM   #1
New2Bioinfo
Junior Member

Location: India

Join Date: Dec 2016
Posts: 4
Indexing: Exons and splice sites

Hi,
I have been following the GitHub tutorial at https://github.com/griffithlab/rnaseq_tutorial/wiki to learn RNA-seq.

I am on the indexing step, which says to first extract exons and splice sites for the reference genome using the built-in Python scripts before starting the indexing, and that this information will then be used during alignment.

It would be great if someone could explain the rationale behind this step.
12-02-2016, 07:07 AM   #2
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 6,384

If you are happy with the current state of the transcriptome (the known expressed parts of the genome), you can choose to align your data to just that part.

While that is not incorrect, restricting alignment to just the "known" expressed parts of the genome carries a small risk that some reads mis-align, since an aligner does its best to place every read, even one that did not originally come from that region. If splice sites are provided as well, then the program would not try to look for new ones. Both of these modifications speed up the alignments to some extent.
12-03-2016, 12:36 AM   #3
New2Bioinfo
Junior Member

Location: India

Join Date: Dec 2016
Posts: 4

Quote:
Originally Posted by GenoMax View Post
Both these modifications speed up the alignments to some extent.
That's okay. But with HISAT2, I am extracting the splice site and exon information from the .gtf file and giving that information to the index builder (hisat2-build). So my understanding is that the information supplied during indexing helps later, during alignment.
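
For readers following along, the three steps described above can be sketched as commands. This is only a sketch, not the tutorial's exact invocation: the file names (`genes.gtf`, `genome.fa`, `genome_tran`) and the helper functions are hypothetical, and the snippet only builds and prints the command lines without running anything. It assumes the `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` scripts and the `--ss` / `--exon` options of `hisat2-build` that ship with HISAT2.

```python
import shlex


def build_index_commands(gtf, genome_fasta, index_base, threads=1):
    """Return (argv, stdout_file) pairs for the three steps; nothing is executed.

    All paths are placeholders chosen for illustration.
    """
    ss_file = index_base + ".ss"      # splice sites extracted from the GTF
    exon_file = index_base + ".exon"  # exons extracted from the GTF
    return [
        # The extraction scripts write to stdout, so the output is redirected.
        (["hisat2_extract_splice_sites.py", gtf], ss_file),
        (["hisat2_extract_exons.py", gtf], exon_file),
        # The index builder takes both files in addition to the genome FASTA.
        (["hisat2-build", "-p", str(threads),
          "--ss", ss_file, "--exon", exon_file,
          genome_fasta, index_base], None),
    ]


def as_shell(argv, stdout_file=None):
    """Render one command as a shell line, quoting arguments where needed."""
    line = " ".join(shlex.quote(arg) for arg in argv)
    if stdout_file is not None:
        line += " > " + shlex.quote(stdout_file)
    return line


if __name__ == "__main__":
    for argv, out in build_index_commands("genes.gtf", "genome.fa",
                                          "genome_tran", threads=8):
        print(as_shell(argv, out))
```

Printing the commands rather than running them keeps the sketch usable on a machine without HISAT2 installed; the printed lines can be checked against the tutorial before running them in a terminal.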

If I know the splice sites, the reads will not align to those parts where a splice site lies in the middle. Is this correct?
I still don't understand how the exon information helps in alignment.

A somewhat more detailed answer would be really helpful.

Thank you.
12-03-2016, 09:34 AM   #4
wdecoster
Member

Location: Antwerp, Belgium

Join Date: Oct 2015
Posts: 93

The most intuitive explanation might be that those "known" exons and splice sites are used as hints for the read mapping, which makes mapping much quicker because the aligner "knows" where to look. Reads that don't behave according to the "known" annotation can still be aligned, and new splice sites can still be discovered.

You are just "telling" the aligner a priori where the splice junctions most likely are (but not restricting the mapping to those junctions/exons).
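
To make concrete what a "known" splice junction is, here is a minimal sketch that derives junctions from the exon records of a GTF file: within each transcript, every gap between two consecutive exons is an intron, and its two boundaries are the splice sites. The function name and the toy records are made up for illustration, the attribute parsing is deliberately crude, and the coordinates are reported as they appear in the GTF (1-based). The extraction script bundled with HISAT2 may use a different output convention.

```python
from collections import defaultdict


def splice_sites_from_gtf(lines):
    """Return sorted (chrom, left_exon_end, right_exon_start, strand) tuples.

    One tuple per intron, i.e. per gap between consecutive exons
    of the same transcript. Coordinates are copied from the GTF as-is.
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        (chrom, _source, feature, start, end,
         _score, strand, _frame, attrs) = line.rstrip("\n").split("\t")
        if feature != "exon" or 'transcript_id "' not in attrs:
            continue
        # Crude attribute parse: take the quoted value after transcript_id.
        transcript = attrs.split('transcript_id "')[1].split('"')[0]
        exons[(chrom, strand, transcript)].append((int(start), int(end)))

    junctions = set()  # a set, so junctions shared by transcripts appear once
    for (chrom, strand, _transcript), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            if right_start > left_end + 1:  # a real gap, so an intron
                junctions.add((chrom, left_end, right_start, strand))
    return sorted(junctions)


if __name__ == "__main__":
    toy_gtf = [
        'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    for junction in splice_sites_from_gtf(toy_gtf):
        print(*junction, sep="\t")
```

A list like this is what gets handed to the index builder up front; as described above, it serves as a set of hints and does not restrict mapping to those junctions.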
12-05-2016, 12:59 AM   #5
New2Bioinfo
Junior Member

Location: India

Join Date: Dec 2016
Posts: 4

Okay. That makes sense.

Thank you very much.
12-15-2016, 08:01 AM   #6
biocomputer
Member

Location: Canada

Join Date: Dec 2013
Posts: 62

Does including exons and splice sites make the alignment more accurate, faster, or both?
Tags
exons, hisat2, indexing, ngs, splice sites
