
**GenoMax** · 12-02-2016, 08:07 AM

If you are happy with the current state of the transcriptome annotation (the known expressed parts of the genome), you could choose to align your data to just that part.

That is not incorrect, but by restricting the alignment to just the "known" expressed parts of the genome you run a small risk of some reads mis-aligning: an aligner does its best to place every read, and a read may not have originally come from the region it gets placed in. If splice sites are provided as well, the programs would not try to look for new ones. Both these modifications speed up the alignments to some extent.
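To make the mis-alignment risk concrete, here is a toy sketch (not how HISAT2 or any real aligner works internally): a brute-force ungapped aligner that reports the best Hamming-distance hit. The sequences and the names `best_hit`, `genome` and `transcriptome` are hypothetical. A read that really comes from an unannotated region aligns perfectly when the whole genome is searched, but is forced onto a similar annotated exon, with a mismatch, when only the transcriptome is searched.

```python
"""Toy illustration: restricting the reference can force a read onto the wrong locus.

All sequences are hypothetical; real aligners are far more sophisticated,
but the failure mode is the same in spirit.
"""

from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, reference: dict[str, str]) -> tuple[str, int, int]:
    """Return (sequence name, 0-based offset, mismatches) of the best ungapped hit."""
    best = ("", -1, len(read) + 1)
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if mm < best[2]:
                best = (name, offset, mm)
    return best


# Hypothetical genome: one annotated exon plus an unannotated region
# that happens to resemble it (one base differs).
annotated_exon = "ACGTTGCAAGGCTTAC"
unannotated = "TTACGTTGCATGGCTTACTT"
genome = {"exon1": annotated_exon, "intergenic": unannotated}
transcriptome = {"exon1": annotated_exon}  # only the "known" expressed part

# A read that truly originates from the unannotated region.
read = "ACGTTGCATGGCTTAC"

print(best_hit(read, genome))         # ('intergenic', 2, 0)
print(best_hit(read, transcriptome))  # ('exon1', 0, 1)
```

With the full genome the read lands where it came from with zero mismatches; with the transcriptome only, the aligner still "does its best" and reports a plausible but wrong placement.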

**New2Bioinfo** · 12-03-2016, 01:36 AM

Originally posted by GenoMax

Both these modifications speed up the alignments to some extent.

That's okay. But when using HISAT2, I extract the splice-site and exon information from the .gtf file and give it to the index builder (hisat2-build). So my understanding is that this information, supplied during indexing, helps during alignment.
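For reference, the HISAT2 workflow described here is typically `hisat2_extract_splice_sites.py genes.gtf > genes.ss` and `hisat2_extract_exons.py genes.gtf > genes.exon`, followed by `hisat2-build --ss genes.ss --exon genes.exon genome.fa genome_index` (file names are placeholders). What those extraction steps compute can be sketched in plain Python as below. This is a simplified, hypothetical re-implementation, not the scripts' actual code: it groups GTF `exon` records by `transcript_id`, converts GTF's 1-based coordinates to 0-based, and reports each gap between consecutive exons of a transcript as a splice site. The `(chrom, left, right, strand)` layout, with `left` the last base of the upstream exon and `right` the first base of the downstream exon, is assumed to mirror HISAT2's `.ss` file; the real scripts do additional processing (for example, merging overlapping exons).

```python
"""Sketch: derive exon and splice-site lists from GTF exon records (simplified)."""

from __future__ import annotations

from collections import defaultdict


def parse_attributes(field: str) -> dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def extract_exons_and_splice_sites(gtf_lines):
    """Return (exons, splice_sites) as sorted lists of 0-based tuples.

    exons: (chrom, start, end, strand), inclusive ends, deduplicated.
    splice_sites: (chrom, left, right, strand) for each intron of each transcript.
    """
    transcripts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        tid = parse_attributes(cols[8]).get("transcript_id")
        if tid is None:
            continue
        # GTF is 1-based inclusive; store 0-based inclusive coordinates.
        transcripts[tid].append((cols[0], int(cols[3]) - 1, int(cols[4]) - 1, cols[6]))

    exons, splice_sites = set(), set()
    for records in transcripts.values():
        records.sort(key=lambda r: r[1])
        exons.update(records)
        # Each pair of consecutive exons in a transcript implies one intron.
        for up, down in zip(records, records[1:]):
            splice_sites.add((up[0], up[2], down[1], up[3]))
    return sorted(exons), sorted(splice_sites)


# Hypothetical annotation: two transcripts of one gene sharing their first exon.
GTF_EXAMPLE = [
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\texon\t501\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

exons, splice_sites = extract_exons_and_splice_sites(GTF_EXAMPLE)
for site in splice_sites:
    # One tab-separated line per junction, e.g. chr1<TAB>199<TAB>300<TAB>+
    print("\t".join(map(str, site)))
```

The two outputs are exactly the kind of information the index builder receives: where exons are and which exon-to-exon joins are already known.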

If the splice sites are known, will reads not be aligned to regions that have a splice site in the middle? Is this correct?
I still don't understand how the exon information helps the alignment.

A more detailed answer would be really helpful.

Thank you.

**wdecoster** · 12-03-2016, 10:34 AM

The most intuitive explanation might be that those "known" exons and splice sites are used as a suggestion for read mapping, which makes mapping much quicker because the aligner "knows" where to look. Reads that don't fit the "known" annotation will still be aligned correctly, and new splice sites will be discovered.

You are just "telling" the aligner a priori where the splice junctions most likely are (but not restricting the mapping to those junctions/exons).
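The "hint, not restriction" idea can be illustrated with a toy exact-match spliced aligner. This is a hypothetical sketch, not HISAT2's actual algorithm (HISAT2 uses a graph-based hierarchical FM index): the made-up function `align_spliced` first tests only the annotated junctions, which is cheap, and falls back to an exhaustive search for a split alignment only when none of them fit, so a read spanning an unannotated junction is still placed. It handles only reads that cross exactly one junction, and the genome and junction coordinates are invented for the example.

```python
"""Toy spliced aligner: known junctions are tried first, but never required."""

from __future__ import annotations

# Hypothetical genome: exon1, intron, exon2, intron, exon3 (0-based positions).
GENOME = (
    "CCCC"         # 0-3    flanking sequence
    + "ATGGCA"     # 4-9    exon 1
    + "GTTTTTTAG"  # 10-18  intron 1
    + "TCCGAT"     # 19-24  exon 2
    + "GTAAAAAAG"  # 25-33  intron 2
    + "CTTGAC"     # 34-39  exon 3
    + "CCCC"       # 40-43  flanking sequence
)


def align_spliced(
    read: str,
    genome: str,
    known_junctions: list[tuple[int, int]],
    min_anchor: int = 3,
) -> tuple[str, int, int] | None:
    """Place a read that crosses exactly one junction; return (kind, left, right).

    left is the last base of the upstream exon and right the first base of the
    downstream exon (0-based). kind is "known" if an annotated junction fit and
    "novel" if the fallback search found it; None means nothing fit.
    """
    n = len(read)
    # Pass 1: use the annotation as a hint, checking only the listed junctions.
    for left, right in known_junctions:
        for k in range(min_anchor, n - min_anchor + 1):
            if (
                left - k + 1 >= 0
                and genome[left - k + 1:left + 1] == read[:k]
                and genome[right:right + n - k] == read[k:]
            ):
                return ("known", left, right)
    # Pass 2: exhaustive search over every split point and prefix position,
    # so reads that do not match the annotation can still be placed.
    for k in range(min_anchor, n - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        start = genome.find(prefix)
        while start != -1:
            left = start + k - 1
            # Require at least one intronic base between the two pieces.
            right = genome.find(suffix, left + 2)
            if right != -1:
                return ("novel", left, right)
            start = genome.find(prefix, start + 1)
    return None


# Only the exon1 -> exon2 junction is annotated.
known = [(9, 19)]
print(align_spliced("ATGGCATCCGAT", GENOME, known))  # ('known', 9, 19)
print(align_spliced("ATGGCACTTGAC", GENOME, known))  # ('novel', 9, 34)
```

The annotated read is resolved in the cheap first pass, while the exon-skipping read (exon 1 joined directly to exon 3) falls through to the slower search and is still aligned, with its junction reported as new.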

**New2Bioinfo** · 12-05-2016, 01:59 AM

Okay. That makes sense.

Thank you very much.

**biocomputer** · 12-15-2016, 09:01 AM

Does including exons and splice sites make the alignment more accurate, faster, or both?



Indexing- Exons and splice sites
