**Simon Anders** · 03-23-2010, 01:32 PM

I always omit the haplotype sequences from the reference index, for precisely the reason you mention.

Simon

**popto** · 03-23-2010, 01:47 PM

Thank you, Simon, this is very helpful.

**thinkRNA** · 03-23-2010, 02:23 PM

Originally posted by Simon Anders:

> I always omit the haplotype sequences from the reference index, for precisely the reason you mention.
>
> Simon

How do you determine which region is haplotype sequence?

**Simon Anders** · 03-24-2010, 01:58 AM

I took my reference from Ensembl: ftp://ftp.ensembl.org/pub/current_fa...o_sapiens/dna/

All the files with "HSCHR" in the file name are haplotype variants, e.g., the "HSCHR6_MHC" files contain variants of the MHC region of chromosome 6. I suggest simply not including these files when building the reference (unless, of course, you are specifically interested in them, but then you need to do some additional tweaking).

The "nonchromosomal" file contains the "random" contigs. I usually include them, but these contigs are so short that it does not really matter.

By the way, do not use the repeat-masked ("rm" in the filename) sequences. You should leave checking for repeats to the aligner.

Simon
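The file selection described above can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the function name `select_reference_files` is hypothetical, and the example file names (including the assumption that repeat-masked files carry a dot-separated token ending in `_rm`) should be checked against the actual listing of the release you download.

```python
def select_reference_files(names):
    """Keep FASTA file names that are neither haplotype variants nor repeat-masked."""
    kept = []
    for name in names:
        # Files with "HSCHR" in the name are haplotype variants (see post above).
        if "HSCHR" in name:
            continue
        # Assumption: repeat-masked files contain a dot-separated token
        # ending in "_rm" (e.g. "dna_rm"); adjust if your files differ.
        if any(token.endswith("_rm") for token in name.split(".")):
            continue
        kept.append(name)
    return kept


# Hypothetical file names, for illustration only.
example_names = [
    "Homo_sapiens.GRCh37.57.dna.chromosome.6.fa.gz",
    "Homo_sapiens.GRCh37.57.dna.chromosome.HSCHR6_MHC_APD.fa.gz",
    "Homo_sapiens.GRCh37.57.dna_rm.chromosome.6.fa.gz",
    "Homo_sapiens.GRCh37.57.dna.nonchromosomal.fa.gz",
]
selected = select_reference_files(example_names)
```

The "nonchromosomal" file is kept here, matching the habit described in the post; drop it from the result if you prefer to exclude the "random" contigs.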

**pcg** · 02-23-2011, 08:34 PM

Simon,

I presume that if you exclude the haplotypes from the index, you also remove those chromosomes from the GTF annotation file, right?

So, if I understand the reason correctly, you remove these haplotypes because the high similarity between the two chromosome versions causes alignment problems, and you may get false mappings to one of them?

Thanks,

**Simon Anders** · 02-24-2011, 12:58 AM

Originally posted by pcg:

> I presume that if you do exclude the haplotypes in the index then you remove those chromosomes from the GTF annotation file as well? Right?

Actually, no. The aligner does not need a GTF file, and when counting later (e.g. with my htseq-count script), a feature in the GTF file with a chromosome name that does not appear in the SAM file will not collect any counts anyway.

> So basically if I am understanding correctly the reason then Simon, you remove these haplotypes because there is going to be an alignment problem due to the high similarity between the two chromosomes and you may get false mapping to a chromosome?

Especially when looking for differential expression, it is a good idea to discount all non-unique alignments. Now, if the aligner sees several versions of, e.g., the MHC, it does not know that these are all variants of the same region but rather treats them as paralogs at different places. So, if a read maps there, the aligner will think that there are multiple mappings and flag the read accordingly, and you will exclude it. You end up with no signal at all in the variant regions, even (or: especially) in the parts of the variant region that are actually conserved and would hence have posed no problem for mapping.

Simon
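To illustrate what "discounting non-unique alignments" can look like in practice, here is a minimal Python sketch that keeps only SAM records whose `NH` tag (number of reported alignments for the read, as defined in the SAM specification) equals 1. It is an assumption that your aligner writes this tag; the helper names `is_unique` and `unique_alignments` and the example records are made up for illustration.

```python
def is_unique(sam_line):
    """Return True if a SAM record is mapped and reports exactly one alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = read unmapped
        return False
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):]) == 1
    # Assumption: without an NH tag we cannot tell, so we conservatively drop the read.
    return False


def unique_alignments(sam_lines):
    """Yield the alignment records that pass is_unique, skipping '@' header lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        if is_unique(line):
            yield line


# Made-up records: r1 is unique, r2 has three reported alignments, r3 is unmapped.
example_sam = [
    "@SQ\tSN:6\tLN:171115067",
    "r1\t0\t6\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\t6\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
unique_names = [line.split("\t")[0] for line in unique_alignments(example_sam)]
```

With several haplotype versions of a region in the index, a read from a conserved stretch would look like `r2` here and be dropped, which is the loss of signal described above.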

**pcg** · 02-24-2011, 12:55 PM

Thanks Simon for your reply.

As you rightly point out, you do not need a GTF for alignment. But if you want to run a Cufflinks analysis on the alignment and only want expression for what is currently annotated (in the GTF), then unless you remove those haplotypes from the GTF file, will you still see hits to them and expression values?

Thanks in advance,
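The thread does not answer this question. If you would rather not depend on how a downstream tool treats annotated sequences that are absent from the index, one option is to filter the GTF yourself. The sketch below is only an illustration under that assumption; `filter_gtf` and the example records are hypothetical, and it relies only on the GTF convention that the first tab-separated column is the sequence name.

```python
def filter_gtf(gtf_lines, reference_names):
    """Keep comment lines and GTF records whose sequence name is in the reference."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep header/comment lines unchanged
            continue
        if not line.strip():
            continue  # skip blank lines
        seqname = line.split("\t", 1)[0]  # GTF column 1 is the sequence name
        if seqname in reference_names:
            kept.append(line)
    return kept


# Made-up annotation records, for illustration only.
example_gtf = [
    "# example annotation",
    "6\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id \"G1\";",
    "HSCHR6_MHC_APD\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id \"G2\";",
]
filtered = filter_gtf(example_gtf, reference_names={"6"})
```

The set of reference names would normally be taken from the FASTA headers or the `@SQ` lines of the SAM file rather than written by hand.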


Haplotype and "random" chromosomes
