**mikesh** · 12-19-2013, 02:54 AM

At a glance, here is the list of HLA genes in RefSeq:
HLA-F
HLA-F-AS1
HLA-F
HLA-G
HLA-H
HLA-A
HLA-J
HLA-L
HLA-E
HLA-B
HLA-C
HLA-DRA
HLA-DRB5
HLA-DQA1
HLA-DRB6
HLA-DQB1
HLA-DRB1
HLA-DQA2
HLA-DQB2
HLA-DOB
HLA-DMA
HLA-DOA
HLA-DMB
HLA-DPB2
HLA-DPA1
HLA-DPB1
HLA-DRB3
HLA-DRB4
Of course, to get trustworthy allele information (which, I believe, is still far from complete), one should check IMGT/HLA: http://www.ebi.ac.uk/ipd/imgt/

As for the IGH/K/L and TRA/B/G/D loci, I believe they are also present. Here RefSeq only maps the loci, while Ensembl transcripts give a more detailed view of individual V/D/J genes such as TRBV7-1. The only list of alleles is available in IMGT (http://www.imgt.org); however, this database is extremely hard to browse and contains many spurious alleles (such as a Variable segment allele built from an mRNA reference that lacks part of the sequence near the conserved Cys residue). Still, so far IMGT is the only choice.

I will try to compile our own list of immune receptor segment genes and upload it (this will take about a week).

For specialized tasks, like targeted TCR sequencing, one should use specialized software. Check out our MiTCR software at http://mitcr.milaboratory.com

**JackieBadger** · 12-19-2013, 05:44 AM

Although not answering your question, I might add that I heard recently that the published cod genome is missing quite a few MHC genes.

I study MHC in non-model organisms using NGS. Can someone please tell me how alleles are designated/characterized in human studies, using traditional approaches and NGS?

Another thing worth noting is that we estimated in our fish species that MHC IIb genes may be duplicated among loci, and are only distinguishable by variation in intron II.
I imagine that as with any genome sequence, accurately including CNVs (identical or very close in sequence identity) is pretty tricky.

**TiborNagy** · 01-08-2014, 01:16 AM

Yes, hg19 is still missing some MHC regions. This is why some alternate haplotypes can be downloaded from UCSC (chr6_apd_hap1, chr6_cox_hap2, etc.).

**mikesh** · 01-12-2014, 05:35 AM

Variable, Diversity and Joining segments data

OK, as promised (sorry for the delay due to the holidays), here are the lists of segments for the TRA/B/G/D and IGH/K/L genes of human and mouse.
They were filtered from IMGT data. A script was written to parse the HTML pages of the IMGT website, since there is no other way to download the data in bulk.
The major allele (marked as *01) was taken, and all alleles that are incomplete (e.g. a V segment missing the sequence near the conserved Cys residue) or non-functional were removed. We don't use all the alleles because many of them have spurious evidence (e.g. alleles from cDNA data) and are incomplete. So the idea here is to use as the reference the most frequent allele that is in full agreement with the locus it is derived from, and to derive SNPs from your own sequencing data.
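The filtering rule described above could be sketched roughly as follows. This is only an illustration, not the script that produced the attached files: the `Allele` class, its `functional` and `complete` flags, and the toy allele names are all hypothetical, and the real IMGT annotations would have to be mapped onto them.

```java
import java.util.ArrayList;
import java.util.List;

final class MajorAlleleFilter {
    // Hypothetical record of one allele; the field names are assumptions for illustration.
    static final class Allele {
        final String name;        // e.g. "SEGMENT*01"
        final boolean functional; // true if annotated as functional
        final boolean complete;   // true if the sequence is not truncated

        Allele(String name, boolean functional, boolean complete) {
            this.name = name;
            this.functional = functional;
            this.complete = complete;
        }
    }

    // Keep only major (*01) alleles that are both functional and complete.
    static List<Allele> selectReference(List<Allele> alleles) {
        List<Allele> kept = new ArrayList<>();
        for (Allele a : alleles) {
            if (a.name.endsWith("*01") && a.functional && a.complete) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up toy entries, not real IMGT data.
        List<Allele> toy = new ArrayList<>();
        toy.add(new Allele("TOYV1*01", true, true));
        toy.add(new Allele("TOYV1*02", true, true));   // not the major allele
        toy.add(new Allele("TOYV2*01", true, false));  // incomplete
        toy.add(new Allele("TOYV3*01", false, true));  // non-functional
        for (Allele a : selectReference(toy)) {
            System.out.println(a.name);
        }
    }
}
```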

Two files are attached:

segments_cdr3.txt with the structure "Species Gene Segment_type Segment_name ReferencePoint Sequence"
Here the reference point marks the position of the conserved Cys in a Variable segment or of Phe/Trp in a Joining segment.
For a Variable segment, the reference point is the coordinate of the first nucleotide after the Cys codon, so to obtain the codon of the Cys residue, e.g. in Java:

Code:

seq.substring(ref - 3, ref)

For a Joining segment, the reference point is the coordinate of the first nucleotide before the Phe/Trp codon; to obtain the codon, execute:

Code:

seq.substring(ref + 1, ref + 4)

An example of usage can be found in a working script that performs CDR3 extraction from HTS data:
https://github.com/mikessh/migec/blo...r3Blast.groovy
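As a quick sanity check of the two reference-point conventions, here is a minimal self-contained sketch. It assumes the coordinates are 0-based (which is what the substring calls above imply); the class name, helper methods and toy sequences are made up for illustration and do not come from the attached files.

```java
final class ReferencePointDemo {
    // V segment: ref is the 0-based index of the first nucleotide after the Cys codon.
    static String vCysCodon(String seq, int ref) {
        return seq.substring(ref - 3, ref);
    }

    // J segment: ref is the 0-based index of the last nucleotide before the Phe/Trp codon.
    static String jPheTrpCodon(String seq, int ref) {
        return seq.substring(ref + 1, ref + 4);
    }

    // Cys is encoded by TGT or TGC.
    static boolean isCys(String codon) {
        return codon.equals("TGT") || codon.equals("TGC");
    }

    // Phe is encoded by TTT or TTC, Trp by TGG.
    static boolean isPheOrTrp(String codon) {
        return codon.equals("TTT") || codon.equals("TTC") || codon.equals("TGG");
    }

    public static void main(String[] args) {
        // Toy sequences, not real segments.
        String v = "GCCAGCTGTGCCAGC"; // TGT occupies indices 6..8, so ref = 9
        String j = "AACTTTGGC";       // TTT occupies indices 3..5, so ref = 2

        String cys = vCysCodon(v, 9);
        String phe = jPheTrpCodon(j, 2);
        System.out.println(cys + " is Cys: " + isCys(cys));
        System.out.println(phe + " is Phe/Trp: " + isPheOrTrp(phe));
    }
}
```

Checking that the extracted codon really translates to Cys or Phe/Trp is a cheap way to catch an off-by-one in the reference point before running a full CDR3 extraction.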

segments_cdr12.txt with the structure "Species Gene Segment_type Segment_name CDR1start CDR1end CDR2start CDR2end Sequence"
To get the CDR1 and CDR2 regions, use e.g.

Code:

seq.substring(CDR1start, CDR1end)
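For a whole line of segments_cdr12.txt, the extraction might look like the sketch below. It assumes the nine columns are tab-separated and that the coordinates are 0-based with an exclusive end, as the substring call suggests; neither is confirmed here, so check against the actual file. The class name and the toy line are invented for illustration.

```java
final class Cdr12Demo {
    // Returns {CDR1, CDR2} for one line with the columns:
    // Species Gene Segment_type Segment_name CDR1start CDR1end CDR2start CDR2end Sequence
    static String[] extractCdr12(String line) {
        String[] f = line.split("\t"); // assumption: tab-separated columns
        if (f.length != 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + f.length);
        }
        int cdr1Start = Integer.parseInt(f[4]);
        int cdr1End = Integer.parseInt(f[5]);
        int cdr2Start = Integer.parseInt(f[6]);
        int cdr2End = Integer.parseInt(f[7]);
        String seq = f[8];
        return new String[] {
            seq.substring(cdr1Start, cdr1End),
            seq.substring(cdr2Start, cdr2End)
        };
    }

    public static void main(String[] args) {
        // Made-up toy line, not taken from the attached file.
        String line = "ToySpecies\tTRB\tVariable\tTOYV1\t2\t5\t7\t10\tAACCCGGTTTAA";
        String[] cdr = extractCdr12(line);
        System.out.println("CDR1 = " + cdr[0]);
        System.out.println("CDR2 = " + cdr[1]);
    }
}
```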

Hope this is useful!

Regards,
Mike

how much of the MHC is represented in the reference genome?