HLA Allele Frequencies and Reference Sets with Maximal Population Coverage

In a given population, certain HLA alleles are observed more frequently than others (http://www.allelefrequencies.net/). One practical consequence is that a small set of alleles can be compiled to cover most of the population. Such a set may be useful in applications such as vaccine development. To this end, we provide reference sets for both HLA class I and class II molecules:

Class I.txt (1.5 KB) (cite: Weiskopf et al.)
Class II.txt (537 Bytes) (cite: Greenbaum et al.)

Note: These files can be used with the ‘upload allele file’ feature of the MHC binding prediction tools. For the class II set of alleles, there is no predictor for “HLA-DPA1*02:01/DPB1*14:01”.

The reference sets were prepared using the following criteria:

  1. the most common specificities in the general population, based on data available from DbMHC and allelefrequencies.net

  2. representative of commonly shared binding specificities (i.e., supertypes).

In terms of population coverage, the reference sets for class I and class II should cover >97% and >99% of the general population, respectively.
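
To give a sense of how a coverage figure relates to allele frequencies, here is a minimal sketch in Python. It is not necessarily how the percentages above were derived. It assumes Hardy-Weinberg proportions within each locus and independence between loci (linkage disequilibrium is ignored), and the function names and example frequencies are hypothetical, chosen only for illustration.

```python
from math import prod


def locus_coverage(freqs):
    """Fraction of individuals carrying at least one listed allele at one locus.

    Assumes Hardy-Weinberg proportions: an individual is uncovered only if
    both chromosomes carry an allele outside the list.
    """
    total = sum(freqs)
    if total < 0.0 or total > 1.0 + 1e-9:
        raise ValueError("allele frequencies at one locus must sum to at most 1")
    total = min(total, 1.0)
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(freqs_by_locus):
    """Coverage across several loci, treating the loci as independent."""
    uncovered = prod(1.0 - locus_coverage(f) for f in freqs_by_locus.values())
    return 1.0 - uncovered


if __name__ == "__main__":
    # Hypothetical frequencies, NOT taken from DbMHC or allelefrequencies.net.
    example = {
        "HLA-A": [0.25, 0.15, 0.10],
        "HLA-B": [0.10, 0.10, 0.05],
    }
    print(f"combined coverage: {combined_coverage(example):.3f}")
```

Because an individual needs only one matching allele at any locus, coverage grows quickly as alleles are added, which is why a fairly small set can reach a high percentage.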

References: