Unified Hangul Code

[Figure: Layout of the Unified Hangul Code]
Alias(es): Windows Code Page 949, IBM Code Page 1363
Standard: WHATWG Encoding Standard (as "EUC-KR")[1]
Classification: Extended ISO 646,[a] variable-width encoding, CJK encoding
  1. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes, although this is limited to letter bytes.

Unified Hangul Code (UHC),[2][a] or Extended Wansung,[4][b] also known under Microsoft Windows as Code Page 949 (Windows-949, MS949 or ambiguously CP949), is the Microsoft Windows code page for the Korean language. It is an extension of Wansung Code (KS C 5601:1987, encoded as EUC-KR) to include all 11172 Hangul syllables present in Johab (KS C 5601:1992 annex 3).[4][2] This corresponds to the pre-composed syllables available in Unicode 2.0 and later.

Wansung Code has the drawback that it only assigns codes for the 2350 precomposed Hangul syllables which have their own KS X 1001 (KS C 5601) codepoints (out of 11172 in total, not counting those using obsolete jamo), and requires others to use eight-byte composition sequences, which are not supported by some partial implementations of the standard.[5] UHC resolves this by assigning single codes for all possible syllables constructed using modern jamo, by making assignments outside of the encoding space used for KS X 1001.
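The syllable counts above can be checked with a short sketch. It assumes that Python's built-in "cp949" codec implements UHC, and it relies on the fact that modern Hangul combines 19 initial consonants, 21 vowels and 28 final options (27 consonants plus none). The variable names are illustrative only. A code is treated as lying inside the KS X 1001 (EUC-KR) space when both of its bytes are 0xA1 or higher.

```python
# Sketch: count the modern Hangul syllables and see where UHC places them.
# Assumption: Python's built-in "cp949" codec implements UHC.

LEADS, VOWELS, TAILS = 19, 21, 28   # initial, medial, final (including "no final")
HANGUL_BASE = 0xAC00                # first precomposed Hangul syllable in Unicode

total = LEADS * VOWELS * TAILS
syllables = [chr(HANGUL_BASE + i) for i in range(total)]
encoded = [s.encode("cp949") for s in syllables]

# Every modern syllable should receive a single two-byte code in UHC.
all_two_bytes = all(len(b) == 2 for b in encoded)

# Codes whose two bytes are both >= 0xA1 sit inside the KS X 1001 space.
in_ksx1001_space = sum(1 for b in encoded if b[0] >= 0xA1 and b[1] >= 0xA1)
outside_ksx1001_space = total - in_ksx1001_space

# Trail bytes that collide with ASCII (see the classification footnote).
ascii_trail_bytes = sorted({b[1] for b in encoded if b[1] < 0x80})

print(total, all_two_bytes, in_ksx1001_space, outside_ksx1001_space)
print(bytes(ascii_trail_bytes).decode("ascii"))
```

The last line prints the ASCII bytes that UHC reuses as trail bytes for Hangul syllables, which is why the encoding is an extension of ISO 646 only in a loose sense.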


Unified Hangul Code is not registered with IANA as a character set for communicating information over the Internet;[6] alternatives include UTF-8. However, the W3C/WHATWG Encoding Standard used by HTML5 incorporates the Unified Hangul Code extensions into its definition of "EUC-KR".[1]

Microsoft assigns Windows-949 the label "ks_c_5601-1987",[7][8] which properly applies to KS X 1001 itself (KS C 5601 being the original name of KS X 1001). The WHATWG treat the label "ks_c_5601-1987" interchangeably with "EUC-KR" with the intent of being "compatible with deployed content".[9] The Unicode Consortium's "OBSOLETE/EASTASIA" collection of withdrawn mappings included mappings for Unified Hangul Code as "KSC5601.TXT", with the automatically derived mappings for 7-bit KS X 1001 being included as "KSX1001.TXT".[10]

IBM's code page 949 is another, otherwise unrelated, extension of EUC-KR. International Components for Unicode (ICU) uses "cp949", "949" or "ibm-949" to refer to that IBM code page,[11] and "ms949" or "windows-949" (or several variants of "ks_c_5601-1987") to refer to the Windows mapping of UHC.[12] Python, by contrast, recognises "cp949", "949", "ms949" and "uhc" as labels for UHC, and does not include an IBM-949 codec.[13] Out of the labels incorporating the code page number, the WHATWG recognise only "windows-949".[9]
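As a quick illustration of the Python behaviour described above, the following sketch resolves each of the listed labels through the standard `codecs` registry. The list and dictionary names are illustrative; only `codecs.lookup` and its `name` attribute come from the standard library.

```python
import codecs

# Labels that, per the text above, Python accepts for UHC.
PYTHON_UHC_LABELS = ["cp949", "949", "ms949", "uhc"]

# Map each label to the canonical codec name Python resolves it to.
resolved = {label: codecs.lookup(label).name for label in PYTHON_UHC_LABELS}

for label, name in resolved.items():
    print(f"{label!r} -> {name!r}")
```

If the labels are aliases of one codec, every entry resolves to the same canonical name. The same labels may resolve differently in ICU, where "cp949" and "949" refer to the IBM code page instead.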

IBM's code page for Unified Hangul Code is called Code page 1363 (IBM-1363), or "Korean MS-Win". It is a combination of Code page 1126 and Code page 1362.[14] It differs from the Windows code page in mapping the single byte 0x5C to the Won sign (U+20A9);[15] Windows maps 0x5C to U+005C (the Unicode code point for the backslash) as in ASCII,[12] although fonts often still render it as a Won sign.[16] The Unicode mapping of the wave dash (0xA1AD) also differs: the IBM mapping favours U+301C (Wave Dash),[17] while the Microsoft mapping favours U+223C (Tilde Operator).[18] The IBM mapping for UHC is available as "ibm-1363" in ICU,[15] whereas the ICU "windows-949" codec is referred to as IBM-1261 in some ICU source code comments.[19]
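The two mapping differences described above can be sketched as a translation applied to text that has already been decoded with the Windows mapping. This is not an IBM-1363 codec: the helper `windows_to_ibm_style` and the table `IBM_1363_DIFFERENCES` are hypothetical names, and the sketch covers only the two code points named in the text, assuming that Python's "cp949" codec follows the Windows mapping for 0x5C.

```python
# Hypothetical helper illustrating the two differences named in the text.
# It is NOT a real IBM-1363 codec and covers nothing beyond these two code points.
IBM_1363_DIFFERENCES = str.maketrans({
    "\u005C": "\u20A9",  # backslash (Windows) -> Won sign (IBM)
    "\u223C": "\u301C",  # Tilde Operator (Windows) -> Wave Dash (IBM)
})

def windows_to_ibm_style(text: str) -> str:
    """Re-map Windows-949-decoded text to the IBM-style code points."""
    return text.translate(IBM_1363_DIFFERENCES)

# The single byte 0x5C decodes to a backslash under the Windows mapping.
windows_text = b"C:\\temp".decode("cp949")
ibm_style_text = windows_to_ibm_style(windows_text)

print(repr(windows_text))
print(repr(ibm_style_text))
```

Translating after decoding works here because 0x5C is not used as a trail byte in UHC, so a U+005C in the decoded text can only have come from the single byte 0x5C.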


  1. ^ Korean: 통합형 한글 코드,[3] romanized: Tonghabhyeong Hangeul Kodeu
  2. ^ Korean: 확장 완성형, romanized: Hwagjang Wanseonghyeong


  1. ^ a b van Kesteren, Anne, "5. Indexes (§ index EUC-KR)", Encoding Standard, WHATWG
  2. ^ a b "INFO: Hangul (Korean) Character Sets", Microsoft Support, Microsoft
  3. ^ "한글 코드에 대하여" (in Korean). W3C.
  4. ^ a b Zsigri, Gyula (2002-06-18). "KSC and UHC".
  5. ^ Shin, Jungshik. "What are KS X 1001(KS C 5601) and other Hangul codes?". Hangul & Internet in Korea FAQ.
  6. ^ "Character Sets". Iana.org. Retrieved 2017-01-11.
  7. ^ "Encoding.WindowsCodePage Property - .NET Framework (current version)". MSDN. Microsoft.
  8. ^ "Code Page Identifiers", Windows Dev Center, Microsoft
  9. ^ a b van Kesteren, Anne. "4.2. Names and labels". Encoding Standard. WHATWG.
  10. ^ Jungshik Shin. "KSX1001.TXT: KS X 1001 to Unicode table". Unicode, Inc.
  11. ^ "ibm-949_P110-1999 (alias cp949)", Converter Explorer, International Components for Unicode
  12. ^ a b "windows-949-2000", Converter Explorer, International Components for Unicode
  13. ^ "codecs — Codec registry and base classes § Standard Encodings". Python 3.7.2 documentation. Python Software Foundation.
  14. ^ "Coded character set identifiers - CCSID 1363", IBM Globalization, IBM, archived from the original on 2014-11-29
  15. ^ a b "ibm-1363", Converter Explorer, International Components for Unicode
  16. ^ Kaplan, Michael S. (2005-09-17), "When is a backslash not a backslash?", Sorting it all out
  17. ^ "ibm-1363_P110-1997 (lead byte A1)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  18. ^ "windows-949-2000 (lead byte A1)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  19. ^ See, for reference, ucnv_lmb.cpp (Brendan Murray, Jim Snyder-Grant), where the lead byte 0x11 is commented as referring to "Korean: ibm-1261" after the definition of ULMBCS_GRP_KO, but it is mapped to the "windows-949" ICU codec in the OptGroupByteToCPName array later in the file.

External links