VSCII

Alias(es): x-viet-tcvn5712[1]
Language(s): Vietnamese, English
Created by: TCVN/TC1
Standard: TCVN 5712:1993
Classification: 8-bit SBCS; extended ASCII (VSCII-2/-3)

VSCII (Vietnamese Standard Code for Information Interchange), also known as TCVN 5712,[2] ISO-IR-180,[3] .VN,[4] ABC[4] or simply the TCVN encodings,[4][5] is a set of three closely related Vietnamese national standard character encodings for using the Vietnamese language with computers, developed by the TCVN Technical Committee on Information Technology (TCVN/TC1) and first adopted in 1993 (as TCVN 5712:1993).[2]

VSCII (TCVN) was used extensively in the north of Vietnam, while VNI was popular in the south.[4] It should not be confused with the similarly named unofficial VISCII encoding, which was sometimes used by overseas Vietnamese speakers.[4] Unicode and the Windows-1258 code page are now used for virtually all Vietnamese computer data, but legacy files or archived messages may need conversion.

Encodings

All three forms of VSCII keep the 95 printable characters of ASCII unmodified.

VSCII-3, also known as TCVN 5712-3, VN3 or simply TCVN3,[6] includes the fewest assignments. It is an extended ASCII, because it keeps all 128 codes of ASCII unmodified. It does not re-assign any of the C0 and C1 control codes. Compared to ASCII, it adds 75 characters:

  • 67 lowercase characters, allowing full lowercase support.
  • 7 uppercase characters, allowing uppercase support for the 29 base letters without tone marks.
  • The non-breaking space.

In TCVN3, tone marks on uppercase vowels are provided by switching to an all-capital font.[7]

VSCII-2, also known as TCVN 5712-2 and VN2, is a superset of VSCII-3. It is an extended ASCII, because it keeps all 128 codes of ASCII unmodified. It does not re-assign any of the C0 and C1 control codes, making it conformant with ISO 2022 as a 96-set.[2][3] Compared to VSCII-3, it adds (for a total of 96 non-ASCII characters):

  • 16 more uppercase characters with pre-composed tone marks (for a total of 23 non-ASCII uppercase characters)
  • 5 combining diacritics for tone marks, allowing other combinations of uppercase letters and tone marks to be represented. Each combining mark follows its base letter,[2] as in VNI (rather than preceding it, as in ANSEL).
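
Because the combining marks follow the base letter, which is also the order Unicode uses, a decoder can map each byte to one code point and then optionally compose the result. The following Python sketch illustrates this with a handful of byte values read from the VSCII-1 chart in this article. The names `VSCII2_SAMPLE` and `decode_vscii2_sample` are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# Partial byte-to-Unicode table (an illustrative subset, not the full standard).
VSCII2_SAMPLE = {
    0xA2: "\u00C2",  # A with circumflex
    0xA6: "\u01AF",  # U with horn
    0xB0: "\u0300",  # combining grave
    0xB1: "\u0309",  # combining hook above
    0xB2: "\u0303",  # combining tilde
    0xB3: "\u0301",  # combining acute
    0xB4: "\u0323",  # combining dot below
}


def decode_vscii2_sample(data: bytes) -> str:
    """Decode bytes using the sample table, then compose with NFC."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # ASCII is unchanged in VSCII-2
        elif byte in VSCII2_SAMPLE:
            chars.append(VSCII2_SAMPLE[byte])
        else:
            raise ValueError(f"byte 0x{byte:02X} is not in the sample table")
    # The mark already follows its base letter, so no reordering is needed.
    return unicodedata.normalize("NFC", "".join(chars))


print(decode_vscii2_sample(b"\xa6\xb3"))  # U with horn and acute, composed
```

An encoding that places the mark before the base letter, as ANSEL does, would need a reordering step before composition.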

VSCII-1, also known as TCVN 5712-1 and VN1, is an extension of VSCII-2. It is a modified ASCII rather than an extended ASCII, since it replaces 12 of the 33 ASCII control characters with precomposed characters. Compared to VSCII-2, it does the following (for a total of 140 non-ASCII characters):

  • Adds 44 more pre-composed uppercase letters, bringing them to the same count as the lowercase
  • Does this by replacing 12 ASCII control characters and allocating 32 graphical characters to the C1 control area, breaking ISO 2022 compatibility
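
The character counts quoted for the three forms can be cross-checked with a few lines of arithmetic. The Python sketch below only restates the figures given in the text; the variable names are hypothetical.

```python
# Counts as stated in the text (variable names are illustrative only).
ASCII_PRINTABLE = 95
ASCII_CONTROLS = 33

vscii3_added = 67 + 7 + 1            # lowercase + uppercase base letters + NBSP
vscii2_added = vscii3_added + 16 + 5  # + precomposed uppercase + combining marks
vscii1_added = vscii2_added + 44      # + remaining precomposed uppercase

replaced_controls = 12                # ASCII control positions reused by VSCII-1
c1_area = 32                          # graphical characters placed in the C1 area
assert replaced_controls + c1_area == 44

# In VSCII-1 every one of the 256 byte values is accounted for.
remaining_controls = ASCII_CONTROLS - replaced_controls
total = ASCII_PRINTABLE + vscii1_added + remaining_controls

print(vscii3_added, vscii2_added, vscii1_added, total)  # 75 96 140 256
```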

Conversion from VSCII-3 to VSCII-2 or VSCII-1 and conversion from VSCII-2 to VSCII-1 are not necessary, but can result in smaller files.

Conversion from VSCII-1 to VSCII-2 or VSCII-3 and conversion from VSCII-2 to VSCII-3 require expansion of some pre-composed characters.
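
As a rough illustration of that expansion, the Python sketch below splits a precomposed uppercase letter into a base letter plus one tone mark and emits the two VSCII-2 bytes. It assumes the byte values shown in the VSCII-1 chart in this article, uses Unicode normalization as a convenient way to separate the tone mark, and handles only letters that VSCII-2 lacks in precomposed form. The names `TONE_BYTES`, `BASE_BYTES` and `expand_uppercase` are hypothetical.

```python
import unicodedata

# Combining tone marks and their VSCII-2 bytes (0xB0..0xB4 in the chart).
TONE_BYTES = {
    "\u0300": 0xB0,  # grave
    "\u0309": 0xB1,  # hook above
    "\u0303": 0xB2,  # tilde
    "\u0301": 0xB3,  # acute
    "\u0323": 0xB4,  # dot below
}
# Non-ASCII uppercase base letters and their bytes (0xA1..0xA7 in the chart).
BASE_BYTES = {
    "\u0102": 0xA1, "\u00C2": 0xA2, "\u00CA": 0xA3, "\u00D4": 0xA4,
    "\u01A0": 0xA5, "\u01AF": 0xA6, "\u0110": 0xA7,
}


def expand_uppercase(ch: str) -> bytes:
    """Return base-letter byte + tone-mark byte for one precomposed letter."""
    decomposed = unicodedata.normalize("NFD", ch)
    tones = [c for c in decomposed if c in TONE_BYTES]
    if len(tones) != 1:
        raise ValueError("expected exactly one tone mark")
    # Recompose what is left (for example U + horn) into a single base letter.
    rest = "".join(c for c in decomposed if c not in TONE_BYTES)
    base = unicodedata.normalize("NFC", rest)
    if base in BASE_BYTES:
        base_byte = BASE_BYTES[base]
    elif len(base) == 1 and base.isascii():
        base_byte = ord(base)
    else:
        raise ValueError("base letter is not covered by this sketch")
    return bytes([base_byte, TONE_BYTES[tones[0]]])


print(expand_uppercase("\u00DA").hex())  # U with acute, one byte in VSCII-1
```

Each expanded letter takes two bytes instead of one, which is why conversion in the opposite direction can shrink a file.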

Character set

VSCII-1[2]
Each cell shows the character followed by its Unicode code point. The number in parentheses is the decimal value of the first byte in the row.

          _0         _1         _2         _3         _4         _5         _6         _7         _8         _9         _A         _B         _C         _D         _E         _F
0_ (0)    NUL  0000  Ú    00DA  Ụ    1EE4  ETX  0003  Ừ    1EEA  Ử    1EEC  Ữ    1EEE  BEL  0007  BS   0008  HT   0009  LF   000A  VT   000B  FF   000C  CR   000D  SO   000E  SI   000F
1_ (16)   DLE  0010  Ứ    1EE8  Ự    1EF0  Ỳ    1EF2  Ỷ    1EF6  Ỹ    1EF8  Ý    00DD  Ỵ    1EF4  CAN  0018  EM   0019  SUB  001A  ESC  001B  FS   001C  GS   001D  RS   001E  US   001F
2_ (32)   SP   0020  !    0021  "    0022  #    0023  $    0024  %    0025  &    0026  '    0027  (    0028  )    0029  *    002A  +    002B  ,    002C  -    002D  .    002E  /    002F
3_ (48)   0    0030  1    0031  2    0032  3    0033  4    0034  5    0035  6    0036  7    0037  8    0038  9    0039  :    003A  ;    003B  <    003C  =    003D  >    003E  ?    003F
4_ (64)   @    0040  A    0041  B    0042  C    0043  D    0044  E    0045  F    0046  G    0047  H    0048  I    0049  J    004A  K    004B  L    004C  M    004D  N    004E  O    004F
5_ (80)   P    0050  Q    0051  R    0052  S    0053  T    0054  U    0055  V    0056  W    0057  X    0058  Y    0059  Z    005A  [    005B  \    005C  ]    005D  ^    005E  _    005F
6_ (96)   `    0060  a    0061  b    0062  c    0063  d    0064  e    0065  f    0066  g    0067  h    0068  i    0069  j    006A  k    006B  l    006C  m    006D  n    006E  o    006F
7_ (112)  p    0070  q    0071  r    0072  s    0073  t    0074  u    0075  v    0076  w    0077  x    0078  y    0079  z    007A  {    007B  |    007C  }    007D  ~    007E  DEL  007F
8_ (128)  À    00C0  Ả    1EA2  Ã    00C3  Á    00C1  Ạ    1EA0  Ặ    1EB6  Ậ    1EAC  È    00C8  Ẻ    1EBA  Ẽ    1EBC  É    00C9  Ẹ    1EB8  Ệ    1EC6  Ì    00CC  Ỉ    1EC8  Ĩ    0128
9_ (144)  Í    00CD  Ị    1ECA  Ò    00D2  Ỏ    1ECE  Õ    00D5  Ó    00D3  Ọ    1ECC  Ộ    1ED8  Ờ    1EDC  Ở    1EDE  Ỡ    1EE0  Ớ    1EDA  Ợ    1EE2  Ù    00D9  Ủ    1EE6  Ũ    0168
A_ (160)  NBSP 00A0  Ă    0102  Â    00C2  Ê    00CA  Ô    00D4  Ơ    01A0  Ư    01AF  Đ    0110  ă    0103  â    00E2  ê    00EA  ô    00F4  ơ    01A1  ư    01B0  đ    0111  Ằ    1EB0
B_ (176)  ◌̀    0300  ◌̉    0309  ◌̃    0303  ◌́    0301  ◌̣    0323  à    00E0  ả    1EA3  ã    00E3  á    00E1  ạ    1EA1  Ẳ    1EB2  ằ    1EB1  ẳ    1EB3  ẵ    1EB5  ắ    1EAF  Ẵ    1EB4
C_ (192)  Ắ    1EAE  Ầ    1EA6  Ẩ    1EA8  Ẫ    1EAA  Ấ    1EA4  Ề    1EC0  ặ    1EB7  ầ    1EA7  ẩ    1EA9  ẫ    1EAB  ấ    1EA5  ậ    1EAD  è    00E8  Ể    1EC2  ẻ    1EBB  ẽ    1EBD
D_ (208)  é    00E9  ẹ    1EB9  ề    1EC1  ể    1EC3  ễ    1EC5  ế    1EBF  ệ    1EC7  ì    00EC  ỉ    1EC9  Ễ    1EC4  Ế    1EBE  Ồ    1ED2  ĩ    0129  í    00ED  ị    1ECB  ò    00F2
E_ (224)  Ổ    1ED4  ỏ    1ECF  õ    00F5  ó    00F3  ọ    1ECD  ồ    1ED3  ổ    1ED5  ỗ    1ED7  ố    1ED1  ộ    1ED9  ờ    1EDD  ở    1EDF  ỡ    1EE1  ớ    1EDB  ợ    1EE3  ù    00F9
F_ (240)  Ỗ    1ED6  ủ    1EE7  ũ    0169  ú    00FA  ụ    1EE5  ừ    1EEB  ử    1EED  ữ    1EEF  ứ    1EE9  ự    1EF1  ỳ    1EF3  ỷ    1EF7  ỹ    1EF9  ý    00FD  ỵ    1EF5  Ố    1ED0


The 44 precomposed uppercase letters in rows 0_, 1_, 8_, and 9_ are not in VSCII-2 or VSCII-3. In rows A_ to F_, the five combining marks and the 16 uppercase letters with precomposed tone marks are not in VSCII-3.
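
A chart like this translates directly into a lookup table. The Python sketch below decodes VSCII-3 (TCVN3) bytes with a small subset of the chart; the names `VSCII3_SAMPLE` and `decode_vscii3_sample` are hypothetical, and bytes missing from the subset are replaced with U+FFFD rather than guessed.

```python
# Partial VSCII-3 table taken from rows A_, B_ and D_ of the chart
# (an illustrative subset only, not the complete encoding).
VSCII3_SAMPLE = {
    0xA0: "\u00A0",  # NBSP
    0xA8: "\u0103",  # a with breve
    0xA9: "\u00E2",  # a with circumflex
    0xAA: "\u00EA",  # e with circumflex
    0xAB: "\u00F4",  # o with circumflex
    0xAC: "\u01A1",  # o with horn
    0xAD: "\u01B0",  # u with horn
    0xAE: "\u0111",  # d with stroke
    0xB5: "\u00E0",  # a with grave
    0xB8: "\u00E1",  # a with acute
    0xD6: "\u1EC7",  # e with circumflex and dot below
}


def decode_vscii3_sample(data: bytes) -> str:
    """Decode bytes, keeping ASCII as-is and mapping high bytes via the table."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))  # VSCII-3 keeps all 128 ASCII codes
        else:
            out.append(VSCII3_SAMPLE.get(byte, "\ufffd"))
    return "".join(out)


print(decode_vscii3_sample(b"Vi\xd6t Nam"))
```

The ASCII shortcut holds for VSCII-2 and VSCII-3 only. A VSCII-1 decoder would also need entries for the 12 reassigned control positions in rows 0_ and 1_.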

References

  1. Sivonen, Henri (2014-09-26). "Character encoding changes in m-c require c-c action". mozilla.dev.apps.thunderbird.
  2. "[news] TCVN 5712:1993 (VSCII) -- Vietnamese national standard". 1993-06-02. Archived from the original on 2017-01-11.
  3. TCVN (1993). "ISO-IR-180: Right-hand Part of the VSCII-2 Code Table" (PDF). ITSCJ/IPSJ.
  4. Ngo, Hoc Dinh; Tran, TuBinh. "5. Why Having Vietnamese Charset (Character Set – Encoding) Conversion?". Some special functions of WinVNKey.
  5. Nguyen, Minh T. "Vietnamese Conversions (Vietnet/VIQR, VNI, VPS, VISCII, VNU, TCVN, VietWare, unicode)".
  6. "Unicode & Vietnamese Legacy Character Encodings". Vietnamese Unicode FAQs.
  7. "Unicode & Vietnamese Legacy Character Encodings". Vietnamese Unicode FAQs. TCVN3 is not double-byte, but due to the nature of its encoding, capital letters (vowels) are mapped to a separate, capital font that is similar to the normal, lowercase one.
