Today I was learning some C++ basics and came across wchar_t. I couldn't figure out why we actually need this data type, or how to use it.
7 Answers
wchar_t is intended for representing text in fixed-width, multi-byte encodings. Where wchar_t is 2 bytes in size (as on Windows), it can be used to represent text in any 2-byte encoding. It can also be used for representing text in variable-width multi-byte encodings, the most common of which is UTF-16.
On platforms where wchar_t is 4 bytes in size (e.g. Linux and macOS), it can represent any Unicode text using UCS-4 (effectively UTF-32). Where it is only 2 bytes, it can represent all of Unicode only through a variable-width encoding (usually UTF-16). It's more common to use char with a variable-width encoding, e.g. UTF-8 or GB 18030.
About the only modern operating system to use wchar_t extensively is Windows. This is because Windows adopted Unicode before Unicode was extended past U+FFFF, at a time when a fixed-width 2-byte encoding (UCS-2) seemed sensible. UCS-2 can no longer represent the whole of Unicode, so Windows now uses UTF-16, still with 2-byte wchar_t code units.
wchar_t is a wide character type. It is used to represent characters that need more memory than a regular char provides. It is, for example, widely used in the Windows API.
However, the size of wchar_t is implementation-dependent and is not guaranteed to be larger than char. If you need a specific character type wider than 8 bits, you may want to turn to char16_t and char32_t, which are guaranteed to be at least 16 and 32 bits wide respectively (they have the same size as uint_least16_t and uint_least32_t).
wchar_t is used when you need to store characters with codes greater than 255 (values too large for a typical 8-bit char).
An 8-bit char can take 256 different values, which correspond to entries in the ISO Latin (ISO 8859) tables. A wchar_t, on the other hand, can take at least 65536 values on common platforms (exactly 65536 with Windows' 16-bit wchar_t, far more where it is 32 bits), which lets it hold Unicode values. Unicode is an international standard that allows the encoding of characters for virtually all languages and commonly used symbols.
1 Comment
sizeof(wchar_t) is greater than 2 on many platforms. I've corrected your post.
I understand most people have already answered this, but since I was also learning C++ basics and came across wchar_t, I'd like to share what I understood after searching about it.
wchar_t is used when you need to store characters with codes above 255 (ASCII itself only goes up to 127), because such characters do not fit in a typical 8-bit char and therefore require more memory, e.g.:
wchar_t var[] = L"Привет мир\n"; // "Hello world" in Russian
wchar_t is generally larger than an 8-bit char.
Windows uses it extensively.
It is typically used when text contains characters outside ASCII, for example text in languages other than English.
4 Comments
…char as never negative while leaving its signedness as undefined, so char x = 255; has Undefined Behaviour.
char is equivalent to either signed char or unsigned char, which one being implementation-defined. Next, the standard says that converting an out-of-range value to a signed integer type gives an implementation-defined result (C++20 fully specifies it as modular wraparound). So the result is defined, but may vary based on the implementation.
'$' == '@' can be true.
wchar_t is specified in the C++ language in [basic.fundamental]/p5 as:
Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales ([locale]).
In other words, wchar_t is intended to be a data type that lets you work with text containing characters from any language, one wchar_t per character, without dealing with multi-byte character encodings.
On Linux, BSD and macOS, wchar_t is usually 4 bytes, so a single wchar_t can hold any Unicode code point, including those above the Basic Multilingual Plane.
Windows is the notable exception: there wchar_t is 2 bytes and text is encoded as UTF-16LE, for historical reasons (Windows initially supported only UCS-2).
In practice, the "1 wchar_t = 1 character" concept becomes even more complicated, due to Unicode supporting combining characters and graphemes (characters represented by sequences of code points).
The wchar_t type is used for characters of extended character sets. Among other uses, it is the element type of std::wstring, a string that can hold individual characters of extended character sets (where wchar_t is wide enough), as opposed to std::string, which holds char-sized units and may need more than one of them to represent a single character (as in UTF-8).
The size of wchar_t is implementation-defined; the standard only requires it to be able to represent all members of the largest extended character set among the supported locales.
char is not enough, for example, when working with Unicode characters and strings.