The Many String Types

Along with the new UnicodeString type, Delphi introduced an updated internal representation shared by all string types (including AnsiString), which makes room for further improvements in string management. The Delphi R&D team took advantage of this new representation, and of all the compiler-level work done to enhance string management, to provide you with multiple string data types and even a brand new mechanism for defining string types.

[35] The fact that two Unicode code points can be displayed as a single grapheme (see the section "Unicode Code Points and Graphemes" in Chapter 1) makes it even harder to map the number of WideChar elements in a Unicode string to the number of displayed characters.

The predefined reference-counted [36] string types, in addition to UnicodeString, are:

  • AnsiString is a single-byte-per-character string type based on the current code page of the operating system, closely matching the classic AnsiString of past versions of Delphi;
  • UTF8String is a string based on the variable-length UTF-8 encoding, in which a character takes one or more bytes;
  • RawByteString is an array of characters with no code page set, on which the system performs no character conversion (thus partially resembling the classic AnsiString, when used as a pure character array). [37]
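The differences among these types can be inspected at run time. The following console program is a minimal sketch, assuming Delphi 2009 or later, that relies on the System unit functions StringElementSize and StringCodePage. The program name and variable names are illustrative and do not come from the text.

```pascal
program StringTypeInfo;

{$APPTYPE CONSOLE}

var
  A: AnsiString;
  U8: UTF8String;
  U16: UnicodeString;
  Raw: RawByteString;
begin
  A := 'abc';
  U8 := 'abc';
  U16 := 'abc';
  // Assigning to a RawByteString performs no conversion:
  // the bytes and the code page of the source string are kept
  Raw := U8;

  // Bytes per element and code page stored with each string
  Writeln('AnsiString:    ', StringElementSize(A), ' / ', StringCodePage(A));
  Writeln('UTF8String:    ', StringElementSize(U8), ' / ', StringCodePage(U8));
  Writeln('UnicodeString: ', StringElementSize(U16), ' / ', StringCodePage(U16));
  Writeln('RawByteString: ', StringElementSize(Raw), ' / ', StringCodePage(Raw));
end.
```

The code page reported for AnsiString depends on the system configuration, so no value is given here. The UTF8String should report an element size of 1 and code page 65001, the UnicodeString an element size of 2 and code page 1200 (UTF-16), and the RawByteString should report the code page of the UTF8String it was assigned from.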

The type definition mechanism is revealed when you look at the definition of these new string types:

  type
    UTF8String = type AnsiString(65001);
    RawByteString = type AnsiString($FFFF);
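The same syntax can be used to declare a string type of your own, tied to a specific code page. The following is a minimal sketch, assuming Delphi 2009 or later and a system on which code page 28591 (ISO 8859-1) is available. The Latin1String type name and the program name are hypothetical, chosen only for illustration.

```pascal
program CustomStringType;

{$APPTYPE CONSOLE}

type
  // Hypothetical custom type: an AnsiString bound to ISO 8859-1
  Latin1String = type AnsiString(28591);

var
  U: UnicodeString;
  L: Latin1String;
begin
  // #$00F9 is the accented letter u-grave, written as a character
  // code so the example does not depend on the source file encoding
  U := 'Cant'#$00F9;

  // The explicit cast converts from UTF-16 to the code page of the target type
  L := Latin1String(U);

  Writeln('Code page: ', StringCodePage(L));
  Writeln('Length:    ', Length(L));
  Writeln('Last byte: ', Ord(L[Length(L)]));
end.
```

Because ISO 8859-1 stores each of these characters in a single byte, the length should stay at 5 and the last byte should be 249 ($F9). Casting the same UnicodeString to UTF8String instead would be expected to yield a length of 6, since the accented letter takes two bytes in UTF-8.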

In this section I'll cover the AnsiString and custom string types and then the UTF8String type. I'll focus on RawByteString in the following section covering string conversions, as you'll generally use this string type to avoid conversions.

