Looking at UTF

How do we create a table of Unicode characters like the one I displayed earlier for ASCII? We can start by displaying the code points of the Basic Multilingual Plane from 32 upward (skipping the usual control characters) and excluding the surrogates. Not every 16-bit value is a character in its own right: the values reserved as surrogates (55296 to 57343, or $D800 to $DFFF) are not valid characters by themselves, but are used in pairs in UTF-16 to represent code points above 65535.
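To make the role of surrogates concrete, here is a minimal console sketch. It is not part of the UnicodeMap program, and the names IsSurrogateValue and PairToCodePoint are hypothetical helpers introduced only for illustration. It checks whether a 16-bit value falls in the surrogate range and combines a high and a low surrogate into the code point they represent:

```pascal
program SurrogateSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: True for the values reserved as
// surrogates ($D800..$DFFF), which are not characters themselves.
function IsSurrogateValue(Value: Integer): Boolean;
begin
  Result := (Value >= $D800) and (Value <= $DFFF);
end;

// Hypothetical helper: combines a high surrogate ($D800..$DBFF)
// and a low surrogate ($DC00..$DFFF) into the code point above
// 65535 that the pair represents.
function PairToCodePoint(HighSur, LowSur: Integer): Integer;
begin
  Result := $10000 + ((HighSur - $D800) shl 10) + (LowSur - $DC00);
end;

begin
  WriteLn(IsSurrogateValue($0041)); // a plain letter: not a surrogate
  WriteLn(IsSurrogateValue($D834)); // a high surrogate
  // The pair $D834 $DD1E encodes code point $1D11E
  WriteLn(IntToHex(PairToCodePoint($D834, $DD1E), 5)); // 1D11E
end.
```

The range $D800 to $DFFF corresponds to blocks 216 to 223 when the Basic Multilingual Plane is split into blocks of 256 code points, which is why those block numbers get special treatment in a map like this one.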

Since displaying a 256 * 256 grid would be impractical, I've kept the 16 * 16 grid as is and added a TreeView control on the side, which lets you pick an arbitrary block of 256 code points to display. As there are 256 such blocks (including the surrogates), I decided to group them in the TreeView at two levels:

[Figure: the TreeView nodes, each labeled with the first and last code point of its block, for example "Char #256/Char #511", "Char #512/Char #767", and "Char #768/Char #1023"]

12 Originally UTF-8 used 1 to 6 bytes, so that it could represent any theoretical Unicode code point of the future, but it was later restricted to the formal Unicode definition, which stops at code point 10FFFF. More information, including a map of the different lengths of code points in UTF-8, is available at http://en.wikipedia.org/wiki/Utf-8.

13 The big-endian byte serialization has the most significant byte first; the little-endian byte serialization has the least significant byte first. As we'll see soon, the byte serialization is often marked in files with a header called Byte Order Mark (BOM).

When the program starts, it fills the TreeView with 16 top-level groups, each containing 16 second-level subgroups. This gives 256 items, each of which can display a grid of 256 characters, for a total of 64K code points (again, not counting those excluded):

procedure TForm30.FormCreate(Sender: TObject);
var
  nTag: Integer;
  I: Integer;
  J: Integer;
  topNode: TTreeNode;
begin
  for I := 0 to 15 do
  begin
    // first page number of this top-level group
    nTag := I * 16;
    topNode := TreeView1.Items.Add (nil,
      GetCharDescr (nTag * 256) + '/' +
      GetCharDescr ((nTag + 15) * 256));
    for J := nTag to nTag + 15 do
    begin
      // pages 216 to 223 hold the surrogates
      if (J < 216) or (J > 223) then
      begin
        TreeView1.Items.AddChildObject (
          topNode,
          GetCharDescr (J * 256) + '/' +
          GetCharDescr (J * 256 + 255),
          Pointer (J));
      end
      else
      begin
        TreeView1.Items.AddChildObject (
          topNode,
          'Surrogate Code Points', Pointer (J));
      end;
    end;
  end;
end;

// helper function
function GetCharDescr (nChar: Integer): string;
begin
  if nChar < 32 then
    Result := 'Char #' + IntToStr (nChar) + ' [ ]'
  else
    Result := 'Char #' + IntToStr (nChar) + ' [' + Char (nChar) + ']';
end;
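The two-level grouping can also be tried outside the form. The following is a minimal console sketch, assuming the same layout of 16 groups of 16 pages; IsSurrogatePage and PageLabel are hypothetical names that do not appear in the UnicodeMap program, and the labels omit the character shown by GetCharDescr. It prints the labels for the one group that contains the surrogate pages:

```pascal
program PageRangesSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: pages 216..223 cover $D800..$DFFF,
// the surrogate code points.
function IsSurrogatePage(Page: Integer): Boolean;
begin
  Result := (Page >= 216) and (Page <= 223);
end;

// Hypothetical helper: label for one page of 256 code points,
// showing its first and last code point.
function PageLabel(Page: Integer): string;
begin
  if IsSurrogatePage(Page) then
    Result := 'Surrogate Code Points'
  else
    Result := Format('Char #%d/Char #%d',
      [Page * 256, Page * 256 + 255]);
end;

var
  Group, Page: Integer;
begin
  // Group 13 holds pages 208..223, including the surrogate pages
  Group := 216 div 16;
  WriteLn('Group ', Group);
  for Page := Group * 16 to Group * 16 + 15 do
    WriteLn('  page ', Page, ': ', PageLabel(Page));
end.
```

Keeping the page number, rather than the starting code point, as the unit of work means the start of any block is always a single multiplication by 256 away.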

As you can see in the FormCreate code, every second-level node of the TreeView stores its page number in its Data field (generally a pointer, here holding an integer cast to a pointer). The starting position of the block is the page number multiplied by 256. This value is used whenever you select a second-level element in the TreeView (that is, a node that has a parent node) to compute the starting point of the grid:

procedure TForm30.TreeView1Click(Sender: TObject);
var
  I, nStart: Integer;
begin
  // Selected is nil when no node is selected
  if (TreeView1.Selected <> nil) and
    (TreeView1.Selected.Parent <> nil) then
  begin
    // a second level node
    nCurrentTab := Integer (TreeView1.Selected.Data);
    nStart := nCurrentTab * 256;
    for I := 0 to 255 do
    begin
      // IfThen for strings is declared in the StrUtils unit
      StringGrid1.Cells [I mod 16 + 1, I div 16 + 1] :=
        IfThen (I + nStart >= 32, Char (I + nStart), '');
    end;
  end;
end;

[Figure: the UnicodeMap program, showing the TreeView of code point blocks beside the grid of 256 characters]

Notice the use of the IfThen function in TreeView1Click to replace the initial 32 characters (the control characters) with an empty string. The page number of the current TreeView item is kept in the nCurrentTab form field. This information is needed to display the code point and its value as the user moves the mouse over the cells of the grid:

procedure TForm30.StringGrid1MouseMove(Sender: TObject;
  Shift: TShiftState; X, Y: Integer);
var
  gc: TGridCoord;
  nChar: Integer;
begin
  gc := StringGrid1.MouseCoord(X, Y);
  nChar := (gc.Y - 1) * 16 + (gc.X - 1);
  StatusBar1.SimpleText :=
    GetCharDescr (nCurrentTab * 256 + nChar);
end;
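The click handler and the mouse handler rely on two formulas that must be inverses of each other: one turns an offset within a page into a grid cell, the other turns a cell back into an offset. The following minimal console sketch, which uses plain integers and no VCL controls, checks that the two formulas round-trip for all 256 offsets:

```pascal
program GridMappingSketch;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  I, Col, Row, Back, Mismatches: Integer;
begin
  Mismatches := 0;
  for I := 0 to 255 do
  begin
    // Forward mapping, as used when filling the grid
    Col := I mod 16 + 1;
    Row := I div 16 + 1;
    // Inverse mapping, as used in the mouse move handler
    Back := (Row - 1) * 16 + (Col - 1);
    if Back <> I then
      Inc(Mismatches);
  end;
  WriteLn('Mismatches: ', Mismatches); // Mismatches: 0

  // Example: offset 65 within a page lands in column 2, row 5
  WriteLn('Offset 65 -> column ', 65 mod 16 + 1,
    ', row ', 65 div 16 + 1);
end.
```

The inverse formula is only meaningful for columns and rows 1 to 16. When the mouse is over the fixed header cells (column or row 0) it produces an offset outside the range 0 to 255, so a stricter version of the handler might want to check the coordinates first.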

As you use the program and browse the pages of code points in the various alphabets, you'll often see characters that aren't displayed properly. This is most probably due to the font you are using, as not all fonts provide a proper representation for the entire Unicode character set. That's why I've added to the UnicodeMap program the ability to pick a different font (by double-clicking on the grid). You can find more information about this issue in the section "Unicode and Fonts and APIs" later in this chapter.
