Unicode: An Alphabet for the Entire World

As I mentioned, all this changed with the introduction of Unicode. The idea behind Unicode (which is what makes it simple) is that every single character has its own unique number (or code point, to use the proper Unicode term). I don't want to delve into the complete theory of Unicode here (if you want to, you can refer to the Unicode book with the complete standard [7]), but only highlight its key points.

[7] More information on "The Unicode Standard" book can be found at: http://www.unicode.org/book/aboutbook.html.

In any case, I'll start by extending the FromAsciiToUnicode program, which has a third button that displays those same 256 characters (256 minus the initial 32 control characters and the space character). This is what you'll get (and this doesn't depend on your locale or Windows code page):

[Figure: output of the third button of the FromAsciiToUnicode program, a grid of the characters from 32 to 255, sixteen per row, with the starting code of each row (32, 48, 64, ... 240) in the first column.]

You might expect to see exactly the same sequence of characters, as everyone knows that the initial portion of the Unicode character set maps the ASCII sequence, right? This is, in fact, quite wrong! Only the original 7-bit ASCII set has a perfect match in Unicode. Most of the extended characters also match, but not all of them. The portion between 128 and 160, in fact, is different (although, to be more precise, it is different from Microsoft's own interpretation of the Latin 1 code page). If you look at the previous image [8], you might notice a collection of seldom used symbols... but there is one that (at least in my area of the world) is quite important: the € currency symbol.

To further test the situation, I've added the following code to the same program, again using the two different character types, AnsiChar and Char:

[8] For a more lively demo based on this example, see the YouTube video "Delphi does Unicode", which I made available in August 2008, during the period in which Tiburon beta testers were allowed to blog about the new features of the product. Subsequent videos cover other examples in this chapter. The link is: http://www.youtube.com/watch?v=BJMakOY8qbw

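    procedure TForm30.btnEuroClick(Sender: TObject);
    var
      aChar: AnsiChar;
      uChar: Char;
    begin
      // the same literal is stored in an 8-bit and in a 16-bit character
      aChar := '€';
      uChar := '€';
      // show the numeric value each type holds for the euro symbol
      ShowMessage ('€ for AnsiChar is ' +
        IntToStr (Ord (aChar)));
      ShowMessage ('€ for UnicodeChar is ' +
        IntToStr (Ord (uChar)));
    end;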

Keep in mind that the way this code snippet is compiled depends on the --codepage compiler option, which (if not specified) defaults to the operating system code page [9]. So if you recompile the same code in a different area of the world, without providing an explicit code page, you'll get a different compiled program (not just a different output).

Again, the output you'll get might depend on your settings and look somewhat strange... but we'll have to learn to live with it in the Unicode world. This is what I get:

[Figure: two message boxes titled FromAsciiToUnicode. The first reads "€ for AnsiChar is 128", the second reads "€ for UnicodeChar is 8364".]

[9] The code page used to compile the program affects only the way it manages the AnsiChar character, not the Unicode Char. Unicode characters and strings, in fact, ignore the code page altogether (which is a great reason for using them!).
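The dependency on the compile-time code page can be avoided by writing the euro sign as a numeric code point and by naming the target code page explicitly. The following console program is only a minimal sketch of that idea, assuming Delphi 2009 or later; the program name EuroCodePoint and the type name Win1252String are illustrative and are not part of the FromAsciiToUnicode example.

```delphi
program EuroCodePoint;

{$APPTYPE CONSOLE}

type
  // Hypothetical helper type: an AnsiString bound to Windows-1252
  // (code-page-specific AnsiString types require Delphi 2009 or later)
  Win1252String = type AnsiString(1252);

var
  uStr: string;         // UnicodeString (UTF-16) in Delphi 2009 and later
  aStr: Win1252String;
begin
  // Euro sign given as code point U+20AC, so the result does not
  // depend on the encoding of the source file or on --codepage
  uStr := #$20AC;

  // Explicit conversion from UTF-16 to Windows-1252
  aStr := Win1252String(uStr);

  Writeln('Unicode code point: ', Ord(uStr[1]));  // 8364
  Writeln('Windows-1252 byte:  ', Ord(aStr[1]));  // 128
end.
```

Because both the literal and the target code page are spelled out, this sketch should report the same two values regardless of the locale of the machine used to compile or run it.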
