
Unicode and Multi-language in Flash MX
by ericlin@ms1.hinet.net


Is there anything special in this movie? Select some text and paste it into your text editor. The selectable characters come from different code pages. How did I embed those fonts?

In the device-font part, you may notice that some characters are not displayed correctly, because the fonts on your system differ from the fonts I intended. I cannot tell in advance which ones will come out wrong on your machine.

Skip this topic if you only want to display English characters. Here I discuss some problems in displaying multi-language characters in Flash MX: how to display Chinese characters, Japanese characters and other non-English special characters.

You should check the information on the Macromedia site first. Here are some additional points.

1. What is a code page?

2. System.useCodepage=true;

3. Use device fonts with Unicode for Chinese characters

4. Use static text or embedded fonts for Chinese characters

5. Embed fonts from different code pages

6. Plug a defineFont2 block into the SWF


1.
What is a code page?


It is essential to know this before we go further.

There is one set of English characters, such as A-Z, a-z, digits and punctuation. They appear on the keyboard, can be typed directly, and are used internationally. Each is represented by a two-hex-digit code (one byte): code 0x41 is 'A', code 0x56 is 'V' and code 0x7A is 'z'. These are the ASCII codes, and they are all below 0x80.

There is a second set of "European characters" or "special characters", such as à À Æ Ç ¥. We may call them a-grave, A-grave, AE, C-cedilla, Yen sign, and so on. They cannot be typed directly on the keyboard without a key combination. They occupy the code region above 0x80. For example, in the Western European code page, code 0xA5 is '¥'.

One byte (two hex digits) can represent only 256 characters, but Chinese has more than 3000 characters. Chinese therefore needs four hex digits (two bytes) per character. For example, code 'A5 56' is 冬 in traditional Chinese. This is called a "wide char" because it occupies 2 bytes rather than 1. Japanese also uses wide chars.

However, Chinese still needs the ASCII characters such as A-Z and digits. So here is the rule: Chinese code pages sacrifice the "European special characters" table and use those byte values as lead bytes. When the computer sees a byte in that range, it does not treat it as a character on its own. It treats it as the first byte of a two-byte wide char.

A 4-byte stream "34 A5 56 68" will be interpreted differently by different computers. A European computer treats 34 as "4", A5 as '¥' (Yen), 56 as "V" and 68 as "h". That gives 4 characters => 4¥Vh

A Chinese computer also treats 34 as "4". Then it meets A5, which lies outside the ASCII range (a European special character). It treats A5 as the first byte of a two-byte character rather than as a character, so it reads the next byte, 56. The wide char "A5 56" points to the glyph 冬 ("winter") in a traditional Chinese font. The following byte, 68, is treated as "h". So the Chinese computer sees this byte stream as 3 characters, not 4 => 4冬h
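The two readings can be made concrete in a small Java sketch. Java is used because the author's own tool was written in Java. The European reading uses the JDK's ISO-8859-1 decoder, in which 0xA5 is the Yen sign. The Chinese reading is a simplified model and not a real Big-5 decoder: it assumes any byte from 0x81 to 0xFE is a lead byte that pulls in the next byte, and it reports each character as its hex code instead of looking up a glyph. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Two ways to read the same bytes: single-byte Western European versus a
// simplified double-byte (lead byte + trail byte) code page.
final class CodePageDemo {

    // Every byte is one character; ISO-8859-1 maps 0xA5 to the Yen sign.
    static String readAsLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Simplified model: a byte in 0x81-0xFE starts a two-byte character,
    // any other byte is a character by itself. Returns hex codes, e.g. "A556".
    static List<String> splitDoubleByte(byte[] data) {
        List<String> chars = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            if (b >= 0x81 && b <= 0xFE && i + 1 < data.length) {
                int trail = data[i + 1] & 0xFF;
                chars.add(String.format("%02X%02X", b, trail));
                i += 2;
            } else {
                chars.add(String.format("%02X", b));
                i += 1;
            }
        }
        return chars;
    }
}
```

Given the stream 34 A5 56 68, readAsLatin1 returns the four characters 4¥Vh. On the same input, splitDoubleByte returns three codes: 34, A556 and 68.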

Here we see the problem. An English computer will never reach a glyph in a Chinese font, and a Chinese computer has difficulty reaching the glyphs of European special characters. Even if the font is installed, the computer only reaches glyphs according to the rules above.

A code page is the rule by which the local computer interprets non-ASCII bytes.

Then, what is Unicode?

Unicode tries to solve the problems between different code pages. It gives every character a single code number. Its UTF-8 encoding, which the Flash MX player uses, marks multi-byte characters with byte values that never appear as characters by themselves. So the special-character bytes no longer double as lead bytes. An English computer knows that an incoming sequence is a wide char, and a Chinese computer knows that an incoming sequence is a special character rather than the lead byte of a Chinese wide char.

The text "4¥Vh" becomes \u0034\u00a5\u0056\u0068, and the UTF-8 byte stream is: 34 C2 A5 56 68

Note that the Yen sign is represented by two bytes, "C2 A5".

The text "4冬h" becomes \u0034\u51ac\u0068, and the UTF-8 byte stream is: 34 E5 86 AC 68

Note that the Chinese character is represented by 3 bytes, "E5 86 AC". Its Unicode code point is not the same as the original "A5 56": you cannot access it as "\ua556". It is "\u51ac".
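These byte sequences are easy to check by letting Java encode the strings as UTF-8. The following is a minimal sketch, and the helper name utf8Hex is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Demo {
    // Encodes a string as UTF-8 and returns the bytes as space-separated hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

utf8Hex("4\u00A5Vh") gives "34 C2 A5 56 68", and utf8Hex("4\u51ACh") gives "34 E5 86 AC 68". Both match the streams above. The code point of 冬 (0x51AC) has no numeric relation to its Big-5 code A5 56.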

Remember also that the original stream under the non-Unicode (Big-5) code page is "34 A5 56 68".


2.
System.useCodepage=true;


When we moved from Flash 5 to Flash MX, the default character handling changed to Unicode. Suppose we still load our old text files, for example with the byte stream "34 A5 56 68". The MX player will not recognize those characters. The text files should be re-encoded following the Unicode rules shown above.

However, some people create these data text files dynamically with CGI or PHP scripts. Changing the encoding to Unicode requires modifying those scripts, which takes extra work.

If we are reluctant to modify the CGI or PHP scripts, we can modify our movie instead. We force the movie to use the system code page, like Flash 5, rather than Unicode. Then our movie can recognize the old-fashioned text stream.

If you use a Chinese computer, you may be satisfied when you see Chinese characters displayed correctly on your monitor. They will display correctly on other Chinese computers too, but not on English computers. If your movie will only be played on Chinese computers, that is all right. If your movie is meant for an international audience, it is bad: on an English computer those Chinese characters appear as meaningless special characters.

You lose one important feature that Flash MX gives you. You wasted your money upgrading to Flash MX, and your boss should complain.

So the better way is to let the MX player use its default Unicode handling and to change our text file to Unicode encoding: 34 E5 86 AC 68. Then both English and Chinese computers will display that Chinese character correctly. Likewise, if the text file is 34 C2 A5 56 68, both English and Chinese computers will display the special character correctly.

[System.useCodepage=true] is not a good way to solve your problem.
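The real fix is to re-encode the old text files once as UTF-8. The Java sketch below shows that conversion and assumes you know which code page each old file was written in. It decodes the bytes with that code page, then encodes the text as UTF-8. The class name is hypothetical. For traditional Chinese data you would pass Charset.forName("Big5"), which standard JDK builds provide, though trimmed-down runtimes may omit it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Reencoder {
    // Decodes bytes written under a legacy code page, then re-encodes the
    // text as UTF-8 for the Flash MX player's default Unicode handling.
    static byte[] toUtf8(byte[] legacy, Charset codePage) {
        String text = new String(legacy, codePage);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

Applied to the Western European stream 34 A5 56 68 with ISO-8859-1, toUtf8 yields 34 C2 A5 56 68. The same decode-then-encode step can be done inside the CGI or PHP script that generates the file.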


3.
Use device font with Unicode

While the MX player is Unicode-compatible, the Flash MX authoring tool is not.

Unlike the player, the Flash MX authoring tool uses the code page to handle character bytes. Flash MX on a Chinese computer interprets all input, from the keyboard or the clipboard, by the Chinese encoding rules. So there is no way to enter special characters in a panel option box or in the ActionScript editing area. Likewise, an English computer has no way to accept Chinese characters on the stage of the MX authoring tool.

You cannot create a static text field containing Chinese characters on an English computer; they will be changed to special characters. Chinese computers face the same problem with special characters.

Some people try to create Chinese text in a Unicode-compatible word processor and then copy and paste it into the Flash MX authoring tool. Pasting does work between two Unicode-compatible programs. You can paste from FrontPage, Word 2000, UniPad, etc. into an input box in the SWF player, which is Unicode-compatible. But you will not succeed in pasting into the Flash MX authoring tool.

So, to create Chinese text on an English computer, the choice is to create it "by script" and "display it with a device font". This is what Macromedia suggests.

The pitfall is: how do I know which device fonts are installed on the user's computer? What if the right device font is missing and the substitute font is wrong?

That is a potential problem, and you might not notice it. Most of your colleagues, your classmates, and most of the gurus hanging around the FlashKit board have fonts similar to yours installed. There are no problems for them, so they won't notice any. Only the less well-equipped users are likely to lack the needed device font.

For Chinese, the problem is even more complex. There are two sets of Chinese fonts. One is "traditional Chinese" fonts with a Big-5 glyph charmap, and the other is "simplified Chinese" fonts with a GB glyph charmap. I have both. When I see a Chinese sentence containing many garbage characters, I switch to the other font and usually get a correct, understandable display. However, most Chinese users do not have both font sets on their computers. So display by device font will not satisfy all Chinese users.


4.
Use static text or embed font

Device fonts only? I have a fancy font face and no way to use it? How can I do fancy Unicode text?

Does that mean it is impossible to animate Unicode text with masks, rotation, or fade-in and fade-out?

We know that on a Chinese computer, Chinese Flash designers can create fancy Chinese text and animation without any problems. On a Japanese computer, Japanese designers can do the same magic with Japanese text. The question here is how to make a fancy Japanese text animation on an English computer.

We cannot create Japanese static text in the Flash MX authoring environment. How about embedded fonts?

Because of the code page, we cannot enter Chinese or Japanese characters into the option box of the embed font panel to embed fonts for those characters. Checking "All characters" does not work either: Flash MX still cannot reach those characters in the font to compile them into the SWF.

So the answer seems to be 'NO WAY', due to the code page limitation. If your operating system is Windows 95 or Windows 98, there seems to be no workaround. If you are on Windows 2000 or Windows XP, you are lucky and have a chance to solve it. I don't know about the Mac.

Macromedia says you have to have the Japanese or Chinese version of Flash MX.

That is not completely right. The correct answer is that you have to have a Chinese or Japanese operating system (you can use the English version of Flash MX).

The only solution is to "change your computer into a Japanese computer". If your OS is Windows 98, you have to buy and install Japanese Windows 98! If your OS is Windows XP, go to Control Panel => Regional and Language Options => Advanced, and change the language for non-Unicode programs to Japanese. After you reboot, your computer works with the Japanese code page and Flash MX can handle Japanese. You will notice a different appearance in some Flash MX panels.

The next problem is how to enter Japanese. That would take a long tutorial and discussion; check references on the Microsoft Global IME. In fact, entering Japanese or Chinese text needs further tools in addition to the Global IME. Those tools hook in between the keyboard and the system.

Anyway, you can now copy Japanese text and paste it into Flash MX without problems. You can create Japanese static text, and you can embed fonts for those Japanese characters. You can even check "All characters". That will include 11441 characters in your movie and make it larger than 1.6 MB.

There are several things to mention.

1. If you cannot save your FLA file, try saving it to the root of drive C or D. The current working directory may contain characters that are not compatible with the Japanese code page. Naming your directories with ASCII characters solves the problem.

2. The important treasure you get is the SWF. After you switch your code page back to the original one, be careful if you open your FLA: do not publish or test the movie. That would create a new SWF and overwrite the SWF you created under the Japanese code page.

3. If it is static text, you can break it apart to convert it to graphics. Then you can continue editing your FLA in your original code page.

4. The bad news is that every time you need to modify your movie, you have to switch your code page to Japanese. Annoying.

Besides enabling masks and animation, static text and embedded fonts are the only way to be 100% sure that the text displays as intended. This is especially true for Chinese, where there are two different systems (Big-5 and GB).


5.
Embed fonts from different code pages (combining special characters and Chinese?)


Think about it: if I could include Japanese font glyphs in the SWF, I could surely use Unicode to animate Japanese characters without editing my FLA under the Japanese code page.

I could do it with a shared library.

For example, I want to display European text with an embedded font. I switch my code page to French once, make a shared library with a font symbol, and export it for linkage. Then I switch back to my own code page and make an FLA that imports the symbol. Yes, I can display the European text with the imported French font. I can do animation and masks because I can turn on embedded fonts. I can edit my FLA at any time without switching the code page to French. What I need is the French font, and I have it.

Can we do this for Japanese? We are not so lucky. A library SWF of a Japanese font is about 1.6 MB, and there is no way to export or embed an all-characters font for Chinese.

The second piece of bad news is that sharing works only for a complete font set, not for embedded font data covering only some characters. I made a symbol with a text box that embeds fonts for only some characters, and then exported it for linkage. The exported symbol looks for the font data in the shared library, and that font data is not available to this SWF.

I have a wish for the next version of Flash. I wish the input box of the embed font panel would let us enter characters in "\xxxx" format. Then we could embed fonts for Unicode characters without switching the code page.

If switching the code page is the only way, Unicode support is lousy. Suppose I want to display Japanese, Chinese, French and English in a single sentence. To guarantee correct display and allow animation, I need font data for the 8 Japanese characters, 2 Chinese characters, 7 French characters and 15 common English characters in that sentence. How could I do that if I can switch to only one code page at a time?

I wish there were a "directive to control SWF compilation". At present, #include sourceScript.as brings ActionScript into the SWF. I would like a directive such as "#includeFont Arial(\u4f60)" to embed the Arial glyph for the character "\u4f60". However, it is not currently possible to embed fonts by script this way.

OK, there is no way with the Flash MX authoring tool alone.

Let's try manipulating the SWF at the binary level.

I have an SWF containing the complete glyph data of a Japanese font, which I created once by switching the code page to Japanese. From the SWF format reference, I know where and how this data is stored. So I can pick out the vector glyph data of a single Japanese character from this 1.6 MB SWF, which contains 11441 Japanese characters. If I pick out the vector glyph data of 7 selected Japanese characters and plug it into my movie, I can use those characters with embedded fonts. The data size will be only several KB.
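The "several KB" figure can be checked with rough arithmetic. This assumes the 1.6 MB file consists mostly of glyph data and that the chosen characters are of average complexity:

```latex
\frac{1.6 \times 10^{6}\ \text{bytes}}{11441\ \text{glyphs}} \approx 140\ \text{bytes per glyph},
\qquad
7 \times 140\ \text{bytes} \approx 1\ \text{KB}
```

Real glyphs vary in complexity, and each plugged-in block also carries its own header, font name, offset table and code table. So a few KB for a handful of characters is a reasonable expectation.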

It may be an easy job with Ming or swfGenerator. But I am not sure whether they can access glyphs outside the local code page. If they can, they could reach the glyphs and convert them from TTF to vector glyphs without creating a font SWF by switching the code page.

You need some knowledge of the SWF format to follow the rest. In an SWF file, embedded font data is stored in a tag 48 (defineFont2) block. A text field finds its embedded font by font ID, not by font name; the font name is used when script specifies a font by name. For example, if we write "<FONT FACE='jp_font'>", the player looks for the defineFont2 block whose font name is jp_font. It then searches that block's code table to pick the glyph.

We can create defineFont2 blocks for Chinese and Japanese characters and plug them into this SWF, so that these glyphs become available in the SWF.

However, this binary patching is hard to maintain: it must be redone every time the original SWF is updated.

That is how I created the movie shown at the top of this page. Its size is 20 KB.
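Every SWF tag starts with a small record header, so locating the defineFont2 blocks means walking those headers. The Java sketch below follows the record-header layout in the SWF file format specification. The header is a little-endian 16-bit word whose upper 10 bits are the tag code and whose lower 6 bits are the length. A length of 0x3F means a 32-bit length follows. SwfTags and its methods are hypothetical names. The sketch assumes an uncompressed file and does no bounds checking beyond the loop condition.

```java
import java.util.ArrayList;
import java.util.List;

final class SwfTags {
    static final int DEFINE_FONT2 = 48;

    // Parsed record header: tag code, payload length, header size in bytes.
    static final class TagHeader {
        final int code, length, headerSize;
        TagHeader(int code, int length, int headerSize) {
            this.code = code;
            this.length = length;
            this.headerSize = headerSize;
        }
    }

    static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | (d[p + 1] & 0xFF) << 8;
    }

    static int u32(byte[] d, int p) {
        return u16(d, p) | u16(d, p + 2) << 16;
    }

    // Short form: 2 bytes. Long form (length field 0x3F): 2 + 4 bytes.
    static TagHeader readHeader(byte[] d, int p) {
        int codeAndLength = u16(d, p);
        int code = codeAndLength >>> 6;
        int length = codeAndLength & 0x3F;
        if (length == 0x3F) {
            return new TagHeader(code, u32(d, p + 2), 6);
        }
        return new TagHeader(code, length, 2);
    }

    // Walks tags from 'start' (first byte after the SWF header) until the
    // End tag (code 0) and returns the offsets of all DefineFont2 tags.
    static List<Integer> findDefineFont2(byte[] d, int start) {
        List<Integer> found = new ArrayList<>();
        int p = start;
        while (p + 2 <= d.length) {
            TagHeader h = readHeader(d, p);
            if (h.code == DEFINE_FONT2) found.add(p);
            if (h.code == 0) break;
            p += h.headerSize + h.length;
        }
        return found;
    }
}
```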


6.
Plug the defineFont2 block into the SWF

I will briefly describe how to plug the blocks in. I did it with Java code.

Step 1: Fetch the font glyph shape data

Switch the system code page to Japanese. Create a movie that embeds "all characters" of a selected Japanese font. This SWF contains 11441 characters and is 1.6 MB in size. For some reason, it also includes Chinese characters.

Switch the system code page to French. Create a movie that embeds "all characters" of a selected French font (Arial). This SWF contains 244 characters.

These SWFs are published without compression.

Step 2: Create the main movie. Publish it without compression.

I include a source file with #include.

The included text file contains:

//!-- UTF8
str="<FONT FACE='jp_font'>映画はなんですか</FONT>\nIt is <FONT FACE='ch_font'>電影</FONT> in Chinese,\n<FONT FACE='french_font'>le cinéma</FONT> in French";
txt.html=true; // not needed if "Render text as HTML" is already set for txt
txt.htmlText=str;

Step 3: I wrote Java code to do the jobs below:

Parse the main SWF to get its original size and the file offset just after the SWF header.
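The Java sketch below follows the SWF header layout in the format specification. The header starts with the "FWS" signature; an SWF 6 file published with compression starts with "CWS" instead, and the rest is zlib-compressed. Then comes a version byte and a 32-bit little-endian FileLength. Next is a variable-size FrameSize rectangle whose first 5 bits give the bit width of each of its four fields. The header ends with 16-bit FrameRate and FrameCount fields. SwfHeader is a hypothetical class, not part of the author's FontPlugger.

```java
import java.nio.charset.StandardCharsets;

final class SwfHeader {
    final int version;
    final long fileLength;  // FileLength: total size of the uncompressed file
    final int headerEnd;    // offset of the first tag
    final int frameCount;

    private SwfHeader(int version, long fileLength, int headerEnd, int frameCount) {
        this.version = version;
        this.fileLength = fileLength;
        this.headerEnd = headerEnd;
        this.frameCount = frameCount;
    }

    static SwfHeader parse(byte[] d) {
        String signature = new String(d, 0, 3, StandardCharsets.US_ASCII);
        if (signature.equals("CWS")) {
            throw new IllegalArgumentException("compressed SWF: republish without compression");
        }
        if (!signature.equals("FWS")) {
            throw new IllegalArgumentException("not an SWF file");
        }
        int version = d[3] & 0xFF;
        long fileLength = (d[4] & 0xFFL) | (d[5] & 0xFFL) << 8
                | (d[6] & 0xFFL) << 16 | (d[7] & 0xFFL) << 24;
        // FrameSize RECT: top 5 bits of byte 8 = Nbits; the RECT holds
        // 5 + 4 * Nbits bits, padded up to a whole byte.
        int nbits = (d[8] & 0xFF) >>> 3;
        int rectBytes = (5 + 4 * nbits + 7) / 8;
        int p = 8 + rectBytes;
        // Then FrameRate (UI16, 8.8 fixed point) and FrameCount (UI16).
        int frameCount = (d[p + 2] & 0xFF) | (d[p + 3] & 0xFF) << 8;
        return new SwfHeader(version, fileLength, p + 4, frameCount);
    }
}
```

For a 550 x 400 movie the FrameSize rectangle needs 15-bit fields. It therefore takes 9 bytes, and the first tag starts at offset 21.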

Parse the Japanese SWF to get the CodeTable and OffsetTable of that font.

Compare my characters with the CodeTable and fetch their glyph shapes through the OffsetTable.
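The Java sketch below shows this lookup. It assumes the DefineFont2 layout from the SWF specification, in this order:

- FontID.
- A flags byte (0x08 = wide offsets, 0x04 = wide codes).
- A language byte.
- The length-prefixed font name.
- NumGlyphs.
- The OffsetTable plus CodeTableOffset, counted from the start of the OffsetTable.
- The glyph shapes.
- The CodeTable.

The sketch takes the tag payload with the record header already skipped and copies each wanted glyph's SHAPE bytes verbatim. It ignores the optional layout section (advances, bounds, kerning). FontSubset is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class FontSubset {
    static int u16(byte[] d, int p) {
        return (d[p] & 0xFF) | (d[p + 1] & 0xFF) << 8;
    }

    static int u32(byte[] d, int p) {
        return u16(d, p) | u16(d, p + 2) << 16;
    }

    // body: DefineFont2 payload with the record header already skipped.
    // Returns each wanted character the font contains, mapped to a copy of
    // its SHAPE bytes.
    static Map<Character, byte[]> extractGlyphs(byte[] body, String wanted) {
        int flags = body[2] & 0xFF;                  // byte 3 is LanguageCode
        boolean wideOffsets = (flags & 0x08) != 0;
        boolean wideCodes = (flags & 0x04) != 0;
        int nameLen = body[4] & 0xFF;
        int p = 5 + nameLen;
        int numGlyphs = u16(body, p);
        int tableStart = p + 2;                      // offsets count from here
        int entry = wideOffsets ? 4 : 2;
        int[] offsets = new int[numGlyphs + 1];      // last = CodeTableOffset
        for (int i = 0; i <= numGlyphs; i++) {
            int q = tableStart + i * entry;
            offsets[i] = wideOffsets ? u32(body, q) : u16(body, q);
        }
        int codeTable = tableStart + offsets[numGlyphs];
        Map<Character, byte[]> result = new LinkedHashMap<>();
        for (int i = 0; i < numGlyphs; i++) {
            int code = wideCodes ? u16(body, codeTable + 2 * i)
                                 : body[codeTable + i] & 0xFF;
            char c = (char) code;
            if (wanted.indexOf(c) >= 0) {
                // A glyph's shape runs up to the next glyph (or the CodeTable).
                result.put(c, Arrays.copyOfRange(body,
                        tableStart + offsets[i], tableStart + offsets[i + 1]));
            }
        }
        return result;
    }
}
```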

Construct a defineFont2 block.
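Building the block can be sketched in Java as well. This sketch always writes 16-bit character codes and 32-bit offsets. It sorts the glyphs by code, because the CodeTable is expected in ascending order, and it uses the long tag-header form. Several assumptions apply:

- The font ID passed in must not already be used by another character in the main movie.
- The name is written without a trailing zero byte.
- No layout section is written. A production version would likely also need to carry over the advance and bounds data when the source font has layout information.

DefineFont2Builder is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class DefineFont2Builder {
    static void u16(ByteArrayOutputStream out, int v) {
        out.write(v & 0xFF);
        out.write((v >>> 8) & 0xFF);
    }

    static void u32(ByteArrayOutputStream out, int v) {
        u16(out, v & 0xFFFF);
        u16(out, v >>> 16);
    }

    // Builds a whole DefineFont2 tag: long-form record header + payload.
    static byte[] build(int fontId, String fontName, Map<Character, byte[]> glyphs) {
        TreeMap<Character, byte[]> sorted = new TreeMap<>(glyphs); // ascending codes
        byte[] name = fontName.getBytes(StandardCharsets.UTF_8);
        int n = sorted.size();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        u16(body, fontId);
        body.write(0x0C);                  // flags: WideOffsets | WideCodes, no layout
        body.write(0);                     // LanguageCode: none
        body.write(name.length);
        body.write(name, 0, name.length);
        u16(body, n);                      // NumGlyphs
        int offset = 4 * (n + 1);          // size of OffsetTable + CodeTableOffset
        for (byte[] shape : sorted.values()) {
            u32(body, offset);             // glyph offset, from OffsetTable start
            offset += shape.length;
        }
        u32(body, offset);                 // CodeTableOffset
        for (byte[] shape : sorted.values()) body.write(shape, 0, shape.length);
        for (char c : sorted.keySet()) u16(body, c);  // CodeTable, 16-bit codes

        ByteArrayOutputStream tag = new ByteArrayOutputStream();
        u16(tag, (48 << 6) | 0x3F);        // tag code 48, 32-bit length follows
        u32(tag, body.size());
        tag.write(body.toByteArray(), 0, body.size());
        return tag.toByteArray();
    }
}
```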

Write this defineFont2 block into the main movie.

Repeat the same procedure for the French font if needed.
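The final write can be sketched as inserting the tag right after the header and then updating FileLength, since the player reads that field as the total file size. Putting the font first means it is defined before any frame uses it. This minimal Java sketch (SwfPlugger is a hypothetical name) refuses compressed "CWS" files and leaves choosing a free font ID to the caller. For the French font, build a second block with a different ID and plug it in the same way.

```java
final class SwfPlugger {
    // Inserts a finished tag right after the SWF header and updates the
    // FileLength field (bytes 4-7, little-endian) to the new total size.
    static byte[] plug(byte[] swf, int headerEnd, byte[] tag) {
        if (swf[0] != 'F' || swf[1] != 'W' || swf[2] != 'S') {
            throw new IllegalArgumentException("expected an uncompressed (FWS) SWF");
        }
        byte[] out = new byte[swf.length + tag.length];
        System.arraycopy(swf, 0, out, 0, headerEnd);
        System.arraycopy(tag, 0, out, headerEnd, tag.length);
        System.arraycopy(swf, headerEnd, out, headerEnd + tag.length, swf.length - headerEnd);
        int total = out.length;
        for (int i = 0; i < 4; i++) {
            out[4 + i] = (byte) (total >>> (8 * i));
        }
        return out;
    }
}
```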

Here are the Java files I wrote (not optimized) => FontPlugger.zip


ericlin@ms1.hinet.net