> Also, the new xterm support -u8, there is ja and ko in ISO 10646-1, > but how to make ISO 10646-1 font works with Chinese?
The Linux Chinese HOWTO is quite outdated. The latest version is v1.04, 2 June 1998, as of 2002.10.29.
http://linuxselfhelp.com/HOWTO/Chinese-HOWTO.html
http://www.ibiblio.org/pub/Linux/docs/HOWTO/Chinese-HOWTO
By Markus Kuhn; created 1999-06-04, last modified 2002-09-03, as of 2002.10.29.
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Sampler UTF-8 web page by the Kermit project http://www.columbia.edu/kermit/utf8.html
Most recent update: Thu Oct 17 09:32:50 2002, as of 2002.10.29.
v1.0, 23 January 2001, as of 2002.10.29.
This document describes how to change your Linux system so it uses UTF-8 as text encoding.
http://www.linuxdoc.org/HOWTO/Unicode-HOWTO.html
http://www.tldp.org/HOWTO/Unicode-HOWTO.html
http://rf.net/~james/perli18n.html
Date: 2002-02-18
Q0. Do you have a checklist for internationalizing an application?
Q1. I think that I'm a clever programmer. What's so hard about internationalization?
Q2. Do you have a glossary of commonly used terms and acronyms?
Perl and locales, Unicode, porting, modules and CPAN
Q3. What locale support does Perl have?
Q4. What support does Perl have for Unicode?
Q5. How do operating systems implement Unicode and i18N?
Q6. I'm a Perl Porter. What should I know about i18N and C?
Q7. I'm a Perl Porter. What should I know about Perl and Unicode?
Q8. I'm a CPAN module author. What should I know about Perl and Unicode?
Q9. Do regular expressions work with locales?
Q10. Do regular expressions work with Unicode?
Q11. What are these CPAN Unicode modules for?
Q11b. What about i18N POD?
Q12. What is JPerl?
More General Unicode and Programming Information
Q13. Can I just do nothing and let my program be agnostic of character set?
Q14. Why and where should I use Unicode instead of native encodings?
Q15. What is Unicode normalization and why is it important?
Q16. How do I do auto-detection of Unicode streams?
Q17. Is Unicode big endian or little endian?
Q18. Is there an EBCDIC-safe transformation of Unicode?
Q19. Are there security implications in i18N?
Q20. Are there performance issues in i18N?
Q21. How do I localize strings in my program?
Q22. I do database programming with Perl. Can I use Unicode?
Q23. I do database programming with Perl. What are the i18N issues?
Q24. How do other programming languages implement Unicode and i18N?
Internationalized Web Programming
Q25. What support for Unicode do web browsers have?
Q26. How can I i18N my web pages and CGI programs?
Q27. How should I structure my web server directories for international content?
Q28. Can web servers automatically detect the language of the browser?
Q29. What format do I send strings to the translator?
Internationalized Email Programming
Q30. What are common encodings for email?
iDNS
Q30b. What is happening with internationalized DNS?
Timezones
Q30c. How can I manage timezones in Perl?
References
Q31. Any good references?
Perl Hacks
Q32. How do I convert US-ASCII to UTF-16 on Windows NT?
Q33. How do I transform the name of a character encoding to the MIME charset name?
http://people.netscape.com/ftang/i18n.html#detect
UTF-8
Is This File UTF8? isutf8.pl
IsUTF8 in C
An improved version of IsUTF8 in the Mozilla source: IsUTFText
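The scripts listed above are not reproduced here. As a rough stand-in, the same yes/no question can be asked from the shell by letting iconv try to decode the file as UTF-8. This is only a sketch: the function name is_utf8 is made up for these notes, it assumes an iconv that exits non-zero on malformed input, and it is not the algorithm used by isutf8.pl or by Mozilla.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Hypothetical helper for illustration; not taken from isutf8.pl or Mozilla.
is_utf8() {
    # Converting UTF-8 to UTF-8 forces iconv to decode every byte sequence.
    # Malformed input is assumed to make it exit with a non-zero status.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example call (the file name is a placeholder):
#   is_utf8 notes.txt && echo "decodes as UTF-8" || echo "not UTF-8"
```

Pure ASCII files also pass, since ASCII is a subset of UTF-8. A passing file is therefore "valid as UTF-8", not proof that the author meant UTF-8.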
Cyrillic Charset Detection - http://www.neystadt.org/cyrillic/Lingua-DetectCharset.htm
Chinese Charset Detection
Chih-Hao Tsai's Frequency and Stroke Counts of Chinese Characters
SinoDetect by Erik Peterson <eepeter@erols.com>
http://www.mandarintools.com/download/SinoDetect.h
http://www.mandarintools.com/download/SinoDetect.cpp
http://www.mandarintools.com/download/detecttest.cpp
Justin Yu's algorithm - http://www.ihep.ac.cn/~yumj/www/chrecog.html
http://www.xfree86.org/pipermail/i18n/2001-March/001379.html
> Also, the new xterm support -u8, there is ja and ko in ISO 10646-1, > but how to make ISO 10646-1 font works with Chinese?
The ja and ko fonts also contain all glyphs from the commonly used Chinese character sets. They just prefer glyphs from Japanese or Korean character sets where multiple glyphs were available for a single Unicode position. The ideographic *-ISO10646-1 fonts from the ucs-fonts-asian package are a bit of an experimental nature and comments would be very appreciated. I can easily generate a cn version as well using the same software that we used to merge the 18x18 ja and ko fonts. Just suggest a priority order of existing fonts that you would prefer to see merged into a cn font. See the .changes files in the ucs-fonts-asian for documentation on how these fonts were generated.
http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz
With regard to UTF-8 support for xterm:
We unfortunately seem to be heading towards a version split. XFree86 long ago started its own very actively maintained development thread of xterm, managed by Thomas Dickey
http://dickey.his.com/xterm/
ftp://dickey.his.com/xterm/
with various extensions by Robert Brady:
http://www.zepler.org/~rwb197/xterm/
The semantics and design ideas behind this xterm version are summarized, for example, on
http://www.cl.cam.ac.uk/~mgk25/unicode.html#xterm
in particular considering the behaviour with regard to choosing between normal and wide characters and handling combining characters. This xterm has deliberately (temporarily) hardwired-in support for
a UTF-8 encoder/decoder
a Unicode-specific wcwidth function
a Unicode-specific Normalization Form C mapper
in order to guarantee portable usability even on platforms without UTF-8 locale support.
On the other hand, there is now an independent, new and not yet widely used Li18nux/X.Org patch for xterm available. It is based more on the i18n mechanics of X11 (X Output Methods in particular), which were originally introduced to accommodate national CJK encodings and whose suitability for Unicode support is still a somewhat controversial topic.
http://www.li18nux.org/subgroups/utildev/dli18npatch.html
How suitable it is in practice for UTF-8 usage (especially considering the large number of practical detail issues that have been discussed on the linux-utf8 mailing list during the past year) will have to be tested thoroughly first.
I'm somewhat disappointed that this second xterm development thread by Li18nux/X.Org was never properly announced/advertised to XFree86 xterm developers here. There still seem to be disappointing communication problems.
Markus G. Kuhn
LANG=zh_CN xterm -u8 -fn -misc-fixed-medium-r-semicondensed--0-0-75-75-c-0-iso10646-1 -e luit &
or,
LANG=zh_CN xterm -u8 -fn -misc-zysong18030-medium-r-normal--0-0-0-0-p-0-iso10646-1 -e luit &
LANG=zh_CN xterm -u8 -fn '-arphic technology co.-ar pl kaitim gb-medium-r-normal--0-0-0-0-p-0-iso10646-1' -e luit &
LANG=zh_CN xterm -u8 -fn '-arphic technology co.-ar pl kaitim big5-medium-r-normal--0-0-0-0-c-0-iso10646-1' -e luit &
then
date
misc-zysong18030-…-c-0-iso10646-1 has (almost?) all characters, but the English characters take up double space. Awful!
xterm is still normal size.
LANG=zh_TW xterm -u8 -fn -misc-fixed-medium-r-semicondensed--0-0-75-75-c-0-iso10646-1 -e luit &
— couldn't find charset data for locale zh_TW; using ISO 8859-1.
xterm is double width.
LANG=zh_TW xterm -u8 -fn '-arphic technology co.-ar pl mingti2l big5-medium-r-normal--0-0-0-0-c-0-iso10646-1' -e luit &
LANG=zh_TW xterm -u8 -fn '-arphic technology co.-ar pl kaitim big5-medium-r-normal--0-0-0-0-c-0-iso10646-1' -e luit &
— couldn't find charset data for locale zh_TW; using ISO 8859-1.
xterm is double width.
LANG=zh_TW.Big5 xterm -u8 -fn -taipei-fixed-medium-r-normal--0-0-75-75-c-0-big5-0 -e luit &
http://www.debian.or.jp/~kubota/xterm.html
I am working on internationalization (i18n)-related improvement of XTerm, which is included in the distribution of XFree86 and is the most widely used terminal emulator on X Window System in the world.
(2002-09-15) Though internationalization (i.e. LC_CTYPE locale sensibility) has almost finished on 2002-08-17 patch, automatic font selection was not implemented. This means, when XTerm automatically uses UTF-8 mode (luit-using locale-sensible mode also uses UTF-8 mode internally), *-iso10646-1 fonts should be used automatically instead of 8bit fonts.
# (2002-08-17) My 2002-07-18 patch was integrated into CVS repository of XFree86. Now you can use locale-sensibility without any of my patches. We now will use various encodings by XTerm! By improving luit, XTerm will support more encodings. (For example, TCVN, GBK, and Shift_JIS will be supported by using 2002-07-04 patch).
Internationalization (i.e. LC_CTYPE locale sensibility) was almost finished in the 2002-08-17 patch; automatic font selection was implemented (patched) on 2002-09-15.
The download comes in two parts: XTerm (CVS 20020817) and the font patch.
http://www.debian.or.jp/~kubota/softwares/xterm-20020817.tar.gz
http://www.debian.or.jp/~kubota/softwares/xterm-20020918-ufont.diff.gz
My work is based on:
the original XTerm by Thomas Dickey and
fine patch by Robert Brady.
rpm -ih XFree86-devel-4.2.0-72.i386.rpm
cd /usr/X11R6/lib/
ln -s libXaw.so.7.0 libXaw.so
ln -s libXmu.so.6.2 libXmu.so
cd /usr/local/lib
ln -s /usr/X11R6/lib/libXaw* /usr/X11R6/lib/libXmu* .
cd somewhere
tar -xvzf ../xterm-20020817.tar.gz
cd xterm-20020817/
cp ~/xterm-20020918-ufont.diff.gz .
gunzip xterm-20020918-ufont.diff.gz
patch -p1 < xterm-20020918-ufont.diff
chmod 755 configure
./configure --enable-256-color --enable-logging --enable-tcap-query --enable-luit --enable-wide-chars --enable-warnings
make
gcc -g -O2 -W -Wall -Wbad-function-cast -Wcast-align -Wcast-qual -DXTSTRINGDEFINES -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wshadow -Wstrict-prototypes -L/usr/X11R6/lib -o xterm button.o charproc.o charsets.o cursor.o data.o doublechr.o fontutils.o input.o main.o menu.o misc.o print.o ptydata.o screen.o scrollbar.o tabs.o util.o xstrings.o VTPrsTbl.o TekPrsTbl.o Tekproc.o charclass.o precompose.o wcwidth.o -L/usr/X11R6/lib -lXaw -lXmu -lXext -lXt -lSM -lICE -lX11 -lnsl /lib/libtermcap.so.2.0.8
make
make install
make install-ti
mkdir /tmp/xterm-i18n
make -n uninstall | sed 's/^rm -f/ echo/' | sh | cpio -vpdm /tmp/xterm-i18n
luit -list
LANG=zh_CN LC_CTYPE=zh_CN xterm &
date
Turn on support of various encodings according to users' LC_CTYPE locale setting, i.e., LC_ALL, LC_CTYPE, or LANG variables. This is achieved by turning on UTF-8 mode and by invoking luit for conversion between locale encodings and UTF-8. (luit is not invoked in UTF-8 locales.) All you need is an iso10646-1 font regardless of your locale and encoding. This corresponds to the locale resource.
The actual list of encodings which are supported is determined by luit. Consult the luit manual page for further details.
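To illustrate the paragraph above, the question "does this locale need luit?" can be approximated in shell. This is a sketch, not xterm's code: the function names are invented for these notes. The precedence LC_ALL, then LC_CTYPE, then LANG is the standard POSIX rule. Judging the encoding from the locale name alone is a simplification, since a name without a codeset suffix (such as zh_CN) gets its encoding from the system's locale data.

```shell
#!/bin/sh
# effective_ctype LC_ALL LC_CTYPE LANG
# Print the locale name that governs LC_CTYPE (POSIX precedence).
# Empty values count as unset; with nothing set the result is "C".
effective_ctype() {
    if [ -n "${1:-}" ]; then printf '%s\n' "$1"
    elif [ -n "${2:-}" ]; then printf '%s\n' "$2"
    elif [ -n "${3:-}" ]; then printf '%s\n' "$3"
    else printf 'C\n'
    fi
}

# is_utf8_locale NAME: succeed if NAME carries a UTF-8 codeset suffix,
# e.g. zh_CN.UTF-8 or de_DE.utf8@euro. Name-based guess only (assumption).
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8 | *.[Uu][Tt][Ff]-8@* | *.[Uu][Tt][Ff]8 | *.[Uu][Tt][Ff]8@*) return 0 ;;
        *) return 1 ;;
    esac
}

# describe_mode LC_ALL LC_CTYPE LANG: report what the text above implies.
describe_mode() {
    loc=$(effective_ctype "${1:-}" "${2:-}" "${3:-}")
    if is_utf8_locale "$loc"; then
        printf '%s: UTF-8 locale, no luit\n' "$loc"
    else
        printf '%s: convert through luit\n' "$loc"
    fi
}
```

To check the current environment, call describe_mode "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}". Passing the three values as arguments, rather than reading the environment inside the functions, makes it easy to try combinations without changing the shell's own locale.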
Not working:
LC_CTYPE=zh_CN.GB18030 xterm &
LANG=zh_CN.GB18030 LC_CTYPE=zh_CN.GB18030 xterm &
LC_CTYPE=zh_CN.GB18030 xterm &
LANG=zh_CN LC_CTYPE=zh_CN xterm &
LANG=zh_CN xterm -lc &
LANG=zh_CN xterm -lc -u8 -e luit &
LANG=zh_CN xterm -u8 -e luit
LANG=GB2312 xterm -u8 -e luit
LANG='GB 2312' xterm -u8 -e luit
xterm -u8 -e luit -g2 'GB 2312'
LANG=zh_CN xterm -u8 -e luit -g2 'GB 2312'
xterm -u8 -e luit -g2 'GB 2312'
xterm -u8 -fn -misc-fixed-medium-r-semicondensed--0-0-75-75-c-0-iso10646-1 -e luit -g2 'GB 2312' &
then
LANG=zh_CN date
or,
export LANG=zh_CN
Using the luit trick, it worked fine, but a great many characters were missing. I was viewing a Chinese frequency list (i.e. most common characters at the beginning, least common at the end) and many very early ones were missing. But at least the whole mechanism seems to work.
Q: Which Chinese font does luit look for, and where, for the translation?
because
xfd -fn -misc-fixed-medium-r-semicondensed--0-0-75-75-c-0-iso10646-1 &
shows no Chinese glyphs.
This also works!
xterm -u8 -fn -misc-zysong18030-medium-r-normal--0-0-0-0-c-0-iso10646-1 -e luit -g2 'GB 2312'
The "working" command above stops working when invoked with LANG=zh_CN.
Using the simsun font does not work. I tried the following and it failed:
xterm -u8 -fn -microsoft-simsun-medium-r-normal--0-0-0-0-c-0-gb18030-0 -e luit -g2 'GB 2312' &
Black screen, no characters shown (though you know they are there), big cursor.
Using bitmap fonts does not work either:
xterm -u8 -fn '-isas-fangsong ti-medium-r-normal--0-0-72-72-c-0-gb2312.1980-0' -e luit -g2 'GB 2312' &
The result is almost identical to the one with the MS TrueType font above.
The above tests and results were reproduced and verified on RH8 (2003.10.27 Mon), without any changes to the current XFree86 and xterm. Even direct load works:
LANG=zh_CN xterm -u8 -fn -misc-fixed-medium-r-semicondensed--0-0-75-75-c-0-iso10646-1 -e luit -g2 'GB 2312' &
It is hardly usable.
misc-fixed-medium-…-iso10646-1 is missing a great many characters, but it looks good.
misc-zysong18030-…-c-0-iso10646-1 has (almost?) all characters, but it looks really bad. The English characters also take up double space. Awful!
More importantly, I can no longer view "Chinese" in xterm. All those familiar Chinese "luanma" (garbled characters) are now shown as blanks.
So the rxvt solution I already have is enough, and it is almost perfect. Besides, rxvt also supports XIM. No need to explore any further.
documented on: 2004.03.09