The difference in shapes of certain glyphs within the original Han unification was
a reason why many Japanese initially rejected Unicode. Since then, many more
codepoints have been added, and also variations (alas, few libre fonts include
the variation selectors). More recently, there was a report where a game used a
Japanese Noto font for CJK messages and a Chinese user complained it looked
wrong.
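For readers unfamiliar with the variation selectors mentioned above: a variation sequence is a base character followed by a selector, either from U+FE00 to U+FE0F or, for ideographic variation sequences, from U+E0100 to U+E01EF. The following is a minimal Python sketch for spotting such sequences in a piece of text. The function names are hypothetical and chosen only for illustration; whether a given font actually renders a sequence differently is a separate question that this does not answer.

```python
def is_variation_selector(cp):
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def variation_sequences(text):
    """Return (base, selector) codepoint pairs found in text.

    A selector with no preceding base character is ignored.
    """
    pairs = []
    prev = None  # codepoint of the most recent non-selector character
    for ch in text:
        cp = ord(ch)
        if is_variation_selector(cp):
            if prev is not None:
                pairs.append((prev, cp))
            prev = None  # a base takes at most one selector
        else:
            prev = cp
    return pairs


if __name__ == "__main__":
    sample = "\u845b\U000E0100 and plain \u845b"
    for base, selector in variation_sequences(sample):
        print(f"U+{base:04X} U+{selector:04X}")
```

Only the first occurrence in the sample carries a selector, so only one pair is reported.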
My initial impression of the Noto CJK fonts was that they provide good coverage
and also show acceptable shapes for the Han glyphs of the targeted language.
I decided to create short PDFs with codepoints that have been reported to differ.
I show all the Noto fonts of the chosen style (Sans or Serif) and WenQuanYi Zen
Hei, which is a fontconfig default, then I show the other fonts of that style for the
language. I have assumed that the Noto CJK fonts have appropriate shapes for
those glyphs which can vary. I have now completed this for CJK-CN (kai, sans,
sung), CJK-HK (one file, not many fonts), CJK-JP (sans, serif) and CJK-TW (kai, sans,
serif). I have separated CN and TW Serif into kai and sung because there are too
many fonts to fit across the page. With hindsight, I'm not sure this was worth the
effort.
As part of checking to see if a CJK font can cover languages other than its target
language, I have created {CN,HK,KR,TW}-punct files. The PDFs are alongside the
other PDF-cjk/ files in The PDFs, but the templates are in files/tools/templates.
There are three main variants of current Chinese likely to be found online:
Simplified, which I identify as 'CN' and which is used in the PRC and Singapore,
Traditional (primarily Mandarin from Taiwan, also used in Malaysia), which I
identify as 'TW', and Cantonese as used in Hong Kong and Macao, which I label
as 'HK'.
When I began the text for the lipsum files I realised it would be interesting to
compare how the exact same text is rendered for each variant. That turned out
to be a painful process, and in the end I used LuaLaTeX to do this.
Details, and links to the files, are in Chinese-variants.
In late 2024 I looked at various links regarding South Korean usage, and formed the
mistaken impression from reading the Wikipedia pages on Korean names that
Hanja is still in common use online. I have since read that although Hanja is still
used for permitted forenames and for surnames, online usage in current HTML
is almost zero. Many younger Koreans much prefer to avoid it, even if longer
text is needed to explain ambiguous terms where Hanja would have been
used in parentheses to provide context.
I am now uploading files showing coverage of surnames and coverage of the
permitted forenames. Unfortunately, some Hanja are used more than once in
each of those. Also, my script to extract the codepoints of a file omits some
Hanja from at least the Noto CJK fonts, even though examination of the output
from those fonts shows complete coverage.
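One way to deal with Hanja that appear more than once in a list is to reduce the text to its unique codepoints before checking coverage. The following is a minimal Python sketch of that idea, not the script referred to above. The function names and the set of ranges treated as Han are assumptions for illustration; the compatibility ideographs block is included on the assumption that Korean sources may use it.

```python
from collections import Counter

# Ranges treated as Han in this sketch (an assumption; extend as needed).
HAN_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def is_han(cp):
    """True if the codepoint falls in one of the ranges listed above."""
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def han_counts(text):
    """Count how often each Han codepoint occurs in text."""
    return Counter(ord(ch) for ch in text if is_han(ord(ch)))


def unique_han_codepoints(text):
    """Sorted list of distinct Han codepoints in text."""
    return sorted(han_counts(text))


def repeated_han_codepoints(text):
    """Sorted list of Han codepoints that occur more than once."""
    return sorted(cp for cp, n in han_counts(text).items() if n > 1)


if __name__ == "__main__":
    sample = "\u91d1\u674e\u6734\u91d1"  # three surnames, the first repeated
    print([f"U+{cp:04X}" for cp in unique_han_codepoints(sample)])
    print([f"U+{cp:04X}" for cp in repeated_han_codepoints(sample)])
```

The unique list can then be compared with whatever a font reports as covered, and the repeated list shows which entries inflate a simple character count.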
Please see Hangul-Hanja.