My Current Process

Listing the codepoints.

I adapted a Public Domain file called has_char, which I found somewhere years ago (the location is long forgotten), and named it get_codepoints.cc: it links to fontconfig and lets me list all codepoints in a TTF or OTF file with a value greater than or equal to space (it is a bit buggy: sometimes space is reported, other times it is not).
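
To give the flavour, here is a minimal sketch of that idea using fontconfig's documented charset calls. It is not the original has_char code, and the file name and output format are just my choices:

    // List codepoints >= space in face 0 of a font file, via fontconfig.
    // Build: g++ get_codepoints.cc $(pkg-config --cflags --libs fontconfig freetype2)
    #include <cstdio>
    #include <fontconfig/fontconfig.h>
    #include <fontconfig/fcfreetype.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s font-file\n", argv[0]); return 1; }
        FcInit();
        int count = 0;
        FcPattern *pat = FcFreeTypeQuery((const FcChar8 *)argv[1], 0, nullptr, &count);
        if (!pat) { fprintf(stderr, "cannot query %s\n", argv[1]); return 1; }

        FcCharSet *cs = nullptr;
        if (FcPatternGetCharSet(pat, FC_CHARSET, 0, &cs) == FcResultMatch) {
            FcChar32 map[FC_CHARSET_MAP_SIZE], next;
            // The charset is exposed in 256-codepoint pages of bitmasks.
            for (FcChar32 base = FcCharSetFirstPage(cs, map, &next);
                 base != FC_CHARSET_DONE;
                 base = FcCharSetNextPage(cs, map, &next))
                for (int i = 0; i < FC_CHARSET_MAP_SIZE; i++)
                    for (int bit = 0; bit < 32; bit++)
                        if ((map[i] >> bit) & 1) {
                            FcChar32 cp = base + 32 * i + bit;
                            if (cp >= 0x20)          // skip control characters
                                printf("U+%04X\n", cp);
                        }
        }
        FcPatternDestroy(pat);
        FcFini();
        return 0;
    }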

For TTC, or potentially OTC, files I need to separate the individual fonts. I used to use fontforge, but that has always been awkward, with tiny and faint text in its user interface. If I have to do this for any future files I will use getfonts-20240622, which I adapted from an old package I found on github.
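
For completeness, this is roughly how the faces in a collection can be enumerated with FreeType (a sketch only; getfonts does the actual extraction). Passing a negative face index makes FreeType report num_faces without selecting a face:

    // Build: g++ list_faces.cc $(pkg-config --cflags --libs freetype2)
    #include <cstdio>
    #include <ft2build.h>
    #include FT_FREETYPE_H

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s collection.ttc\n", argv[0]); return 1; }
        FT_Library lib;
        if (FT_Init_FreeType(&lib)) return 1;

        FT_Face probe;
        if (FT_New_Face(lib, argv[1], -1, &probe)) return 1;  // -1: just count faces
        FT_Long n = probe->num_faces;
        FT_Done_Face(probe);

        for (FT_Long i = 0; i < n; i++) {
            FT_Face face;
            if (FT_New_Face(lib, argv[1], i, &face)) continue;
            printf("face %ld: %s %s\n", (long)i,
                   face->family_name ? face->family_name : "?",
                   face->style_name  ? face->style_name  : "?");
            FT_Done_Face(face);
        }
        FT_Done_FreeType(lib);
        return 0;
    }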

All of that is driven by my create-codepoint-files script, which creates a file of the codepoints and a formatted 'coverage' file listing all the codepoints under their Unicode headings. It has a backup approach, using ttf2config.pl supplied in the examples/ directory of Font-TTF-Scripts. The TTC files I know about are listed there.

I then run my generate-all-characters script. That reads the codepoints file and outputs a list (with spaces for missing items) on stdout. I run it infrequently enough that I find using stdout to check what is happening, followed by Ctrl-C, is useful. I then run it again with the output sent to fontname-contents.txt.
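
Something along these lines, although the row layout here is my guess and not what the actual script does: read the U+XXXX lines, then print rows of sixteen characters, with a space wherever a codepoint in the row is absent.

    #include <cstdio>
    #include <cstdlib>
    #include <set>
    #include <string>

    // Encode one codepoint as UTF-8 for output.
    static std::string utf8(unsigned cp)
    {
        std::string s;
        if (cp < 0x80) s += char(cp);
        else if (cp < 0x800) { s += char(0xC0 | cp >> 6); s += char(0x80 | (cp & 0x3F)); }
        else if (cp < 0x10000) { s += char(0xE0 | cp >> 12); s += char(0x80 | (cp >> 6 & 0x3F)); s += char(0x80 | (cp & 0x3F)); }
        else { s += char(0xF0 | cp >> 18); s += char(0x80 | (cp >> 12 & 0x3F)); s += char(0x80 | (cp >> 6 & 0x3F)); s += char(0x80 | (cp & 0x3F)); }
        return s;
    }

    int main()
    {
        std::set<unsigned> cps;
        char line[64];
        while (fgets(line, sizeof line, stdin))   // lines such as "U+00E9"
            if (line[0] == 'U')
                cps.insert(strtoul(line + 2, nullptr, 16));
        if (cps.empty()) return 0;

        for (unsigned row = *cps.begin() & ~0xFu; row <= *cps.rbegin(); row += 16) {
            bool any = false;
            for (unsigned c = row; c < row + 16; c++)
                if (cps.count(c)) any = true;
            if (!any) continue;                   // skip completely empty rows
            printf("U+%04X ", row);
            for (unsigned c = row; c < row + 16; c++)
                printf(" %s", cps.count(c) ? utf8(c).c_str() : " ");
            printf("\n");
        }
        return 0;
    }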

To maintain the Unicode blocks and their names I have update-blocks.sh and the files it references, using a copy of Blocks.txt from Unicode. Updating is only necessary if an updated or new font of interest has a codepoint for which the block name is not known.
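
The Blocks.txt format is simple enough ("0000..007F; Basic Latin", with '#' comments) that a lookup fits in a few lines. This sketch stands in for the data files update-blocks.sh maintains (the actual script is shell, not C++):

    #include <cstdio>
    #include <cstdlib>
    #include <map>
    #include <string>

    struct Block { unsigned last; std::string name; };

    int main(int argc, char **argv)
    {
        if (argc < 3) { fprintf(stderr, "usage: %s Blocks.txt codepoint-hex\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "r");
        if (!f) { perror(argv[1]); return 1; }

        std::map<unsigned, Block> blocks;         // keyed by first codepoint
        char line[256];
        while (fgets(line, sizeof line, f)) {
            unsigned first, last;
            char name[128];
            if (line[0] == '#' || line[0] == '\n') continue;
            if (sscanf(line, "%x..%x; %127[^\n]", &first, &last, name) == 3)
                blocks[first] = { last, name };
        }
        fclose(f);

        unsigned cp = strtoul(argv[2], nullptr, 16);
        auto it = blocks.upper_bound(cp);         // first block starting after cp
        if (it == blocks.begin()) {
            printf("U+%04X: block not known\n", cp);
        } else {
            --it;
            if (cp <= it->second.last)
                printf("U+%04X is in '%s'\n", cp, it->second.name.c_str());
            else
                printf("U+%04X: block not known\n", cp);
        }
        return 0;
    }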

I then open the text contents file in LibreOffice Writer, save it as an ODT file, and add a header and footer. Then I remove the title my text file had, ensure that each page has a heading for the block (adding '... continued' where a block spans pages), and remove unnecessary blank lines, or occasionally move things to a new page. Then I save that as a PDF.

After that, I use my languages-full.tex template to see what the font covers. Just because a codepoint is present does not mean it has the expected glyph; there are sometimes errors. Some fonts need unusual treatment to find all the files. If there are small capitals, I attempt to find which codepoints they cover using my 'find-small-caps' script (it does not always work), or by listing what is in any separate small-cap font file. If I found the sc codepoints, I merge them using my 'merge-sc-codepoints' file, but for monospace fonts I now ignore them because using LaTeX or XeLaTeX will add extra spacing, usually in sizes less than the standard character width. It is amazing what partial items exist in some fonts; my script reports anything I do not think is a codepoint.
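
For what it is worth, one way of testing for small-cap forms is to shape each codepoint with and without the OpenType 'smcp' feature and see whether the glyph changes. This harfbuzz sketch shows the idea; it is my illustration, not the find-small-caps script, and like it, it will miss fonts which do odd things:

    // Build: g++ find_smcp.cc $(pkg-config --cflags --libs harfbuzz)
    #include <cstdio>
    #include <hb.h>

    // Shape a single codepoint, optionally with one feature; return the glyph id.
    static hb_codepoint_t shape_one(hb_font_t *font, uint32_t cp, hb_feature_t *feat)
    {
        hb_buffer_t *buf = hb_buffer_create();
        hb_buffer_add_utf32(buf, &cp, 1, 0, 1);
        hb_buffer_guess_segment_properties(buf);
        hb_shape(font, buf, feat, feat ? 1 : 0);
        unsigned n = 0;
        hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &n);
        hb_codepoint_t gid = n ? info[0].codepoint : 0;  // glyph id after shaping
        hb_buffer_destroy(buf);
        return gid;
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s font-file\n", argv[0]); return 1; }
        hb_blob_t *blob = hb_blob_create_from_file(argv[1]);
        hb_face_t *face = hb_face_create(blob, 0);
        hb_font_t *font = hb_font_create(face);

        hb_feature_t smcp;
        hb_feature_from_string("smcp", -1, &smcp);

        for (uint32_t cp = 0x61; cp <= 0x7A; cp++)       // just a-z for the demo
            if (shape_one(font, cp, nullptr) != shape_one(font, cp, &smcp))
                printf("U+%04X has a small-cap variant\n", cp);

        hb_font_destroy(font); hb_face_destroy(face); hb_blob_destroy(blob);
        return 0;
    }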

I had initially overlooked some reports of missing codepoints, but am now trying to check every current languages file. It turns out that harfbuzz can add combining accents to replicate the desired character (sometimes that looks OK, other times it looks awful), and can also generate a regular '-' where a font lacks a non-breaking hyphen. Apologies if I have missed any of these; desktop terminals and applications may or may not get those added items (it depends on how they have been linked). If in doubt, check the codepoints file.
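
The underlying check is simple: ask the font itself, rather than the rendering stack, whether a codepoint has a glyph. A sketch with FreeType, using the non-breaking hyphen as the example:

    #include <cstdio>
    #include <cstdlib>
    #include <ft2build.h>
    #include FT_FREETYPE_H

    int main(int argc, char **argv)
    {
        if (argc < 3) { fprintf(stderr, "usage: %s font-file codepoint-hex\n", argv[0]); return 1; }
        FT_Library lib;
        FT_Face face;
        if (FT_Init_FreeType(&lib) || FT_New_Face(lib, argv[1], 0, &face)) return 1;

        unsigned cp = strtoul(argv[2], nullptr, 16);   // e.g. 2011 for NB hyphen
        // FT_Get_Char_Index returns 0 (.notdef) when the cmap has no entry.
        if (FT_Get_Char_Index(face, cp))
            printf("U+%04X: present in the font itself\n", cp);
        else
            printf("U+%04X: absent; anything rendered is a fallback\n", cp);
        return 0;
    }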

Finally, I create a fontname-sc.tex file if there are any small caps, using that to create a PDF, and then a fontname-languages.tex file to create the main file, including details about font weights and miscellanea. I attempt to use polyglossia to help keep text within the default margins, but for non-Latin writing systems that needs OpenType tags, and not every font has those for all of its contents.

Reviewing often shows that the glyphs for the modern Greek alphabet either omit diacriticals on capital letters, or place them badly. Therefore, not every font where I show the monotonic Greek alphabet is usable for Greek. Similarly, in the small capitals for Turkic languages (Azeri, or Turkish) the combining dot above, needed to distinguish small-cap normal dotted i from small-cap dotless i (both default to dotless), may be missing or very badly positioned.

My Future Plans

I have very little interest in trying to improve my handling of writing systems beyond Latin, Cyrillic, current Greek and current CJK. From Behdad's thoughts on the current state of text rendering I can see that the process is a lot more complex than I had imagined. Therefore, my old files from 2016, which I already know have a lot of faults in my English use of capital letters, and which no longer seem to work with current XeLaTeX, are retained as legacy items.

I hope to progress my treatment of CJK fonts, and in particular to provide some examples of how the available fonts for a particular style (Sans or Serif variants, or monospace) compare both to the current Noto CJK fonts and to the default (WenQuanYi Zen Hei, a Sans font, but also the default for Serif in fontconfig!).

After that, I might get back to providing documentation of the (Latin) Sans fonts, and also to documenting fonts for Monotonic Greek (including short dummy text transliterated from Lorem Ipsum, but with random changes to include accented vowels).

If I survive for long enough, it might be nice to revisit the other writing systems, but that is right at the end of my ToDo list, well behind several things not related to fonts.