In 2011 I started to look for "better" coverage from the fonts I was installing. At that
time I had some free webspace and "better" meant one or more of "looked nicer to me"
(e.g. the open 'g' of DejaVu) or "rendered things I often looked at when diving down
rabbit holes on Wikipedia", or "could replace another font and save space" (my systems
were small).
My memory says that I was initially using LibreOffice to create PDFs, and that if a font
did not contain a codepoint it was immediately apparent (some sort of indication of a
missing glyph).
The Latin or Cyrillic languages I reference, and why, are listed at languages.html.
In 2016 I lost that free webspace, and signed up for something bigger. By this time I
had looked at building all of TeXLive from source at Beyond Linux From Scratch (i.e. the
extras - asymptote, biber, dvisvgm and xindy). It became clear to me that LibreOffice
Writer was now pulling in missing glyphs from other fonts, but I figured I could use
XeLaTeX to show only what was in a font (and therefore which languages it did not
adequately support).
Because I use XeLaTeX as my TeX engine I do not require support files for individual
fonts from within TeXLive (or CTAN), so I can use newer versions of fonts where
TeXLive installs an older version.
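For example, a complete test document needs nothing font-specific from TeXLive beyond
the fontspec package; the font name here is just an illustration:
  \documentclass{article}
  \usepackage{fontspec}
  % Load an installed system font by the name Fontconfig knows it by;
  % no per-font support package from TeXLive or CTAN is needed.
  \setmainfont{DejaVu Serif}
  \begin{document}
  All human beings are born free and equal in dignity and rights.
  \end{document}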
In 2016 I knew very little about LaTeX (and much of that was old or wrong, e.g. using
'\\' to force a newline, although that worked in my limited uses). So, where text either
overflowed into the margin, or hyphenated, I just accepted that.
I wasted some time looking at the current versions of all the Noto fonts, then realised
that I only really cared about languages where I could show Article 1 of the UDHR
(Omniglot does that; it seemed a good idea and helped me to discard writing
systems which are not current).
Unfortunately that meant I initially covered many languages which are not particularly
relevant to me and where I have little knowledge of how they ought to be typeset.
My 2016 set of fonts was in one table, with numbers referenced in some of the 'lipsum'
files I had created to try to let people compare fonts of similar styles or for a writing
system. Later, I looked at more fonts and put them in a second table to avoid having to
renumber everything.
In 2023 I became aware of some other fonts I might be interested in, and started a
third table. A few of the earlier fonts had new versions with different names or were
forked to a different name, so I changed those.
Almost all my original fonts seemed never to get updated after their initial release
(for Noto that was not true; the downloads moved around and I missed newer
versions). But nowadays some fonts have frequent updates and others get forked to a
new name. I try to provide information on the version (or date) of what I looked at,
but I rarely look at later versions except as part of some other general revision.
I have now accepted that https: is the correct way to provide PDFs (too many past
vulnerabilities) and moved to this current site. As part of that I have attempted to use
current practice for HTML and stylesheets. I have also merged all three tables into a
single table: the reference numbers are unchanged, so I have provided indexes for the
names known to Fontconfig and also for the identifiers I used, so that it is easier to
find the entry for a specific font.
Now that the table of fonts is stupidly wide (in 2011 I could read the main parts on
one screen with text at a good size) I am reworking the other text to use arbitrary line
lengths to make it easier to read. The line breaks look reasonable in my default font,
DejaVu Sans.
I did not realise that HarfBuzz will sometimes generate accents or non-breaking
hyphens, and I also did not spot all cases where a codepoint was not present.
The result is that I overstated the coverage of some fonts.
I also guessed wrongly about how to stop CJK examples running off the edge of the
document, and added newlines after punctuation, which according to provisional W3C
guidelines is very wrong. My 2024 CJK files should now be correct; I hope to update
the earlier files to current versions and also to compare a few example glyphs for
each Han language and its styles. I currently think that Chinese has two 'serif' styles
and one 'sans' style, while Japanese and Korean each have only one 'serif' and one
'sans' style.
By 2024 I was starting to get annoyed at how certain fonts overflowed the margin
for some languages. With hindsight I ought to have changed my page layout from the
default in LaTeX to something longer (for more space at the top) and wider, but in
itself that would not have solved the problem and I would need to redo everything.
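If I did change it, the geometry package would be the obvious way; the measurements
here are arbitrary examples, not values from any of my files:
  % In the preamble: a taller and wider text block than the article default.
  \usepackage[top=15mm,bottom=20mm,left=20mm,right=20mm]{geometry}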
I spent some time researching what TeXLive could do, and got misled by old or wrong
posts. Many fonts did not seem to badly overlap the margins in most languages, so I
concentrated on those which did. After chasing down a blind alley and being misled
that 'polyglossia' was not the preferred CTAN package for handling hyphenation, I
started to research how to break problematic languages into syllables, so that I could
insert soft hyphens.
Eventually, I discovered that polyglossia is preferred and usually can handle
hyphenation. Some languages such as (Northern) Sami are mentioned in the docs
but do not seem to hyphenate. Others such as Armenian work for some fonts but not
for others, and some fonts seem to just be a pain. I eventually discovered that where
polyglossia covers a Latin alphabet it will always be applied (but some fonts might
still overflow the margin), but that all other scripts require suitable OpenType tags
before polyglossia can be used.
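As a sketch, this is the kind of setup that lets polyglossia hyphenate a
Cyrillic-alphabet language, provided the font really does carry a Cyrillic OpenType
tag (the font name is only an example):
  \documentclass{article}
  \usepackage{fontspec}
  \usepackage{polyglossia}
  \setmainlanguage{english}
  \setotherlanguage{russian}
  % polyglossia looks for \cyrillicfont; in my experience the font also
  % needs a suitable Cyrillic OpenType tag before hyphenation works.
  \newfontfamily\cyrillicfont[Script=Cyrillic]{DejaVu Serif}
  \begin{document}
  \begin{russian}
  Все люди рождаются свободными и равными в своем достоинстве и правах.
  \end{russian}
  \end{document}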
For CJK languages I eventually found some W3C guidance (what I had been doing in
the past, and for those CJK fonts I have not yet updated, was very wrong). With some
help from the texlive mailing list, and using XeLaTeX, I can now set out Article 1 in
the Han scripts using what I believe is acceptable styling.
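One way to get that styling under XeLaTeX is the xeCJK package; this is a sketch of
the idea rather than exactly what my files do, and the font name is only an example:
  \documentclass{article}
  % xeCJK handles CJK line breaking, including not starting a line with
  % a closing punctuation mark; the font name is only an example.
  \usepackage{xeCJK}
  \setCJKmainfont{Noto Serif CJK SC}
  \begin{document}
  人人生而自由，在尊严和权利上一律平等。
  \end{document}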
If I reformat an example and add extra spacing, I use the '\hspace' command to add
space. I do this in points (nominally 1/72.27 inches, about 0.3515mm), most recently using
multiples of 0.5pt. Since all my text is nominally 10pt (small, originally to reduce
paper usage if printed) I had expected similar size adjustments from adding the same
number of points of space in different fonts, but that does not seem to be true.
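For example, padding every gap in a line by the same fixed amount looks like this
(the words and the 2.5pt value are only an illustration):
  % Each \hspace adds a fixed amount on top of the normal inter-word space.
  All\hspace{2.5pt} human\hspace{2.5pt} beings\hspace{2.5pt} are\hspace{2.5pt} born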
For monospace fonts, XeLaTeX may add its own microspacing to justify the
text. Where I have to reformat an Article 1 example in a monospace font I am now
starting to use uneven line lengths and sometimes non-breaking spaces, to better
represent the appearance on a graphical terminal. But most examples still have extra
spaces, and often the text does not align vertically as true monospace should.
For the following languages I always try to use my own attempt at soft hyphens:
Abkhazian, Northern Sami, Tatar. For other Cyrillic alphabets, or Greek, where a font
lacks the necessary tag I also use my own soft hyphens. The positions of some of
my soft hyphens changed where I left the soft-hyphen version in my first-cut file
and later discovered that polyglossia was supported but hyphenated differently. So some
of the examples might have older and incorrect hyphenation.
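By 'my own soft hyphens' I mean break points marked by hand in the tex source;
LaTeX's \- command is one way to do that (the word here only illustrates the markup):
  % \- is a discretionary hyphen: TeX may break the word at that point,
  % and prints a hyphen only if it actually does.
  in\-ter\-na\-tion\-al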
In one or two cases where a font is reported to overlap the margin I have tried
reducing some of the inter-word spaces to half spaces ('\,'). Sometimes that fixes it,
but other times it can make the words squash together. In small caps text it seems
that XeLaTeX or polyglossia is already squeezing the words together to try to fit
within the margins.
There are some languages where I have no idea how to split them into syllables to fit
in the margin, and for these I variously add extra whitespace, use shorter lines to add
less whitespace, or sometimes use uneven lines if I cannot fit them into the margins:
Adyghe, Dinka, Ewe, Hausa, Igbo, Lingala, Maltese, Yakut, Yoruba.
In October 2024 I thought I had finished with fonts which supported languages I
could not hyphenate, but in looking at some updated CJK fonts I found an example
which covered Azeri. Meanwhile, Karl Berry had shown me how to use
{\raggedright ... \par}
to avoid all that messing around. I have updated my language templates ready
for any future revisions of CJK fonts.
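For example, wrapping a paragraph like this lets its lines fall short of the right
margin instead of overflowing (the text is only a placeholder):
  % \raggedright applies only inside the group; the \par must come before
  % the closing brace so the paragraph is broken while it is still in force.
  {\raggedright
  All human beings are born free and equal in dignity and rights.\par}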
I have no continuing interest in Right-to-Left languages, so if a general font covers
some of them and they overflow into the right margin I will not
always attempt to fix
that.
All my tex languages files updated in 2024 are intended to be used with TeXLive2024.
Old tex files formatted in earlier versions might not work with current TeXLive and
future versions might again break my tex source or alter how PDFs look.
I try to show the individual accents used to indicate the Vietnamese tones (the
accented letters themselves are all precomposed in Unicode), even for those fonts
which support Vietnamese but lack the necessary combining characters (usually
monospace fonts). I have now discovered that in many fonts the combining accents
(added to a space at the start of a line to show what is added to the base letter) are
offset into the left margin and need added spacing in front of them, with reduced
spacing before the first letter. For this I sometimes use half-spaces ('\,'). In an ideal
world, the various vowels would align neatly in columns, but in many fonts they
gradually go out of alignment, particularly where a dot below indicates the tone.
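The kind of adjustment I mean looks like this: half-spaces in front of a combining
accent placed on a space, so that the tone mark is shown on its own (the amounts are
only examples and vary by font):
  % U+0301 is the combining acute; the \, half-spaces nudge it out of the
  % left margin. The amounts shown are examples only.
  \,\,\ \char"0301\quad á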
I thought showing text using small capitals was an amusing diversion, and something
I could easily do using XeLaTeX. I now realise that there are several issues:
Originally, fonts had at most 2 weights (usually Regular and perhaps Bold). I continue
to process the regular weight to find what it contains (or now, for CJK fonts, medium
weights where they are available since paler fonts are harder to read) but I mention
all available weights for normal and italic or oblique styles.
When I started, all Chinese and Japanese fonts had only one weight, and I found they
tended to be very faint (more so for Serif than Sans - Chinese calls Sans 'Hei' which
means 'black' and was used in print for headings). When I looked again at the Noto
CJK fonts I decided to use the Medium weight. Now, looking at Japanese fonts I find
that some of those too have several weights, although they are not necessarily very
different. I'm now using whichever I consider to be the best normal weight for black
text on white, and a heavier weight for headings and the title.
I started out by using the \section LaTeX keyword to group alphabets or other writing
systems, with \subsection for each language. None of these fonts cover everything,
and rather than mention what is not covered, or which Latin languages I leave out
because an earlier language already includes all their extra glyphs, I just omit them.
This means that section numbering is only consistent within a font (or within a small
caps variant).
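So the skeleton of one of my tex files looks roughly like this (the names are examples):
  % One \section per alphabet or writing system, one \subsection per
  % language; anything the font cannot show is simply left out.
  \section{Cyrillic}
  \subsection{Russian}
  % Article 1 text in Russian goes here.
  \subsection{Ukrainian}
  % Article 1 text in Ukrainian goes here.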
I take the view that both the heading and all the text for a language should appear
on the same page. For languages such as Polytonic Greek or Vietnamese that can
cause very short pages in front of them. Similarly, all text in the Quotations section
should be on the same page, and Currency symbols should all be on the same page.
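One way to enforce that in LaTeX is to ask for enough room before starting a language,
for example with the needspace package (an assumption about technique; my files may
simply use explicit page breaks):
  % In the preamble:
  \usepackage{needspace}
  % Then, before a language which must not split across pages;
  % 12\baselineskip is an arbitrary example amount.
  \Needspace{12\baselineskip}
  \subsection{Vietnamese}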
When reporting quotation marks in fonts, where an item does not fit within one column
and starts a second line, I have tried not to start the second line with 'single quote' or
'double quotes' because I find reading the continuation by itself confusing. I now
prefer to offset the first line with a non-breaking space and end it with a comma, then
start the continuations using 'with right' and 'and left'.
It looks as if I was using libreoffice-5 for my initial creation of PDFs. By 2018, later
versions no longer showed an indication of a missing glyph and instead took it from
some other installed font. In early 2018 I started using TeXLive 2017, along with
harfbuzz-1.4.8 and graphite2-1.3.10. However, for many revisions of 'lipsum' PDFs
to compare fonts for non-Latin alphabets and writing systems I think I continued to
use current versions of libreoffice.
In 2023 I used TeXLive 2023 with harfbuzz-8.1.1 and graphite2-1.3.14, along with
libreoffice-7.6.4.1 to convert the initial 'contents' text file to a PDF. In 2024 I am using
TeXLive 2024 with harfbuzz-8.5.0 and graphite2-1.3.14, and libreoffice-24.2.4.
In 2016 I used FontForge-20160604 to extract TTF files from TTCs for listing glyphs.
In 2024 I am now using my variant of getfonts by DavidBarts.
If I were to start again I would not use LaTeX's default page style with its large margins
(handy for making notes when reviewing pre-print, but a waste of space when viewing
a PDF onscreen), and I would ignore non-Latin, non-Cyrillic, non-Greek, non-CJK
languages. I would probably also ignore some of the more obscure alphabets, but I
would report every one of my chosen alphabets on a fresh page for consistent section
numbering, even if the page said only 'This font lacks some of the letters needed for
this alphabet' (or perhaps show what it did have, for comparison to other fonts).
I would probably also label the French uppercase Y with diaeresis as 'theoretical' since
it was not in Latin-1 and is very uncommon. Unfortunately, I now know to look at the
many messages from compiling some PDFs and see that HarfBuzz often generates the
diaeresis using a combining accent, in the same way as it does for accents on W and Y
in Welsh. Obviously, that will not happen in regular desktop applications.
Now that I am looking at more CJK fonts, my "standard" text has many different options
and is not always consistent between different fonts. Also, where a font has only partial
coverage of some CJK code blocks it might have been nicer to use full-width spaces
(U+3000) if available when creating the file of a font's contents, to try to keep the glyphs
in better columns.