librufl.git - RISC OS Unicode Font Library

	Commit message	Author	Age	Files	Lines
*	Add checks for replacement character generation	John-Mark Bell	2022-05-22	3	-0/+24
\|	Two variants, as astral characters require an additional pair of hex digits.
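A minimal sketch of why two variants are needed, assuming the replacement character is drawn from the hex digits of the codepoint: Basic Multilingual Plane codepoints fit in four digits, while astral ones need six. The helper name `format_missing_glyph_digits` is hypothetical and is not taken from the library.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the hex digits shown for a missing glyph.
 * BMP codepoints (<= U+FFFF) need 4 digits; astral ones need 6. */
static int format_missing_glyph_digits(uint32_t ucs4, char *buf, size_t len)
{
	if (ucs4 <= 0xFFFF)
		return snprintf(buf, len, "%04X", (unsigned int) ucs4);

	return snprintf(buf, len, "%06X", (unsigned int) ucs4);
}
```

The return value is the number of digits written, so a caller can lay out two or three pairs accordingly.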
*	Report astral characters in Font_EnumerateCharacters	John-Mark Bell	2022-05-22	1	-2/+2
\|	This ensures that the UCS FM tests exercise the relevant code.
*	Add a test for a broken encoding file	John-Mark Bell	2022-05-22	4	-2/+36
\|	This file is broken in a number of ways:
\|
\|	 * It contains garbage content that does not form valid glyph name specifiers
\|	 * It contains garbage directives
\|	 * It tries to define more than 256 glyphs (which is not supported by non-UCS FontManagers)
\|
\|	The last point uncovered a bug in the umap sanity checking, which failed to count correctly the number of glyph indices being defined by the Encoding file.
*	Add test for symbol fonts	John-Mark Bell	2022-05-22	5	-22/+64
\|
*	Add test to ensure identical umaps are merged	John-Mark Bell	2022-05-22	2	-0/+41
\|
*	Add test for fonts with no encodings at all	John-Mark Bell	2022-05-22	3	-0/+23
\|	This exposed a failure to clean up any FontManager error occurring when attempting to load this kind of font.
\|
\|	It also exposed a failure to initialise the umap count in an internal structure. This was probably harmless in reality, but caused the test to fail.
*	Expand non-UCS tests to check umap loading	John-Mark Bell	2022-05-22	7	-10/+307
\|	These now take a configuration file defining the available encodings for each face and setting the expected number of umaps.
*	Initialise pointers with NULL	John-Mark Bell	2022-05-22	1	-7/+7
\|
*	Start brute-force scan at codepoint 1	John-Mark Bell	2022-05-22	1	-1/+1
\|	There's no point starting at 0, as it is not a valid codepoint and will never be valid.
*	Fix Font_EnumerateCharacters mock	John-Mark Bell	2022-05-22	1	-15/+26
\|
*	Conditionally support UCS Encoding formats	John-Mark Bell	2022-05-22	1	-2/+17
\|	While the Encoding file parser is able to parse UCS glyph "names" (of the form /uniXXXX or /uXXXX[XXXX]) and the sparse Encoding file format supported by the UCS FontManager, we currently only parse Encoding files at all on systems running a non-UCS FontManager and thus these code paths are unreachable.
\|
\|	Guard them with appropriate preprocessor definitions so that we can easily resurrect them if they are ever needed in future.
*	Add checks for reinitialising library.	John-Mark Bell	2022-05-22	5	-5/+33
\|	This will cause the second initialisation attempt to load the cache file. In doing so, we discovered that cache loading on non-32-bit platforms didn't work, so fix that too.
*	Squash leaks in non-UCS FM case	John-Mark Bell	2022-05-22	2	-0/+8
\|
*	Add test for initialisation on non-UCS FM	John-Mark Bell	2022-05-22	9	-6/+396
\|
*	Fix x_to_offset/split checks	John-Mark Bell	2022-05-22	3	-21/+40
\|	The Font_ScanString mock was deficient here, refusing to return the split point in the x_to_offset case. Additionally, the tests themselves were passing entirely the wrong units into the API.
\|
\|	Further, the Font_EnumerateCharacters mock needed updating to ensure that all the codepoints used in the tests have glyphs defined (as otherwise, we fall down the missing glyph path).
\|
\|	Given all of this, it's somewhat miraculous these checks passed.
*	Add test for initialisation on pre-3.64 UCS FM	John-Mark Bell	2022-05-21	5	-0/+131
\|	This exercises the broken Font_EnumerateCharacters workaround.
\|
\|	Ensure tests don't trample on each other by having them run in a temporary directory.
*	Expand test to cover more API	John-Mark Bell	2022-05-21	2	-2/+93
\|
*	Size CHD bitmap correctly.	John-Mark Bell	2021-09-14	1	-1/+2
\|	Running tests under valgrind reveals that we were failing to size the CHD bitmap correctly, resulting in the opportunity for buffer overruns. Stop that happening by correcting the maths.
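The commit message does not say what the faulty arithmetic was. As an illustration only, one common way to undersize a bitmap is to divide the bit count by 8 without rounding up, which drops a trailing partial byte. The helper below is a hypothetical sketch of the rounded-up calculation, not the library's code.

```c
#include <stddef.h>

/* Bytes needed to hold nbits one-bit flags, rounding up so that a
 * trailing partial byte is still allocated. */
static size_t bitmap_size_bytes(size_t nbits)
{
	return (nbits + 7) / 8;
}
```

With plain `nbits / 8`, a 9-bit bitmap would get 1 byte and a write to bit 8 would land past the end of the buffer, which is the kind of overrun valgrind reports.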
*	Test UCS FontManager initialisation	John-Mark Bell	2021-09-14	4	-23/+110
\|	Implement XOSFS_CanonicalisePath and XFont_ScanString and introduce a new test that uses them.
*	Ensure there is at least one menu entry	John-Mark Bell	2021-09-14	1	-4/+14
\|	In the case where there are no fonts at all on the system, ensure the menu building code copes.
*	Introduce test harness and mock more FontManager	John-Mark Bell	2021-09-14	7	-52/+406
\|	It is now possible to initialise a test harness which mimics the behaviour of the various versions of the FontManager we support.
\|
\|	Rename the simple test to reflect its new purpose.
*	Introduce test infrastructure	John-Mark Bell	2021-08-16	6	-2/+456
\|	Mock out every OS call made by the library (they all return an unimplemented error for the time being).
\|
\|	Add a trivial test case that verifies that rufl_init() fails.
*	RUfl_chars: fix undersized buffer	John-Mark Bell	2021-08-15	1	-1/+1
\|	Compiling for other platforms has its benefits, the first of which is x86_64 gcc rightly complaining that the buffer to receive the error message is too small. Make it big enough.
*	Make it possible to build for non-RISC OS hosts	John-Mark Bell	2021-08-15	1	-1/+10
\|	Linking fails, and the path to the OSLib headers is hard-coded, but it's a start.
*	Link RISC OS test binaries statically	John-Mark Bell	2021-08-15	1	-0/+1
\|
*	Don't assume pointers are 32bits wide	John-Mark Bell	2021-08-15	1	-2/+2
\|	Use uintptr_t to cast between pointers and integers, instead of assuming that uint32_t will suffice.
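A minimal sketch of the cast described above. The two helper names are hypothetical; `uintptr_t` itself comes from `<stdint.h>`.

```c
#include <stdint.h>

/* Round-trip a pointer through an integer without truncation.
 * uintptr_t is wide enough on both 32-bit and 64-bit hosts, whereas
 * uint32_t would drop the upper half of a 64-bit address. */
static uintptr_t pointer_to_integer(const void *p)
{
	return (uintptr_t) p;
}

static void *integer_to_pointer(uintptr_t v)
{
	return (void *) v;
}
```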
*	Restrict total font faces to 16 bit range	John-Mark Bell	2021-08-15	1	-1/+3
\|	The substitution tables expect there to be no more than 65535 font faces available. Enforce this at load, so there aren't any unwanted surprises later.
*	Clean up types in internal structures	John-Mark Bell	2021-08-15	2	-11/+11
\|
*	Clean up types in public API	John-Mark Bell	2021-08-15	7	-58/+66
\|
*	Make dump of unicode maps optional	John-Mark Bell	2021-08-15	3	-4/+4
\|	Add a verbose flag to rufl_dump_state() and use it to control whether to dump the individual unicode maps generated when using a non-UCS Font Manager.
\|
\|	Change rufl_test to not dump this state (ordinarily, anyway) as it is generally uninteresting and highly verbose.
*	Ignore UCS fonts if using a non-UCS Font Manager	John-Mark Bell	2021-08-15	1	-2/+27
\|	Attempting to use fonts constructed for the UCS Font Manager on older systems generally results in bad outcomes up to, and including, complete system freezes.
\|
\|	As fixing the Font Manager on these systems is impractical, simply ignore these fonts completely when scanning for glyph coverage.
*	Clean up logging in the non-UCS Font Manager path	John-Mark Bell	2021-08-15	1	-43/+39
\|	To obtain the full extent of a "language" font's glyph coverage we need to open and scan it in each of the available target encodings.
\|
\|	All of the Latin1-6 + Welsh target encodings declare that they are based on the Base0 encoding and thus will cause the Font Manager to demand the existence of corresponding IntMetric0/Outlines0 font data files. A "language" font using a different base encoding (and corresponding target encodings based on it) would thus generate an error from the Font Manager.
\|
\|	Additionally, without reinventing the Font Manager's own logic (and poking around the filesystem looking for IntMetrics and Encoding files), we don't know if a font is a "language" or a "symbol" font until we try to use it. Thus, we expect attempts to open "symbol" fonts with an explicit target encoding to generate an error from the Font Manager as well.
\|
\|	As these are expected errors, there is no point logging them as it just produces a load of distracting noise.
*	Accept non-UCS Font Manager rejecting UCS fonts.	John-Mark Bell	2021-08-14	1	-2/+8
\|	If you attempt to use fonts supported by the UCS Font Manager with a non-UCS Font Manager, this will either work (in a limited way) or fail because the font data is incomprehensible to the non-UCS Font Manager.
\|
\|	Cope with one particular instance of this.
*	Fix font scanning on non-UCS Font Managers	John-Mark Bell	2021-08-14	1	-1/+1
\|	We want to update the umap itself, not whatever happens to be on the stack in the vicinity of its address.
*	Fix initialisation on UCS Font Manager 3.41-3.63	John-Mark Bell	2021-08-14	1	-13/+50
\|	We cannot use Font_ReadEncodingFile to find the path to a font's source encoding because that is not what the API returns (it returns the path to the encoding file corresponding to the target encoding used to open the font handle) and there is no public API for obtaining the path of the source encoding.
\|
\|	Additionally, there is no reliable way to replicate the UCS Font Manager's mapping of undefined and duplicate glyph names into the private use space at U+E000-U+EFFF.
\|
\|	Therefore, take a different approach to supporting these versions of the Font Manager: abuse Font_EnumerateCharacters by probing every codepoint in the range [0, first_returned) to force the Font Manager to reveal the information we want. Once we have reached the first_returned codepoint, we can happily fall through to the normal flow (which will make use of the sparse nature of the Unicode space).
*	Fix shrinkwrap moving blocks	John-Mark Bell	2021-08-14	1	-23/+18
\|	All blocks subsequent to a full one get moved up and all their indices need rewriting.
*	Ensure dumping doesn't run off the end of a plane	John-Mark Bell	2021-08-14	1	-1/+1
\|
*	Fix bug in sparse encoding parser	John-Mark Bell	2021-08-14	1	-2/+3
\|	Spaces are valid characters in the sparse encoding, so ensure we consume them correctly.
*	Add MedBold and Thin weights	John-Mark Bell	2021-08-12	1	-0/+2
\|
*	Clean up logging	John-Mark Bell	2021-08-11	1	-17/+0
\|
*	Use version-specific cache location.	John-Mark Bell	2021-08-11	2	-9/+53
\|	Move the cache location to a subdirectory within Scrap and encode the cache version in the filename.
\|
\|	This allows software using different versions of RUfl to coexist on the same system without trying to share the same cache (and thus rescanning fonts every time).
*	Optimise substitution table storage	John-Mark Bell	2021-08-11	1	-189/+570
\|	Consider each Unicode plane independently (as they have very different properties). This means building a table for each plane and allows us to reduce the size of each entry in the CHD-addressed table from 64 to 32bits (which provides a significant immediate saving).
\|
\|	Also introduce a direct linear mapping backend. This stores the table in a series of 256-entry blocks which are addressed from a fixed-size index. Block entries are either 8 or 16 bits wide (depending upon the number of fonts found on the system). This restores some of the storage efficiency of the old "giant array" approach, which is generally more efficient than a CHD (or other) hash-based implementation where the load factor is reasonably high (or the glyph:block ratio is sufficiently high).
\|
\|	Select the direct or CHD storage mechanism based upon an estimate of the storage size for the data collected for a plane.
\|
\|	In the testing I have performed (with the same fonts available as before) the combined effect of the above is to reduce the storage used significantly. Without the 8bit direct mapping entry size (which is a somewhat unfair comparison because even the "giant array" didn't have that feature) we see:
\|
\|	  Plane  Codepoints  Blocks  Backend  Storage  Alternative
\|	      1       51483     224  Direct    115480  311328 (CHD)
\|	      2        1981      13  Direct      7448  9760 (CHD)
\|	      3        2293     201  CHD        17952  103704 (Direct)
\|	  Total       55757                    140880  (~= 2.5 bytes/glyph)
\|
\|	The other 14 planes have no glyph coverage at all, so require no storage.
\|
\|	With the 8bit direct mapping, we see:
\|
\|	  Plane  Codepoints  Blocks  Backend  Storage  Alternative
\|	      1       51483     224  Direct     57880  311328 (CHD)
\|	      2        1981      13  Direct      3864  9760 (CHD)
\|	      3        2293     201  CHD        17952  103704 (Direct)
\|	  Total       55757                     79696  (~= 1.4 bytes/glyph)
\|
\|	In summary:
\|
\|	 * separating the planes has shaved ~50% off the storage required by the CHD backend
\|	 * introducing the direct mapping backend has shaved a further ~60% off that
\|	 * using 8bit direct mapping has shaved another ~50% off that
\|
\|	Cumulatively, then, storage requirements are now ~86% smaller than with CHD only (and about 40% less than the BMP-only "giant table", but now with astral character support).
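The direct linear mapping described above can be sketched as a two-level lookup: the high byte of a codepoint's offset within its plane selects an index slot, and the low byte selects an entry within the 256-entry block that slot names. This is an illustration of the 16-bit variant only; the structure, the field names and both sentinel values are assumptions rather than the library's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 256u   /* entries per block, as in the commit message */
#define INDEX_SIZE 256u   /* 65536 codepoints per plane / 256 per block */
#define NO_BLOCK 0xFFFFu  /* assumed sentinel: nothing stored for this run */
#define NO_FONT 0xFFFFu   /* assumed sentinel: no font has this glyph */

/* Hypothetical per-plane table with 16-bit entries. */
struct direct_table {
	uint16_t index[INDEX_SIZE]; /* block number for each 256-codepoint run */
	const uint16_t *blocks;     /* n_blocks * BLOCK_SIZE font ids */
};

/* Look up the font id for a codepoint offset (0..0xFFFF) within a plane. */
static uint16_t direct_lookup(const struct direct_table *t, uint16_t offset)
{
	uint16_t block = t->index[offset >> 8];

	if (block == NO_BLOCK)
		return NO_FONT;

	return t->blocks[(size_t) block * BLOCK_SIZE + (offset & 0xFFu)];
}
```

Only runs that contain at least one glyph cost a block, which is why a densely covered plane stores well this way while a plane whose few codepoints are scattered over many blocks is better served by the hash.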
*	Teach rufl_chars about other planes.	John-Mark Bell	2021-08-10	1	-20/+46
\|	Selectable via the menu, like everything else.
*	Make rufl_test render an astral character	John-Mark Bell	2021-08-09	1	-1/+2
\|
*	Perform font substitution for astral characters, too.	John-Mark Bell	2021-08-09	8	-105/+726
\|	This significantly reworks the construction of the substitution table (and hides its implementation from the rest of the library).
\|
\|	It is no longer practical to use a directly-indexed array so, instead, we front it with a perfect hash function. The storage required for the (unoptimised) hash data is currently about 6 bits per entry. Implementing compression would reduce this to the order of ~2 bits per entry.
\|
\|	As the resulting data structure is sparse, we must store the original Unicode codepoint value along with the identity of the font providing a suitable glyph. This has necessitated expanding the size of substitution table entries from 16 to 64 bits (of which 27 bits are currently unused).
\|
\|	With the 55757 codepoint coverage I have been testing with, this results in an increase in the substitution table storage requirements from the original 128kB directly-indexed array (covering the Basic Multilingual Plane only) to a rather fatter 512kB (for the codepoint+font id array) + ~41kB of hash metadata. This is still ~25% the size of a linear array, however, so is not completely outrageous.
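The "27 bits unused" figure is consistent with an entry holding a 21-bit codepoint (enough for U+10FFFF) and a 16-bit font id, since 64 - 21 - 16 = 27. The sketch below packs an entry that way; the field order and the function names are assumptions, not the library's layout.

```c
#include <stdint.h>

/* Assumed layout: bits 0-20 codepoint, bits 21-36 font id, rest unused. */
#define CODEPOINT_BITS 21
#define CODEPOINT_MASK ((UINT64_C(1) << CODEPOINT_BITS) - 1)

static uint64_t entry_pack(uint32_t ucs4, uint16_t font)
{
	return ((uint64_t) font << CODEPOINT_BITS) | (ucs4 & CODEPOINT_MASK);
}

static uint32_t entry_codepoint(uint64_t e)
{
	return (uint32_t) (e & CODEPOINT_MASK);
}

static uint16_t entry_font(uint64_t e)
{
	return (uint16_t) (e >> CODEPOINT_BITS);
}
```

Storing the codepoint matters because a perfect hash maps every key in the set to a distinct slot but says nothing about keys outside it: a lookup has to compare `entry_codepoint()` against the requested codepoint to reject characters that no font covers.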
*	Include extension plane data in RUfl_cache	John-Mark Bell	2021-08-09	2	-32/+79
\|	This requires us to bump the cache version, as it is a breaking change.
*	Merge UCS font scan implementations	John-Mark Bell	2021-08-09	1	-132/+27
\|	The only meaningful difference is how we enumerate the codepoints represented by a font. Factor this out so that we can share almost all of the implementation.
*	Include astral characters in font scan	John-Mark Bell	2021-08-09	1	-130/+285
\|	We now construct extension plane data if astral characters are present.
\|
\|	Systems with a non-UCS Font Manager are still restricted to using the Basic Multilingual Plane (as there is no mechanism for encoding astral characters in the font or encoding data).
\|
\|	Rewrite the UCS Font Manager 3.41-3.63 support to scan the font encoding itself (as Font_EnumerateCharacters is broken on these Font Manager versions).
\|
\|	This also fixes the post-scan shrink-wrapping for Font Manager 3.64 or later: previously it would not coalesce block bitmaps when determining that a block was full.
*	Parse UCS-aware Encoding files	John-Mark Bell	2021-08-08	1	-24/+130
\|	1. Comprehend the /uniXXXX and /uXXXX - /uXXXXXXXX glyph names
\|	2. Comprehend the sparse Encoding file format that explicitly specifies the glyph index rather than inferring it
\|
\|	Support for both of these is conditional on the Font Manager being UCS-aware (thus ensuring that we continue to parse Encoding files in the same way as before on systems with no UCS Font Manager).
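A minimal sketch of decoding the two glyph name forms listed in item 1. The function name is hypothetical, the leading '/' is assumed to have been stripped by the caller, and accepting lowercase hex digits and rejecting values above U+10FFFF are assumptions rather than documented behaviour of the library's parser.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode "uniXXXX" or "uXXXX".."uXXXXXXXX" into a
 * UCS-4 value. Returns false for any other glyph name or for a value
 * beyond U+10FFFF. */
static bool parse_ucs_glyph_name(const char *name, uint32_t *ucs4)
{
	size_t len = strlen(name), start, i;
	uint32_t value = 0;

	if (len == 7 && strncmp(name, "uni", 3) == 0)
		start = 3; /* uni + exactly 4 hex digits */
	else if (len >= 5 && len <= 9 && name[0] == 'u')
		start = 1; /* u + 4 to 8 hex digits */
	else
		return false;

	for (i = start; i < len; i++) {
		unsigned char c = (unsigned char) name[i];

		if (!isxdigit(c))
			return false;
		value = (value << 4) | (uint32_t) (isdigit(c) ? c - '0'
				: toupper(c) - 'A' + 10);
	}

	if (value > 0x10FFFF)
		return false;

	*ucs4 = value;
	return true;
}
```

For example, "uni20AC" and "u20AC" both yield U+20AC, while an ordinary glyph name such as "Aacute" is rejected and would be resolved by name as before.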
*	Refactor Encoding file parsing	John-Mark Bell	2021-08-08	1	-21/+54
\|	Change this into a callback-driven approach so that the logic for dealing with each individual (glyph index, ucs4) pair is hoisted out of the parsing code itself. This will allow us to use the same parser implementation in different scenarios.
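The shape of a callback-driven design can be sketched as follows. Every name here is hypothetical, and `for_each_pair` is only a stand-in for the parser: it walks an already-decoded array instead of reading an Encoding file, so that the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: invoked once per (glyph index, ucs4) pair.
 * Returning non-zero stops the walk early. */
typedef int (*encoding_pair_cb)(void *pw, uint32_t glyph_idx, uint32_t ucs4);

struct pair {
	uint32_t glyph_idx;
	uint32_t ucs4;
};

/* Stand-in for the parser: hands each pair to the callback in turn and
 * knows nothing about what the caller does with it. */
static int for_each_pair(const struct pair *pairs, size_t n,
		encoding_pair_cb cb, void *pw)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = cb(pw, pairs[i].glyph_idx, pairs[i].ucs4);
		if (ret != 0)
			return ret;
	}

	return 0;
}

/* One possible consumer: count the pairs seen. */
static int count_pairs(void *pw, uint32_t glyph_idx, uint32_t ucs4)
{
	(void) glyph_idx;
	(void) ucs4;
	(*(size_t *) pw)++;
	return 0;
}
```

A second consumer with the same signature could populate a umap instead of counting, without any change to the code that produces the pairs.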