Two variants, as astral characters require an additional pair of
hex digits.
|
Thus ensuring that the UCS FM tests exercise the relevant code paths.
|
This file is broken in a number of ways:
* It contains garbage content that does not form valid
glyph name specifiers
* It contains garbage directives
* It tries to define more than 256 glyphs (which is not
supported by non-UCS FontManagers)
The latter point above uncovered a bug in the umap sanity checking
where it failed to properly count the number of glyph indices
being defined by the Encoding file.
|
This exposed a failure to clean up any FontManager error occurring
when attempting to load this kind of font. Additionally, it also
exposed a failure to initialise the umap count in an internal
structure. This was probably harmless in reality, but caused the
test to fail.
|
These now take a configuration file defining the available
encodings for each face and setting the expected number of umaps.
|
There's no point starting at 0, as it is not a valid codepoint
and will never be valid.
|
While the Encoding file parser is able to parse UCS glyph "names"
(of the form /uniXXXX or /uXXXX[XXXX]) and the sparse Encoding
file format supported by the UCS FontManager, we currently only
parse Encoding files at all on systems running a non-UCS
FontManager and thus these code paths are unreachable. Guard
them with appropriate preprocessor definitions so that we can
easily resurrect them if they are ever needed in future.
|
This will cause the second initialisation attempt to load the
cache file. In doing so, we discover that cache loading on
non-32-bit platforms didn't work -- fix that, too.
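
One common cause of this kind of breakage is reading cache fields by casting the buffer through a pointer type whose width varies across platforms. A hedged sketch of the portable alternative (not RUfl's actual cache code; the helper name is illustrative):

```c
#include <stdint.h>

/* Read a little-endian 32-bit value bytewise, independent of the host's
 * pointer width, endianness, or alignment rules. Illustrative only. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```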
|
The Font_ScanString mock was deficient here, refusing to return
the split point in the x_to_offset case. Additionally, the tests
themselves were passing entirely the wrong units into the API.
Further, the Font_EnumerateCharacters mock needed updating to
ensure that all the codepoints used in the tests have glyphs
defined (as otherwise, we fall down the missing glyph path).
Given all of this, it's somewhat miraculous these checks passed.
|
This exercises the broken Font_EnumerateCharacters workaround.
Ensure tests don't trample on each other by having them run in a
temporary directory.
|
Running tests under valgrind reveals that we were failing to size
the CHD bitmap correctly, resulting in the opportunity for buffer
overruns. Stop that happening by correcting the maths.
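
The maths error described is a classic rounding bug; a minimal sketch of the corrected sizing calculation (the real CHD structures are not shown here) might look like:

```c
#include <stddef.h>

/* Size, in bytes, of a bitmap holding `nbits` bits. Plain division by 8
 * truncates and undersizes the buffer; rounding up prevents overruns. */
static size_t bitmap_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}
```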
|
Implement XOSFS_CanonicalisePath and XFont_ScanString and
introduce a new test that uses them.
|
In the case where there are no fonts at all on the system, ensure
the menu building code copes.
|
It is now possible to initialise a test harness which mimics the
behaviour of the various versions of the FontManager we support.
Rename the simple test to reflect its new purpose.
|
Mock out every OS call made by the library (they all return an
unimplemented error for the time being). Add a trivial test case
that verifies that rufl_init() fails.
|
Compiling for other platforms has its benefits, the first of which
is x86_64 gcc rightly complaining that the buffer to receive the
error message is too small. Make it big enough.
|
Linking fails, and the path to the OSLib headers is hard-coded,
but it's a start.
|
Use uintptr_t to cast between pointers and integers, instead of
assuming that uint32_t will suffice.
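
The pattern being fixed can be illustrated as follows (a sketch, not RUfl's code): on LP64 systems a pointer is 64 bits wide, so round-tripping it through uint32_t discards the top half, while uintptr_t is guaranteed wide enough.

```c
#include <stdint.h>

/* Round-trip a pointer through an integer. uintptr_t is defined to hold
 * any object pointer; uint32_t would truncate on 64-bit platforms. */
static void *roundtrip(void *p)
{
    uintptr_t u = (uintptr_t)p;
    return (void *)u;
}
```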
|
The substitution tables expect there to be no more than 65535
font faces available. Enforce this at load, so there aren't any
unwanted surprises later.
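
A hedged sketch of such a load-time check (the names here are illustrative, not RUfl's): font ids that end up in 16-bit table fields must fit in 16 bits, so reject anything larger up front.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Reject face counts that cannot be represented in a 16-bit font id:
 * 65535 (UINT16_MAX) faces is the most the substitution tables allow. */
static bool face_count_ok(size_t num_faces)
{
    return num_faces <= UINT16_MAX;
}
```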
|
Add a verbose flag to rufl_dump_state() and use it to control
whether to dump the individual unicode maps generated when using
a non-UCS Font Manager.
Change rufl_test to not dump this state (ordinarily, anyway) as
it is generally uninteresting and highly verbose.
|
Attempting to use fonts constructed for the UCS Font Manager on
older systems generally results in bad outcomes up to, and
including, complete system freezes. As fixing the Font Manager
on these systems is impractical, simply ignore these fonts
completely when scanning for glyph coverage.
|
To obtain the full extent of a "language" font's glyph coverage
we need to open and scan it in each of the available target
encodings. All of the Latin1-6 + Welsh target encodings declare
that they are based on the Base0 encoding and thus will cause the
Font Manager to demand the existence of corresponding
IntMetric0/Outlines0 font data files. A "language" font using a
different base encoding (and corresponding target encodings
based on it) would thus generate an error from the Font Manager.
Additionally, without reinventing the Font Manager's own logic
(and poking around the filesystem looking for IntMetrics and
Encoding files), we don't know if a font is a "language" or a
"symbol" font until we try to use it. Thus, we expect attempts to
open "symbol" fonts with an explicit target encoding to generate
an error from the Font Manager as well.
As these are expected errors, there is no point logging them as
it just produces a load of distracting noise.
|
If you attempt to use fonts supported by the UCS Font Manager with
a non-UCS Font Manager, this will either work (in a limited way)
or fail because the font data is incomprehensible to the non-UCS
Font Manager. Cope with one particular instance of this.
|
We want to update the umap itself, not whatever happens to be on
the stack in the vicinity of its address.
|
We cannot use Font_ReadEncodingFile to find the path to a font's
source encoding because that is not what the API returns (it
returns the path to the encoding file corresponding to the target
encoding used to open the font handle) and there is no public API
for obtaining the path of the source encoding. Additionally, there
is no reliable way to replicate the UCS Font Manager's mapping of
undefined and duplicate glyph names into the private use space
at U+E000-U+EFFF.
Therefore, take a different approach to supporting these versions
of the Font Manager: abuse Font_EnumerateCharacters by probing
every codepoint in the range [0, first_returned) to force the
Font Manager to reveal the information we want. Once we have
reached the first_returned codepoint, we can happily fall through
to the normal flow (which will make use of the sparse nature of
the Unicode space).
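
The probing idea can be sketched like this; demo_has_glyph is a purely illustrative stand-in for a Font_EnumerateCharacters-based query, not the real SWI interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative probe: pretend the font only covers ASCII letters. */
static bool demo_has_glyph(uint32_t ucs4)
{
    return (ucs4 >= 'A' && ucs4 <= 'Z') || (ucs4 >= 'a' && ucs4 <= 'z');
}

/* Probe every codepoint below first_returned one at a time; the real
 * code would record each hit in the coverage data before falling
 * through to normal (sparse) enumeration from first_returned onwards. */
static unsigned probe_low_range(uint32_t first_returned)
{
    unsigned found = 0;
    for (uint32_t cp = 0; cp < first_returned; cp++) {
        if (demo_has_glyph(cp))
            found++;
    }
    return found;
}
```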
|
All blocks subsequent to a full one get moved up and all their
indices need rewriting.
|
Spaces are valid characters in the sparse encoding so ensure we
consume them correctly.
|
Move the cache location to a subdirectory within Scrap and encode
the cache version in the filename.
This allows software using different versions of RUfl to coexist
on the same system without trying to share the same cache (and
thus rescanning fonts every time).
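
A sketch of the scheme (the directory layout, leaf name, and macro here are assumptions, not RUfl's actual ones): bake the cache format version into the leaf name so two RUfl versions never open each other's cache.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

#define CACHE_VERSION 4u  /* illustrative version number */

/* Build a versioned cache path inside a Scrap subdirectory, using
 * RISC OS dot-separated path syntax. */
static void cache_path(char *buf, size_t len)
{
    snprintf(buf, len, "<Wimp$ScrapDir>.RUfl.Cache%u", CACHE_VERSION);
}
```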
|
Consider each Unicode plane independently (as they have very
different properties). This means building a table for each
plane and allows us to reduce the size of each entry in the
CHD-addressed table from 64 to 32bits (which provides a
significant immediate saving).
Also introduce a direct linear mapping backend. This stores the
table in a series of 256-entry blocks which are addressed from
a fixed-size index. Block entries are either 8 or 16 bits wide
(depending upon the number of fonts found on the system). This
restores some of the storage efficiency of the old "giant array"
approach, which is generally more efficient than a CHD (or other)
hash-based implementation where the load factor is reasonably high
(or the glyph:block ratio is sufficiently high).
Select the direct or CHD storage mechanism based upon an estimate
of the storage size for the data collected for a plane. In the
testing I have performed (with the same fonts available as before)
the combined effect of the above is to reduce the storage used
significantly.
Without the 8bit direct mapping entry size (which is a somewhat
unfair comparison because even the "giant array" didn't have that
feature) we see:
Plane  Codepoints  Blocks  Backend  Storage  Alternative
    1       51483     224  Direct    115480  311328 (CHD)
    2        1981      13  Direct      7448    9760 (CHD)
    3        2293     201  CHD        17952  103704 (Direct)
Total       55757                    140880  (~= 2.5 bytes/glyph)
The other 14 planes have no glyph coverage at all, so require no
storage.
With the 8bit direct mapping, we see:
Plane  Codepoints  Blocks  Backend  Storage  Alternative
    1       51483     224  Direct     57880  311328 (CHD)
    2        1981      13  Direct      3864    9760 (CHD)
    3        2293     201  CHD        17952  103704 (Direct)
Total       55757                     79696  (~= 1.4 bytes/glyph)
In summary:
* separating the planes has shaved ~50% off the storage required
by the CHD backend
* introducing the direct mapping backend has shaved a further
~60% off that
* using 8bit direct mapping has shaved another ~50% off that
Cumulatively, then, storage requirements are now ~86% smaller
than with CHD only (and about 40% less than the BMP-only
"giant table", but now with astral character support).
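
The direct backend's lookup path can be sketched as follows (the structure layout and sentinel names are illustrative, and only the 16-bit entry variant is shown):

```c
#include <stdint.h>

#define NOT_FOUND 0xffffu  /* illustrative "no glyph" sentinel */

/* Fixed-size index keyed by the top bits of the within-plane codepoint;
 * each populated 256-codepoint run points at a stored block of font ids
 * (16-bit entries shown; an 8-bit variant halves the block storage). */
struct direct_map {
    uint8_t index[256];            /* 0 = no block, else block number + 1 */
    const uint16_t (*blocks)[256]; /* only populated blocks are stored */
};

static uint16_t direct_lookup(const struct direct_map *m, uint32_t cp)
{
    uint8_t slot = m->index[(cp >> 8) & 0xff];
    if (slot == 0)
        return NOT_FOUND;
    return m->blocks[slot - 1][cp & 0xff];
}

/* Tiny demo instance: block 0 covers U+0100-U+01FF, mapping U+0123 to
 * font id 7 (unset entries default to font id 0 in this toy example). */
static const uint16_t demo_blocks[1][256] = { { [0x23] = 7 } };
static const struct direct_map demo_map = {
    .index = { [0x01] = 1 },
    .blocks = demo_blocks,
};
```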
|
Selectable via the menu, like everything else.
|
This significantly reworks the construction of the substitution
table (and hides its implementation from the rest of the library).
It is no longer practical to use a directly-indexed array so,
instead, we front it with a perfect hash function. The storage
required for the (unoptimised) hash data is currently about 6 bits
per entry. Implementing compression would reduce this to the order
of ~2 bits per entry.
As the resulting data structure is sparse, we must store the
original Unicode codepoint value along with the identity of the
font providing a suitable glyph. This has necessitated expanding
the size of substitution table entries from 16 to 64 bits (of
which 27 bits are currently unused).
With the 55757 codepoint coverage I have been testing with, this
results in an increase in the substitution table storage
requirements from the original 128kB directly-indexed array
(covering the Basic Multilingual Plane only) to a rather fatter
512kB (for the codepoint+font id array) + ~41kB of hash metadata.
This is still ~25% the size of a linear array, however, so is not
completely outrageous.
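
The 64-bit entry layout can be illustrated with explicit shifts (the exact bit positions are an assumption; the text only fixes 21 bits of codepoint plus 16 bits of font id, leaving 27 bits spare):

```c
#include <stdint.h>

/* A Unicode codepoint needs 21 bits (up to U+10FFFF) and a font id
 * needs 16 bits, so both fit in a 64-bit entry with 27 bits to spare. */
static uint64_t entry_pack(uint32_t ucs4, uint16_t font)
{
    return ((uint64_t)ucs4 << 16) | font;
}

static uint32_t entry_codepoint(uint64_t e)
{
    return (uint32_t)(e >> 16) & 0x1fffff;
}

static uint16_t entry_font(uint64_t e)
{
    return (uint16_t)(e & 0xffff);
}
```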
|
This requires us to bump the cache version, as it is a breaking
change.
|
The only meaningful difference is how we enumerate the codepoints
represented by a font. Factor this out so that we can share almost
all of the implementation.
|
We now construct extension plane data if astral characters are
present. Systems with a non-UCS Font Manager are still restricted
to using the Basic Multilingual Plane (as there is no mechanism
for encoding astral characters in the font or encoding data).
Rewrite the UCS Font Manager 3.41-3.63 support to scan the font
encoding itself (as Font_EnumerateCharacters is broken on these
Font Manager versions).
This also fixes the post-scan shrink-wrapping for Font Manager 3.64
or later -- previously it would not coalesce block bitmaps when
determining that a block was full.
|
1. Comprehend the /uniXXXX and /uXXXX - /uXXXXXXXX glyph names
2. Comprehend the sparse Encoding file format that explicitly
specifies the glyph index rather than inferring it
Support for both of these is conditional on the Font Manager being
UCS-aware (thus ensuring that we continue to parse Encoding files
in the same way as before on systems with no UCS Font Manager).
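
Glyph-name parsing of the first form can be sketched as follows (a hedged reconstruction; the helper name and exact validation are not RUfl's):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Parse a UCS glyph "name" (without its leading '/'): uniXXXX takes
 * exactly four hex digits; uXXXX takes four to eight, which is what
 * admits astral codepoints. */
static bool parse_ucs_name(const char *name, uint32_t *ucs4)
{
    size_t off, min, max, n;
    char *end;

    if (strncmp(name, "uni", 3) == 0) {
        off = 3; min = 4; max = 4;
    } else if (name[0] == 'u') {
        off = 1; min = 4; max = 8;
    } else {
        return false;
    }
    for (n = 0; isxdigit((unsigned char)name[off + n]); n++)
        ;
    if (n < min || n > max || name[off + n] != '\0')
        return false;
    *ucs4 = (uint32_t)strtoul(name + off, &end, 16);
    return true;
}
```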
|
Change this into a callback-driven approach so that the logic for
dealing with each individual (glyph index, ucs4) pair is hoisted
out of the parsing code itself. This will allow us to use the same
parser implementation in different scenarios.
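
The callback-driven shape might look like this (all names are illustrative): the parser owns only the parsing and hands each (glyph index, ucs4) pair to the caller.

```c
#include <stdint.h>
#include <stddef.h>

/* Callback invoked once per (glyph index, ucs4) pair; a non-zero
 * return aborts the parse, mirroring error propagation. */
typedef int (*encoding_cb)(void *pw, uint32_t glyph_idx, uint32_t ucs4);

/* Stand-in "parser" that emits a fixed sequence of pairs; the real one
 * would derive them from Encoding file text. */
static int parse_encoding_demo(encoding_cb cb, void *pw)
{
    static const uint32_t pairs[][2] = {
        { 0, 0x0020 }, { 1, 0x0041 }, { 2, 0x10300 },
    };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        int err = cb(pw, pairs[i][0], pairs[i][1]);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Example consumer: count the pairs reported. */
static int count_cb(void *pw, uint32_t glyph_idx, uint32_t ucs4)
{
    (void)glyph_idx; (void)ucs4;
    ++*(unsigned *)pw;
    return 0;
}
```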