| Commit message | Author | Age | Files | Lines |
Ensure that only one copy of the storage for the log state is
needed (previously, the storage was injected into every compilation
unit, which now results in compiler warnings).
Replace __PRETTY_FUNCTION__ with the standardised __func__. The
output is the same in either case in our usage here (and testing
all the way back to GCC 3.4.6 yields no difference in output).
This also fixes compilation with GCC 10 (which warns about the
use of __PRETTY_FUNCTION__ in -pedantic mode).
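The substitution described above can be sketched minimally as follows (the `LOG` macro and function names here are illustrative, not RUfl's own):

```c
#include <stdio.h>
#include <string.h>

/* __func__ is a predefined identifier in C99 and later, so it is
 * accepted in -pedantic mode, unlike the GCC-specific
 * __PRETTY_FUNCTION__. For plain C functions the two expand to the
 * same text anyway. */
#define LOG(msg) fprintf(stderr, "%s: %s\n", __func__, msg)

static const char *demo_name(void)
{
	LOG("called");
	return __func__;  /* yields the enclosing function's name */
}
```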
Modern GCC correctly warned about a narrowing cast. The cast was
unnecessary, so rework the code to stop using it.
The direct substitution table constructor failed to allocate
sufficient space to store the table in the case where there are
more than 255 fonts installed on the system.
a4c41198 made a variety of consistency changes to the public API,
including changing the type of the "string" parameter passed to
many entry points from const char * to const uint8_t *, as that
better reflects the data. However, this then forces the user of
the API to explicitly cast when passing string constants, or
other strings (which would be passed to standard library APIs
as const char *, even if UTF-8 encoded).
Revert this part of the change so the type of "string" is once
more const char * and cast to the type we actually want internally.
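A sketch of the resulting pattern, using a hypothetical entry point (not one of RUfl's actual API functions): the public signature takes const char * so callers can pass string literals uncast, and the uint8_t view is obtained once, internally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry point following the convention described above:
 * the public type is const char *, while internal processing views the
 * bytes as unsigned (as UTF-8 handling wants). */
static size_t count_non_ascii(const char *string, size_t length)
{
	const uint8_t *s = (const uint8_t *) string;  /* one internal cast */
	size_t n = 0;
	for (size_t i = 0; i < length; i++) {
		if (s[i] >= 0x80)  /* lead/continuation bytes of multibyte sequences */
			n++;
	}
	return n;
}
```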
This allows us to see the total extent of glyph coverage and which
planes are the largest.
This file is broken in a number of ways:
* It contains garbage content that does not form valid
glyph name specifiers
* It contains garbage directives
* It tries to define more than 256 glyphs (which is not
supported by non-UCS FontManagers)
The latter point above uncovered a bug in the umap sanity checking
where it failed to properly count the number of glyph indices
being defined by the Encoding file.
This exposed a failure to clean up any FontManager error occurring
when attempting to load this kind of font. It also
exposed a failure to initialise the umap count in an internal
structure. This was probably harmless in reality, but caused the
test to fail.
There's no point starting at 0, as it is not a valid codepoint
and will never be valid.
While the Encoding file parser is able to parse UCS glyph "names"
(of the form /uniXXXX or /uXXXX[XXXX]) and the sparse Encoding
file format supported by the UCS FontManager, we currently only
parse Encoding files at all on systems running a non-UCS
FontManager and thus these code paths are unreachable. Guard
them with appropriate preprocessor definitions so that we can
easily resurrect them if they are ever needed in future.
This will cause the second initialisation attempt to load the
cache file. In doing so, we discover that cache loading on
non-32-bit platforms didn't work -- fix that, too.
Running tests under valgrind reveals that we were failing to size
the CHD bitmap correctly, resulting in the opportunity for buffer
overruns. Stop that happening by correcting the maths.
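The commit doesn't show the corrected expression; the usual correct form for sizing a bitmap rounds the bit count up to a whole number of bytes, as in this sketch:

```c
#include <stddef.h>

/* Round a bit count up to whole bytes. Writing bits / 8 instead
 * undersizes the allocation whenever bits is not a multiple of 8,
 * which is exactly the kind of overrun valgrind flags. */
static size_t bitmap_bytes(size_t bits)
{
	return (bits + 7) / 8;
}
```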
In the case where there are no fonts at all on the system, ensure
the menu building code copes.
Use uintptr_t to cast between pointers and integers, instead of
assuming that uint32_t will suffice.
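The distinction can be shown in a few lines (uintptr_t is formally optional in C99, but universally provided in practice):

```c
#include <stdint.h>

/* uintptr_t is an integer type wide enough to hold any object pointer,
 * so this round trip is well-defined; casting through uint32_t would
 * silently truncate pointers on 64-bit targets. */
static int probe;

static void *roundtrip(void *p)
{
	uintptr_t bits = (uintptr_t) p;
	return (void *) bits;
}
```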
The substitution tables expect there to be no more than 65535
font faces available. Enforce this at load, so there aren't any
unwanted surprises later.
Add a verbose flag to rufl_dump_state() and use it to control
whether to dump the individual unicode maps generated when using
a non-UCS Font Manager.
Change rufl_test to not dump this state (ordinarily, anyway) as
it is generally uninteresting and highly verbose.
Attempting to use fonts constructed for the UCS Font Manager on
older systems generally results in bad outcomes up to, and
including, complete system freezes. As fixing the Font Manager
on these systems is impractical, simply ignore these fonts
completely when scanning for glyph coverage.
To obtain the full extent of a "language" font's glyph coverage
we need to open and scan it in each of the available target
encodings. All of the Latin1-6 + Welsh target encodings declare
that they are based on the Base0 encoding and thus will cause the
Font Manager to demand the existence of corresponding
IntMetric0/Outlines0 font data files. A "language" font using a
different base encoding (and corresponding target encodings
based on it) would thus generate an error from the Font Manager.
Additionally, without reinventing the Font Manager's own logic
(and poking around the filesystem looking for IntMetrics and
Encoding files), we don't know if a font is a "language" or a
"symbol" font until we try to use it. Thus, we expect attempts to
open "symbol" fonts with an explicit target encoding to generate
an error from the Font Manager as well.
As these are expected errors, there is no point logging them as
it just produces a load of distracting noise.
If you attempt to use fonts supported by the UCS Font Manager with
a non-UCS Font Manager, this will either work (in a limited way)
or fail because the font data is incomprehensible to the non-UCS
Font Manager. Cope with one particular instance of this.
We want to update the umap itself not whatever happens to be on the
stack in the vicinity of its address.
We cannot use Font_ReadEncodingFile to find the path to a font's
source encoding because that is not what the API returns (it
returns the path to the encoding file corresponding to the target
encoding used to open the font handle) and there is no public API
for obtaining the path of the source encoding. Additionally, there
is no reliable way to replicate the UCS Font Manager's mapping of
undefined and duplicate glyph names into the private use space
at U+E000-U+EFFF.
Therefore, take a different approach to supporting these versions
of the Font Manager: abuse Font_EnumerateCharacters by probing
every codepoint in the range [0, first_returned) to force the
Font Manager to reveal the information we want. Once we have
reached the first_returned codepoint, we can happily fall through
to the normal flow (which will make use of the sparse nature of
the Unicode space).
All blocks subsequent to a full one get moved up and all their
indices need rewriting.
Spaces are valid characters in the sparse encoding so ensure we
consume them correctly.
Move the cache location to a subdirectory within Scrap and encode
the cache version in the filename.
This allows software using different versions of RUfl to coexist
on the same system without trying to share the same cache (and
thus rescanning fonts every time).
Consider each Unicode plane independently (as they have very
different properties). This means building a table for each
plane and allows us to reduce the size of each entry in the
CHD-addressed table from 64 to 32 bits (which provides a
significant immediate saving).
Also introduce a direct linear mapping backend. This stores the
table in a series of 256-entry blocks which are addressed from
a fixed-size index. Block entries are either 8 or 16 bits wide
(depending upon the number of fonts found on the system). This
restores some of the storage efficiency of the old "giant array"
approach, which is generally more efficient than a CHD (or other)
hash-based implementation where the load factor is reasonably high
(or the glyph:block ratio is sufficiently high).
Select the direct or CHD storage mechanism based upon an estimate
of the storage size for the data collected for a plane. In the
testing I have performed (with the same fonts available as before)
the combined effect of the above is to reduce the storage used
significantly.
Without the 8-bit direct mapping entry size (which is a somewhat
unfair comparison because even the "giant array" didn't have that
feature) we see:

  Plane  Codepoints  Blocks  Backend  Storage  Alternative
  1           51483     224  Direct    115480  311328 (CHD)
  2            1981      13  Direct      7448    9760 (CHD)
  3            2293     201  CHD        17952  103704 (Direct)
  Total       55757                    140880  (~= 2.5 bytes/glyph)

The other 14 planes have no glyph coverage at all, so require no
storage.
With the 8-bit direct mapping, we see:

  Plane  Codepoints  Blocks  Backend  Storage  Alternative
  1           51483     224  Direct     57880  311328 (CHD)
  2            1981      13  Direct      3864    9760 (CHD)
  3            2293     201  CHD        17952  103704 (Direct)
  Total       55757                     79696  (~= 1.4 bytes/glyph)
In summary:
* separating the planes has shaved ~50% off the storage required
by the CHD backend
* introducing the direct mapping backend has shaved a further
~60% off that
* using 8-bit direct mapping has shaved another ~50% off that
Cumulatively, then, storage requirements are now ~86% smaller
than with CHD only (and about 40% less than the BMP-only
"giant table", but now with astral character support).
This significantly reworks the construction of the substitution
table (and hides its implementation from the rest of the library).
It is no longer practical to use a directly-indexed array so,
instead, we front it with a perfect hash function. The storage
required for the (unoptimised) hash data is currently about 6 bits
per entry. Implementing compression would reduce this to the order
of ~2 bits per entry.
As the resulting data structure is sparse, we must store the
original Unicode codepoint value along with the identity of the
font providing a suitable glyph. This has necessitated expanding
the size of substitution table entries from 16 to 64 bits (of
which 27 bits are currently unused).
With the 55757 codepoint coverage I have been testing with, this
results in an increase in the substitution table storage
requirements from the original 128kB directly-indexed array
(covering the Basic Multilingual Plane only) to a rather fatter
512kB (for the codepoint+font id array) + ~41kB of hash metadata.
This is still ~25% the size of a linear array, however, so is not
completely outrageous.
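One plausible packing for such a 64-bit entry is sketched below: a Unicode codepoint needs at most 21 bits and a font identifier at most 16 (faces are capped at 65535 elsewhere in this log), using 37 bits and leaving 27 unused, which matches the figure quoted above. RUfl's actual field order may differ.

```c
#include <stdint.h>

/* Hypothetical layout: codepoint in bits 16-36, font id in bits 0-15,
 * bits 37-63 unused. */
static uint64_t entry_pack(uint32_t codepoint, uint16_t font)
{
	return ((uint64_t) codepoint << 16) | font;
}

static uint32_t entry_codepoint(uint64_t e)
{
	return (uint32_t) (e >> 16) & 0x1FFFFF;  /* 21-bit codepoint */
}

static uint16_t entry_font(uint64_t e)
{
	return (uint16_t) (e & 0xFFFF);  /* 16-bit font id */
}
```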
This requires us to bump the cache version, as it is a breaking
change.
The only meaningful difference is how we enumerate the codepoints
represented by a font. Factor this out so that we can share almost
all of the implementation.
We now construct extension plane data if astral characters are
present. Systems with a non-UCS Font Manager are still restricted
to using the Basic Multilingual Plane (as there is no mechanism
for encoding astral characters in the font or encoding data).
Rewrite the UCS Font Manager 3.41-3.63 support to scan the font
encoding itself (as Font_EnumerateCharacters is broken on these
Font Manager versions).
This also fixes the post-scan shrink-wrapping for Font Manager 3.64
or later -- previously it would not coalesce block bitmaps when
determining that a block was full.
1. Comprehend the /uniXXXX and /uXXXX - /uXXXXXXXX glyph names
2. Comprehend the sparse Encoding file format that explicitly
specifies the glyph index rather than inferring it
Support for both of these is conditional on the Font Manager being
UCS-aware (thus ensuring that we continue to parse Encoding files
in the same way as before on systems with no UCS Font Manager).
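A sketch of parsing such glyph "names" follows. It accepts "uniXXXX" (exactly 4 hex digits) and "uXXXX" through "uXXXXXXXX" (4 to 8 hex digits), returning -1 for anything else; finer validation (surrogate ranges, etc.) and the library's actual function names are not shown here.

```c
#include <ctype.h>
#include <string.h>

/* Parse a UCS glyph name (without the leading '/') to a codepoint,
 * or return -1 if it is not of the uniXXXX / uXXXX.. forms. */
static long glyph_name_to_ucs4(const char *name)
{
	size_t min_len = 4, max_len = 4;
	const char *hex;

	if (strncmp(name, "uni", 3) == 0) {
		hex = name + 3;               /* exactly 4 hex digits */
	} else if (name[0] == 'u') {
		hex = name + 1;
		max_len = 8;                  /* 4 to 8 hex digits */
	} else {
		return -1;
	}

	size_t len = strlen(hex);
	if (len < min_len || len > max_len)
		return -1;

	long value = 0;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char) hex[i];
		if (!isxdigit(c))
			return -1;
		value = value * 16 +
		    (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
	}
	return value;
}
```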
Change this into a callback-driven approach so that the logic for
dealing with each individual (glyph index, ucs4) pair is hoisted
out of the parsing code itself. This will allow us to use the same
parser implementation in different scenarios.
As we introduce support for discovering and rendering astral
characters, ensure that we pass UCS-4 to the relevant Font Manager
APIs and extend our replacement hex code generation to emit
6 digits for codepoints outside the Basic Multilingual Plane.
This has necessitated a change to the API of the callback function
provided to rufl_paint_callback(). Where, previously, a 16 bit
UCS-2 string was exposed, we now expose UCS-4.
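The widened replacement-code rule can be sketched as below (function name and buffer handling are illustrative only): 4 hex digits suffice inside the Basic Multilingual Plane, astral codepoints get 6.

```c
#include <stdio.h>
#include <string.h>

/* Format a codepoint as its replacement hex code: 4 digits for the
 * BMP, 6 digits beyond it. */
static const char *ucs4_to_hex(unsigned long ucs4)
{
	static char buf[16];
	if (ucs4 <= 0xFFFFUL)
		snprintf(buf, sizeof buf, "%04lX", ucs4);
	else
		snprintf(buf, sizeof buf, "%06lX", ucs4);
	return buf;
}
```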
No functional change, but redefine the meaning of the old "size"
member of the rufl_character_set structure to allow for the
addition of extension structures in future. This change is
backwards compatible as it is reusing previously unused bits in
the size field (which will be set to zero in all existing
RUfl_caches). Rename the "size" field to "metadata" which better
reflects its new usage.
Update rufl_character_set_test and rufl_dump_state to follow this
change (and fix up their parameter types while we're here).
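One way such bit reuse stays backwards compatible is sketched below. The bit split here is hypothetical (the log does not give RUfl's actual layout): existing caches always wrote zeroes to the high bits of "size", so those bits can now carry extension flags while the low bits keep the old meaning.

```c
#include <stdint.h>

/* Hypothetical split of the repurposed "metadata" field: low 24 bits
 * keep the old "size" meaning, high 8 bits are new extension flags
 * (zero in all existing caches, preserving compatibility). */
#define METADATA_SIZE_MASK 0x00FFFFFFu
#define METADATA_EXT_SHIFT 24

static uint32_t metadata_size(uint32_t m) { return m & METADATA_SIZE_MASK; }
static uint32_t metadata_ext(uint32_t m)  { return m >> METADATA_EXT_SHIFT; }
```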
In a non-Unicode world, a (non-Base) encoding may define glyphs for
up to 256 character codes. Ensure that at most 256 Encoding file
entries are used (as, otherwise, the character code will overflow).
In particular, if symbol fonts created for the Unicode Font Manager
(which does not have a 256 character limit for an encoding) are
installed on a non-Unicode-capable system, only the first 256
glyphs in the font are accessible although the Encoding file may
have more than 256 entries.
Note, however, that the first 32 character codes will never be used
as they are considered control codes. Thus, at most 224 usable
characters may be defined.
A further wrinkle is that glyph names may map to multiple Unicode
codepoints, thus consuming multiple slots in the unicode map (which
itself has a fixed size of 256 entries). Thus, it is technically
possible for the unicode map to further limit the number of usable
characters in a font to fewer than 224.
However, unless the font is particularly baroque, this isn't a
problem in the real world, because there are only 12 glyph names
which map to more than one Unicode codepoint (each maps to 2,
for a total of 24 unicode map entries if all are present).
Thus, to run out of space in the unicode map, you'd need a font
which defines at least 4 of those glyphs twice (and defines the
others once, and also defines known glyphs for every other
character code).
Fixes #2577.