From 703427a48612bf98fba599dfcd6e91485efd5b77 Mon Sep 17 00:00:00 2001
From: Vincent Sanders
Date: Fri, 9 Jun 2017 17:28:55 +0100
Subject: Update documentation removing junk and moving to markdown for most text files

---
 docs/ideas/cache.txt          | 178 ++++++++++++++++++++
 docs/ideas/css-engine.txt     | 381 ++++++++++++++++++++++++++++++++++++++++++
 docs/ideas/render-library.txt | 121 ++++++++++++++
 3 files changed, 680 insertions(+)
 create mode 100644 docs/ideas/cache.txt
 create mode 100644 docs/ideas/css-engine.txt
 create mode 100644 docs/ideas/render-library.txt

diff --git a/docs/ideas/cache.txt b/docs/ideas/cache.txt
new file mode 100644
index 000000000..fda0617a3
--- /dev/null
+++ b/docs/ideas/cache.txt
@@ -0,0 +1,178 @@
+Content caching
+===============
+
+NetSurf's existing fetch/cache architecture has a number of problems:
+
+1) Content dependencies are not modelled.
+2) Content source data for non-shareable contents is duplicated.
+3) Detection of content shareability is dependent on Content-Type, which
+   requires content cloning (which will fail for dependent contents).
+4) Detection of cycles in content dependency graphs is not performed
+   (e.g. content1 includes content2, which includes content1).
+5) All content caching is in-memory; there's no offline storage.
+
+Proposal
+--------
+
+A split-level cache.
+
+Low-level cache:
+
+  + Responsible for source data (+header) management.
+  + Interfaces with the low-level fetch system to retrieve data from the
+    network.
+  + Is responsible for offline storage (if any) of cache objects.
+  + Returns opaque handles to low-level cache objects.
+  + Handles HTTP redirects, recording URLs encountered when retrieving a
+    resource.
+  + May perform content-type sniffing (requires usage context).
+
+High-level cache:
+
+  + Responsible for content objects.
+  + Tracks content dependencies (and potential cycles).
+  + Returns opaque handles to content objects.
+  + Manages content shareability & reusability (see below).
+  + Contents with unknown types are never shared and thus get unique handles.
+  + Content handles are not content objects: they're an indirection
+    mechanism.
+
+Content shareability & reusability
+----------------------------------
+
+  If a content is shareable, then it may have multiple concurrent users.
+  Otherwise, it may have at most one user.
+
+  If a content is reusable, then it may be retained in the cache for later use
+  when it has no users. Otherwise, it will be removed from the cache when
+  it has no users.
+
+Example: retrieving a top-level resource
+----------------------------------------
+
+  1) Client requests a URL, specifying no parent handle.
+  2) High-level cache asks low-level cache for low-level handle for URL.
+  3) Low-level cache looks for appropriate object in its index.
+     a) it finds one that's not stale and returns its handle
+     b) it finds only stale entries, or no appropriate entry,
+        so allocates a new entry, requests a fetch for it,
+        and returns the handle.
+  4) High-level cache looks for content objects that are using the low-level
+     handle.
+     a) it finds one that's shareable and selects its handle for use.
+     b) it finds only non-shareable entries, or no appropriate entry,
+        so allocates a new entry and selects its handle for use.
+  5) High-level cache registers the parent and client with the selected handle,
+     then returns the selected handle.
+  6) Client carries on, happy in the knowledge that a content is available.
+
+Example: retrieving a child resource
+------------------------------------
+
+  1) Client requests a URL, specifying parent handle.
+  2) High-level cache searches parent+ancestors for requested URL.
+     a) it finds the URL, so returns a non-fatal error.
+     b) it does not find the URL, so proceeds from step 2 of the
+        top-level resource algorithm.
+
+  NOTE: this approach means that shareable contents may have multiple parents.
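The two retrieval examples above can be sketched in C (the language of the proposed CSS engine API) roughly as follows. This is a minimal, hypothetical illustration: the names `ll_retrieve`, `in_ancestors` and `hl_retrieve`, the fixed-size tables, and the caller-supplied `shareable` flag are all assumptions. In the proposal, shareability is only decided once the content type is known, and the low-level cache would also issue fetches and mark entries stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OBJECTS  16  /* hypothetical fixed capacities, for brevity */
#define MAX_CONTENTS 16
#define MAX_HANDLES  32

/* Low-level cache object: source data for one URL */
typedef struct {
    const char *url;
    bool stale;
} ll_object;

/* High-level content object built on a low-level object */
typedef struct {
    int ll;          /* index of the low-level object used */
    bool shareable;  /* in the proposal, known only once the type is known */
} hl_content;

/* Content handle: the indirection that clients actually hold */
typedef struct {
    int content;     /* index of the content object */
    int parent;      /* parent handle, or -1 for a top-level request */
} hl_handle;

static ll_object objects[MAX_OBJECTS];
static int n_objects;
static hl_content contents[MAX_CONTENTS];
static int n_contents;
static hl_handle handles[MAX_HANDLES];
static int n_handles;

/* Step 3: reuse a non-stale object for the URL, else allocate a new one
 * (where a real low-level cache would also request a fetch) */
static int ll_retrieve(const char *url)
{
    for (int i = 0; i < n_objects; i++)
        if (!objects[i].stale && strcmp(objects[i].url, url) == 0)
            return i;
    assert(n_objects < MAX_OBJECTS);
    objects[n_objects] = (ll_object){ url, false };
    return n_objects++;
}

/* Child step 2: is url already used by parent or one of its ancestors? */
static bool in_ancestors(int parent, const char *url)
{
    for (int h = parent; h != -1; h = handles[h].parent) {
        const ll_object *obj = &objects[contents[handles[h].content].ll];
        if (strcmp(obj->url, url) == 0)
            return true;
    }
    return false;
}

/* Returns a new handle, or -1 (the "non-fatal error") if the request
 * would create a dependency cycle. */
static int hl_retrieve(const char *url, int parent, bool shareable)
{
    int ll, c = -1;

    if (parent != -1 && in_ancestors(parent, url))
        return -1;

    ll = ll_retrieve(url);

    /* Step 4a: share an existing shareable content on the same object */
    for (int i = 0; i < n_contents && c == -1; i++)
        if (contents[i].ll == ll && contents[i].shareable)
            c = i;

    /* Step 4b: otherwise allocate a new content object */
    if (c == -1) {
        assert(n_contents < MAX_CONTENTS);
        contents[n_contents] = (hl_content){ ll, shareable };
        c = n_contents++;
    }

    /* Step 5: each request gets its own handle, recording its parent */
    assert(n_handles < MAX_HANDLES);
    handles[n_handles] = (hl_handle){ c, parent };
    return n_handles++;
}
```

Because clients hold handles rather than content objects, two requests for a shareable resource get different handles that resolve to the same content object, while a non-shareable resource (such as css1 in the concrete example below) shares only the low-level source data.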
+
+Handling of contents of unknown type
+------------------------------------
+
+  Contents of unknown type are, by definition, not shareable. Therefore, each
+  client will be issued with a different content handle.
+
+  Content types are only known once a resource's headers are fetched (or once
+  the type has been sniffed from the resource's data when the headers are
+  inconclusive).
+
+  As a resource is fetched, users of the resource are informed of the fetch
+  status. Therefore, the high-level cache is always informed of fetch progress.
+  Cache clients need not care about this: they are simply interested in
+  a content's readiness for use.
+
+  When the high-level cache is informed of a low-level cache object's type,
+  it is in a position to determine whether the corresponding content handles
+  can share a single content object or not.
+
+  If it detects that a single content object may be shared by multiple handles,
+  it simply creates the content object and registers each of the handles as
+  a user of the content.
+
+  If it detects that each handle requires a separate content object, then it
+  will create a content object for each handle and register the handle as a
+  user.
+
+  This approach requires that clients of the high-level cache get issued with
+  handles to content objects, rather than content objects (so that the decision
+  whether to create multiple content objects can be deferred until suitable
+  information is available).
+
+  Handles with no associated content object will act as if they had a content
+  object that was not ready for use.
+
+A more concrete example
+-----------------------
+
+  + bw1 contains html1 which includes css1, css2, img1, img2
+  + bw2 contains html2 which includes css1, img1, img2
+  + bw3 contains img1
+
+  Neither HTML nor CSS contents are shareable.
+  All shareable contents are requested from the high-level cache
+  once their type is known.
+
+  Low-level cache contains source data for:
+
+    1 - html1
+    2 - html2
+    3 - css1
+    4 - css2
+    5 - img1
+    6 - img2
+
+  High-level cache contains:
+
+    Content objects (ll-handle in parentheses):
+
+      + c1 (1 - html1)
+      + c2 (2 - html2)
+      + c3 (3 - css1)
+      + c4 (4 - css2)
+      + c5 (5 - img1)
+      + c6 (6 - img2)
+      + c7 (3 - css1)
+
+    Content handles (objects in parentheses):
+
+      + h1 (c1, used by bw1)
+      + h2 (c3, used by h1)
+      + h3 (c4, used by h1)
+      + h4 (c2, used by bw2)
+      + h5 (c7, used by h4)
+      + h6 (c5, used by h1,h4,bw3)
+      + h7 (c6, used by h1,h4)
+
+    If img1 was not of known type when requested:
+
+    Content handles (objects in parentheses):
+
+      + h1 (c1, used by bw1)
+      + h2 (c3, used by h1)
+      + h3 (c4, used by h1)
+      + h4 (c2, used by bw2)
+      + h5 (c7, used by h4)
+      + h6 (c5, used by h1)
+      + h7 (c6, used by h1,h4)
+      + h8 (c5, used by h4)
+      + h9 (c5, used by bw3)
+
+This achieves the desired effect that:
+
+  + source data is shared between contents
+  + content objects are only created when absolutely necessary
+  + content usage/dependency is tracked and cycles avoided
+  + offline storage is possible
+
+Achieving this requires the use of indirection objects, but these are expected
+to be small in comparison to the content objects / ll-cache objects that they
+are indirecting.
+
diff --git a/docs/ideas/css-engine.txt b/docs/ideas/css-engine.txt
new file mode 100644
index 000000000..1ea8778d5
--- /dev/null
+++ b/docs/ideas/css-engine.txt
@@ -0,0 +1,381 @@
+CSS engine
+==========
+
+Requirements
+------------
+
+  + Parse stylesheets conforming to the forward compatible CSS grammar
+    (Note that in the short term, the semantic analysis stage only needs to
+    support CSS 2.1)
+  + Stylesheet management/merging (i.e. multiple stylesheets may be added
+    to a single engine context and thus affect style selection)
+  + Be able to select a style for a DOM node based upon the current stylesheets
+    in the engine context.
+  + Implemented as a standalone, reusable library -- ideally MIT licensed.
+
+Suggested API
+-------------
+
+struct css_context;
+struct css_style;
+struct css_stylesheet;
+
+typedef struct css_context css_context;
+typedef struct css_style css_style;
+typedef struct css_stylesheet css_stylesheet;
+
+typedef enum css_error {
+    CSS_OK,
+    CSS_NOMEM,
+    /* etc */
+} css_error;
+
+typedef enum css_origin {
+    CSS_ORIGIN_UA,
+    CSS_ORIGIN_USER,
+    CSS_ORIGIN_AUTHOR
+} css_origin;
+
+#define CSS_MEDIA_SCREEN (1<<0)
+#define CSS_MEDIA_PRINT  (1<<1)
+/* etc */
+#define CSS_MEDIA_ALL    (0xffffffff)
+
+#define CSS_PSEUDO_CLASS_NONE    (0)
+#define CSS_PSEUDO_CLASS_LINK    (1<<0)
+#define CSS_PSEUDO_CLASS_VISITED (1<<1)
+#define CSS_PSEUDO_CLASS_HOVER   (1<<2)
+#define CSS_PSEUDO_CLASS_ACTIVE  (1<<3)
+#define CSS_PSEUDO_CLASS_FOCUS   (1<<4)
+
+typedef enum css_property {
+    CSS_BACKGROUND_ATTACHMENT,
+    /* etc */
+} css_property;
+
+typedef struct css_value {
+    css_property property;
+
+    union {
+        css_background_attachment background_attachment;
+        /* etc */
+    } value;
+} css_value;
+
+typedef css_error (*css_import_handler)(void *pw, const char *url,
+        css_stylesheet *sheet);
+
+/* Initialise library */
+css_error css_init(void);
+/* Finalise library */
+css_error css_fini(void);
+
+/* Create a stylesheet associated with the given URL,
+ * specifying the sheet's origin, the media type(s) it applies to and
+ * a callback routine for fetching imported sheets */
+css_stylesheet *css_stylesheet_create(const char *url,
+        css_origin origin, uint32_t media,
+        css_import_handler import_callback, void *pw);
+/* Destroy a stylesheet */
+void css_stylesheet_destroy(css_stylesheet *sheet);
+
+/* Append data to a stylesheet, parsing progressively */
+css_error css_stylesheet_append_data(css_stylesheet *sheet,
+        const uint8_t *data, size_t len);
+/* Tell stylesheet parser that there's no more data (will complete parsing) */
+css_error css_stylesheet_data_done(css_stylesheet *sheet);
+
+/* Retrieve the URL associated with a stylesheet */
+const char *css_stylesheet_get_url(css_stylesheet *sheet);
+/* Retrieve the origin of a stylesheet */
+css_origin css_stylesheet_get_origin(css_stylesheet *sheet);
+/* Retrieve the media type(s) applicable to a stylesheet */
+uint32_t css_stylesheet_get_media(css_stylesheet *sheet);
+
+/* Create a selection context */
+css_context *css_context_create(void);
+/* Destroy a selection context */
+void css_context_destroy(css_context *context);
+
+/* Append a top-level stylesheet to a selection context */
+css_error css_context_append_sheet(css_context *context,
+        css_stylesheet *sheet);
+/* Insert a top-level stylesheet into a selection context, at the given index */
+css_error css_context_insert_sheet(css_context *context,
+        css_stylesheet *sheet, uint32_t index);
+/* Remove a top-level stylesheet from a selection context */
+css_error css_context_remove_sheet(css_context *context,
+        css_stylesheet *sheet);
+
+/* Retrieve the total number of top-level sheets in a selection context */
+uint32_t css_context_count_sheets(css_context *context);
+/* Get a stylesheet from a selection context given an index [0, count) */
+const css_stylesheet *css_context_get_sheet(css_context *context,
+        uint32_t index);
+
+/* Select a style for a given DOM node with the given pseudo classes active
+ * and media type.
+ *
+ * If the document language contains non-CSS presentational hints (e.g. HTML
+ * presentational attributes etc), then these are passed in through
+ * property_list and treated as if they were encountered at the start of the
+ * author stylesheet with a specificity of 0. */
+css_style *css_style_select(css_context *context,
+        *node, uint32_t pseudo_classes, uint32_t media,
+        css_value **property_list, uint32_t property_list_length);
+/* Destroy a selected style */
+void css_style_destroy(css_style *style);
+
+/* Retrieve a property value from a style */
+css_value *css_value_get(css_style *style, css_property property);
+/* Destroy a property value */
+void css_value_destroy(css_value *value);
+
+Memory management
+-----------------
+
+  + Stylesheets are owned by their creator. Selection contexts reference them.
+  + Selection contexts are owned by the client.
+  + Selected styles are owned by the client.
+  + Property values are owned by the client.
+
+  Therefore, the only difficulty lies in the handling of stylesheets
+  inserted into a selection context. The client code must ensure that a
+  stylesheet is only destroyed after it has been removed from every selection
+  context that uses it.
+
+DOM node types & tree traversal
+-------------------------------
+
+  This is currently undecided. Either the CSS engine is tied to a DOM
+  implementation (and makes API calls directly), or it's more generic and
+  performs API calls through a vtable provided by the client.
+
+Imported stylesheets
+--------------------
+
+  Imported stylesheets are handled by the CSS engine creating an appropriate
+  css_stylesheet object for the imported sheet and then asking the client
+  to fetch the data and append it to the sheet. The imported sheet is then
+  stored in the sheet that imported it. This effectively creates a tree of
+  stylesheets beneath the initial top-level sheet created by the client.
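The import mechanism can be illustrated with a toy implementation of the proposed `css_stylesheet_create()` and `css_stylesheet_destroy()` signatures. This is a sketch under stated assumptions, not a real CSS engine: `handle_import()` is a hypothetical hook the parser would call on meeting an `@import` rule, `count_imports()` is a stand-in client callback, the fixed `MAX_IMPORTS` limit is for brevity, and the idea that an importing sheet owns (and destroys) its imports is an assumption that goes slightly beyond the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum css_error { CSS_OK, CSS_NOMEM } css_error;
typedef enum css_origin {
    CSS_ORIGIN_UA, CSS_ORIGIN_USER, CSS_ORIGIN_AUTHOR
} css_origin;

typedef struct css_stylesheet css_stylesheet;
typedef css_error (*css_import_handler)(void *pw, const char *url,
        css_stylesheet *sheet);

#define MAX_IMPORTS 8 /* hypothetical fixed limit, for brevity */

struct css_stylesheet {
    char *url;
    css_origin origin;
    uint32_t media;
    css_import_handler import_callback;
    void *pw;
    css_stylesheet *imports[MAX_IMPORTS]; /* assumed owned by this sheet */
    size_t n_imports;
};

css_stylesheet *css_stylesheet_create(const char *url,
        css_origin origin, uint32_t media,
        css_import_handler import_callback, void *pw)
{
    size_t len = strlen(url) + 1;
    css_stylesheet *sheet = calloc(1, sizeof(*sheet));

    if (sheet == NULL)
        return NULL;
    sheet->url = malloc(len);
    if (sheet->url == NULL) {
        free(sheet);
        return NULL;
    }
    memcpy(sheet->url, url, len);
    sheet->origin = origin;
    sheet->media = media;
    sheet->import_callback = import_callback;
    sheet->pw = pw;
    return sheet;
}

/* Destroys a sheet and, recursively, the tree of sheets it imported */
void css_stylesheet_destroy(css_stylesheet *sheet)
{
    for (size_t i = 0; i < sheet->n_imports; i++)
        css_stylesheet_destroy(sheet->imports[i]);
    free(sheet->url);
    free(sheet);
}

/* Hypothetical parser hook for "@import url;". The child inherits origin,
 * media and callback from its parent (a simplification: @import can also
 * carry its own media list). */
static css_error handle_import(css_stylesheet *parent, const char *url)
{
    css_stylesheet *child;
    css_error err;

    assert(parent->import_callback != NULL);
    if (parent->n_imports == MAX_IMPORTS)
        return CSS_NOMEM;
    child = css_stylesheet_create(url, parent->origin, parent->media,
            parent->import_callback, parent->pw);
    if (child == NULL)
        return CSS_NOMEM;
    /* Ask the client to fetch the data and append it to the child */
    err = parent->import_callback(parent->pw, url, child);
    if (err != CSS_OK) {
        css_stylesheet_destroy(child);
        return err;
    }
    parent->imports[parent->n_imports++] = child;
    return CSS_OK;
}

/* Stand-in client callback: a real client would start a fetch and later
 * feed the data to css_stylesheet_append_data(); this one just counts. */
static css_error count_imports(void *pw, const char *url,
        css_stylesheet *sheet)
{
    (void) url;
    (void) sheet;
    (*(int *) pw)++;
    return CSS_OK;
}
```

Under this ownership assumption, the client only ever destroys the top-level sheet it created (after removing it from every selection context), and the imported subtree goes with it.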
+
+Style selection algorithm
+-------------------------
+
+  css_style_select(context, node, pseudo_classes, media,
+      property_list, property_list_length):
+    result = blank_style;
+    done_props = false;
+    foreach sheet in context:
+      # Assumes that sheets are in the order UA, USER, AUTHOR
+      if !done_props && css_stylesheet_get_origin(sheet) == CSS_ORIGIN_AUTHOR:
+        hints = fake_rule(node, property_list, property_list_length);
+        cascade(result, hints, CSS_ORIGIN_AUTHOR);
+        done_props = true;
+      process_sheet(sheet, node, pseudo_classes, media, result);
+    if !done_props:
+      # No author sheets in the context: hints still apply at author level
+      hints = fake_rule(node, property_list, property_list_length);
+      cascade(result, hints, CSS_ORIGIN_AUTHOR);
+    return result;
+
+  fake_rule(node, property_list, property_list_length):
+    rule = (node.name, 0); # Specificity is 0
+    foreach (property, value, importance) in property_list:
+      rule[property] = (value, importance);
+    return rule;
+
+  process_sheet(sheet, node, pseudo_classes, media, result):
+    if (css_stylesheet_get_media(sheet) & media) == 0:
+      return;
+    foreach import in sheet:
+      process_sheet(import, node, pseudo_classes, media, result);
+    origin = css_stylesheet_get_origin(sheet);
+    foreach rule in sheet:
+      if matches_rule(rule, node, pseudo_classes):
+        cascade(result, rule, origin);
+
+  cascade(result, rule, origin):
+    foreach (property, value, importance) in rule:
+      insert = false;
+      if result[property]:
+        rOrigin = result[property].origin;
+        rImportance = result[property].importance;
+        rSpecificity = result[property].specificity;
+        if rOrigin < origin:
+          if rImportance == "important":
+            if rOrigin != CSS_ORIGIN_USER:
+              insert = true;
+          else:
+            insert = true;
+        else if rOrigin == origin:
+          if rImportance == "" && importance == "important":
+            if rOrigin == CSS_ORIGIN_UA:
+              if rSpecificity <= rule.specificity:
+                insert = true;
+            else:
+              insert = true;
+          else if rImportance == "important" && importance == "":
+            if rOrigin == CSS_ORIGIN_UA:
+              if rSpecificity <= rule.specificity:
+                insert = true;
+          else:
+            if rSpecificity <= rule.specificity:
+              insert = true;
+        else:
+          if origin == CSS_ORIGIN_USER && importance == "important":
+            insert = true;
+      else:
+        insert = true;
+      if insert:
+        result[property] = (value, origin, importance, rule.specificity);
+
+Outstanding issues
+------------------
+
+  + Parsing/selection quirks.
+
+    Probably as an argument to css_stylesheet_create() and possibly
+    css_style_select(). This could either take the form of a blanket
+    full/almost/not quirks mode flag or be more granular and permit the
+    toggling of individual quirks.
+
+    References:
+
+      + http://developer.mozilla.org/en/docs/Mozilla_Quirks_Mode_Behavior
+      + http://www.opera.com/docs/specs/doctype/
+      + http://www.quirksmode.org/css/quirksmode.html
+      + http://www.cs.tut.fi/~jkorpela/quirks-mode.html
+      + Grep WebKit sources for inCompatMode()
+
+  + The :lang pseudo-class
+
+    The current language string needs to be passed into css_style_select().
+
+  + Pseudo-elements
+
+    Probably as an argument to css_style_select(). Most likely a bitfield
+    like the way in which pseudo-classes are handled.
+
+    The inheritance model of :first-line and :first-letter is such that:
+
+      + css_style_select() must begin with a blank style and not the
+        parent node's style
+      + an API for cascading one style onto another is needed
+
+    This is because pseudo-elements may be nested inside children of the
+    node to which they are logically connected. e.g.:
+
+      <div><p>
+      first paragraph
+      </p></div>
+
+    is logically equivalent to
+
+      <div><p>
+      <div::first-line>
+      <p::first-line>
+      first paragraph
+      </p::first-line>
+      </div::first-line>
+      </p></div>
+
+    so the actual cascade order is only known at the time the render tree is
+    built. Note that, courtesy of scripting, the location of pseudo-elements
+    can move around (e.g. if some text was inserted just before the <p> within
+    the div, above, then <div::first-line> would move). Additionally, the
+    actual content that pseudo-elements apply to can change due to reflow.
+
+    Pseudo-elements may also affect the processing of inline boxes. e.g.:
+
+      <p>
+      <span>foo bar baz bat</span>
+      </p>
+
+    becomes (logically):
+
+      <p>
+      <p::first-line>
+      <span>foo bar baz</span>
+      </p::first-line>
+      <span>bat</span>
+      </p>
+
+    In terms of interaction between pseudo-elements, :first-letter inherits
+    from :first-line e.g.:
+
+      <p>
+      first line
+      second line
+      </p>
+
+    becomes (logically):
+
+      <p>
+      <p::first-line>
+      <p::first-letter>
+      f
+      </p::first-letter>
+      irst line
+      </p::first-line>
+      second line
+      </p>
+
+    :first-line and :first-letter apply to the relevant content _including_ any
+    text inserted using :before and :after.
+
+    List of CSS 3 pseudo-elements:
+
+      + :(:)?first-line
+      + :(:)?first-letter
+      + :(:)?before
+      + :(:)?after
+      + ::selection
+      + ::footnote-call
+      + ::footnote-marker
+      + ::before-page-break
+      + ::after-page-break
+      + ::line-number-left
+      + ::line-number-right
+      + ::line-number-inside
+      + ::line-number-outside
+      + ::slot()
+      + ::value
+      + ::choices
+      + ::repeat-item
+      + ::repeat-index
+      + ::marker
+      + ::outside
+      + ::alternate
+      + ::line-marker
+
+    References:
+
+      + CSS 2.1 §5.12 and §12.1
+
+  + Stylesheet charset handling
+
+    An embedded stylesheet shares the charset of the containing document.
+
+    The charset of a stand-alone stylesheet can be specified by (in order of
+    priority, highest -> lowest):
+
+      + the transport layer
+      + a BOM and/or @charset at the immediate start of the sheet
+      + <link charset=""> or other metadata from the linking mechanism
+      + charset of referring stylesheet or document
+      + assuming UTF-8
+
+    The API currently has no way of conveying the first, third, or fourth of
+    these to the engine. This can be realised through the addition of a
+    parameter to css_stylesheet_create().
+
+    CSS 2.1 §4.4 specifies that a stylesheet's transport encoding must be a
+    superset of US-ASCII.
+
+    The internal encoding will be UTF-8.
+
+    All strings passed in by the client are assumed to be UTF-8 encoded.
+    Strings retrieved from DOM nodes are assumed to be UTF-8 encoded.
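The charset priority list above could be resolved along the following lines. This is only a sketch: `sniff_sheet_charset()` and `choose_sheet_charset()` are hypothetical helpers, only the UTF-8 BOM is recognised, and the `@charset` check is deliberately strict (the rule must start at byte 0 in the exact form `@charset "name";`). The transport, `<link>` and referrer values are the inputs the text says the API cannot yet convey, so here they are simply parameters, with `NULL` meaning "not available".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Looks for a UTF-8 BOM or an '@charset "name";' rule at the very start
 * of the sheet's data. Returns the charset (copied into buf for @charset)
 * or NULL if neither is present. UTF-16/32 BOMs are not handled here. */
static const char *sniff_sheet_charset(const uint8_t *data, size_t len,
        char *buf, size_t buflen)
{
    static const char prefix[] = "@charset \"";
    const size_t plen = sizeof(prefix) - 1;
    size_t i, n = 0;

    assert(buflen > 0);
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (len <= plen || memcmp(data, prefix, plen) != 0)
        return NULL;
    for (i = plen; i < len && data[i] != '"' && n + 1 < buflen; i++)
        buf[n++] = (char) data[i];
    /* Require the closing quote and semicolon; a name too long for buf
     * stops the loop early and is rejected here */
    if (i + 1 < len && data[i] == '"' && data[i + 1] == ';') {
        buf[n] = '\0';
        return buf;
    }
    return NULL;
}

/* Applies the priority order above, highest first, defaulting to UTF-8 */
static const char *choose_sheet_charset(const char *transport,
        const char *sniffed, const char *link, const char *referrer)
{
    if (transport != NULL)
        return transport;
    if (sniffed != NULL)
        return sniffed;
    if (link != NULL)
        return link;
    if (referrer != NULL)
        return referrer;
    return "UTF-8";
}
```

Whatever charset is chosen, the engine would then convert the data to its internal UTF-8 encoding before parsing.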
+
diff --git a/docs/ideas/render-library.txt b/docs/ideas/render-library.txt
new file mode 100644
index 000000000..db645c427
--- /dev/null
+++ b/docs/ideas/render-library.txt
@@ -0,0 +1,121 @@
+Rendering library
+=================
+
+General notes
+-------------
+
+  + Potentially long-running routines probably want to exit early and
+    ask to be resumed (or similar)
+  + There's loads of stuff missing from here (like a typesystem :)
+
+Possible API
+------------
+
+  /* Initialise library */
+  error html_init(void);
+  /* Finalise library */
+  error html_fini(void);
+
+  /* Create a context */
+  ctx html_create(void);
+  /* Destroy a context */
+  void html_destroy(ctx);
+
+  /* Configure a context
+   *
+   * Things that need configuring:
+   *
+   * Callbacks from library -> client:
+   *
+   *  + Handler for embedded object fetch requests (how to handle frames?)
+   *  + Event notification handler (e.g. form submission / link navigation,
+   *    mouse pointer shape changing, redraw request, position caret, etc)
+   *
+   * Other stuff:
+   *
+   *  + Scale? (should this be handled by the client?)
+   *  + Whether to run scripts? (possibly, not needed yet)
+   */
+  error html_setopt(ctx, opttype, optparams);
+
+  /* Feed HTML data to a context */
+  error html_process_data(ctx, data, len);
+  /* Flag end of data to context */
+  error html_data_done(ctx);
+
+  /* Reflow context, to given width/height */
+  error html_reflow(ctx, width, height);
+
+  /* Redraw context, using provided plotters */
+  error html_redraw(ctx, rect, plot_table);
+
+  /* Some kind of input event notification APIs.
+   * These are called by the client to notify the library
+   * that something's happened.
+   *
+   * e.g.:
+   */
+  error html_mouse_move(ctx, x, y);
+  error html_mouse_press(ctx, x, y, buttons, modifiers);
+  error html_mouse_release(ctx, x, y, buttons, modifiers);
+  error html_key_press(ctx, key, modifiers);
+  error html_key_release(ctx, key, modifiers);
+  error html_scroll_x(ctx, offset);
+  error html_scroll_y(ctx, offset);
+
+  /* Retrieve properties of document in context
+   *
+   * e.g.:
+   */
+  error html_get_title(ctx, title);
+
+Example usage
+-------------
+
+/* Main routine */
+main:
+  /* Initialise library */
+  html_init();
+
+  /* Create a context */
+  ctx = html_create();
+
+  /* Configure the context */
+  html_setopt(ctx, FETCH_HANDLER, my_fetcher);
+  html_setopt(ctx, EVENT_HANDLER, my_event_handler);
+
+  /* Get it to process data */
+  foreach (chunk, len) in data:
+    html_process_data(ctx, chunk, len);
+  html_data_done(ctx);
+
+  /* Reflow content to desired dimensions */
+  html_reflow(ctx, width, height);
+
+  /* Main client event loop -- processes UI-toolkit events */
+  do:
+    on mouse event:
+      html_mouse_{move,press,release}(ctx, event.x, event.y ...);
+    on key event:
+      html_key_{press,release}(ctx, event.key, event.modifiers);
+    on scroll event:
+      html_scroll_{x,y}(ctx, event.offset);
+    on redraw event:
+      html_redraw(ctx, event.rect, my_plotters);
+  until quit;
+
+  /* Destroy context */
+  html_destroy(ctx);
+
+  /* Finalise library */
+  html_fini();
+
+/* Event handler for library-generated events */
+my_event_handler:
+  on pointer shape change:
+    set_pointer_shape(shape);
+  on redraw request:
+    redraw_window(window);
+  on position caret:
+    position_caret(x, y);
+
--
cgit v1.2.3