3 files changed, 680 insertions, 0 deletions
diff --git a/docs/ideas/cache.txt b/docs/ideas/cache.txt
new file mode 100644
index 000000000..fda0617a3
--- /dev/null
+++ b/docs/ideas/cache.txt
@@ -0,0 +1,178 @@
+Content caching
+===============
+
+NetSurf's existing fetch/cache architecture has a number of problems:
+
+1) Content dependencies are not modelled.
+2) Content source data for non-shareable contents is duplicated.
+3) Detection of content sharability is dependent on Content-Type, which
+   requires content cloning (which will fail for dependent contents).
+4) Detection of cycles in content dependency graphs is not performed
+   (e.g. content1 includes content2, which includes content1).
+5) All content caching is in-memory, there's no offline storage.
+
+Proposal
+--------
+
+A split-level cache.
+
+Low-level cache:
+
+  + Responsible for source data (+header) management.
+  + Interfaces with low-level fetch system to retrieve data from network.
+  + Is responsible for offline storage (if any) of cache objects.
+  + Returns opaque handles to low-level cache objects.
+  + Handles HTTP redirects, recording URLs encountered when retrieving resource.
+  + May perform content-type sniffing (requires usage context)
+
+High-level cache:
+
+  + Responsible for content objects.
+  + Tracks content dependencies (and potential cycles).
+  + Returns opaque handles to content objects.
+  + Manages content sharability & reusability (see below).
+  + Contents with unknown types are never shared and thus get unique handles.
+  + Content handles <> content objects: they're an indirection mechanism.
+
+Content sharability & reusability
+--------------------------------
+
+  If a content is shareable, then it may have multiple concurrent users.
+  Otherwise, it may have at most one user.
+
+  If a content is reusable, then it may be retained in the cache for later use
+  when it has no users. Otherwise, it will be removed from the cache when
+  it has no users.
+
+Example: retrieving a top-level resource
+----------------------------------------
+
+  1) Client requests an URL, specifying no parent handle.
+  2) High-level cache asks low-level cache for low-level handle for URL.
+  3) Low-level cache looks for appropriate object in its index.
+     a) it finds one that's not stale and returns its handle
+     b) it finds only stale entries, or no appropiate entry,
+        so allocates a new entry, requests a fetch for it, 
+        and returns the handle.
+  4) High-level cache looks for content objects that are using the low-level 
+     handle.
+     a) it finds one that's shareable and selects its handle for use.
+     b) it finds only non-shareable entries, or no appropriate entry,
+        so allocates a new entry and selects its handle for use.
+  5) High-level cache registers the parent and client with the selected handle,
+     then returns the selected handle.
+  6) Client carries on, happy in the knowledge that a content is available.
+
+Example: retrieving a child resource
+------------------------------------
+
+  1) Client requests an URL, specifying parent handle.
+  2) High-level cache searches parent+ancestors for requested URL.
+     a) it finds the URL, so returns a non-fatal error.
+     b) it does not find the URL, so proceeds from step 2 of the 
+        top-level resource algorithm.
+
+  NOTE: this approach means that shareable contents may have multiple parents.
+
+Handling of contents of unknown type
+------------------------------------
+
+  Contents of unknown type are, by definition, not shareable. Therefore, each
+  client will be issued with a different content handle.
+
+  Content types are only known once a resource's headers are fetched (or once
+  the type has been sniffed from the resource's data when the headers are
+  inconclusive).
+
+  As a resource is fetched, users of the resource are informed of the fetch
+  status. Therefore, the high-level cache is always informed of fetch progress.
+  Cache clients need not care about this: they are simply interested in
+  a content's readiness for use.
+
+  When the high-level cache is informed of a low-level cache object's type,
+  it is in a position to determine whether the corresponding content handles
+  can share a single content object or not.
+
+  If it detects that a single content object may be shared by multiple handles,
+  it simply creates the content object and registers each of the handles as
+  a user of the content.
+
+  If it detects that each handle requires a separate content object, then it
+  will create a content object for each handle and register the handle as a
+  user.
+
+  This approach requires that clients of the high-level cache get issued with
+  handles to content objects, rather than content objects (so that the decision
+  whether to create multiple content objects can be deferred until suitable
+  information is available). 
+
+  Handles with no associated content object will act as if they had a content
+  object that was not ready for use.
+
+A more concrete example
+-----------------------
+
+  + bw1 contains html1 which includes css1, css2, img1, img2
+  + bw2 contains html2 which includes css1, img1, img2
+  + bw3 contains img1
+
+  Neither HTML nor CSS contents are shareable.
+  All shareable contents are requested from the high-level cache 
+  once their type is known.
+
+  Low-level cache contains source data for:
+
+    1 - html1
+    2 - html2
+    3 - css1
+    4 - css2
+    5 - img1
+    6 - img2
+
+  High-level cache contains:
+
+    Content objects (ll-handle in parentheses):
+
+      + c1 (1 - html1)
+      + c2 (2 - html2)
+      + c3 (3 - css1)
+      + c4 (4 - css2)
+      + c5 (5 - img1)
+      + c6 (6 - img2)
+      + c7 (3 - css1)
+
+    Content handles (objects in parentheses):
+
+      + h1 (c1, used by bw1)
+      + h2 (c3, used by h1)
+      + h3 (c4, used by h1)
+      + h4 (c2, used by bw2)
+      + h5 (c7, used by h4)
+      + h6 (c5, used by h1,h4,bw3)
+      + h7 (c6, used by h1,h4)
+
+  If img1 was not of known type when requested:
+
+    Content handles (objects in parentheses):
+
+      + h1 (c1, used by bw1)
+      + h2 (c3, used by h1)
+      + h3 (c4, used by h1)
+      + h4 (c2, used by bw2)
+      + h5 (c7, used by h4)
+      + h6 (c5, used by h1)
+      + h7 (c6, used by h1,h4)
+      + h8 (c5, used by h4)
+      + h9 (c5, used by bw3)
+
+This achieves the desired effect that:
+
+  + source data is shared between contents
+  + content objects are only created when absolutely necessary
+  + content usage/dependency is tracked and cycles avoided
+  + offline storage is possible
+
+Achieving this requires the use of indirection objects, but these are expected
+to be small in comparison to the content objects / ll-cache objects that they
+are indirecting.
+
diff --git a/docs/ideas/css-engine.txt b/docs/ideas/css-engine.txt
new file mode 100644
index 000000000..1ea8778d5
--- /dev/null
+++ b/docs/ideas/css-engine.txt
@@ -0,0 +1,381 @@
+CSS engine
+==========
+
+Requirements
+------------
+
+ + Parse stylesheets conforming to the forward compatible CSS grammar
+   (Note that in the short term, the semantic analysis stage only need
+    support CSS2.1)
+ + Stylesheet management/merging (i.e. multiple stylesheets may be added 
+   to a single engine context and thus affect style selection)
+ + Be able to select a style for a DOM node based upon the current stylesheets
+   in the engine context.
+ + Implemented as a standalone, reusable, library -- ideally MIT licensed.
+
+Suggested API
+-------------
+
+struct css_context;
+struct css_style;
+struct css_stylesheet;
+
+typedef struct css_context css_context;
+typedef struct css_style css_style;
+typedef struct css_stylesheet css_stylesheet;
+
+typedef enum css_error {
+	CSS_OK,
+	CSS_NOMEM,
+	/* etc */
+} css_error;
+
+typedef enum css_origin {
+	CSS_ORIGIN_UA,
+	CSS_ORIGIN_USER,
+	CSS_ORIGIN_AUTHOR
+} css_origin;
+
+#define CSS_MEDIA_SCREEN (1<<0)
+#define CSS_MEDIA_PRINT  (1<<1)
+/* etc */
+#define CSS_MEDIA_ALL    (0xffffffff)
+
+#define CSS_PSEUDO_CLASS_NONE    (0)
+#define CSS_PSEUDO_CLASS_LINK    (1<<0)
+#define CSS_PSEUDO_CLASS_VISITED (1<<1)
+#define CSS_PSEUDO_CLASS_HOVER   (1<<2)
+#define CSS_PSEUDO_CLASS_ACTIVE  (1<<3)
+#define CSS_PSEUDO_CLASS_FOCUS   (1<<4)
+
+typedef enum css_property {
+	CSS_BACKGROUND_ATTACHMENT,
+	/* etc */
+} css_property;
+
+typedef struct css_value {
+	css_property property;
+
+	union {
+		css_background_attachment background_attachment;
+		/* etc */
+	} value;
+} css_value;
+
+typedef css_error (*css_import_handler)(void *pw, const char *url,
+		css_stylesheet *sheet);
+
+/* Initialise library */
+css_error css_init(void);
+/* Finalise library */
+css_error css_fini(void);
+
+/* Create a stylesheet associated with the given URL,
+ * specifying the sheet's origin, the media type(s) it applies to and 
+ * a callback routine for fetching imported sheets */
+css_stylesheet *css_stylesheet_create(const char *url, 
+		css_origin origin, uint32_t media,
+		css_import_handler import_callback, void *pw);
+/* Destroy a stylesheet */
+void css_stylesheet_destroy(css_stylesheet *sheet);
+
+/* Append data to a stylesheet, parsing progressively */
+css_error css_stylesheet_append_data(css_stylesheet *sheet,
+		const uint8_t *data, size_t len);
+/* Tell stylesheet parser that there's no more data (will complete parsing) */
+css_error css_stylesheet_data_done(css_stylesheet *sheet);
+
+/* Retrieve the URL associated with a stylesheet */
+const char *css_stylesheet_get_url(css_stylesheet *sheet);
+/* Retrieve the origin of a stylesheet */
+css_origin css_stylesheet_get_origin(css_stylesheet *sheet);
+/* Retrieve the media type(s) applicable to a stylesheet */
+uint32_t css_stylesheet_get_media(css_stylesheet *sheet);
+
+/* Create a selection context */
+css_context *css_context_create(void);
+/* Destroy a selection context */
+void css_context_destroy(css_context *context);
+
+/* Append a top-level stylesheet to a selection context */
+css_error css_context_append_sheet(css_context *context,
+		css_stylesheet *sheet);
+/* Insert a top-level stylesheet into a selection context, at the given index */
+css_error css_context_insert_sheet(css_context *context, 
+		css_stylesheet *sheet, uint32_t index);
+/* Remove a top-level stylesheet from a selection context */
+css_error css_context_remove_sheet(css_context *context, 
+		css_stylesheet *sheet);
+
+/* Retrieve the total number of top-level sheets in a selection context */
+uint32_t css_context_count_sheets(css_context *context);
+/* Get a stylesheet from a selection context given an index [0, count) */
+const css_stylesheet *css_context_get_sheet(css_context *context, 
+		uint32_t index);
+
+/* Select a style for a given DOM node with the given pseudo classes active
+ * and media type.
+ *
+ * If the document language contains non-CSS presentational hints (e.g. HTML
+ * presentational attributes etc), then these are passed in through 
+ * property_list and treated as if they were encountered at the start of the
+ * author stylesheet with a specificity of 0. */
+css_style *css_style_select(css_context *context,
+		<dom_node_type> *node, uint32_t pseudo_classes, uint32_t media,
+		css_value **property_list, uint32_t property_list_length);
+/* Destroy a selected style */
+void css_style_destroy(css_style *style);
+
+/* Retrieve a property value from a style */
+css_value *css_value_get(css_style *style, css_property property);
+/* Destroy a property value */
+void css_value_destroy(css_value *value);
+
+Memory management
+-----------------
+
+ + Stylesheets are owned by their creator. Selection contexts reference them.
+ + Selection contexts are owned by the client.
+ + Selected styles are owned by the client.
+ + Property values are owned by the client.
+
+ Therefore, the only difficulty lies within the handling of stylesheets 
+ inserted into a selection context. The client code must ensure that a
+ stylesheet is destroyed after it has been removed from any selection
+ contexts which are using it.
+
+DOM node types & tree traversal
+-------------------------------
+
+ This is currently undecided. Either the CSS engine is tied to a DOM 
+ implementation (and makes API calls directly), or it's more generic and
+ performs API calls through a vtable provided by the client.
+
+Imported stylesheets
+--------------------
+
+ Imported stylesheets are handled by the CSS engine creating an appropriate
+ css_stylesheet object for the imported sheet and then asking the client
+ to fetch the data and append it to the sheet. The imported sheet is then 
+ stored in the sheet that imported it. This effectively creates a tree of
+ stylesheets beneath the initial top-level sheet created by the client.
+
+Style selection algorithm
+-------------------------
+
+ css_style_select(context, node, pseudo_classes, media, 
+	 property_list, property_list_length):
+   result = blank_style;
+   done_props = false;
+   foreach sheet in context:
+     # Assumes that sheets are in the order UA, USER, AUTHOR
+     if !done_props && css_stylesheet_get_origin(sheet) == CSS_ORIGIN_AUTHOR:
+       fake_rule = fake_rule(node, property_list, property_list_length);
+       cascade(result, fake_rule, CSS_ORIGIN_AUTHOR);
+       done_props = true;
+     process_sheet(sheet, node, pseudo_classes, media, result);
+   return result;
+
+ fake_rule(node, property_list, property_list_length):
+   rule = (node.name, 0); # Specificity is 0
+   foreach (property, value, importance) in property_list:
+     rule[property] = (value, importance);
+   return rule;
+
+ process_sheet(sheet, node, pseudo_classes, media, result):
+   if (css_stylesheet_get_media(sheet) & media) == 0:
+     return;
+   foreach import in sheet:
+     process_sheet(import, node, pseudo_classes, media, result);
+   origin = css_stylesheet_get_origin(sheet);
+   foreach rule in sheet:
+     if matches_rule(rule, node, pseudo_classes):
+       cascade(result, rule, origin);
+
+ cascade(result, rule, origin):
+   foreach (property, value, importance) in rule:
+     insert = false;
+     if result[property]:
+       rOrigin = result[property].origin;
+       rImportance = result[property].importance;
+       rSpecificity = result[property].specificity;
+       if rOrigin < origin:
+         if rImportance == "important":
+           if rOrigin != CSS_ORIGIN_USER:
+             insert = true;
+         else:
+           insert = true;
+       else if rOrigin == origin:
+         if rImportance == "" && importance == "important":
+           if rOrigin == CSS_ORIGIN_UA:
+             if rSpecificity <= rule.specificity:
+               insert = true;
+           else:
+             insert = true;
+         else if rImportance == "important" && importance == "":
+           if rOrigin == CSS_ORIGIN_UA:
+             if rSpecificity <= rule.specificity:
+               insert = true;
+         else:
+           if rSpecificity <= rule.specificity:
+             insert = true;
+       else:
+         if origin == CSS_ORIGIN_USER && importance == "important":
+           insert = true;
+     else:
+       insert = true;
+     if insert:
+       result[property] = (value, origin, importance, rule.specificity);
+
+Outstanding issues
+------------------
+
+  + Parsing/selection quirks.
+  
+    Probably as an argument to css_stylesheet_create() and possibly 
+    css_style_select(). This could either take the form of a blanket 
+    full/almost/not quirks mode flag or be more granular and permit the 
+    toggling of individual quirks.
+
+    References:
+
+     + http://developer.mozilla.org/en/docs/Mozilla_Quirks_Mode_Behavior
+     + http://www.opera.com/docs/specs/doctype/
+     + http://www.quirksmode.org/css/quirksmode.html
+     + http://www.cs.tut.fi/~jkorpela/quirks-mode.html
+     + Grep WebKit sources for inCompatMode()
+
+  + The :lang pseudo-class
+
+    Need to pass the current language string into css_style_select()
+
+  + Pseudo-elements
+
+    Probably as an argument to css_style_select(). Most likely a bitfield
+    like the way in which pseudo-classes are handled.
+    
+    The inheritance model of :first-line and :first-letter is such that:
+
+      + css_style_select() must begin with a blank style and not the 
+        parent node's style
+      + an API for cascading one style onto another is needed
+
+    This is because pseudo-elements may be nested inside children of the
+    node to which they are logically connected. e.g.:
+
+      <div>
+       <p>
+        first paragraph
+       </p>
+      </div> 
+    
+    is logically equivalent to
+
+      <div>
+       <p>
+        <div:first-line>
+         <p:first-line>
+          first paragraph
+         </p:first-line>
+        </div:first-line>
+       </p>
+      </div>
+
+    so the actual cascade order is only known at the time the render tree is
+    built. Note that, courtesy of scripting, the location of pseudo-elements
+    can move around (e.g. if some text was inserted just before the <p> within
+    the div, above, then <div:first-line> would move). Additionally, the actual
+    content that pseudo-elements apply to can change due to reflow.
+
+    Pseudo-elements may also affect the processing of inline boxes. e.g.:
+
+      <p>
+       <span>foo bar baz bat</span>
+      </p>
+
+    becomes (logically):
+
+      <p>
+       <p:first-line>
+        <span>foo bar baz </span>
+       </p:first-line>
+       <span>bat</span>
+      </p>
+
+    In terms of interaction between pseudo-elements, :first-letter inherits
+    from :first-line e.g.:
+
+      <p>
+       first line
+       second line
+      </p>
+
+    becomes (logically):
+
+      <p>
+       <p:first-line>
+        <p:first-letter>
+         f
+        </p:first-letter>
+        irst line
+       </p:first-line>
+       second line
+      </p>
+
+    :first-line and :first-letter apply to the relevant content _including_ any
+    text inserted using :before and :after.
+
+    List of CSS 3 pseudo-elements:
+
+     + :(:)?first-line
+     + :(:)?first-letter
+     + :(:)?before
+     + :(:)?after
+     + ::selection
+     + ::footnote-call
+     + ::footnote-marker
+     + ::before-page-break
+     + ::after-page-break
+     + ::line-number-left
+     + ::line-number-right
+     + ::line-number-inside
+     + ::line-number-outside
+     + ::slot()
+     + ::value
+     + ::choices
+     + ::repeat-item
+     + ::repeat-index
+     + ::marker
+     + ::outside
+     + ::alternate
+     + ::line-marker
+
+    References:
+
+     + CSS 2.1 $$5.12 and $$12.1
+
+  + Stylesheet charset handling
+
+    An embedded stylesheet shares the charset of the containing document.
+
+    The charset of a stand-alone stylesheet can be specified by (in order of 
+    priority, highest -> lowest):
+    
+     + the transport layer
+     + a BOM and/or @charset at the immediate start of the sheet
+     + <link charset=""> or other metadata from the linking mechanism
+     + charset of referring stylesheet or document
+     + assuming UTF-8
+
+    The API currently has no way of conveying the first, third, or fourth of 
+    these to the engine. This can be realised through the addition of a 
+    parameter to css_stylesheet_create()
+
+    CSS 2.1 $4.4 specifies that a stylesheet's transport encoding must be a 
+    superset of US-ASCII.
+
+    The internal encoding will be UTF-8.
+
+    All strings passed in by the client are assumed to be UTF-8 encoded.
+    Strings retrieved from DOM nodes are assumed to be UTF-8 encoded.
+
diff --git a/docs/ideas/render-library.txt b/docs/ideas/render-library.txt
new file mode 100644
index 000000000..db645c427
--- /dev/null
+++ b/docs/ideas/render-library.txt
@@ -0,0 +1,121 @@
+Rendering library
+=================
+
+General notes
+-------------
+
+ + Potentially long-running routines probably want to exit early and 
+   ask to be resumed (or similar)
+ + There's loads of stuff missing from here (like a typesystem :)
+
+Possible API
+------------
+
+  /* Initialise library */
+  error html_init(void);
+  /* Finalise library */
+  error html_fini(void);
+
+  /* Create a context */
+  ctx   html_create(void);
+  /* Destroy a context */
+  void  html_destroy(ctx);
+
+  /* Configure a context 
+   *
+   * Things that need configuring:
+   *
+   * Callbacks from library -> client:
+   *
+   * + Handler for embedded object fetch requests (how to handle frames?)
+   * + Event notification handler (e.g. form submission / link navigation, 
+   *   mouse pointer shape changing, redraw request, position caret, etc)
+   *
+   * Other stuff:
+   *
+   * + Scale? (should this be handled by the client?)
+   * + Whether to run scripts? (possibly, not needed yet)
+   */
+  error html_setopt(ctx, opttype, optparams);
+
+  /* Feed HTML data to a context */
+  error html_process_data(ctx, data, len);
+  /* Flag end of data to context */
+  error html_data_done(ctx);
+
+  /* Reflow context, to given width/height */
+  error html_reflow(ctx, width, height);
+
+  /* Redraw context, using provided plotters */
+  error html_redraw(ctx, rect, plot_table);
+
+  /* Some kind of input event notification APIs. 
+   * These are called by the client to notify the library 
+   * that something's happened.
+   *
+   * e.g.:
+   */
+  error html_mouse_move(ctx, x, y);
+  error html_mouse_press(ctx, x, y, buttons, modifiers);
+  error html_mouse_release(ctx, x, y, buttons, modifiers);
+  error html_key_press(ctx, key, modifiers);
+  error html_key_release(ctx, key, modifiers);
+  error html_scroll_x(ctx, offset);
+  error html_scroll_y(ctx, offset);
+
+  /* Retrieve properties of document in context 
+   *
+   * e.g.:
+   */
+  error html_get_title(ctx, title);
+
+Example usage
+-------------
+
+/* Main routine */
+main:
+  /* Initialise library */
+  html_init();
+
+  /* Create a context */
+  ctx = html_create();
+
+  /* Configure the context */
+  html_setopt(ctx, FETCH_HANDLER, my_fetcher);
+  html_setopt(ctx, EVENT_HANDLER, my_event_handler);
+
+  /* Get it to process data */
+  foreach (chunk, len) in data:
+    html_process_data(ctx, chunk, len);
+  html_data_done(ctx);
+
+  /* Reflow content to desired dimensions */
+  html_reflow(ctx, width, height);
+
+  /* Main client event loop -- processes UI-toolkit events */
+  do:
+    on mouse event:
+      html_mouse_{move,press,release}(ctx, event.x, event.y ...);
+    on key event:
+      html_key_{press,release}{ctx, event.key, event.modifiers);
+    on scroll event:
+      html_scroll_{x,y}(ctx, event.offset);
+    on redraw event:
+      html_redraw(ctx, event.rect, my_plotters);
+   until quit;
+
+  /* Destroy context */
+  html_destroy(ctx);
+
+  /* Finalise library */
+  html_fini();
+
+/* Event handler for library-generated events */
+my_event_handler:
+  on pointer shape change:
+    set_pointer_shape(shape);
+  on redraw request:
+    redraw_window(window);
+  on position caret:
+    position caret(x, y);
+