path: root/include/hubbub
Commit message | Author | Age | Files | Lines
* This is perhaps the best way to treat an incoming script content_model_flag. | Rupinder Singh Khokhar | 2014-08-01 | 1 | -1/+2
    Black-boxing is maintained, and a switch is allowed only to a script data state. The script content model can't be incorporated in the style of rcdata and rawtext data, where it was easy to make a one-to-one matching between handlers and states. Also fixed the tokeniser to properly handle script tags. The tokeniser was earlier modified in commit 7b6b8eb6fcbdd175540902ca699e7e704b90f9e0; it has now been tested and bugs removed. Additionally, in every loop of the dispatcher, it will be checked whether it is safe for the tokeniser to process CDATA, and the corresponding opts on the tokeniser will be set. This may slow the library down because of the repeated checking in every loop. The tokeniser code has become unbearably messy due to the script tags, so a little tidying up and optimisation will be done later ;)
* Adding PLAINTEXT State & fixing the tester at places | Rupinder Singh Khokhar | 2014-07-09 | 1 | -1/+2
* Remove client allocation function and update for new lpu API. | Michael Drake | 2013-12-14 | 2 | -13/+1
* Add ability to pause tokenisation | Vincent Sanders | 2012-07-10 | 2 | -2/+6
* Add Script complete callback | Vincent Sanders | 2012-07-05 | 1 | -0/+10
* add hubbub_parser_insert_chunk | Vincent Sanders | 2012-07-03 | 1 | -0/+17
* Remove init/final and embed entity trie at build time. r=vince | Daniel Silverstone | 2010-12-04 | 1 | -7/+0
    svn path=/trunk/hubbub/; revision=10976
* Sprinkle some C++ scoping around | John Mark Bell | 2010-10-23 | 6 | -0/+54
    svn path=/trunk/hubbub/; revision=10901
* Lose trailing commas. | John Mark Bell | 2009-04-15 | 2 | -2/+2
    GCC 2.95 compatibility.
    svn path=/trunk/hubbub/; revision=7095
* Manually merge r7070 into trunk | John Mark Bell | 2009-04-15 | 1 | -37/+62
    svn path=/trunk/hubbub/; revision=7082
* Remove hubbub_parser_parse_extraneous_chunk -- this is not supported yet. | John Mark Bell | 2009-04-04 | 1 | -4/+0
    Surround implementation with #if 0 pending its reintroduction
    svn path=/trunk/hubbub/; revision=7045
* Constify | John Mark Bell | 2009-04-04 | 1 | -2/+2
    svn path=/trunk/hubbub/; revision=7044
* hubbub_alloc -> hubbub_allocator_fn | John Mark Bell | 2009-04-04 | 3 | -4/+4
    svn path=/trunk/hubbub/; revision=7043
* Move tree callback definitions into tree.h | John Mark Bell | 2009-04-04 | 2 | -218/+218
    svn path=/trunk/hubbub/; revision=7042
* Move hubbub_error_from_string into testutils.h and remove it from the library. | John Mark Bell | 2009-04-04 | 1 | -2/+0
    svn path=/trunk/hubbub/; revision=7041
* Improve documentation of tree handler APIs. | John Mark Bell | 2009-01-10 | 1 | -18/+157
    svn path=/trunk/hubbub/; revision=6019
* Use doxygen to create API documentation. | John Mark Bell | 2009-01-08 | 3 | -26/+26
    Add a bunch of extra commentary to stop doxygen warning.
    svn path=/trunk/hubbub/; revision=5994
* Port to changed lpu API. | John Mark Bell | 2009-01-06 | 1 | -3/+2
    Drop HUBBUB_OOD and just use HUBBUB_NEEDDATA, instead.
    Currently aborts in bogus comment handling if it encounters a \r at the end of the inputstream's utf-8 buffer.
    svn path=/trunk/hubbub/; revision=5966
* Convert PARSERUTILS_BADENCODING into HUBBUB_BADENCODING | John Mark Bell | 2008-11-09 | 1 | -1/+2
    svn path=/trunk/hubbub/; revision=5667
* Return errors from parser constructor/destructor. This changes the public API. | John Mark Bell | 2008-11-09 | 1 | -3/+3
    svn path=/trunk/hubbub/; revision=5666
* Fixup dubious charsets | John Mark Bell | 2008-10-14 | 1 | -1/+2
    svn path=/trunk/hubbub/; revision=5575
* Purge redundant API | John Mark Bell | 2008-09-25 | 1 | -4/+0
    svn path=/trunk/hubbub/; revision=5432
* Report errors from libparserutils better. | Andrew Sidwell | 2008-09-24 | 1 | -0/+2
    svn path=/trunk/hubbub/; revision=5431
* Move one step closer to getting encoding changes working. | Andrew Sidwell | 2008-08-11 | 1 | -5/+4
    svn path=/trunk/hubbub/; revision=5000
* Make the encoding change callback send the textual name rather than the mibenum value. | Andrew Sidwell | 2008-08-10 | 1 | -1/+1
    svn path=/trunk/hubbub/; revision=4992
* Add <meta charset> support in the treebuilder. | Andrew Sidwell | 2008-08-10 | 3 | -0/+7
    svn path=/trunk/hubbub/; revision=4991
* Switch to using hubbub_error for reprocessing state from just a bool, to allow for encoding change info to be returned more easily. | Andrew Sidwell | 2008-08-10 | 1 | -0/+1
    svn path=/trunk/hubbub/; revision=4989
* Propagate the use of hubbub_error up into at least a bit of the treebuilder. | Andrew Sidwell | 2008-08-09 | 1 | -1/+2
    svn path=/trunk/hubbub/; revision=4979
* Move tokeniser.c across to using hubbub_error for return codes, not bools, so that "encoding change" requests can be sent back down the chain from the treebuilder at some point. | Andrew Sidwell | 2008-08-09 | 1 | -5/+6
    svn path=/trunk/hubbub/; revision=4978
* Er yes. Let's commit the headers, too, shall we? | John Mark Bell | 2008-08-05 | 1 | -0/+3
    svn path=/trunk/hubbub/; revision=4913
* Stop pretending Hubbub has an internal encoding. | Andrew Sidwell | 2008-08-02 | 1 | -1/+1
    svn path=/trunk/hubbub/; revision=4859
* Merged revisions 4631-4838 via svnmerge from | John Mark Bell | 2008-07-31 | 3 | -22/+1
    svn://source.netsurf-browser.org/branches/takkaria/hubbub-parserutils
    r4631 | takkaria | 2008-07-13 12:54:30 +0100 (Sun, 13 Jul 2008) | 2 lines
        Initial hatchet job moving to libparserutils (search and replace and a bit of cleaning up). This doesn't compile.
    r4632 | takkaria | 2008-07-13 15:28:52 +0100 (Sun, 13 Jul 2008) | 2 lines
        libparserutilize everything up to the "before attribute name" state. (Not compiling)
    r4633 | takkaria | 2008-07-13 15:32:14 +0100 (Sun, 13 Jul 2008) | 2 lines
        Replace all uses of "current_{comment|chars}" with just "chars".
    r4634 | takkaria | 2008-07-13 16:12:06 +0100 (Sun, 13 Jul 2008) | 2 lines
        Fix lots of compile errors, lpuise "before attribute name" state.
    r4636 | takkaria | 2008-07-13 17:23:17 +0100 (Sun, 13 Jul 2008) | 2 lines
        Finish lpuising the tag states, apart from character references.
    r4637 | takkaria | 2008-07-13 19:58:52 +0100 (Sun, 13 Jul 2008) | 2 lines
        lpuise the comment states.
    r4638 | takkaria | 2008-07-13 20:04:31 +0100 (Sun, 13 Jul 2008) | 2 lines
        Switch to setting hubbub_string::len to 0 instead of hubbub_string::ptr to NULL to indicate an empty buffer, as it was previously.
    r4639 | takkaria | 2008-07-13 21:02:11 +0100 (Sun, 13 Jul 2008) | 2 lines
        "lpu up" about half of the DOCTYPE handling stages.
    r4640 | takkaria | 2008-07-13 21:23:00 +0100 (Sun, 13 Jul 2008) | 2 lines
        Finish off LPUing the doctype modes.
    r4641 | takkaria | 2008-07-13 21:37:33 +0100 (Sun, 13 Jul 2008) | 2 lines
        The tokeniser uses lpu apart from the entity matcher, now.
    r4643 | takkaria | 2008-07-14 01:20:36 +0100 (Mon, 14 Jul 2008) | 2 lines
        Fix up the character reference matching stuff--still not properly dealt with, but compiles further.
    r4644 | takkaria | 2008-07-14 01:24:49 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get the tokeniser compiling in its LPU'd form.
    r4645 | takkaria | 2008-07-14 01:26:34 +0100 (Mon, 14 Jul 2008) | 2 lines
        Remember to advance the stream position after emitting tokens.
    r4646 | takkaria | 2008-07-14 01:34:36 +0100 (Mon, 14 Jul 2008) | 2 lines
        Nuke the src/input directory and start work on the treebuilder.
    r4647 | takkaria | 2008-07-14 01:56:27 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get hubbub building in its LPU'd form.
    r4648 | takkaria | 2008-07-14 02:41:03 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get the tokeniser2 testrunner working.
    r4649 | takkaria | 2008-07-14 02:48:55 +0100 (Mon, 14 Jul 2008) | 2 lines
        Fix test LDFLAGS so things link properly.
    r4650 | takkaria | 2008-07-14 16:25:51 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get testcases compiling, remove ones now covered by libparserutils.
    r4651 | takkaria | 2008-07-14 16:37:09 +0100 (Mon, 14 Jul 2008) | 2 lines
        Remove more tests covered by libpu.
    r4652 | takkaria | 2008-07-14 17:53:18 +0100 (Mon, 14 Jul 2008) | 2 lines
        Fix up the tokeniser a bit.
    r4653 | takkaria | 2008-07-14 19:02:15 +0100 (Mon, 14 Jul 2008) | 3 lines
        - Remove the buffer_handler stuff from hubbub
        - Add the basics of a buffer for attribute values and text.
    r4654 | takkaria | 2008-07-14 20:00:45 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get character references working in attribute values, start trying to make them work in character tokens.
    r4656 | takkaria | 2008-07-14 23:28:52 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get entities working a bit better.
    r4657 | takkaria | 2008-07-14 23:37:16 +0100 (Mon, 14 Jul 2008) | 2 lines
        Get entities working properly. (!)
    r4658 | takkaria | 2008-07-14 23:56:10 +0100 (Mon, 14 Jul 2008) | 2 lines
        Make doctypes work a bit better.
    r4659 | takkaria | 2008-07-15 00:18:49 +0100 (Tue, 15 Jul 2008) | 2 lines
        Get DOCTYPEs working.
    r4660 | takkaria | 2008-07-15 00:26:36 +0100 (Tue, 15 Jul 2008) | 2 lines
        Fix CDATA sections.
    r4661 | takkaria | 2008-07-15 01:01:16 +0100 (Tue, 15 Jul 2008) | 2 lines
        Get comments working again.
    r4662 | takkaria | 2008-07-15 01:14:19 +0100 (Tue, 15 Jul 2008) | 2 lines
        Fix EOF in "after attribute name" state.
    r4664 | takkaria | 2008-07-15 01:30:27 +0100 (Tue, 15 Jul 2008) | 2 lines
        Put the tests in better order, remove one now superseded with libpu.
    r4665 | takkaria | 2008-07-15 01:46:29 +0100 (Tue, 15 Jul 2008) | 2 lines
        Remove a lot of now-redundant clearings of the current stream offset.
    r4667 | jmb | 2008-07-15 11:56:54 +0100 (Tue, 15 Jul 2008) | 2 lines
        Completely purge charset stuff from hubbub. Parserutils handles this now.
    r4677 | takkaria | 2008-07-15 21:03:42 +0100 (Tue, 15 Jul 2008) | 2 lines
        Get more tests passing, handle NUL bytes in data state.
    r4694 | takkaria | 2008-07-18 17:55:44 +0100 (Fri, 18 Jul 2008) | 3 lines
        - Handle CRs correctly in some token states.
        - Handle NULs correctly in the CDATA state.
    r4706 | takkaria | 2008-07-19 14:58:48 +0100 (Sat, 19 Jul 2008) | 2 lines
        Improve the tokeniser2 output a bit.
    r4721 | takkaria | 2008-07-21 20:57:29 +0100 (Mon, 21 Jul 2008) | 2 lines
        Get a better framework in place to allow switching to using a buffer mid-collect. This fails a couple of testcases and doesn't implement proper CR or NUL support yet.
    r4725 | takkaria | 2008-07-23 17:20:07 +0100 (Wed, 23 Jul 2008) | 2 lines
        Make comment tokens in tokeniser2 display both expected and actual output.
    r4726 | takkaria | 2008-07-23 19:10:23 +0100 (Wed, 23 Jul 2008) | 4 lines
        - Add FINISH() macro which stops using buffered character collection.
        - Make the encoding U+FFFD in UTF-8 a global variable, for sanity
        - Make the bogus comment state deal with NULs correctly.
    r4730 | takkaria | 2008-07-24 00:35:16 +0100 (Thu, 24 Jul 2008) | 2 lines
        Try to get NUL bytes handled as the spec says.
    r4731 | takkaria | 2008-07-24 00:40:59 +0100 (Thu, 24 Jul 2008) | 2 lines
        Get CRs working in the data state.
    r4732 | takkaria | 2008-07-24 00:47:45 +0100 (Thu, 24 Jul 2008) | 2 lines
        Set force-quirks correctly when failing to match PUBLIC or SYSTEM in DOCTYPEs.
    r4773 | takkaria | 2008-07-28 15:34:41 +0100 (Mon, 28 Jul 2008) | 2 lines
        Fix up the tokeniser, finally.
    r4801 | takkaria | 2008-07-29 15:59:31 +0100 (Tue, 29 Jul 2008) | 2 lines
        Refactor macros a bit.
    r4802 | takkaria | 2008-07-29 16:04:17 +0100 (Tue, 29 Jul 2008) | 2 lines
        Do s/HUBBUB_TOKENISER_STATE_/STATE_/, for shorter line lengths.
    r4805 | takkaria | 2008-07-29 16:58:37 +0100 (Tue, 29 Jul 2008) | 4 lines
        Start cleaning up the hubbub tokeniser;
        - refactor to use new inline emit_character_token() and emit_current_tag() functions; makes code clearer
        - check EOF before using the CHAR() macro, so eventually it can be removed.
    r4806 | takkaria | 2008-07-29 17:45:36 +0100 (Tue, 29 Jul 2008) | 2 lines
        More cleanup like the previous commit.
    r4807 | takkaria | 2008-07-29 19:48:44 +0100 (Tue, 29 Jul 2008) | 2 lines
        Rewrite comment-handling code to be just the one function, whilst updating it to handle CRs and NULs properly. (All comments now always use the buffer.)
    r4820 | takkaria | 2008-07-30 14:14:49 +0100 (Wed, 30 Jul 2008) | 2 lines
        Finish off the first sweep of cleaning up and refactoring the tokeniser.
    r4821 | takkaria | 2008-07-30 15:12:22 +0100 (Wed, 30 Jul 2008) | 2 lines
        Add copyright statement.
    r4822 | takkaria | 2008-07-30 17:23:01 +0100 (Wed, 30 Jul 2008) | 2 lines
        Apply changes made to tokeniser2 to tokeniser3.
    r4829 | takkaria | 2008-07-31 01:59:07 +0100 (Thu, 31 Jul 2008) | 4 lines
        - Make the tokeniser save everything into the buffer, at least for now.
        - Fix logic errors introduced in refactoring
        - Avoid emitting more tokens than we have to (e.g. instead of emitting "<>" and switching back to the data state, just switch back to the data state and let it take care of it)
    r4830 | takkaria | 2008-07-31 02:03:08 +0100 (Thu, 31 Jul 2008) | 2 lines
        Small treebuilder <isindex> fix.
    r4831 | takkaria | 2008-07-31 02:32:29 +0100 (Thu, 31 Jul 2008) | 2 lines
        Stop holding on to pointers to character data across treebuilder calls.
    r4832 | takkaria | 2008-07-31 02:45:09 +0100 (Thu, 31 Jul 2008) | 18 lines
        Merge revisions 4620-4831 from trunk hubbub to libinputstream hubbub, modulo one change to test/Makefile which makes the linker choke when linking tests.
        ------------------------------------------------------------------------
        r4666 | jmb | 2008-07-15 11:52:13 +0100 (Tue, 15 Jul 2008) | 3 lines
            Make tree2 perform reference counting.
            Fix bits of the treebuilder to perform reference counting correctly in the face of *result not pointing to the same object as the node passed in to the treebuilder client callbacks.
        ------------------------------------------------------------------------
        r4668 | jmb | 2008-07-15 12:37:30 +0100 (Tue, 15 Jul 2008) | 2 lines
            Fully document treebuilder callbacks.
        ------------------------------------------------------------------------
        r4675 | takkaria | 2008-07-15 21:01:03 +0100 (Tue, 15 Jul 2008) | 2 lines
            Fix memory leak in tokeniser2.
        ------------------------------------------------------------------------
    r4834 | jmb | 2008-07-31 09:57:51 +0100 (Thu, 31 Jul 2008) | 2 lines
        Fix infinite loop in charset detector
    r4835 | jmb | 2008-07-31 13:01:24 +0100 (Thu, 31 Jul 2008) | 2 lines
        Actually store namespaces on formatting list. Otherwise we read uninitialised memory. Add some semblance of filling allocations with junk to myrealloc().
    r4836 | jmb | 2008-07-31 13:06:07 +0100 (Thu, 31 Jul 2008) | 2 lines
        Lose debug again
    r4837 | jmb | 2008-07-31 15:09:19 +0100 (Thu, 31 Jul 2008) | 2 lines
        Lose obsolete testdata (this is now part of lpu)
    svn path=/trunk/hubbub/; revision=4839
* Export a hubbub_doctype type to create_doctype() directly, rather than passing all its members as individual arguments. | Andrew Sidwell | 2008-07-11 | 1 | -2/+2
    svn path=/trunk/hubbub/; revision=4602
* Add an explicit null namespace to hubbub_ns. | Andrew Sidwell | 2008-07-09 | 1 | -0/+1
    svn path=/trunk/hubbub/; revision=4550
* Add namespaces to attributes, too. | Andrew Sidwell | 2008-06-26 | 1 | -12/+13
    svn path=/trunk/hubbub/; revision=4453
* Add the basics of namespace support. | Andrew Sidwell | 2008-06-26 | 1 | -0/+13
    svn path=/trunk/hubbub/; revision=4452
* Commit the relevant header files for r4354-r4356. | Andrew Sidwell | 2008-06-16 | 1 | -1/+9
    svn path=/trunk/hubbub/; revision=4357
* Implement "in body" insertion mode. | John Mark Bell | 2008-04-07 | 2 | -7/+34
    Modify treebuilder test driver to bring it in line with API changes.
    A few minimal bits of testdata for various bits of in body. Proper testing will come once we're actually building a tree.
    svn path=/trunk/hubbub/; revision=4076
* hubbub_strings may now be either an offset into the data buffer or a pointer to constant data. | John Mark Bell | 2008-03-21 | 1 | -1/+10
    Fix up tokeniser and treebuilder to deal with this.
    Fix up testcases, too.
    The tokeniser will only ever emit strings of type HUBBUB_STRING_OFF. Anything else is a bug which should be fixed.
    The treebuilder may emit strings of either type.
    svn path=/trunk/hubbub/; revision=4014
* More treebuilder (really 8.2.4.8 this time) | John Mark Bell | 2008-03-11 | 2 | -0/+7
    Add tree handler entrypoint for creating elements with verbatim names
    svn path=/trunk/hubbub/; revision=3940
* More treebuilder (8.2.4.8) | John Mark Bell | 2008-03-11 | 3 | -2/+11
    Make tree_handler a pointer rather than value.
    Check for tree_handler's presence in hubbub_treebuilder_token_handler rather than scattering checks all over the treebuilder code.
    Add test driver (doesn't actually build a tree but will exercise the core code correctly and verify that the treebuilder code releases all the node references it gains)
    Enhance quirks mode reporting to distinguish between standards, limited, and full quirks modes.
    svn path=/trunk/hubbub/; revision=3939
* More treebuilder (up to 8.2.4.7) | John Mark Bell | 2008-03-11 | 3 | -5/+14
    Loads of issues still outstanding, including a distinct lack of error handling
    Change tree handler API to allow (de)referencing of nodes rather than explicit destruction.
    Change create_element handler to take an entire hubbub_tag rather than just the tag name -- the DOM binding can deal with the issue of attaching attributes to the created element node.
    svn path=/trunk/hubbub/; revision=3932
* Beginnings of a tree builder. | John Mark Bell | 2008-03-07 | 3 | -2/+96
    Distinct lack of any real functionality beyond creation/destruction & option setting.
    svn path=/trunk/hubbub/; revision=3894
* Import hubbub -- an HTML parsing library. | John Mark Bell | 2007-06-23 | 5 | -0/+270
    Plenty of work still to do (like tree generation ;)
    svn path=/trunk/hubbub/; revision=3359