summaryrefslogtreecommitdiff
path: root/src/libnsgif.c
Commit message (Collapse)AuthorAgeFilesLines
* LIB: Rename public header and source file.Michael Drake2022-02-231-1460/+0
| | | | | This allows the library header to have a name consistent with the library soname.
* LIB: Use consistent nsgif_ / NSGIF_ namespace.Michael Drake2022-02-231-320/+319
|
* GIF: Rename flag for whether the frame image has been decoded.Michael Drake2022-02-211-3/+3
|
* gif: Fix graphic control extension handler comment.Michael Drake2022-02-211-1/+1
|
* gif: Frame parse: Only grow image to accommodate first frame.Michael Drake2022-02-211-3/+9
| | | | This matches the behaviour of other browsers.
* gif: Background restoration: Add support for clipped frames.Michael Drake2022-02-211-1/+4
|
* gif: decode: Add support for clipped frames.Michael Drake2022-02-211-4/+75
|
* GIF: Reorder #includes.Michael Drake2021-11-241-4/+4
|
* LOG: Remove unused logging code.Michael Drake2021-11-241-3/+1
|
* API: Use stdint types.Michael Drake2021-11-241-1/+1
|
* GIF: Style: Use double underscore for remaining static functions.Michael Drake2021-11-241-8/+14
|
* GIF: Update code documentation.Michael Drake2021-11-241-7/+14
|
* GIF: Delay LZW creation until frame proccessing.Michael Drake2021-11-201-8/+8
|
* GIF: Simplify check for no frame data.Michael Drake2021-11-201-9/+4
|
* GIF: Split out logical screen descriptor extraction.Michael Drake2021-11-191-27/+46
|
* GIF: Remove double blank lines.Michael Drake2021-11-191-3/+0
|
* GIF: Initilisation: Make frame initialisation loop more readable.Michael Drake2021-11-191-1/+3
|
* GIF: Unify insufficient data error codes.Michael Drake2021-11-191-33/+15
| | | | | | | | | | | | | | | There is no difference in what the client needs to do. If there are displayable frames, they can display them. Otherwise more data is needed. Internally only `GIF_INSUFFICIENT_DATA` is used now. To remove the `GIF_INSUFFICIENT_FRAME_DATA` could make existing client applications fail to compile, so it is left as an alias to the same value. At some point the API will be changed drastically, but for now I want existing applications to still build.
* GIF: Use common colour table code for global colour table.Michael Drake2021-11-191-22/+12
|
* GIF: Split out colour table extraction.Michael Drake2021-11-191-21/+48
| | | | | This will allow the local and global colour table extraction to share code.
* GIF: Split header reader out into helper function.Michael Drake2021-11-191-16/+44
|
* Add myself to the copyright headers.Michael Drake2021-11-191-0/+1
|
* GIF: Initialisation: Remove misleading comment.Michael Drake2021-11-181-1/+0
|
* GIF: Optimise deinterlacing of frame rows.Michael Drake2021-11-081-23/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original interlacing handling looked like this: ```c static inline uint32_t f1(int height, int y) { if ((y << 3) < height) { return (y << 3); /* Pass 1 */ } y -= ((height + 7) >> 3); if ((y << 3) < (height - 4)) { return (y << 3) + 4; /* Pass 2 */ } y -= ((height + 3) >> 3); if ((y << 2) < (height - 2)) { return (y << 2) + 2; /* Pass 3 */ } y -= ((height + 1) >> 2); return (y << 1) + 1; /* Pass 4 */ } ``` Annotated with comments to show the passes. Thought it should be possible to do the job without all the calculations and subtractions from `y`, and I also didn't like that it was not super obvious how it aligned with the spec or how it dealt with pass 4 last, when it is the most common case, happening for 50% of image rows. See the following excerpt from from the GIF89a Specification. > The rows of an Interlaced images are arranged in the following order: > > Group 1 : Every 8th. row, starting with row 0. (Pass 1) > Group 2 : Every 8th. row, starting with row 4. (Pass 2) > Group 3 : Every 4th. row, starting with row 2. (Pass 3) > Group 4 : Every 2nd. row, starting with row 1. (Pass 4) > > The Following example illustrates how the rows of an interlaced image are > ordered. > > Row Number Interlace Pass > > 0 ----------------------------------------- 1 > 1 ----------------------------------------- 4 > 2 ----------------------------------------- 3 > 3 ----------------------------------------- 4 > 4 ----------------------------------------- 2 > 5 ----------------------------------------- 4 > 6 ----------------------------------------- 3 > 7 ----------------------------------------- 4 > 8 ----------------------------------------- 1 > 9 ----------------------------------------- 4 > 10 ----------------------------------------- 3 > 11 ----------------------------------------- 4 > 12 ----------------------------------------- 2 > 13 ----------------------------------------- 4 > 14 ----------------------------------------- 3 > 15 ----------------------------------------- 4 > 16 ----------------------------------------- 1 > 17 ----------------------------------------- 4 > 18 ----------------------------------------- 3 > 19 ----------------------------------------- 4 -- https://www.w3.org/Graphics/GIF/spec-gif89a.txt To test performance I ran it for all possible GIF heights: ```c for (unsigned height = 1; height < UINT16_MAX; height++) { for (uint32_t y = 0; y < height; y++) { use_val(f1(height, y)); } } ``` For the original implementation (`f1`), I got the following performance: | Compiler | Time | | -------- | ------ | | Clang | 3.830s | | GCC | 4.737s | All tests used GCC 11.2 and Clang 13.0, with `-O2` optimisation. Next I rewrote it to more closely resemble my reading of the spec. The compiler can convert the multiplies to shifts and I found this more readable than `f1`, and more closely matching the spec. The `height / N * N` bits all use powers of two for N so they boil down the a simple bitwise AND with an appropriate mask to round down to the nearest multiple of N. ```c static inline uint32_t f2(uint32_t height, uint32_t y) { height--; if (y * 8 <= height) { return y * 8; /* Pass 1 */ } if (y * 4 <= height) { return y * 8 - height / 8 * 8 - 4; /* Pass 2 */ } if (y * 2 <= height) { return y * 4 - height / 4 * 4 - 2; /* Pass 3 */ } return y * 2 - height / 2 * 2 - 1; /* Pass 4 */ } ``` For `f2`, I got the following performance: | Compiler | Time | | -------- | ------ | | Clang | 2.758s | | GCC | 3.909s | So it's faster than `f1`, and Clang is still doing better than GCC. After that I reversed the order, so that pass 4 would be handled first. ```c static inline uint32_t f3(uint32_t height, uint32_t y) { height--; if (y * 2 > height) { return y * 2 - height / 2 * 2 - 1; } if (y * 4 > height) { return y * 4 - height / 4 * 4 - 2; } if (y * 8 > height) { return y * 8 - height / 8 * 8 - 4; } return y * 8; } ``` For `f3`, I got the following performance: | Compiler | Time | | -------- | ------ | | Clang | 3.329s | | GCC | 3.630s | Interestingly this was consistently faster than `f2` with GCC, but slower than `f2` when built with Clang. (It's still faster than the original `f1` with both compilers.) I thought it might be because the branches in `f2` are more predictable than in `f3`. For example the first branch in `f2` is only true one eighth of the time, while all the branches in `f3` are 50:50. However, running under `perf` showed this was not the case. Actually the faster `f2` for Clang had more branch misses than the Clang `f3`. | Compiler | Implementation | Branches | Branch misses | | -------- | -------------- | -------------- | ------------- | | Clang | `f2` | 13,690,378,148 | 656,517 | | Clang | `f3` | 10,738,688,711 | 275,147 | | GCC | `f2` | 10,738,659,995 | 336,992 | | GCC | `f3` | 10,201,842,339 | 1,314,445 | After this I tried changing the implementation completely. All of the previous implementations were based on the idea of mapping from an uninterlaced row number to an interlaced one. ```c static inline bool f4(uint32_t height, uint32_t *y, uint8_t *pass) { *y += *pass == 0 ? 8 : 16 >> *pass; if (*y < height) return true; switch (*pass) { case 0: *pass += 1; *y = 4; if (*y < height) return true; case 1: *pass += 1; *y = 2; if (*y < height) return true; case 2: *pass += 1; *y = 1; return true; } return false; } ``` With `f4` we simply step to the next line in interlaced order. The `16 >> *pass` generates the correct step size for all but the first pass, so the first pass is special cased. The switch statement is for handling advancing from one pass to the next, so it is reached a maximum of four times per frame. The rest of the time, we return on the `if (*y < height) return true;` above. The switch falls through to handle the case where the frame height is less than the start row for the given pass. For `f4`, I got the following performance: | Compiler | Time | | -------- | ------ | | Clang | 2.315s | | GCC | 3.479s | Again, this was tested for all possible GIF heights, but the `for` loop for rows changes to a do-while loop: ```c for (unsigned height = 1; height < UINT16_MAX; height++) { uint8_t pass = 0; uint32_t y = 0; do { use_val(y); } while (f4(height, &y, &pass)); } ``` This is the fastest yet for both compilers, with Clang still performing best. One thing I didn't like about `f4` is that it calculates the y step size every call, although the step only actually changes three times, when we change to the next pass. So for `f5` the switch was updated to set the step size. ```c static inline bool f5(uint32_t height, uint32_t *y, uint8_t *step) { *y += *step & 0xf; if (*y < height) return true; switch (*step) { case 24: *y = 4; *step = 8; if (*y < height) return true; case 8: *y = 2; *step = 4; if (*y < height) return true; case 4: *y = 1; *step = 2; return true; } return false; } ``` To avoid the caller having to maintain extra state, the pass number is no longer used, and we use the step size to keep track of which pass we are on. However, the step size for the first two passes is the same, so that led to the strange use of 24 for the initial step size, and the masking of step to ensure only the bottom four bits are used. For `f5`, I got the following performance: | Compiler | Time | | -------- | ------ | | Clang | 2.364s | | GCC | 2.891s | This is the fastest yet for both compilers, with Clang still winning easily. Note that when the functions are not allowed to inline, `f2` and `f3` are the fastest.
* GIF: Constify raw source data.Michael Drake2021-11-041-14/+14
|
* GIF: Don't write into the GIF source data to handle errors.Michael Drake2021-11-041-18/+6
|
* GIF: Use same function for frame initialisation and decoding.Michael Drake2021-11-041-83/+42
| | | | | Now there aren't two entirely separate code paths that must be kept in sync.
* GIF: Pass data position into the decoders.Michael Drake2021-11-041-69/+65
| | | | | This avoids storing the state on in the gif structure, and having to save and restore it when processing frames.
* GIF: Frame initialisation: Don't need to calculate gif_bytes.Michael Drake2021-11-041-5/+3
|
* GIF: Use frame rather than indexing frames array.Michael Drake2021-11-041-4/+4
|
* GIF: Remove redundant check and comment from frame initialiser.Michael Drake2021-11-041-14/+0
|
* GIF: Use image parsing frunction for frame initialisation.Michael Drake2021-11-041-59/+4
|
* GIF: Reorder functions so frame initialise can use image decoder.Michael Drake2021-11-041-526/+523
|
* GIF: Split out image parsing.Michael Drake2021-11-041-36/+103
|
* GIF: Bitmaps: Encapsulate bitmap handling.Michael Drake2021-11-031-78/+134
|
* GIF: Unify frame clearing functions.Michael Drake2021-11-031-42/+29
|
* GIF: Align disposal methods with spec.Michael Drake2021-11-031-13/+16
|
* GIF: Clear can't fail.Michael Drake2021-11-031-9/+4
|
* GIF: Helper to get the frame structure for given frame index.Michael Drake2021-11-031-35/+44
| | | | Removes weird special casing of first frame.
* GIF: Store transparancy index on the frame.Michael Drake2021-11-031-7/+2
| | | | If the frame has no transparency, it is set to an invalid value.
* GIF: Pass frame index into frame initialiser.Michael Drake2021-11-031-25/+23
|
* GIF: Move frame parameter extraction into decode wrapper.Michael Drake2021-11-031-22/+12
|
* GIF: Clear: Don't need to decode the previous frame metadata now.Michael Drake2021-11-021-48/+1
|
* GIF: Unify calculation of transparency index.Michael Drake2021-11-021-22/+23
|
* GIF: Change background colour handling.Michael Drake2021-11-021-13/+16
| | | | | | | | | | | We used to memset a colour we looked up in the colour table when clearing the previous frame. This didn't make sense, because the colour is four bytes wide. Now we write the actual colour in properly. Also, the background color comes from the global colour table (if one exists), rather than from the local colour table. If there is no colour table, black is used.
* GIF: Rename frame record/recover functions.Michael Drake2021-11-021-4/+4
| | | | Calling it "previous" frame doesn't really make sense.
* GIF: Decode: Split out bitmap wipe to transparent.Michael Drake2021-11-021-9/+16
|
* GIF: Decode: Clean up frame clearing/restoration.Michael Drake2021-11-021-23/+22
|
* GIF: Decode: Avoid indexing to frame structue.Michael Drake2021-11-021-21/+24
|
* GIF: Rename frame to frame_idx through decode function.Michael Drake2021-11-021-27/+27
|