diff options
| -rw-r--r-- | HACKING.txt | 561 | 
1 files changed, 561 insertions, 0 deletions
| diff --git a/HACKING.txt b/HACKING.txt new file mode 100644 index 0000000..fc4662a --- /dev/null +++ b/HACKING.txt @@ -0,0 +1,561 @@ +Hacking on rototiller / libtil: + + Introduction: + +    This document primarily attempts to describe how one goes about +  hacking on new rototiller modules. + +    Initially only a bare minimum module addition is described.  This +  is a single-threaded, unconfigurable at runtime, simple module, requiring +  only a single rendering function be implemented. + +    Later more advanced topics like threaded rendering and runtime +  configurability will be covered.  These are completely optional and can +  safely be ignored until such facilities are desired. + +    The creative process of developing a new module often starts with +  writing nothing more than a rendering function, later evolving to become +  more complex in pursuit of better performance via threaded rendering, and +  greater flexibility via runtime settings. + + + Getting started: + +    After acquiring a copy of the source, adding a new module to rototiller +  consists of four steps: + +  1. Giving the module a unique name, creating a directory as named under +     src/modules.  This can be a temporary working name just to get +     started, what's important is that it not conflcit with any existing +     module names. + +  2. Implementing at least a ${new}_render_fragment() method for the module +     in a file placed in its directory at "src/modules/${new}/${new}.c". + +  3. Integrating the module into the build system by adding its directory +     to the existing "configure.ac" and "src/modules/Makefile.am" files, +     and creating its own "src/modules/${new}/Makefile.am" file. + +  4. Binding the module into libtil exposing it to the world by adding it +     to the modules[] array in "src/til.c". + +    Most of these steps are self-explanatory after looking at the existing +  code/build-system files.  It's common to bootstrap a new module by copying +  a "Makefile.am" and "${new}.c" file from one of the existing modules. + +    There's also a "stub" branch provided in the git repository, adding a +  bare minimum module rendering a solid white canvas every frame.  This is +  intended for use as a clean slate for bootstrapping new modules, there's +  no harm in deriving new modules from either this "stub" branch, or +  existing modules. + + + Configuring and building the source: + +    Rototiller uses GNU Autotools for its build system.  Generally all +  that's required for building the source is the following sequence of +  shell commands: + +      $ ./bootstrap +      $ mkdir build; cd build; ../configure +      $ make + +    The source is all C, so a C compiler is required.  Autotools is also +  required for `bootstrap` to succeed in generating the configure script +  and Makefile templates, `pkg-config` is used by configure, and a `make` +  program for executing the build.  On Debian systems installing the +  "build-essential" meta package should get things at least building +  successfully. + +    To actually produce a `rototiller` binary usable for rendering visual +  output, libsdl2 and/or libdrm development packages will also be needed. +  Look at the `../configure` output for SDL and DRM lines to see which have +  been enabled.  If both report "no" then the build will only produce a +  libtil library for potential use in other frontends, with no rototiller +  binary for the included CLI frontend. + +    After successfully building rototiller with the CLI frontend, an +  executable will be at "src/rototiller" in the build tree.  If the steps +  above were followed verbatim, that would be at `build/src/rototiller`. + + + Quickly testing modules via the CLI frontend: + +     The included frontend supports both an interactive stdio-style setup +   and specifying those same settings via commandline arguments.  If run +   without any arguments, i.e. just running `build/src/rototiller`, a +   comprehensive interactive setup will be performed for determining both +   module and video settings. + +     Prior to actually proceeding with a given setup, the configured setup +   about to be used is always printed on stdout as valid commandline +   argument syntax.  This may be copied and reused for an automated +   non-interactive startup using those settings. + +     One can also partially specify any setup in the commandline arguments, +   resulting in an interactive setup for just the unspecified settings. +   When developing a new module it's common to specify the video settings, +   and just the module name under development, leaving the module's +   settings for interactive specification during the experimentation +   process.  i.e.: + +      $ build/src/rototiller --module=newmodule --video=sdl,fullscreen=off,size=640x480 + +     This way, if "newmodule" implements settings, only those unspecified +   will be asked for interactively. + + + The render function, a bare minimum module: + +    All rendering in rototiller is performed using the CPU, in 24-bit "True +  color" pixel format, with 32-bits/4-bytes of space used per pixel. + +    The surface for rendering into is described using a display system +  agnostic "framebuffer fragment" structure type named +  "til_fb_fragment_t", defined in "src/til_fb.h" as: + +      typedef struct til_fb_fragment_t { +        uint32_t *buf;           /* pointer to the first pixel in the fragment */ +        unsigned x, y;           /* absolute coordinates of the upper left corner of this fragment */ +        unsigned width, height;  /* width and height of this fragment */ +        unsigned frame_width;    /* width of the frame this fragment is part of */ +        unsigned frame_height;   /* height of the frame this fragment is part of */ +        unsigned stride;         /* number of bytes from the end of one row to the start of the next */ +        unsigned pitch;          /* number of bytes separating y from y + 1, including any padding */ +        unsigned number;         /* this fragment's number as produced by fragmenting */ +        unsigned zeroed:1;       /* if this fragment has been zeroed since last flip */ +      } til_fb_fragment_t; + +    For most modules these members are simply used as provided, and +  there's no need to manipulate them.  For simple non-threaded cases only +  the "buf" and "frame_{width,height}" members are required, with "stride" +  or "pitch" becoming important for algorithms directly manipulating buf's +  memory contents to properly address rows of pixels since fragments may +  be discontiguous in buf at row boundaries for a variety of reasons. + +    When using threaded rendering, the "{width,height}" members become +  important as they describe a fragment's dimensions within the frame being +  rendered. + +    The module_render() function prototype is declared within the +  "til_module_t" struct in "src/til.h" as: + +    void (*render_fragment)(void *context, unsigned ticks, unsigned cpu, til_fb_fragment_t *fragment); + +    Every module must provide a "til_module_t" instance having at least this +  "render_fragment" member initialized to its rendering function.  This is +  typically done using a global instance named with the module's prefix. + +    None of the other function pointer members in "til_module_t" are +  required, and the convention is to use designated initialization in +  assigning a module's "til_module_t" members ensuring zero-initialization +  of omitted members, i.e.: + +      static void minimal_render_fragment(void *context, unsigned ticks, unsigned cpu, til_fb_fragment_t *fragment) +      { +        /* render into fragment->buf */ +      } + +      til_module_t minimal_module = { +              .render_fragment = minimal_render_fragment, +              .name = "minimal", +              .description = "Minimal example module", +      } + +    Note that the render_fragment() prototype has additional arguments than +  just the "til_fb_fragment_t *fragment": + +    void *context: + +      For modules implementing a create_context() function, this will be +      the pointer returned by that function.  Intended for modules that +      require state persisted across frames rendered. + +    unsigned ticks: + +      A convenient time-like counter the frontend advances during +      operation.  Instead of calling some kind of time function in every +      module which may become costly, "ticks" may be used to represent +      time. + +    unsigned cpu: + +      An identifier representing which logical CPU # the render function is +      executing on.  This isn't interesting for simple single-threaded +      modules, but when implementing more advanced threaded renderers this +      may be useful for indexing per-cpu resources to avoid contention. + +    For simple modules these can all be safely ignored, "ticks" does tend +  to be useful for even simple modules however. + +    Rendering functions shouldn't make assumptions about the contents of +  "fragment->buf", in part because rototiller will always use multiple +  buffers for rendering which may be recycled in any order.  Additionally, +  it's possible a given fragment will be further manipulated in composited +  scenarios.  Consequently it's important that every render_fragment() +  function fully render the region described by the fragment. + +    There tends to be two classes of rendering algorithms; those that +  always produce a substantial color for every pixel available in the +  output, and those producing more sparse output resembling an overlay. + +    In the latter case it's common to require bulk-clearing the fragment +  before the algorithm draws its sparse overlay-like contents onto the +  canvas.  To facilitate potential compositing of such modules, the +  "til_fb_fragment_t" structure contains a "zeroed" member used to indicate +  if a given fragment's buf contents have been fully initialized yet for +  the current frame.  When "zeroed" is already set, the bulk clearing +  operation should be skipped, allowing the existing contents to serve as +  the logically blank canvas. + +    A convenience helper for such modules is provided named +  til_fb_fragment_zero().  Simply call this at the start of the +  render_fragment() function, and the conditional zeroed vs. non-zeroed +  details will be handled automatically.  Otherwise see the implementation +  in "src/til_fb.h" to see what's appropriate.  To clarify, modules +  implementing algorithms that naturally always write every pixel in the +  fragment may completely ignore this aspect, and need not set the zeroed +  member; that's handled automatically. + + + Stateful rendering: + +    It's common to require some state persisting from one frame to the +  next.  Achieving this is a simple matter of providing create_context() +  and destroy_context() functions when initializing til_module_t, i.e.: + +      typedef struct minimal_context_t { +        int stateful_variables; +      } minimal_context_t; + +      static void * minimal_create_context(unsigned ticks, unsigned n_cpus, void *setup) +      { +        /* this can include more elaborate initialization of minimal_context_t as needed */ +        return calloc(1, sizeof(minimal_context_t)); +      } + +      static void minimal_destroy_context(void *context) +      { +        free(context); +      } + +      static void minimal_render_fragment(void *context, unsigned ticks, unsigned cpu, til_fb_fragment_t *fragment) +      { +        minimal_context_t *ctxt = context; + +        /* render into fragment->buf utilizing/updating ctxt->stateful_variables */ +      } + +      til_module_t minimal_module = { +              .create_context = minimal_create_context, +              .destroy_context = minimal_destroy_context, +              .render_fragment = minimal_render_fragment, +              .name = "minimal", +              .description = "Minimal example module", +      } + +    Note that the create_context() function prototype includes some +  arguments: + +    unsigned ticks: + +        Same as render_fragment; a time-like counter.  This is provided to +      the create_context() function in the event that some ticks-derived +      state must be initialized continuously with the ticks value +      subsequently passed to render_fragment(). +      This is often ignored. + +    unsigned n_cpus: + +        This is the number of logical CPUs rototiller is running atop, +      which is potentially relevant for threaded renderers.  The "unsigned +      cpu" parameter supplied to render_fragment() will always be < this +      n_cpus value, and the two are intended to complement eachother.  When +      creating the context, one may allocate per-cpu cache-aligned in +      n_cpus quantity.  Then the render_fragment() function would address +      the per-cpu space using the cpu parameter as an index into the n_cpus +      sized allocation. +      This is often ignored. + +    void *setup: + +        For modules implementing runtime-configuration by providing a +      setup() function in their til_module_t initializer, this will contain +      the pointer returned in res_setup by their setup() function. +      Unless implementing runtime configuration, this would be ignored. + +    As mentioned above in describing the rendering function, this is +  entirely optional.  One may create 100% valid modules implementing only +  the render_fragment(). + + + Runtime-configurable rendering: + +    For myriad reasons ranging from debugging and creative experimentation, +  to aesthetic variety, it's important to support runtime configuration of +  modules. + +    Everything configurable that is potentially interesting to a viewer is +  best exposed via runtime settings, as opposed to hidden behind +  compile-time constants like #defines or magic numbers in the source. + +    It's implied that when adding runtime configuration to a module, it +  will also involve stateful rendering as described in the previous +  section.  This isn't absolutely required, but without an allocated +  context to apply the runtime-configuration to, the configuration will be +  applied in some global fashion.  Any modules to be merged upstream +  shouldn't apply their configuration globally if at all avoidable. + +    Adding runtime configuration requires implementing a setup() function +  for a given module.  This setup() function is then provided when +  initializing til_module_t.  Building upon the previous minimal example +  from stateful rendering: + +      typedef struct minimal_setup_t { +        int foobar; +      } minimal_setup_t; + +      typedef struct minimal_context_t { +        int stateful_variables; +      } minimal_context_t; + +      static void * minimal_create_context(unsigned ticks, unsigned n_cpus, void *setup) +      { +        minimal_context_t *ctxt; + +        ctxt = calloc(1, sizeof(minimal_context_t)); +        if (!ctxt) +          return NULL; + +        ctxt->stateful_variables = ((minimal_setup_t *)setup)->foobar; + +        return ctxt; +      } + +      static void minimal_destroy_context(void *context) +      { +        free(context); +      } + +      static void minimal_render_fragment(void *context, unsigned ticks, unsigned cpu, til_fb_fragment_t *fragment) +      { +        minimal_context_t *ctxt = context; + +        /* render into fragment->buf utilizing/updating ctxt->stateful_variables */ +      } + +      static int minimal_setup(const til_settings_t *settings, til_setting_t **res_setting, const til_setting_desc_t **res_desc, void **res_setup) +      { +        const char  *values[] = { +              "off", +              "on", +              NULL +            }; +        const char  *foobar; +        int         r; + +        r = til_settings_get_and_describe_value(settings, +                  &(til_setting_desc_t){ +                    .name = "Foobar configurable setting", +                    .key = "foobar", +                    .regex = "^(off|on)", +                    .preferred = values[0], +                    .values = values, +                    .annotations = NULL +                  }, +                  &foobar, +                  res_setting, +                  res_desc); +        if (r) +          return r; + +        if (res_setup) { +          minimal_setup_t  *setup; + +          setup = calloc(1, sizeof(*setup)); +          if (!setup) +            return -ENOMEM; + +          if (!strcasecmp(foobar, "on")) +            setup->foobar = 1; + +          *res_setup = setup; +        } + +        return 0; +      } + +      til_module_t minimal_module = { +              .create_context = minimal_create_context, +              .destroy_context = minimal_destroy_context, +              .render_fragment = minimal_render_fragment, +              .setup = minimal_setup, +              .name = "minimal", +              .description = "Minimal example module", +      } + + +    In the above example a the minimal module now has a "foobar" boolean +  style setting supporting the values "on" and "off".  It may be specified +  at runtime to rototiller (or any other frontend) via the commandline +  argument: + +      "--module=minimal,foobar=true" + +    And if the "foobar=true" setting were omitted from the commandline, in +  rototiller's CLI frontend an interactive setup dialog would occur, i.e: + +      Foobar configurable setting: +       0: off +       1:  on +      Enter a value 0-1 [0 (off)]: + +    Much can be said on the subject of settings, this introduction should +  be enough to get started.  Use the existing modules as a reference on how +  to implement sttings.  The sparkler modules in particular has one of the +  more complicated setup() functions involving dependencies where some +  settings become expected and described only if others are enabled. + +    None of the frontends currently enforce the regex, but it's best to +  always populate it with something valid as enforcement will become +  implemented at some point in the future.  Module authors should be able +  to largely assume the input is valid at least in terms of passing the +  regex. + +    Note how the minimal_setup_t instance returned by setup() in res_setup +  is subsequently supplied to minimal_create_context() in its setup +  parameter.  In the previous Stateful rendering example, this setup +  parameter was ignored as it would always be NULL lacking any setup() +  function.  But here we use it to retrieve the "foobar" value wired up by +  the minimal_setup() function supplied for minimal_module.setup. + + + Threaded rendering: + +    Rototiller deliberately abstains from utilizing any GPU hardware- +  acceleration for rendering.  Instead, all rendering is done using the CPU +  programmed simply in C, without incurring a bunch of GPU API complexity +  like OpenGL/Direct3D or any need manage GPU resources. + +    Modern systems tend to have multiple CPU cores, enabling parallel +  execution similar to how GPUs employ multiple execution units for +  parallel rendering of pixels.  With some care and little effort +  rototiller modules may exploit these additional CPU resources. + +    Rototiller takes care of the low-level minutia surrounding creating +  threads and efficiently scheduling rendering across them for every frame. +  The way modules integrate into this threaded rendering machinery is by +  implementing a prepare_frame() function that gets called at the start of +  every frame in a single-threaded fashion, followed by parallel execution +  of the module's render_fragment() function from potentially many threads. + +    The prepare_frame() function prototype is declared within the +  "til_module_t" struct in "src/til.h" as: + +      void (*prepare_frame)(void *context, unsigned ticks, unsigned n_cpus, til_fb_fragment_t *fragment, til_fragmenter_t *res_fragmenter); + +    The context, ticks, n_cpus, and fragment parameters here are +  semantically identical to their use in the other til_module_t +  functions explained previously in this document. + +    What's special here is the res_fragmenter parameter.  This is where +  your module is expected to provide a fragmenter function rototiller will +  call repeatedly while splitting up the frame's fragment being rendered +  into smaller subfragments for passing to the module's render_fragment() +  in place of the larger frame's fragment. + +    This effectively gives modules control over the order, quantity, size, +  and shape, of individually rendered subfragments.  Logically speaking, +  one may view the fragments described by the fragmenter function returned +  in res_fragmenter as the potentially parallel units of work dispatched to +  the rendering threads. + +    The fragmenter function's prototype is declared in the +  "til_fragmenter_t" typedef, also in "src/til.h": + +      typedef int (*til_fragmenter_t)(void *context, const til_fb_fragment_t *fragment, unsigned number, til_fb_fragment_t *res_fragment); + +    While rototiller renders a frame using the provided fragmenter, it +  repeatedly calls the fragmenter with an increasing number parameter until +  the fragmenter returns 0.  The fragmenter is expected to return 1 when it +  describes another fragment for the supplied number in *res_fragment.  For +  a given frame being rendered this way, the context and fragment +  parameters will be uniform throughout the frame.  The produced fragment +  in *res_fragment is expected to describe a subset of the provided +  fragment. + +    Some rudimentary fragmenting helpers have been provided in +  "src/til_fb.[ch]": + +      int til_fb_fragment_slice_single(const til_fb_fragment_t *fragment, unsigned n_fragments, unsigned num, til_fb_fragment_t *res_fragment); +      int til_fb_fragment_tile_single(const til_fb_fragment_t *fragment, unsigned tile_size, unsigned num, til_fb_fragment_t *res_fragment); + +    It's common for threaded modules to simply call one of these in their +  fragmenter function, i.e. in the "ray" module: + +      static int ray_fragmenter(void *context, const til_fb_fragment_t *fragment, unsigned number, til_fb_fragment_t *res_fragment) +      { +        return til_fb_fragment_tile_single(fragment, 64, number, res_fragment); +      } + +    This results in tiling the frame into 64x64 tiles which are then passed +  to the module's render_fragment().  The other helper, +  til_fb_fragment_slice_single(), instead slices up the input fragment into +  n_fragments horizontal slices.  Which is preferable depends on the +  rendering algorithm.  Use of these helpers is optional and provided just +  for convenience, modules are free to do whatever they wish here. + +    Building upon the first minimal example from above, here's an example +  adding threaded (tiled) rendering: + +      static int minimal_fragmenter(void *context, const til_fb_fragment_t *fragment, unsigned number, til_fb_fragment_t *res_fragment) +      { +        return til_fb_fragment_tile_single(fragment, 64, number, res_fragment); +      } + +      static void minimal_prepare_frame)(void *context, unsigned ticks, unsigned n_cpus, til_fb_fragment_t *fragment, til_fragmenter_t *res_fragmenter) +      { +        *res_fragmenter = minimal_fragmenter; +      } + +      static void minimal_render_fragment(void *context, unsigned ticks, unsigned cpu, til_fb_fragment_t *fragment) +      { +        /* render into fragment->buf, which will be a 64x64 tile within the frame (modulo clipping) */ + +        /* Note: +         *  fragment->frame_{width,height} reflect the dimensions of the +         *    whole-frame fragment provided to prepare_frame() +         * +         *  fragment->{x,y,width,height} describe this fragment's tile +         *    within the frame, which fragment->buf points at the upper left +         *    corner of. +         */ +      } + +      til_module_t minimal_module = { +              .prepare_frame = minimal_prepare_frame, +              .render_fragment = minimal_render_fragment, +              .name = "minimal", +              .description = "Minimal threaded example module", +      } + + +    That's all one must do to achieve threaded rendering.  Note however +  this places new constraints on what's safe to do from within the module's +  render_fragment() function. + +    When using threaded rendering, any varying state accessed via +  render_fragment() must either be thread-local or synchronized using a +  mutex or atomic intrinsics.  For performance reasons, the thread-local +  option is strongly preferred, as it avoids the need for any atomics. + +    Both the create_context() and prepare_frame() functions receive an +  n_cpus parameter primarily for the purpose of preparing +  per-thread/per-cpu resources that may then be trivially indexed using the +  cpu parameter supplied to render_fragment().  When preparing such +  per-thread resources, care must be taken to avoid sharing of cache +  lines.  A trivial (though wasteful) way to achieve this is to simply +  page-align the per-cpu allocation.  With more intimate knowledge of the +  cache line size (64 bytes is very common), one can be more frugal.  See +  the "snow" module for an example of using per-cpu state for lockless +  threaded stateful rendering. | 
