Age | Commit message (Collapse) | Author |
|
My current version of mingw in debian 9.11 doesn't seem to have
rand_r(), so let's just use plain rand() and lose the improved
parallelization on win builds.
|
|
Most modules find themselves wanting some kind of "t" value increasing
with time or frames rendered. It's common for them to create and
maintain this variable locally, incrementing it with every frame
rendered.
It may be interesting to introduce a global notion of ticks since
rototiller started, and have all modules derive their "t" value from
this instead of having their own private versions of it.
In future modules and general innovations it seems likely that playing
with time, like jumping it forwards and backwards to achieve some
visual effects, will be desirable. This isn't applicable to all
modules, but for many their entire visible state is derived from their
"t" value, making them entirely reversible.
This commit doesn't change any modules functionally, it only adds the
plumbing to pull a ticks value down to the modules from the core.
A ticks offset has also been introduced in preparation for supporting
dynamic shifting of the ticks value, though no API is added for doing
so yet.
It also seems likely an API will be needed for disabling the
time-based ticks advancement, with functions for explicitly setting
its value. If modules are created for incorporating external
sequencers and music coordination, they will almost certainly need to
manage the ticks value explicitly. When a sequencer jumps
forwards/backwards in the creative process, the module glue
responsible will need to keep ticks synchronized with the
sequencer/editor tool.
Before any of this can happen, we need ticks as a first-class core
thing shared by all modules.
Future commits will have to modify existing modules to use the ticks
appropriately, replacing their bespoke variants.
|
|
Since snow_context_t needs another member anyways, stick n_cpus in there
to inform the fragmenter of precisely how many fragments to make.
This renderer doesn't benefit from tiling or any such locality, so it uses
the slice fragmenter and really only benefits from as many fragments as there
are CPUs. Any additional fragments is just wasted fragmenting overhead.
|
|
Snow was already threaded, but used a global seed with rand_r() meaning
the CPUs were hammering on the same address. There wasn't any locking or
atomics, as it isn't terribly critical when generating white noise if the
seed access is racy. But the writes still caused cache lines to ping-pong.
This commit gives a ~15.5% speedup in my measurements on an i7-2640M.
Note without the padded union, using just an array of ints, zero gain
is realized. I used a padding of 256 just to have some headroom, x86
is 64 but other CPUs vary, POWER9 is 128 for example.
|
|
Mechanical change removing abbreviation for consistency
|
|
Mostly mechanical change, though threads.c needed some jiggering to
make the logical cpu id available to the worker threads.
Now render_fragment() can easily addresss per-cpu data created by
create_context().
|
|
I wanted to add some noise to the rtv module and figured why not
just add a snow module and make rtv pass through it briefly when
switching modules.
It's not interesting by itself, but as more composite/meta modules
like rtv get made it might be handy beyond rtv.
|