Age | Commit message (Collapse) | Author |
|
Writing this down so it doesn't completely fall off my radar.
|
|
s->count isn't always perfectly divisable by n_cpus, which is why
ctxt->n_elements is computed from n_cpus * elements_per_cpu in
the transition to threaded rendering for flow.
That's all fine and dandy, but the ctxt->elements initialization
loop was still using the vestigial s->count from the pre-threaded
implementation. So on core counts where ctxt->n_elements was
smaller than s->count, initialization scribbled.
Thanks Sketch for assistance in chasing this down w/ASAN enabled
on a box that exhibited crashing w/rtv,channels=flow.
|
|
57bae7 removed the default from the settings list when bumping the counts,
oops!
|
|
The convention has been to label threaded modules in their
description.
|
|
Now that there's threaded rendering, handling larger counts
without bogging down the frame rate on anything remotely modern
is feasible.
|
|
While optimizing the threaded rendering in commit 6d6c141, the
pos.{xy} expanding from 0-1 to -1..+1 were eliminated from the
inner loops in favor of just having the positions always in
-1..+1 coordinates. But I missed that it was only the x/y
coordinates which were being expanded, with .z being left in the
0-1 space, which had a desirable aesthetic effect of condensing
the Z space, flattening everything.
This commit undoes that, without reintroducing the expansion to
the inner loops. It's a bit crufty because now .z is treated
exceptionally throughout as 0..1 while {.x,.y} are in -1..+1, but
it's fine for now.
|
|
This exploits the just added multipass rendering support.
In the first pass, the flow-field is sampled and applied to the
elements, with every thread operating on its own subset of the
elements list. Since the flow-field sampling is all read-only,
it's perfectly safe too do in parallel. Nothing is drawn in the
first pass, it's only the elements updating according to the
flow-field which is performed.
In the second pass, the elements are rendered in parallel using
the slice_per_cpu fragmenter. Since the elements are kept on a
simple array, with no spatial indexing, every thread must visit
every element.
Since the fragmenter used divides the frame into horizontal
slices, every thread needing to reject elements not overlapping
its region can take some shortcuts in easily identifying elements
entirely outside its region. But the whole 3d->2d projection
step must still be performed for every element's current position
and +n_iters final position for the frame, which does have a
divide unfortunately.
Nonetheless, this change improves frame rates substantially on my
2c/4t i7 X230 as benchmarked w/--video=mem,1366x768:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 261
FPS: 265
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 1153
FPS: 3204
FPS: 2934
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 2923
FPS: 1634
FPS: 1592
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 1006
FPS: 219
FPS: 268
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 304
FPS: 350
FPS: 343
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 379
FPS: 503
FPS: 472
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 1393
FPS: 3822
FPS: 3876
---
Prior to this commit:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 53
FPS: 53
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 426
FPS: 1366
FPS: 1335
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 1097
FPS: 368
FPS: 367
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 279
FPS: 73
FPS: 74
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 71
FPS: 71
FPS: 70
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 136
FPS: 305
FPS: 305
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 972
FPS: 2593
FPS: 2634
|
|
Modules can now use the til_module_t.finish_frame() return value
to trigger re-rendering by returning 1, returning 0 finishes the
frame.
A smattering of til_module_t.finish_frame() implementations were
largely mechanically updated to match this change by returning 0,
since nothing actually uses multi-pass rendering yet.
The impetus for this is experimenting with the flow module doing
two passes of threaded rendering per frame. A first pass to
sample the flow field and update the elements, per-cpu, but
drawing nothing. Then a second pass to render the elements in a
tiled manner.
|
|
This is intended to perhaps be of use for threaded rendering that
don't actually produce pixels during their render phase, but
still need n_cpus fragments to dispatch the parallel work keying
off the fragment.number.
Such a renderer might then put its pixels on-screen serially @
finish_frame(), or maybe the rendering functions will get a
return value to trigger multi-pass rendering on the same tick.
|
|
Nothing too crazy here, the speed= setting still controls the
speed in lieu of something driving the tap.
|
|
Remove strobe_update_taps() redundant assignment if already zero
|
|
The ad-hoc sys-based probe of cpus works fine on Linux, but it's
not really preferable when Linux's sysconf supports
_SC_NPROCESSORS_ONLN, and there's a chance non-Linux's will
support it too.
Supposedly even Emscripten supports this, and will report the
number of available WebWorkers through this interface, even with
pthreads emulation. I'm curious if that actually works, getting
wasm builds of rototiller demos could be fun.
|
|
This is too aggressive and produces some undesirable visible
artifacts on the periphery, especially for slow-moving
small-size fields.
In such scenarios the elements near the edges would be
excessively pruned when the direction wandered off-screen, then
leaving an overly sparse region when the direction inevitably
wandered back.
This is still an issue but it's far less prominent when only
clipping to the flow field boundaries... since the FOV doesn't
quite encompass the edges of the flow field. Now the elements
can survive wandering a bit off-screen, and re-enter.
|
|
Phil tripped over this when his TV powered off in the midst of
playing with rototiller over hdmi output.
The NULL mode/connector was being supplied as the
til_setting_spec_t.preferred value throwing an assert in
til_setting_desc_new().
Just detect this exception and return an error.
|
|
The repro is:
--seed=0x64f6820b '--module=compose,layers=blank\,pixbounce\\\,pixmap_size\\\=0.8\\\,pixmap\\\=err\,pixbounce\\\,pixmap_size\\\=0.4\\\,pixmap\\\=ignignokt,texture=voronoi\,cells\=512\,randomize\=on' '--video=mem,size=3840x2160'
The major culprit seems to be the combination of high resolution,
and small number of voronoi cells (cells=512), with randomize=on
which exercises jumpfill every frame.
The way jumpfill is implemented currently is racy by design to
allow threading, and mostly works fine despite not really being
how the algorithm is intended to work.
The assumption has been, something like:
"the seeds are already placed before the threaded phase, so the
threaded jumpfill should at least find stable seed cells in the
face of racing against other tiles being jumpfilled
simultaneously"
But it appears that assumption isn't always true, in that we
won't necessarily find one of the seed cells at the start of the
jumpfill when there aren't that many cells (512) compared to the
area of the voronoi (3840x2160).
By noticing when we've finished a tile's jumpfill with remaining
unassigned cells, we can just repeat the jumpfill, with time
passed, and the other tiles will have made progress on their work
propagating more knowledge of where cells are... so the
subsequent pass will probably leave nothing unassigned.
This approach sucks, but stops the crashing.
It'd also be possible to just change the way cells are looked up
so there's no potential for a NULL pointer dereference, just have
some uninitialized cell color which gets shown erroneously in the
output. That avoids the computational cost of repeating the
tile's jumpfill, and likely nobody would notice the likely single
pixel of error for a single frame.
I'm just doing this quick and dirty fix to prevent the crashing
for now, and would like to just revisit voronoi more thoroughly
with an eye towards decoupling the voronoi cost from the
resolution. It's a cheap hack the way there's a distance entry
per pixel, done just to simplify the implementation when I
slapped it together on a Zephyr train ride.
|
|
This is a first stab at colorizing the output.
The flow field now has two v3f_t datums per cell, direction and
color.
It's a bit pastel-y and color choice/palettes definitely needs
work, at least some gamma correction would make sense.
But I kind of like the pastel look actually, some of the
combinations start looking very 80s aesthetic.
A good way to watch flow's possibilities is:
--module=rtv,channels=flow,duration=10,context_duration=10,caption_duration=0 \
--video=sdl,fullscreen=on --defaults --go
The long-ish duration really gives a chance to get into the
groove of things before switching
|
|
Just another defensive programmer error assert, though
exceedingly unlikely we definitely shouldn't be getting
til_setup_t's created by someone else.
|
|
til_module_create_contexts() currently supplies the til_module_t*
independent from the til_setup_t*, but since the addition of
til_setup_t.creator, we actually get the module from the setup as
well. So let's at least assert that they match, since it'd be a
real problem if a setup from a foreign creator somehow got here.
It's tempting to refactor to just derive the module from the
setup altogether and stop passing that around separately, but
that kind of sucks as-is because til_setup_t is used by more than
just rendering modules. It makes me lean towards deprecating
til_setup_t.creator and instead having the various til_setup_t
users subclass it with their own types like til_fb_setup_t and
til_module_setup_t which each get typed creators, then pass those
around. It's starting to smell like over-engineering, let's just
roll with the assert.
|
|
Simplify ff_new() failure path by using ff_free(), also make
ff_free() more ergonomic by returning NULL.
|
|
This is kind of a particle system, where the particles are pushed
around through a 3D vector space treated as a flow field.
No physics are being simulated here, it's just treating the flow
field as direction vectors that are trilinearly interpolated when
sampled to produce a single direction vector. That direction
vector gets applied to particles near it.
To keep things interesting the flow field evolves by having two
distinct flow fields which the simulation progressively
alternates sampling from. For every frame, both flow fields are
sampled for every particle, but how much weight is given to the
influence of one vs. the other varies by a triangle wave over
time. When the weight is biased enough to one of the flow fields
near a peak/valley in the triangle wave, the other gets
re-populated while its influence is negligible, also
interpolating its new values with 25% influence from the active
field.
The current flow field population routine is completely random.
Yet there's a surprising amount of emergent order despite being
totally randomized direction vectors.
Currently supported settings include:
size= the width of the 3D flow field cube in direction vectors
(the number of vectors is size*size*size)
count= the number of particles/elements
speed= how far a particle is moved along the current sample's
direction vector
This was first implemented in 2017, but sat unfinished in a topic
branch for myriad reasons. Now that rototiller has much more
robust settings infrastructure, among other things, it seemed
worth finishing this up and merging.
|
|
This has a nice side effect of being able to have no rings at all
via 0.
Note it would be potentially interesting to tap n_centers, but
that's substantially more complicated as those have allocated
state per-center. Maybe the centers= setting could be treated
as a max, then the tap could vary within that limit.
|
|
This was hard-coded @ 20 for no particular reason.
Varying this paramater greatly affects the output, it should also
be exposed as a tap.
|
|
Pixbounce isn't a particularly costly thing to render, but when
used as part of a composition, any time wasted with idle CPUs is
CPU time potentially stolen from other layers which could be
utilizing those CPUs.
So in this commit I've done a rather minimal conversion of the
pixbounce code to support threaded rendering. It basically
doubles+ lone pixbounce FPS in --video=mem tests here.
|
|
Just some more res_setup baking failure path cleanups, largely
mechanical change.
|
|
This introduces til_setup_free_with_ret_err()
which just does the common idiom of:
- free the setup
- return with err code
so all these failure cases can be reduced to just a direct return
of calling this function.
Simpler version of til_setup_free_with_failed_setting_ret_err(),
which could have been reuesed for this by making the settings-related
parameters optional... but this way the call sites are less verbose
and these are tiny helpers it's harmless.
|
|
Similar to setup_interactively(), rkt_scener needs to handle
EINVAL errors on res_setup baking @ finalize.
Until now it had handled EINVAL @ finalize by failing the
operation and returning to the main scenes prompt.
With this commit rkt_scener now returns the user to the failed
setting, enabling correcting the problem.
It's a little janky, but not too bad. See comments for why.
|
|
Now that all the module setup_funcs are returning a res_setting
w/-EINVAL in res_setup baking, it should be fine for
setup_interactively() to resume using the single setup_func()
loop passing res_setup, and always using res_setting on -EINVAL.
This is especially desirable now that :-prefix settings are
accepted as overrides. You can now get arbitrary values down to
the res_setup baking phase, previously when you got something
wrong there you'd get an ambiguous error without a setting path.
With this commit, you should get a much more useful error
including a setting path.
This partially undoes 5191d68, where res_setup-baking was split
off from the core loop, to occur after the res_failed_desc on
EINVAL storage.
|
|
This function is effectively deprecated, but
til_settings_apply_desc_generators() still makes use of it.
So for now just making it private until I feel like either
refactoring the desc generators to not use it, or maybe just
moving a simpler open-coded form into
til_settings_apply_desc_generators().
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
_ref_setup() doesn't have any -EINVAL failure paths in res_setup
baking, but let's get off til_settings_get_and_describe_value()
so it can go away.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
compose_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
What error paths it does have during res_setup baking is nested
in the underlying tile's module setup, and that should be
propagating up any -EINVAL failures with the res_setting already
populated.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
montage_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
What error paths it does have during res_setup baking is nested
in the underlying tile's module setup, and that should be
propagating up any -EINVAL failures with the res_setting already
populated.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
submit_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
Also fixed a bug while here in the style_values error detection;
it was misusing nelems() on the array when it's NULL-terminated
with a sentinel. But this was only triggered if a user force
overrided the setting with the :-prefix syntax, since otherwise
the setting had to be in the values set according to the
front-end. It was known there'd prolly be bugs when adding that
: override prefix support. A lot of the module-local setup
baking code has been neglected/bitrotted/carelessly changed over
time, depending on front-end values policing to keep things on
the rails.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|