Writing this down so it doesn't completely fall off my radar.
|
|
s->count isn't always perfectly divisible by n_cpus, which is why
ctxt->n_elements is computed from n_cpus * elements_per_cpu in
the transition to threaded rendering for flow.
That's all fine and dandy, but the ctxt->elements initialization
loop was still using the vestigial s->count from the pre-threaded
implementation. So on core counts where ctxt->n_elements was
smaller than s->count, initialization scribbled past the end of
ctxt->elements.
Thanks Sketch for assistance in chasing this down w/ASAN enabled
on a box that exhibited crashing w/rtv,channels=flow.
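A minimal sketch of the fix, assuming a hypothetical `flow_context_t` and `element_t` (only `n_elements` and `elements` mirror names from this message; the rest is illustrative, not the real flow module). With `elements_per_cpu = count / n_cpus` the remainder is dropped, so the initialization loop has to be bounded by the allocated `n_elements` rather than the requested count.

```c
#include <stdlib.h>

/* Hypothetical element type; the real flow element carries more state. */
typedef struct element_t {
	float	x, y, z;
} element_t;

typedef struct flow_context_t {
	unsigned	n_elements;	/* n_cpus * elements_per_cpu, may be < count */
	element_t	*elements;
} flow_context_t;

/* Uniform random float in 0..1. */
static float randf(void)
{
	return (float)rand() / (float)RAND_MAX;
}

flow_context_t * flow_context_new(unsigned count, unsigned n_cpus)
{
	unsigned	elements_per_cpu = count / n_cpus;	/* remainder is dropped */
	flow_context_t	*ctxt = calloc(1, sizeof(*ctxt));

	if (!ctxt)
		return NULL;

	ctxt->n_elements = n_cpus * elements_per_cpu;
	ctxt->elements = calloc(ctxt->n_elements, sizeof(*ctxt->elements));
	if (!ctxt->elements) {
		free(ctxt);
		return NULL;
	}

	/* Bound initialization by what was allocated (n_elements), not by
	 * the requested count: iterating to count is the scribble. */
	for (unsigned i = 0; i < ctxt->n_elements; i++) {
		ctxt->elements[i].x = randf() * 2.f - 1.f;	/* -1..+1 */
		ctxt->elements[i].y = randf() * 2.f - 1.f;	/* -1..+1 */
		ctxt->elements[i].z = randf();			/* 0..1 */
	}

	return ctxt;
}

void flow_context_free(flow_context_t *ctxt)
{
	if (ctxt) {
		free(ctxt->elements);
		free(ctxt);
	}
}
```

With count=10 on 4 CPUs only 8 elements exist, which is exactly the case where looping to count overran the array.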
|
|
57bae7 removed the default from the settings list when bumping the counts,
oops!
|
|
The convention has been to label threaded modules in their
description.
|
|
Now that there's threaded rendering, handling larger counts
without bogging down the frame rate on anything remotely modern
is feasible.
|
|
While optimizing the threaded rendering in commit 6d6c141, the
pos.{x,y} expansions from 0..1 to -1..+1 were eliminated from the
inner loops in favor of just keeping the positions in -1..+1
coordinates all the time. But I missed that only the x/y
coordinates were being expanded, with .z left in the 0..1 space,
which had a desirable aesthetic effect of condensing the Z space,
flattening everything.
This commit undoes that, without reintroducing the expansion to
the inner loops. It's a bit crufty because now .z is treated
exceptionally throughout as 0..1 while {.x,.y} are in -1..+1, but
it's fine for now.
|
|
This exploits the just added multipass rendering support.
In the first pass, the flow-field is sampled and applied to the
elements, with every thread operating on its own subset of the
elements list. Since the flow-field sampling is all read-only,
it's perfectly safe to do in parallel. Nothing is drawn in the
first pass, it's only the elements updating according to the
flow-field which is performed.
In the second pass, the elements are rendered in parallel using
the slice_per_cpu fragmenter. Since the elements are kept on a
simple array, with no spatial indexing, every thread must visit
every element.
Since the fragmenter used divides the frame into horizontal
slices, each thread, which must reject elements not overlapping
its region, can take some shortcuts to quickly identify elements
lying entirely outside it. But the whole 3d->2d projection step
must still be performed for every element's current position and
its +n_iters final position for the frame, which unfortunately
involves a divide.
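A rough sketch of the two passes, assuming hypothetical helpers `cpu_element_range()`, `project_row()` and `element_in_slice()` and a made-up projection (x,y in -1..+1, z in 0..1, divided by z+1). It only illustrates the per-cpu element split of the first pass and the per-slice rejection of the second; it is not the slice_per_cpu fragmenter or flow's actual projection.

```c
/* Same shape as the v3f_t mentioned elsewhere in this log, redefined here. */
typedef struct v3f_t {
	float	x, y, z;
} v3f_t;

/* Pass 1: the [first, last) element range owned by worker `cpu`. */
void cpu_element_range(unsigned cpu, unsigned elements_per_cpu,
		       unsigned *first, unsigned *last)
{
	*first = cpu * elements_per_cpu;
	*last = *first + elements_per_cpu;
}

/* Pass 2: project onto a screen row.  The divide is the per-element
 * cost every thread pays.  Hypothetical projection: x,y in -1..+1,
 * z in 0..1, offset by 1 so the divisor never reaches 0. */
int project_row(v3f_t p, unsigned height)
{
	float	y = p.y / (p.z + 1.f);

	return (int)((y * .5f + .5f) * (float)(height - 1));
}

/* Nonzero if the segment from `pos` to `end` (the element's position
 * after this frame's iterations) can touch rows [y0, y1) of the slice. */
int element_in_slice(v3f_t pos, v3f_t end, unsigned height, int y0, int y1)
{
	int	a = project_row(pos, height), b = project_row(end, height);
	int	lo = a < b ? a : b, hi = a < b ? b : a;

	return !(hi < y0 || lo >= y1);
}
```

Since the slices are horizontal, only the projected rows matter for rejection; x never needs to be compared.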
Nonetheless, this change improves frame rates substantially on my
2c/4t i7 X230 as benchmarked w/--video=mem,1366x768:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 261
FPS: 265
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 1153
FPS: 3204
FPS: 2934
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 2923
FPS: 1634
FPS: 1592
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 1006
FPS: 219
FPS: 268
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 304
FPS: 350
FPS: 343
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 379
FPS: 503
FPS: 472
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 1393
FPS: 3822
FPS: 3876
---
Prior to this commit:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 53
FPS: 53
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 426
FPS: 1366
FPS: 1335
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 1097
FPS: 368
FPS: 367
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 279
FPS: 73
FPS: 74
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 71
FPS: 71
FPS: 70
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 136
FPS: 305
FPS: 305
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 972
FPS: 2593
FPS: 2634
|
|
Modules can now use the til_module_t.finish_frame() return value
to trigger re-rendering: returning 1 renders another pass, while
returning 0 finishes the frame.
A smattering of til_module_t.finish_frame() implementations were
largely mechanically updated to match this change by returning 0,
since nothing actually uses multi-pass rendering yet.
The impetus for this is experimenting with the flow module doing
two passes of threaded rendering per frame: a first pass that
samples the flow field and updates the elements, per-cpu, drawing
nothing, then a second pass that renders the elements in a tiled
manner.
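To illustrate the return-value contract, here is a sketch of a driver loop against a hypothetical, heavily trimmed `module_t` (not the real til_module_t, whose callbacks take different arguments). The `pass` parameter is likewise an assumption made for the example.

```c
/* Hypothetical, heavily trimmed module interface modeling only the two
 * callbacks involved in multipass rendering; not til_module_t itself. */
typedef struct module_t {
	void	(*render_fragment)(void *context, unsigned pass);
	int	(*finish_frame)(void *context, unsigned pass);	/* 1: another pass, 0: done */
} module_t;

/* Render one frame, repeating for as long as finish_frame() returns 1.
 * Returns the number of passes rendered. */
unsigned render_frame(const module_t *module, void *context)
{
	unsigned	pass = 0;

	for (;;) {
		module->render_fragment(context, pass);	/* threaded in practice */
		if (!module->finish_frame || !module->finish_frame(context, pass))
			return pass + 1;
		pass++;
	}
}

/* Example flow-like module: pass 0 updates elements, pass 1 draws. */
typedef struct two_pass_t {
	unsigned	updates, draws;
} two_pass_t;

void two_pass_render(void *context, unsigned pass)
{
	two_pass_t	*t = context;

	if (pass == 0)
		t->updates++;
	else
		t->draws++;
}

int two_pass_finish(void *context, unsigned pass)
{
	(void)context;
	return pass == 0;	/* request exactly one more pass */
}
```

Existing modules that return 0 keep rendering exactly one pass per frame.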
|
|
Nothing too crazy here, the speed= setting still controls the
speed in lieu of something driving the tap.
|
|
Remove redundant assignment in strobe_update_taps() when already zero
|
|
This is too aggressive and produces some undesirable visible
artifacts on the periphery, especially for slow-moving
small-size fields.
In such scenarios the elements near the edges would be
excessively pruned when the direction wandered off-screen,
leaving an overly sparse region when the direction inevitably
wandered back.
This is still an issue but it's far less prominent when only
clipping to the flow field boundaries... since the FOV doesn't
quite encompass the edges of the flow field. Now the elements
can survive wandering a bit off-screen, and re-enter.
|
|
The repro is:
--seed=0x64f6820b '--module=compose,layers=blank\,pixbounce\\\,pixmap_size\\\=0.8\\\,pixmap\\\=err\,pixbounce\\\,pixmap_size\\\=0.4\\\,pixmap\\\=ignignokt,texture=voronoi\,cells\=512\,randomize\=on' '--video=mem,size=3840x2160'
The major culprit seems to be the combination of high resolution,
and small number of voronoi cells (cells=512), with randomize=on
which exercises jumpfill every frame.
The way jumpfill is implemented currently is racy by design to
allow threading, and mostly works fine despite not really being
how the algorithm is intended to work.
The assumption has been, something like:
"the seeds are already placed before the threaded phase, so the
threaded jumpfill should at least find stable seed cells in the
face of racing against other tiles being jumpfilled
simultaneously"
But it appears that assumption isn't always true, in that we
won't necessarily find one of the seed cells at the start of the
jumpfill when there aren't that many cells (512) compared to the
area of the voronoi (3840x2160).
By noticing when we've finished a tile's jumpfill with remaining
unassigned cells, we can just repeat the jumpfill, with time
passed, and the other tiles will have made progress on their work
propagating more knowledge of where cells are... so the
subsequent pass will probably leave nothing unassigned.
This approach sucks, but stops the crashing.
It'd also be possible to just change the way cells are looked up
so there's no potential for a NULL pointer dereference, just have
some uninitialized cell color which gets shown erroneously in the
output. That avoids the computational cost of repeating the
tile's jumpfill, and nobody would likely notice what's probably a
single pixel of error for a single frame.
I'm just doing this quick and dirty fix to prevent the crashing
for now, and would like to just revisit voronoi more thoroughly
with an eye towards decoupling the voronoi cost from the
resolution. It's a cheap hack the way there's a distance entry
per pixel, done just to simplify the implementation when I
slapped it together on a Zephyr train ride.
|
|
This is a first stab at colorizing the output.
The flow field now has two v3f_t datums per cell, direction and
color.
It's a bit pastel-y and color choice/palettes definitely need
work, at least some gamma correction would make sense.
But I kind of like the pastel look actually, some of the
combinations start looking very 80s aesthetic.
A good way to watch flow's possibilities is:
--module=rtv,channels=flow,duration=10,context_duration=10,caption_duration=0 \
--video=sdl,fullscreen=on --defaults --go
The long-ish duration really gives a chance to get into the
groove of things before switching.
|
|
Simplify ff_new() failure path by using ff_free(), also make
ff_free() more ergonomic by returning NULL.
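The ergonomic win of a NULL-returning free is that every failure path in the constructor collapses to `return ff_free(ff);`. A sketch, assuming a hypothetical `ff_t` layout with separate direction and color arrays (the real ff_t may be organized differently):

```c
#include <stdlib.h>

typedef struct v3f_t {
	float	x, y, z;
} v3f_t;

/* Hypothetical flow field: size^3 cells of direction and color. */
typedef struct ff_t {
	unsigned	size;
	v3f_t		*directions;
	v3f_t		*colors;
} ff_t;

/* Tolerates partially constructed fields and NULL, always returns NULL. */
ff_t * ff_free(ff_t *ff)
{
	if (ff) {
		free(ff->directions);
		free(ff->colors);
		free(ff);
	}

	return NULL;
}

ff_t * ff_new(unsigned size)
{
	size_t	n = (size_t)size * size * size;
	ff_t	*ff = calloc(1, sizeof(*ff));

	if (!ff)
		return NULL;

	ff->size = size;
	ff->directions = calloc(n, sizeof(*ff->directions));
	if (!ff->directions)
		return ff_free(ff);

	ff->colors = calloc(n, sizeof(*ff->colors));
	if (!ff->colors)
		return ff_free(ff);

	return ff;
}
```

Callers can also write `ctxt->ff = ff_free(ctxt->ff);` to free and clear in one step.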
|
|
This is kind of a particle system, where the particles are pushed
around through a 3D vector space treated as a flow field.
No physics are being simulated here, it's just treating the flow
field as direction vectors that are trilinearly interpolated when
sampled to produce a single direction vector. That direction
vector gets applied to particles near it.
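Trilinear sampling boils down to seven linear interpolations: four along x, two along y, one along z. A sketch, assuming the field is a flat size^3 array indexed `(z * size + y) * size + x` and that the sample position has already been mapped to 0..1 on every axis; both are assumptions, not flow's actual layout or coordinate handling.

```c
typedef struct v3f_t {
	float	x, y, z;
} v3f_t;

v3f_t v3f_lerp(v3f_t a, v3f_t b, float t)
{
	return (v3f_t){ a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

static float clamp01(float v)
{
	return v < 0.f ? 0.f : (v > 1.f ? 1.f : v);
}

/* Sample a size^3 (size >= 2) field of vectors at p, p in 0..1 per axis. */
v3f_t ff_sample(const v3f_t *field, unsigned size, v3f_t p)
{
	float		fx = clamp01(p.x) * (float)(size - 1);
	float		fy = clamp01(p.y) * (float)(size - 1);
	float		fz = clamp01(p.z) * (float)(size - 1);
	unsigned	x0 = (unsigned)fx, y0 = (unsigned)fy, z0 = (unsigned)fz;
	unsigned	x1 = x0 + 1 < size ? x0 + 1 : x0;
	unsigned	y1 = y0 + 1 < size ? y0 + 1 : y0;
	unsigned	z1 = z0 + 1 < size ? z0 + 1 : z0;
	float		tx = fx - (float)x0, ty = fy - (float)y0, tz = fz - (float)z0;

#define F(_x, _y, _z)	field[((_z) * size + (_y)) * size + (_x)]
	/* interpolate along x on the four edges of the surrounding cell... */
	v3f_t	c00 = v3f_lerp(F(x0, y0, z0), F(x1, y0, z0), tx);
	v3f_t	c10 = v3f_lerp(F(x0, y1, z0), F(x1, y1, z0), tx);
	v3f_t	c01 = v3f_lerp(F(x0, y0, z1), F(x1, y0, z1), tx);
	v3f_t	c11 = v3f_lerp(F(x0, y1, z1), F(x1, y1, z1), tx);
#undef F
	/* ...then along y, then along z */
	v3f_t	c0 = v3f_lerp(c00, c10, ty);
	v3f_t	c1 = v3f_lerp(c01, c11, ty);

	return v3f_lerp(c0, c1, tz);
}
```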
To keep things interesting the flow field evolves by having two
distinct flow fields which the simulation progressively
alternates sampling from. For every frame, both flow fields are
sampled for every particle, but how much weight is given to the
influence of one vs. the other varies by a triangle wave over
time. When the weight is biased enough to one of the flow fields
near a peak/valley in the triangle wave, the other gets
re-populated while its influence is negligible, also
interpolating its new values with 25% influence from the active
field.
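The alternation can be sketched as a triangle-wave weight used to blend the two fields' samples, plus a test for when one field's influence is negligible enough to repopulate it. The tick-based `triangle()`, the `negligible` threshold, and the function names are hypothetical; the 25% blend from the active field is the one described above. The real module presumably repopulates once per peak/valley rather than on every frame spent near it.

```c
typedef struct v3f_t {
	float	x, y, z;
} v3f_t;

v3f_t v3f_lerp(v3f_t a, v3f_t b, float t)
{
	return (v3f_t){ a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

/* Triangle wave: 0 -> 1 -> 0 over `period` ticks. */
float triangle(unsigned ticks, unsigned period)
{
	float	p = (float)(ticks % period);
	float	half = (float)period * .5f;

	return p < half ? p / half : ((float)period - p) / half;
}

/* Blend samples taken from fields a and b; w is field b's weight. */
v3f_t ff_blend(v3f_t sample_a, v3f_t sample_b, float w)
{
	return v3f_lerp(sample_a, sample_b, w);
}

/* Which field may be repopulated now: 0 = a, 1 = b, -1 = neither. */
int ff_inactive(float w, float negligible)
{
	if (w <= negligible)
		return 1;	/* b has ~no influence */
	if (w >= 1.f - negligible)
		return 0;	/* a has ~no influence */
	return -1;
}

/* New value for a cell in the inactive field: 75% fresh random
 * direction, 25% from the corresponding cell of the active field. */
v3f_t ff_repopulate_cell(v3f_t fresh, v3f_t active)
{
	return v3f_lerp(fresh, active, .25f);
}
```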
The current flow field population routine is completely random.
Yet there's a surprising amount of emergent order despite being
totally randomized direction vectors.
Currently supported settings include:
size= the width of the 3D flow field cube in direction vectors
(the number of vectors is size*size*size)
count= the number of particles/elements
speed= how far a particle is moved along the current sample's
direction vector
This was first implemented in 2017, but sat unfinished in a topic
branch for myriad reasons. Now that rototiller has much more
robust settings infrastructure, among other things, it seemed
worth finishing this up and merging.
|
|
This has a nice side effect of being able to have no rings at all
via 0.
Note it would be potentially interesting to tap n_centers, but
that's substantially more complicated as those have allocated
state per-center. Maybe the centers= setting could be treated
as a max, then the tap could vary within that limit.
|
|
This was hard-coded @ 20 for no particular reason.
Varying this parameter greatly affects the output; it should also
be exposed as a tap.
|
|
Pixbounce isn't a particularly costly thing to render, but when
used as part of a composition, any time wasted with idle CPUs is
CPU time potentially stolen from other layers which could be
utilizing those CPUs.
So in this commit I've done a rather minimal conversion of the
pixbounce code to support threaded rendering. It more than
doubles standalone pixbounce FPS in --video=mem tests here.
|
|
Just some more res_setup baking failure path cleanups, largely
mechanical change.
|
|
Similar to setup_interactively(), rkt_scener needs to handle
EINVAL errors on res_setup baking @ finalize.
Until now it had handled EINVAL @ finalize by failing the
operation and returning to the main scenes prompt.
With this commit rkt_scener now returns the user to the failed
setting, enabling correcting the problem.
It's a little janky, but not too bad. See comments for why.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
compose_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
The error paths it does have during res_setup baking are nested
in the underlying tile's module setup, and that should be
propagating up any -EINVAL failures with the res_setting already
populated.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
montage_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
The error paths it does have during res_setup baking are nested
in the underlying tile's module setup, and that should be
propagating up any -EINVAL failures with the res_setting already
populated.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
submit_setup() doesn't have any res_setup baking -EINVAL error
paths, but still transition over to enable potentially
deprecating the value-oriented variant.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
Also fixed a bug while here in the style_values error detection;
it was misusing nelems() on the array when it's NULL-terminated
with a sentinel. But this was only triggered if a user forcibly
overrode the setting with the :-prefix syntax, since otherwise
the setting had to be in the values set according to the
front-end. It was known there'd prolly be bugs when adding that
: override prefix support. A lot of the module-local setup
baking code has been neglected/bitrotted/carelessly changed over
time, depending on front-end values policing to keep things on
the rails.
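The nelems() pitfall in miniature: on a NULL-terminated array, `sizeof(a) / sizeof(a[0])` also counts the sentinel, so an unmatched value walks onto the NULL and hands it to strcmp(). The array contents and `style_index()` below are hypothetical stand-ins, not the module's actual code.

```c
#include <string.h>

/* Hypothetical NULL-terminated values list, like a setting's values. */
static const char * const style_values[] = {
	"solid",
	"outline",
	NULL,	/* sentinel */
};

/* Buggy pattern:
 *   for (size_t i = 0; i < nelems(style_values); i++)
 *       if (!strcmp(style_values[i], value)) ...
 * nelems() counts the NULL sentinel too, so an unlisted value (as with
 * a ':'-prefixed override) ends up calling strcmp(NULL, value).
 *
 * Fixed: stop at the sentinel.  Returns the index, or -1 if not found,
 * which setup would turn into -EINVAL. */
int style_index(const char *value)
{
	for (int i = 0; style_values[i]; i++) {
		if (!strcmp(style_values[i], value))
			return i;
	}

	return -1;
}
```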
|
|
More setup_func conversion to returning the failed setting on
errors during res_setup baking.
|
|
Switching over to the newly added setting-centric variants
enables ergonomically returning the failed setting during baking.
With this commit, modules/plato becomes the first to properly
return the setting which failed to parse during baking res_setup.
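The general shape of a setting-centric parse that reports its failure can be sketched with a hypothetical `setting_t` stand-in (only a key and value, nothing like the real til_setting_t) and a made-up `setting_get_unsigned()`. On -EINVAL the offending setting is handed back via `res_setting`, which is what lets a front-end return the user to it.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for a setting: just a key and its value. */
typedef struct setting_t {
	const char	*key;
	const char	*value;
} setting_t;

int setting_get_unsigned(const setting_t *setting, unsigned *res_value,
			 const setting_t **res_setting)
{
	const char	*s = setting->value;
	char		*end;
	unsigned long	v;

	/* strtoul() would accept leading space and '-', so require a digit. */
	if (!s || !isdigit((unsigned char)s[0]))
		goto einval;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end || v > UINT_MAX)
		goto einval;

	*res_value = (unsigned)v;
	return 0;

einval:
	*res_setting = setting;	/* tell the caller which setting to revisit */
	return -EINVAL;
}
```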
|
|
Several modules still had vestigial ad-hoc free() cleanups on
error paths in their create_context().
Largely mechanical change of replacing those with
til_module_context_free() which is more appropriate, since the
til_module_context_t holds a reference on the setup. A plain
free() will leak that reference.
But it's only on create_context() failures which are uncommon,
so this was in practice mostly harmless...
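Why a plain free() leaks can be shown with a toy refcount. `setup_t`, `context_t` and their helpers are hypothetical stand-ins for til_setup_t/til_module_context_t; `context_free()` plays the role of til_module_context_free() by dropping the setup reference before freeing.

```c
#include <stdlib.h>

/* Hypothetical refcounted setup and a context holding a reference. */
typedef struct setup_t {
	unsigned	refcount;
} setup_t;

typedef struct context_t {
	setup_t		*setup;
} context_t;

setup_t * setup_ref(setup_t *setup)
{
	setup->refcount++;
	return setup;
}

setup_t * setup_unref(setup_t *setup)
{
	if (setup && --setup->refcount == 0)
		free(setup);

	return NULL;
}

context_t * context_new(setup_t *setup)
{
	context_t	*context = calloc(1, sizeof(*context));

	if (!context)
		return NULL;

	context->setup = setup_ref(setup);
	return context;
}

/* Unlike free(context), this releases the setup reference too. */
context_t * context_free(context_t *context)
{
	if (context) {
		setup_unref(context->setup);
		free(context);
	}

	return NULL;
}
```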
|
|
Having everything in fixed defines severely constrains the
diversity of particle behaviors and appearances.
This commit has been sitting around bitrotting since 2017, but
now that there's all this settings infra. and randomizing via
rtv, it seems worth landing, so I've rebased and am merging to
prevent a bitrot->rebase recurrence.
As-is, this commit ~minimally establishes a somewhat streamlined
parameterizing mechanism w/X-Macro patterns, while wiring up a
few of the obvious use cases surrounding xplode/burst, colorizing
the default sparkler explosions while at it.
It appears that when I first hacked this up I did some
experimentation with parameters as well, so there are some tweaks
to the behavior as opposed to a strict conversion of the fixed
defines to parameters. They seem minor enough to just leave be.
Plus a few minor optimizations like converting divides to
multiplies were in there.
Future commits can now wire up settings to choose from parameter
presets for different sparklers...
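For readers unfamiliar with X-Macros: the parameter list is written once and expanded repeatedly, so the struct fields, defaults and count can't drift apart. The parameter names, types and values below are invented for illustration (only "xplode"/"burst" come from this message); the real sparkler parameters differ.

```c
/* Hypothetical sparkler parameter list: type, name, default. */
#define SPARKLER_PARAMS					\
	PARAM(float,	burst_force,		.09f)	\
	PARAM(unsigned,	xplode_count,		100)	\
	PARAM(unsigned,	xplode_min_life,	30)

/* Expansion 1: struct fields. */
typedef struct sparkler_params_t {
#define PARAM(_type, _name, _default)	_type _name;
	SPARKLER_PARAMS
#undef PARAM
} sparkler_params_t;

/* Expansion 2: the default preset, via designated initializers. */
static const sparkler_params_t sparkler_defaults = {
#define PARAM(_type, _name, _default)	._name = _default,
	SPARKLER_PARAMS
#undef PARAM
};

/* Expansion 3: the number of parameters. */
enum {
	SPARKLER_N_PARAMS = 0
#define PARAM(_type, _name, _default)	+ 1
	SPARKLER_PARAMS
#undef PARAM
};
```

Additional presets would be further `sparkler_params_t` initializers, which is where settings could select between them.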
|
|
Mechanical rename for clarity reasons, primarily to better
differentiate from the setup_func style
til_module_setup()/til_module_setup_full() functions.
|
|
This sets the flash state when driven by something (like rkt).
When driven, the toggle tap will override hz altogether.
|
|
This enables dynamic external control of the strobe's frequency.
|
|
Clarifying trivial mechanical rename
|
|
Preparatory commit for exposing strobe::hz as a tap, it seems
awkward to work in periods especially in the track data.
I do like the 0-1 range of period, though that doesn't even hold
for frequencies slower than 1Hz, so... it's kind of a lie
anyway.
At least if the track is called "hz" anyone will know what the
values mean and easily reason about them. So I'm making the
setting consistent with the soon to be added "hz" tap.
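The hz/period relationship is just a reciprocal, which also shows why period only stays in 0-1 at 1Hz and above. The helper names and the use of seconds for `elapsed` are assumptions made for the sketch, not the strobe module's code.

```c
/* Period in seconds for a given frequency; hz == 0 yields infinity,
 * i.e. never flash. */
float strobe_period(float hz)
{
	return 1.f / hz;
}

/* Whether a flash is due, given seconds elapsed since the last one. */
int strobe_due(float hz, float elapsed)
{
	return elapsed >= strobe_period(hz);
}
```

At 4Hz the period is .25, but at .5Hz it's 2, outside the 0-1 range.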
|
|
Commit 64a5b17 added this to til_module_context_t, so it's
already being tracked now, making this redundant.
|
|
This needs a bit more work, but at least filtering the same-tick
renders avoids the many-movements-per-frame potential when used
as something like a checkers::fill_module.
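Same-tick filtering amounts to remembering the ticks of the last update and skipping state changes when render is invoked again for that tick, as happens when checkers renders its fill_module once per filled cell. `mover_t` and `mover_render()` are hypothetical.

```c
/* Hypothetical module state tracking the ticks of its last update. */
typedef struct mover_t {
	unsigned	last_ticks;
	int		has_moved;
	unsigned	position;
} mover_t;

/* Advance at most once per tick, however many times it's rendered. */
void mover_render(mover_t *m, unsigned ticks)
{
	if (m->has_moved && ticks == m->last_ticks)
		return;

	m->last_ticks = ticks;
	m->has_moved = 1;
	m->position++;
}
```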
|
|
This needs a bit more work, but at least filtering the same-tick
renders avoids the many-increments-per-frame potential when used
as something like a checkers::fill_module.
|