| Age | Commit message (Collapse) | Author | 
|---|
|  | There's an outstanding issue surrounding the need for context
clones, and I'd just like to write something down somewhere
before it falls off my radar.  Presently it's just checkers that
exercises this need, so it makes sense to put it here for now,
until I get around to actually taking action on the issue. | 
|  | Mechanical change switching til_fragmenter_slice_per_cpu() users
over to til_fragmenter_slice_per_cpu_x16(), except sparkler where
it's quite detrimental to performance. | 
|  | This restores the original til_fragmenter_slice_per_cpu() while
adding an explicit x16 variant for what
til_fragmenter_slice_per_cpu() had become.
The impetus for this is realizing the x16 multiplier is terrible
for sparkler's crappy threading, and it really need strictly
n_cpus slices for the threading to be beneficial.
So I'm just getting rid of the hidden x16 in favor of making it
explicit in an _x16 variant.
Subsequent commit will pivot all the non-sparkler callers to
til_fragmenter_slice_per_cpu_x16(). | 
|  | Better spearate the generic error line from surrounding text with
an extra newline | 
|  | This was a bit of an aspirational misnomer, editing scenes isn't
actually implemented yet.  What the EDITSCENE state currently
implements is the per-scene dialog+prompt, which originally was
going to just be the scene editing flow but became more of a
"view a specified scene's details" with a prompt of its own.
Nothing functionally changes, just mechanical internal renames. | 
|  | Preparatory commit for rkt_scener scenes editing, which will be
replacing the current scene's settings instance with a newly
constructed one.  This enables simply duplicating the existing
scene setting's positional label, which will be identical in the
replacement. | 
|  | Most of the time if there are undescribed settings in the
settings hierarchy, they're just noise in the serialized as_arg
form.
So change til_settings_as_arg() to always skip undsecribed
entries, and introduce til_settings_as_arg_unfiltered() for any
use cases that actually want everything included.  The unfiltered
variant may just go the way of the dodo if nothing ever makes use
of it. | 
|  | Since "ref" renders using arbitrary foreign contexts, it must
explicitly limit their rendering concurrency to its own. | 
|  | With the introduction of discoverable on-stream contexts, with
the intention of modules finding them at runtime for nested
rendering, relying exclusively on limiting n_cpus @
til_module_create_context() is no longer adequate.
When the nested rendering makes use of a context created
elsewhere, it can't make any assumptions about what its n_cpus
is, nor should it be attempting to change that value for its use.
So this variant just adds a max_cpus parameter for setting an
upper bound to whatever is found in the context's n_cpus.
The impetus for this is to fix the ref builtin, which you can
currently trick into deadlock by performing a nested threaded
render within a threaded render.  What the ref module should be
doing is propagating its own context's n_cpus down as the upper
bound, via this new render variant. | 
|  | This is leftover from development when it used a fmt string in
combination with the key, before the desc path was getting
properly appended. | 
|  | With the addition of the "radcache" in b6362c, the need for a
faster approximate atan2f() is largely eliminated.
And there seems to be a bug in the implementation as-is taken
from https://mazzo.li/posts/vectorized-atan2.html
You can see the bug as vertical line artifact around the center
where the X coordinate would be 0.  Rather than debug what's
wrong with this approximation's implementation surrounding its
quadrant adjustments, let's just resume using atan2f() and let
the cache keep things quick. | 
|  | When introducing the **fragment_ptr model in 5a0776f, the
rototiller_thread() introduced a local place to put the pointer
to point at when rendering.
But this pointer then ends up outliving the thread on shutdown
within any queued frames until quiescent.  Fixed in the obvious
way by sticking it in rototiller_t instead. | 
|  | On Linux I don't notice a significant affect on anything letting
rkt try connect every frame when offline but in creative mode.
On Windows however, Dan reported significant latencies in the
Scener prompt responsiveness and visible slowdowns in this
condition.
I suspect the WIN32 Rocket library's sync_tcp_connect() code is
the real problem here.  But for now I can ameliorate things a bit
by just hammering on that code path less when unconnected. | 
|  | This is only relevant to creative mode.  Stops RocketEditor from
continuing playback endlessly until user intervention beyond the
current end of the demo. | 
|  | Prepartory commit for pausing playback upon entering 99999 scene
It needs to trigger only on the edge of entering the scene to
permit RocketEditor to unpause playback even if still in scene
99999, if that's what the user is trying to do.  It'd be annoying
to have it just keep asserting a paused state until the scene idx
leaves 99999...
But this also enables triggering anything on scene change edges,
for future stuff. | 
|  | - strip off the leading /path/to/rkt/module prefix
- separate taps from their scene context path with ':' vs. '/'
RocketEditor doesn't currently support recursive grouping, so
this is as good as it gets.
Note this commit will break the existing tracks for alphazed, so
you'll have to use a newer .zip for track data if building your
rototiller from source.  Or build from a prior commit. | 
|  | Preparatory commit for rewriting track paths a bit to better
group things in RocketEditor.  I'll need access to
rkt_context_t.til_module_context_t.setup->path for prefix
matching purposes.. | 
|  | This makes it possible to tiled+rotate the output of another
module in the same manner checkers::fill_module fills cells with
module output.
The default stays "none" for the classic roto with the
psychedelic color cycling.  When !"none" the color cycling
doesn't get applied currently.  It might be interesting to try
support that in the future though. | 
|  | During rkt_scener development this append was at one time needed,
as there was no retained reference to the scenes_settings for
deriving paths from.
Now that the path is derived from the actual scenes setting
instance it's just resulting in a double trailing /scenes in the
"/module/rkt/scenes/scenes:" heading
Purely cosmetic fix | 
|  | For meta-demo use cases like alphazed is experimenting with, it's
desirable to change the window title from "rototiller" to
"alphazed" or whatever if in windowed mode.
This adds a way to do that in the obvious fashion...
--title="alphazed" etc. | 
|  |  | 
|  | The tap->ptr indirection must always be updated even when we're
the driver on a fresh pipe created by us.
This bug only triggers when the caller's tap was already
on-stream, without being the driver, and the driving tap/context
has since exited the stream, untapping itself which removed the
associated pipes - even the ones it didn't own but was driving.
In that scenario when pipe is created by the previously
non-driving tap on its first update since the driver exited, its
tap->ptr still points at the stale driver.  This manifested as a
UAF bug in that case.
The fix is a simple matter of always updating tap->ptr to reflect
the driving tap.
This also fixes the real bug causing 468c78e3 to crash, such that
the syncronous gc performed there shouldn't really be necessary
to prevent crashing.  It was the awkward overlapping existence of
contexts at the same path which produced the triggering lifecycle
pattern WRT driving taps at that path. | 
|  | Pressing '=' at either prompts now makes scener's current scene
the current Rocket scene.
While you could already do this manually by just looking at the
scenes list for the one with the '*' in the Rocket column while
either watching a production and pressing <enter> repeatedly to
keep refreshing the scenes list... that's cumbersome and
annoying, now just use this shortcut.
Since this just copies Rocket's scene to the Scener scene index,
it needed to properly handle scene 99999... hence the previous
commits. | 
|  | While there's no actual context for 99999, it's a state we need
to represent visibly somehow.. so just make it appear like an
epilogue scene off the end.
I've included the Rocket/Scener/Pinned status columns
consistently as well so you at least still get a visible
indication when you've done something like pinned 99999 somehow
(not that this is possible presently, but with future changes
there will be more ways to copy the Rocket idx into Scener's idx) | 
|  | rkt_scener needs to know this value so define it in rkt.h and
switch over all the existing 99999 instances. | 
|  | Until channel context paths are distinct it's buggy to let the
contexts linger while constructing the next channel's contexts.
Originally when the gc was added here the intention was to
support stuff like the "ref" module and get the channels settings
wired up immediately with more focus on rtv's details in this
area.  Supporting stuff like contexts backing some layers
persisting across channels, while the others were swapped out,
seemed potentially interesting (and it still is).
But the rkt stuff became prioritized as rtv is more like a fuzzer
than anything despite being the default module.  And rkt related
activities will continue for now, so let's just get rtv less
likely to crash.
A reliable repro for triggering an ASAN UAF bug without this
commit is:
--seed=0x64af3b05
'--module=rtv,duration=1,context_duration=1,channels=compose,caption_duration=2,snow_duration=0,snow_module=none,log_channels=on'
'--video=sdl,fullscreen=off,size=640x480'
A few channels in blinds will UAF while updating taps stored in a
freed context, because the previous channel has a blinds in the
same layer as the newly setup channel, putting the contexts at
exactly the same paths on-stream.  There's probably another bug
in here that I need to dig into, but coexisting contexts at the
same path on-stream was never the intention.  The syncronous
immediate gc ensures nothing remains of the previous channel
before constructing the new one at the same path. | 
|  | The channel module name was being used as the settings instance
name, which is redundant to the entries[0] "value as label"
behavior with the module name also @ entries[0].
The resulting paths resembled:
/module/rtv/compose/compose/layers/[N]/...
/module/rtv/compose/compose/texture/...
Now they are:
/module/rtv/channel/compose/layers/[N]/...
/module/rtv/channel/compose/texture/...
There's still work to be done in this area of rtv.  It's unclear
if even a static "/module/rtv/channel" is correct, or if it
should show the subscript of the channels[] entry currently being
used when that becomes a proper nested settings situation, i.e.
/module/rtv/channels/[0]/compose/layers/[N]/...
...
/module/rtv/channels/[1]/spiro
...
/module/rtv/channels/[2]/roto
...
for a "channels=compose,spiro,roto" kind of hypothetical
another option which might make sense is to have the channel path
more like an auto-increment identifier so the channel contexts
have unique paths and don't collide on the stream.  This could be
important for scenarios utilizing the "ref" module and trying to
reuse parts of the context from one channel to the next.  When
the paths collide things tend to break in weird ways presently.
this area needs more thought | 
|  | Due to how meta-modules reference other modules from within their
context, they pin the related refcounts until their gc is
performed.
So repeat the gc passess until nothing gets freed, to gc those
recursive references immediately.
Fortunately this isn't intended to be done at perf-sensitive
times... | 
|  | til_stream_register_module_contexts() reuses the container when
c->n_contexts is sufficiently large, and deliberately leaves
c->n_contexts as-is to potentially accomodate a larger set in the
future.  The excess was NULLified in prepping for reuse so it
should be fine, just add a blurb about it so it doesn't look like
an oversight. | 
|  | Silly and unnecessarily racy to do this in render_fragment now
that blinds is threaded | 
|  | This is a first stab at threading blinds, while here I got rid of
the reliance on _checked() put_pixel for clipping.
This could be better, things like using a block put/copy instead
of put_pixel would be a huge advantage for blinds due to its
naturally contiguous blocks per-blind.
While this module when non-threaded wasn't especially slow, any
module leaving cores idle is depriving other potentially threaded
modules of utilizing them for the duration of the non-threaded
render.  So now that rototiller is being used for compositions
and increasingly becoming something of a meta-demo, it's become
important to make everything as threaded as possible. | 
|  | Especially since taps are established on first use, it's
desirable to get them online immediately so they show in things
like --print-pipes and modules::rkt/RocketEditor and not
spuriously once the context becomes actively rendered...
This kind of thing will be done for all modules eventually | 
|  | computing the angle for every pixel coordinate from the origin is
costly, even with the approximate method added by c690303.
An easy speedup is to only do this once for a given frame
dimensions, and cache those results.  In the form of a 32-bit
float, it's equivalent to caching a full page of pixel data.
This is slightly complicated by needing to be an effectively
global cache and the potential for multiple shapes contexts
rendering concurrently when part of a composition.
I think this particular situation highlights a need for something
equivalent generalized on-stream where modules can register
discoverable caches of costly to compute information, having a
high probability of being useful to others.
In this particular case it was alphazed's use of shapes in two
layers that made it an obvious win, even without any other
modules needing atan2() per-pixel with a centered origin.
With this commit:
Configured settings as flags:
  --seed=0x64adabae '--module=compose,layers=blank\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=.25\\\,pinch_spin\\\=.1\\\,pinches\\\=7\\\,points\\\=19\\\,spin\\\=.01\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=1\\\,pinch_spin\\\=-.25\\\,pinches\\\=8\\\,points\\\=5\\\,spin\\\=0,texture=moire\,centers\=3' '--video=mem,size=1366x768'
  FPS: 73
  FPS: 74
  FPS: 73
Without:
Configured settings as flags:
  --seed=0x64adb857 '--module=compose,layers=blank\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=.25\\\,pinch_spin\\\=.1\\\,pinches\\\=7\\\,points\\\=19\\\,spin\\\=.01\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=1\\\,pinch_spin\\\=-.25\\\,pinches\\\=8\\\,points\\\=5\\\,spin\\\=0,texture=moire\,centers\=3' '--video=mem,size=1366x768'
  FPS: 55
  FPS: 54
  FPS: 54
So it's significant, and in alphazed there's also a transition
from one scene with two full-screen shapes layers into a
checkered scene with shapes as the fill_module.  Further
amplifying the payoff..  infact whenever shapes is used for a
fill_module in checkers, there's n_cpus shapes contexts created
because checkers is threaded.  All of those would be benefitting
from the cache. | 
|  | What this really should have is a general memset32()
implementation, but for now it's a small improvement at least for
clearing since that's known to be a 0-pixel case where memset()
is applicable.
memset() isn't usable for actual RGBA pixels since it operates
strictly on bytes.  There's some hacks for using memmove() to get
closer where you store the first four bytes, then memove the
start of the buffer to the remaining, doubling the size of the
move each time for log(n) calls to memmove.  Maybe that makes
sense as a stop-gap for the general case, it needs
implementing+benchmarking first.
This commit is dead simple though and an obvious little FPS gain
for clearing. | 
|  | When profiling compositions it's convenient to suppress some
modules in the settings to see how their omission affects the
FPS.
There's already a "blank" builtin which clears the fragment, but
there's no way to turn the cost to ~zero.  This "noop" builtin
does just that, absolutely nothing - it won't even trigger the
implicit "til_fb_fragment_t.cleared = 1" update so whatever
rendering follows will still find an uncleared fragment if that's
what came into the noop. | 
|  | This moves the core til_module_setup() functionality out into a
more composable helper with an eye towards all meta-modules using
it for doing their module setup instead of all the ad-hoc stuff
occurring presently.
Since til_module_setup_full() needs more paramaters to influence
its behavior, it's not directly usable as a "setup_func".  But is
trivially wrapped into one, which is what til_module_setup() has
now become.
It's in the wrapper that users are expected to do specify what's
necessary for influencing til_module_setup_full() to do what's
appropriate for their needs. | 
|  | The color stuff as-is isn't cheap and doesn't even get used if
there's a texture present, so don't bother with it at all.
This is especially significant since this module isn't threaded,
so it ties up all the cores leaving most of them idle when part
of a composition.
Also since spiro doesn't clamp its coordinates to the fragment
dimensions, only considering the frame dimensions, it must
continue using the "checked" put_pixel variant... | 
|  | This seems to make things work well enough for mingw+wine
Will probably revisit in the future.  Adding an ewouldblock
helper rather than duplicating the ifdeffery seems likely
Let's just leave it like this for now and find out if a real
windows test succeeds | 
|  | This is available in win32, unlike inet_aton() | 
|  | just quick and nasty hacking to make win32 build again | 
|  | This stows the duration (in ticks (which are ms for now)) of a
given module context's most recent, as well as tracking the max,
in til_module_context_t.
For now it's also added to --print-module-contexts output, but
this is still rather primitive and nearly inscrutible in
practice...  that output is a flickery and unsorted mess during
playback.  But this is just a start.
Ticks may need to move up to microseconds if it'll also be the
units for measuring timings.  Right now things are slow enough
that the low-hanging fruit stand out as multiple if not dozens of
milliseconds so it's still useful for now. | 
|  | In the early days of checkers when I introduced fill_module= with
the per-cpu contexts to still allow threaded rendering, the whole
seed passing to contexts thing wasn't as well sorted out.  This
meant the contexts often produced vastly different outputs
despite being the same module, same seed, and same settings.
The consequence of that was that w/fill_module checkers would
produce crazy randomized output when you expected the same output
in the filled cells.  But by using .cpu_affinity (which I had to
implement just for this use case actually) at least the different
outputs would become stable.  It was a band-aid over a different
problem that still needed sorting out.
Nowadays, it seems like this is improved enough at least for
alphazed to look correct without the affinity hack, so I'm
removing it because it really kills checkers threaded
performance.
Whatever modules remain uncooperative WRT seed reproducibility,
they'll just need to be fixed up. | 
|  | While here also did more minor optimizations moving things out of
the inner loops that were only there out of laziness... | 
|  | Taken from this excellent post:
https://mazzo.li/posts/vectorized-atan2.html
While I haven't gone full vectorized, just getting rid of the
regular atan2f() call will be a big improvement.
This just adds the functionality, nothing is calling it yet. | 
|  | Thin wrapper around gettimeofday(), prolly change the main ticks
stuff over to this too. | 
|  | This is just a first stab at threading shapes... whenever shapes
finds itself in a scene it easily becomes a significant
bottleneck, threading is a trivial path to improving that
somewhat.
While at it I also got rid of the need for _checked() variants
which also helps a bit.
But nothing here is proper optimization of the routines.. there's
too much math happening per pixel in a naive fashion. | 
|  |  | 
|  | trivial simplification | 
|  | Silly typo, one of those fun C instances where it's surprising
how silently mostly-working such a blatant mistake can be.
For posterity:
The way this was even observed as having an affect is while
verifying graceful handling of connections broken while in the
listen backlog.
With an active scener session idle at the prompt, start another
telnet, connecting without receiving any banner (queued via
backlog), ^]cl that backlogged telnet.   Then start another
telnet in the same way.  Now go to the idle scener session and
quit.  The latest telnet would just sit there, seemingly blocked
behind the broken-while-backlogged connection.
But what was really happening was the banner send got the error
on the broken connection after accepting, as you'd expect.  This
bug in the errno tests prevented detecting the genuine error
though, leaving the broken session connected indefinitely.
Fun! | 
|  | After putting the recv() in a for(;;) to not have to render a
frame per byte received, I completely dropped the ball on moving
the return and adding the continue to actually finish the change.
This makes creating new scenes via pasting long settings strings
far less laggy.  A future improvement would be to not recv() a
byte at a time, but this really isn't a perf-sensitive thing. |