Age | Commit message (Collapse) | Author |
|
computing the angle for every pixel coordinate from the origin is
costly, even with the approximate method added by c690303.
An easy speedup is to only do this once for a given frame
dimensions, and cache those results. In the form of a 32-bit
float, it's equivalent to caching a full page of pixel data.
This is slightly complicated by needing to be an effectively
global cache and the potential for multiple shapes contexts
rendering concurrently when part of a composition.
I think this particular situation highlights a need for something
equivalent generalized on-stream where modules can register
discoverable caches of costly to compute information, having a
high probability of being useful to others.
In this particular case it was alphazed's use of shapes in two
layers that made it an obvious win, even without any other
modules needing atan2() per-pixel with a centered origin.
With this commit:
Configured settings as flags:
--seed=0x64adabae '--module=compose,layers=blank\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=.25\\\,pinch_spin\\\=.1\\\,pinches\\\=7\\\,points\\\=19\\\,spin\\\=.01\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=1\\\,pinch_spin\\\=-.25\\\,pinches\\\=8\\\,points\\\=5\\\,spin\\\=0,texture=moire\,centers\=3' '--video=mem,size=1366x768'
FPS: 73
FPS: 74
FPS: 73
Without:
Configured settings as flags:
--seed=0x64adb857 '--module=compose,layers=blank\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=.25\\\,pinch_spin\\\=.1\\\,pinches\\\=7\\\,points\\\=19\\\,spin\\\=.01\,shapes\\\,type\\\=pinwheel\\\,scale\\\=.9\\\,pinch\\\=1\\\,pinch_spin\\\=-.25\\\,pinches\\\=8\\\,points\\\=5\\\,spin\\\=0,texture=moire\,centers\=3' '--video=mem,size=1366x768'
FPS: 55
FPS: 54
FPS: 54
So it's significant, and in alphazed there's also a transition
from one scene with two full-screen shapes layers into a
checkered scene with shapes as the fill_module. Further
amplifying the payoff.. infact whenever shapes is used for a
fill_module in checkers, there's n_cpus shapes contexts created
because checkers is threaded. All of those would be benefitting
from the cache.
|
|
What this really should have is a general memset32()
implementation, but for now it's a small improvement at least for
clearing since that's known to be a 0-pixel case where memset()
is applicable.
memset() isn't usable for actual RGBA pixels since it operates
strictly on bytes. There's some hacks for using memmove() to get
closer where you store the first four bytes, then memove the
start of the buffer to the remaining, doubling the size of the
move each time for log(n) calls to memmove. Maybe that makes
sense as a stop-gap for the general case, it needs
implementing+benchmarking first.
This commit is dead simple though and an obvious little FPS gain
for clearing.
|
|
When profiling compositions it's convenient to suppress some
modules in the settings to see how their omission affects the
FPS.
There's already a "blank" builtin which clears the fragment, but
there's no way to turn the cost to ~zero. This "noop" builtin
does just that, absolutely nothing - it won't even trigger the
implicit "til_fb_fragment_t.cleared = 1" update so whatever
rendering follows will still find an uncleared fragment if that's
what came into the noop.
|
|
This moves the core til_module_setup() functionality out into a
more composable helper with an eye towards all meta-modules using
it for doing their module setup instead of all the ad-hoc stuff
occurring presently.
Since til_module_setup_full() needs more paramaters to influence
its behavior, it's not directly usable as a "setup_func". But is
trivially wrapped into one, which is what til_module_setup() has
now become.
It's in the wrapper that users are expected to do specify what's
necessary for influencing til_module_setup_full() to do what's
appropriate for their needs.
|
|
The color stuff as-is isn't cheap and doesn't even get used if
there's a texture present, so don't bother with it at all.
This is especially significant since this module isn't threaded,
so it ties up all the cores leaving most of them idle when part
of a composition.
Also since spiro doesn't clamp its coordinates to the fragment
dimensions, only considering the frame dimensions, it must
continue using the "checked" put_pixel variant...
|
|
This seems to make things work well enough for mingw+wine
Will probably revisit in the future. Adding an ewouldblock
helper rather than duplicating the ifdeffery seems likely
Let's just leave it like this for now and find out if a real
windows test succeeds
|
|
This is available in win32, unlike inet_aton()
|
|
just quick and nasty hacking to make win32 build again
|
|
This stows the duration (in ticks (which are ms for now)) of a
given module context's most recent, as well as tracking the max,
in til_module_context_t.
For now it's also added to --print-module-contexts output, but
this is still rather primitive and nearly inscrutible in
practice... that output is a flickery and unsorted mess during
playback. But this is just a start.
Ticks may need to move up to microseconds if it'll also be the
units for measuring timings. Right now things are slow enough
that the low-hanging fruit stand out as multiple if not dozens of
milliseconds so it's still useful for now.
|
|
In the early days of checkers when I introduced fill_module= with
the per-cpu contexts to still allow threaded rendering, the whole
seed passing to contexts thing wasn't as well sorted out. This
meant the contexts often produced vastly different outputs
despite being the same module, same seed, and same settings.
The consequence of that was that w/fill_module checkers would
produce crazy randomized output when you expected the same output
in the filled cells. But by using .cpu_affinity (which I had to
implement just for this use case actually) at least the different
outputs would become stable. It was a band-aid over a different
problem that still needed sorting out.
Nowadays, it seems like this is improved enough at least for
alphazed to look correct without the affinity hack, so I'm
removing it because it really kills checkers threaded
performance.
Whatever modules remain uncooperative WRT seed reproducibility,
they'll just need to be fixed up.
|
|
While here also did more minor optimizations moving things out of
the inner loops that were only there out of laziness...
|
|
Taken from this excellent post:
https://mazzo.li/posts/vectorized-atan2.html
While I haven't gone full vectorized, just getting rid of the
regular atan2f() call will be a big improvement.
This just adds the functionality, nothing is calling it yet.
|
|
Thin wrapper around gettimeofday(), prolly change the main ticks
stuff over to this too.
|
|
This is just a first stab at threading shapes... whenever shapes
finds itself in a scene it easily becomes a significant
bottleneck, threading is a trivial path to improving that
somewhat.
While at it I also got rid of the need for _checked() variants
which also helps a bit.
But nothing here is proper optimization of the routines.. there's
too much math happening per pixel in a naive fashion.
|
|
|
|
trivial simplification
|
|
Silly typo, one of those fun C instances where it's surprising
how silently mostly-working such a blatant mistake can be.
For posterity:
The way this was even observed as having an affect is while
verifying graceful handling of connections broken while in the
listen backlog.
With an active scener session idle at the prompt, start another
telnet, connecting without receiving any banner (queued via
backlog), ^]cl that backlogged telnet. Then start another
telnet in the same way. Now go to the idle scener session and
quit. The latest telnet would just sit there, seemingly blocked
behind the broken-while-backlogged connection.
But what was really happening was the banner send got the error
on the broken connection after accepting, as you'd expect. This
bug in the errno tests prevented detecting the genuine error
though, leaving the broken session connected indefinitely.
Fun!
|
|
After putting the recv() in a for(;;) to not have to render a
frame per byte received, I completely dropped the ball on moving
the return and adding the continue to actually finish the change.
This makes creating new scenes via pasting long settings strings
far less laggy. A future improvement would be to not recv() a
byte at a time, but this really isn't a perf-sensitive thing.
|
|
This changes things so rkt won't exit with an error @ startup if
RocketEditor isn't already listening. It also tolerates
RocketEditor going away, and will show a "OFFLINE" overlay status
text should that happen w/connect=on.
Some status text has also been added to the "EXIT SCENE" 99999
scene for both the RocketEditor connection and the scener
enabled/disabled status. No indicator yet for if scener has a
connection though, only if it's listening or not via listen=on.
|
|
Remove spurious space
|
|
|
|
Trivial related indentation adjustment too
|
|
This adds a BBS-style interface for creating new scenes in a live
rkt session.
It listens on tcp port 54321 on localhost by default, just use
telnet to connect, the rest is fairly self-explanatory.
This is still early days, but it's a whole lot more than nothing.
|
|
Returning the failed desc was just a lazy half-assed thing that
was sort of the best option in the simpler, pre-paths world.
But now that everything has paths thanks to recursive settings,
let's just return the path to the failed setting's desc.
This conveniently gets rid of a UAF bug when setup returned the
setting->desc as the failed desc, and main would print the desc
*after* freeing all the settings in its final moments.
But the best part is now more of the errors parsing settings
should be accompanied by an illuminatingly relevant setting path.
Previously you'd at best get a bare key from the desc, but often
no failed desc was returned at all and you saw no guidance at
all. But with the recent improvements to the setup error
handling I think those cases should be few if not entirely
eliminated.
|
|
It was too coarse a condition filtering the list on !name.
This change actually looks up non-NULL names to verify it's a
resolvable module name before producing the unfiltered list.
Mostly just to prevent fat-fingered manual module inputs from
resulting in unexpectedly larger and potentially dangerous module
lists in interactive "invalid input" retries. Observed in the
WIP rkt_scener stuff...
|
|
It just happened to be tolerable that the first setting in these
randomized settings instances wasn't ever getting described...
due to the ad-hoc nature of how rtv used them.
But in preparation for rkt_scener which will have "randomize
scene" functionality, as well as interactive (re)configuring of
the scenes, these settings instances need to be more correct to
not break things.
It seems awkward to be duplicating this "Renderer module" desc
here however. But I just want to make progress on the scene
editor for rkt, I suspect this will be revisited in the future.
On a high-level it feels like til_module_setup() should be the
thing producing that desc, and the randomizer just picking the
random settings through that setup func.
|
|
With the "ref" builtin module established and seeming to work
well enough, it looks unlikely we'll need access to unresolvable
module names in the contexts.
The thinking originally was that these names might have special
syntax making them more generic. E.g something like
"@/module/rkt/scenes/[1]/drizzle" for a module_name would have
been supported, which would get resolved either at context create
or even later (as in the ref builtin) at render time.
But the ref builtin is using a path setting, so module names just
stay module names.
Maybe in the future a special syntax will be added for brevity
reasons, but it does make the code more complicated vs. module
names just being names and resolving them entirely at setup time.
Anyhow, this commit does away with the module_name in the
context's scenes. You can still access it via
til_module_context_t.module.name anyways... it's basically just a
resolution-of-name-to-context time constraint that's being
codified now.
|
|
This is just temporary until all the setup_funcs always return a
res_setting on -EINVAL.
Currently they only cover this requirement when res_setup is
NULL...
|
|
fix desc leak
|
|
In the interests of improving error handling of interactive
setups, return the setting that was invalid when setup returns
-EINVAL.
For now this is only supported for non-finalized/non-baking setup
calls, i.e. (!res_setup).
In the future I want this also supported for finalizing when
res_setup is set. A lot of the -EINVAL returns are actually
during that stage (or need to be added) due to that being when
the more sensitive parsing occurs going from strings to native
numeric types and such.
The main reason it's not already being done in this commit is
it'll churn quite a bit to get there, since the setup funcs don't
generally have the setting pointers onhand at that phase. It'll
require changing the string-value-centric local variables to
instead all be the til_setting_t* holding those values.
And if setup funcs aren't going to be so value-centric, the
settings API will likely change to not even return the values
directly anymore and only return the full-blown til_settings_t as
the main return pointer, perhaps rid of their res_setting use
even.
Anyway, while here I did some random little cleanups too.
|
|
There was only til_setting_desc_fprint_path() (for
setup_interactively()), but for rkt_scener this stuff has to go
out the socket and the queueing mechanism is built around
til_str_t so let's get a til_str_t-centric variant in there.
|
|
This was a UAF bug when the context being freed contained a
driving tap... just careless placement of the untap call.
It needs to occur before destroying the context, because module
contexts are often used to store the til_tap_t instances.
|
|
When setting up a nested module, the HERMETIC modules shouldn't
be considered as eligible nor should one be the preffered option
(rtv).
This uses the presence of a parent on the settings as a heuristic
for if it's a nested scenario (no parent == root). When nested
the default changes to "compose" from "rtv", and HERMETIC modules
are omitted from the listed values.
|
|
This was assuming non-NULL res_setup which would assert in
til_module_setup_finalize() which doesn't expect a NULL
res_setup since that's its entire purpose.
It's not a bug that was actually triggered by any callers (yet)
|
|
trivial change
|
|
I get why I called this seed at the time as it was the starting
string of a potentially larger buildup... but it's just
ambiguous naming with the other more descriptive uses of seed in
the tree, and isn't even really appropriate.
|
|
In rkt_scener it's desirable to fully construct a new settings
instance *before* wiring it into the parent rkt/scenes settings.
But the way these positional settings labels get constructed
depends on the parent's settings entries. So it becomes a
chicken-egg problem to require the proper label @ creation before
its containing setting is even added in the parent's entries.
With this change a temporary WIP label can be used while
constructing the settings instance, then amend it once we know
the settings instance is complete and ready to join the party.
|
|
Introduce til_settings_get_parent(), impetus being so
til_module_setup() can infer if it's hermetic-appropriate or not.
|
|
It's more ergonomic more often to behave consistently with
strlen() here, plus it's just the established mental model.
While here I made til_settings_path_as_buf() private as nothing
external uses it, it's essentially just a logically distinct
private helper function from the public wrappers around it.
Dragged into this changeset due to clarifying some
naming/semantics as it's one of the few til_str_to_buf() callers.
But nobody was actually passing a non-NULL res_bufsz/res_len to
it anyways, as its use is minimal.
|
|
Helper for trimming off a trailing CRNL or NL if present
Clearly I once knew perl if this is the name that came to mind
|
|
You can't just reuse the ap in multiple calls to vsnprintf
without restarting... fixed in the obvious way
|
|
Make this a distinct heap allocation so it can be enlarged when
editing the scenes... (preparatory commit for scenes editing)
|
|
Preparatory commit for adding an interactive scene editing server
of sorts. It'll go in a separate listing, but needs these types
as it'll operate on rkt_context_t->scenes[].
|
|
I think this got messed up in the shift to libtil
|
|
See previous commits re: drm_fb/mem_fb
|
|
Let's some a more user-friendly ways of exiting...
|
|
til_fb_fragment_slice_single() and indirectly
til_fragmenter_slice_per_cpu() could get into infinite loops when
slicing small fragments into many slices.
This became more likely with commit a2f7397 which increased
per_cpu slice counts by 16X, which is how I tripped over it
running rtv. A checkers,size=8,fill_module=moire sent things
spinning...
This commit prevents it in the obvious manner.
|
|
See previous commit re: mem_fb
|
|
Snapshotting made page alloc/free more frequent, and
modules/mixer really hammers on snapshots in fade mode... drizzle
as well, but mixer snapshots both module outputs per frame.
So by keeping these around the reclaimed page snapshots can be
held onto at within the fb as spares for quicker allocation. In
practice it should just hit a high water mark of a working set
for spare pages, once in a steady state, assuming snapshots
aren't leaking.
Future commits will replicate this change in {drm,sdl}_fb
|
|
Slight improvement of CPU utilization for fragmenters using this
strategy...
I noticed tile64 would give better FPS in some scenarios where it
seemed obvious slice_per_cpu() was the appropriate option. And
that turned out to just be by virtue of being able to give idle
threads something to do while busy ones finished what was on
their plate.
So just make the slices a bit more granular than n_cpus... this
may have to be revisited in the future to find the sweet spot,
and may need to be more sophisticated than just multiplying by a
constant factor.
|