Age | Commit message (Collapse) | Author |
|
While the montage doesn't deeply thread the per-tile/module rendering,
the per-frame rendering is threaded with a work unit granularity of
every module's tile.
Meaning every module renders its tile in a single thread, but the
tiles are all rendered in parallel.
For the most part this works, and will only work better as more modules
are added to rototiller increasing the granularity. In the mean time
it's a bit coarse and some modules can be a lot more costly to render
than others, and there can be a shortage of modules to schedule on idle
CPUs.
It would be an interesting task to try make each module's tile get
subfragmented elastically. I didn't make any attempt to do that, but
it might even be worthwhile on hidpi screens where even those small
tiles may have a whole lot of pixels, especially on manycore CPUs.
|
|
|
|
This requires a forward declaration of v3f_t and changing din()
to take a v3f_t *.
The swab module needed updating to supply a pointer type and a
v3f_t definition.
This is being done so din.h users can have their own v3f
implementations. I might consolidate all the duplicated vector
code scattered throughout the libs and modules, but for now I'm
carrying on with the original intention of having modules be
largely self-contained. Though the introduction of libs like
ray and din has certainly violated that a bit already.
|
|
Since snow_context_t needs another member anyways, stick n_cpus in there
to inform the fragmenter of precisely how many fragments to make.
This renderer doesn't benefit from tiling or any such locality, so it uses
the slice fragmenter and really only benefits from as many fragments as there
are CPUs. Any additional fragments is just wasted fragmenting overhead.
|
|
Snow was already threaded, but used a global seed with rand_r() meaning
the CPUs were hammering on the same address. There wasn't any locking or
atomics, as it isn't terribly critical when generating white noise if the
seed access is racy. But the writes still caused cache lines to ping-pong.
This commit gives a ~15.5% speedup in my measurements on an i7-2640M.
Note without the padded union, using just an array of ints, zero gain
is realized. I used a padding of 256 just to have some headroom, x86
is 64 but other CPUs vary, POWER9 is 128 for example.
|
|
This repurposes the generic fb_fragment_tile_single() to better fit
the montage use case.
Partially covered areas are simply skipped, and tiles no longer need
to be square.
In determining the tile width and height, I'm just using the sqrt of
the number of modules and dividing the frame width and height. But
when the sqrt has a fraction, it's rounded up on the width divide
and rounded down on the height divide. So the width gets the extra
column of tiles, and the height just throws away the fraction.
I think it's OK for now, until someone gets a bug up their ass and
wants to avoid having vacant tiles in the bottom right corner when
the number of modules doesn't cooperate.
One problem with the just skipping partially covered areas is they
don't get zeroed out - no fragment is ever generated for them.
To fix that there will prolly have to be a fb_fragment_zero() of
the frame @ prepare :(. I guess it wouldn't be the end of the
world if the fragmenter itself zeroed out skipped regions. It's
kind of an ugly layering violation but this is a private montage-
specific fragmenter.
|
|
The old approach was just to get things working, it's preferable
to not have empty tiles on-screen where modules were skipped and
have all tiles be smaller to accomodate vacancies.
Now the modules list gets pruned @ context create, so the skipping
only happens once and everywhere else is looking at a modules list
and count of only the keepers.
I also added stars to the skipped modules, for now, since both stars
and pixbounce malfunction when the fragment size changes.
|
|
Not doing this produces especially visible artifacts when shown
by rtv.
|
|
Segfaults were observed when montage came up in rtv, since pixbounce
doesn't seem to be rendering properly at all just skip it for now.
I suspect what's happening is rtv ran pixbounce before running montage,
and pixbounce caches fragment knowledge @ initialization. So when
montage ran pixbounce in a tile, that stale fragment knowledge was used
and caused scribbling.
Stars probably has similar problems actually.
|
|
This is somewhat unfinished as it uses the generic tiled fragmenter
that's not interested in appearances but prioritizes total coverage
and simplicity.
Montage should have its own tiler that can produce non-square and even
non-uniform tile dimensions, prioritizing filling the screen with
mostly-uniform tiles.
But that's a TODO item, this is good enough for now and exercises some
fragment details previously irrelevant and often ignored/broken in
modules.
The pixbounce module in particular seems completely broken with small
fragment sizes.
|
|
Mechanical change removing abbreviation for consistency
|
|
Mostly mechanical change, though threads.c needed some jiggering to
make the logical cpu id available to the worker threads.
Now render_fragment() can easily addresss per-cpu data created by
create_context().
|
|
Back in the day, there was no {create,destroy}_context(), so passing
num_cpus to just prepare_frame made sense. Modules then would
implicitly initialize themselves on the first prepare_frame() call
using a static initialized variable.
Since then things have been decomposed a bit for more sophisticated
(and cleaner) modules. It can be necessary to allocate per-cpu data
structures and the natural place to do that is @ create_context(). So
this commit wires that up.
A later commit will probably have to plumb a "current cpu" identifier
into the render_fragment() function. Because a per-cpu data structure
isn't particularly useful if you can't easily address it from within
your execution context.
|
|
This makes the visualization more interesting by adding more variety.
|
|
To facilitate random setting of these flexible string-oriented
settings, support a random helper supplied with the description.
This helper would return a valid random string to be used with the
respective setting being described.
Immediate use case is the rtv module, which also gets fixed up to
use it in this commit.
|
|
This is just a quick stab at randomizing settings, only multiple
choice setings are randomized currently.
For modules with settings, a new Settings: field is added to the
caption showing the settings as the arguments one would pass to
rototiller's module argument.
|
|
This maps a different Z-slice through the noise field to each color
channel. The slices are moved up and down through the field over
time, and the size of the area each color samples is tweaked a bit
to make them less coherent with the noise field cells.
It could be improved, but I think the output is already neat enough
to be worth sharing.
|
|
Consolidate the time() calls in setup_next_module() by using a now
variable.
|
|
This broke when snow was added.
|
|
The idea is to have captions similar to how MTV did back in the 80s.
It'd be nice to make the text resolution independent, but this is a
good first stab for an afternoon of tooling around.
|
|
This simplifies the bsp code while addressing cleanup
|
|
This assert prevents using the chunker for efficient freeing,
maybe in the future add a flag for toggling this but for now
it can just be commented out.
|
|
particles_free() didn't do all the necessary cleanup.
bsp_free() remains mostly unimplemented. I think this wasn't done at
the time because I was thinking bsp.c should use the chunker, then
cleanup is just a matter of freeing the chunker instead of traversing
the bsp.
|
|
This uses the newly added snow module as a transition between modules
|
|
I wanted to add some noise to the rtv module and figured why not
just add a snow module and make rtv pass through it briefly when
switching modules.
It's not interesting by itself, but as more composite/meta modules
like rtv get made it might be handy beyond rtv.
|
|
This is sort of a meta renderer, as it simply renders other
modules in its prepare_frame() stage. They're still threaded
as the newly public rototiller_module_render() utilizes the
threading machinery, it just needs to be called from the serial
phase @ prepare_frame().
I'm pretty sure this module will leak memory every time it changes
modules, since the existing cleanup paths for the modules hasn't
needed to be thorough in the least. So that's something to fix
in a later commit, go through all the modules and make sure their
destroy_context() entrypoints actually cleans everything up.
See the source for some rtv-specific TODOs.
|
|
color banding has been quite visible, and somewhat expected with a
direct conversion from the linear float color space to the 8-bit
integral rgb color components.
A simple lookup table is used here to non-linearly map the values, table
generation is taken from Greg Ward's REAL PIXELS gem in Graphics Gems II.
|
|
Initially I was going to make 32 vs. 64 be a setting, but decided
now that SDL is supported it's fairly likely there will be odd fb
dimensions (arbitrary window sizes). Since this never really brought
anything of significant value, just drop the version that mostly
just demonstrated how to pack multiple pixels into a single u64 write
to the framebuffer more than anything else.
|
|
This removes the submit-softly module, instead using a runtime
setting to toggle bilinear interpolation on the submit module.
|
|
|
|
|
|
Viscosity and diffusion are supported, it'd be neat to add a
configurable size (the ROOT define) for the flow field in the
future.
I didn't go crazy here, it's just a list of orders of magnitude you
choose from for each. It'd probably be more interesting to change
this into a single knob with descriptive names like "smoke" "goop"
"water" mapping to a LUT.
|
|
s/Joe/Jos/, I should wear my glasses more.
|
|
This implements near verbatim the code found in the paper titled:
Real-Time Fluid Dynamics for Games
By Jos Stam
It sometimes has the filename GDC03.PDF, or Stam_fluids_GDC03.pdf
The density field is rendered using simple linear interpolation of
the samples, in a grayscale palette. No gamma correction is being
performed.
There are three configurable defines of interest:
VISCOSITY, DIFFUSION, and ROOT.
This module is only threaded in the drawing stage, so basically the
linear interpolation uses multiple cores. The simulation itself is
not threaded, the implementation from the paper made no such
considerations.
It would be nice to reimplement this in a threaded fashion with a
good generalized API, then move it into libs. Something where a unit
square can be sampled for interpolated densities would be nice.
Then extend it into 3 dimensions for volumetric effects...
|
|
Remove the silly kludge avoiding peripheral cells
|
|
|
|
This substantially reworks the cell sampling in submit.
As a result, it's now threaded in the rendering phase which now
resembles a texture mapper sans transformations.
This produces a full-screen rendering rather than a potentially
smaller one when the resolution wasn't cleanly divisable by the grid
size.
A new module, named submit-softly has also been added to expose the
bilinearly interpolated variant. The transition between cells is also
employing a smoothstep so it's not actually linear.
The original non-interpolated version is retained as well, at the same
submit module name.
Some minor cleanups happened as well, nothing worth mentioning, except
perhaps that the cells are now a uint8_t which is fine unless someone
tries to redefine NUM_PLAYERS > 255.
|
|
Just making things consistent, also dropping unnecessary player
assert from submit module. Future libs/grid may explore using
the "unassigned" zero player in taken calls for unassigning
cells like in simultaneously taken collision scenarios.
|
|
|
|
This module displays realtime battle for domination simulated
as 2D cellular automata.
This is just a test of the backend piece for a work-in-progress
multiplayer game idea. The visuals were kind of interesting to
watch so I figured may as well merge it as a module to share.
Enjoy!
PS: the results can vary a lot by tweaking the defines in submit.c
|
|
Rather than require adding -Isrc/libs/$lib to every Makefile.am for
every lib used, just add -Ilibs to those makefiles and prefix the lib
dir in the #include <> header paths.
Later I'll probably just move the -Isrc/libs someplace common so the
per-module Makefile.am doesn't need to bother with this stuff.
|
|
This is the first step of breaking out all the core rendering stuffs
into reusable libraries and making modules purely compositional,
consumers of various included rendering/effects libraries.
Expect multiple modules leveraging libray for a variety of scenes and
such. Also expect compositions mixing the various libraries for more
interesting visuals.
|
|
|
|
Fixes silly cosmetic error in configure output for checking libdrm...
|
|
also const the ray_euler_t basis
|
|
|
|
This moves the per-object _prepared state into ray_render_object_$type
structs with all the rendering-related object methods switched to
operate on the new render structs.
Since the current rendering code just makes all these assumptions
about light objects being point lights, I've just dropped all the
stuff associated with rendering light objects for now. I think it
will be refactored a bit later on when the rendering code stops
hard-coding the point light stuff.
These changes open up the possibility of constifying the scene and
constituent objects, now that rendering doesn't shove the prepared
state into the embedded _prepared object substructs.
|
|
This introduces ray_render_t, and ray_render.[ch].
The _prepared member of ray_scene_t has been moved to ray_render_t,
and the other _prepared members (e.g. objects) will follow.
Up until now I've just been sticking the precomputed state under
_prepared members of their associated structures, and simply using
convention to enforce anything resembling an api boundary. It's
been convenient without being inefficient, but I'd like to move
the ray code into more of a reusable library and this wart needs
to be addressed.
The render state is also where any spatial indexes will be built
and maintained, another thing I've been experimenting with.
Note most of the churn here is just renaming ray_scene.c to
ray_render.c. A nearly global s/ray_scene/ray_render/ has occurred,
now that ray_scene_t really only serves as glue to bind objects,
lights, and scene-global properties into a cohesive unit.
|
|
Before I can clean up the ray_scene_t._prepared kludge I need a
place to keep state from frame prepare to render, enter context.
Future commits will migrate the _prepared stuff into a separate
ray_render_t which is constructed on prepare then acted on in
fragment render.
Then spatial acceleration structures may be added, constructed
at prepare phase and shared across the concurrent rendering.
|
|
Remove some extraneous indentation
|