Age | Commit message (Collapse) | Author |
|
Trivial optimization eliminates some instructions from the hot path,
no need to maintain a separate index from the current object pointer.
|
|
Previously every fb_fragment_t (and thus thread) was constructing
its own ray_camera_frame_t view into the scene, duplicating some
work.
Instead introduce ray_camera_fragment_t to encapsulate the truly
per-fragment state and make ray_scene_render_fragment() operate
on just this stuff with a reference to a shared
ray_camera_frame_t prepared once per-frame.
Some minor ray_camera.c cleanups sneak in as well (prefer multiply
instead of divide, whitespace cleanups...)
|
|
Currently fragments always start at the left edge of the frame, but
when switching to a tiling fragmenter this is no longer true and
causes visible errors.
|
|
ray:object intersection coordinates were incorrectly being computed
relative to the ray origin using a subtraction instead of addition, a
silly mistake with surprisingly acceptable results. Those results
were a result of other minor complementary mistakes compensating to
produce reasonable looking results.
In the course of experimenting with an acceleration data structure it
became very apparent that 3d space traversal vectors were not behaving
as intended, leading to review and correction of this code.
|
|
|
|
For now, a simple cpu multiplier of 64 is used.
fb_fragment_t needs a tiling fragment divider added...
|
|
Instead of creating fragment lists striped across available threads
uniformly in a round-robin fashion, just have the render threads iterate
across the shared fragments array using atomics.
This way non-uniform cost of rendering can be adapted to, provided the
module prepares the frame with sufficient fragment granularity.
In the ray tracer for example, it is quite common for some areas of the
screen to have lower complexity/cost than others. The previous model
distributed the fragments uniformly across the threads with no ability for
underutilized threads to steal work from overutilized threads in the event
of non-uniform cost distributions.
Now no attempt to schedule work is made. The render threads simply race
with eachother on a per-frame basis, atomically incrementing a shared
index into the frame's prepared fragemnts. The fragment size itself
represents the atomic work unit.
A later commit will change the various renderers to prepare more/smaller
fragments where appropriate. The ray tracer in particular needs more and
would probably further benefit from a tiling strategy, especially when
an acceleration data structure is introduced.
|
|
Small speedup, I personally find the code cleaner this way too.
Everything in the hot path should now be inlined, no function calls.
|
|
We can just assume the object which reflected the ray being tracing
isn't going to be intersected. Maybe later this assumption no longer
holds true, but it is true for now.
|
|
This gets rid of some computation on the primary ray:plane intersection tests
The branches on depth suck though... I'm leaning towards specialized primary
ray intersection test functions.
|
|
This gets rid of some computation on the primary ray:plane intersection tests
|
|
To enable prepare to precompute aspects of primary rays which all have a
common origin at the camera, bring the camera to ray_object*_prepare() and
bring the depth to ray_object*_intersects_ray() for primary ray detection.
This is only scaffolding, functionally unchanged.
|
|
This may need to be undone in the future when more sophisticated lights,
like area lights, are implemented. For now I can avoid polluting the
objects list with the lights by strictly separating them.
|
|
Remove unnecessary nearest_object check, the distance comparison alone
is sufficient when initialized to INFINITY.
|
|
Just tidying up shade_ray() before more optimizations.
|
|
Trivially removes a ray_3f_mult_scalar() from the hot path.
|
|
We can avoid some unnecessary work at the max depth by checking it in
shade_ray() instead.
|
|
|
|
This is functionally identical.
|
|
This function isn't currently used, but its implementation was awful.
|
|
Need to normalize the direction when we step the y axis and @ start.
|
|
powf() is slow but precise, this isn't the fastest method but it's
at least portable and a bit faster.
|
|
It's only necessary to normalize the direction stored vector in x_step(),
the rest can simply be linearly interpolated which saves some divides.
|
|
Simple optimization taking advantage of the prepare, mults generally
are cheaper than divs.
|
|
Just embed a _prepared struct in the object where precomputed stuff can be
cached. Gets called once before rendering, which ends up calling the
object-specific ray_object_$type_prepare() methods per object.
|
|
Prior to rototiller_module_t these headers were included
and the module-specific render functions called directly.
That's no longer the case, these files are irrelevant today.
|
|
This moves most of the particle system maintenance into the serially
executed sparkler_prepare_frame(), divides the frame into ncpus
fragments, and leaves the draw to occur concurrently.
The drawing must still currently process all particles and simply skips
drawing those falling outside the fragment.
Moving more of the computation out of prepare_frame() and into
render_fragment() is left for future improvements, as it's a bit
complex to do gainfully.
|
|
should_draw_expire_if_oob() assumed the fragment represented the entire
frame. Instead, return 0 if the coordinates are outside the fragment,
but only reset longevity if outside of the frame.
If sparkler goes threaded in the drawing, this would result in threads
simply skipping particles outside the fragment. The longevity reset
occurring in all threads appears suspicious but should be benign since
they all write the same thing - 0.
|
|
fb_fragment_put_pixel_unchecked() assumed the x,y coordinates of
the fragment were always 0.
|
|
|
|
The burst particle abused a zero mass to circumvent the effects of aging.
Instead use an explicit virtual flag to suppress aging, change busrt_cb
to ignore all virtual particles instead of only its center. Previously
burst_cb would thrust other bursts like any other particle, and this was
incorrect. Now burst centers are always stationary, even when they overlap
other bursts.
|
|
|
|
|
|
|
|
introduces create_context() and destroy_context() methods, and adds a
'void *context' first parameter to the module methods.
If a module doesn't supply create_context() then NULL is simply passed
around as the context, so trivial modules can continue to only implement
render_fragment().
A subsequent commit will update the modules to encapsulate their global
state in module-specific contexts.
|
|
Now that rototiller is generally threaded when a prepare_frame() method is
supplied, and modules/ray has been updated accordingly, discard the now
redundant ray-specific threading code and related stuff.
|
|
The ray tracer was already threaded, so this required little change other
than making some state global like the previous commits, and calling the
underlying non-threaded single-fragment scene renderer function.
A future commit will discard the now vestigial ray_threads related code.
|
|
Same basic changes as the previous commits made to julia and plasma.
|
|
Same general procedure as the previous commit made to the julia module.
|
|
Move maintenance of per-frame variables into julia_prepare_frame(), which
requires making them static file-scope globals for now.
Also make minor adjustments to the code to make less assumptions about the
fragment being rendered (like it's x/y coordinates being 0, etc.)
A future commit will probably add an initializer function to
rototiller_module_t, with an opaque pointer output which will be fed to all
the module methods so these globals can be encapsulated and instantiated.
|
|
This is a simple worker thread implementation derived from the ray_threads
code in the ray module. The ray_threads code should be discarded in a
future commit now that rototiller can render fragments using threads.
If a module supplies a prepare_frame() method, then it is called
per-frame to prepare a rototiller_frame_t which specifies how to divvy
up the page into fragments. Those fragments are then dispatched to a
thread per CPU which call the module's rendering function in parallel.
There is no coupling of the number of fragments in a frame to the number of
threads/CPUs. Some modules may benefit from the locality of tile-based
rendering, so the fragments are simply dispatched across the available CPUs
in a striped fashion.
Helpers will be added later to the fb interface for tiling fragments, which
modules desiring tiled rendering may utilize in their prepare_frame()
methods.
This commit does not modify any modules to become threaded, it only adds
the scaffolding.
|
|
Undecided on wether rototiller_frame_t should be fb_frame_t or not,
may change later.
|
|
Modules need to know the overall dimensions of the frame the fragment
they're rendering is part of. Previously it wasn't really necessary
since the fragments supplied to the modules had always been the full
page, but that's changing.
This commit also changes the julia module to use the frame dimensions,
others will need updating as well.
|
|
Adding more context to the name in anticipation of adding a prepare_frame()
method to the module struct.
|
|
Make consistent with the source directory structure naming.
|
|
|
|
The highlight on the little green sphere was white-washing
the entire thing due to its high specular reflection value.
This produces more reasonable results...
|
|
Was a constant at 20, this allows it to be specified per-object.
|
|
Make spheres a little more diverse in terms of specular/diffusion,
and minor tweak to the plane color.
|
|
|