Age | Commit message | Author |
|
Depending on where cancellation was happening, locks were potentially
left held which could result in the next thread being cancelled
deadlocking.
In deferred cancellation, only cancellation points realize the
cancellation.
So one worker thread could realize the cancellation on entering, say,
pthread_cond_wait, and exit with the associated mutex still held.
The other thread could be in the process of returning from
pthread_cond_wait - past the cancellation point already, and get stuck
in trying to acquire the mutex as pthread_cond_wait does before
returning, because the lock was left held by the other thread.
Instead, use the cleanup handlers to unlock the mutexes, and enable
asynchronous cancellation.
This seems to eliminate the observed occasional deadlocks on destroy.
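The cleanup-handler half of this fix can be sketched as below. This is a minimal illustration, not the rototiller code: worker(), work_mutex, work_cond and work_ready are hypothetical names, and the sketch keeps the default deferred cancellation type rather than enabling asynchronous cancellation.

```c
#include <pthread.h>

static pthread_mutex_t	work_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t	work_cond = PTHREAD_COND_INITIALIZER;
static int		work_ready;	/* hypothetical predicate guarded by work_mutex */

/* Cleanup handler: runs if the thread is cancelled while the handler is pushed. */
static void unlock_mutex(void *arg)
{
	pthread_mutex_unlock(arg);
}

/* Hypothetical worker: waits on the condition with the unlock handler installed. */
static void *worker(void *unused)
{
	(void) unused;

	pthread_mutex_lock(&work_mutex);
	pthread_cleanup_push(unlock_mutex, &work_mutex);
	while (!work_ready)
		pthread_cond_wait(&work_cond, &work_mutex);	/* cancellation point */
	pthread_cleanup_pop(1);	/* non-zero: run the handler, unlocking the mutex */

	return NULL;
}

/* Cancel a waiting worker; return 1 if the mutex was left free, 0 otherwise. */
static int cancel_leaves_mutex_free(void)
{
	pthread_t	thread;
	void		*res;
	int		is_free;

	if (pthread_create(&thread, NULL, worker, NULL))
		return 0;

	pthread_cancel(thread);
	pthread_join(thread, &res);

	is_free = (res == PTHREAD_CANCELED && pthread_mutex_trylock(&work_mutex) == 0);
	if (is_free)
		pthread_mutex_unlock(&work_mutex);

	return is_free;
}
```

Without the pushed handler, the cancelled worker would exit with work_mutex held and the trylock in cancel_leaves_mutex_free() would fail.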
|
|
For the sake of sdl_fb, move page flipping into the main thread
and run module render dispatch from another thread instead.
This eliminates the fb flipper thread, moving its functionality
into fb_flip() which synchronously consumes and performs a single
flip from the same queue as before - the function is verbatim
the loop body of the flipper thread.
Now main() calls fb_flip() in a loop where it previously dispatched
pages for rendering.
Rendering dispatch is now performed in a created thread.
See the comment in fb.c for more explanation of this shuffle.
|
|
With fb backends entirely abstracted behind fb_ops_t, this is
no longer necessary.
|
|
|
|
With drmsetup.c gone these are no longer used and I don't see
their use returning. Get rid of them.
|
|
The fb_ops entrypoints and their descendants are purely readers
of the settings, so constify their settings_t instances and the
operative functions which only read settings.
|
|
Since people are more likely to first run this from a GUI environment,
default to SDL which should work in most situations.
Then if they want they can switch to a linux console and explicitly use
the drm video backend.
|
|
|
|
This uses a simple fixed 640x480 windowed mode (for now).
The SDL2 Renderer & Texture API is used for vsync-synchronized presents.
There's probably excessive copying going on because the rototiller fb
code manages pages and flips but SDL2 doesn't really expose low-level
control of such things.
This backend is quite useful for development purposes, allowing quick
iteration in a windowed environment.
Note this is just the backend implementation, it's dormant code but
trivially activated.
|
|
This should probably be split into multiple commits, but
for simplicity's sake it's all cut over at once.
drm_fb.c sees major changes, migrating the remaining drm-specific bits
from drmsetup into it, behind the settings API.
rototiller.c sees a bunch of scaffolding surrounding the settings API
and wiring it up into the commandline handling and renderers and video
backends.
fb.[ch] see minor changes as settings get plumbed to the backend.
drmsetup.[ch] goes bye bye.
|
|
Preliminary means for interactively configuring settings and defaults
|
|
Nothing wired up yet.
|
|
Settings will be used to express configurable parameters in the
rendering modules and fb backends.
The goal is to address both commandline argument setting of parameters,
automatic use of defaults, as well as interactive configuration
including the outputting of the resulting settings in a form usable as
a commandline for future reuse.
Since settings can be numerous and highly varied from one module or
backend to another, a form similar to the Linux kernel's cmdline or
QEMU's approach has been adopted.
For example, a complete DRM backend, card selection and config would be:
rototiller --video=drm,dev=/dev/dri/card0,connector=LVDS-1,mode=1024x768@60
If any of the above were omitted, then the missing settings would be
interactively configured.
If you added --defaults, then any omissions would be automatically
filled in with the defaults.
i.e.
rototiller --video=drm,dev=/dev/dri/card4 --defaults
would use the preferred connector and mode for that card.
rototiller --video=drm --defaults
would do the same but also default to the /dev/dri/card0 path.
For brevity, I omitted rendering modules from above, but the same
approach applies simply to --module=:
rototiller --module=sparkler --video=drm --defaults
If you ran rototiller without any arguments, then a fully interactive
setup would ensue for module and video.
If you ran rototiller with just --defaults, then everything is
defaulted for you. A default rendering module will be used (the
original roto renderer, probably).
Note that this commit only adds scaffolding to make this possible,
none of this is wired up yet.
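To illustrate the settings string form described above, here is a rough sketch of looking up one key in a "name,key=value,..." string. settings_lookup() is a hypothetical helper, not part of the settings API this commit adds; a miss (return of 0) is the case that would fall through to interactive configuration or --defaults.

```c
#include <string.h>

/* Hypothetical helper: copy the value for key out of a "name,k=v,k=v" settings
 * string into value[len].  A NULL key fetches the leading bare name.
 * Returns 1 if found, 0 if absent or if the value does not fit.
 */
static int settings_lookup(const char *settings, const char *key, char *value, size_t len)
{
	const char	*p = settings;

	while (*p) {
		size_t		toklen = strcspn(p, ",");	/* this comma-delimited token */
		const char	*eq = memchr(p, '=', toklen);
		const char	*val = NULL;
		size_t		vallen = 0;

		if (!key && !eq) {
			/* bare token with no '=': the module/backend name */
			val = p;
			vallen = toklen;
		} else if (key && eq && (size_t)(eq - p) == strlen(key) &&
			   !strncmp(p, key, (size_t)(eq - p))) {
			val = eq + 1;
			vallen = toklen - (size_t)(eq - p) - 1;
		}

		if (val) {
			if (vallen >= len)
				return 0;

			memcpy(value, val, vallen);
			value[vallen] = '\0';

			return 1;
		}

		p += toklen;
		if (*p == ',')
			p++;
	}

	return 0;
}
```

For the --video= example above, looking up "connector" would yield "LVDS-1", and a NULL key would yield "drm".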
|
|
Remove everything drm-related from fb.c, utilizing the implementation in
drm_fb.c instead.
|
|
Largely mechanical copying of the drm code into the new fb_ops_t
abstraction. Dormant for now.
|
|
Hooks for fb acquire/release, page allocate, free, and flip.
This should encompass everything currently needed for the drm backend,
which will move behind this abstraction in a later commit.
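A hook table of this kind might look roughly like the sketch below. The member signatures are assumptions made for illustration (the real fb_ops_t may differ), and the in-memory "backend" is a hypothetical stand-in that only counts flips.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed shape of the hook table; the actual fb_ops_t signatures may differ. */
typedef struct fb_ops_t {
	int	(*acquire)(void *context);
	void	(*release)(void *context);
	void *	(*page_alloc)(void *context, size_t size);
	void	(*page_free)(void *context, void *page);
	int	(*page_flip)(void *context, void *page);
} fb_ops_t;

/* Hypothetical in-memory backend, used only to exercise the hooks. */
typedef struct mem_fb_t {
	unsigned	flips;
} mem_fb_t;

static int mem_acquire(void *context) { (void) context; return 0; }
static void mem_release(void *context) { (void) context; }
static void *mem_page_alloc(void *context, size_t size) { (void) context; return calloc(1, size); }
static void mem_page_free(void *context, void *page) { (void) context; free(page); }
static int mem_page_flip(void *context, void *page)
{
	mem_fb_t	*fb = context;

	(void) page;
	fb->flips++;

	return 0;
}

static const fb_ops_t mem_fb_ops = {
	.acquire = mem_acquire,
	.release = mem_release,
	.page_alloc = mem_page_alloc,
	.page_free = mem_page_free,
	.page_flip = mem_page_flip,
};

/* Backend-agnostic caller: allocate one blank page, flip to it, then tear down. */
static int fb_show_blank_page(const fb_ops_t *ops, void *context, size_t size)
{
	void	*page;
	int	r;

	assert(ops && size);

	if (ops->acquire(context) < 0)
		return -1;

	page = ops->page_alloc(context, size);
	if (!page) {
		ops->release(context);
		return -1;
	}

	r = ops->page_flip(context, page);
	ops->page_free(context, page);
	ops->release(context);

	return r;
}
```

The caller only ever touches the function pointers, which is what lets a drm or SDL implementation slot in behind the same table.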
|
|
Tidying this up a bit in preparation for ripping out all drm-specific
stuff from fb.[ch].
Future commits will refactor fb.c to utilize an fb_ops_t for hooks
to allocate, flip, and free pages.
|
|
Also constify the ray_euler_t basis.
|
|
|
|
This moves the per-object _prepared state into ray_render_object_$type
structs with all the rendering-related object methods switched to
operate on the new render structs.
Since the current rendering code just makes all these assumptions
about light objects being point lights, I've just dropped all the
stuff associated with rendering light objects for now. I think it
will be refactored a bit later on when the rendering code stops
hard-coding the point light stuff.
These changes open up the possibility of constifying the scene and
constituent objects, now that rendering doesn't shove the prepared
state into the embedded _prepared object substructs.
|
|
This introduces ray_render_t, and ray_render.[ch].
The _prepared member of ray_scene_t has been moved to ray_render_t,
and the other _prepared members (e.g. objects) will follow.
Up until now I've just been sticking the precomputed state under
_prepared members of their associated structures, and simply using
convention to enforce anything resembling an api boundary. It's
been convenient without being inefficient, but I'd like to move
the ray code into more of a reusable library and this wart needs
to be addressed.
The render state is also where any spatial indexes will be built
and maintained, another thing I've been experimenting with.
Note most of the churn here is just renaming ray_scene.c to
ray_render.c. A nearly global s/ray_scene/ray_render/ has occurred,
now that ray_scene_t really only serves as glue to bind objects,
lights, and scene-global properties into a cohesive unit.
|
|
Add a hook for post-render serialized frame completion;
some of the renderers may have state to clean up after rendering
a frame.
A future commit may add a return value to control flow for
features like multi-pass rendering within a given module.
The raytracer for example may want to add concurrently executed
post filters, and having a non-void return from finish_frame()
would be a tidy way to tell rototiller "go back to prepare->render
for this context" as many times as necessary, keeping the pass state
in the context.
For now its return is void however, as I just need a cleanup hook
as the raytracer becomes more stateful per frame with a BIH spatial
index in the works.
|
|
Before I can clean up the ray_scene_t._prepared kludge I need a
place to keep state from frame prepare to render, enter context.
Future commits will migrate the _prepared stuff into a separate
ray_render_t which is constructed on prepare then acted on in
fragment render.
Then spatial acceleration structures may be added, constructed
at prepare phase and shared across the concurrent rendering.
|
|
Remove some extraneous indentation
|
|
Commit 445e94 switched to using sentinel objects, but missed removal
of these obsoleted object counts.
|
|
There's no point computing more reflections if they're not going
to contribute substantially to the resulting sample. Previously
the max depth threshold solely controlled how many times a given
ray could reflect, this commit introduces a minimum relevance as
well. Value may require tuning, may actually make sense to move
into the scene description as a parameter.
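The interaction of the two cutoffs can be sketched as follows. MAX_DEPTH, MIN_RELEVANCE and count_bounces() are hypothetical names and values chosen for illustration, and a single constant reflectivity stands in for per-surface values.

```c
#define MAX_DEPTH	4	/* assumed maximum number of reflections */
#define MIN_RELEVANCE	0.05f	/* assumed minimum contribution worth tracing */

/* Count how many reflections get traced for surfaces of the given reflectivity.
 * Each bounce scales the remaining relevance; tracing stops at whichever limit
 * is reached first.
 */
static unsigned count_bounces(float reflectivity)
{
	float		relevance = 1.0f;
	unsigned	depth = 0;

	while (depth < MAX_DEPTH) {
		relevance *= reflectivity;
		if (relevance < MIN_RELEVANCE)
			break;	/* the next reflection would contribute too little */

		depth++;
	}

	return depth;
}
```

With these assumed values, highly reflective surfaces still hit the depth limit, while dull ones stop after a bounce or none at all.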
Brings a minor frame rate improvement.
|
|
Just cast buf to (void *) for the pointer arithmetic, stride is in
units of bytes and no assumptions should be made about its value
such as divisibility by 4.
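A sketch of byte-granular row addressing follows. Arithmetic on (void *) is a GNU C extension, so this version casts to (unsigned char *) instead, and uses memcpy() for the store since a row start may not be 4-byte aligned. It assumes stride is the byte distance between the starts of consecutive rows; put_pixel() and get_pixel() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit pixel at (x, y) in a buffer whose rows are stride bytes apart. */
static void put_pixel(void *buf, size_t stride, unsigned x, unsigned y, uint32_t pixel)
{
	unsigned char	*row;

	assert(buf);

	row = (unsigned char *)buf + (size_t)y * stride;	/* byte-wise row step */
	memcpy(row + (size_t)x * sizeof(pixel), &pixel, sizeof(pixel));
}

/* Load the 32-bit pixel at (x, y) from the same layout. */
static uint32_t get_pixel(const void *buf, size_t stride, unsigned x, unsigned y)
{
	const unsigned char	*row;
	uint32_t		pixel;

	assert(buf);

	row = (const unsigned char *)buf + (size_t)y * stride;
	memcpy(&pixel, row + (size_t)x * sizeof(pixel), sizeof(pixel));

	return pixel;
}
```

Had buf been stepped as a (uint32_t *) by stride / 4, a stride of 10 bytes would silently land on the wrong row.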
|
|
|
|
Mechanical cosmetic change
|
|
|
|
|
|
Rather than laying out all fragments in a frame up-front in
ray_module_t.prepare_frame(), return a fragment generator
(rototiller_fragmenter_t) which produces the numbered fragment
as needed.
This removes complexity from the serially-executed
prepare_frame() and allows the individual fragments to be
computed in parallel by the different threads. It also
eliminates the need for a fragments array in the
rototiller_frame_t, indeed rototiller_frame_t is eliminated
altogether.
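The generator idea can be sketched like this. fragment_t and slice_fragmenter() are hypothetical stand-ins for fb_fragment_t and a rototiller_fragmenter_t implementation, whose real signatures may differ.

```c
#include <assert.h>

/* Hypothetical, pared-down fragment: a rectangle within the frame. */
typedef struct fragment_t {
	unsigned	x, y, width, height;
} fragment_t;

/* Produce horizontal slice number num (of n_slices) of frame into *res.
 * Returns 1 when a fragment was produced, 0 once num runs past the frame.
 * No state is kept between calls, so threads can ask for any num concurrently.
 */
static int slice_fragmenter(const fragment_t *frame, unsigned n_slices, unsigned num, fragment_t *res)
{
	unsigned	slice, yoff;

	assert(frame && res);

	if (!n_slices)
		return 0;

	slice = (frame->height + n_slices - 1) / n_slices;	/* rows per slice, rounded up */
	yoff = num * slice;
	if (yoff >= frame->height)
		return 0;

	res->x = frame->x;
	res->y = frame->y + yoff;
	res->width = frame->width;
	res->height = (frame->height - yoff < slice) ? frame->height - yoff : slice;

	return 1;
}
```

Because fragment num is computed on demand from the frame alone, nothing needs to be laid out in advance.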
|
|
|
|
Trivial optimization eliminates some instructions from the hot path,
no need to maintain a separate index from the current object pointer.
|
|
Previously every fb_fragment_t (and thus thread) was constructing
its own ray_camera_frame_t view into the scene, duplicating some
work.
Instead introduce ray_camera_fragment_t to encapsulate the truly
per-fragment state and make ray_scene_render_fragment() operate
on just this stuff with a reference to a shared
ray_camera_frame_t prepared once per-frame.
Some minor ray_camera.c cleanups sneak in as well (prefer multiply
instead of divide, whitespace cleanups...)
|
|
Currently fragments always start at the left edge of the frame, but
when switching to a tiling fragmenter this is no longer true and
causes visible errors.
|
|
ray:object intersection coordinates were incorrectly being computed
relative to the ray origin using a subtraction instead of addition, a
silly mistake with surprisingly acceptable results. Those results
came from other minor complementary mistakes compensating to
produce reasonable-looking output.
In the course of experimenting with an acceleration data structure it
became very apparent that 3d space traversal vectors were not behaving
as intended, leading to review and correction of this code.
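For reference, the corrected relationship is point = origin + direction * distance. A minimal sketch, with v3f_t and ray_point_at() as hypothetical stand-ins for the ray_3f helpers:

```c
/* Hypothetical 3-component float vector. */
typedef struct v3f_t {
	float	x, y, z;
} v3f_t;

/* Point reached by travelling distance along direction from origin.
 * Note the addition: subtracting here walks backwards along the ray.
 */
static v3f_t ray_point_at(v3f_t origin, v3f_t direction, float distance)
{
	v3f_t	point = {
		origin.x + direction.x * distance,
		origin.y + direction.y * distance,
		origin.z + direction.z * distance,
	};

	return point;
}
```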
|
|
|
|
For now, a simple cpu multiplier of 64 is used.
fb_fragment_t needs a tiling fragment divider added...
|
|
Instead of creating fragment lists striped across available threads
uniformly in a round-robin fashion, just have the render threads iterate
across the shared fragments array using atomics.
This way non-uniform cost of rendering can be adapted to, provided the
module prepares the frame with sufficient fragment granularity.
In the ray tracer for example, it is quite common for some areas of the
screen to have lower complexity/cost than others. The previous model
distributed the fragments uniformly across the threads with no ability for
underutilized threads to steal work from overutilized threads in the event
of non-uniform cost distributions.
Now no attempt to schedule work is made. The render threads simply race
with each other on a per-frame basis, atomically incrementing a shared
index into the frame's prepared fragments. The fragment size itself
represents the atomic work unit.
A later commit will change the various renderers to prepare more/smaller
fragments where appropriate. The ray tracer in particular needs more and
would probably further benefit from a tiling strategy, especially when
an acceleration data structure is introduced.
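The racing scheme can be sketched with C11 atomics as below. This is an illustration under assumed names (frame_work_t, render_thread(), render_frame()) and fixed counts, not the rototiller threading code; "rendering" is reduced to bumping a per-fragment counter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define N_FRAGMENTS	64	/* assumed fragment count for the frame */
#define N_THREADS	4	/* assumed number of render threads */

typedef struct frame_work_t {
	atomic_uint	next;			/* shared index of the next unclaimed fragment */
	unsigned	rendered[N_FRAGMENTS];	/* times each fragment was "rendered" */
} frame_work_t;

/* Each thread claims fragments by atomically incrementing the shared index. */
static void *render_thread(void *arg)
{
	frame_work_t	*work = arg;
	unsigned	idx;

	while ((idx = atomic_fetch_add(&work->next, 1)) < N_FRAGMENTS)
		work->rendered[idx]++;	/* stand-in for rendering fragment idx */

	return NULL;
}

/* Render one frame; returns how many fragments were rendered exactly once. */
static unsigned render_frame(void)
{
	frame_work_t	work;
	pthread_t	threads[N_THREADS];
	unsigned	n_created = 0, n_once = 0;

	memset(work.rendered, 0, sizeof(work.rendered));
	atomic_init(&work.next, 0);

	for (unsigned i = 0; i < N_THREADS; i++) {
		if (pthread_create(&threads[n_created], NULL, render_thread, &work) == 0)
			n_created++;
	}

	for (unsigned i = 0; i < n_created; i++)
		pthread_join(threads[i], NULL);

	for (unsigned i = 0; i < N_FRAGMENTS; i++)
		n_once += (work.rendered[i] == 1);

	return n_once;
}
```

A thread that draws cheap fragments simply comes back for more sooner, so no explicit scheduling or work stealing is needed.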
|
|
Small speedup, I personally find the code cleaner this way too.
Everything in the hot path should now be inlined, no function calls.
|
|
We can just assume the object which reflected the ray being traced
isn't going to be intersected. Maybe later this assumption no longer
holds true, but it is true for now.
|
|
This gets rid of some computation on the primary ray:plane intersection tests.
The branches on depth suck though... I'm leaning towards specialized primary
ray intersection test functions.
|
|
This gets rid of some computation on the primary ray:plane intersection tests.
|
|
To enable prepare to precompute aspects of primary rays which all have a
common origin at the camera, bring the camera to ray_object*_prepare() and
bring the depth to ray_object*_intersects_ray() for primary ray detection.
This is only scaffolding, functionally unchanged.
|
|
This may need to be undone in the future when more sophisticated lights,
like area lights, are implemented. For now I can avoid polluting the
objects list with the lights by strictly separating them.
|
|
Remove unnecessary nearest_object check, the distance comparison alone
is sufficient when initialized to INFINITY.
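Why the distance comparison suffices can be sketched as below. nearest_hit() is a hypothetical reduction of the search loop: it takes precomputed per-object distances, with a miss encoded as INFINITY (an assumption of this sketch).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Return the index of the nearest hit among n per-object distances, or -1 if
 * nothing was hit.  With nearest starting at INFINITY, the first real hit
 * always compares closer, so no separate "have we found one yet" test is needed.
 */
static int nearest_hit(const float *distances, size_t n)
{
	float	nearest = INFINITY;
	int	nearest_idx = -1;

	assert(distances || !n);

	for (size_t i = 0; i < n; i++) {
		if (distances[i] < nearest) {
			nearest = distances[i];
			nearest_idx = (int)i;
		}
	}

	return nearest_idx;
}
```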
|
|
Just tidying up shade_ray() before more optimizations.
|
|
Trivially removes a ray_3f_mult_scalar() from the hot path.
|
|
We can avoid some unnecessary work at the max depth by checking it in
shade_ray() instead.
|