Age | Commit message (Collapse) | Author |
|
s->count isn't always perfectly divisable by n_cpus, which is why
ctxt->n_elements is computed from n_cpus * elements_per_cpu in
the transition to threaded rendering for flow.
That's all fine and dandy, but the ctxt->elements initialization
loop was still using the vestigial s->count from the pre-threaded
implementation. So on core counts where ctxt->n_elements was
smaller than s->count, initialization scribbled.
Thanks Sketch for assistance in chasing this down w/ASAN enabled
on a box that exhibited crashing w/rtv,channels=flow.
|
|
57bae7 removed the default from the settings list when bumping the counts,
oops!
|
|
The convention has been to label threaded modules in their
description.
|
|
Now that there's threaded rendering, handling larger counts
without bogging down the frame rate on anything remotely modern
is feasible.
|
|
While optimizing the threaded rendering in commit 6d6c141, the
pos.{xy} expanding from 0-1 to -1..+1 were eliminated from the
inner loops in favor of just having the positions always in
-1..+1 coordinates. But I missed that it was only the x/y
coordinates which were being expanded, with .z being left in the
0-1 space, which had a desirable aesthetic effect of condensing
the Z space, flattening everything.
This commit undoes that, without reintroducing the expansion to
the inner loops. It's a bit crufty because now .z is treated
exceptionally throughout as 0..1 while {.x,.y} are in -1..+1, but
it's fine for now.
|
|
This exploits the just added multipass rendering support.
In the first pass, the flow-field is sampled and applied to the
elements, with every thread operating on its own subset of the
elements list. Since the flow-field sampling is all read-only,
it's perfectly safe too do in parallel. Nothing is drawn in the
first pass, it's only the elements updating according to the
flow-field which is performed.
In the second pass, the elements are rendered in parallel using
the slice_per_cpu fragmenter. Since the elements are kept on a
simple array, with no spatial indexing, every thread must visit
every element.
Since the fragmenter used divides the frame into horizontal
slices, every thread needing to reject elements not overlapping
its region can take some shortcuts in easily identifying elements
entirely outside its region. But the whole 3d->2d projection
step must still be performed for every element's current position
and +n_iters final position for the frame, which does have a
divide unfortunately.
Nonetheless, this change improves frame rates substantially on my
2c/4t i7 X230 as benchmarked w/--video=mem,1366x768:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 261
FPS: 265
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 1153
FPS: 3204
FPS: 2934
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 2923
FPS: 1634
FPS: 1592
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 1006
FPS: 219
FPS: 268
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 304
FPS: 350
FPS: 343
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 379
FPS: 503
FPS: 472
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 1393
FPS: 3822
FPS: 3876
---
Prior to this commit:
--seed=0x64fa9508 '--module=rtv,channels=flow,duration=3,context_duration=3,caption_duration=0,log_channels=on,snow_duration=0,snow_module=none' '--video=mem,size=1366x768'
rtv channel settings: 'flow,size=4,count=40000,speed=.8'
FPS: 53
FPS: 53
rtv channel settings: 'flow,size=4,count=1000,speed=.9'
FPS: 426
FPS: 1366
FPS: 1335
rtv channel settings: 'flow,size=8,count=5000,speed=.9'
FPS: 1097
FPS: 368
FPS: 367
rtv channel settings: 'flow,size=2,count=50000,speed=.4'
FPS: 279
FPS: 73
FPS: 74
rtv channel settings: 'flow,size=16,count=30000,speed=.8'
FPS: 71
FPS: 71
FPS: 70
rtv channel settings: 'flow,size=16,count=30000,speed=.02'
FPS: 136
FPS: 305
FPS: 305
rtv channel settings: 'flow,size=8,count=1000,speed=.16'
FPS: 972
FPS: 2593
FPS: 2634
|
|
Nothing too crazy here, the speed= setting still controls the
speed in lieu of something driving the tap.
|
|
This is too aggressive and produces some undesirable visible
artifacts on the periphery, especially for slow-moving
small-size fields.
In such scenarios the elements near the edges would be
excessively pruned when the direction wandered off-screen, then
leaving an overly sparse region when the direction inevitably
wandered back.
This is still an issue but it's far less prominent when only
clipping to the flow field boundaries... since the FOV doesn't
quite encompass the edges of the flow field. Now the elements
can survive wandering a bit off-screen, and re-enter.
|
|
This is a first stab at colorizing the output.
The flow field now has two v3f_t datums per cell, direction and
color.
It's a bit pastel-y and color choice/palettes definitely needs
work, at least some gamma correction would make sense.
But I kind of like the pastel look actually, some of the
combinations start looking very 80s aesthetic.
A good way to watch flow's possibilities is:
--module=rtv,channels=flow,duration=10,context_duration=10,caption_duration=0 \
--video=sdl,fullscreen=on --defaults --go
The long-ish duration really gives a chance to get into the
groove of things before switching
|
|
Simplify ff_new() failure path by using ff_free(), also make
ff_free() more ergonomic by returning NULL.
|
|
This is kind of a particle system, where the particles are pushed
around through a 3D vector space treated as a flow field.
No physics are being simulated here, it's just treating the flow
field as direction vectors that are trilinearly interpolated when
sampled to produce a single direction vector. That direction
vector gets applied to particles near it.
To keep things interesting the flow field evolves by having two
distinct flow fields which the simulation progressively
alternates sampling from. For every frame, both flow fields are
sampled for every particle, but how much weight is given to the
influence of one vs. the other varies by a triangle wave over
time. When the weight is biased enough to one of the flow fields
near a peak/valley in the triangle wave, the other gets
re-populated while its influence is negligible, also
interpolating its new values with 25% influence from the active
field.
The current flow field population routine is completely random.
Yet there's a surprising amount of emergent order despite being
totally randomized direction vectors.
Currently supported settings include:
size= the width of the 3D flow field cube in direction vectors
(the number of vectors is size*size*size)
count= the number of particles/elements
speed= how far a particle is moved along the current sample's
direction vector
This was first implemented in 2017, but sat unfinished in a topic
branch for myriad reasons. Now that rototiller has much more
robust settings infrastructure, among other things, it seemed
worth finishing this up and merging.
|