author | Vito Caputo <vcaputo@pengaru.com> | 2023-06-17 18:26:50 -0700 |
---|---|---|
committer | Vito Caputo <vcaputo@pengaru.com> | 2023-06-19 15:37:14 -0700 |
commit | a2f7397d289a21d1077c205e1d3c2beee7b39ac4 (patch) | |
tree | 10a8fe46f58e0862dbd0b343baf6e5a7127e7ffe | |
parent | f1d5b79982f02c62539da2505cb8a4cd402d4969 (diff) |
til: use 16 * n_cpus in til_fragmenter_slice_per_cpu()
Slight improvement of CPU utilization for fragmenters using this
strategy...
I noticed tile64 would give better FPS in some scenarios where it
seemed obvious slice_per_cpu() was the appropriate option. And
that turned out to just be by virtue of being able to give idle
threads something to do while busy ones finished what was on
their plate.
So just make the slices a bit more granular than n_cpus... this
may have to be revisited in the future to find the sweet spot,
and may need to be more sophisticated than just multiplying by a
constant factor.
-rw-r--r-- | src/til.c | 16 |
1 files changed, 14 insertions, 2 deletions
@@ -662,10 +662,22 @@ int til_module_setup_finalize(const til_module_t *module, const til_settings_t *
 }
 
-/* generic fragmenter using a horizontal slice per cpu according to context->n_cpus */
+/* generic fragmenter using a horizontal slice per cpu according to context->n_cpus (multiplied by a constant factor) */
 int til_fragmenter_slice_per_cpu(til_module_context_t *context, const til_fb_fragment_t *fragment, unsigned number, til_fb_fragment_t *res_fragment)
 {
-	return til_fb_fragment_slice_single(fragment, context->n_cpus, number, res_fragment);
+	/* The *16 is to combat leaving CPUs idle waiting for others to finish their work.
+	 *
+	 * Even though there's some overhead in scheduling smaller work units,
+	 * this still tends to result in better aggregate CPU utilization, up
+	 * to a point. The cost of rendering slices is often inconsistent,
+	 * and there's always a delay from one thread to another getting
+	 * started on their work, as well as scheduling variance.
+	 *
+	 * So it's beneficial to enable early finishers to pick
+	 * up slack of the laggards via slightly more granular
+	 * work units.
+	 */
+	return til_fb_fragment_slice_single(fragment, context->n_cpus * 16, number, res_fragment);
 }
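
To make the granularity change concrete, here is a minimal sketch of how cutting a frame into horizontal slices might work, and how multiplying the slice count by 16 shrinks each work unit. The helpers slice_rows() and count_slices() are hypothetical names made up for this illustration; they are not part of til and make no claim about how til_fb_fragment_slice_single() is actually implemented.

```c
#include <assert.h>

/* Hypothetical helper (NOT the til API): compute the row range of slice
 * `number` when `height` rows are cut into at most `n_slices` horizontal
 * slices.  Returns 1 and fills *y and *h when the slice exists, or 0 once
 * `number` is past the last slice.
 */
static int slice_rows(unsigned height, unsigned n_slices, unsigned number,
                      unsigned *y, unsigned *h)
{
	unsigned slice_h, start;

	assert(n_slices > 0);

	/* round up so that n_slices slices always cover every row */
	slice_h = (height + n_slices - 1) / n_slices;
	start = number * slice_h;
	if (start >= height)
		return 0;

	*y = start;
	/* the last slice may be shorter than the others */
	*h = (height - start < slice_h) ? height - start : slice_h;

	return 1;
}

/* Hypothetical helper: count how many slices a caller would be handed by
 * asking for slice 0, 1, 2, ... until slice_rows() reports none remain.
 */
static unsigned count_slices(unsigned height, unsigned n_slices)
{
	unsigned y, h, n = 0;

	while (slice_rows(height, n_slices, n, &y, &h))
		n++;

	return n;
}
```

Under these assumptions, a 1080-row frame on 4 CPUs yields 4 slices of 270 rows with the old factor, and 64 slices of at most 17 rows with n_cpus * 16, so a thread that finishes early has more small units left to pick up.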