til: use 16 * n_cpus in til_fragmenter_slice_per_cpu()

Slight improvement of CPU utilization for fragmenters using this strategy. I noticed tile64 would give better FPS in some scenarios where slice_per_cpu() seemed the obviously appropriate option. That turned out to be simply because tile64 is able to give idle threads something to do while busy ones finish what's on their plate. So just make the slices a bit more granular than n_cpus. This may have to be revisited in the future to find the sweet spot, and may need to be more sophisticated than just multiplying by a constant factor.
author: Vito Caputo <vcaputo@pengaru.com> 2023-06-17 18:26:50 -0700
committer: Vito Caputo <vcaputo@pengaru.com> 2023-06-19 15:37:14 -0700
commit: a2f7397d289a21d1077c205e1d3c2beee7b39ac4 (patch)
tree: 10a8fe46f58e0862dbd0b343baf6e5a7127e7ffe
parent: f1d5b79982f02c62539da2505cb8a4cd402d4969 (diff)
1 files changed, 14 insertions, 2 deletions
diff --git a/src/til.c b/src/til.c
index 78838ef..fdcce68 100644
--- a/src/til.c
+++ b/src/til.c
@@ -662,10 +662,22 @@ int til_module_setup_finalize(const til_module_t *module, const til_settings_t *
 }
 
 
-/* generic fragmenter using a horizontal slice per cpu according to context->n_cpus */
+/* generic fragmenter using a horizontal slice per cpu according to context->n_cpus (multiplied by a constant factor) */
 int til_fragmenter_slice_per_cpu(til_module_context_t *context, const til_fb_fragment_t *fragment, unsigned number, til_fb_fragment_t *res_fragment)
 {
-	return til_fb_fragment_slice_single(fragment, context->n_cpus, number, res_fragment);
+	/* The *16 is to combat leaving CPUs idle waiting for others to finish their work.
+	 *
+	 * Even though there's some overhead in scheduling smaller work units,
+	 * this still tends to result in better aggregate CPU utilization, up
+	 * to a point.  The cost of rendering slices is often inconsistent,
+	 * and there's always a delay from one thread to another getting
+	 * started on their work, as well as scheduling variance.
+	 *
+	 * So it's beneficial to enable early finishers to pick
+	 * up slack of the laggards via slightly more granular
+	 * work units.
+	 */
+	return til_fb_fragment_slice_single(fragment, context->n_cpus * 16, number, res_fragment);
 }
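
The reasoning in the comment above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: slice_rows() and simulate_makespan() are hypothetical helpers written for this illustration, not part of til, and slice_rows() is not the implementation of til_fb_fragment_slice_single(). The model also assumes that whichever worker goes idle first takes the next slice, and that each row has a fixed, known cost, which real rendering does not.

```c
#define MAX_WORKERS 64 /* arbitrary cap for this sketch */

/* Hypothetical helper: row range of slice `number` when `height` rows are
 * split into `n_slices` horizontal slices.  Returns 0 once `number` is past
 * the last non-empty slice, 1 otherwise.
 */
static int slice_rows(unsigned height, unsigned n_slices, unsigned number,
                      unsigned *y, unsigned *h)
{
	unsigned slice_h, start;

	if (!height || !n_slices || number >= n_slices)
		return 0;

	slice_h = (height + n_slices - 1) / n_slices; /* round up */
	start = number * slice_h;
	if (start >= height)
		return 0;

	*y = start;
	*h = (height - start < slice_h) ? height - start : slice_h;
	return 1;
}

/* Hypothetical model: hand slices out in order to whichever worker becomes
 * idle first, and return the time at which the last worker finishes.
 * row_cost[r] is an assumed, fixed cost of rendering row r.
 */
static unsigned long simulate_makespan(const unsigned *row_cost, unsigned height,
                                       unsigned n_workers, unsigned n_slices)
{
	unsigned long busy_until[MAX_WORKERS] = {0}, makespan = 0;
	unsigned number, y, h, r, w, idle;

	if (!n_workers || n_workers > MAX_WORKERS)
		return 0;

	for (number = 0; slice_rows(height, n_slices, number, &y, &h); number++) {
		unsigned long cost = 0;

		for (r = y; r < y + h; r++)
			cost += row_cost[r];

		/* pick the worker that frees up earliest */
		for (idle = 0, w = 1; w < n_workers; w++)
			if (busy_until[w] < busy_until[idle])
				idle = w;

		busy_until[idle] += cost;
	}

	for (w = 0; w < n_workers; w++)
		if (busy_until[w] > makespan)
			makespan = busy_until[w];

	return makespan;
}
```

With 64 rows where the top 16 cost 10 units each and the rest cost 1, four workers and four slices finish at 160 units, because one worker gets all the expensive rows. With 4 * 16 = 64 slices the same model finishes at 52 units, which is the ideal 208 / 4. The model ignores per-slice scheduling overhead, which is why the benefit only holds "up to a point" in practice.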