-rw-r--r-- TODO-vmon.txt 126
1 files changed, 126 insertions, 0 deletions
diff --git a/TODO-vmon.txt b/TODO-vmon.txt
new file mode 100644
index 0000000..b3fc1bb
--- /dev/null
+++ b/TODO-vmon.txt
@@ -0,0 +1,126 @@
+vmon is being used internally at my $dayjob; as such, I can justify paying some
+bounties to anyone willing to work on these items for some cash.
+
+Payment is predicated on successful merging of the work, which means it must
+get through review by yours truly. We're not rewriting everything, we're not
+changing languages, we're making minimal changes to deliver what's asked.
+
+Before beginning any work listed here, please email <vcaputo@pengaru.com> to
+verify nobody else has started work on the same item, as well as to clarify
+what your intentions are and discuss any implementation details. Dollar
+amounts listed below are estimates; we will arrive at a firmer number in
+email. If you don't communicate with me before working on these things, be
+prepared for disappointment.
+
+There is no commitment being made here to merge anything you deliver, but
+should the work get merged, you will be paid.
+
+--------------------------------------------------------------------------------
+
+- vmon needs a generalized mechanism for runtime-defined static rows introduced
+ into the charts. Stuff like WiFi signal:noise levels, or
+ hwmon/thermal_zone_device temperature values vs. related throttles to plot
+ temperature in red with the related throttling action in cyan.
+
+ I'm leaning towards there being a CLI argument for introducing
+ a static row, with a syntax like:
+
+ --source "TYPE:label" "[TYPE-specific-args...]"
+
+ where the "--source source-args" position in argv influences its relative
+ order within the chart when appropriate, but --source+TYPE branches into
+ TYPE-specific cli handling of the rest which may or may not expect more
+ explicit positional/layout information regarding where to put this source in
+ the charts.
+
+ Beyond that, what happens in the TYPE-specific-args is entirely polymorphic
+ in the sense that the TYPE-specific code does whatever's needed there.
+
+ The first source to add should be some generic static single row wiring up
+ two sampled sources, like /sys/ nodes that each produce a single stringified
+ integer when read. Targeting temperature values... with a float multiplier
+ to scale what's read into the row's height linearly, like a %age.
+
+ e.g. --source "therm:radio0" "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"
+
+ where "therm" maps to a "therm" handler that knows how to parse and apply:
+ "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"
+
+ the "radio0" part after therm: would be what gets drawn over the row as a
+ label, the "therm" handler just needs to do the sampling and draw the meters
+ for the row.
+
+ this must be done in a generalized manner where "therm" can be
+ easily replaced with "bar" for plugging in a new "bar" handler,
+ as there will be additional static rows to add - like the
+ aforementioned signal:noise row.
+
+ Proposed bounty upon successful merge upstream: $400
+
+
+- Processes should have a memory row accompanying the user+sys cpu row. It's
+ only appropriate per-process, as threads share their process' address space.
+
+ It's unclear to me how to determine the scale to use for the memory rows, if
+ it should be a %age of the total system physical memory, or something else.
+ Maybe it would be better as a relative growth/shrinkage plot where the red
+ indicates %age RSS increased from the bottom up (inverted vs. cpu) and cyan
+ %age RSS decreased from top down (also inverted vs. cpu)
+
+ If the relative movement approach is used, it'd have to rely on a numeric
+ overlay to capture the current absolute RSS value.
+
+ This would be desirable to capture in the snowflakes when a process exits,
+ alongside the cpu utilization graphs.
+
+ A significant part of the work required to add memory rows to just the
+ processes will likely be breaking the assumption that it's a single row of
+ pixels per thread and process, uniformly. Instead it'll be double height for
+ only the processes.
+
+ libvmon doesn't currently collect memory stats, so this will require libvmon
+ work adding that without introducing a bunch of undue overhead, in addition
+ to likely invasive changes to charts.c/vcr.c to deal with existing
+ assumptions surrounding the per-process/per-thread row height being a uniform
+ thing... since processes would now have another row for memory, while threads
+ would continue containing only cpu.
+
+ Proposed bounty upon successful merge upstream: $1000
+
+
+- A CLI flag for turning vmon into a subreaper would be handy. It's fairly
+ trivial but does mean vmon would have to become more robust in its child
+ reaping to not accumulate zombies. This is regarding the
+ PR_SET_CHILD_SUBREAPER prctl... when you run a command under vmon in an
+ strace-like fashion, being a subreaper would capture orphaned descendants
+ like daemons so they don't leave vmon's scope upon becoming orphans inherited
+ by some ancestor subreaper, likely pid1.
+
+ Proposed bounty upon successful merge upstream: $100
+
+
+- charts.c should do the non-zero level detection in the integer domain, to
+ ensure no floating point precision/rounding errors can result in 0-height
+ bars in the graph for levels that are non-zero but scale to well below 1 pixel.
+
+ Proposed bounty upon successful merge upstream: $50
+
+
+- I've noticed that when testing vmon in embedded devices experiencing large
+ scheduling delays, enough that the Adherence row becomes solid red, vmon
+ still tends to produce an inconsistent duration in the bar graph. So
+ there's still some work to be done here in terms of not losing/adding time
+ in the long run when Adherence consistently slips.
+
+ The cases where I've observed this use 8000 width graphs with 1Hz sample
+ rates, combined with a heavily loaded system suffering from memory pressure
+ thrashing and scheduler contention. The snapshots are being saved every 25
+ minutes, but the actual time passed in the graph's X axis varies as much as
+ several minutes. This almost certainly has to do with the Adherence
+ handling and sample repeating done to fill in the missed samples. It'd be
+ nice to firm this area up such that the X axis distance traveled accurately
+ reflects the time passed. It might require maintaining a cumulative
+ fractional error value across samples and compensating for it, since
+ cumulative error is what's causing this.
+
+ Proposed bounty upon successful merge upstream: $100
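
The cumulative-error idea in the last item could be sketched roughly as follows. This is only an illustration under assumptions, not vmon's actual Adherence code: the `col_clock_t` type and `col_clock_advance()` function are hypothetical names, and the sketch assumes the elapsed time between wakeups is measured in integer nanoseconds (for example with `clock_gettime(CLOCK_MONOTONIC)`). The remainder that does not fill a whole column is carried into the next sample instead of being dropped, so the X axis cannot gain or lose time in the long run.

```c
#include <stdint.h>

/* Hypothetical state; none of these names come from vmon. */
typedef struct col_clock_t {
	uint64_t	period_ns;	/* nominal sample period, e.g. 1000000000 for 1Hz */
	uint64_t	carry_ns;	/* elapsed time not yet represented by a column */
} col_clock_t;

/* Given the measured time since the previous sample, return how many
 * columns the chart should advance.  The remainder is carried forward,
 * so (columns returned so far) * period_ns + carry_ns always equals the
 * total elapsed time fed in.
 */
static unsigned col_clock_advance(col_clock_t *c, uint64_t elapsed_ns)
{
	uint64_t	total = c->carry_ns + elapsed_ns;
	unsigned	cols = (unsigned)(total / c->period_ns);

	c->carry_ns = total % c->period_ns;

	return cols;
}
```

With a 1Hz period, three late wakeups of 1.3s, 1.3s and 1.4s would advance 1, 1 and 2 columns: 4 columns for 4 seconds. Rounding each wakeup independently would have produced only 3. Keeping the accumulator in integer nanoseconds rather than floating point is a design choice of this sketch so the carry itself cannot drift.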