-rw-r--r-- | TODO-vmon.txt | 126 |
1 files changed, 126 insertions, 0 deletions
diff --git a/TODO-vmon.txt b/TODO-vmon.txt
new file mode 100644
index 0000000..b3fc1bb
--- /dev/null
+++ b/TODO-vmon.txt
@@ -0,0 +1,126 @@
+vmon is being used internally at my $dayjob, as such I can justify paying some
+bounties to anyone willing to work on these items for some cash.
+
+Payment is predicated on successful merging of the work, which means it must
+get through review by yours truly. We're not rewriting everything, we're not
+changing languages, we're making minimal changes to deliver what's asked.
+
+Before beginning any work listed here, please email <vcaputo@pengaru.com> to
+verify nobody else has started work on the same item, as well as to clarify
+what your intentions are and discuss any implementation details. Dollar
+amounts listed below are estimates, we will arrive at a firmer number in
+email. If you don't communicate with me before working on these things, be
+prepared for disappointment.
+
+There is no commitment being made here to merge anything you deliver, but
+should the work get merged, you will be paid.
+
+--------------------------------------------------------------------------------
+
+- vmon needs a generalized mechanism for runtime-defined static rows introduced
+  into the charts. Stuff like WiFi signal:noise levels, or
+  hwmon/thermal_zone_device temperature values vs. related throttles to plot
+  temperature in red with the related throttling action in cyan.
+
+  I'm leaning towards there being a CLI argument for introducing
+  a static row, with a syntax like:
+
+    --source "TYPE:label" "[TYPE-specific-args...]"
+
+  where the "--source source-args" position in argv influences its relative
+  order within the chart when appropriate, but --source+TYPE branches into
+  TYPE-specific cli handling of the rest which may or may not expect more
+  explicit positional/layout information regarding where to put this source in
+  the charts.
+
+  Beyond that, what happens in the TYPE-specific-args is entirely polymorphic
+  in the sense that the TYPE-specific code does whatever's needed there.
+
+  The first source to add should be some generic static single row wiring up
+  two sampled sources like /sys/ nodes producing a single stringified integer
+  when each read from. Targeting temperature values... with a float multiplier
+  to scale what's read into the row's height linearly, like a %age.
+
+  e.g. --source "therm:radio0" "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"
+
+  where "therm" maps to a "therm" handler that knows how to parse and apply:
+  "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"
+
+  the "radio0" part after therm: would be what gets drawn over the row as a
+  label, the "therm" handler just needs to do the sampling and draw the meters
+  for the row.
+
+  this must be done in a generalized manner where "therm" can be
+  easily replaced with "bar" for plugging in a new "bar" handler,
+  as there will be additional static rows to add - like the
+  aforementioned signal:noise row.
+
+  Proposed bounty upon successful merge upstream: $400
+
+
+- Processes should have a memory row accompanying the user+sys cpu row. It's
+  only appropriate per-process, as threads share their process' address space.
+
+  It's unclear to me how to determine the scale to use for the memory rows, if
+  it should be a %age of the total system physical memory, or something else.
+  Maybe it would be better as a relative growth/shrinkage plot where the red
+  indicates %age RSS increased from the bottom up (inverted vs. cpu) and cyan
+  %age RSS decreased from top down (also inverted vs. cpu)
+
+  If the relative movement approach is used, it'd have to rely on a numeric
+  overlay to capture the current absolute RSS value.
+
+  This would be desirable to capture in the snowflakes when a process exits,
+  alongside the cpu utilization graphs.
+
+  A significant part of the work required to add memory rows to just the
+  processes will likely be breaking the assumption that it's a single row of
+  pixels per thread and process, uniformly. Instead it'll be double height for
+  only the processes.
+
+  libvmon doesn't currently collect memory stats, so this will require libvmon
+  work adding that without introducing a bunch of undue overhead, in addition
+  to likely invasive changes to charts.c/vcr.c to deal with existing
+  assumptions surrounding the per-process/per-thread row height being a uniform
+  thing... since processes would now have another row for memory, while threads
+  would continue containing only cpu.
+
+  Proposed bounty upon successful merge upstream: $1000
+
+
+- A CLI flag for turning vmon into a subreaper would be handy. It's fairly
+  trivial but does mean vmon would have to become more robust in its child
+  reaping to not accumulate zombies. This is regarding the
+  PR_SET_CHILD_SUBREAPER prctl... when you run a command under vmon in an
+  strace-like fashion, being a subreaper would capture orphaned descendants
+  like daemons so they don't leave vmon's scope upon becoming orphans inherited
+  by some ancestor subreaper, likely pid1.
+
+  Proposed bounty upon successful merge upstream: $100
+
+
+- charts.c should do the non-zero level detection in the integer domain, to
+  ensure no floating point precision/rounding errors can result in 0-height
+  bars in the graph for non-zero but well below 1 pixel when scaled levels.
+
+  Proposed bounty upon successful merge upstream: $50
+
+
+- I've noticed that when testing vmon in embedded devices experiencing large
+  scheduling delays, enough that the Adherence row becomes solid red, vmon
+  still tends to produce an inconsistent duration in the bar graph. So
+  there's still some work to be done here in terms of not losing/adding time
+  in the long run when Adherence consistently slips.
+
+  The cases where I've observed this use 8000 width graphs with 1Hz sample
+  rates, combined with a heavily loaded system suffering from memory pressure
+  thrashing and scheduler contention. The snapshots are being saved every 25
+  minutes, but the actual time passed in the graph's X axis varies as much as
+  several minutes. This almost certainly has to do with the Adherence
+  handling and sample repeating done to fill in the missed samples. It'd be
+  nice to firm this area up such that the X axis distance traveled accurately
+  reflects the time passed. It might require maintaining a cumulative
+  fractional error value across samples to compensate for, since cumulative
+  error is what's causing this.
+
+  Proposed bounty upon successful merge upstream: $100
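
The "cumulative fractional error" idea in the last item could look roughly like the sketch below. This is only an illustration under assumptions, not vmon code: col_clock_t and col_clock_advance() are hypothetical names, and it assumes the caller measures the time elapsed between wakeups with a monotonic clock. The point is that the leftover time after each wakeup is carried forward in integer nanoseconds instead of being dropped, so the number of columns drawn tracks the time that actually passed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for illustration only; not part of vmon. */
typedef struct col_clock_t {
	uint64_t	period_ns;	/* nominal time per chart column, e.g. 1s at 1Hz */
	uint64_t	residue_ns;	/* elapsed time not yet turned into a column */
} col_clock_t;

/* Returns how many columns the chart should advance for elapsed_ns of
 * real time since the previous call.  Whatever doesn't fill a whole
 * column stays in residue_ns and counts towards the next call, so late
 * wakeups neither lose nor add time in the long run.
 */
static uint64_t col_clock_advance(col_clock_t *c, uint64_t elapsed_ns)
{
	uint64_t	total;

	assert(c->period_ns > 0);

	total = c->residue_ns + elapsed_ns;
	c->residue_ns = total % c->period_ns;

	return total / c->period_ns;
}
```

With a 1s period, wakeups arriving after 1.4s, 1.4s, 2.7s and 0.5s advance the chart by 1, 1, 3 and 1 columns: six columns for six seconds, with nothing left in the residue. How the extra columns get filled (repeating the last sample or otherwise) is a separate decision for the Adherence handling.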