TODO-vmon.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123

vmon is being used internally at my $dayjob, as such I can justify paying some
bounties to anyone willing to work on these items for some cash.

Payment is predicated on successful merging of the work, which means it must
get through review by yours truly.  We're not rewriting everything, we're not
changing languages, we're making minimal changes to deliver what's asked.

Before beginning any work listed here, please email <vcaputo@pengaru.com> to
verify nobody else has started work on the same item, as well as to clarify
what your intentions are and discuss any implementation details.  Dollar
amounts listed below are estimates, we will arrive at a firmer number in
email.  If you don't communicate with me before working on these things, be
prepared for disappointment.

There is no commitment being made here to merge anything you deliver, but
should the work get merged, you will be paid.

--------------------------------------------------------------------------------

- vmon needs a generalized mechanism for runtime-defined static rows introduced
  into the charts.  Stuff like WiFi signal:noise levels, or
  hwmon/thermal_zone_device temperature values vs.  related throttles to plot
  temperature in red with the related throttling action in cyan.

  I'm leaning towards there being a CLI argument for introducing
  a static row, with a syntax like:

  --source "TYPE:label" "[TYPE-specific-args...]"

  where the "--source source-args" position in argv influences its relative
  order within the chart when appropriate, but --source+TYPE branches into
  TYPE-specific cli handling of the rest which may or may not expect more
  explicit positional/layout information regarding where to put this source in
  the charts.

  Beyond that, what happens in the TYPE-specific-args is entirely polymorphic
  in the sense that the TYPE-specific code does whatever's needed there.

  The first source to add should be some generic static single row wiring up
  two sampled sources like /sys/ nodes producing a single stringified integer
  when each read from.  Targeting temperature values... with a float multiplier
  to scale what's read into the row's height linearly, like a %age.

  e.g. --source "therm:radio0" "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"

  where "therm" maps to a "therm" handler that knows how to parse and apply:
  "/sys/class/thermal_zone_device10/temp,min,max;/sys/class/cooling_device/throttle,min,max"

  the "radio0" part after therm: would be what gets drawn over the row as a
  label, the "therm" handler just needs to do the sampling and draw the meters
  for the row.

  this must be done in a generalized manner where "therm" can be
  easily replaced with "bar" for plugging in a new "bar" handler,
  as there will be additional static rows to add -  like the
  aforementioned signal:noise row.

  Proposed bounty upon successful merge upstream: $400


- Processes should have a memory row accompanying the user+sys cpu row.  It's
  only appropriate per-process, as threads share their process' address space.

  It's unclear to me how to determine the scale to use for the memory rows, if
  it should be a %age of the total system physical memory, or something else.
  Maybe it would be better as a relative growth/shrinkage plot where the red
  indicates %age RSS increased from the bottom up (inverted vs. cpu) and cyan
  %age RSS decreased from top down (also inverted vs. cpu)

  If the relative movement approach is used, it'd have to rely on a numeric
  overlay to capture the current absolute RSS value.

  This would be desirable to capture in the snowflakes when a process exits,
  alongside the cpu utilization graphs.

  A significant part of the work required to add memory rows to just the
  processes will likely be breaking the assumption that it's a single row of
  pixels per thread and process, uniformly.  Instead it'll be double height for
  only the processes.

  libvmon doesn't currently collect memory stats, so this will require libvmon
  work adding that without introducing a bunch of undue overhead, in addition
  to likely invasive changes to charts.c/vcr.c to deal with existing
  assumptions surrounding the per-process/per-thread row height being a uniform
  thing... since processes would now have another row for memory, while threads
  would continue containing only cpu.

  Proposed bounty upon successful merge upstream: $1000


- charts.c should do the non-zero level detection in the integer domain, to
  ensure no floating point precision/rounding errors can result in 0-height
  bars in the graph for non-zero but well below 1 pixel when scaled levels.

  Proposed bounty upon successful merge upstream: $50


- I've noticed that when testing vmon in embedded devices experiencing large
  scheduling delays, enough that the Adherence row becomes solid red, vmon
  still tends to produce an inconsistent duration in the bar graph.  So
  there's still some work to be done here in terms of not losing/adding time
  in the long run when Adherence consistently slips.

  The cases where I've observed this use 8000 width graphs with 1Hz sample
  rates, combined with a heavly loaded system suffering from memory pressure
  thrashing and scheduler contention.  The snapshots are being saved every 25
  minutes, but the actual time passed in the graph's X axis varies as much as
  several minutes.  This almost certainly has to do with the Adherence
  handling and sample repeating done to fill in the missed samples.  It'd be
  nice to firm this area up such that the X axis distance traveled accurately
  reflects the time passed.  It might require maintaining a cumulative
  fractional error value across samples to compensate for, since cumulative
  error is what's causing this.

  Proposed bounty upon successful merge upstream: $200


- When the process hierarchy extends to/beyond the bottom of the chart, if it
  retracts to again fit well within the chart's bounds, the blank padding rows
  aren't restored.  Instead, whatever residual contents leftover from when the
  hierarchy went to the edge and beyond gets retained as the padding rows.

  Proposed bounty upon successful merge upstream: $100