Age | Commit message (Collapse) | Author |
|
This is really how things are implemented today, which may
actually be incorrect in some edge case scenarios... but let's
assert it holds true currently to aid debugging some spurious
asserts in vcr_shift_below_row_up_one() about row vs.
hierarchy_end.
The potential issue I see with this assumption as-is is it's
entirely possible to have descendants survive a parent's demise,
grandchildren don't have to exit when a parent does. But it
might be OK to treat it that way, as they'll be rediscovered as
children of PID 1, and there's no strict need to preserve
continuity of their associated charts state across that
transition. It's rare enough that I don't think it's worth
worrying about, but maybe this is what's happening with the
asserts during startup specifically; when things are daemonizing
/ double forking etc.
|
|
The deferred pass only enters draw_chart() once regardless of
this_sample_duration, with the idx always 0.
So when this_sample_duration > 1 (stalls/repeated samples), the
conditional draw_overlay_row() would only get entered in the
non-deferred passes in deferred mode, which are short-circuited
within draw_overlay_row() because we don't want to render that
stuff in those passes in deferred mode.
The fix is trivial; always enter draw_overlay_row() for the
deferred pass.
This fixes a cosmetic artifact where you'd see stale / missing
overlays in the hierarchy rows of output, when sample durations
were falling behind schedule enough for this_sample_duration to
be greater than 1.
|
|
Since vcr_t implements rendering of borders and backgrounds, to
such an extent that when serializing mem->png for headless mode
it produces the background and border on the fly on a per-row
basis, let's just give it the ability to access the marker
distance in vwm_charts_t and draw the markers as needed.
It feels hacky to be passing pointers to these values but I
really despise repeating setters across abstractions to plumb
things through, so I'm doing the stupid simple thing here.
|
|
This introduces the concept of border markers intended to serve
as timeline references/milestones.
Here only the minimal API for setting their distance is added,
nothing is actually implemented yet.
|
|
Now that this just wraps vcr_shadow_row() in a post-vcr world
it's pointless, so let's get rid of it.
No functional difference.
|
|
Purely cosmetic, no functional change.
|
|
Double precision is unnecessary for this, use floats throughout,
at least for everything vmon related.
|
|
This is a minor trivial optimization turning some frequent
divides into multiplies which are generally less costly to
compute.
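A minimal sketch of the general technique, not the actual change: the
helper name and parameters below are hypothetical. The reciprocal is
computed once outside the loop, so each element costs a multiply
instead of a divide.

```c
#include <stddef.h>

/* Hypothetical helper: scale raw deltas to a fraction of the total.
 * The single divide is hoisted; the loop body only multiplies. */
static void scale_deltas(const unsigned *deltas, float *out, size_t n, unsigned total)
{
	float inv_total = total ? 1.0f / (float)total : 0.0f;

	for (size_t i = 0; i < n; i++)
		out[i] = (float)deltas[i] * inv_total;
}
```

Note that multiplying by a reciprocal can round differently than
dividing, which is fine for graph bars but worth knowing.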
|
|
This really needs to be a clock unaffected by ntp adjustments,
which CLOCK_MONOTONIC_RAW seems to provide.
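For illustration, reading such a clock might look like the sketch
below. now_us() is a hypothetical helper name, and CLOCK_MONOTONIC_RAW
is Linux-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: microseconds from a clock that NTP does not
 * adjust.  Returns -1 if the clock can't be read. */
static int64_t now_us(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) < 0)
		return -1;

	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```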
|
|
I prefer this be on its own in the upper right corner.
|
|
This draws the new scheduling "adherence" metric in a row below the top
IOWait/Idle% row.
The headings have moved down one to cover "adherence" instead,
which I think should help make the important IOWait/Idle% row
more visible as well as improving headings readability.
The adherence row should generally be either black or red, rarely
cyan.
Red indicates %age of sampling interval behind schedule for the
given sample, Cyan indicates same but ahead of schedule which
should be unusual/almost never happen. In fact, I think that with the
current sharing of the "close enough" epsilon as the adherence
truncating threshold, the ahead-of-schedule "close enough"
situations will always get truncated to zero. So it might be
impossible to see any cyan adherence as-is right now.
A future commit will move the '\/\/\ # %name @ Hz' heading up to
the IOWait/Idle% putting it back in the upper right corner, but
only that one.
|
|
This will likely be made more dynamic in the future, but for now
there's a need to shift "rest" down another row to make room for
the "adherence" row. This is a simple way to accommodate that,
another preparatory commit.
|
|
This is an attempt to add a schedule adherence metric which a
subsequent commit will plot in a row below the top IOWait/Idle%
row.
Ideally the adherence metric's value would always be 0, because
we're always exactly on-time with our samples.
But what tends to happen is falling behind, or rarely being
slightly ahead of schedule (particularly with the epsilon
introduction).
This metric can serve as a sort of proxy for userspace's ability
to get scheduled on time, which is a useful thing to see.
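One way such a metric could be computed, as a rough sketch only: the
function name, the float-seconds units, and normalizing by the
interval are all assumptions here, not necessarily what the code does.

```c
/* Hypothetical sketch: signed fraction of the sampling interval by
 * which a sample missed its scheduled time.
 * > 0 is behind schedule, < 0 is ahead, 0 is on time. */
static float adherence(float scheduled_secs, float actual_secs,
		       float interval_secs, float epsilon_secs)
{
	float delta = actual_secs - scheduled_secs;

	/* anything within the epsilon is treated as exactly on time */
	if (delta > -epsilon_secs && delta < epsilon_secs)
		return 0.0f;

	return delta / interval_secs;
}
```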
|
|
This introduces a concept of a "close enough" epsilon value.
Where if the attempted update's current time is within very small
temporal distance from the precisely scheduled time dictated by
the interval, the update will still take a sample, rather than
try to introduce a tiny delay the host/kernel/ppoll will likely fail
to adhere to without being tardy.
Previously the desired delay was just a third of the interval,
with no consideration for how long sampling took. This was dead
simple, but made no attempt to schedule the poll timeout to align
with the next sampling deadline, and would either cause excessive
wakeups, or excessive tardiness, depending on the host's speed.
I think this technically also fixed a bug where this_delta
wouldn't get assigned if one of the earlier conditions
short-circuited the later condition where it was being assigned.
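Roughly, the scheduling decision described above could be sketched
like this. Every name is hypothetical and the epsilon value is an
arbitrary placeholder, not the one actually used.

```c
#include <stdint.h>

/* Placeholder epsilon; the real value is not stated here. */
#define CLOSE_ENOUGH_US 500

/* Returns 0 when a sample should be taken now, otherwise the number
 * of microseconds left until the next sampling deadline, suitable
 * for deriving a poll timeout. */
static int64_t delay_until_deadline_us(int64_t now_us, int64_t last_sample_us,
				       int64_t interval_us)
{
	int64_t deadline_us = last_sample_us + interval_us;
	int64_t remaining_us = deadline_us - now_us;

	if (remaining_us <= CLOSE_ENOUGH_US)
		return 0;	/* due, late, or close enough: sample now */

	return remaining_us;	/* wait for what's left, not a fixed third */
}
```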
|
|
This is mostly preparatory for having more precision in a
computed delay, but is also arguably just finishing what was
started when adding the _us suffixes throughout.
A future commit should also rework signal stuff to only unblock
signals in ppoll().
|
|
This should have been done when draw_chart() and
draw_chart_rest() were split apart making draw_chart()
non-recursive. But it becomes much more glaringly obvious in a
world where maintain_chart() is calling draw_chart() in multiple
places.
|
|
This is preparatory for shifting heading off row 0 which until
now has been the safe assumption, but I'm intending to add an
"adherence" row below the IOWait/Idle top row. The headings
will be moving down to that.
|
|
primarily s/sampling_interval/sampling_interval_secs/ units
clarification
|
|
This applies charts->this_sample_duration by advancing and drawing
the graph bars this_sample_duration times.
It's a bit crufty with conditionals especially where it overlaps
with deferred_pass handling... but seems to work ok in initial
tests.
Future work will have to add a row indicating how far we've
deviated from the scheduled sample time... Maybe cyan would show
how premature we were, and red how late we were. Where 100%
would be the entire sample interval was exceeded, but < 100%
would show our still more or less on-schedule scheduling
deviations.
|
|
This turns the time passed since the last sample taken into a
"sample duration".
Ideally this would always be 1, and up until now in the main use
case, vwm, it's been assumed to generally be 1 and drops in the
timeline treated as benign/fleeting because of the live viewing.
But with the introduction of --headless and increasing use on my
servers / embedded interests, this has become more problematic.
In this commit the duration is only being maintained, but not
applied.
Subsequent commits will have to repeat the current sample in the
graphs (this_sample_duration - 1) times.
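As a sketch of the idea, assuming the duration is simply the number
of whole intervals elapsed (the real code may round differently, and
sample_duration() is a hypothetical name):

```c
/* Hypothetical: whole sampling intervals covered by the elapsed
 * time, never less than 1. */
static unsigned sample_duration(float elapsed_secs, float interval_secs)
{
	unsigned n = (unsigned)(elapsed_secs / interval_secs);

	return n ? n : 1;
}
```

A duration of 3 would then mean drawing the current sample once and
repeating it 2 more times.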
|
|
Mechanical renaming of this vestigial name choice from when
vmon_proc_t was below the "monitor". Now it's just the
vmon_proc_t pointed at from the chart, so let's name accordingly.
No functional change.
|
|
This API is targeting poll() usage which implies microseconds,
but let's better clarify it in naming.
|
|
For rows reflecting threads and single/non-threaded processes,
let's scale the bar % by the number of cpus, so they can use the
full height of the row.
These tasks can't scale to multiple CPUs, so it's pointless to
leave vertical space for the other cores' capacity, if present.
For multi-threaded process rows, the vertical space continues to
accommodate all cores.
I've been on the fence about this change for a while because it
increases the cognitive load of reading the graphs, now the
scales are inconsistent. But when you've got 16 cores like on my
AMD P14s thinkpad, combined with a row height of 16 pixels, you
start wishing these rows used the full height of the row for
their single-core-constrained %ages.
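A rough sketch of the scaling, with hypothetical names, assuming the
input is a fraction of total capacity across all CPUs:

```c
/* Hypothetical: bar height in pixels for a row.
 * frac_of_total is the share of all CPUs' capacity (0..1). */
static unsigned bar_height(float frac_of_total, unsigned n_cpus,
			   int is_multithreaded, unsigned row_height)
{
	float frac = frac_of_total;

	if (!is_multithreaded) {
		/* a single task can use at most one CPU, so let
		 * one CPU's worth fill the whole row */
		frac *= (float)n_cpus;
		if (frac > 1.0f)
			frac = 1.0f;
	}

	return (unsigned)(frac * (float)row_height);
}
```

With 16 CPUs and a 16 pixel row, a thread saturating one core goes
from a 1 pixel bar to the full 16.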
|
|
cosmetic change; insert a space after the "#" in the string used
when comm/argv can't be sampled
|
|
|
|
|
|
Mechanical fix of longstanding typo I'm tired of ignoring...
|
|
It's actually pretty useful to see the relative PID values
across snowflakes...
|
|
Part of the reason for adding headless support in vmon is to
facilitate embedded use cases. These are often incompatible with
anti-tivoization aspects of gplv3.
I am the copyright holder of all this stuff so it's entirely fine
to switch to gplv2. Phil Freeman contributed one trivial patch
(4183fbd), regardless I checked if he had any objections to the
gplv2 switch and he had none.
So here we go, gplv2 all the things.
|
|
|
|
It's nice to be able to blindly hit Mod1+Left a bunch to minimize
the monitoring overhead, like when trying to preserve battery
etc. But I've found myself sometimes annoyed that I've
completely disabled the monitors when I reach for the overlays.
This change removes the 0Hz option from the preset intervals, so
now when you lower monitoring by blindly hitting Mod1+Left a
bunch it'll bottom out at 1Hz.
A subsequent commit will wire up disabling the monitors to an
explicit vwm key combo (probably Mod1+z).
|
|
vmon steps on this edge case, in vwm it was largely benign
since nothing ever happens immediately at vwm startup.
But in vmon you do things like monitor commands, which might
immediately send SIGUSR1 to vmon for .png snapshots, producing an
empty .png because the first update didn't sample because the
time delta hadn't passed.
This change just maintains a "primed" charts flag to ensure the
initial charts update always samples. This way if got_sigusr1 is
already set on the first iteration, at least the first charts
update will have sampled and composited *something*.
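The flag's effect can be sketched as follows. The struct and function
here are simplified stand-ins with hypothetical names, not the real
charts code.

```c
#include <stdbool.h>

/* Simplified stand-in for the charts state. */
typedef struct charts_sketch_t {
	bool	primed;			/* false until the first update has sampled */
	float	last_sample_secs;
	float	interval_secs;
} charts_sketch_t;

/* Returns true when this update took a sample. */
static bool charts_update(charts_sketch_t *c, float now_secs)
{
	/* only honor the interval once primed, so the very first
	 * update always samples */
	if (c->primed && now_secs - c->last_sample_secs < c->interval_secs)
		return false;

	c->primed = true;
	c->last_sample_secs = now_secs;

	return true;
}
```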
|
|
Since libvmon samples the sys_wants before proc_wants, it's
entirely possible the proc_stat->start will be later than
sys_stat->boottime by the time a given process gets sampled.
Simply treat this analogous to being unable to sample the start,
either of which will only leave the Wall as ??s in the highly
ephemeral short-lived process scenario. In the > boottime case,
the next sample for the same process would have start <= boottime.
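A minimal sketch of the resulting Wall formatting, covering both the
unsampled (0) start and the start > boottime case. format_wall() and
its plain integer parameters are assumptions for illustration; the
real fields and units may differ.

```c
#include <stdio.h>

/* Hypothetical: write the Wall time into buf, or "??s" when it
 * can't be derived.  Returns snprintf's return value. */
static int format_wall(char *buf, size_t len,
		       unsigned long long boottime, unsigned long long start)
{
	/* start of 0 means it wasn't sampled; start > boottime means
	 * the process was sampled after sys_stat was */
	if (!start || start > boottime)
		return snprintf(buf, len, "??s");

	return snprintf(buf, len, "%llus", boottime - start);
}
```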
|
|
Tidying some vestigial cruft from the pre-VWM_COLUMN* transition,
still feels relatively crufty and fragile, but this is about all I
have time for spending on this at the moment...
|
|
When libvmon fails to successfully sample proc_stat, it will
leave this value as 0, which isn't really otherwise a normal
process start value.
Handle this by producing "??s" for the Wall time normally derived
from (sys_stat->boottime - proc_stat->start), to prevent
producing an incorrect Wall time equal to sys_stat->boottime.
There should probably be a more robust means of communicating
these libvmon sampling failures to vwm/vmon, but I've thus far
been resisting adding something like an errno to every sample
store, or worse every sample store's datum. It's kind of
non-trivial to do without bloating the sample stores, especially
since the stores consolidate multiple proc files under a single
store/want. Having a single errno in the store would prevent
letting the valid portions of the store be usable while ignoring
the errored portions. Perhaps just a per-store errno with a
bitfield to indicate which subset are errored would suffice...
|
|
|
|
This is a first pass at cleaning up the overlay content rendering
with an eye towards enabling runtime configuration of which
columns are present and their layout.
Nothing is runtime configurable yet, but this changes the drawing
to at least be data-driven using two arrays of column structs,
one for the list of active processes in the upper portion of the
chart, and another for the lower "snowflakes" exited
processes/threads portion.
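To illustrate what data-driven means here, a sketch with two column
arrays walked by one shared loop. All type names, fields, and widths
below are hypothetical placeholders.

```c
#include <stddef.h>

/* All names here are hypothetical, for illustration only. */
typedef enum column_type_t {
	COLUMN_PID,
	COLUMN_WALL,
	COLUMN_ARGV
} column_type_t;

typedef struct column_t {
	int		enabled;	/* skipped when zero */
	column_type_t	type;		/* what the column shows */
	int		width;		/* placeholder width in characters */
} column_t;

/* Upper portion: active processes. */
static const column_t active_columns[] = {
	{ 1, COLUMN_PID, 6 },
	{ 1, COLUMN_WALL, 8 },
	{ 1, COLUMN_ARGV, 40 },
};

/* Lower portion: exited "snowflakes". */
static const column_t snowflake_columns[] = {
	{ 1, COLUMN_PID, 6 },
	{ 0, COLUMN_WALL, 8 },
	{ 1, COLUMN_ARGV, 40 },
};

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* One loop serves both layouts; here it only totals enabled widths. */
static int columns_width(const column_t *columns, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (columns[i].enabled)
			total += columns[i].width;
	}

	return total;
}
```

Making the layout runtime configurable would then mostly be a matter
of making these arrays mutable.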
|
|
comm is where the thread name will be if set, and when set it can
be awkward to then see the process' argv following the thread
name. This reduces the amount of clutter and visual noise for
threaded processes...
|
|
|
|
Was using the underlying chart dimensions which mirror the window
dimensions, which worked fine but this wastes less space in the
produced images when there's not much vertical content.
|
|
Currently only vmon wires this up to --name, but vwm could get
the window title of the window being overlayed and pass that in
if set...
|
|
Preparatory work for supporting --snapshot-on-sigchld to vmon;
add a way to access a chart's pixels outside of the X server.
|
|
Cosmetic change, the plain truncation would occasionally result in
unexpected Hz values in the charts like 59Hz particularly using
vmon w/arbitrary --hertz values.
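The difference is easy to show with a sketch, assuming the Hz value
is derived from a float interval in seconds (interval_to_hz() is a
hypothetical name):

```c
/* Hypothetical: round to nearest instead of truncating, so an
 * interval whose reciprocal lands just under 60 still shows 60Hz. */
static int interval_to_hz(float interval_secs)
{
	return (int)(1.0f / interval_secs + 0.5f);
}
```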
|
|
Cleans up the previous commit that littered MIN length clamping
everywhere %n was being used w/snprintf. Removes the %n usage
altogether and just clamps the return value out of snprintf to the
buffer size minus one for the null terminator.
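The clamp amounts to something like the sketch below. The helper name
is hypothetical and it formats a plain string to keep the example
small.

```c
#include <stdio.h>

/* Hypothetical: print str into buf and return the number of
 * characters actually stored, excluding the terminator.
 * snprintf() returns the length that would have been written
 * given enough space, so it has to be clamped on truncation. */
static size_t print_clamped(char *buf, size_t size, const char *str)
{
	int ret = snprintf(buf, size, "%s", str);

	if (ret < 0)
		return 0;			/* output error */

	if ((size_t)ret >= size)
		return size ? size - 1 : 0;	/* truncated */

	return (size_t)ret;
}
```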
The standard C library has such awful warts :/
|
|
It appears I overlooked that %n returns the length that would have
been printed regardless of the destination buffer size, not what
was actually written. The man page is misleading here:
    n      The number of characters written so far is stored into the
           integer pointed to by the corresponding argument.  That
           argument shall be an int *, or variant whose size matches
           the (optionally) supplied integer length modifier.  No
           argument is converted.  (This specifier is not supported by
           the bionic C library.)  The behavior is undefined if the
           conversion specification includes any flags, a field width,
           or a precision.
In testing, it isn't the count of what's actually written. It's
oblivious of truncated output scenarios where the output buffer has
been exhausted before reaching the %n. The man page should be
clarified here.
This commit does the simplest thing and simply clamps the length to
the destination buffer - 1 (for the \0). %n is being used to avoid
needing an strlen() in this somewhat hot path, but it might make
sense to instead use the snprintf return value similarly clamped
instead of %n since %n isn't doing what was expected.
|
|
|
|
This loop assumed ancestor->parent was !NULL, and that's not necessarily
always true. Due to these circular linked lists from the kernel's
list.h, they're not simply NULL delimited and we need the pointer to the
actual head to detect the end of the list. In libvmon, the head for the
siblings list is either the parent proc's children member, or the
processes member of the vmon struct. It may be more elegant to switch
to always having a root proc in libvmon, even if it's just synthetic,
for simplifying this crap. But for now, just determine which head is
relevant and check against it for loop termination.
Under some heavy parallel kernel compilations I was seeing occasional
vwm segfaults, and the addr2line of the ip in dmesg mapped to this
particular loop. I'm assuming the ancestor walk landed on a top-level
process and then this sibling detection tried dereferencing the
top-level proc's NULL parent and boom, segfault.
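The head selection can be sketched with a stripped-down list. The
types below are simplified stand-ins with hypothetical names, not
libvmon's real structs.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list.h. */
typedef struct list_head {
	struct list_head *next, *prev;
} list_head_t;

typedef struct proc {
	struct proc	*parent;	/* NULL for top-level processes */
	list_head_t	children;	/* head of this proc's children */
	list_head_t	siblings;	/* node on the relevant head */
} proc_t;

typedef struct mon {
	list_head_t	processes;	/* head of top-level processes */
} mon_t;

static void list_init(list_head_t *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(list_head_t *node, list_head_t *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Returns 1 if proc has a sibling after it.  The list is circular,
 * so the end is detected by comparing against the actual head,
 * chosen without ever dereferencing a NULL parent. */
static int has_later_sibling(const mon_t *mon, const proc_t *proc)
{
	const list_head_t *head = proc->parent ? &proc->parent->children
					       : &mon->processes;

	return proc->siblings.next != head;
}
```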
|
|
|
|
This was mixed up a bit in the cleanups... charts of !width represent the
uninitialized charts, so don't copy or free them instead of inhibiting
just the copy.
|
|
`make tags` in the linux kernel revealed that XDrawText can return
a BadLength error, which is not mentioned in the man page.
Glancing at the xorg-server source for doPolyText() this is found:
    else { /* print a string */

        unsigned char *pNextElt;

        pNextElt = c->pElt + TextEltHeader + (*c->pElt) * itemSize;
        if (pNextElt > c->endReq) {
            err = BadLength;
            goto bail;
        }
So there appears to be a fairly arbitrary ceiling on how many items one can
pass to XDrawText, and it probably depends on the cumulative length of the
individual items overflowing the maximum request length.
Well.. that's lame, and shrinking the maximum items makes it less likely to
trip over this in practice, but it probably just takes a long enough
individual item to trigger it again.
I had erred on the side of "excessively long" assuming XDrawText would just
deal and clip the text to the bounds of the destination drawable, just in
case there was an argv with lots of tiny items, then that would be covered.
This approach is incompatible with the potential for BadLength errors, so
drastically shrinking the maximum number of items until further notice.
|