Age | Commit message | Author
2021-10-10 | report-entry-arrays: use journal_read() for payload (HEAD, master) | Vito Caputo
Replace the open-coded fixed-file read op with a call to the buffered journal read interface; trivial cleanup.
2021-10-07 | report-entry-arrays: produce aggregate totals | Vito Caputo
This introduces a final stats print identical to the per-journal stats, but for the running totals to which every per-journal count contributes.
2021-10-07 | report-entry-arrays: move stats print to function | Vito Caputo
This also adds count/utilized members to the stats struct, so the stats print has them available in lieu of a profile.
2021-10-07 | report-entry-arrays: move cooked stats into struct | Vito Caputo
More preparatory work for summarized stats...
2021-10-07 | report-entry-arrays: s/stats/profile/ more or less | Vito Caputo
Preparatory commit: the existing stats struct is really more about the state needed to produce stats (like the hash table for identifying duplicates), so just rename it to profile. A future commit will introduce a new stats struct encapsulating just the cooked stats, which are currently printed per-journal as computed and then forgotten, for use both per-journal and in summary.
2021-09-23 | jio: mention entry-arrays in report usage blurb | Vito Caputo
2021-09-03 | journals: privatize everything bufs added | Vito Caputo
This turns journal_t into a member of a new private _journal_t, everything bufs-related resides in the outer _journal_t, and the public journals API only passes around the journal_t member pointer. The usual PIMPL in C paradigm; requires some container_of() hoop-jumping in the journals.c code to compute the private address from the public.
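The private/public split described above can be sketched as follows. This is a minimal illustration, not the code from journals.c: the struct fields, the `pub` member name, and the `journal_new()`, `journal_n_bufs()`, and `journal_free()` helpers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public type: the only thing API consumers ever see (fields are illustrative). */
typedef struct journal_t {
	const char	*name;
	int		fd;
} journal_t;

/* Private wrapper (hypothetical layout): buffer state lives outside journal_t. */
typedef struct _journal_t {
	unsigned	n_bufs;	/* stand-in for the bufs-related state */
	journal_t	pub;	/* only &pub is handed to callers */
} _journal_t;

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the private struct, but return only the public member's address. */
static journal_t * journal_new(const char *name)
{
	_journal_t	*_j = calloc(1, sizeof(*_j));

	if (!_j)
		return NULL;

	_j->n_bufs = 8;
	_j->pub.name = name;
	_j->pub.fd = -1;

	return &_j->pub;
}

/* Compute the private address from the public pointer to reach hidden state. */
static unsigned journal_n_bufs(journal_t *journal)
{
	assert(journal);

	return container_of(journal, _journal_t, pub)->n_bufs;
}

/* The allocation started at the private struct, so that is what gets freed. */
static void journal_free(journal_t *journal)
{
	if (journal)
		free(container_of(journal, _journal_t, pub));
}
```

Callers hold a `journal_t *` throughout and never learn the outer layout, so private members can change without touching the public header.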
2021-09-03 | journals: add missing le64toh() call | Vito Caputo
Apparently skipped a line when doing the mechanical addition of this hoop-jumping...
2021-09-03 | thunk_h: bump submodule to silence parens warnings | Vito Caputo
2021-08-28 | report-{layout,usage}: quick fix crashes since 6150dc | Vito Caputo
These have been broken since 6150dc. This is just a minimal fix to make them work again; a proper cleanup of all this is still needed at some point.
2021-08-28 | verify-hashed-objects: add byte counts to summary | Vito Caputo
Still dubious of the crazy short cached runtime; there must be a bug here. Should probably just add a cumulative digest per journal.
2021-08-28 | journals: propagate errors from journal_read() | Vito Caputo
(unlikely) dispatch errors were being dropped on the floor
2021-08-27 | journals: free bufs iovec when registering | Vito Caputo
oversight resulting in a small one-time mem leak
2021-08-27 | journals: implement rudimentary read buffers | Vito Caputo
This adds eight 8KiB "fixed" buffers per opened journal, recycled in a basic LRU fashion. Any read 8KiB or smaller passes through this cache, simply memcpy()d from the buffer when already resident, or upsized to an 8KiB read when absent, to then be memcpy()d out of the populated buffer when the read into the buffer completes. Any read larger than 8KiB bypasses the buffers to be read directly into the provided destination via iou as if the cache weren't present at all.
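The buffering policy described above might look roughly like this when written synchronously. This is only a sketch: it swaps the io_uring "fixed" buffer reads for plain `pread()`, and the `cache_t`/`buf_t` types, the `cached_read()` and `cache_print_stats()` names, and the choice to start each buffer at the requested offset are all assumptions rather than what journals.c does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE	8192
#define N_BUFS		8

typedef struct buf_t {
	off_t		offset;		/* file offset of data[0] */
	ssize_t		length;		/* valid bytes, 0 when empty */
	uint64_t	last_used;	/* LRU stamp, 0 means never used */
	uint8_t		data[BUF_SIZE];
} buf_t;

typedef struct cache_t {
	int		fd;
	uint64_t	clock;
	unsigned	hits, misses;
	buf_t		bufs[N_BUFS];
} cache_t;

/* Read len bytes at offset, going through the buffers when len <= BUF_SIZE. */
static ssize_t cached_read(cache_t *c, void *dest, size_t len, off_t offset)
{
	buf_t	*victim;

	assert(c && dest);

	if (len > BUF_SIZE)	/* large reads bypass the buffers entirely */
		return pread(c->fd, dest, len, offset);

	victim = &c->bufs[0];
	for (int i = 0; i < N_BUFS; i++) {
		buf_t	*b = &c->bufs[i];

		/* hit: the whole request is resident, just copy it out */
		if (b->length > 0 && offset >= b->offset &&
		    offset + (off_t)len <= b->offset + b->length) {
			c->hits++;
			b->last_used = ++c->clock;
			memcpy(dest, b->data + (offset - b->offset), len);
			return (ssize_t)len;
		}

		if (b->last_used < victim->last_used)
			victim = b;	/* remember the least recently used */
	}

	/* miss: upsize to a BUF_SIZE read into the LRU buffer, then copy out */
	c->misses++;
	victim->length = pread(c->fd, victim->data, BUF_SIZE, offset);
	if (victim->length < 0) {
		victim->length = 0;
		return -1;
	}
	victim->offset = offset;
	victim->last_used = ++c->clock;

	if ((size_t)victim->length < len)	/* short read near EOF */
		len = (size_t)victim->length;
	memcpy(dest, victim->data, len);

	return (ssize_t)len;
}

static void cache_print_stats(const cache_t *c, FILE *out)
{
	fprintf(out, "hits=%u misses=%u\n", c->hits, c->misses);
}
```

A zero-initialized `cache_t` with `fd` set is ready to use; empty buffers carry a zero stamp, so they are chosen as victims before any populated buffer is evicted.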
2021-08-27 | verify-hashed-objects: add rudimentary stats | Vito Caputo
Need some kind of sanity check to verify work is being done as more optimizations get done down below.
2021-08-24 | libiou: bump submodule for batched CQE consuming | Vito Caputo
Preliminary measurements show significant efficiency gains in at least one `verify hashed-objects` test with a pile of journals:

Before:
  [root@luminesce build]# echo 2 > /proc/sys/vm/drop_caches; grep dm-2 /proc/diskstats; time src/jio verify hashed-objects; grep dm-2 /proc/diskstats
  254 2 dm-2 657291 0 134621503 521531 35549 0 4029868 276790 0 553094 798322 0 0 0 0 0 0
  real 0m7.595s
  user 0m2.256s
  sys 0m11.880s
  254 2 dm-2 667507 0 136637545 530163 35550 0 4029948 276790 0 560640 806954 0 0 0 0 0 0
  [root@luminesce build]# time src/jio verify hashed-objects
  real 0m6.366s
  user 0m1.918s
  sys 0m10.062s

After:
  [root@luminesce build]# echo 2 > /proc/sys/vm/drop_caches; grep dm-2 /proc/diskstats; time src/jio verify hashed-objects; grep dm-2 /proc/diskstats
  254 2 dm-2 647038 0 132604603 511855 35395 0 4028104 276581 0 546827 788436 0 0 0 0 0 0
  real 0m5.968s
  user 0m1.649s
  sys 0m9.859s
  254 2 dm-2 657256 0 134620709 521521 35396 0 4028112 276581 0 552747 798102 0 0 0 0 0 0
  [root@luminesce build]# time src/jio verify hashed-objects
  real 0m4.873s
  user 0m1.429s
  sys 0m8.184s
2021-08-24 | verify-hashed-object: threaded offload via iou_async() | Vito Caputo
When a hashed object > 16KiB is encountered, give it to a worker thread via the newly added iou_async(). The idea here is to not hold up the serialized iou_run() machinery while grinding on a large object. 16KiB was pulled out of thin air, I haven't done any profiling to tune this threshold... just slapped this all together last night. Will have to get some journals with larger objects for testing, maybe some coredump.conf::Storage=journal situations, and do some tuning and timing vs. journalctl --verify.
2021-08-24 | libiou,thunk_h: bump for iou_async()/mt-safe thunks | Vito Caputo
libiou sprouted a thread pool. To make thunks safe for dispatching/freeing from threads, thunk.h needed some compare-and-swap magic in maintaining those free lists. Since libiou pulls in pthreads now, bring pthreads in here too.
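For readers unfamiliar with the technique, a compare-and-swap maintained free list can be sketched with C11 atomics as below. This is a generic lock-free stack, not thunk.h's actual code; the `node_t` and `free_list_t` types and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node_t {
	struct node_t	*next;
} node_t;

typedef struct free_list_t {
	_Atomic(node_t *)	head;
} free_list_t;

/* Push node onto the list, retrying whenever another thread moved the head. */
static void free_list_push(free_list_t *list, node_t *node)
{
	node_t	*old;

	assert(list && node);

	old = atomic_load(&list->head);
	do {
		node->next = old;	/* old is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&list->head, &old, node));
}

/*
 * Pop the most recently pushed node, or NULL when the list is empty.
 * NOTE: a bare CAS pop like this is exposed to the ABA problem when nodes
 * are concurrently popped, reused, and pushed back; production code needs
 * a mitigation such as tagged pointers or hazard pointers.
 */
static node_t * free_list_pop(free_list_t *list)
{
	node_t	*old;

	assert(list);

	old = atomic_load(&list->head);
	while (old && !atomic_compare_exchange_weak(&list->head, &old, old->next))
		;

	return old;
}
```

The retry loops are what make concurrent use safe: a failed exchange reloads the current head into `old`, so each attempt works from a fresh view of the list.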
2021-08-24 | verify-hashed-objects: oops thunk_mid() per-journal | Vito Caputo
This is all forcing me to clarify a bunch of details that I'm going to keep putting off, just kicking things around to try to keep them functional for now.
2021-08-24 | verify-hashed-objects: clarify some return paths | Vito Caputo
Just adding some missing thunk_{mid,end}() return value filters, which highlights some small optimization opportunities where these instances aren't being reused. Still on the fence about this life-cycle approach and naming though.
2021-08-24 | thunk_h: bump submodule for new lifecycle model | Vito Caputo
Update thunk usage throughout to explicitly control thunk instance lifecycles from callees according to the new model. This enables discarding a bunch of the per-object dispatch thunks, eliminates some thunk leaks, and I think generally makes the code more expressive and clear about what's going on. Keep in mind this is all experimental and I'm not spending a whole lot of time on this; it's mostly a toy for exploring some different programming styles I'd never really consider for production/real work. Though it actually has some interesting properties, and produces some surprisingly succinct and readable listings at times once you have the cumbersome building blocks in place. Especially for non-daemon programs where you can basically either log+ignore errors or treat them as fatal, I think this programming style might actually have its place.
2021-08-24 | report-entry-arrays: whitespace/indentation fixup | Vito Caputo
trivial
2021-08-24 | verify-hashed-objects: add `jio verify hashed-objects` | Vito Caputo
This is currently very hacky and unfinished, but does enough for some performance comparisons against a zstd-using journalctl --verify that has been hacked to return early after the first pass. It's currently rather leaky; the whole per-object-dispatch thingy is illuminating a thunk_h shortcoming and forcing the issue to be addressed... soon.
2021-08-24 | upstream: tie into build system | Vito Caputo
includes some miscellany to make compiler happy
2021-08-17 | thunk_h: bump submodule for and use thunk_free() | Vito Caputo
thunk.h got some rudimentary environment caching, and in the process introduced thunk_free(). Update the submodule and replace the bare free(closure) callsites with thunk_free(closure). This should help reduce the amount of malloc/free hammering in jio, though there are still quite a few allocations for object payloads in e.g. report-entry-arrays, and the object header space is allocated as a thunk payload which doesn't get environment caching yet.
2021-08-17 | report-layout: disable implicit stdio locking | Vito Caputo
It's not performing threaded stdio on the layout FILE*.
2021-08-17 | op: use thunk_dispatch() for cb | Vito Caputo
Using the thunk->dispatch() method directly as the iou_op cb leaks the thunk
2021-08-16 | report-entry-arrays: fix braino in comment | Vito Caputo
s/quadratically/exponentially/
2021-08-16 | libiou: bump libiou submodule for faster ops | Vito Caputo
libiou got bulk ops allocating and free lists
2021-08-15 | report-entry-arrays: read payloads via "fixed" fd | Vito Caputo
Take advantage of the registered fds.
2021-08-15 | journals: register opened journals for "fixed" use | Vito Caputo
This introduces a journal_t.idx member for "fixed" style usage in io_uring parlance. Once all journals get opened, they're registered in the underlying iou ring. Subsequent operations on these files may supply these idx values as the fd, but must also add IOSQE_FIXED_FILE to the sqe's flags. This commit also switches the already present read operations to use the idx/IOSQE_FIXED_FILE method. Theoretically this should offer some efficiency gains, since the kernel can now skip some per-operation fd handling overheads by having them done once at registration time instead.
2021-08-15 | libiou: bump libiou submodule for iou_ring() | Vito Caputo
In order to make use of the liburing register_files() helper, the struct ring pointer is needed.
2021-08-15 | report-entry-arrays: add %age to Unique EAs line | Vito Caputo
The bucketized counts already have %ages, but not the overall count.
2021-08-14 | report-entry-arrays: cleanup per_object_dispatch() | Vito Caputo
This doesn't make use of all the parameters it inherited from journal_iter_objects_dispatch(), so at least get rid of them if you're not going to just bite the bullet and refactor this thing out of the picture.
2021-08-14 | report-entry-arrays: EntryArrayObject statistics | Vito Caputo
This gives some visibility into EntryArrayObject duplication and utilization statistics. It's not the tidiest of code, just something I slapped together last night.
2021-08-14 | build: add libcrypto dependency | Vito Caputo
Preparatory commit for report-entry-arrays, I used libcrypto for SHA1 "perfect" hashing of the payloads. This way I didn't need to keep the payloads themselves around for counting duplicates, the SHA1 digests suffice.
2021-01-05 | reclaim-tail-waste: disable temporarily | Vito Caputo
As-is this doesn't update the header's arena_size correctly, so disable it to prevent unsuspecting users from producing journals that journald thinks are corrupt.
2020-12-06 | report-layout: remove redundant fflush() | Vito Caputo
fclose() implicitly calls fflush()...
2020-12-05 | report-layout: use lower case 't' for OBJECT_TAG | Vito Caputo
Reserve upper case just for table type objects like arrays and hash tables.
2020-12-05 | report-layout: remove vestigial includes | Vito Caputo
This is derived from report-usage.c, tidy this up.
2020-12-05 | report-layout: implement rudimentary `jio report layout` | Vito Caputo
This writes a .layout file for every opened journal, which describes the sequential object layout for the respective journal. Sample output:
```
Layout for "user-1000.journal"
Legend:
? OBJECT_UNUSED
d OBJECT_DATA
f OBJECT_FIELD
e OBJECT_ENTRY
D OBJECT_DATA_HASH_TABLE
F OBJECT_FIELD_HASH_TABLE
A OBJECT_ENTRY_ARRAY
T OBJECT_TAG
|N| object spans N page boundaries (page size used=4096)
| single page boundary
+N N bytes of alignment padding
+ single byte alignment padding

F|5344 D|448|1834896 d81+7 f50+6 d74+6 f48 d82+6 f55+ d84+4 f57+7 d80 f50+6 d122+6 f47+ d74+6 f44+4 d73+7 f44+4 d70+2 f44+4 d72 f45+3 d76+4 f44+4 d75+5 f48 d90+6 f54+2 d80 f54+2 d84+4 f55+ d123+5 f55+ d82+6 f56 d87+ f58+6 d93+3 f53+3 d|94+2 f54+2 d91+5 f59+5 d119+ f62+2 d107+5 f66+6 d105+7 f48 d108+4 f51+5 d82+6 f49+7 e480 A56 d80 d104 d74+6 d73+7 d107+5 e480 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 d97+7 d107+5 e|480 A56 A56 A56 d136 d107+5 e480 A56 d74+6 d148+4 d107+5 e480 A88 d107+5 e480 A88 A88 A88 A56 A88 A88 A88 A88 A88 A88 A88 A88 A88 A88 A|88 A88 A88 A88 A88 A88 A88 d80 d74+6 d107+5 e480 A88 A56 d107+5 e480 A56 A56 A56 d107+5 e480 A56 d107+5 e480 A88 A56 A56 d107+5 e|480 d80 d74+6 d107+5 e480 d97+7 d107+5 e480 A232 A88 A56 A56 d142+2 d107+5 e480 A232 A232 A232 A232 A232 A|232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 d107+5 e480 d107+5 e|480 d80 d74+6 d107+5 e480 A232 d107+5 e480 A56 A56 d107+5 e480 d107+5 e480 d107+5 e304 d80 d74+6 d107+5 e|480 d107+5 e480 A56 A56 d107+5 e480 A232 d107+5 e480 d107+5 e480 A88 d80 d74+6 d107+5 e480 A88 d107+5 e|480 A56 A56 d107+5 e480 d107+5 e480 A88 A88
```
This provides insight into the distribution of object types, how much space is spent on alignment padding, how frequently objects land on page boundaries - something preferably avoided especially for small objects near a boundary.
In the above example, we can immediately make the observation that early in the journal there are interleaved data and field objects, and this cluster of the initially added data and field objects gets torn by a page boundary. It's possible that if they weren't interleaved when adding an entry and its respective data + fields, fitting all the early (common) data objects within a single page might confer some performance gain. The field objects aren't currently used by data-object-heavy operations, so it doesn't make much sense to have them polluting the area of the initially added data objects or really any otherwise clustered data objects.
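The token notation from the legend can be reproduced with a little arithmetic, sketched below. The 8-byte alignment is inferred from the sample (81+7, 50+6, and 55+1 are all multiples of 8) and the boundary count is one plausible definition; `layout_token()` and the `LAYOUT_*` constants are hypothetical names, not report-layout.c's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LAYOUT_PAGE_SIZE	4096	/* page size stated in the legend */
#define LAYOUT_ALIGNMENT	8	/* assumed: inferred from the sample padding */

/*
 * Format one layout token such as "d81+7" or "D|448|1834896" into buf.
 * type is the legend letter, offset/size locate the object in the file.
 * Returns what snprintf() returns.
 */
static int layout_token(char *buf, size_t len, char type, uint64_t offset, uint64_t size)
{
	char		span[32] = "", pad[32] = "";
	uint64_t	padding, boundaries;

	assert(buf && size > 0);

	/* bytes needed to round size up to the next multiple of the alignment */
	padding = (LAYOUT_ALIGNMENT - size % LAYOUT_ALIGNMENT) % LAYOUT_ALIGNMENT;

	/* page boundaries between the object's first and last byte */
	boundaries = (offset + size - 1) / LAYOUT_PAGE_SIZE - offset / LAYOUT_PAGE_SIZE;

	if (boundaries == 1)
		strcpy(span, "|");
	else if (boundaries > 1)
		snprintf(span, sizeof(span), "|%" PRIu64 "|", boundaries);

	if (padding == 1)
		strcpy(pad, "+");
	else if (padding > 1)
		snprintf(pad, sizeof(pad), "+%" PRIu64, padding);

	return snprintf(buf, len, "%c%s%" PRIu64 "%s", type, span, size, pad);
}
```

For instance, an 81-byte data object that stays within one page formats as `d81+7`, matching the first data token in the sample output.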
2020-11-29 | journals: s/journal_for_each/journal_iter_objects/ | Vito Caputo
Make naming a bit more descriptive and consistent... journals_for_each() is a simple non-IO-incurring journals array iteration, journal_for_each() generates IO and walks the data objects hash table... they're quite different, and shouldn't have such similar names.
2020-11-25 | *: explicitly include assert.h | Vito Caputo
2020-11-25 | docs: add LICENSE and README files | Vito Caputo
Now there's even instructions and legalese
2020-11-25 | build: introduce rudimentary autotools | Vito Caputo
Now it's easy to build even
2020-11-25 | src: initial commit of jio WIP source | Vito Caputo
This is a very quick and dirty experimental hack written in some sort of bastard continuation-passing style in C w/io_uring using journal-file introspection and manipulation duty as an excuse for its existence. Consider this unfinished prototype quality code.
2020-11-25 | upstream: import some hacked systemd headers | Vito Caputo
This brings journal-file data structures and byte swapping helpers I may as well just reuse. I've had to do some minor messing about to make things workable in isolation out of the systemd tree without pulling in too much. I've also added a new HashedObjectHeader type to encompass the slightly larger common header components of FieldObject and DataObject to facilitate a generic hash table iterator that can operate on just loading HashedObjectHeader when nothing more than the object size is needed for accounting. Basically the HashedObjectHeader is ObjectHeader+hash+next_hash_offset.
2020-11-25 | libiou: add libiou submodule | Vito Caputo
libiou is a thin veneer over io_uring defining an ergonomic async IO-oriented API. Since jio shouldn't need to do anything computationally intensive, the combination of thunk_h for closures and libiou+io_uring for scheduling closures according to IO completions should be sufficient for an implementation. Though it may prove annoying to not have per-task stacks and the ability to arbitrarily yield and/or delay execution pending completion of non-IO results, we'll see.
2020-11-25 | thunk_h: add thunk_h submodule | Vito Caputo
thunk_h makes using closure-esque callbacks in C somewhat convenient.