jio/.git - jio is an experimental systemd-journald journal file tool utilizing io

Age	Commit message	Author
2021-08-24	verify-hashed-objects: add `jio verify hashed-objects`	Vito Caputo
	This is currently very hacky and unfinished, but does enough for some performance comparisons against a zstd-using journalctl --verify that has been hacked to return early after the first pass. It's currently rather leaky; the whole per-object-dispatch thingy is illuminating a thunk_h shortcoming and forcing the issue to be addressed... soon.
2021-08-24	upstream: tie into build system	Vito Caputo
	Includes some miscellany to make the compiler happy.
2021-08-17	thunk_h: bump submodule for and use thunk_free()	Vito Caputo
	thunk.h got some rudimentary environment caching, and in the process introduced thunk_free(). Update the submodule and replace the bare free(closure) callsites with thunk_free(closure). This should help reduce the amount of malloc/free hammering in jio, though there are still quite a few allocations for object payloads in e.g. report-entry-arrays, and the object header space is allocated as a thunk payload which doesn't get environment caching yet.
2021-08-17	report-layout: disable implicit stdio locking	Vito Caputo
	Nothing performs threaded stdio on the layout FILE*, so the implicit locking is just overhead.
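A minimal sketch of how implicit stdio locking can be switched off for a single stream, assuming glibc's `__fsetlocking()` from `<stdio_ext.h>`. The log does not say which mechanism report-layout actually uses, and the two helper names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Hypothetical helper: hand locking responsibility for `f` to the caller,
 * so stdio stops taking its internal lock on every call.
 * Returns the locking mode that was in effect before the call.
 */
int layout_file_disable_locking(FILE *f)
{
	assert(f != NULL);

	return __fsetlocking(f, FSETLOCKING_BYCALLER);
}

/* Hypothetical helper: query the current locking mode without changing it. */
int layout_file_locking_mode(FILE *f)
{
	assert(f != NULL);

	return __fsetlocking(f, FSETLOCKING_QUERY);
}
```

This is only safe while a single thread touches the stream, which is the situation the commit describes.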
2021-08-17	op: use thunk_dispatch() for cb	Vito Caputo
	Using the thunk->dispatch() method directly as the iou_op cb leaks the thunk.
2021-08-16	report-entry-arrays: fix braino in comment	Vito Caputo
	s/quadratically/exponentially/
2021-08-15	report-entry-arrays: read payloads via "fixed" fd	Vito Caputo
	Take advantage of the registered fds.
2021-08-15	journals: register opened journals for "fixed" use	Vito Caputo
	This introduces a journal_t.idx member for "fixed" style usage in io_uring parlance. Once all journals get opened, they're registered in the underlying iou ring. Subsequent operations on these files may supply these idx values as the fd, but must also add IOSQE_FIXED_FILE to the sqe's flags. This commit also switches the already present read operations to use the idx/IOSQE_FIXED_FILE method. Theoretically this should offer some efficiency gains, since the kernel can now skip some per-operation fd handling overheads by having them done once at registration time instead.
2021-08-15	report-entry-arrays: add %age to Unique EAs line	Vito Caputo
	The bucketized counts already have %ages, but not the overall count.
2021-08-14	report-entry-arrays: cleanup per_object_dispatch()	Vito Caputo
	This doesn't make use of all the parameters it inherited from journal_iter_objects_dispatch(), so at least get rid of them if you're not going to just bite the bullet and refactor this thing out of the picture.
2021-08-14	report-entry-arrays: EntryArrayObject statistics	Vito Caputo
	This gives some visibility into EntryArrayObject duplication and utilization statistics. It's not the tidiest of code, just something I slapped together last night.
2021-01-05	reclaim-tail-waste: disable temporarily	Vito Caputo
	As-is this doesn't update the header's arena_size correctly, so disable it to prevent unsuspecting users from producing journals that journald thinks are corrupt.
2020-12-06	report-layout: remove redundant fflush()	Vito Caputo
	fclose() implicitly calls fflush()...
2020-12-05	report-layout: use lower case 't' for OBJECT_TAG	Vito Caputo
	Reserve upper case just for table type objects like arrays and hash tables.
2020-12-05	report-layout: remove vestigial includes	Vito Caputo
	This is derived from report-usage.c; tidy up the leftovers.
2020-12-05	report-layout: implement rudimentary `jio report layout`	Vito Caputo
	This writes a .layout file for every opened journal, which describes the sequential object layout for the respective journal. Sample output:

```
Layout for "user-1000.journal"
Legend:
?    OBJECT_UNUSED
d    OBJECT_DATA
f    OBJECT_FIELD
e    OBJECT_ENTRY
D    OBJECT_DATA_HASH_TABLE
F    OBJECT_FIELD_HASH_TABLE
A    OBJECT_ENTRY_ARRAY
T    OBJECT_TAG
|N|  object spans N page boundaries (page size used=4096)
|    single page boundary
+N   N bytes of alignment padding
+    single byte alignment padding

F|5344 D|448|1834896 d81+7 f50+6 d74+6 f48 d82+6 f55+ d84+4 f57+7 d80 f50+6 d122+6 f47+ d74+6 f44+4 d73+7 f44+4 d70+2 f44+4 d72 f45+3 d76+4 f44+4 d75+5 f48 d90+6 f54+2 d80 f54+2 d84+4 f55+ d123+5 f55+ d82+6 f56 d87+ f58+6 d93+3 f53+3 d|94+2 f54+2 d91+5 f59+5 d119+ f62+2 d107+5 f66+6 d105+7 f48 d108+4 f51+5 d82+6 f49+7 e480 A56 d80 d104 d74+6 d73+7 d107+5 e480 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 A56 d97+7 d107+5 e|480 A56 A56 A56 d136 d107+5 e480 A56 d74+6 d148+4 d107+5 e480 A88 d107+5 e480 A88 A88 A88 A56 A88 A88 A88 A88 A88 A88 A88 A88 A88 A88 A|88 A88 A88 A88 A88 A88 A88 d80 d74+6 d107+5 e480 A88 A56 d107+5 e480 A56 A56 A56 d107+5 e480 A56 d107+5 e480 A88 A56 A56 d107+5 e|480 d80 d74+6 d107+5 e480 d97+7 d107+5 e480 A232 A88 A56 A56 d142+2 d107+5 e480 A232 A232 A232 A232 A232 A|232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 A232 d107+5 e480 d107+5 e|480 d80 d74+6 d107+5 e480 A232 d107+5 e480 A56 A56 d107+5 e480 d107+5 e480 d107+5 e304 d80 d74+6 d107+5 e|480 d107+5 e480 A56 A56 d107+5 e480 A232 d107+5 e480 d107+5 e480 A88 d80 d74+6 d107+5 e480 A88 d107+5 e|480 A56 A56 d107+5 e480 d107+5 e480 A88 A88
```

This provides insight into the distribution of object types, how much space is spent on alignment padding, how frequently objects land on page boundaries - something preferably avoided especially for small objects near a boundary.
In the above example, we can immediately observe that early in the journal there are interleaved data and field objects, and that this cluster of initially added data and field objects gets torn by a page boundary. If they weren't interleaved when adding an entry and its respective data + fields, fitting all the early (common) data objects within a single page might confer some performance gain. The field objects aren't currently used by data-object-heavy operations, so it doesn't make much sense to have them polluting the area of the initially added data objects, or really any otherwise clustered data objects.
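The token notation in the sample layout can be reproduced with a few lines of C. This is a sketch only, not the code in report-layout: the function names are hypothetical, padding is assumed to round each object up to a multiple of 8 bytes (which matches tokens like d81+7 and f55+), and "spans N page boundaries" is assumed to mean the number of 4096-byte boundaries falling inside the object's byte range.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LAYOUT_PAGE_SIZE 4096u	/* page size stated in the sample legend */

/* Bytes needed to round `size` up to the next multiple of 8 (assumed alignment). */
uint64_t layout_padding(uint64_t size)
{
	return (8 - (size & 7)) & 7;
}

/* Page boundaries inside [offset, offset + size); assumed definition of "spans". */
uint64_t layout_boundaries(uint64_t offset, uint64_t size)
{
	assert(size > 0);

	return (offset + size - 1) / LAYOUT_PAGE_SIZE - offset / LAYOUT_PAGE_SIZE;
}

/* Format one token such as "d81+7", "f55+", "e|480" or "D|448|1834896".
 * Returns what snprintf() returns for the final token.
 */
int layout_token(char *buf, size_t len, char type, uint64_t offset, uint64_t size)
{
	uint64_t	n = layout_boundaries(offset, size);
	uint64_t	pad = layout_padding(size);
	char		span[32] = "", padding[32] = "";

	assert(buf != NULL);

	if (n == 1)
		snprintf(span, sizeof(span), "|");
	else if (n > 1)
		snprintf(span, sizeof(span), "|%" PRIu64 "|", n);

	if (pad == 1)
		snprintf(padding, sizeof(padding), "+");
	else if (pad > 1)
		snprintf(padding, sizeof(padding), "+%" PRIu64, pad);

	return snprintf(buf, len, "%c%s%" PRIu64 "%s", type, span, size, padding);
}
```

For instance, an 81-byte data object that stays within one page yields `d81+7`, while a 94-byte data object starting 6 bytes before a page boundary yields `d|94+2`. The offsets are only needed for the boundary count; they do not appear in the token itself.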
2020-11-29	journals: s/journal_for_each/journal_iter_objects/	Vito Caputo
	Make naming a bit more descriptive and consistent... journals_for_each() is a simple non-IO-incurring journals array iteration, journal_for_each() generates IO and walks the data objects hash table... they're quite different, and shouldn't have such similar names.
2020-11-25	*: explicitly include assert.h	Vito Caputo

2020-11-25	build: introduce rudimentary autotools	Vito Caputo
	Now it's easy to build even
2020-11-25	src: initial commit of jio WIP source	Vito Caputo
	This is a very quick and dirty experimental hack written in some sort of bastard continuation-passing style in C w/io_uring using journal-file introspection and manipulation duty as an excuse for its existence. Consider this unfinished prototype quality code.
2020-11-25	upstream: import some hacked systemd headers	Vito Caputo
	This brings journal-file data structures and byte swapping helpers I may as well just reuse. I've had to do some minor messing about to make things workable in isolation out of the systemd tree without pulling in too much. I've also added a new HashedObjectHeader type to encompass the slightly larger common header components of FieldObject and DataObject to facilitate a generic hash table iterator that can operate on just loading HashedObjectHeader when nothing more than the object size is needed for accounting. Basically the HashedObjectHeader is ObjectHeader+hash+next_hash_offset.