author | Vito Caputo <vcaputo@pengaru.com> | 2020-09-26 17:23:28 -0700 |
---|---|---|
committer | Vito Caputo <vcaputo@pengaru.com> | 2020-09-26 17:26:20 -0700 |
commit | d350e1c2b22cbc7b314790096f758d462385ef5e | |
tree | be0de659903190d43c1f4103587f907dfee95a76 | |
parent | 2944f70ee33de24b023ad1edb04246e4b71eef17 |
After bringing up the issue on bug-bash, it seems worth
documenting this state of affairs, for posterity if nothing else.
-rw-r--r-- | README | 56 |
1 files changed, 53 insertions, 3 deletions
@@ -2,11 +2,61 @@ Usage is like `tee`:
 wye [FILE]...
-wye always consumes from its stdin, the optional files specified are
+Wye always consumes from its stdin, the optional files specified are
 read-multiplexed as alternative inputs to stdin.
 All reads are performed using PIPE_BUF sized buffers which are the
 atomic units for unix pipes, any read data is immediately written to
 stdout.
-Someone should try get this command added upstream in GNU Coreutils
-upstream, and give me credit for the name. This implementation is a
+Someone should try get this command added upstream in GNU Coreutils,
+and I'd appreciate credit for the name. This implementation is a
 quick and dirty hack and not particularly robust.
+
+The primary correctness and robustness problem has to do with how
+UNIX pipes are implemented. The intended use of this utility is to
+supply pipes as wye's various inputs, and pipes normally buffer writes
+such that the read side may get any number of bytes, including a
+fraction of what was an atomic write. The default atomicity
+guarantees WRT pipes and PIPE_BUF pertain only to concurrent writers
+to the same pipe. They have zero relevance to semantics at the read
+side.
+
+Wye could read every ready fd until exhausting what's immediately
+available (EAGAIN/EWOULDBLOCK) in an attempt to combat this, but
+there's still the potential for a short write at the writer side when
+the pipe's internal buffers are full to come through at the read side
+partially.
+
+When a partial record arrives at the read side, wye will naively
+propagate that partial record in its output as if it were whole. Then
+other input streams may be interleaved with that partial record, and
+the aggregated stream becomes potentially incoherent. Wye also has no
+ability to ensure only a single record passes through from each when
+multiple inputs are ready simultaneously for reading, not without
+becoming content-aware and parsing the data contents - rendering wye
+specialized for a specific content type rather than a generalized
+aggregator treating contents as opaque records.
+
+In Plan9 pipes have been implemented differently [0], enabling a tool
+like wye to be more naturally robust and correctly implemented.
+
+Since Linux 3.4, the O_DIRECT flag has been implemented for the
+pipe2() syscall [1], enabling plan9-like semantics for pipes. If the
+shells available on Linux supported a means for conveniently enabling
+O_DIRECT "packetized pipes", a wye-like tool could arguably be
+provided in a generalized fashion, as it could have significant
+utility.
+
+I suspect a significant factor in landing a tool like this upstream
+somewhere like GNU Coreutils will first require getting "packetized
+pipes" generally accessible through shells like GNU bash. I've made a
+first attempt in doing so [2], but as one would expect was mostly met
+by resistance in what little attention it received. These things take
+time, and require significant buy-in from the community for movement
+to occur. If this feature interests you, show your support on
+bug-bash and help work towards exposing the "packetized pipe"
+capability via the popular linux shells like GNU bash.
+
+
+[0] http://man.cat-v.org/plan_9/2/pipe
+[1] https://www.man7.org/linux/man-pages/man2/pipe.2.html
+[2] https://mail.gnu.org/archive/html/bug-bash/2020-09/msg00076.html
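
The read-side behavior the README text above describes can be illustrated with a small sketch. This is not wye's code: it is written in Python for brevity, and the helper name `read_chunks` is made up for this example. It writes each record with its own write() call, then lists what each PIPE_BUF-sized read() returns, first on an ordinary pipe and then on a pipe created with pipe2() and O_DIRECT. It assumes Linux 3.4 or later and a Python build that exposes `os.pipe2` and `os.O_DIRECT`.

```python
import os
import select


def read_chunks(records, packetized):
    """Write each record separately, then return what each read() yields.

    Hypothetical helper for illustration only. Records are assumed to be
    small enough that the writes never block on a full pipe.
    """
    if packetized:
        # O_DIRECT asks the kernel for "packet mode": each write() of at
        # most PIPE_BUF bytes becomes its own packet on the read side.
        rfd, wfd = os.pipe2(os.O_DIRECT)
    else:
        rfd, wfd = os.pipe()

    chunks = []
    try:
        try:
            for rec in records:
                os.write(wfd, rec)
        finally:
            # Closing the write end lets the reader see EOF.
            os.close(wfd)

        while True:
            chunk = os.read(rfd, select.PIPE_BUF)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(rfd)
    return chunks


if __name__ == "__main__":
    records = [b"aaa\n", b"bbb\n"]
    print("ordinary pipe:  ", read_chunks(records, packetized=False))
    try:
        print("packetized pipe:", read_chunks(records, packetized=True))
    except (AttributeError, OSError) as exc:
        # Pre-3.4 kernels, non-Linux systems, or sandboxes may lack packet mode.
        print("packetized pipes unavailable here:", exc)
```

On an ordinary pipe the two records would normally come back merged in a single read, so the record boundary is lost. In packet mode each read should return exactly one of the original writes, which is the property a wye-like aggregator needs in order to treat its inputs as opaque records.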