author: Vito Caputo <firstname.lastname@example.org>, 2020-09-26 17:23:28 -0700
committer: Vito Caputo <email@example.com>, 2020-09-26 17:26:20 -0700
After bringing up the issue on bug-bash, it seems worth documenting this state of affairs, for posterity if nothing else.
1 files changed, 53 insertions, 3 deletions
@@ -2,11 +2,61 @@
Usage is like `tee`: wye [FILE]...
-wye always consumes from its stdin, the optional files specified are
+Wye always consumes from its stdin; the optional files specified are
read-multiplexed as alternative inputs to stdin. All reads are
performed using PIPE_BUF sized buffers which are the atomic units for
unix pipes, any read data is immediately written to stdout.
-Someone should try get this command added upstream in GNU Coreutils
-upstream, and give me credit for the name. This implementation is a
+Someone should try to get this command added upstream in GNU Coreutils,
+and I'd appreciate credit for the name. This implementation is a
quick and dirty hack and not particularly robust.
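The read-multiplexing described above can be illustrated roughly as follows. This is a minimal sketch in Python rather than wye's actual source; the helper name `multiplex` and the three demo pipes are assumptions made purely for illustration. It waits for any input to become readable, reads at most PIPE_BUF bytes from it, and writes them straight to the output.

```python
import os
import select

def multiplex(in_fds, out_fd):
    """Copy from whichever input is readable to out_fd until every input hits EOF.

    Hypothetical helper for illustration only, not wye's real implementation.
    """
    open_fds = list(in_fds)
    while open_fds:
        ready, _, _ = select.select(open_fds, [], [])
        for fd in ready:
            chunk = os.read(fd, select.PIPE_BUF)  # PIPE_BUF sized read
            if not chunk:                         # EOF on this input
                open_fds.remove(fd)
                continue
            while chunk:                          # handle short writes
                written = os.write(out_fd, chunk)
                chunk = chunk[written:]

# Demo: two input pipes merged into one output pipe.
a_r, a_w = os.pipe()
b_r, b_w = os.pipe()
out_r, out_w = os.pipe()
os.write(a_w, b"from a\n")
os.write(b_w, b"from b\n")
os.close(a_w)
os.close(b_w)
multiplex([a_r, b_r], out_w)
os.close(out_w)
merged = os.read(out_r, select.PIPE_BUF)
print(merged)
```

Note that nothing in this loop knows where one record ends and the next begins; it simply forwards whatever bytes each read returns.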
+The primary correctness and robustness problem has to do with how
+UNIX pipes are implemented. The intended use of this utility is to
+supply pipes as wye's various inputs, and pipes normally buffer writes
+such that the read side may get any number of bytes, including a
+fraction of what was an atomic write. The default atomicity
+guarantees WRT pipes and PIPE_BUF pertain only to concurrent writers
+to the same pipe. They have zero relevance to semantics at the read
+Wye could read every ready fd until exhausting what's immediately
+available (EAGAIN/EWOULDBLOCK) in an attempt to combat this, but
+there's still the potential for a short write at the writer side when
+the pipe's internal buffers are full to come through at the read side
+When a partial record arrives at the read side, wye will naively
+propagate that partial record in its output as if it were whole. Then
+other input streams may be interleaved with that partial record, and
+the aggregated stream becomes potentially incoherent. Wye also cannot
+ensure that only a single record passes through from each input when
+multiple inputs are ready simultaneously for reading, not without
+becoming content-aware and parsing the data contents - rendering wye
+specialized for a specific content type rather than a generalized
+aggregator treating contents as opaque records.
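The loss of record boundaries on an ordinary pipe is easy to reproduce. The following is a small Python sketch, assuming Linux pipe behaviour; the record contents are made up for the demonstration. Two separate writes go in, but the reader sees an undifferentiated byte stream: a short read returns a fragment of the first record, and the next read returns the rest of it fused with the second record.

```python
import os

r, w = os.pipe()                # ordinary byte-stream pipe
os.write(w, b"record-1\n")      # writer emits two distinct records
os.write(w, b"record-2\n")

first = os.read(r, 4)           # reader asks for less than one record
rest = os.read(r, 4096)         # remainder of record 1 plus all of record 2

os.close(r)
os.close(w)
print(first, rest)
```

A reader that forwards `first` and `rest` as if each were a whole record produces exactly the kind of incoherent output described above once other inputs are interleaved between them.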
+In Plan9 pipes have been implemented differently, enabling a tool
+like wye to be more naturally robust and correctly implemented.
+Since Linux 3.4, the O_DIRECT flag has been implemented for the
+pipe2() syscall, enabling plan9-like semantics for pipes. If the
+shells available on Linux supported a means for conveniently enabling
+O_DIRECT "packetized pipes", a wye-like tool could arguably be
+provided in a generalized fashion, as it could have significant
+I suspect that landing a tool like this upstream somewhere like GNU
+Coreutils will first require making "packetized pipes" generally
+accessible through shells like GNU bash. I've made a first attempt at
+doing so, but as one would expect it was mostly met with resistance
+in what little attention it received. These things take
+time, and require significant buy-in from the community for movement
+to occur. If this feature interests you, show your support on
+bug-bash and help work towards exposing the "packetized pipe"
+capability via popular Linux shells such as GNU bash.
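For comparison, the O_DIRECT "packetized pipe" behaviour discussed in the README text can be sketched from Python as well. This assumes Linux 3.4 or later and a Python build that exposes `os.pipe2` and `os.O_DIRECT`; the record contents are again invented for the demonstration. Each write of at most PIPE_BUF bytes becomes its own packet, and each read returns at most one packet.

```python
import os

# Packet-mode pipe: requires Linux >= 3.4 (assumption: os.O_DIRECT is available).
r, w = os.pipe2(os.O_DIRECT)
os.write(w, b"record-1\n")      # one write, one packet
os.write(w, b"record-2\n")

first = os.read(r, 4096)        # returns exactly the first packet
second = os.read(r, 4096)       # returns exactly the second packet

os.close(r)
os.close(w)
print(first, second)
```

One caveat when relying on this mode: if the read buffer is smaller than the next packet, the excess bytes of that packet are discarded, so a reader would want to read with a buffer of at least PIPE_BUF bytes.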