`wye` is an input variant of the `tee` utility

Usage is like `tee`: wye [FILE]...

Wye always consumes from its stdin; any optional files specified are
read-multiplexed as additional inputs alongside stdin.  All reads are
performed using PIPE_BUF-sized buffers, PIPE_BUF being the atomic
write unit for UNIX pipes, and any data read is immediately written
to stdout.
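
The read-multiplexing described above can be sketched in a few lines
of Python.  This is a minimal illustration under assumptions, not
wye's actual source: the function name `wye_sketch` and its arguments
are hypothetical, and select() is just one of several ways to wait on
multiple inputs.

```python
import os
import select

# Atomic write unit for pipes (4096 on Linux), used here as the read size.
PIPE_BUF = select.PIPE_BUF


def wye_sketch(in_fds, out_fd):
    """Copy whatever arrives on any of in_fds to out_fd until all reach EOF.

    Hypothetical helper for illustration only.
    """
    open_fds = set(in_fds)
    while open_fds:
        # Block until at least one input has data (or has hit EOF).
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            chunk = os.read(fd, PIPE_BUF)
            if not chunk:
                # EOF on this input; stop watching it.
                open_fds.discard(fd)
                continue
            # Forward the chunk immediately, retrying on short writes.
            offset = 0
            while offset < len(chunk):
                offset += os.write(out_fd, chunk[offset:])
```

Calling `wye_sketch([0, fd], 1)` would merge stdin and one extra
already-opened descriptor `fd` onto stdout.  Note that the sketch
forwards chunks exactly as read, so it inherits the partial-record
problem discussed below.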

Someone should try to get this command added upstream in GNU
Coreutils, and I'd appreciate credit for the name.  This
implementation is a quick and dirty hack and not particularly robust.

The primary correctness and robustness problem has to do with how
UNIX pipes are implemented.  The intended use of this utility is to
supply pipes as wye's various inputs, and pipes normally buffer writes
such that the read side may get any number of bytes, including a
fraction of what was an atomic write.  The default atomicity
guarantees with respect to pipes and PIPE_BUF pertain only to
concurrent writers to the same pipe.  They have zero relevance to
semantics at the read side.
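
The lack of record boundaries at the read side is easy to observe.
The following sketch (the function name is hypothetical, and the
behaviour noted in the comments is what Linux typically does with an
ordinary pipe) writes separate records and then reads them back with
sizes that do not match the writes.

```python
import os


def read_side_demo():
    """Show that read() on an ordinary pipe ignores write() boundaries."""
    r, w = os.pipe()
    try:
        # Two separate writes, each far smaller than PIPE_BUF.
        os.write(w, b"one\n")
        os.write(w, b"two\n")
        # A single large read typically returns both writes coalesced.
        whole = os.read(r, 4096)

        # Conversely, a small read returns only a fraction of a record.
        os.write(w, b"three\n")
        part = os.read(r, 2)
        rest = os.read(r, 4096)
        return whole, part, rest
    finally:
        os.close(r)
        os.close(w)
```

Neither the coalesced read nor the fractional read tells the reader
where one writer's record ended and the next began.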

Wye could read every ready fd until exhausting what's immediately
available (EAGAIN/EWOULDBLOCK) in an attempt to combat this, but a
short write at the writer side, made when the pipe's internal buffers
are full, can still come through partially at the read side.
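
For illustration, draining a descriptor until EAGAIN/EWOULDBLOCK might
look like the sketch below.  The helper name `drain` and its return
shape are assumptions made for this example; wye itself does not do
this.

```python
import os


def drain(fd, bufsize=4096):
    """Read everything immediately available on fd.

    Returns (data, eof).  Hypothetical helper for illustration only.
    """
    os.set_blocking(fd, False)
    chunks = []
    eof = False
    while True:
        try:
            chunk = os.read(fd, bufsize)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing more is available right now.
            break
        if not chunk:
            # All writers have closed their end of the pipe.
            eof = True
            break
        chunks.append(chunk)
    return b"".join(chunks), eof
```

Even with such a drain, the data returned may end in the middle of a
record if the writer itself was cut short, so this reduces the problem
without eliminating it.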

When a partial record arrives at the read side, wye will naively
propagate that partial record in its output as if it were whole.
Other input streams may then be interleaved with that partial record,
and the aggregated stream becomes potentially incoherent.  Wye also
has no ability to ensure that only a single record passes through from
each input when multiple inputs are ready for reading simultaneously,
not without becoming content-aware and parsing the data contents,
which would render wye specialized for a specific content type rather
than a generalized aggregator treating contents as opaque records.

In Plan 9, pipes are implemented differently [0], allowing a tool
like wye to be implemented robustly and correctly in a more natural
way.

Since Linux 3.4, the O_DIRECT flag has been implemented for the
pipe2() syscall [1], enabling Plan 9-like semantics for pipes.  If the
shells available on Linux supported a convenient means of enabling
O_DIRECT "packetized pipes", a wye-like tool could arguably be
provided in a generalized fashion, and it could have significant
utility.
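
As a rough sketch of what "packetized pipes" provide, the following
Python creates a pipe with O_DIRECT via os.pipe2() and reads back one
packet per read().  The function name `packet_reads` is hypothetical,
and the sketch assumes a Linux kernel with packet-mode pipe support;
it returns None where that is unavailable.

```python
import os


def packet_reads(messages):
    """Write each message to an O_DIRECT pipe and read them back.

    Returns the list of chunks obtained, one per read(), or None if
    packet-mode pipes are not available.  Illustration only: messages
    should be non-empty, at most PIPE_BUF bytes, and few enough to fit
    in the pipe without blocking.
    """
    flag = getattr(os, "O_DIRECT", None)
    if flag is None or not hasattr(os, "pipe2"):
        return None
    try:
        r, w = os.pipe2(flag)
    except OSError:
        # Kernel or sandbox without packet-mode pipe support.
        return None
    try:
        for message in messages:
            # Each write() becomes a separate packet.
            os.write(w, message)
        os.close(w)
        w = -1
        chunks = []
        while True:
            # Each read() returns at most one packet.
            chunk = os.read(r, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        return chunks
    finally:
        os.close(r)
        if w != -1:
            os.close(w)
```

With packet mode in effect, `packet_reads([b"one\n", b"two\n"])` gives
the two records back separately rather than coalesced, which is the
property a generalized wye would rely on.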

I suspect that landing a tool like this upstream somewhere like GNU
Coreutils will first require making "packetized pipes" generally
accessible through shells like GNU bash.  I've made a first attempt
at doing so [2] but, as one would expect, it was mostly met with
resistance in what little attention it received.  These things take
time, and require significant buy-in from the community for movement
to occur.  If this feature interests you, show your support on
bug-bash and help work towards exposing the "packetized pipe"
capability via popular Linux shells like GNU bash.


[0] http://man.cat-v.org/plan_9/2/pipe
[1] https://www.man7.org/linux/man-pages/man2/pipe.2.html
[2] https://mail.gnu.org/archive/html/bug-bash/2020-09/msg00076.html