In a previous post I tried to make pipes in unix shells more reliable. That attempt had some drawbacks. I’ll summarize the problem and the previous version, then show the new and improved one.

Problem summary

A downstream process in a unix shell pipe cannot know whether the upstream finished successfully or exited with an error. Either way, it just sees EOF on its stdin. This means it can’t know whether it should “commit” the data it received.

Example uses:

$ pg_dumpall | xz -9 | google_cloud_storage_upload gs://bucket/path/postgres.dump
$ generate_data | psql --single-transaction

In both of these cases, if the left hand side fails you want the right hand side to STOP, and not finalize the upload or commit the transaction.
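
To make the problem concrete, here is a minimal sketch in plain POSIX shell. The functions upstream and downstream are hypothetical stand-ins (not real tools): the first emits two rows and then fails, the second counts rows until EOF and then "commits".

```shell
# Hypothetical producer: writes partial data, then exits with an error.
upstream() {
    printf 'row 1\nrow 2\n'
    return 1
}

# Hypothetical consumer: reads until EOF, then "commits" what it got.
# It has no way to see that the producer failed.
downstream() {
    count=0
    while IFS= read -r _line; do
        count=$((count + 1))
    done
    echo "committed $count rows"
}

result=$(upstream | downstream)
echo "$result"
```

The consumer commits even though the producer returned nonzero, because a failed upstream and a finished upstream both look like a plain EOF on stdin.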

The previous version

$ goodpipe <<EOF
[
  ["gsutil", "cat", "gs://example/input-unsorted.txt"],
  ["sort", "-S300M", "-n"],
  ["gzip", "-9"],
  ["gsutil", "cp", "-", "gs://example/input-sorted-numerically.txt.gz"]
]
EOF

This works fine for simple cases, but doesn’t support tee or per-command environment variables very well.

And I don’t want to invent a complex language, so my replacement took a different path.

wp — Wrap Pipe

wp on github.

wp instead wraps the input and/or output with a very minimal encapsulating protocol. This allows normal data to pass through, but still allows the downstream to get EOF as metadata.

If the data stream ends before the EOF marker has been received, the downstream must not commit. The wrapped downstream child process never sees its stdin close; instead it gets terminated with a signal.

wp can either encapsulate when it wraps something that outputs data, with wp -o, or decapsulate and receive the EOF marker when it’s handling input data, with wp -i, or do both, with wp -io.
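
The idea of an in-band EOF marker can be sketched with a toy, line-based framing in POSIX shell. This is an illustration only and an assumption throughout: it is not wp's real wire format, and the names encap and decap_commit, the "D " prefix, and the "E" marker are all made up for this example. It also only reports commit or abort, rather than killing a child process the way wp does.

```shell
# Toy encapsulation (NOT wp's actual protocol): prefix every data line
# with "D ", and emit a final "E" line only if the command succeeded.
encap() {
    status_file=$(mktemp)
    # Record the command's exit status, since a pipe would hide it.
    { "$@"; echo "$?" > "$status_file"; } | sed 's/^/D /'
    if [ "$(cat "$status_file")" = "0" ]; then
        echo "E"
    fi
    rm -f "$status_file"
}

# Toy decapsulation: count data lines, and commit only if the
# explicit "E" marker arrived before the stream ended.
decap_commit() {
    count=0
    saw_eof=0
    while IFS= read -r line; do
        case $line in
            "D "*) count=$((count + 1)) ;;
            "E") saw_eof=1 ;;
        esac
    done
    if [ "$saw_eof" = 1 ]; then
        echo "commit $count lines"
    else
        echo "abort: stream ended without EOF marker"
    fi
}

good=$(encap printf 'a\nb\n' | decap_commit)
bad=$(encap sh -c 'echo a; exit 1' | decap_commit)
echo "$good"
echo "$bad"
```

In the failing case the stream simply ends without the marker, so the receiving side can tell "upstream died" apart from "upstream finished".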

Examples

$ wp -o pg_dumpall | wp -io xz -9 | wp -i google_cloud_storage_upload gs://bucket/path-postgres.dump
$ wp -o generate_data | wp -i psql --single-transaction

Quick install, if you have cargo

cargo install --locked wp-cli