Secure Self-Review: Preventing Package Manager Worms

When the vulnerability "npm fails to restrict the actions of malicious npm packages" was announced, I felt the response was underwhelming. To be fair, it's not a problem unique to npm: it will be an issue for any package manager or decentralized software distribution mechanism, and it's very hard to address. Nevertheless, it's very alarming!

Every programmer should review their own code before checkin (many already do, tho a surprising number do not). However, to be an effective security measure, the review should be conducted in an environment isolated from the development environment. Few people do THAT. The proper workflow is:

1. Make changes and test them in a dev environment.
2. Switch to the checkin environment, then review and commit the changes there.

I would like to make the case for more programmers to use this kind of workflow. And I'm offering textrecv, a (crude!) tool to help enable it. It automates the unstated middle step of the above workflow:

Changes are automatically, invisibly and safely duplicated into the checkin environment.

This tool is just a stab in the right direction: it addresses part of the problem and is not a definitive solution. So far, I'm addressing only the git version of the npm vulnerability we started with; textrecv doesn't help with full-fledged package managers such as npm, rubygems, Debian, etc. Self-review depends on human judgement and is thus fallible, but if widely adopted, this strategy would catch most worms quite quickly. It's just one piece of an overall strategy, and I'm not even sure what all the other pieces would be.

To help package managers, there ought to be a service that builds and publishes packages from git projects, perhaps automatically when a tag of a certain format is pushed. Ideally, this service could make packages without running any code from the project being built (difficult to impossible with most existing package managers). A less attractive alternative is to do builds in a secure sandbox.
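The trigger half of such a service is easy to sketch. The Python below assumes a hypothetical tag format, `release-X.Y.Z`; `release_version` and `tags_to_build` are illustrative names, not part of any existing service. It only inspects tag names and never touches the project's code, in keeping with the goal of building without running anything from the project.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical tag format: "release-" plus a three-part version, e.g. "release-1.4.2".
RELEASE_TAG = re.compile(r"release-(\d+)\.(\d+)\.(\d+)")


def release_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) if the tag should trigger a build, else None."""
    m = RELEASE_TAG.fullmatch(tag)
    if m is None:
        return None
    major, minor, patch = (int(g) for g in m.groups())
    return (major, minor, patch)


def tags_to_build(tags: List[str]) -> List[str]:
    """Pick the release tags out of a tag list, ordered by version (not by string)."""
    releases = [t for t in tags if release_version(t) is not None]
    return sorted(releases, key=release_version)
```

Sorting by the parsed tuple rather than the raw string keeps `release-1.10.0` after `release-1.9.3`.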

But what I'm offering today is just this piece:

textsend and textrecv comprise a system for safely moving source code from one system to another over a network.

textsend pushes source code from a 'dev' system, where it is authored and tested, to a 'checkin' system running textrecv, where it is lightly sanitized. Then git (or another vcs) is used to publish it.

textsend and textrecv automate part of a workflow that helps deter unauthorized changes to your projects. However, they are no substitute for thinking. Self-review is still very important: you _must_ review changes before checking the code in. This is your opportunity to find changes that you didn't make (however unlikely you might consider that), along with all the other types of bugs and problems.

You were reviewing before checkin anyway, right? To make sure you didn't leave in printfs or other debugging code, and to help you compose the commit message, etc. All we're doing is moving the review and checkin to a different environment. Don't think of these as happening in dev anymore.
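As one aid for that review, a small script can flag leftover debugging statements among the lines a change adds. This Python sketch reads a unified diff (for example, the output of `git diff --cached`); the function name and the pattern list are hypothetical starting points, and the result is a prompt for the reviewer's eyes, not a replacement for them. Since project code must never run on checkin, and ideally its interpreter isn't installed there, this assumes your project is not itself written in Python.

```python
import re
from typing import List

# Hypothetical starting list of debugging leftovers; tune it for your languages.
DEBUG_PATTERNS = [
    re.compile(r"\bconsole\.log\("),
    re.compile(r"\bprintf\("),
    re.compile(r"\bdebugger\b"),
    re.compile(r"\bpdb\.set_trace\("),
]


def suspicious_additions(diff_text: str) -> List[str]:
    """Return the added lines of a unified diff that look like debugging code."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the new-file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            body = line[1:]
            if any(p.search(body) for p in DEBUG_PATTERNS):
                hits.append(body)
    return hits
```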

As programmers, we need to take our responsibility more seriously. We are creating software to run on other people's computers. We need to take some reasonable steps to ensure the programs we create are free of security holes. And we should be taking steps to make sure we are not the inadvertent agents of malware propagation! I want to promote a culture of greater review of software. Programmers should conduct both self-review and peer review more often.

In order to use textsend/textrecv properly, you must have some other system for achieving privilege isolation: the dev environment should be on a separate jail, container, virtual machine or distinct physical machine from the checkin environment. Proper setup to achieve this isolation is well beyond the scope of this screed, but a few specifics to note:

- Dev should not have any way to talk to checkin except to send source code for review and commit.
- You should not be able to ssh from dev to checkin (and probably not from dev to anywhere else, either).
- Dev should not know any secret keys, passwords, or other secret credentials known to checkin, except for the secret gpg2 key used to authenticate data sent from textsend to textrecv.
- In particular, the git password and/or secret key used to push to origin should not be present on dev.
- You must never run any of the code received by textrecv in the checkin environment. Best practice, when possible, is for the interpreter or compiler needed to run the code simply not to be present on checkin.
- This specifically includes tests, build scripts and packaging scripts: none of them should ever be run in the checkin environment.
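Some of these rules can be spot-checked. The Python sketch below has two hypothetical helpers: `leaked_credentials`, to run on dev, looks for common push-credential files that should exist only on checkin, and `present_tools`, to run on checkin, reports interpreters or compilers that shouldn't be installed there. The credential paths are assumptions; adjust them to your setup. These checks only catch setup mistakes and prove nothing once dev is compromised. (If Python is one of your project's languages, Python itself belongs on the "should be absent" list, and you'd want a different tool for this on checkin.)

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical credential locations; list whatever checkin actually uses to push.
CREDENTIAL_PATHS = [
    ".ssh/id_rsa",
    ".ssh/id_ecdsa",
    ".ssh/id_ed25519",
    ".git-credentials",
    ".netrc",
]


def leaked_credentials(home: Path) -> List[str]:
    """Run on dev: return credential files under home that belong only on checkin."""
    return [p for p in CREDENTIAL_PATHS if (home / p).exists()]


def present_tools(names: List[str]) -> List[str]:
    """Run on checkin: return the named programs that are found on PATH."""
    return [n for n in names if shutil.which(n) is not None]
```

For example, `leaked_credentials(Path.home())` on dev and `present_tools(["node", "npm", "ruby", "gcc", "make"])` on checkin should both come back empty.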

You can use git in the dev environment (and should, to merge upstream changes and process pull requests). And you can edit files in the checkin environment (tho it's better to stick to very safe edits, since you can't test there). But you must test only on dev, and never commit or push from dev (or have keys there that would allow it).

textrecv enforces the following restrictions on the data it receives:

- Only files and directories: no symlinks, devices, or other special files.
- No extended/special attributes, sticky bits, setuid or other weird modes.
- No funny business with filenames. In general, only normal-looking ASCII is allowed in filenames: nothing that could be specially interpreted by the shell or common command-line programs, and no UTF-8. If you confine yourself to the normal characters for source file names (and perhaps a few more), you will be fine:
  ```
  a-z A-Z 0-9 . _ -
  ```
- No binary files (more specifically, no ANSI escape character, \033, in file contents). This restriction should still allow the full range of non-escape UTF-8 characters in file contents.
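To make these rules concrete, here is a minimal Python sketch of a checker in the spirit of textrecv's restrictions. It is not textrecv's code: the helper names (`name_ok`, `mode_ok`, `content_ok`, `problems`) and the exact filename whitelist are assumptions, the whitelist deliberately omits any "few more" characters, and extended attributes are not checked, since that needs platform-specific calls.

```python
import os
import re
import stat
from pathlib import Path
from typing import List

# Conservative whitelist: letters, digits, ".", "_" and "-", with no leading "-"
# so a name can't be mistaken for a command-line option.
SAFE_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")
SPECIAL_BITS = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX


def name_ok(name: str) -> bool:
    """Plain ASCII names only: no spaces, shell metacharacters, or UTF-8."""
    return SAFE_NAME.fullmatch(name) is not None


def mode_ok(mode: int) -> bool:
    """Regular files and directories only, without setuid/setgid/sticky bits."""
    if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
        return False
    return (mode & SPECIAL_BITS) == 0


def content_ok(data: bytes) -> bool:
    """Reject the ANSI escape character (ESC, 0x1b); other UTF-8 text is allowed."""
    return b"\x1b" not in data


def problems(root: Path) -> List[str]:
    """Describe every entry under root that breaks one of the rules."""
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them, so the lstat below will catch them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            mode = path.lstat().st_mode  # lstat: inspect links, don't follow them
            if not name_ok(name):
                found.append(f"bad filename: {path}")
            if not mode_ok(mode):
                found.append(f"bad file type or mode: {path}")
            elif stat.S_ISREG(mode) and not content_ok(path.read_bytes()):
                found.append(f"escape character in contents: {path}")
    return found
```

Using `lstat` rather than `stat` matters here: following a symlink would validate its target and let the link itself slip through.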

These restrictions are intended to make it safe to work with the files received by textrecv with ordinary shell tools without risk of system compromise, unexpected behavior, or need to think hard about it.

In order to ensure data integrity, gpg2 is used to sign data sent from the dev environment to the checkin environment. A gpg2 keypair needs to be shared between the two environments.
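The signing step might look something like the following Python sketch, which uses detached signatures. The detached-signature layout and the helper names (`sign_cmd`, `verify_cmd`, `verified`) are assumptions for illustration, not a description of textsend's actual wire format; only the gpg2 options used (`--batch`, `--yes`, `--output`, `--detach-sign`, `--verify`) are standard.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical layout: each payload travels with a detached signature file.


def sign_cmd(payload: Path, sig: Path) -> List[str]:
    """On dev: gpg2 command that writes a detached signature for payload to sig."""
    return ["gpg2", "--batch", "--yes", "--output", str(sig), "--detach-sign", str(payload)]


def verify_cmd(payload: Path, sig: Path) -> List[str]:
    """On checkin: gpg2 command that checks payload against its detached signature."""
    return ["gpg2", "--batch", "--verify", str(sig), str(payload)]


def verified(payload: Path, sig: Path) -> bool:
    """True only if gpg2 reports a good signature (exit status 0); fails closed."""
    try:
        result = subprocess.run(verify_cmd(payload, sig), capture_output=True)
    except OSError:  # e.g. gpg2 is not installed
        return False
    return result.returncode == 0
```

gpg's exit status only says the signature is good and made by some key in checkin's keyring, so a dedicated keyring holding just the shared key is the simplest setup; a stricter check would pin the expected fingerprint by parsing the `VALIDSIG` line from `--status-fd` output.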

Why use textrecv instead of the many other ways to move files over a network? Other tools (even git) are more general-purpose and not designed to avoid the perils of potentially malicious source code. rsync, for instance, makes no promises about file integrity unless you use ssh as the transport. scp and sftp move files with guaranteed integrity, but require an ssh installation, and it is _very_ tricky to install ssh in a way that allows file movement only and prohibits all general remote execution. At least, I never saw an easy and foolproof way to set up scp-only mode. (Think you did it right? Did you prevent write access to ~/.bashrc? How about to the user's crontab? Or to ~/.init/* on upstart-based systems such as Ubuntu and its derivatives?) Not that it can't be done, but it's way more difficult than it oughtta be.

No other tool prevents shenanigans with filenames or escape sequences in file contents (tho git tries to avoid them in branch and ref names). And all of those other tools are rather complicated by comparison; textsend and textrecv are (still fairly) small and easy to audit.

The name is terrible. I'm sorry, but I couldn't think of a better one.

