Posted by Becker Polverini on 17 November 2015
An old adage in cryptography is that one should never "roll your own crypto." Besides being really hard to get right, doing so is stress-inducing, time-consuming, and tedious. It quickly becomes a black hole of code review and worry.
This article looks at how we took trusted C crypto code and called it from both our Clojure backend and our Clojurescript frontend without changing a single line of the trusted code base.
Before we get started, we should mention that browser crypto is a bad idea if your goal is to fight the Man. It is, on the other hand, a good idea if your goal is to keep sensitive data from spreading across caches and microservices. For example, if we delete a key that we could never read in the first place (it is encrypted client-side), the data encrypted under it is rendered inert everywhere in our backend.
In the case of Balboa, we never want to see our users' data, and we love the literature showing how cryptography can provide elegant solutions to access control. Also, since asm.js keeps getting faster and more widely supported, this approach may pay greater dividends in future browser releases.
It is probably only a matter of time before someone publishes something like Upton Sinclair's The Jungle about cloud services' mismanagement of our data. We want to be ahead of the curve in practicing good data stewardship.
Our big question was the following: "Is it possible to take something developers already trust and call it as faithfully as possible from Clojure and Clojurescript?"
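After some research, we had a gut feeling about what the answer would look like: one unmodified C code base, called through JNI from Clojure and compiled with Emscripten for Clojurescript. It seemed sensible enough, so we set to work finding suitable algorithms for the experiment.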
After exploring some potential issues with Emscripten compilation (timing attacks, memsets being silently optimized away, etc.), we went looking for a PRF that would be simple enough for emcc to compile faithfully. We settled on Skein/Threefish.
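The disappearing memset is a dead-store elimination: if a buffer is zeroed and never read again, the optimizer is allowed to drop the memset entirely. Below is a minimal sketch of one widely used mitigation, calling memset through a volatile function pointer. `secure_zero` and `memset_ptr` are hypothetical names (not part of Skein or Emscripten), and whether a particular compiler, emcc included, honors the idiom is something to confirm in its output rather than assume.

```c
#include <stddef.h>
#include <string.h>

/* A const volatile pointer to memset: the compiler must re-read the pointer
   at every call and cannot prove it still refers to memset, so it cannot
   treat the call as a removable dead store. A common idiom, not a guarantee. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Hypothetical helper: wipe n bytes of sensitive state, e.g. chaining vars. */
void secure_zero(void *p, size_t n) {
    memset_ptr(p, 0, n);
}
```

Where available, explicit_bzero or memset_s serve the same purpose without relying on the idiom.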
Skein's NIST x86 implementation had no strange assembly instructions to worry about and had already gone through the review process of the SHA-3 competition. In a later post, we will talk about some of the tricks we used for NaCl and scrypt, which will cover tradeoffs around when to write Clojurescript or C.
Skein was one of the SHA-3 finalists that lost to Keccak. Keccak is great, but we like Threefish, the tweakable block cipher at the core of Skein, for its performance on x86_64.
Cutting to the chase, the structure we ended up with for the Skein components in our crypto has one unexpected piece: a C shim, skein_shim.c, sitting between the Skein code and both the JNI and Emscripten bindings.
The first hurdle we encountered with JNI and Emscripten is what to do with structs. In the case of something like Skein, there is a struct, Skein1024_Ctxt_t, that contains all of the state for the pseudorandom function. This struct must be initialized and passed around for any incremental hashing operations.
In Emscripten, you can only pass a "number" (int, float, memory address) or a "string" between Javascript and compiled C; there is no way to hand over a struct directly. In Java, meanwhile, passing objects across the JNI boundary is expensive.
One portable way to share code across both platforms was to memcpy structs into and out of uint8_t buffers. Once a struct is represented as a uint8_t * and handed up to Clojure or Clojurescript as a byte[] or a Uint8Array, respectively, it can no longer be sensibly mutated from that side, since the host language knows nothing about the struct's layout. Fortunately, all modifications happen in C, where the bytes can be memcpy'd back into a struct before any operation is performed on it.
skein_shim.c wraps the conversion between these uint8_t buffers and the Skein structs, so the API exposed to both Emscripten and JNI always deals in uint8_t *. On the Emscripten side, HEAPU8, a view of the asm.js heap, is itself a Uint8Array, so none of the byte[]-to-uint8_t * translation that JNI requires is needed. Instead, a single set call copies a buffer that lives outside Emscripten's heap into HEAPU8: a far simpler task.
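To make the shim pattern concrete, here is a minimal sketch in that style. The struct and the mixing loop are hypothetical placeholders standing in for Skein1024_Ctxt_t and the real Skein calls (the loop is not a hash, let alone a secure one); the point is the shape of the boundary. Every exported function takes plain uint8_t buffers, and memcpy moves the bytes into a real struct before any work is done and back out afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a state struct such as Skein1024_Ctxt_t. */
typedef struct {
    uint64_t state[4];
    uint64_t bytes_seen;
} toy_ctx_t;

/* Host code asks for this size when allocating its byte[] or Uint8Array. */
size_t toy_ctx_bytes(void) { return sizeof(toy_ctx_t); }

void toy_init(uint8_t *ctx_bytes) {
    toy_ctx_t ctx = {{1, 2, 3, 4}, 0};
    memcpy(ctx_bytes, &ctx, sizeof ctx);        /* struct -> bytes */
}

void toy_update(uint8_t *ctx_bytes, const uint8_t *msg, size_t len) {
    toy_ctx_t ctx;
    memcpy(&ctx, ctx_bytes, sizeof ctx);        /* bytes -> aligned struct */
    for (size_t i = 0; i < len; i++) {          /* placeholder mixing, NOT crypto */
        uint64_t *w = &ctx.state[ctx.bytes_seen % 4];
        *w = (*w ^ msg[i]) * 0x100000001b3ULL;
        ctx.bytes_seen++;
    }
    memcpy(ctx_bytes, &ctx, sizeof ctx);        /* struct -> bytes */
}

void toy_final(const uint8_t *ctx_bytes, uint8_t out[32]) {
    toy_ctx_t ctx;
    memcpy(&ctx, ctx_bytes, sizeof ctx);
    memcpy(out, ctx.state, sizeof ctx.state);   /* 4 x 8 = 32 bytes */
}
```

Copying through a local struct, rather than casting the buffer pointer to toy_ctx_t *, also sidesteps alignment and strict-aliasing trouble, since an arbitrary byte buffer is not guaranteed to be aligned for uint64_t.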
You might be thinking about explicit_bzero or memset_s right about now, since some of the data being copied around is sensitive material, like Skein's chaining variables. Unfortunately, the JVM and the browser's Javascript engine give us little control over how memory is managed. If this concern is part of your threat model, then JVM and JS cryptography must be off limits for your project.
If you want Emscripten code to run quickly, you generally have to compile with ALLOW_MEMORY_GROWTH=0, which forces you to work with a fixed amount of heap. Every malloc needs a matching free. Clojurescript offers some really elegant means for controlling your memory usage with Emscripten.
Like most cool things with Clojure(script), it involves a macro:
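(defmacro with-heap
  "Like with-open for the Emscripten heap: binds each symbol to a pointer and
  frees it, innermost first, when the body exits. malloc! is allowed in the
  binding forms but not in the body."
  [bindings & body]
  (cond
    (= (count bindings) 0) `(binding [*freeable* false]
                              ~@body)
    (symbol? (bindings 0)) `(binding [*freeable* true]
                              (let ~(subvec bindings 0 2)
                                (try
                                  (with-heap ~(subvec bindings 2) ~@body)
                                  (finally
                                    (._free js/Module ~(bindings 0))))))
    ;; Macro expansion runs in Clojure, so throw an exception object, not a string.
    :else (throw (IllegalArgumentException.
                   "with-heap only allows symbols in bindings"))))
(defmacro leaky!
  "Permits malloc! without automatic freeing; the caller must free."
  [& body]
  `(binding [*freeable* true]
     ~@body))
;; Clojurescript side: true only while allocation is permitted.
(def ^:dynamic *freeable* false)
(defn malloc! [n]
  (assert *freeable* "malloc! called outside with-heap bindings or leaky!")
  (._malloc js/Module n))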
Now, in Clojurescript, we can manage Emscripten memory in a friendlier way than in straight Javascript. The fun doesn't stop there: one could easily extend the above with a version that zeroes memory before returning it to the Emscripten heap, for the data that merits the treatment.
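On the C side, one way to get that behavior is for the shim to export its own allocator pair and have a with-heap variant call it (for example `(._wipe_free js/Module ptr)`, assuming the function is exported) instead of `_malloc` and `_free`. The sketch below is hypothetical; wipe_malloc, wipe_size, and wipe_free are not Emscripten functions. Each allocation carries its size in a small header, so the matching free can zero the whole block without the caller passing a length, and the zeroing uses volatile stores so the compiler cannot discard them as dead writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of each block; the union keeps the payload
   aligned as strictly as malloc's own result (max_align_t is C11). */
typedef union {
    size_t size;
    max_align_t align;
} wipe_hdr_t;

void *wipe_malloc(size_t n) {
    if (n > SIZE_MAX - sizeof(wipe_hdr_t)) return NULL;  /* overflow guard */
    wipe_hdr_t *h = malloc(sizeof *h + n);
    if (h == NULL) return NULL;
    h->size = n;
    return h + 1;                        /* caller sees only the payload */
}

size_t wipe_size(const void *p) {
    return ((const wipe_hdr_t *)p - 1)->size;
}

void wipe_free(void *p) {
    if (p == NULL) return;
    wipe_hdr_t *h = (wipe_hdr_t *)p - 1;
    volatile unsigned char *b = p;       /* volatile: stores cannot be elided */
    for (size_t i = 0; i < h->size; i++) b[i] = 0;
    free(h);
}
```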
Nothing novel here in terms of macro tricks. This is a ripoff of clojure.core/with-open with some Emscripten-specific changes that you may find useful.
Now that the lower-level plumbing is in place, we can look at an example of making a C HMAC function callable from Clojurescript. It involves three parts: First, how to wrap Emscripten-compiled functions in something callable from Javascript; second, how to properly handle the heap; and, third, how to wrap the Javascript function in something more Clojurescript friendly.
;; Keyword -> Emscripten cwrap type; a 64-bit :long takes two "number" args.
(def type->string
  {:pointer "number"
   :byte    "number"
   :int     "number"
   :long    ["number" "number"]
   :void    nil})
(defn cwrap [fn-name ret-type signatures]
  (.cwrap js/Module fn-name
          (type->string ret-type)
          (clj->js (flatten (map type->string signatures)))))
Above is nothing more than a simple wrapper for Module.cwrap(name, ret_type, arg_types), the mechanism for making compiled C functions callable from Javascript. We elected to use a map for translating data types into their Emscripten representation because it is more explicit and easier to compare against, say, a C header file. You'll notice that :long maps to a vector. This is because Emscripten passes a 64-bit integer as two numbers: Javascript numbers are IEEE 754 doubles, which can represent integers exactly only up to 2^53.
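A quick C check makes the 2^53 limit concrete: converting 2^53 + 1 to a double (the type behind every Javascript number) silently rounds it, whereas each 32-bit half of it fits exactly. split_u64, join_u64, and exact_as_double are hypothetical helpers for illustration only; which half Emscripten passes first is not something this sketch establishes, so check its documentation for that.

```c
#include <stdint.h>

/* Split a 64-bit integer into two 32-bit halves, each of which a
   Javascript number (a double) can hold exactly. */
void split_u64(uint64_t v, uint32_t *lo, uint32_t *hi) {
    *lo = (uint32_t)(v & 0xFFFFFFFFu);
    *hi = (uint32_t)(v >> 32);
}

uint64_t join_u64(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | lo;
}

/* Returns 1 if v survives a round trip through a double. The 2^64 bound
   guards the conversion back, which is undefined for out-of-range values. */
int exact_as_double(uint64_t v) {
    double d = (double)v;
    return d < 18446744073709551616.0 && (uint64_t)d == v;
}
```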
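;; Cached view of the heap; safe only because ALLOW_MEMORY_GROWTH=0 means
;; the underlying buffer is never replaced.
(def emscripten-heap (.-HEAPU8 js/Module))
(defn heap-subarray
  "A view (not a copy) of sz bytes of the heap starting at address."
  [address sz]
  (.subarray emscripten-heap address (+ address sz)))
(defn array-clone [a]
  (doto (js/Uint8Array. (.-length a))
    (.set a)))
(defn heap->uint8array
  "Copies sz bytes out of the heap into a fresh Uint8Array."
  [address sz]
  (array-clone (heap-subarray address sz)))
(defn uint8array->heap!
  "Mallocs space on the heap, copies uint8-buffer into it, returns the address."
  [uint8-buffer]
  (let [n (.-length uint8-buffer)
        buffer (malloc! n)]
    (.set (heap-subarray buffer n) uint8-buffer)
    buffer))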
Manipulating data inside Emscripten requires copying it into and out of Emscripten's own heap. We use the functions above to move Uint8Arrays back and forth.
To pass data in, we malloc space on the heap and then set the input into the matching subarray of HEAPU8.
To take data out, we treat the "memory address" and length as a subarray (a view) of HEAPU8 and copy that view into a new Uint8Array. The clone is what gets returned; the Emscripten allocation itself is freed by the enclosing with-heap when it exits.
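For readers who think in C, a rough model of these helpers may help: to compiled code, the Emscripten heap is one flat byte array, a "pointer" is just an offset into it, and crossing the boundary means copying. The sketch below is a hypothetical simplification (a fixed-size array with a bump allocator, nothing like Emscripten's real malloc), meant only to show that pointers are offsets, that copy-in and copy-out are plain memcpy calls, and that a finite heap can run out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_SIZE 4096u

static uint8_t heap[HEAP_SIZE];   /* plays the role of HEAPU8 */
static uint32_t heap_top = 8;     /* offset 0 is reserved to act as NULL */

/* Bump allocator: returns an offset, or 0 if the finite heap is exhausted. */
uint32_t heap_malloc(uint32_t n) {
    uint32_t p = (heap_top + 7u) & ~7u;   /* keep 8-byte alignment */
    if (p > HEAP_SIZE || n > HEAP_SIZE - p) return 0;
    heap_top = p + n;
    return p;
}

/* Like uint8array->heap!: allocate, then copy the caller's bytes in. */
uint32_t copy_in(const uint8_t *src, uint32_t n) {
    uint32_t p = heap_malloc(n);
    if (p != 0) memcpy(heap + p, src, n);
    return p;
}

/* Like heap->uint8array: copy bytes out into a buffer the caller owns. */
void copy_out(uint32_t p, uint8_t *dst, uint32_t n) {
    memcpy(dst, heap + p, n);
}

/* Release everything at once; with-heap instead frees each pointer. */
void heap_reset(void) { heap_top = 8; }
```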
Our goal was to make sure this copying back and forth was negligible with respect to the amount of work being done in Emscripten-space.
(def skein-block-bytes 128)  ; Skein-1024 block size (1024 bits); also the output length here
;; context-bytes: size of the serialized context produced by skein_shim.c.
;; It must match the C side and is assumed to be defined elsewhere.
(def hmac-init
  (let [hmac-impl (cwrap "skein_hmac_init" :void [:pointer :pointer :int])]
    (fn [key]
      (with-heap [context-heap (malloc! context-bytes)
                  key-heap (uint8array->heap! key)]
        (hmac-impl context-heap key-heap (.-length key))
        (heap->uint8array context-heap context-bytes)))))
(def hash-update
  (let [hash-impl (cwrap "skein_hash_update" :void [:pointer :pointer :int])]
    (fn [context message]
      (with-heap [context-heap (uint8array->heap! context)
                  message-heap (uint8array->heap! message)]
        (hash-impl context-heap message-heap (.-length message))
        ;; Copy the updated state back into the caller's buffer and return it.
        (doto context
          (.set (heap-subarray context-heap context-bytes)))))))
(def hash-final
  (let [hash-impl (cwrap "skein_hash_final" :void [:pointer :pointer])]
    (fn [context]
      (with-heap [context-heap (uint8array->heap! context)
                  output (malloc! skein-block-bytes)]
        (hash-impl context-heap output)
        (heap->uint8array output skein-block-bytes)))))
And, now, the payoff: We can generate the HMAC for a given message! About time, if you ask me.
My next post will cover how we implemented CTR mode and used Poly1305 to get roughly 8x speedups over our initial, naive implementations of authenticated encryption. Since we routinely encrypt files as large as 500 MB in Balboa, we need all the speed we can get!
PKC Security
8092 Warner Ave.
Huntington Beach, CA 92647