Bridging Language Divides in WebAssembly
A question that I often hear asked is:
Do you ever worry about fragmentation in Wasm?
The answer is yes. Wow, yes. I worry about fragmentation in Wasm a lot. Every time I see a headline breathlessly proclaiming that some new XYZ has been ported to "Wasm", my first thought is: cool!
But my second thought is....
"Which Wasm?"
There are many different "Wasms" in practice today. They all share the same core Wasm language, but differ in how they communicate with the surrounding platform.
To be sure, having different APIs in different places is a good thing. Wasm's complete separation of I/O facilities from computation is one of the great things about its design. It was built to be embedded in many different places and use many different APIs, to specialize for many different use cases.
But one thing core Wasm doesn't provide a clear answer to is...
What even is a Wasm API?
Programming languages differ from each other in countless ways, and rather than trying to explicitly support all these differences, core Wasm's type system is very low-level, to give languages the control they need to do things their way. But, this also means that when it's time for one language to talk to another, it isn't straightforward, even though they're using the same underlying core language.
Ideally, we'd like to be able to link together Wasm code produced from one language with Wasm code produced from another, which presents some interesting challenges. But even before we think about that, at the very least, we want to be able to run Wasm code produced from one programming language on a host implemented in another, to avoid ending up with a fragmented ecosystem. To do even just that, we need to think about cross-language interfaces.
Cross-language interfaces
Looking outside of WebAssembly at how other systems have handled cross-language interfaces, there are roughly three different categories of approaches.
Point-to-point
First, point-to-point systems connect one specific language to another specific language. For example, using a system like PyO3, one might write Rust code that talks to Python code, like this:
use pyo3::prelude::*;

/// Formats the sum of two numbers as a string.
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}
Point-to-point systems like this make a tradeoff: at least one of the two sides knows what language it's talking to. With PyO3, the Rust code knows it's talking to Python, so it wouldn't be able to talk to any other language. The upside of that tradeoff is that by being specialized, point-to-point systems can provide a high level of what I'll call integration. PyO3 can make much of the expressivity of Python available to the Rust code, and it can do so very efficiently.
Language-family
Next, language-family systems are mildly specialized to connect a family of languages that all share something in common with each other. For example, on the Web, many different JS-like languages can talk to each other by both sides pretending to be JS. On Unix-like operating systems, many different C-like languages can talk to each other by both sides pretending to be C.
Language-family systems tend to provide somewhat less integration than point-to-point systems. Any unique feature of any individual language that isn't common across the entire family tends to be hidden.
For example, ClojureScript code can talk to any JavaScript-family language by pretending to be JavaScript. But JavaScript doesn't have all of the same types as ClojureScript. ClojureScript has to provide a function named clj->js to convert ClojureScript values to JavaScript values, and it's lossy:
(clj->js [:red "green" 'blue])
;;=> #js ["red" "green" "blue"]
ClojureScript has features, like keywords and symbols, that JavaScript doesn't have, so when ClojureScript values are converted to JavaScript, those features often need to be converted into something JavaScript does have, such as strings.
All-to-all
Finally, all-to-all systems are designed to connect any language to any other language. Each side is typically unaware of the language of the other side. Often this is done using an Interface Description Language (IDL), so that the interface between languages can be described in a language-independent way.
All-to-all systems are common in RPC protocols, where it's especially desirable to be able to implement the client and the server in different languages.
IDLs also have a lot of similarity with database schema languages. Both need to define datatypes, and both typically have a strong need to keep the data independent of the programming languages that will produce or consume the data. And, both have a need for the data to be meaningful without existing within a particular address space or a GC heap with an arbitrary reference graph.
What Would Wasm Want?
First of all, there is no one answer that's best for all situations. There are tradeoffs in each of these three categories, so no single cross-language interface system will work best for all situations. All three should, and can, coexist within the Wasm ecosystem.
At the same time, the only way to avoid fragmentation across language boundaries is to pursue an overarching all-to-all approach. Point-to-point and language-family approaches can be nested inside when needed, but a primary all-to-all approach is the only way to ensure that every language can participate in the ecosystem, without having to be a member of the right language family, and without having a blessed language that everyone has to pretend to be.
This is one of the unique opportunities in Wasm. In contrast, Unix, the JVM, the CLR, and JavaScript are all language-family platforms. They each start with their respective blessed language, and oblige all other languages to talk to each other by pretending to be that blessed language.
Wasm, on the other hand, doesn't have an inherent blessed language.
wait wait wait, everyone knows that C is the blessed language
The Wasm MVP explicitly focused on C/C++. Many people's first introduction to Wasm was filled with pointers and offsets and struct layouts. It gave a lot of people the impression that Wasm was settling into its place in the Unix tradition of using C as its blessed language of communication. Not everyone was happy about that, but a lot of people weren't surprised by it.
And after all, it can seem inevitable: surely C is the universal language. All information on a computer is ultimately just bytes, and C pointers can point to any bytes in memory. C can talk to anything, in a way that most other languages cannot, and that makes it uniquely suited to be the glue between all languages.
Except that it isn't.
😱
Here's the thing. There are several wrinkles in that story. One very big wrinkle is Wasm GC.
Wasm GC is not bytes.
Wasm GC types fundamentally can't be pointed to by C pointers.
C is not the universal language on Wasm.
🤯
oh i knew it all along, GC is the answer
As much as C-style ABIs feel like the obvious answer to some people, there are other people to whom GC types are just as obviously the answer.
After all, the JVM, the CLR, and JavaScript are all very popular platforms that host wide varieties of programming languages, and the way languages on those platforms typically communicate is by using a set of GC types provided by the platform. It's simple, efficient, and proven. And unlike C, it doesn't have scary memory-lifetime hazards. So it's the obvious answer for Wasm.
Except, there are wrinkles with that approach too. One is that even though C/C++ won't be the blessed language, they still matter, and GC types can't point into linear memory, so they aren't universal either.
Another is that Wasm GC is being designed to be as language-independent as possible, which has led it away from attempting to provide one-size-fits-all opinionated types for things like "string", "list", and so on, because in practice programming languages, even just GC programming languages, have differing needs. Wasm instead aims to provide primitive constructs that programming languages can use to build higher-level types.
RPC you seeing me?
Wasm isn't the first place in computing to have a need to connect different languages without having an obvious "blessed language". Networking protocols in particular are an area where no single language took hold, in part because most languages' type systems have things like pointers to mutable data, which are awkward to share over a network. Popular network protocols have often turned to IDLs, such as OpenAPI, Protobufs, or others, which make them all-to-all systems.
Should Wasm use an existing RPC system?
When we scale up software systems, they tend to become distributed systems, so using an RPC protocol is tempting, as it would mean we'd be ready to go distributed, out of the box. On the other hand, one of the lessons from CORBA is that making everything network-aware makes everything harder.
And, encoding calls into bytes and decoding them on the callee side has overhead, and it's overhead that would be difficult to optimize away.
So what we'd ideally want is a system that uses an IDL to achieve the same kind of all-to-all cross-language properties that RPC systems have, but which isn't tied to either bytestream serialization or network awareness.
The Wasm component model
The Wasm component model is an all-to-all system. It has an IDL, and connects languages to each other without either side being aware of the other. And it isn't tied to bytestream serialization or network awareness.
There's a lot in the component model, but to get a taste of how it works, consider a type like string. Instead of making it a type in the core language type system, string is a type in the interface type system. That means it doesn't have a fixed representation or even a fixed set of operations. It's just defined in terms of a set of values, which for string is all sequences of Unicode Scalar Values.
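To make that concrete, here's a sketch of what an interface using string can look like in the component model's IDL, WIT. The package, interface, and function names here are hypothetical; only the use of the string type is the point:

```wit
// Hypothetical example interface; the names are made up for illustration.
package example:greeting;

interface greeter {
  // `string` here is an interface type: a sequence of Unicode Scalar Values,
  // with no particular in-memory representation prescribed.
  greet: func(name: string) -> string;
}
```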
Bindings for individual languages work by encoding descriptions of how the Unicode Scalar Values are represented within their language. This avoids either side of an interface knowing how the other side represents its values. And, it provides enough information to linkers to insert whatever adaptation code is needed:
- If Wasm code is passing a string to the host, the host can just read the string data straight from the Wasm code's memory. No copying is needed in many cases!
- If Wasm code using UTF-8 strings is passing a string to Wasm code using UTF-16 strings, the linking process can transparently insert UTF-8 to UTF-16 transcoding between them, so that strings can be passed without either side knowing the encoding of the other.
- If Wasm code is passing a string to Wasm code using the same encoding, the data can be copied. A copy may sound expensive to some ears, but in today's C-like ABIs there isn't a way to perform this linking at all, so this isn't a regression. And in GC land, there are possible ideas for how this copy could be optimized away in the future.
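The transcoding in the second case can be sketched in ordinary Rust. This is a minimal illustration of the idea, not the component model's actual adapter code; the function name is made up, and real adapters operate on raw linear memory rather than Rust strings:

```rust
// Sketch of the kind of transcoding a linker might insert between a
// UTF-8 module and a UTF-16 module (hypothetical helper, for illustration).
fn utf8_to_utf16(s: &str) -> Vec<u16> {
    // Rust strings are UTF-8; encode_utf16() re-encodes the same sequence
    // of Unicode Scalar Values as UTF-16 code units.
    s.encode_utf16().collect()
}

fn main() {
    let utf16 = utf8_to_utf16("héllo");
    // 'é' (U+00E9) is two bytes in UTF-8 but a single UTF-16 code unit.
    assert_eq!(utf16, vec![0x68, 0xE9, 0x6C, 0x6C, 0x6F]);
}
```

Either side keeps its own representation; the adaptation happens at the boundary, which is exactly where the component model puts it.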
So there's a lot more to it than this, but hopefully this gives a taste of how the system works.
For more information about using the component model, see the component model documentation.
Wrap up
There's a lot more to the component model; this blog post is just about putting the cross-language aspects of the design in perspective.
Wasm needs an overarching cross-language interface system if it's to avoid long-term language-based fragmentation. The component model works differently from what people expecting plain C ABIs anticipate, and differently from what people expecting plain GC types anticipate, but it has the properties that a unified ecosystem needs.