Note: we are in the process of migrating all post-MVP features to tracking issues.
ECMAScript module integration
Note: these will soon move to tracking issues.
= Essential features we want to prioritize adding shortly after the MVP.
This is covered in the tooling section.
Post-MVP, some form of feature-testing will be required. We don’t yet have the
experience writing polyfills to know whether has_feature is the right
primitive building block, so we’re not defining it (or something else) until we
gain this experience. In the interim, it’s possible to do a crude feature test
by eval-ing WebAssembly code and catching any resulting errors.
See Feature test for a more detailed sketch.
Provide access to safe OS-provided functionality including:
map_file(addr, length, Blob, file-offset): semantically, this operator
copies the specified range from Blob into the range [addr, addr+length)
(where addr+length <= memory_size) but implementations are encouraged
to mmap(addr, length, MAP_FIXED | MAP_PRIVATE, fd)
discard(addr, length): semantically, this operator zeroes the given range
but the implementation is encouraged to drop the zeroed physical pages from
the process’s working set (e.g., by calling
shmem_create(length): create a memory object that can be simultaneously
shared between multiple linear memories
map_shmem(addr, length, shmem, shmem-offset): like map_file, except using
MAP_SHARED, which isn’t otherwise valid on read-only Blobs
mprotect(addr, length, prot-flags): change protection on the range
[addr, addr+length) (where addr+length <= memory_size)
decommit(addr, length): equivalent to mprotect(addr, length, PROT_NONE)
followed by discard(addr, length) and potentially more efficient than
performing these operators in sequence.
The addr and length parameters above would be required to be multiples of
the page size.
The mprotect operator would require hardware memory protection to execute
efficiently and thus may be added as an “optional” feature (requiring a
feature test to use). To support efficient execution even when
no hardware memory protection is available, a restricted form of mprotect
could be added which is declared statically and only protects low memory
(providing the expected fault-on-low-memory behavior of native C/C++ apps).
The above list mostly covers the functionality provided by the mmap OS
primitive. One significant exception is that mmap can allocate noncontiguous
virtual address ranges. See the FAQ for rationale.
Some platforms offer support for memory pages as large as 16GiB, which can improve the efficiency of memory management in some situations. WebAssembly may offer programs the option to specify a larger page size than the default.
Some types of control flow (especially irreducible and indirect) cannot be expressed with maximum efficiency in WebAssembly without patterned output by the relooper and jump-threading optimizations in the engine. Target uses for more expressive control flow are:
Options under consideration:
switch combined with jump-threading are enough.
goto (direct and indirect).
The WebAssembly MVP will support the wasm32 mode of WebAssembly, with linear memory sizes up to 4 GiB using 32-bit linear memory indices. To support larger sizes, the wasm64 mode of WebAssembly will be added in the future, supporting much greater linear memory sizes using 64-bit linear memory indices. wasm32 and wasm64 are both just modes of WebAssembly, to be selected by a flag in a module header, and don’t imply any semantic differences outside of how linear memory is handled. Platforms will also have APIs for querying which of wasm32 and wasm64 are supported.
Of course, the ability to actually allocate this much memory will always be subject to dynamic resource availability.
It is likely that wasm64 will initially support only 64-bit linear memory indices, and wasm32 will leave 64-bit linear memory indices unsupported, so that implementations don’t have to support multiple index sizes in the same instance. However, operators with 32-bit indices and operators with 64-bit indices will be given separate names to leave open the possibility of supporting both in the same instance in the future.
Coroutines will eventually be part of C++ and are already popular in other programming languages that WebAssembly will support.
See the asm.js RFC for a full description of signature-restricted Proper Tail Calls (PTC).
Useful properties of signature-restricted PTCs:
goto via function-pointer calls.
A compiler can exert some amount of control over register allocation via the ordering of arguments in the PTC signature.
General-purpose Proper Tail Calls would have no signature restrictions, and would therefore be more broadly usable than signature-restricted Proper Tail Calls, though their performance characteristics would differ somewhat.
The initial SIMD API will be a “short SIMD” API, centered around fixed-width 128-bit types and explicit SIMD operators. This is quite portable and useful, but it won’t be able to deliver the full performance capabilities of some of today’s popular hardware. There is a proposal in the SIMD.js repository for a “long SIMD” model which generalizes to wider hardware vector lengths, making more natural use of advanced features like vector lane predication, gather/scatter, and so on. Interesting questions to ask of such a model will include:
What happens when code uses long SIMD on a hardware platform which doesn’t support it? Reasonable options may include emulating it without the benefit of hardware acceleration, or indicating a lack of support through feature tests.
WebAssembly is a new virtual ISA, and as such applications won’t be able to simply reuse their existing JIT-compiler backends. Applications will instead have to interface with WebAssembly’s instructions as if they were a new ISA.
Applications expect a wide variety of JIT-compilation capabilities. WebAssembly should support:
WebAssembly’s JIT interface would likely be fairly low-level. However, there are use cases for higher-level functionality and optimization too. One avenue for addressing these use cases is a JIT and Optimization library.
Presently, when an instruction traps, the program is immediately terminated. This suits C/C++ code, where trapping conditions indicate Undefined Behavior at the source level, and it’s also nice for handwritten code, where trapping conditions typically indicate an instruction being asked to perform outside its supported range. However, the current facilities do not cover some interesting use cases:
i32.min_s: signed minimum
i32.max_s: signed maximum
i32.min_u: unsigned minimum
i32.max_u: unsigned maximum
sext(x, y) is
i32.abs_s: signed absolute value (traps on
i32.bswap: sign-agnostic reverse bytes (endian conversion)
i32.clrs: sign-agnostic count leading redundant sign bits (defined for all values, including 0)
i32.floor_div_s: signed division (result is floored)
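The semantics of some of the integer operators listed above can be sketched in Python. This is only an illustration under stated assumptions, not a definition: i32 values are modelled as unsigned 32-bit patterns, the helper names are hypothetical, and the behavior of floored division on divide-by-zero or on overflow is not specified in this document (here Python raises on zero and the overflowing result wraps).

```python
MASK32 = 0xFFFFFFFF  # i32 values are modelled as unsigned bit patterns


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def i32_min_s(a: int, b: int) -> int:
    """Signed minimum: compare as signed, return the chosen bit pattern."""
    return (a if to_signed32(a) <= to_signed32(b) else b) & MASK32


def i32_min_u(a: int, b: int) -> int:
    """Unsigned minimum: compare the raw bit patterns."""
    return min(a & MASK32, b & MASK32)


def i32_bswap(x: int) -> int:
    """Reverse the four bytes of a 32-bit value (endian conversion)."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def i32_clrs(x: int) -> int:
    """Count the leading bits that merely repeat the sign bit (0..31)."""
    x &= MASK32
    if x & 0x80000000:
        x = ~x & MASK32  # fold negative values onto non-negative ones
    # Leading zeros of x, minus one for the sign bit itself.
    return 32 - x.bit_length() - 1


def i32_floor_div_s(a: int, b: int) -> int:
    """Signed division rounding toward negative infinity.

    Assumption: division by zero raises (Python's ZeroDivisionError) and
    INT32_MIN / -1 wraps; the document does not specify either case.
    """
    return (to_signed32(a) // to_signed32(b)) & MASK32
```

Note how the same bit pattern 0xFFFFFFFF is the smallest value under i32_min_s (it is -1) but the largest under i32_min_u, which is why the signed and unsigned variants are listed separately.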
f32.minnum: minimum; if exactly one operand is NaN, returns the other operand
f32.maxnum: maximum; if exactly one operand is NaN, returns the other operand
f32.fma: fused multiply-add (results always conforming to IEEE 754-2008)
f64.minnum: minimum; if exactly one operand is NaN, returns the other operand
f64.maxnum: maximum; if exactly one operand is NaN, returns the other operand
f64.fma: fused multiply-add (results always conforming to IEEE 754-2008)
The minnum and maxnum operators would treat -0.0 as being effectively less
than 0.0. Also, it’s advisable to follow the IEEE 754-2018 draft, which has
removed IEEE 754-2008’s minNum and maxNum (which return qNaN when either
operand is sNaN) and replaced them with minimumNumber and maximumNumber,
which prefer to return a number even when one operand is sNaN.
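As a rough illustration of the NaN and signed-zero behavior described above, the following Python sketch models minnum and maxnum on f64 values. It is an assumption-laden sketch rather than a specification: the function names are hypothetical, and Python floats do not distinguish quiet from signaling NaNs, so that distinction is not modelled.

```python
import math


def minnum(a: float, b: float) -> float:
    """Minimum; if exactly one operand is NaN, return the other operand."""
    if math.isnan(a):
        return b  # also covers the both-NaN case, which yields NaN
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # -0.0 == 0.0 compares equal, so order the zeros by sign bit:
        # -0.0 is treated as less than 0.0.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def maxnum(a: float, b: float) -> float:
    """Maximum; if exactly one operand is NaN, return the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # Prefer the positive zero, since -0.0 is treated as less than 0.0.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b
```

A plain `a if a < b else b` would return whichever zero happened to be the second operand, which is why the zeros need the explicit sign check.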
Note that some operators, like fma, may not be available or may not perform
well on all platforms. These should be guarded by feature tests so that if
available, they behave consistently.
f32.reciprocal_approximation: reciprocal approximation
f64.reciprocal_approximation: reciprocal approximation
f32.reciprocal_sqrt_approximation: reciprocal sqrt approximation
f64.reciprocal_sqrt_approximation: reciprocal sqrt approximation
These operators would not be required to be fully precise, but the specifics would need clarification.
For 16-bit floating point support, it may make sense to split the feature into two parts: support for just converting between 16-bit and 32-bit or 64-bit formats possibly folded into load and store operators, and full support for actual 16-bit arithmetic.
128-bit is an interesting question because hardware support for it is very rare, so it will usually be implemented with software emulation anyway. Nothing therefore prevents WebAssembly applications from linking to an appropriate emulation library and getting similarly performant results. Emulation libraries would also have more flexibility to offer approximation techniques such as double-double arithmetic. If we standardize 128-bit floating point in WebAssembly, it will probably be standard IEEE 754-2008 quadruple precision.
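To make the double-double idea concrete, here is a minimal Python sketch of the kind of routine an emulation library might build on. It uses the classic error-free two_sum transformation; the function names and the simple (less accurate, "sloppy") addition are choices made for this example, not something this document prescribes.

```python
def two_sum(a: float, b: float):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double values, each a (hi, lo) pair of floats.

    This is the simple variant: the low parts are folded into the error
    term before renormalizing, which loses a little accuracy in edge cases.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so that |lo| is small relative to hi
```

For example, `dd_add((1.0, 0.0), (1e-17, 0.0))` keeps the small term in the low half of the result, whereas a single f64 addition would round it away.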
WebAssembly floating point conforms to IEEE 754-2008 in most respects, but there are a few areas that are not yet covered.
To support exceptions and alternate rounding modes, one option is to define an
alternate form for each of the affected operators. These
alternate forms would have extra operands for rounding mode, masked traps, and
old flags, and an extra result for a new flags value. These operators would be
fairly verbose, but it’s expected that their use cases will be specialized. This
approach has the advantage of exposing no global (even if only per-thread)
control and status registers to applications, and of avoiding giving the common
operators the possibility of having side effects.
Debugging techniques are also important, but they don’t necessarily need to be in the spec itself. Implementations are welcome (and encouraged) to support non-standard execution modes, enabled only from developer tools, such as modes with alternate rounding, or evaluation of floating point operators at greater precision, to support techniques for detecting numerical instability, or modes using alternate NaN bitpattern rules, to carry diagnostic information and help developers track down the sources of NaNs.
To help developers find the sources of floating point exceptions, implementations may wish to provide a mode where NaN values are produced with payloads containing identifiers helping programmers locate where the NaNs first appeared. Another option would be to offer another non-standard execution mode, enabled only from developer tools, that would enable traps on selected floating point exceptions. Care should be taken, however, since not all floating point exceptions indicate bugs.
Many popular CPUs have significant stalls when processing subnormal values, and support modes where subnormal values are flushed to zero, which avoids these stalls. In addition, ARMv7 NEON has no support for subnormal values and always flushes them. A mode where floating point computations have subnormals flushed to zero in WebAssembly would address these two issues.
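The effect of such a mode can be sketched in Python for f64 values. This is a hypothetical model only: the text above does not say whether flushing would apply to inputs, outputs, or both, so the ftz_mul helper below assumes both, and all names are invented for the example.

```python
import math
import sys


def flush_subnormal(x: float) -> float:
    """Replace a subnormal f64 with a zero of the same sign."""
    # sys.float_info.min is the smallest positive *normal* double.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x  # zeros, normal values, infinities and NaNs pass through


def ftz_mul(a: float, b: float) -> float:
    """Multiply with subnormal inputs and outputs flushed (an assumption)."""
    return flush_subnormal(flush_subnormal(a) * flush_subnormal(b))
```

With this model, `ftz_mul(sys.float_info.min, 0.5)` yields zero, while ordinary IEEE 754 multiplication would yield a subnormal result.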
There are two different use cases here, one where the application wishes to handle overflow locally, and one where it doesn’t.
When the application is prepared to handle overflow locally, it would be useful to have arithmetic operators which can indicate when overflow occurred. An example of this is the checked arithmetic builtins available in compilers such as clang and GCC. If WebAssembly is made to support nodes with multiple return values, that could be used instead of passing a pointer.
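The shape of such an operator can be sketched in Python, returning the wrapped result together with an overflow flag as two return values instead of writing the result through a pointer. The function name is hypothetical, and the inputs are assumed to already lie in the signed 32-bit range.

```python
from typing import Tuple


def i32_add_overflow_s(a: int, b: int) -> Tuple[int, bool]:
    """Signed 32-bit add returning (wrapped_result, overflowed).

    Assumes a and b are already valid signed 32-bit values.
    """
    exact = a + b  # Python integers never overflow
    wrapped = (exact + (1 << 31)) % (1 << 32) - (1 << 31)  # wrap to i32
    return wrapped, wrapped != exact


# The caller handles overflow locally by inspecting the flag.
result, overflowed = i32_add_overflow_s(2**31 - 1, 1)
if overflowed:
    result = 2**31 - 1  # e.g. saturate instead of wrapping
```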
There are also several use cases where an application does not wish to handle
overflow locally. One family of examples includes implementing optimized bignum
arithmetic. Another family includes compiling code that doesn’t expect overflow
to occur, but which wishes to have overflow detected and reported if it does
happen. These use cases would ideally like to have overflow trap, and to allow
applications to handle the trap specially. Following the rule that explicitly
signed and unsigned operators trap whenever the result value cannot be
represented in the result type, it would be possible to add explicitly signed
and unsigned versions of operators such as mul, which would trap on overflow.
The main reason we haven’t added these already is that they’re not efficient for
general-purpose use on several of today’s popular hardware architectures.
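A trapping multiply along these lines might behave as in the following Python sketch. The Trap exception and the function names are hypothetical stand-ins; in WebAssembly itself a trap is not an ordinary catchable exception.

```python
class Trap(Exception):
    """Stand-in for a WebAssembly trap (an assumption of this sketch)."""


INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
UINT32_MAX = (1 << 32) - 1


def i32_mul_s_trapping(a: int, b: int) -> int:
    """Signed multiply that traps if the result does not fit in i32."""
    result = a * b
    if not INT32_MIN <= result <= INT32_MAX:
        raise Trap("signed overflow in multiply")
    return result


def i32_mul_u_trapping(a: int, b: int) -> int:
    """Unsigned multiply that traps if the result does not fit in 32 bits."""
    result = a * b
    if result > UINT32_MAX:
        raise Trap("unsigned overflow in multiply")
    return result
```

The same operands can trap under one interpretation and not the other: 0x10000 * 0x8000 equals 2^31, which fits in an unsigned 32-bit result but not in a signed one. That is why separate explicitly signed and unsigned versions would be needed.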
The MVP feature testing situation could be improved by allowing unknown/unsupported instructions to decode and validate. The runtime semantics of these unknown instructions could either be to trap or call a same-signature module-defined polyfill function. This feature could provide a lighter-weight alternative to load-time polyfilling (approach 2 in FeatureTest.md), especially if the specific layer were to be standardized and performed natively such that no user-space translation pass was otherwise necessary.
If globals are allowed array types, significant portions of memory could be moved out of linear memory which could reduce fragmentation issues. Languages like Fortran which limit aliasing would be one use case. C/C++ compilers could also determine that some global variables never have their address taken.
The stack-based nature of WebAssembly lends itself to the possibility of supporting multiple return values from blocks / functions.
The MVP limits modules to at most one memory and at most one table (the default ones) and there are only operators for accessing the default table and memory.
After the MVP and after GC reference types have been added, the default
limitation can be relaxed so that any number of tables and memories could be
imported or internally defined and memories/tables could be passed around as
parameters, return values and locals. New variants of operators such as
call_indirect would then be added which took an additional memory/table
reference.
To access an imported or internally-defined non-default table or memory, an
address_of operator could be added which, given an index immediate,
would return a first-class reference. Beyond tables and memories, this could
also be used for function definitions to get a reference to a function (which,
since opaque, could be implemented as a raw function pointer).
In the MVP, WebAssembly has limited functionality for operating on
tables and the host-environment can do much more (e.g.,
It would be useful to be able to do everything from within WebAssembly so that,
e.g., it would be possible to write a WebAssembly dynamic loader in WebAssembly.
As a prerequisite, WebAssembly would need first-class support for
GC references on the stack and in locals. Given that, the following
could be added:
get_table/set_table: get or set the table element at a given dynamic
index; the got/set value would have a GC reference type
grow_table: grow the current table (up to the optional maximum), similar to
Additionally, in the MVP, the only allowed element type of tables is a generic “anyfunc” type which simply means the element can be called but there is no static signature validation check. This could be improved by allowing:
Copying and clearing large memory regions is very common, and making these
operations fast is architecture dependent. Although this can be done in the MVP
with loops of operators such as i32.store, this requires more bytes of code and
forces VMs to recognize the loops as well. The following operators can be added
to improve performance:
move_memory: Copy data from a source memory region to a destination region;
these regions may overlap: the copy is performed as if the source region was
first copied to a temporary buffer, and the temporary buffer was then copied to
the destination region
set_memory: Set all bytes in a memory region to a given byte
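The intended semantics of these two operators can be modelled in Python with a bytearray standing in for linear memory. This is a sketch under assumptions: the parameter order is invented for the example, and raising IndexError on an out-of-bounds region is a placeholder, since this document does not say how such accesses would be handled.

```python
def _check_region(memory: bytearray, addr: int, length: int) -> None:
    """Reject regions that fall outside the modelled linear memory."""
    if addr < 0 or length < 0 or addr + length > len(memory):
        raise IndexError("region out of bounds")  # placeholder behavior


def move_memory(memory: bytearray, dest: int, src: int, length: int) -> None:
    """Copy length bytes from src to dest; the regions may overlap."""
    _check_region(memory, src, length)
    _check_region(memory, dest, length)
    # Copy through a temporary buffer, exactly as the description states,
    # so that overlapping regions behave predictably.
    temp = bytes(memory[src:src + length])
    memory[dest:dest + length] = temp


def set_memory(memory: bytearray, dest: int, value: int, length: int) -> None:
    """Set length bytes starting at dest to the low 8 bits of value."""
    _check_region(memory, dest, length)
    memory[dest:dest + length] = bytes([value & 0xFF]) * length
```

For instance, moving 4 bytes from offset 0 to offset 2 of `bytearray(b"abcdefgh")` leaves `b"ababcdgh"`, whereas a naive forward byte-by-byte loop would overwrite source bytes before reading them.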
We expect that WebAssembly producers will use these operations when the region size is known to be large, and will use loads/stores otherwise.
TODO: determine how these operations interact w/ shared memory.