On this page:
5.1 Running scripts
5.2 Side effects
5.2.1 Computed values
5.3 Script execution
5.3.1 Structures
5.3.2 Notation
5.3.3 Execution
5.3.4 Instructions
5.3.4.1 Environment
5.3.4.2 Control flow
5.3.4.3 Stack control
5.3.4.4 Lists
5.3.4.5 Operators
5.3.4.6 Object instructions

5 Netfarm script machine

While the Netfarm script machine is packaged with the Netfarm object system, it is a sufficiently large and complex component that it deserves its own chapter. (It even gets its own netfarm-scripts package.) Scripts are portable programs that a Netfarm node can run to reproduce the behaviour of objects.

The script machine is an interpreter for a moderately sized, mostly functional bytecode, modelled on the SECD machine, which may be familiar to writers of functional-language compilers. The instruction set is neither overly minimalist (it provides instructions for list, number and string processing) nor laden with "shortcut" instructions, such as one that retrieves and then calls a procedure, as a real CISC processor may have. A simple assembler is provided with the reference Netfarm implementation, which can assemble programs such as:

(netfarm-scripts:define-script *fib* (1 2)
  (:procedure 1)
  (get-env 0 0) (get-value 1) <
  (jump-cond 0 0 0 4)
  (get-env 0 0) return
  (get-env 0 0) (get-value 0) - (get-proc* 0) (call 1)
  (get-env 0 0) (get-value 1) - (get-proc* 0) (call 1)
  + return)

While this (symbolic) assembly code looks very different from a program written in a high-level language (and also unlike assembly code for real processors), it is similar to the bytecode generated by CLISP for Common Lisp programs, and by CPython for Python programs.
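Read back into a high-level language, the script above computes the doubly recursive Fibonacci function: it compares the argument against the global 2, returns the argument in the base case, and otherwise sums two recursive calls. A sketch of our reading of the bytecode, in Python (illustrative only, not part of Netfarm):

```python
def fib(n):
    # (get-env 0 0) (get-value 1) <   : compare the argument against the global 2
    if n < 2:
        # (get-env 0 0) return        : base case, return the argument itself
        return n
    # two recursive calls via (get-proc* 0) (call 1), then + and return
    return fib(n - 1) + fib(n - 2)
```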

Scripts are usually written to implement methods for objects, which describe the behaviour of an object. Some common methods include:

A method call may also materialise as a message, if there is no method available to handle the call.

The actions (typically, side effects) which a script machine may perform are controlled by a capability list. Some instructions will only execute correctly when a capability has been provided to the script machine. Capabilities are represented as a list of keywords, which are provided when creating a script machine. These capabilities currently are:

Scripts running on networks only receive the :write-computed-values capability, as reading introduces non-determinism into the system, and there is no recommender to query. However, scripts running on, for example, a graphical environment used by the end-user to interact with objects, may have all three capabilities.

5.1 Running scripts

setup-interpreter script arguments &optional capabilities Function

setup-method-interpreter object method arguments &optional capabilities Function

Make a script machine which will run either a script or call a method on an object.

If there are no applicable methods (when using setup-method-interpreter), an error will be signalled.


Arguments

arguments is a list of values to pass as arguments, and capabilities is a capability list.

run-interpreter interpreter &key cycle-limit print-cycles step Function

Run an interpreter, returning the final data stack, and the final interpreter state.


Arguments

cycle-limit, if provided, limits the number of “cycles” that the script machine may execute. (Most instructions take one cycle to execute, but instructions which could take an unbounded amount of time may consume multiple cycles.)

run-script script &rest arguments Function

send-message object method-name &rest arguments Function

*capabilities* initially () Variable

Run a script or send a message, by setting up and then immediately running the appropriate interpreter.

The script machine will be run with the capability list stored in *capabilities*.

run-script-machines object &key (apply-side-effects t) (recipient-filter (constantly t)) Function

Run the initialization script for an object, returning multiple values corresponding to the final data stack of the script machine.


Arguments

When apply-side-effects is true, side effects will be applied. recipient-filter is a predicate which is called with each side effect; only side effects which satisfy the predicate will be applied.

(Perhaps :apply-side-effects nil could be expressed as :recipient-filter (constantly nil), but that’s a bit less ergonomic.)

5.2 Side effects

An interpreter may produce side effects to interact with other objects. Currently, there is only one category of side effect: side effects which modify the computed value sets of an object. A side effect is represented as a list, with a symbol in its car and a property list in its cdr, such as (add-computed-value :target #<some-object> :slot "foo" :value ("bar")).

The final state of an object is produced by applying all the applicable side effects. Depending on the nature of the implementation, it may or may not be appropriate to implement side effects exactly as suggested, though the implementation must act as if it were implemented as suggested.

The order in which side effects are applied does not affect the produced object. It is for this reason that Netfarm exhibits strong eventual consistency: how objects are synchronised does not affect the computation performed, and so all nodes storing an object will eventually converge on objects with the same values.

with-consistent-state (object) &body body Macro

Defer any other threads from applying side effects to an object until the body forms have been evaluated. Other threads calling apply-side-effects-on will not block, but the thread executing in a with-consistent-state form will not observe the effects they make. However, side effects can still be applied immediately in the current thread.

This non-blocking but deferrable behaviour is a prerequisite for implementing and using subscriptions to side effects in an ergonomic manner. The thread applying side effects should not block for long, in order to maximize throughput, but client threads should not find computed values changing from under their feet. We store an “effect log” in each object: when a thread is already viewing a consistent state of the object, we write to the log, and we update the object when that thread unwinds past with-consistent-state. Short of threads disappearing, by means of bt:destroy-thread or the like, objects will always have their side effects applied eventually, and the updating thread will never block for very long.

apply-side-effects-on object side-effects Function

Apply a list of side effects to an object. It is expected that object is the target of all the side effects.

apply-side-effect side-effect Function

Apply a side effect. This could be implemented as (apply-side-effects-on (getf (rest side-effect) :target) (list side-effect))

5.2.1 Computed values

We maintain, for every slot of every object, a mapping from computed values to counts. When all side effects have been considered, the table is traversed, and each computed value whose count is not negative is added; computed values with negative counts are not added.

add-computed-value :target :slot :value Side effect

remove-computed-value :target :slot :value Side effect


Arguments

target is the object which has affected itself.

slot is the effective slot definition representing the slot which will be affected.

value is the value to be affected.

Increment or decrement the count of the given computed value, respectively.

A related technique for implementing these effects, which is useful when saving effects to disk, is to append each effect to a log, and to compute the computed values by applying the logged side effects when loading an object. It is also possible to remove effects that cancel out, and to periodically compress the result into one final state. However, the objects-affecting graph would have to be maintained outside the effect log for this to remain correct.

It is trivial to show that computed values exhibit strong eventual consistency. For each relevant effect, we add 1 or add -1 to each counter, and addition of numbers is commutative, i.e. effects can be performed in any order to retrieve the same result.
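The counting scheme, and the order-independence it grants, can be demonstrated in a few lines. This is an illustrative Python model, not the netfarm-scripts implementation; following the description above, a value is kept when its count is not negative:

```python
from collections import Counter
from itertools import permutations

def materialise(effects):
    """Fold add/remove side effects into a count per value, then keep the
    values whose counts are not negative."""
    counts = Counter()
    for op, value in effects:
        counts[value] += 1 if op == "add" else -1
    return {value for value, count in counts.items() if count >= 0}

# Two adds of "foo"; one add but two removes of "bar":
effects = [("add", "foo"), ("add", "foo"),
           ("add", "bar"), ("remove", "bar"), ("remove", "bar")]
# Every ordering of the effects converges on the same value set,
# so the order of application cannot matter.
assert all(materialise(p) == {"foo"} for p in permutations(effects))
```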

5.3 Script execution

The script machine is a stack machine with a representation of lexical environments and automatic memory management. The major changes from the SECD machine (Landin 1964) are additional registers, added for convenience and to facilitate method calls between multiple objects, and a more conventional bytecode-based instruction format, rather than an instruction format based on S-expressions.

5.3.1 Structures

We introduce the script machine structure and procedure structure, which otherwise do not exist in the Netfarm type and class systems. A script object, however, is just an instance of inbuilt@schema.

The script machine has these registers:

Symbol   Name              Purpose
S        Store             A stack containing intermediate data.
E        Environment       The lexical environment, represented as a list of vector "frames".
P        Program counter   The position the interpreter is reading instructions from.
D        Dump              A stack telling the interpreter where to return to.
G        Globals           A vector of global variables.
C        Control           The script the interpreter is currently reading instructions from.
A        Capabilities      Flags that allow the machine to perform operations that only make sense in some situations.
O        Self              The current object (or the empty list, if we have not called a method).
O_2      Sender            The last object, which was the value of O before calling a method.
F        Side effects      Effects that the interpreter is going to make on the outside world.

A script has these values:

Symbol   Name           Purpose
P        Program        An octet vector that encodes instructions to execute.
E        Entry points   A vector of descriptions for procedures, which are combined with environments by some instructions to produce procedures.
M        Methods        A vector of descriptions for methods.
G        Globals        A vector of the initial values of global variables.

A procedure has these values:

Symbol   Name
C        Control
E        Environment
P        Program counter
A        Argument count
O        Object

5.3.2 Notation

We refer to the state of a running interpreter with a set of variables (as described in the first table). We write the most common registers in a tuple like \left\langle S, E, P, D \right\rangle, but we do not include all registers in the tuple because we seldom use the others.

We write the states of a halted interpreter as \mathrm{halt}[S, F], should it halt correctly, or \mathrm{error}[message, cause] should it fail. (Only the data stack and side effects are preserved when an interpreter stops.)

We precede an instruction description with a line like

N: NAME BYTE-ARGUMENT* Instruction

Such a line means we define an instruction named NAME, which is encoded starting with the byte N, and then one byte for each of BYTE-ARGUMENT*.

We describe the effects of interpreting an instruction in a manner similar to that of the temporal logic of actions (Lamport 1993) (but with no temporal expressions): we use a logical equation in which the new register values appear as "primed" variables, which look like C{}'. We write the effects on the tuple of common registers like

\left\langle a, b, c, d \right\rangle \rightarrow\allowbreak \left\langle e, f, g, h \right\rangle

which is shorthand for

S = a \land E = b \land C = c \land D = d \land S{}' = e \land E{}' = f \land C{}' = g \land D{}' = h

If the equations for an instruction do not reference the primed variable for a register, the register is left unchanged.

Should an instruction concern itself with only the Store, we further abbreviate, by writing effects in the form

i_1 \ldots i_m\ \text{—}\ o_1 \ldots o_n

(borrowing from Forth conventions) which can be translated to the other notation as

\left\langle (i_m \ldots i_1\ .\ S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle (o_n \ldots o_1\ .\ S), E, P + 1 + bytes, D \right\rangle

where bytes is the number of byte arguments to an instruction. Note that the equivalent transition has the stack prefixes "reversed"; the last input is popped first, and the first output is pushed first. In Forth, we may write a program 3 2 - (⇒ 1) and the stack before evaluating - as a list would be (2 3). To maintain a natural argument order for the programmer, the machine handles arguments "backwards" when described using lists.

To access a value of a structure without destructuring it, we use the notation S_A to get the value of the variable (slot) A of S. We also use subscripts to retrieve values from lists and vectors; S_n denotes the nth value in S. The first element of a sequence is retrieved with a zero subscript, as with most programming languages; and retrieving an element that would be past the last element of a sequence causes interpretation to fail.

We create many procedures while running programs, so we write a procedure in the form \mathrm{proc}[C, E, P, A, O].

All integer operations are defined to only produce values when the result is within the bounds of a signed 256-bit integer as previously defined. We define the set Integer to be [-(2^{255}), 2^{255}).

5.3.3 Execution

The machine first reads the instruction to execute, by reading the Pth octet of C_P (the Program vector of the current script). The instruction's arguments are then read similarly: the nth argument (counting from 1) is the (P + n)th octet of C_P. Finally, a transition is chosen based on the state of the machine at this point. Unless the machine has transitioned to a halt state or unwound entirely to an error state, it then proceeds to execute another instruction.
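The fetch, decode and execute steps can be sketched as a conventional dispatch loop. This is an illustrative Python model; the dictionary-based machine representation is an assumption of the sketch, though the two toy opcodes (11 for byte, 8 for return) match the instruction listing below:

```python
def run(machine, arities, transitions, cycle_limit=None):
    """Fetch the opcode at P in the Program vector of C, read its byte
    arguments, then apply the matching transition until the machine stops."""
    cycles = 0
    while machine["state"] == "running":
        if cycle_limit is not None and cycles >= cycle_limit:
            machine["state"] = "error"
            return
        program = machine["C"]["P"]   # the Program octet vector of the script
        opcode = program[machine["P"]]
        args = [program[machine["P"] + 1 + i] for i in range(arities[opcode])]
        transitions[opcode](machine, *args)   # may set state to halt or error
        cycles += 1

# Two toy instructions: 11 = byte (push a literal), 8 = return (halt here,
# since this sketch has no dump to return through).
def op_byte(m, value):
    m["S"].append(value)
    m["P"] += 2            # one opcode byte plus one argument byte

def op_halt(m):
    m["state"] = "halt"

m = {"S": [], "P": 0, "C": {"P": bytes([11, 42, 8])}, "state": "running"}
run(m, {11: 1, 8: 0}, {11: op_byte, 8: op_halt})
assert m["S"] == [42] and m["state"] == "halt"
```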

A poor attempt at specifying that we signal a condition or throw an error or something - I’m not writing rules to propagate errors through everything, thank you.

If an unwind value appears in a computation, the machine attempts to find the last method call frame produced. If such a frame exists, the machine effectively returns from that frame, but pushes \mathtt{\#f} on the stack instead of a singleton list. If it does not exist, the machine transitions to the error state. More formally:

TODO: Write out the recursive function.

5.3.4 Instructions
5.3.4.1 Environment

Variables are indexed by variable and frame positions; the newest bindings (which are function arguments) are located where frame = 0, and older bindings have greater frame positions.

We first define a helper function getenv which gets the value in a particular position in an environment. We also write assignments to this value, which are shorthand for replacing the value at the designated position in the environment.

getenv[env, var, frame] = \begin{cases} F_{var} & (F . \_) = env \land frame = 0 \\ getenv[E, var, frame - 1] & (\_ . E) = env \land frame > 0 \\ unwind & () = env \end{cases}
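Transcribed directly into Python (an illustrative model: frames as lists, the string "unwind" standing in for the unwind value):

```python
def getenv(env, var, frame):
    """Walk `frame` links down the environment, then index the selected
    frame by `var`; unwind when the environment is exhausted."""
    if not env:
        return "unwind"
    if frame == 0:
        return env[0][var]
    return getenv(env[1:], var, frame - 1)

env = [["a0", "a1"], ["b0"]]      # newest frame first
assert getenv(env, 1, 0) == "a1"  # variable 1 of frame 0
assert getenv(env, 0, 1) == "b0"  # variable 0 of frame 1
assert getenv(env, 0, 5) == "unwind"
```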

1: get-proc n Instruction

\ \text{—}\ \mathrm{proc}[C, E, P_0, A, O]
where (P_0\ A) = (C_E)_n

6: get-proc* nInstruction

\ \text{—}\ \mathrm{proc}[C, (), P_0, A, O]
where (P_0\ A) = (C_E)_n

Note that get-proc closes over the lexical environment, but get-proc* does not; the former would be used for closures, and the latter would be used for top-level functions.

2: get-value n Instruction

\ \text{—}\ G_n

3: set-value! n Instruction

value\ \text{—}\
where G_n \leftarrow value

4: get-env variable frame Instruction

\ \text{—}\ getenv[E, variable, frame]

5: set-env! variable frame Instruction

value\ \text{—}\
where getenv[E, variable, frame] \leftarrow value

11: byte value Instruction

\ \text{—}\ value

5.3.4.2 Control flow

8: return Instruction

\left\langle S, E, P, () \right\rangle \rightarrow\allowbreak \mathrm{halt}[S, F]

\left\langle (Value\ .\ \_), \_, \_, (\mathrm{normalframe}[S, E, P, C{}', O{}'] . D) \right\rangle \rightarrow\allowbreak \left\langle (Value\ .\ S), E, P, D \right\rangle

\left\langle (Value\ .\ \_), \_, \_, (\mathrm{methodframe}[S, E, P, C{}', O{}', O_2{}', F{}'] . D) \right\rangle \rightarrow\allowbreak \left\langle ((Value)\ .\ S), E, P, D \right\rangle

The last transition described for return handles half of the unwinding associated with method calls; when it returns from a method call, it will wrap the value to return in a cons. This serves two purposes: it prevents the stack layout from changing under the feet of a script, and it provides an always-correct method to test if a call was successful.

9: call n Instruction

\left\langle (\mathrm{proc}[C{}', E, P_0, n, O{}']\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow

\left\langle (), ((A_1 \ldots A_n)\ .\ E{}'), P_0, (\mathrm{normalframe}[S, E, P + 2, C, O]\ .\ D) \right\rangle

10: tail-call n Instruction

\left\langle (\mathrm{proc}[C{}', E, P_0, n, O{}']\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow

\left\langle (), ((A_1 \ldots A_n)\ .\ E{}'), P_0, D \right\rangle

We used to preserve S between function calls, in an attempt to support writing Forth-ish code, where functions would just manipulate the stack, and ignore the environment. However, arbitrary control over the stack gets confusing when higher-order methods are involved; the stack layout after calling the passed function would be dependent on how the function leaves it, and a sufficiently clever attacker could use this to corrupt the stack and produce undesired behaviour.

To "fix" this, we decided that functions always return one value, and cannot access the data stacks of any other functions, thus the stack layout will always be known.

16: jump-cond thenhigh thenlow elsehigh elselow Instruction

\left\langle (\mathtt{\#f} . S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle S, E, P + 5 + else, D \right\rangle
{} \land else = elsehigh \times 256 + elselow

\left\langle (V\ .\ S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle S, E, P + 5 + then, D \right\rangle
{} \land then = thenhigh \times 256 + thenlow \land V \neq \mathtt{\#f}

17: jump high low Instruction

\left\langle S, E, P, D \right\rangle \rightarrow\allowbreak \left\langle S, E, P + 3 + high \times 256 + low, D \right\rangle

Jumps are relative to the end of the instruction, so jumping 0 bytes does nothing.
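The 16-bit offsets are big-endian and, as given above, non-negative. The arithmetic can be checked with a small helper (hypothetical, not part of the Netfarm assembler):

```python
def jump_offset(instruction_end, target):
    """Split the distance from the end of a jump instruction to the target
    address into the high and low bytes used by jump and jump-cond."""
    offset = target - instruction_end
    assert 0 <= offset < 65536, "offsets are unsigned 16-bit"
    return offset // 256, offset % 256

# A `jump` occupying bytes 10..12 (opcode plus two argument bytes) that
# should land on byte 40: the instruction ends at 13, so the offset is 27.
high, low = jump_offset(13, 40)
assert (high, low) == (0, 27)
assert 13 + high * 256 + low == 40   # matches P + 3 + high * 256 + low
```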

19: call-method n Instruction

\left\langle (Name\ Object\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle (), ((Rest\ A_1 \ldots A_n)\ .\ E{}'), P_0,\allowbreak (\mathrm{methodframe}[S, E, P + 2, C, O, O_2, F]\ .\ D) \right\rangle
{} \land \mathrm{primary}[(Primary\ .\ Rest)] = methods[Object, Name]
{} \land \mathrm{proc}[C{}', E{}', P_0, n + 1, O{}'] = Primary
{} \land O_2{}' = O

\left\langle (Name\ Object\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle (), ((Rest\ Name\ (A_1 \ldots A_n))\ .\ E{}'), P_0,\allowbreak (\mathrm{methodframe}[S, E, P + 2, C, O, O_2, F]\ .\ D) \right\rangle
{} \land \mathrm{doesnotunderstand}[(Primary\ .\ Rest)] = methods[Object, Name]
{} \land \mathrm{proc}[C{}', E{}', P_0, 3, O{}'] = Primary
{} \land O_2{}' = O

A_1\ \ldots\ A_n\ Object\ Name\ \text{—}\ \mathtt{\#f}
otherwise

TODO: Fill in how to compute methods.

Every script of the class of an object contains a list of lists (Name\ P_0\ A{}'), where A{}' is the number of "real" arguments. Methods have an implicit "next method list" argument. When calling a method, the next method list is the first argument, and every other argument is shifted over by one, so the first argument appears in the second position in the new frame.

The global variable vector for each \langle object, script \rangle pair must be the same across all method calls. This ensures that one method on an object and script can observe the changes made by another method on the same object and script.

It is usually a good idea to cache the computation of methods; method caching in early Smalltalk implementations was a fairly cheap way to get a reasonable performance gain, as approximately every call requires method dispatch. This is true to a lesser extent for Netfarm code, which has fewer, but still many, method calls.

5.3.4.3 Stack control

34: drop Instruction

x\ \text{—}\

35: dup Instruction

x\ \text{—}\ x\ x

36: swap Instruction

x\ y\ \text{—}\ y\ x

These instructions function identically to the words of the same names defined in ANSI Forth.

5.3.4.4 Lists

64: cons Instruction

car\ cdr\ \text{—}\ (car\ .\ cdr)
{} \land cdr \in List

As we do not have a representation for improper lists, the cdr must be a list.

65: car Instruction

(car\ .\ \_)\ \text{—}\ car

66: cdr Instruction

(\_\ .\ cdr)\ \text{—}\ cdr

67: null Instruction

()\ \text{—}\ \mathtt{\#t}

(\_\ .\ \_)\ \text{—}\ \mathtt{\#f}

68: consp Instruction

x\ \text{—}\ x \in Cons

69: append Instruction

a\ b\ \text{—}\ a {++} b

5.3.4.5 Operators

71: equal Instruction

a\ b\ \text{—}\ equal[a,b]

\begin{aligned} equal[(a_a\ .\ a_d), (b_a\ .\ b_d)] &= equal[a_a, b_a] \land equal[a_d, b_d] \\ equal[(), ()] &= \mathtt{\#t} \\ equal[a \in Integer, b \in Integer] &= a = b \\ equal[a \in String, b \in String] &= a = b \\ equal[a \in Object, b \in Object] &= hash[a] = hash[b] \\ equal[\mathrm{proc}[a_c, \ldots, a_o], \mathrm{proc}[b_c, \ldots, b_o]] &= a_c = b_c \land \ldots \land a_o = b_o \\ equal[\_, \_] &= \mathtt{\#f} \end{aligned}

It is probably wise to use an auxiliary stack when implementing equal, as the client code may provide arbitrarily large data structures.
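Such an auxiliary-stack implementation might look like this in Python (an illustrative model: two-tuples stand in for conses, None for the empty list, and the proc and object-hash cases are omitted):

```python
def machine_equal(a, b):
    """Structural equality with an explicit work stack, so that deeply
    nested client-supplied lists cannot overflow the host call stack."""
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        if isinstance(x, tuple) and isinstance(y, tuple):
            # Two conses: compare cars and cdrs.
            stack.append((x[0], y[0]))
            stack.append((x[1], y[1]))
        elif x != y:       # integers, strings, (), or mismatched kinds
            return False
    return True

assert machine_equal(("a", (1, None)), ("a", (1, None)))
assert not machine_equal(("a", None), ("b", None))

# Nesting deep enough to overflow a naively recursive version:
deep1 = deep2 = None       # None stands in for the empty list ()
for _ in range(10 ** 5):
    deep1, deep2 = (0, deep1), (0, deep2)
assert machine_equal(deep1, deep2)
```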

72: string= Instruction

a\ b\ \text{—}\ a = b
{} \land a \in String \land b \in String

73: list n Instruction

e_1 \ldots e_n\ \text{—}\ (e_1 \ldots e_n)

For OP being each name of the following three instructions:

96: + Instruction

97: - Instruction

98: * Instruction

a\ b\ \text{—}\ a \mathop{OP} b
{} \land a \mathop{OP} b \in Integer

99: / Instruction

a\ b\ \text{—}\ \lfloor \frac{a}{b} \rfloor
{} \land \lfloor \frac{a}{b} \rfloor \in Integer

Flooring is defined to produce the largest integer not greater than the actual value, e.g. \lfloor \frac{-5}{2} \rfloor = -3
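Python's // operator implements exactly this flooring, which makes it a convenient reference when implementing / in a host language whose integer division truncates toward zero instead:

```python
import math

# Flooring rounds toward negative infinity, not toward zero:
assert -5 // 2 == -3               # matches the example above
assert 5 // 2 == 2
assert -5 // -2 == 2
assert math.floor(-5 / 2) == -3
# C-style truncating division would give -2 for -5 / 2; an implementation
# in such a language must adjust the quotient when the signs differ and
# the division is inexact.
```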

100: abs Instruction

a\ \text{—}\ |a|
{} \land |a| \in Integer

Note that |a| can be out of bounds even when a is in bounds: this occurs precisely when a = -(2^{255}).
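This edge case of the asymmetric two's-complement-style range is easy to check:

```python
BOUND = 2 ** 255
a = -BOUND                               # the one legal integer whose |a| is illegal
assert -BOUND <= a < BOUND               # a itself is within [-(2^255), 2^255)
assert not (-BOUND <= abs(a) < BOUND)    # but abs(a) = 2^255 is not
```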

For T being the type in the name of the following three instructions:

105: integer? Instruction

106: list? Instruction

108: string? Instruction

o\ \text{—}\ o \in T

For OP being each name of the following six instructions:

112: = Instruction

113: < Instruction

114: > Instruction

115: /= Instruction

116: <= Instruction

117: >= Instruction

a\ b\ \text{—}\ a \mathop{OP} b
{} \land a \in Integer \land b \in Integer

5.3.4.6 Object instructions

129: schema Instruction

o\ \text{—}\ schema[o]
{} \land o \in Object

For N being each name of the following two instructions:

130: add-computed-value Instruction

140: remove-computed-value Instruction

name\ value\ \text{—}

{} \land \text{\texttt{:write-computed-values}} \in A

{} \land F{}' = ((N\; \text{\texttt{:target}}\; O\; \text{\texttt{:slot}}\; slot[name, O]\; \text{\texttt{:value}}\; value)\; .\; F)

131: object-value Instruction

object\ name\ \text{—}\ value[object, slot[name, object]]

133: object-bound? Instruction

object\ name\ \text{—}\ bound?[object, slot[name, object]]

132: object-computed-values Instruction

object\ name\ \text{—}\ value[object, computedslot[name, object]]
when \text{\texttt{:read-computed-values}} \in A

When we figure out how to deal with "plain old data" with no scripts to provide accessor methods, we will probably make object-value and object-computed-values only access slots of the current object, in order to preserve an object capability model, in which objects can only send each other references to other objects.

134: object-authors Instruction

o\ \text{—}\ map[car, signatures[o]]
when o \in Object