5 Netfarm script machine
While it is a component that is packaged with the Netfarm object system, the Netfarm script machine is a sufficiently large and complex component that it deserves its own chapter. (It even gets its own netfarm-scripts package.) Scripts are portable programs that a Netfarm node can run to reproduce the behaviour of objects.
The script machine is an interpreter for a moderately-sized mostly-functional bytecode, which is modelled on the SECD machine, which may be familiar to functional language compiler writers. The instruction set is neither overly minimalist, as it provides instructions for list, number and string processing, nor does it provide any particular “shortcut” instructions, such as retrieving and then calling a procedure, like a real CISC processor may have. A simple assembler is provided with the reference Netfarm implementation, which can compile programs such as:
(netfarm-scripts:define-script *fib* (1 2)
(:procedure 1)
(get-env 0 0) (get-value 1) <
(jump-cond 0 0 0 4)
(get-env 0 0) return
(get-env 0 0) (get-value 0) - (get-proc* 0) (call 1)
(get-env 0 0) (get-value 1) - (get-proc* 0) (call 1)
+ return)
While this (symbolic) assembly code looks much different to a program written in a high-level language (and also unlike assembly code for real processors), it is similar to the bytecode generated by CLISP for Common Lisp programs, and CPython for Python programs.
Scripts are usually written to implement methods for objects, which describe the behaviour of an object. Some common methods include:
initialize which is called when an object is instantiated in a network. This method takes no arguments.
present which is called to produce a presentation. This method also takes no arguments.
A method call may also materialise as a message, if there is no method available to handle a method call.
does-not-understand which is called to handle messages an object does not understand. This method takes two arguments, the first being the name of the method, and the second being the argument list. If there are no methods for does-not-understand, then method calls that would call it immediately return with an error.
The actions (typically, side effects) which a script machine may perform are controlled by a capability list. Some instructions will only execute correctly when a capability has been provided to the script machine. Capabilities are represented as a list of keywords, which are provided when creating a script machine. These capabilities currently are:
:write-computed-values: Add or remove computed values,
:read-computed-values: Read computed values, and
:query-recommender: Query the recommender system to sort objects by some property.
Scripts running on networks only receive the :write-computed-values capability, as reading introduces non-determinism into the system, and there is no recommender to query. However, scripts running on, for example, a graphical environment used by the end-user to interact with objects, may have all three capabilities.
5.1 Running scripts
setup-interpreter script arguments &optional capabilitiesFunction
setup-method-interpreter object method arguments &optional capabilitiesFunction
Make a script machine which will run either a script or call a method on an object.
If there are no applicable methods (when using setup-method-interpreter), an error will be signalled.
Arguments
arguments is a list of values to pass as arguments, and capabilities is a capability list.
run-interpreter interpreter &key cycle-limit print-cycles stepFunction
Run an interpreter, returning the final data stack, and the final interpreter state.
Arguments
cycle-limit is either the limit on “cycles” that the script machine can execute. (Most instructions take one cycle to execute, but instructions which could
run-script script &rest argumentsFunction
send-message object method-name &rest argumentsFunction
*capabilities* initially ()Variable
Run a script or send a message, by setting up and then immediately running the appropriate interpreter.
The script machine will be run with the capability list stored in *capabilities*.
run-script-machines object &key (apply-side-effects t)
(recipient-filter (constantly t))Function
Run the initialization script for an object, returning multiple values corresponding to the final data stack of the script machine.
Arguments
When apply-side-effects is true, side effects will be applied. recipient-filter is a predicate which is called with each side effect, and only side effects which satisfies the predicate will be applied.
(Perhaps :apply-side-effects nil could be expressed as :recipient-filter (constantly nil), but that’s a bit less ergonomic.)
5.2 Side effects
An interpreter may produce side effects to interact with other objects. Currently, there is only one category of side effect: side effects which modify the computed value sets of an object. A side effect is represented as a list, with a symbol in its car and a property list in its cdr, such as (add-computed-value :target #<some-object> :name "foo" :value ("bar"))
The final state of an object is produced by applying all the applicable side effects. Depending on the nature of the implementation, it may or may not be appropriate to implement side effects exactly as suggested, though the implementation must act as if it were implemented as suggested.
The order of which side effects are applied does not affect the produced object. It is for this reason that Netfarm exhibits strong eventual consistency: how objects are synchronised does not affect the computation performed, and so all nodes storing an object will eventually converge on objects with the same values.
with-consistent-state (object) &body bodyMacro
Defer any other threads from applying any side effects to an object, until the body forms have been evaluated. Other threads calling apply-side-effects-on will not block, but the thread executing in a with-consistent-state form will not observe the effects made. However, side effects can still be applied immediately in the current thread.
This non-blocking but deferrable behaviour is a requisite to implementing and using subscribing to side effects in an ergonomic manner. The thread applying side effects should not block for too long, in order to maximize throughput, but client threads should not find computed values changing from under their feet. We store an “effect log” in each object, and write to the log, when a thread already is viewing a consistent state over it, and update the object when the thread unwinds past with-consistent-state. Short of threads disappearing, by means of bt:destroy-thread or the like, objects will always have side effects applied eventually, and the updating thread will not block for very long.
apply-side-effects-on object side-effectsFunction
Apply a list of side effects to an object. It is expected that object is the target of all the side effects.
apply-side-effect side-effectFunction
Apply a side effect. This could be implemented as (apply-side-effects-on (getf (rest side-effect) :target) (list side-effect))
5.2.1 Computed values
We maintain a mapping of computed values to counts for every slot of every object. When all side effects have been considered, the table is traversed, and each computed value is added according to the count if it is not negative, and no computed values are added when a count is negative.
add-computed-value :target :slot :valueSide effect
remove-computed-value :target :slot :valueSide effect
Arguments
target is the object which has affected itself.
slot is the effective slot definition representing the slot which will be affected.
value is the value to be affected.
Increment or decrement the count of the given computed value, respectively.
A related technique for implementing these effects, which is useful when saving effects to a disk, is to add each effect to a log, and then compute the computed values by applying side effects when loading an object. It is also possible to remove effects that cancel out, and to compress the result into one final state periodically. However, the objects-affecting graph would have to be maintained outside the effect log for it to still be correct.
It is trivial to show that computed values exhibit strong eventual consistency. For each relevant effect, we add 1 or add -1 to each counter, and addition of numbers is commutative, i.e. effects can be performed in any order to retrieve the same result.
5.3 Script execution
The script machine is a stack machine with a representation of lexical environments and automatic memory management. The major changes to the SECD machine (Landin 1964) are additional registers for convenience, and to facilitate method calls between multiple objects, and a more common bytecode-based instruction format, rather than an instruction format based on S-expressions.
5.3.1 Structures
We introduce the script machine structure and procedure structure, which otherwise do not exist in the Netfarm type and class systems. A script object, however, is just an instance of inbuilt@schema.
The script machine has these registers:
Symbol |
| Name |
| Purpose |
S |
| Store |
| A stack containing intermediate data. |
E |
| Environment |
| The lexical environment, represented as a list of vector "frames". |
P |
| Program counter |
| The position the interpreter is reading instructions from. |
D |
| Dump |
| A stack telling the interpreter where to return to. |
G |
| Globals |
| A vector of global variables. |
C |
| Control |
| The script the interpreter is current reading instructions from. |
A |
| Capabilities |
| Flags that allow the machine to perform operations that only make sense in some situations. |
O |
| Self |
| The current object (or the empty list, if we have not called a method). |
O_2 |
| Sender |
| The last object, which was the value of O before calling a method. |
F |
| Side effects |
| Effects that the interpreter is going to make on the outside world. |
A script has these values:
Symbol |
| Name |
| Purpose |
P |
| Program |
| An octet vector that encodes instructions to execute. |
E |
| Entry points |
| A vector of descriptions for procedures, which are combined with environments by some instructions to produce procedures. |
M |
| Methods |
| A vector of descriptions for methods. |
G |
| Globals |
| A vector of the initial values of global variables. |
A procedure has these values:
Symbol |
| Name |
| Purpose |
C |
| Control |
| |
E |
| Environment |
| |
P |
| Program counter |
| |
A |
| Argument count |
| |
O |
| Object |
|
5.3.2 Notation
We refer to the state of a running interpreter with a set of variables (as described in the first table). We write the most common registers in a tuple like \left\langle S, E, P, D \right\rangle, but we do not include all registers in the tuple because we seldom use the others.
We write the states of a halted interpreter as \mathrm{halt}[S, F], should it halt correctly, or \mathrm{error}[message, cause] should it fail. (Only the data stack and side effects are preserved when an interpreter stops.)
We precede an instruction description with a line like
N: NAME BYTE-ARGUMENT*Instruction
Such a line means we define an instruction named NAME, which is encoded starting with the byte N, and then one byte for each of BYTE-ARGUMENT*.
We describe the effects of interpreting an instruction in a method similar to that of the temporal logic of actions (Lamport 1993) (but with no temporal expressions): we use a logical equation where the new registers have "primed" variables, which look like C{}'. We write the effects on the tuple of common registers like
\left\langle a, b, c, d \right\rangle \rightarrow\allowbreak \left\langle e, f, g, h \right\rangle
which is shorthand for
S = a \land E = b \land C = c \land D = d \land S{}' = e \land E{}' = f \land C{}' = g \land D{}' = h
If the equations for an instruction do not reference the primed variable for a register, the register is left unchanged.
Should an instruction concern itself with only the Store, we further abbreviate, by writing effects in the form
i_1 \ldots i_m\ \text{—
(borrowing from Forth conventions) which can be translated to the other notation as
\left\langle (i_m \ldots i_1\ .\ S), E, P, D \right\rangle \rightarrow\allowbreak \left\langle (o_n \ldots o_1\ .\ S), E, P + 1 + bytes, D \right\rangle
where bytes is the number of byte arguments to an instruction. Note that the equivalent transition has the stack prefixes "reversed"; the last input is popped first, and the first output is pushed first. In Forth, we may write a program 3 2 - (⇒ 1) and the stack before evaluating - as a list would be (2 3). To maintain a natural argument order for the programmer, the machine handles arguments "backwards" when described using lists.
To access a value of a structure without destructuring it, we use the notation S_A to get the value of the variable (slot) A of S. We also use subscripts to retrieve values from lists and vectors; S_n denotes the nth value in S. The first element of a sequence is retrieved with a zero subscript, as with most programming languages; and retrieving an element that would be past the last element of a sequence causes interpretation to fail.
We create many procedures while running programs, so we write a procedure in the form \mathrm{proc}[C, E, P, A, O].
All integer operations are defined to only produce values when the result is within the bounds of a signed 256-bit integer as previously defined. We define the set Integer to be [-(2^{255}), 2^{255}).
5.3.3 Execution
The machine first reads the instruction to execute, by reading the Pth octet in C_P. The instruction arguments are then read similarly, by reading the P + nth octet in C_P for the nth argument (starting from 1). Finally, a transition is chosen based on the state of the machine at this point. Unless the machine has transitioned to a halt state or unwound entirely to a error state, it then proceeds to execute another instruction.
A poor attempt at specifying that we signal a condition or throw an error or something - I’m not writing rules to propagate errors through everything thankyou.
If a unwind value appears in a computation, the machine attempts to find the last method call frame produced. If that frame exists, the machine effectively returns from that frame, but pushes \mathtt{\#f} on the stack instead of a singleton list. If it does not exist, the machine transitions to the error state. More formally:
TODO: Write out the recursive function.
5.3.4 Instructions
5.3.4.1 Environment
Variables are indexed by variable and frame positions; the newest bindings (which are function arguments) are located where frame = 0, and older bindings have greater frame positions.
We first define a helper function getenv which gets the value in a particular position in an environment. We also write assignments to this value, which are shorthand for replacing the value at the designated position in the environment.
getenv[env, var, frame] = \begin{cases} F_{var} & (F . \_) = env \land frame = 0 \\ getenv[E, variable, frame - 1] & (\_ . E) = env \land frame > 0 \\ unwind & () = env \land frame > 0 \end{cases}
1: get-proc nInstruction
6: get-proc* nInstruction
Note that get-proc closes over the lexical environment, but get-proc* does not; the former would be used for top-level functions, and the latter would be used for closures.
2: get-value nInstruction
\ \text{—
4: get-env variable frameInstruction
\ \text{—
5: set-env! variable frameInstruction
11: byte valueInstruction
\ \text{—
5.3.4.2 Control flow
8: return Instruction
\left\langle S, E, P, () \right\rangle \rightarrow\allowbreak \mathrm{halt}[S, A]
\left\langle (Value\ .\ \_), \_, \_, (\mathrm{normalframe}[S, E, P, C{}', O{}'] . D) \right\rangle \rightarrow\allowbreak \left\langle (Value\ .\ S), E, P, D \right\rangle
\left\langle (Value\ .\ \_), \_, \_, (\mathrm{methodframe}[S, E, P, C{}', O{}', O_2{}', F{}'] . D) \right\rangle \rightarrow\allowbreak \left\langle ((Value)\ .\ S), E, P, D \right\rangle
The last transition described for return handles half of the unwinding associated with method calls; when it returns from a method call, it will wrap the value to return in a cons. This serves two purposes: it prevents the stack layout from changing under the feet of a script, and it provides an always-correct method to test if a call was successful.
9: call nInstruction
\left\langle (\mathrm{proc}[C{}', E, P_0, n, O{}']\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow
\left\langle (), ((A_1 \ldots A_n)\ .\ E{}'), P_0, (\mathrm{normalframe}[S, E, P + 2, C, O]\ .\ D) \right\rangle
10: tail-call nInstruction
\left\langle (\mathrm{proc}[C{}', E, P_0, n, O{}']\ A_n \ldots A_1\ .\ S), E, P, D \right\rangle \rightarrow
\left\langle (), ((A_1 \ldots A_n)\ .\ E{}'), P_0, D \right\rangle
We used to preserve S between function calls, in an attempt to support writing Forth-ish code, where functions would just manipulate the stack, and ignore the environment. However, arbitrary control over the stack gets confusing when higher-order methods are involved; the stack layout after calling the passed function would be dependent on how the function leaves it, and a sufficiently clever attacker could use this to corrupt the stack and produce undesired behaviour.
To "fix" this, we decided that functions always return one value, and cannot access the data stacks of any other functions, thus the stack layout will always be known.
16: jump-cond thenhigh thenlow elsehigh elselowInstruction
17: jump high lowInstruction
\left\langle S, E, P, D \right\rangle \rightarrow\allowbreak \left\langle S, E, P + 3 + high \times 256 + low, D \right\rangle
Jumps are relative to the end of the instruction, so jumping 0 bytes does nothing.
19: call-method nInstruction
{} \land \mathrm{proc}[C{}', E{}', P_0, n + 1, O{}'] = Primary
{} \land O_2{}' = O
{} \land \mathrm{proc}[C{}', E{}', P_0, 3, O{}'] = Primary
{} \land O_2{}' = O
TODO: Fill in how to compute methods.
Every script of the class of an object contains a list of lists (Name\ P_0\ A{}') , where A{}' is the number of "real" arguments. Methods have an implicit "next method list" argument. When calling a method, the next method list is the first argument, and every other method is shifted over one, so the first argument appears in the second position in the new frame.
The global variable vectors for each \langle object, script \rangle pair must be the same. This ensures that one method on an object and one script can observe the changes made by another method on that object and that script.
It is usually a good idea to cache computation of methods; method caching in old Smalltalk was a fairly cheap way to get a reasonable performance gain, as approximately every method call requires dispatching. This is true to a lesser extent for Netfarm code, which has fewer, but still many, method calls.
5.3.4.3 Stack control
34: drop Instruction
x\ \text{—
35: dup Instruction
x\ \text{—
36: swap Instruction
x\ y\ \text{—
These instructions function identically to the words defined of the same names in ANSI Forth.
5.3.4.4 Lists
64: cons Instruction
As we do not have a representation for improper lists, the cdr must be a list.
65: car Instruction
(car\ .\ \_)\ \text{—
66: cdr Instruction
(\_\ .\ cdr)\ \text{—
67: null Instruction
()\ \text{—
(\_\ .\ \_)\ \text{—
68: consp Instruction
x\ \text{—
69: append Instruction
a\ b\ \text{—
5.3.4.5 Operators
71: equal Instruction
a\ b\ \text{—
\begin{aligned} equal[(a_a\ .\ a_d), (b_a\ .\ b_d)] &= equal[a_a, b_a] \land equal[a_d, b_d] \\ equal[(), ()] &= \mathtt{\#t} \\ equal[a \in Integer, b \in Integer] &= a = b \\ equal[a \in String, b \in String] &= a = b \\ equal[a \in Object, b \in Object] &= hash[a] = hash[b] \\ equal[\mathrm{proc}[a_c, \ldots, a_o], \mathrm{proc}[a_c, \ldots, a_o]] &= a_c = b_c \land \ldots \land a_o = b_o \\ equal[\_, \_] &= \mathtt{\#f} \end{aligned}
It is probably wise to use an auxiliary stack when implementing equal, as the client code may provide arbitrarily large data structures.
72: string= Instruction
73: list nInstruction
e_1 \ldots e_n\ \text{—
For OP being each name of the following three instructions:
96: + Instruction
97: - Instruction
98: * Instruction
Flooring is defined to produce the closest integer below the actual value, e.g \lfloor \frac{-5}{2} \rfloor = -3
100: abs Instruction
Note that |a| can be out of bounds when a is in bounds; which is when a = -(2^{255}).
For T being the type in the name of the following three instructions:
105: integer? Instruction
106: list? Instruction
108: string? Instruction
o\ \text{—
For OP being each name of the following three instructions:
112: = Instruction
113: < Instruction
114: > Instruction
115: /= Instruction
116: <= Instruction
117: >= Instruction
5.3.4.6 Object instructions
129: schema Instruction
For N being each name of the following two instructions:
130: add-computed-value Instruction
140: remove-computed-value Instruction
name\ value\ \text{—
{} \land \text{\texttt{:write-computed-values}} \in A
{} \land F{}' = ((N\; \text{\texttt{:target}}\; O\; \text{\texttt{:slot}}\; slot[name, O]\; \text{\texttt{:value}}\; value)\; .\; F)
131: object-value Instruction
object\ name\ \text{—
133: object-bound? Instruction
object\ name\ \text{—
132: object-computed-values Instruction
When we figure out how to deal with "plain old data" with no scripts to provide accessor methods, then we will probably make object-value and object-computed-values only access slots of the current object; in order to preserve an object capability model, where objects can only send each other references to other objects.
134: object-authors Instruction