The Netfarm Book

The Cooperative of pplied Language

January 20, 2021

Contents

 0.1 Introduction
  0.1.1 “Abstract”
  0.1.2 How Netfarm deviates from existing designs
  0.1.3 The structure of Netfarm
  0.1.4 Thanks
  0.1.5 What to do next
1 Conventions
 1.1 Block structure
 1.2 Set specifiers
 1.3 Strings
 1.4 Integers
2 decentralise2
 2.1 Connections
  2.1.1 Connection types
  2.1.2 Threaded connections
  2.1.3 Netfarm wire format connection
  2.1.4 Netfarm binary connection
 2.2 Messages
  2.2.1 Built in message types
  2.2.2 Defining message types
  2.2.3 Translators
 2.3 Acceptors
  2.3.1 Acceptor protocol
  2.3.2 Threaded acceptors
 2.4 Systems
  2.4.1 Database protocol
  2.4.2 Synchronisation protocol
  2.4.3 Leader loop protocol
  2.4.4 Tuning the standard-system
 2.5 Standard system behaviour
  2.5.1 Message interpretation
  2.5.2 Special blocks
 2.6 Clients
  2.6.1 Client types
  2.6.2 Actions
3 Netfarm
 3.1 Objects
  3.1.1 Slots
  3.1.2 Scripts
  3.1.3 Inheritance
 3.2 Inbuilt objects
 3.3 External formats
  3.3.1 Vague objects
  3.3.2 Character format
  3.3.3 Binary format
 3.4 Cryptographic operations
  3.4.1 Hashing
  3.4.2 Signing
4 Networking
 4.1 The client
5 Netfarm script machine
 5.1 Using the script machine
 5.2 Presentations
 5.3 Side effects
  5.3.1 Computed values
 5.4 The script machine
  5.4.1 Structures
  5.4.2 Notation
  5.4.3 Execution
  5.4.4 Environment
  5.4.5 Control flow
  5.4.6 Forthisms
  5.4.7 Operators
6 The server
 6.1 Database protocol
  6.1.1 Implementations of the database protocol
  6.1.2 Vague object storage
  6.1.3 Presentable set
  6.1.4 Side effects
  6.1.5 Other interesting set
  6.1.6 Dependency graph
A A description of a voting-web system
B A brief style guide
 B.1 Abstraction
 B.2 Feature “creep”
 B.3 Portability
C Index
 Listings
Bibliography

0.1 Introduction

0.1.1 “Abstract”

Netfarm is an attempt at creating a distributed, trustless object system. We develop Netfarm to fill a perceived void in the rapidly-expanding “decentralisation” bubble, to utilise the greater fault tolerance of a distributed hash table and simplify programming via a client language’s meta object protocol, and to devise less hierarchical moderation techniques that do not leave them at the whim of their servers’ operators and administrators.

The design of Netfarm is not ideal due to historical and planning issues; but it should serve well as an introduction to trustless object systems, which we believe can be easier to program and have better throughput than some of the models that are used frequently today, such as blockchains (demonstrated to be usable for programming in Ethereum) and various federated protocols (such as ActivityPub, notably used in Mastodon among other “fediverse” blogging servers).

To summarise the design, Netfarm implements a synchronous message-passing object system (like Smalltalk or Self) hosted on a distributed hash table. A Netfarm network can reach eventual consistency as state is either so unlikely to collide, such as objects which are stored by hashes, which are probably less likely to collide than for a computer to incorrectly compute a hash; or it is simple to unify, such as computed values. However, Netfarm does not decide on an ordering of computed values, and so has fairly weak consistency guarantees.

0.1.2 How Netfarm deviates from existing designs

Some of these points are made redundant by the preceding paragraphs, but it may be worthwhile to explain why we made those claims, and why they are desirable.

0.1.3 The structure of Netfarm

We see the Netfarm system as a synthesis of various components:

However, a user should be able to modify and use any component separately, and should have extensive control over many aspects of the system, to implement any changes while still acting as a “client” of the system, that does not modify the system itself.

0.1.4 Thanks

We would like to thank Luke Nuttall1 for providing a drawing of the Netfarm mascot Demeter.

We would also like to thank Robert Strandh for providing advice on optimising the Netfarm script machine and how Netfarm should interact with client code, and Gilbert Baumann for also suggesting optimisations for the script machine and several stylistic concerns, among other many other insightful discussions.

0.1.5 What to do next

The boundaries that separate a Netfarm implementation and its host language are not yet clear. Several questions remain, such as how much a Netfarm node must know about the data it is verifying, and what it is expected to do for its clients.

We forsee that a new distributed object system will have to be created, to overcome flaws that were introduced to Netfarm through its age and attempts to persue multiple goals. One such problem was that we chose the means of interaction with the network to be sending and receiving data, which creates an awkward split in mechanisms, between the creation of an object and objects sending messages to each other.

As such, we would like to remind the reader that they should take care to form their own opinions on how such a system should be structured from any problems they have noticed, and that Netfarm is not the end-all of distributed object systems. We would hope for something like Pierre-Joseph Proudhon’s dream in this quotation:

Je rêve une société où je serai guillotiné comme conservateur.

(I dream of a world in which I would be executed as a reactionary.)

We hope the approach we have developed for Netfarm, of identifying and constructing alternatives to many forms of centralisation, will be commonplace and we will be the ones developing centralised systems.

Chapter 1
Conventions

1.1 Block structure

A “block” in decentralise2 is represented in two ways, depending on how it is used:

multiple-value-call can be used to convert from the second representation to the first, but there are currently no reasons why that should be useful.

1.2 Set specifiers

The interesting set and uninteresting set for a decentralise2 system are represented as predicates, which usually should not change.

When the interesting set must be changed, the system is notified using a function which takes two set specifiers. A set specifier is either:

The latter is very useful when sets are almost “infinite”, such as in a distributed hash table where the change (from “rehashing” some types of distributed hash tables, or the operator deciding they can commit to storing more objects) can well exceed 2100, and is impractical to represent as a pair of lists.

1.3 Strings

Strings are always encoded using UTF-8 in the Netfarm systems. No buts, no ifs there. The sequence of code points used to form a string must be preserved in encoding and decoding.1

1.4 Integers

Integers are encoded in binary data in big endian format, i.e. such that the most significant octet (or other unit) is written first, and the least significant is written last. This format is most common in networking applications, and is the most convenient for users of human languages that write left-to-right to read.

Octet vectors are written in hexadecimal, with the first octet to be written or read appearing at the leftmost position in a vector, and the last at the right.

For example, 10 01 is the vector consisting of the octets 16 and 1, and can be interpreted as the integer 4097.

Chapter 2
decentralise2

The first component that was developed for Netfarm was the decentralise1 library, which allows a programmer to implement some kind of distributed network by writing the components that they are most interested in, and using provided implementations of the components they are not interested in writing.

A decentralise user may implement one or more of:

In a peer-to-peer context, servers and clients do not exist, as there is no distinction between them. This is still true for decentralise2, as servers do not distinguish between servers, clients, and any other program creating connections and sending and receiving messages. However, the client is written to retrieve specific objects for a user and provide a synchronous interface to them, and a system retrieves many objects asynchronously, and stores them in some kind of database; so it may be argued that there is something that looks like a client, and something that looks like a server.

On the other hand, a server also may handle accepting connections and parsing messages, and thus a system would not be a server, as it does not do those things; so we should stick with the term system to be precise in what a component does.

2.1 Connections

A connection connects a client and a server together, allowing the two to exchange messages between each other. decentralise2 is able to handle many different types of connections, including in-memory passing connections and socketed connections that use the operating system’s networking capabilities, which only have to implement a simple protocol.


PIC

Figure 2.1: A diagram of provided connection classes. Shaded classes are instantiable, and not shaded classes are not instantiable. Dashed classes are used to denote connection types.


connection [Protocol Class]

Note that a connection is implicitly started when it is instantiated.

stop-connection connection [Generic Function]

Stop the connection, which should free any resources that creating the connection acquired, such as threads and sockets. This generic function uses the progn method combination, and thus every method should be qualified with progn (or :around).

handle-message connection message [Generic Function]

Handle a message that was sent to the connection.

handle-message (connection connection) message [Method]

The default method calls the message handler of the connection with the message.

:message-handler [Initarg]

connection-message-handler connection [Accessor]

The message handler is a function that should be called with each received message. The default handler will wait for a new handler to be set, in order to prevent some race conditions, where a message could be received before the client has set a message handler.

write-message connection message [Generic Function]

Write a message to a connection. The implementation of this must be thread safe, such that two threads can call write-message simultaneously, and will correctly write both messages.

2.1.1 Connection types

The types of data that a connection can send are represented by a connection’s class, not unlike the stream classes such as fundamental-character-stream and fundamental-binary-stream in the Gray streams de-facto standard.

Note that, unlike Gray streams, all connections are expected to be bidirectional, but a connection that drops all outgoing messages, or never receives any incoming messages is arguably still correct.

character-connection [Protocol Class]

A connection that can send blocks with vectors of characters (strings) as data.

binary-connection [Protocol Class]

A connection that can send blocks with vectors of octets as data.

2.1.2 Threaded connections

threaded-connection [Protocol Class]

A connection that uses a thread to read messages.

read-message connection [Generic Function]

Read a message from the connection, blocking until one is present, then returning it and T, or return something and NIL if a message cannot be read ever again (e.g the other node closed the connection).

listener-loop connection [Generic Function]

The thread of the connection will call this function, and stop when it returns.

listener-loop (connection threaded-connection) [Method]

Repeatedly calls read-message, handling its return values appropriately.

Passing connections

A passing connection passes its messages to another connection, sending messages that are written to the connection to the target. Two of these connections in a loop can be used to create a hidden socket.

Hidden sockets are very useful for testing, as they function entirely in the Lisp image, without acquiring external resources or having to encode and decode any messages.

passing-connection [Class]

:target [Initarg]

passing-connection-target connection [Accessor]

The connection that the passing connection should pass its messages to.

:direction [Initarg]

passing-connection-direction connection [Reader]

The “direction” of a passing connection. Any object that provides context to the use of a connection may be supplied, and this object is printed when printing a passing connection.

:name [Initarg]

The “name” of a passing connection, a string which provides context to the use of a connection. This defaults to "Passing-Connection", and is printed when printing a passing connection.

make-hidden-socket &key class name initargs [Function]

Create two passing connections that are each others’ targets. The connections are instances of the provided class (defaulting to passing-connection), and are instantiated with the provided initargs.

2.1.3 Netfarm wire format connection

netfarm-format-connection [Class]

A character connection that exchanges messages using a simple textual format, based on the wire protocol Nettle used.

As the name suggests, this was intended to be the default connection class for Netfarm; but the creation of Netfarm’s much more efficient binary format suggests that we should use a binary connection class as default.

Nonetheless, it is quite easy to read, and superficially looks like the HTTP/1.0 wire protocol. Messages begin with unique “verbs”, and are separated by new lines.


boolean ::= yes no
integer ::= (0 9)*
length-prefixed-string ::= integer : character*



block-name ::= length-prefixed-string
channel-name ::= length-prefixed-string
id ::= length-prefixed-string
line-count ::= integer
reason ::= length-prefixed-string
uri ::= length-prefixed-string
version ::= integer



announce ::= announce boolean
block-header ::= block block-name version line-count channel-name*
error ::= error block-name reason
get ::= get block-name*
ok ::= ok block-name
node ::= node uri id
subscribe ::= subscribe channel-name*
subscription ::= subscription block-name version channel-name*

Length prefixed strings are preceded by the length of the string (written in base 10), and a colon.
Figure 2.2: An Extended Backus-Naur form-like syntax description of the Netfarm wire format


The base decentralise2 messages are represented as follows:

2.1.4 Netfarm binary connection


boolean ::= 00 01
byte ::= 00 ff
long-integer ::= short-integer × 8
short-integer ::= byte
long-bytes ::= long-integer byte*
short-bytes ::= short-integer byte*
long-string ::= long-bytes
short-string ::= short-bytes
name-list ::= long-integer short-string*



block-name ::= short-string
block-list ::= name-list
channel-list ::= name-list
version ::= long-integer
metadata ::= block-name long-integer channel-list
uri ::= short-string
id ::= short-string



get ::= 01 block-list
character-block ::= 02 metadata long-string
binary-block ::= 03 metadata long-bytes
ok ::= 04 block-name
error ::= 05 block-name long-string
subscribe ::= 06 block-list
subscription ::= 07 metadata
allow-announcement ::= 08 boolean
announce ::= 09 short-string short-string

Byte vectors are preceded by the length of the vector. Strings are byte vectors, that are decoded and encoded as character vectors. Name lists are lists of names, that are preceded by the length of the list.
Figure 2.3: An Extended Backus-Naur form-like syntax description of the Netfarm binary wire format


2.2 Messages

A message is an abstract object that is sent over a connection, and can be considered the lowest-level communication protocol decentralise2 is concerned with.

message-case message &body case* [Macro]

case ::= ( pattern form* )
pattern ::= ( keyword variable* )
dont-care ::= - _ NIL
variable ::= dont-care symbol

Match the value of message against some patterns, which look like the arguments to the function message, and evaluate the forms of the first matching case, with all variables that don’t match the rule dont-care bound to the arguments.

message keyword &rest value* [Function]

Create a message from the type designated by the keyword with the given values. The number of values must be the same as the number of accessors of the type.

2.2.1 Built in message types

get-blocks (:get names) [Message Type]

put-block (:block name version channels data) [Message Type]

ok-response (:ok name) [Message Type]

error-response (:error name reason) [Message Type]

new-subscriptions (:subscription name version channels) [Message Type]

subscription (:subscribe channels) [Message Type]

announcement-control (:allow-announcement allow?) [Message Type]

node-announcement (:announce uri id) [Message Type]

2.2.2 Defining message types

define-message-type keyword type-specifier constructor-name &rest accessors [Macro]

Define a message type named by the given keyword, which corresponds to the given Common Lisp type specifier, can be constructed by calling the function named by the constructor-name, and can be destructured using all the functions named in accessors.

2.2.3 Translators

The content of messages (particuarly :block messages) is often subject to translation, so that logic in a system can be separated cleanly from serializing and deserializing for connections.

object->data object target source [Generic Function]

“Serialize” or “render” an object (which came from, or represents an instance of the source), into a value that can be sent as data on the target.

data->object data source target [Generic Function]

“Deserialize” or “parse” the data (which represents an instance of, or something the target can consume), into an object, which was received from the source.

Note that all three arguments should be specialized, else weird things can occur. For example, an ansible-connection may be both a binary-connection and a character-connection, so it is possible it will receive either character or binary messages. If you write methods to parse some data with specializers (t binary-connection target) and (t character-connection target), either method may be selected, based on the class precedence list of ansible-connection, independent of the type of data in the message; and the wrong parser may be used. Instead, you should specialize on (string character-connection target) and (vector binary-connection target), so that the correct method will be selected.

2.3 Acceptors

An acceptor creates connections by accepting them from something that faces the outside world, such as a server socket, and gives them to a system.

acceptor [Protocol Class]

The protocol class for an acceptor.

threaded-acceptor [Protocol Class]

The protocol class for an acceptor that uses a thread to give a system connections.

2.3.1 Acceptor protocol

give-system-connection system connection [Generic Function]

Give a connection to the system the acceptor accepts connections for.

An acceptor may have to define :after methods on:

start-acceptor acceptor system [Generic Function]

Start the acceptor, allowing it to create connections on behalf of the system.

start-acceptor (acceptor threaded-acceptor) system [Method]

Start a thread that repeatedly accepts connections by calling acceptor-loop.

stop-acceptor acceptor [Generic Function]

stop-acceptor (acceptor acceptor) [Method]

2.3.2 Threaded acceptors

An acceptor must define a method on:

accept-connection (acceptor) [Generic Function]

Return a new connection that should be added to the acceptor’s system.

An acceptor should usually only have to define a primary method on:

acceptor-loop acceptor [Generic Function]

acceptor-loop (acceptor acceptor) [Method]

Forever accept connections and add them to a system.

Socket acceptor

socket-acceptor [Class]

An acceptor that accepts connections from a socket, also wrapping the connections in SSL.

:connection-class [Initarg]

The class to make instances of. This must be a subclass of either character-connection or binary-connection.

:host [Initarg]

:port [Initarg]

The hostname and port to bind to. These default to "0.0.0.0" and 1892, respectively.

:certificate [Initarg]

:key [Initarg]

Pathnames for the certificate and key files.

start-acceptor (acceptor socket-acceptor) system [:Before Method]

This method starts listening on the given hostname and port.

stop-acceptor (acceptor socket-acceptor) system [:Before Method]

This method stops listening on the given hostname and port.

accept-connection (acceptor socket-acceptor) system [Method]

A socket method accepts a connection by accepting a connection from the socket, and instantiating an instance of the acceptor’s connection class.

Connections are instantiated with initargs :socket and :stream, with values for the raw socket and the SSL-wrapped stream, respectively.

2.4 Systems

A system handles the requests sent to a node in a distributed system, and sends requests for blocks it has not yet retrieved to synchronise itself. A block is a tuple consisting of a name, version, channel set, and some data; of which the metadata can be used by a system to determine what it must retrieve and update.

system [Protocol Class]

The protocol class for a system. A system is maintained by some maintenance threads. One of which is the leader thread, which runs a leader loop that schedules requests for synchronisation and stops itself when the system is stopped.

TODO: The standard-system should track maintenance threads like it tracks the leader thread, and eventually, this should all be integrated into the bailout thread supervision library.

standard-system [Class]

A system that uses the database and synchronisation protocols to implement the basic behaviour of a node.

echo-system [Class]

A silly system that responds to any messages other than invalid syntax with the message sent.

:id [Initarg]

system-id system [Generic Function]

The ID of a system, as an integer in the range [0,2256)

:acceptors [Initarg]

The acceptors of a system.

2.4.1 Database protocol

The database protocol is part of the system protocol, because a system may process the data sent over the wire into some other format that should be stored and serialised by some other protocol. For example, the Netfarm system implements get-block in terms of netfarm-system:get-vague-object, and provides the rendered vague object and all the object names that affected it via computed values.

It is possible to separate the synchronisation and database protocol implementations, and combine them into a complete system implementation using multiple inheritance when using the reference implementation:

Listing 2.1: Using multiple inheritance to create a system
(defclass foobar-sql-database () 
  ((database-host :initarg :host 
                  :reader foobar-database-host) 
   ...)) 
 
(defclass my-system 
  (decentralise-system:standard-system 
  foobar-sql-database) 
  ()) 
 
(defvar *system* 
 (make-instance my-system 
  ; an initarg for FOOBAR-SQL-DATABASE 
  :host "sql-server.local" 
  ; an initarg for STANDARD-SYSTEM 
  :concurrent-requests 20))

memory-database-mixin [Class]

The simplest database implementation, which stores block data in a concurrent hash table.

not-found [Condition]

A condition type that is raised when get-block or get-block-metadata are called with the name of a block that is not stored by the system.

The following generic functions must be implemented by a database:

get-block system name [Generic Function]

Return block structure for the block named name from the system, or signal a condition of type not-found.

put-block system name version channels data [Generic Function]

Store a block with the provided block structure in the system, or signal a condition of type error.

map-blocks function system [Generic Function]

Call function repeatedly with block structure for the metadata of every object in the system.

get-block-metadata system name [Generic Function]

Return block structure for the metadata of the block named name from the system, or signal a condition of type not-found.

This generic function will be called for each block’s metadata sent to a system, so adding a method which does not have to load a block’s data is highly recommended. Otherwise, for the implementors’ convenience, the following method will be used:

get-block-metadata (system system) name [Method]

The default method for get-block-metadata calls get-block, and returns the metadata from the data it retrieved, but implementations which are even slightly concerned with efficiency should implement a method for get-block-metadata that does not have to read the data for a block.

2.4.2 Synchronisation protocol

When metadata about a block is received, it is categorised into either the interesting set or the uninteresting set, or discarded.

These two sets are represented by predicates, and changes between the sets can be represented in several ways (see Set specifiers for a reference of these set specifiers). The interesting and uninteresting set predicates must usually remain referentially transparent and not change, to allow the system to cache block metadata and schedule requests; but the sets can be modified when the system is notified, using update-system-for-new-interesting-block-predicate.

interesting-block-p system name version channels [Generic Function]

Determine if a block should be requested, based on its metadata.

interesting-block-p (system standard-system) name version channels [Method]

Everything is interesting by default.

uninteresting-block-p system name version channels [Generic Function]

Determine if a block’s metadata should be kept, in case it may become interesting later, due to update-system-for-new-interesting-block-predicate.

uninteresting-block-p (system standard-system) name version channels [Method]

Everything that has a newer version than the system stores is uninteresting to the system.

update-system-for-new-interesting-block-predicate system interesting-set uninteresting-set [Generic Function]

Update the system’s cache of the interesting set, adding every member of the interesting-set to the interesting set and removing every member of the uninteresting-set from it.

2.4.3 Leader loop protocol

The leader loop of a system performs tasks that are not easy to delegate to a connection, such as watching for requests that timed out, and replacing them with new requests.

leader-loop system [Generic Function]

Run the leader loop for the system. This is expected to return only when the system is stopped.

leader-loop (system standard-system) [Method]

The leader loop of a standard-system implements the description we gave of a leader loop.

start-system system [Generic Function]

Start a system, by starting all its acceptors and a leader thread.

stop-system system [Generic Function]

Stop a system by closing all its acceptors and connections, and advising maintenance threads to stop.

system-stopping-p system [Generic Function]

Should the threads maintaining the system stop now?

2.4.4 Tuning the standard-system

:scheduler-timeout [Initarg]

The time (in seconds) that the scheduler should wait for a request to be responded to. This defaults to 10 seconds.

:scheduler-concurrent-requests [Initarg]

The number of requests that the scheduler can wait for at a time. This defaults to 80 requests.

TODO: It may be useful to add a(n optional) limit on how many requests can be sent to one connection at a time.

2.5 Standard system behaviour

2.5.1 Message interpretation

TODO: A subclass of standard-system could handle :announce in order to implement peer discovery.

2.5.2 Special blocks

Retrieving and storing (using :get and :block messages, respectively, with the names of) some blocks from a standard system will cause them to perform some special behaviour that does not invoke get-block or put-block as per normal.

define-special-block (class-name block-name &optional (data-type-name t)) &key get put [Macro]

Define a special block for instances of a class, with associated get and put functions.

NOTE: Because we use generic functions to implement special block dispatch, it is possible to use call-next-method. Should this be standard behaviour?

The data returned by the get function is translated by object->data, with the given data-type-name as source, and the connection as target. The data given as an argument to the put function is translated by data->object, with the connection as source, and the given data-type-name as target.

The standard system provides some special blocks:

list [Special block]

This block contains a list of every block name stored (excluding special block names).

On a character connection, the list is encoded with the metadata of a block per line, each line containing the length-prefixed name, the version number, and then length-prefixed channels, somewhat mirroring the syntax of the header of :block messages when sent from a netfarm-format-connection.

On a binary connection, the list is encoded as repeating metadata records. These records do not have any delimiters between them; as their length can be detected from the record data. Each record contains the name encoded as a short-string, the version as a long-integer, the number of channels as a long-integer, then each channel name as a short-string.

More formally, both encodings can be generated with the following grammars:


On character connections (building upon the rules in Figure 2.1.3):

block-metadata ::= block-name version channel-name*
block-listing ::= empty-string
| (block-metadata newline)* block-metadata

On binary connections (building upon the rules in Figure 2.1.4):

metadata-record ::= block-name version channel-list
block-listing ::= metadata-record*

Figure 2.4: Extended Backus-Naur form-like syntax descriptions of listings


When a standard system receives a listing, it assumes that the blocks in the listings may be retrieved from the client, and it may send :get requests for those blocks. If the listing cannot be parsed, a system will instead respond with an :error message, for the block list, and with the reason invalid listing.

id [Special block]

This block contains the ID of the node.

On a character connection, the ID is encoded as a 64-character hexadecimal string, with uppercase A through F characters, and the most significant digit written first. For example, the ID 9781484261347 would be encoded as 000000000000000000000000000000000000000000000000000008E56DE50FE3.

TODO: Add a binary encoding. Currently, all our binary connection classes are also character connections, and are thus capable of sending character blocks, but this may not be ideal. On the other hand, a 50% reduction of the size of a 64-byte block that is only sent once is not anything to write home about.

When a standard system receives an ID, it associates that ID with the connection used to send the ID. This ID appears in node listings, should the connection then allow itself to be announced.

nodes [Special block]

This block contains a listing of all nodes that have announced themselves to the system, with each node record separated by a newline. Each item in the listing has a length-prefixed URI, which may be used to connect to a node, a space, and its ID.

TODO: Add a binary encoding. Again, this block is only retrieved once (as nodes are announced after that with another message type), but it would still be nice to have a binary encoding.

2.6 Clients

2.6.1 Client types

client [Protocol Class]

connection-client [Class]

A client which uses a connection to perform actions.

:connection [Initarg]

client-connection client [Reader]

The connection the client uses to perform actions.

decentralise-kademlia:kademlia-client [Class]

A client which uses multiple connections to perform actions, deciding on which connections to use and searching for more connections using the Kademlia distributed hash table search algorithm.

2.6.2 Actions

Asynchronous execution is handled using a action protocol. An action is some computation that can be composed (in a method similar to monads in functional programming), and run later. Actions have some benefits over some other asynchronous protocols, such as callbacks:

chain &body binding* return-form [Macro]

A macro which chains some actions to create a more complex action.

Each binding is either of the form (variable <- action), indicating that the value of the action should be bound to a variable, or (variable = form), indicating that the value of a form should be bound to a variable. The package of <- or = does not matter, much like the keywords in extended loop (but :<- doesn’t look very nice, and := wouldn’t really re-assign anything). The value of the action is produced by the return-form.

The method for expanding chain is quite similar to the method for desugaring do notation. The expansion of chain is defined by a recursive function:

E[(chain value)] = value
E[(chain (v = f) . rest)] = (let ((v f)) E[(chain . rest)])
E[(chain (v  a) . rest)] = (then a (lambda (v) E[(chain . rest)])

run action [Function]

Run an action, either returning the value it succeeded with, or signalling the condition it failed with.

run-away action [Function]

Run an action in the background, returning nothing of interest immediately.

%run action success failure [Generic Function]

Implement run by setting up the action to be run somewhere, which should end up calling either success with a value, or failure with a condition. This probably should not block.

Basic actions

We have picked these actions as primitives to compose other actions out of. It is possible to implement every other action in the client, in terms of these actions, but this is not necessary. (NOTE: We don’t.)

finish value [Function]

An action which always returns value.

functional-action function [Function]

An action which calls function with a success continuation and a failure continuation. If the function calls the success continuation with a value, it succeeds with that value. If the function calls the failure continuation with a value, it fails with that value. If the function signals an error, the action fails with that error.

Derived actions

From these basic actions, it is possible to derive more useful actions.

then action action-producer [Function]

Produce an action which runs action, then calls action-producer with the value action produced to produce another action, then runs that action.

then could be implemented so that (then a p) (functional-action (lambda (s f) (%run a (lambda (v) (%run (funcall p v) s f)) f))).

then* action1 action2 [Function]

Produce an action which runs action1 and discards its result, then calls action2 and produces its result.

then* could be implemented so that (then* a1 a2) (then* a1 (constantly a2)).

parallel actions [Function]

Run a sequence of actions in parallel. If all actions succeed, this action succees with a vector of all the values of the actions, or if one action fails, this action fails with the first condition.

Evidently (parallel empty-sequence) (finish #()), and (parallel (singleton-sequence-of a)) (chain (x <- a) (finish (vector a))). The other cases are much more hairy.

attempt action failure-producer [Function]

Similar to then, except that instead failure-producer will be called with a failure condition, and the action will succeed with the value of calling failure-producer.

The implementation can be similar to that of then, but extending the failure continuation instead of the success continuation.

Client actions

get-block* client name &key &allow-other-keys  [Generic Function]

An action which retrieves a block from a client.

put-block* client name version channels data &key fail-if-too-old &allow-other-keys  [Generic Function]

An action which puts a block to a client. If fail-if-too-old is true, we expect that this block should be the newest version, and fail if it is too old. If fail-if-too-old is false, then we consider the action successful if a node has a newer version of the block..

Chapter 3
Netfarm

The second component that was developed was the Netfarm object system. The Netfarm object system introduces a text format for representing an object, as well as code to serialize and deserialize objects.

The part of the Netfarm system that implements an object system is also named Netfarm. It may be helpful to describe this component as the “Netfarm object system”, and the entire system as the “Netfarm software stack” or words to those meanings.

As an example of how Netfarm enables its users to exchange objects with little friction, we will imagine that there are two car enthusiasts, Adrian and Bob, who want to discuss their cars. Here is a class definition for a car that they could have used in a client program:

Listing 3.1: A class definition for a car
(defclass car () 
  ((colour :initarg :colour :reader car-colour) 
   (year   :initarg :year   :reader car-year) 
   (make   :initarg :make   :reader car-make) 
   (model  :initarg :model  :reader car-model)) 
  (:metaclass netfarm:netfarm-class))

This class functions identically to any other class defined using the Common Lisp Object System:

Listing 3.2: Instantiating and describing an instance of a Netfarm class
Adrian> (make-instance car :year 1952 :make "Studebaker" 
                            :model "Starlight" :colour "blue") 
#<CAR> 
Adrian> (defvar *my-car* *) 
*MY-CAR* 
Adrian> (car-model *my-car*) 
"Starlight"

We will discuss how Adrian can send that description of a car to Bob later.

3.1 Objects

The Netfarm object system has objects, which are instances of schemas, which are themselves objects.

This system maps well to the Common Lisp object system, and we can describe schemas using classes that are instances of a special metaclass.

netfarm-class [Metaclass]

The class of a Netfarm class.

object [Class]

The class of a Netfarm object.

A Netfarm object also has a list of signatures, and a source function, that is called to retrieve referenced objects lazily.

:signatures [Initarg]

object-signatures object [Accessor]

The signatures of an object, stored as an association list of user object-signature data pairs.

:source [Initarg]

object-source object [Accessor]

A function that takes an object name, and returns the object with that object name. This function typically requests the object from a client.

3.1.1 Slots

Much like objects in the Common Lisp meta-object protocol, Netfarm objects are represented as vectors of slots; an object is comprised of a vector of “normal” slots, and a vector of computed slots.

Each Netfarm slot maps to a Common Lisp slot, but some Common Lisp slots don’t map to Netfarm slots1.

To avoid some potential non-determinism in how slot lists of schemas expressed with Common Lisp classes are ordered when read by Netfarm, slots are sorted by string< on their names in slot vectors.

:netfarm-name [Initarg]

slot-netfarm-name slot-definition [Reader]

The name of a slot, a string or nil if the slot should not be present in the Netfarm object. This defaults to the downcased name of the slot-definition-name of the slot.

Multiple slots of a class cannot have the same name.

:renderer [Initarg]

slot-renderer slot [Reader]

:parser [Initarg]

slot-parser slot [Reader]

The names of functions which are used to render and parse slot values. The parser is called when calling apply-class, and is used to parse a Netfarm representation into the preferred representation of your program. The renderer is called when calling render-object or binary-render-object, and is used to render your representation as a Netfarm representation.

For all values v, it is expected that (netfarm-equal (funcall (slot-renderer slot) (funcall (slot-parser slot) v)) v); the renderer and parser should round trip. If this is not the case, weird things can happen when processing objects, and you should instead use :netfarm-name to create duplicate slots with your preferred representation, preserving the Netfarm representation in the original slot.

fix-up-copied-object object [Generic Function]

Fix up the slots of an object that has been loaded (called from apply-class). This should be used like initialize-instance; you should add :after methods to it, and should probably be used to reproduce your preferred representation slots from the Netfarm representation slots.

3.1.2 Scripts

script [Protocol Class]

:scripts [Initarg]

netfarm-class-scripts class [Reader]

The scripts of a class.

3.1.3 Inheritance

Slots are inherited as usual for Common Lisp.

The presentation script, if not provided, is inherited from the first superclass of the class that has a presentation script. The message script, if not provided, is inherited from the first superclass of the class that has a message script.

The initialization scripts of a class are the union of the direct initialization scripts of a class, and the initialization scripts of the superclasses of the class, with duplicates removed.

3.2 Inbuilt objects

Inbuilt schemas do not have meaningful values for the documentation slot, because portable code could depend on the values otherwise, and we would like to be able to modify the documentation if it is unclear or incorrect.

inbuilt@schema[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

()

documentation

""

scripts

()

slots

(("computed-slots") ("documentation") ("scripts") ("slots"))



Schemas are special objects that describes the behaviour and storage of their instances. Note that the schema inbuilt@schema describes itself.

inbuilt@user[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

(("data"))

documentation

""

scripts

(ref:inbuilt@user-script)

slots

(("ecdh-key") ("sign-key"))



Users are the primary form of authentication in Netfarm, as object signatures are verified with the keys of the users attached with them.

A user object stores a public Ed52219 signing key (in sign-key), and a Curve25519 exchange key (in ecdh-key).

inbuilt@script[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

()

documentation

""

scripts

()

slots

(("entry-points") ("methods") ("program") ("variables"))



Scripts are “programs” that describe the behaviour of an object. A script has entry points that describe internal procedures, methods that describe external procedures, an octet vector encoding instructions in the program, and a list of the initial values of its variables.

inbuilt@user-script[Inbuilt instance of inbuilt@script]



Slot name

Value



entry-points

()

methods

(("add-datum" 2 0))

program

#(7 73 1 12 134 71 16 0 1 0 0 8 2 0 4 1 0 130 8)

variables

("data")



This script can be assembled with

(define-script *user-add-datum-script* ("data") 
  (:method "add-datum" 1) 
  ;; Make the expected author list (just myself). 
  self (list 1) 
  ;; Get the senders author list. 
  sender object-authors 
  ;; If they arent equal, get out. 
  equal (jump-cond 0 1 0 0) return 
  ;; Otherwise, add their message as a computed value. 
  (get-value 0) (get-env 1 0) add-computed-value return)

inbuilt@mirror-schema[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

()

documentation

""

scripts

(ref:inbuilt@mirror-script)

slots

()



The schema of an object that can ‘reflect’ messages in a way that makes the messages anonymous, by making the reference to the sender useless.

For example, this object could be used in an authentication scheme where one object has to challenge another to present a reference to itself. Without a mirror, the other object could cheat by always presenting the sender object; but with a mirror, it cannot do so.

This schema cannot be initialized by the user.

inbuilt@mirror[Inbuilt instance of inbuilt@mirror-schema]



Slot name

Value





The only instance of inbuilt@mirror-schema.

3.3 External formats

Objects are usually serialized in order to transmit them to another client that does not share memory with the origin of the object, and that client must then deserialize the serialized form to produce a usable object.

In Netfarm, serialization is done by converting an object to a vague object, which has as much context of the object removed as possible, which can then be serialized into a character or octet vector. These vectors can then be deserialized by deserializing the vector into a vague object, and then re-introducing the context that the origin of the object supplied.

The octet-vector representation is also used for generating and verifying signatures.

3.3.1 Vague objects

vague-object [Class]

The class of a vague object.

apply-class vague-object class [Function]

Recontextualize a vague object by “applying” a class to it, producing an object.

3.3.2 Character format

render value &optional stream [Function]

Render a value. If a stream is given, then write the value to that stream, else return the rendered value as a string.

render-object object &optional stream [Function]

Render an object. If a stream is given, then write the object to that stream, else return the rendered object as a string.

parse stream-or-string [Function]

Parse a value from a stream or a string.

parse-block string &key source name [Function]

Parse an object from a string. The returned vague object has the supplied source and name.

3.3.3 Binary format


byte ::= 00 ff
byte-count ::= byte



integer ::= byte-count byte*
length ::= integer



string ::= 01 length byte*
byte-vector ::= 02 length byte*
positive-integer ::= 03 integer
negative-integer ::= 04 integer
list ::= 05 length value*
reference ::= 06 length byte*
true ::= 07
false ::= 08
slot-unbound ::= 09



value ::= string byte-vector positive-integer
negative-integer list
reference true false
slot ::= value slot-unbound
slot-vector ::= length slot*



metadata-element ::= string value
metadata ::= length metadata-element*



slots ::= slot-vector
computed-slots ::= slot-vector
object ::= metadata slots computed-slots

Strings, byte vectors, slot vectors and references have as many bytes as the length represents, lists have as many values as the length represents, and metadata has as many elements as the length represents.

The integer 0 must be represented as a positive-integer, and integers must be represented with as few bytes as possible; for example, 12,345 must be represented as 03 02 30 39, and not 03 03 00 30 39 or with more zero bytes.

Figure 3.1: An Extended Backus-Naur form-like syntax description of the Netfarm object format


binary-render value &optional function [Function]

Render a value. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.

binary-render-object object &key function (emit-computed-values t) (emit-signatures t) [Function]

Render an object. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.

emit-computed-values and emit-signatures control if computed values and signatures will be rendered, respectively. The resulting vector will not be parseable if computed values and/or signatures are not rendered. These arguments are present to facilitate hashing and signing, and usually should be left to their defaults.

binary-parse function-or-vector [Function]

Parse a value from a function (called with no arguments, which returns octets) or a vector.

binary-parse-block function-or-vector &key source name [Function]

Parse an object from a function (as previously described for binary-parse) or a vector.

3.4 Cryptographic operations

3.4.1 Hashing

Netfarm uses cryptographic hashes to “name” objects, so that they can be referenced and retrieved by clients that do not already have them stored. Hashes are also used in the process of signing objects.

An object is hashed by rendering it in the binary format, and then hashing the resulting octet vector using the SHA-256 hash algorithm. Usually, computed values are not included in the render, as computed values must not affect object identity. Furthermore, when hashes are used in signing objects, signatures are not included, as they would need to be already present in the object to be signed, or break identity (by changing the resultant hash, or by allowing signatures to be added or removed to an object while preserving hash value).

hash-object object &key (emit-signatures t) [Function]

Produce a binary hash. Note that inbuilt objects cannot be hashed, and thus hashing them will signal an error. emit-signatures will control if signatures are included in the hash.

The most obvious inbuilt object that could not possibly be hashed is inbuilt@schema, which is its own schema. Of course, you could hash it by referencing its inbuilt name and not its hash, but the renderer would have to have a special case to use inbuilt objects’ names instead of the result of hash-object*, which felt worse than disallowing hashing inbuilt objects somehow.

hash-object* object [Function]

Produce a textual hash, equivalent to (bytes->base64 (hash-object object)) for non-inbuilt objects, and in the form inbuilt@name for inbuilt objects.

with-hash-cache () &body body [Macro]

Cache hashes produced with both hashing functions in the body.

Hashing can result in accidentally quadratic runtime2 when traversing graphs of objects and hashing them (such as saving a graph of objects with the Netfarm client). with-hash-cache will reduce hashing operations that have already been done (usually by recursive calls between hashing and the renderer) to hash-table lookups, reducing that kind of pattern to roughly linear runtime.


PIC

Figure 3.2: Hashing a graph of objects naively requires O(n2) hashes


Hashes used in signing are currently not cached (TODO: though they probably should be.)

If objects that have been hashed are modified, then the results of hash-object and hash-object* are undefined; but the dynamic extent of with-hash-cache should allow for fairly easy analysis of where hashes can be cached.

Object hashes are stored in a weak hash table, so they will be removed when the objects are garbage, and the garbage collector runs.

with-lexical-hash-cache () &body body [Macro]

Bind the hash caches in a way that allows for them to be closed over, and restored using the local macro with-restored-lexical-hash-cache. This is intended to be used to share cached hashes between threads; which takes some more deliberation to ensure correctness, but can be beneficial.

The Netfarm server uses a hash cache shared between all threads; and this is safe as objects are never mutated inside the server.

with-restored-lexical-hash-cache () &body body [Macro]

Restore the hash caches from with-lexical-hash-cache.

3.4.2 Signing

Chapter 4
Networking

Netfarm uses a small network protocol to exchange objects and updates to objects. This protocol is implemented on decentralise2, but it is not necessary at all for a Netfarm implementation to use any given networking protocol. The basic requirements of the network protocol are:

Any other extensions or mechanisms are not necessary to communicate Netfarm objects, but some may be very useful, such as permitting chunking; so that a user may begin using partially transmitted information before it has all been rendered and received.

4.1 The client

The netfarm-client system provides functions to read and write Netfarm objects from a decentralise client, as well as a client class which can search through a network for an object.

Continuing our car example, Adrian can create a client and use it to save his car:

Listing 4.1: Creating a client and saving an object using it
Adrian> (defvar *client* 
                (make-instance netfarm-client:client 
                               :bootstrap-uris ’("netfarm:a-server"))) 
*CLIENT* 
Adrian> (netfarm-client:save-object *client* *my-car*) 
"Lb6TI6j/GdkLsyCqt20xZhz7PohttpXejZTCKQEjT58="

save-object returns the hash of the object, which can be used by Bob to retrieve the object. Having to pass around a hash is not very convienent, and we can attach our car to some other object Bob may already know about, but we will begin by using this hash to retrieve the car.

Listing 4.2: Retrieving an object using its hash
Bob> (netfarm-client:find-object *client* 
      "Lb6TI6j/GdkLsyCqt20xZhz7PohttpXejZTCKQEjT58=") 
#<CAR> 
Bob> (describe *) 
#<CAR> 
  [standard-object] 
 
Slots with :INSTANCE allocation: 
  ... 
  COLOUR                         = "blue" 
  YEAR                           = 1952 
  MAKE                           = "Studebaker" 
  MODEL                          = "Starlight"

Chapter 5
Netfarm script machine

While it is a component that is packaged with the Netfarm object system, the Netfarm script machine is a sufficiently large and complex component that it deserves its own chapter. Scripts are portable programs that a Netfarm node can run to handle the behaviour of objects.

The script machine is an interpreter for a moderately-sized1 mostly-functional bytecode, which is modelled on the SECD machine, which may be familiar to functional language compiler writers. A simple assembler is provided with the reference Netfarm implementation, which can compile programs such as:

Listing 5.1: A recursive program that computes a Fibonacci number
(netfarm-scripts:define-script *fib* (1 2) 
  (:procedure 1) 
  (get-env 0 0) (get-value 1) < 
  (jump-cond 0 0 0 4) 
  (get-env 0 0) return 
  (get-env 0 0) (get-value 0) - (get-proc* 0) (call 1) 
  (get-env 0 0) (get-value 1) - (get-proc* 0) (call 1) 
  + return)

While this (symbolic) assembly code looks much different to a program written in a high-level language (and also unlike assembly code for real processors), it is similar to the bytecode generated by CLISP for Common Lisp programs, and CPython for Python programs.

Scripts are usually written to implement methods for objects, which describe the behaviour of an object. Some common methods include:

A method call may also materialise as a message, if there is no method available to handle a method call.

The actions (typically, side effects) which a script machine may perform are controlled by a capability list. Some instructions will only execute correctly when a capability has been provided to the script machine. These capabilities currently are:

Scripts running on networks only receive the :write-computed-values capability, as reading introduces non-determinism into the system, and there is no recommender to query. However, scripts running on, for example, a graphical environment used by the end-user to interact with objects, may have all three capabilities.

5.1 Using the script machine

setup-interpreter script arguments &optional capabilities [Function]

run-interpreter interpreter &key cons-limit cycle-limit print-cycles step [Function]

run-script script &rest arguments [Function]

TODO: Add a dynamic variable holding the capabilities the script machine should have.

run-script-machines object &key apply-side-effects recipient-filter [Function]

5.2 Presentations

TODO: This entire section.

Presentations are the other use of scripts in Netfarm. A presentation is a graphical representation of an object that retains its identity. A presentation model allows Netfarm to exchange machine-readable objects between clients, while also being able to present human-readable descriptions of objects. This model was introduced by [Cic84], and has been used in the Common Lisp Interface Manager to great effect.

Presentations are generated by presentation methods that are implemented using the script machine.

5.3 Side effects

An interpreter may produce side effects to interact with other objects. Currently, there is only one category of side effect: side effects which modify the computed value sets of an object.

The final state of an object is produced by applying all the applicable side effects. Depending on the nature of the implementation, it may or may not be appropriate to implement side effects as suggested.

The order of which side effects are applied does not affect the produced object. It is for this reason that Netfarm exhibits strong eventual consistency: how objects are synchronised does not affect the computation performed, and so all nodes storing an object will eventually converge on objects with the same values.

5.3.1 Computed values

We maintain a mapping of computed values to counts for every slot of every object. When all side effects have been considered, the table is traversed, and each computed value is added according to the count if it is not negative, and no computed values are added when a count is negative.

add-computed-value :target :name :value [Side effect]

Increment the count of the given computed value.

remove-computed-value :target :name :value [Side effect]

Decrement the count of the given computed value.

A related technique for implementing these effects, which is useful when saving effects to a disk, is to add each effect to a log, and then compute the computed values by applying side effects when loading an object.

It is trivial to show that computed values exhibit strong eventual consistency. For each relevant effect, we add 1 or add -1 to each counter, and addition of numbers is commutative, i.e. effects can be performed in any order to retrieve the same result. It is also possible to remove effects that cancel out, and to compress the result into one final state periodically. However, the objects-affecting graph would have to be maintained outside the effect log for it to still be correct.

5.4 The script machine

The script machine is a stack machine with a representation of lexical environments and automatic memory management. The major changes to the SECD machine are additional registers for convenience, and to facilitate method calls between multiple objects, and a more common bytecode-based instruction format, rather than an instruction format based on S-expressions.

5.4.1 Structures

The script machine has these registers:




Variable Name

Purpose




S Store

A stack containing intermediate data.

E Environment

The lexical environment, represented as a list of vector “frames”.

P Program counter

The position the interpreter is reading instructions from.

D Dump

A stack telling the interpreter where to return to.




G Globals

A vector of global variables.

C Control

The script the interpreter is current reading instructions from.

A Capabilities

Flags that allow the machine to perform operations that only make sense in some situations.

O Self

The current object, or the empty list, if we have not called a method.

O2 Sender

The last object, which was the value of O before calling a method.

F Side effects

Effects that the interpreter is going to make on the outside world.




A script has these values:




Variable Name

Purpose




P Program

An octet vector that encodes instructions to execute.

E Entry points

A vector of descriptions for procedures, which are combined with environments by some instructions to produce procedures.

M Methods

A vector of descriptions for methods.

G Initial globals

A vector of the initial values of global variables.




A procedure has these values:




Variable Name

Purpose




C Control

The script to execute instructions from.

E Environment

The environment to extend when called.

P0 Initial program counter

The program counter to start executing from.

A Argument count

The number of arguments that this procedure must be called with.

O Object

The object that the procedure belongs to.




5.4.2 Notation

We write the state of a running interpreter as the tuple S,E,P,D. We sometimes reference the registers G, C, A, O and Ooutside the tuple, because they are seldom used by instructions, and so they are just tedious to write most of the time.

We write the states of a halted interpreter as halt[S], should it halt correctly, or error[message, cause] should it fail. (Only the data stack is preserved when an interpreter stops.)

We write the effects of interpreting an instruction as

old state new state

Should an instruction concern itself with only the Store, we may write effects in the form

i1imo1on

(borrowing from Forth conventions) which can be translated to the other notation as

(imi1 . S),E,P,D⟩ → ⟨(ono1 . S),E,P + length,D

where length is the total length of the instruction. Note that the equivalent transition has the stack prefixes “reversed”; the last input is popped first, and the first output is pushed first. In Forth, we may write a program 3 2 - (1) and the stack before evaluating - as a list would be (2 3). To maintain a natural argument order for the programmer, the machine handles arguments “backwards” when described using lists.

To access a value of a structure without destructuring it, we use the notation SA to get the value of the variable (slot) A of S. We also use subscripts to retrieve values from lists and vectors; Sn denotes the nth value in S. The first element of a sequence is retrieved with a zero subscript, as with most programming languages; and retrieving an element that would be past the last element of a sequence causes interpretation to fail.

We create many procedures while running programs, so we write a procedure in the form proc[C,E,P0,A,O].

5.4.3 Execution

The machine first reads the instruction to execute, by reading the Pth octet in CP, and then incrementing P. The instruction arguments are then read the same way. Finally, a transition is chosen based on the state of the machine at this point. Unless the machine has transitioned to a halt state or unwound entirely to a error state, it then proceeds to execute another instruction.

5.4.4 Environment

getenv[(F._),variable,0] = Fvariable
getenv[(_.E),variable,frame] = getenv[E,variable,frame 1]

1: get-proc n [Instruction]

S,E,P,D⟩ → ⟨(proc[C,E,P0,A,O] . S),E,P + 2,D

where (P0 A) = (CE)n

This closes over the current environment, whereas get-proc* does not.

2: get-value n [Instruction]

S,E,P,D⟩ → ⟨(Gn . S),E,P + 2,D

3: set-value! n [Instruction]

(value.S),E,P,D⟩ → ⟨S,E,P + 2,D

where Gn value

4: get-env variable frame [Instruction]

S,E,P,D⟩ → ⟨S,E,P + 3,(value.D)

where value = getenv[E,variable,frame]

5: set-env! variable frame [Instruction]

(value . S),E,P,D⟩ → ⟨S,E,P + 3,D

where getenv[E,variable,frame] value

6: get-proc* n [Instruction]

S,E,P,D⟩ → ⟨(proc[C,(),P0,A,O] . S),E,P + 2,D

where (P0 A) = (CE)n

11: byte n [Instruction]

— n

5.4.5 Control flow

8: return [Instruction]

S,E,P,()⟩ → halt[S]

S,E,P,(normalframe[E,P,C,O].D)⟩ → ⟨S,E,P,D

where O O,C C

S,E,P,(methodframe[E,P,C,O,O2,F,S].D)⟩ → (("ok" . S) . S),E,P,D

where O O,O2 O2,F F

This transition handles half of the “isolation” of method calls, returning a value that is distinguishable from a value produced by propagating an error (with unwind).

9: call n [Instruction]

(proc[C,E,P0,count,O] AnA1 . S),E,P,D⟩ → S,((A1An) . E),P0,(normalframe[E,P + 2,C,O] . D)

when count = n, where O O,C C

Note that the return address must be placed at the instruction after the call instruction.

10: tail-call n [Instruction]

(proc[C,E,P0,count,O] AnA1 . S),E,P,D⟩ → S,((A1An) . E),P0,D

when count = n, where O O,C C

16: jump-cond thenhigh thenlow elsehigh elselow [Instruction]

(#f.S),E,P,D⟩ → ⟨S,E,P + 5 + else,D

where else = elsehigh × 256 + elselow

(v.S),E,P,D⟩ → ⟨S,E,P + 5 + then,D

where then = thenhigh × 256 + thenlow,v#f

Jumps are relative to the end of the instruction, so jumping 0 bytes basically does nothing.

17: jump high low [Instruction]

S,E,P,D⟩ → ⟨S,E,P + 3 + high × 256 + low,D

19: call-method n [Instruction]

S,E,P,(Name Object AnA1. D)⟩ → (),((Methods A1An)),P0,(methodframe[E,P,C,O,O2,F,S].D)

where primary[(... . Methods)] = methods[Object,Name] then O Object,O2 O

TODO: How to compute methods, and how to handle does-not-understand methods.

Every script of the class of an object contains a list of lists (Name P0 A), where Ais the number of “real” arguments. Methods have an implicit “next method list” argument. When calling a method, the next method list is the first argument, and every other method is shifted over one, so the first argument appears in the second position in the new frame.

The global variable vectors for each object, scriptpair must be the same. This ensures that one method on an object and one script can observe the changes made by another method on that object and that script.

It is usually a good idea to cache computation of methods; method caching in old Smalltalk was a fairly cheap way to get a reasonable performance gain, as approximately every method call requires dispatching. This is true to a lesser extent for Netfarm code, which has fewer, but still many, method calls.

24: error [Instruction]

(cause message.S),E,P,D⟩ → error[message, cause]

5.4.6 Forthisms

34: drop [Instruction]

x —

35: dup [Instruction]

x — x x

36: swap [Instruction]

x y — y x

5.4.7 Operators

64: cons [Instruction]

car cdr — (car . cdr)

cdr must be a list; we can’t represent improper lists in Netfarm, so we don’t give the script machine improper lists.

65: car [Instruction]

(car . _) — car

66: cdr [Instruction]

(_ . cdr) — cdr

67: null [Instruction]

() — #t

_ — #f

68: consp [Instruction]

(_ . _) — #t

_ — #f

69: append [Instruction]

a b — a++b

71: equal [Instruction]

a b — equal[a,b]

equal[ (aa . ad), (ba . bd)] = equal[aa,ba] equal[ad,bd]
equal[ (), ()] = #t
equal[ a Integer, b Integer] = a = b
equal[ a String, b String] = a = b
equal[ a Object, b Object] = hash[a] = hash[b]
equal[ proc[aa,ae,ac], proc[ba,be,bc]] = aa = ba ae = be ac = bc
equal[ _, _] = #f

72: string= [Instruction]

a b — equal[a,b]

when a String,b String

73: list n [Instruction]

e1en(e1en)

96: + [Instruction]

a b — a+b

97: - [Instruction]

a b — a-b

98: * [Instruction]

a b — a × b

99: / [Instruction]

a b — ab

The results of division are rounded to the closest integer below the actual value; i.e -5 2 / ⇒−3

100: abs [Instruction]

a — |a|

112: = [Instruction]

a b — a = b

when a Integer,b Integer

Chapter 6
The server

The Netfarm server is a decentralise2 system, which retrieves a partition of all the objects available to it, and makes them available to its connections.

In the introduction to decentralise2, we argued that the terms “client” and “server” do not have enough meaning to be useful when discussing Netfarm software; but we ended up calling this module a “server”.

netfarm-system [Class]

The class of a Netfarm server.

6.1 Database protocol

6.1.1 Implementations of the database protocol


PIC

Figure 6.1: A diagram of Netfarm server classes, including system and database mixin classes. Shaded classes are instantiable, and not shaded classes are not instantiable. Dashed classes are decentralise2 classes included for completion.


memory-system [Class]

A system that stores everything in memory. Compare this to decentralise2’s memory-database-mixin.

simple-memory-system [Class]

A system that stores everything in memory, using the simplest feasible implementations of the database protocol.

filesystem-mixin [Class]

A mixin that stores objects in a filesystem.

Objects are stored in a “trie” constructed of directories, to avoid slowdowns that occur on some file systems, should one directory contain many files. For each object, three files are stored:

Some files are also stored in the storage directory:

:directory [Initarg]

The directory that objects should be stored in.

caching-mixin [Class]

A mixin that caches some objects in memory, but defers to another database implementation when it does not have an object stored.

:cache-size [Initarg]

The maximum rendered size of objects that can be cached at once, in bytes. This defaults to 100 megabytes (108 bytes).

:fallback-vague-object-size [Initarg]

The size that vague objects should be assumed to take when rendered, should the real database not provide a size.

6.1.2 Vague object storage

put-vague-object system vague-object [Generic Function]

get-vague-object system name [Generic Function]

map-vague-objects function system [Generic Function]

vague-object-stored-p system name [Generic Function]

count-vague-objects system [Generic Function]

6.1.3 Presentable set

presentable-name-p system name [Accessor]

count-vague-objects system [Generic Function]

6.1.4 Side effects

apply-side-effect system name cause side-effect-type &key  [Generic Function]

map-computed-values-caused-by function cause-name system [Generic Function]

objects-affected-by system name [Generic Function]

Return a list of the object hashes affected by a given object.

objects-affecting system name [Generic Function]

Return a list of the object hashes which affect a given object.

objects-affecting-hash system name [Generic Function]

Compute a hash value for the objects-affecting list, by    ∏    a (mod 2256)
a∈affecting

This can be implemented incrementally, but the default method computes it from the entire objects-affecting list. (Recall multiplication and modular multiplication are commutative - a database could store the last hash, and then compute h′≡ h × a (mod 2256) when adding a.)

(We were initially going to reduce logxor over the set, but bitwise exclusive-or, among all additive groups, can be attacked by [Wag02]. It is unclear what such an attack could achieve, but we are going to avoid it by using a multiplicative group instead.)

6.1.5 Other interesting set

add-other-interesting-block system name [Generic Function]

remove-other-interesting-block system name [Generic Function]

other-interesting-block-p system name [Generic Function]

6.1.6 Dependency graph

A Netfarm server maintains a dependency graph, consisting of a set of dependency edges, which each store the name of a dependent (a string), the name of a dependency (another string), and a type (a keyword).

For the avoidance of doubt (because we frequently got confused by the names): when a relevant edge exists, storing a dependent object is prevented until a dependency object is retrieved.

All functions that take edge types except for add-dependency-edge also accept the type :all to retrieve all edges, regardless of type.

add-dependency-edge system type dependent dependency [Generic Function]

remove-dependents system dependency [Generic Function]

map-dependents function system type dependency [Generic Function]

map-dependencies function system type dependencies [Generic Function]

map-dependency-edges function system [Generic Function]

no-dependencies-p system type name [Generic Function]

dependency-list-mixin [Class]

A mixin that implements a dependency graph by storing a list of its edges. This mixin may be slower than dependency-table-mixin, but it is easier to read about, as it does not do any tricky indexing.

dependency-table-mixin [Class]

A mixin that implements a dependency graph by storing the edges in hash tables. This may look up dependencies and dependents more efficiently than dependency-table-mixin, especially with many dependencies to manage.

Appendix A
A description of a voting-web system

We have not thoroughly tested this voting-web system, but we should describe it, because it is a very useful but infrequently applied technique for moderating an online system. This system may be seen as an extension of many Usenet clients’ kill lists, which contain a set of patterns that match posts the user does not want to see; but it can also be used as a whitelist, or just to rank some objects in some order.

It is very important that a distributed system has a means of distributed “moderation”. We wonder why distributed systems are built without distributed moderation of some form, and we believe that using centralised moderation on such a system would be an equivalent of the “people’s stick” described in [Bak73]. Unlike some kinds of moderation in the real world, we are not required to give computers some kind of “objective” truth, and we do not attempt that with Netfarm. Freeing ourselves from that obligation allows us to approximate the desires of each user with greater precision.

This system allows a user to maintain a map of “facts” that they assert to be true or false. A presentation script may query the voting-web system while rendering an object, to decide what related objects it should render, or a client program may just refuse to render some presentation forms of an object that has certain facts about it asserted.

In the absence of a fact asserted by the user, we believe it is possible to determine the probability that the user would assert the fact, as a function of some delegates’ assertions. Suppose Pr(F is true to U) is the probability that a user U takes the fact F to be true, and that delegates(U) denotes the set of delegates of the user U. We can define the probability to be:

Pr(F is true to U) = 1 (if U asserts F to be true)
Pr(F is true to U) = 0 (if U asserts F to be false)
Pr(F is true to U) =     ∑      P r(F is true to D)correlation(U,D)
D ∈delegates(U)
------------∑----------------------------
                   |correlation(U,D )|
        D∈delegates(U )

We also introduced a function correlation(U,D) which estimates the correlation coefficient between two users, which we have appropriated from the memory-based collaborative filtering techniques used in recommender systems. This could be defined as:

correlation(U,D) =        ∑   support(U,f )support(D, f)
     F ∈facts
∘--∑---------------∘---∑-----------------
       support(U,F )2       support(D, F)2
 F ∈facts              F ∈facts
where support(U,F) = (|1    (if U asserts F to be true)
{
|(− 1  (if U asserts F to be false)
 0    (if U does not assert F )
and facts = the rules U asserts the rules D asserts

In the absence of a large number of common facts, setting the correlation coefficient to be a small constant may yield results that are not very good, but are better than not attempting to calculate a probability.

It may also be wise to bias the correlation and resultant probabilities towards zero, to produce less confident values for fewer common facts, and fewer delegates, respectively. This can be done by adding a small positive integer to the denominators of our definitions of correlation(U,D) and Pr(F is true to U).

Appendix B
A brief style guide

How we write Common Lisp code is relatively uninteresting. We mostly follow the SICL style guide (chapter 30 of http://metamodular.com/SICL/sicl-specification.pdf), with the exception of making package designators keywords. This is not set in stone, and using uninterned symbol syntax (eg #:decentralise-system) is also acceptable. Generally, we attempt to use one file per concept (such as “scheduling requests”, or “rendering an object to an octet vector”), and one package per module (such as “the client” or “the script machine”).

Most of our style is derived from the purposes we write code for, and this is reflected in the protocols we write. There are some principles behind how we decide on these protocols:

B.1 Abstraction

Abstraction is not the enemy! It is much easier to reason about an abstract protocol, than more concrete and lower-level constructs.

We are not the only users of any layer of our system, so a seemingly “over-abstracted” design may be beneficial for another user, when someone else attempts to use a layer for a different purpose. This is not something that can easily spiral out of control, contrary to popular belief. As the layers of our Netfarm implementation are completely separated, it is easy to observe what abstractions and generalisations must be provided to support the other layers of this system.

Performing thought experiments and attempting to design systems for other use cases may provide some insight into what features should be included. However, it is likely that armchair philosophy won’t provide details which would only be seen by testing such a system; so prototype it and watch what happens! If this is difficult to do, then we have failed further in designing an easily adaptable system. The contrapositive, however, is that designing an easily adaptable system makes such experiments easy to do. This further provides insights into how to design an easily adaptable system, and we benefit greatly from this feedback loop.

B.2 Feature “creep”

What is the smallest feature set you can use to implement as much as possible? Then, what do you need to add to have users want to attempt as much as possible?

We are not particuarly interested in keeping the system simple, because that tends to mean making stupid systems to many people. Overspecialising the system, by prescribing a use case to the system, such as presenting documents, or creating a chat application, is also not desirable. To repeat part of the introduction, the aim is, by “using schemas and their scripts, it [should be] possible to implement many different programs atop Netfarm.” Implementing a paradigm or program possible should be made possible at almost all costs, but making implementation of one program easier, at the cost of making other programs harder to implement, is never worth it.

If we need to remove a feature from a module, we must consider what a user of that feature could use instead, and test if the resulting designs are still as effective (or, in the case that the feature was of poor quality, more effective). If the programming style is too hampered by modifying a feature, it should not be modified, and an alternative should be found.

For example, we took efforts to understand and analyse the uses of the old client protocol before updating the protocol. The implementation of the protocol in decentralise2 and the one known user (the Netfarm client) were examined, and we compared the ergonomics and performance of either protocol. Constructs were introduced where we couldn’t easily express features of the old protocol in the new, such as introducing run-away to run an action “in the background”, which would occur by creating a request but not waiting for its value, and we also observed what code could be simplified by using features of the new protocol, such as being able to generate actions just like any other value, using loops and conditionals, as they are run independently of being created.

B.3 Portability

Ignoring all other connotations with the word modern, use any features that you would expect a modern programming language to provide.

There should be some concern towards portability, between implementations and programming languages. The system should not be based on “magic” constructs which can only be used in few implementations or programming languages.

However, it is not a goal to support implementations which are designed without useful constructs, such as automatic memory management and exception handling, which greatly decrease the work required to implement a system that is designed with them in mind. There are features in Netfarm which require these constructs, such as the script machine, which does not perform explicit memory management, and also has unwinding, which is easier to program with exceptions; and most of the object semantics demand being able to create multiple references to an object, with non-trivial lifetimes.

Anything which is almost universally considered easy to compile today could probably be expected. Counter-examples include call-with-current-continuation and fexprs, for which new compilation techniques are being produced, but the performance is rarely close to the performance of systems without these constructs. (Also consider that most programming language implementations are old, and if they implement these constructs, they may or may not be using the latest techniques.) Note that this is not related to wide-spread adoption, which is not strictly required. For example, many so-called object-oriented languages, such as C++ and Java, do not feature metaobject protocols, and a convenient way to create new classes at runtime. Implementing systems, such as the Common Lisp meta-object protocol, relatively efficiently has been understood almost since it was introduced. (Okay, compiling with call/cc has been understood for a long time, notably with Steele’s RABBIT Scheme compiler and Appel’s book Compiling with Continuations, but an efficient implementation of CLOS is not much slower than an efficient implementation of a worse object system, such as the Java object system.)

Appendix C
Index

Index

%run Generic Function, 1
:acceptors Initarg, 2
:cache-size Initarg, 3
:certificate Initarg, 4
:connection Initarg, 5
:connection-class Initarg, 6
:direction Initarg, 7
:directory Initarg, 8
:fallback-vague-object-size Initarg, 9
:host Initarg, 10
:id Initarg, 11
:key Initarg, 12
:message-handler Initarg, 13
:name Initarg, 14
:netfarm-name Initarg, 15
:parser Initarg, 16
:port Initarg, 17
:renderer Initarg, 18
:scheduler-concurrent-requests Initarg, 19
:scheduler-timeout Initarg, 20
:scripts Initarg, 21
:signatures Initarg, 22
:source Initarg, 23
:target Initarg, 24

accept-connection Generic Function, 25
accept-connection Method, 26
acceptor, 27
acceptor Protocol Class, 28
acceptor-loop Generic Function, 29
acceptor-loop Method, 30
action, 31
add-computed-value Side effect, 32
add-dependency-edge Generic Function, 33
add-other-interesting-block Generic Function, 34
announcement-control Message Type, 35
apply-class Function, 36
apply-side-effect Generic Function, 37
attempt Function, 38

binary-connection Protocol Class, 39
binary-parse Function, 40
binary-parse-block Function, 41
binary-render Function, 42
binary-render-object Function, 43
block, 44

caching-mixin Class, 45
capability list, 46
chain Macro, 47
character-connection Protocol Class, 48
client Protocol Class, 49
client-connection Reader, 50
computed value sets, 51
connection, 52
connection Protocol Class, 53
connection-client Class, 54
connection-message-handler Accessor, 55
count-vague-objects Generic Function, 56, 57

data->object Generic Function, 58
decentralise-kademlia:kademlia-client Class, 59
define-message-type Macro, 60
define-special-block Macro, 61
dependency edges, 62
dependency graph, 63
dependency-list-mixin Class, 64
dependency-table-mixin Class, 65
deserialize, 66

echo-system Class, 67
error-response Message Type, 68

filesystem-mixin Class, 69
finish Function, 70
fix-up-copied-object Generic Function, 71
functional-action Function, 72

get-block Generic Function, 73
get-block* Generic Function, 74
get-block-metadata Generic Function, 75
get-block-metadata Method, 76
get-blocks Message Type, 77
get-vague-object Generic Function, 78
give-system-connection Generic Function, 79

handle-message Generic Function, 80
handle-message Method, 81
hash-object Function, 82
hash-object* Function, 83
hashes, 84
hidden socket, 85

id Special block, 86
inbuilt@mirror Inbuilt object, 87
inbuilt@mirror-schema Inbuilt object, 88
inbuilt@schema Inbuilt object, 89
inbuilt@script Inbuilt object, 90
inbuilt@user Inbuilt object, 91
inbuilt@user-script Inbuilt object, 92
interesting set, 93
interesting-block-p Generic Function, 94
interesting-block-p Method, 95

leader loop, 96
leader thread, 97
leader-loop Generic Function, 98
leader-loop Method, 99
list Special block, 100
listener-loop Generic Function, 101
listener-loop Method, 102

maintenance threads, 103
make-hidden-socket Function, 104
map-blocks Generic Function, 105
map-computed-values-caused-by Generic Function, 106
map-dependencies Generic Function, 107
map-dependency-edges Generic Function, 108
map-dependents Generic Function, 109
map-vague-objects Generic Function, 110
memory-database-mixin Class, 111
memory-system Class, 112
message, 113, 114
message Function, 115
message handler, 116
message-case Macro, 117
metadata, 118

Netfarm object system, 119
Netfarm server, 120
netfarm-class Metaclass, 121
netfarm-class-scripts Reader, 122
netfarm-format-connection Class, 123
netfarm-system Class, 124
new-subscriptions Message Type, 125
no-dependencies-p Generic Function, 126
node-announcement Message Type, 127
nodes Special block, 128
not-found Condition, 129

object Class, 130
object->data Generic Function, 131
object-signatures Accessor, 132
object-source Accessor, 133
objects-affected-by Generic Function, 134
objects-affecting Generic Function, 135
objects-affecting-hash Generic Function, 136
ok-response Message Type, 137
other-interesting-block-p Generic Function, 138

parallel Function, 139
parse Function, 140
parse-block Function, 141
passing connection, 142
passing-connection Class, 143
passing-connection-direction Reader, 144
passing-connection-target Accessor, 145
presentable-name-p Accessor, 146
presentation, 147
presentation methods, 148
put-block Generic Function, 149
put-block Message Type, 150
put-block* Generic Function, 151
put-vague-object Generic Function, 152

read-message Generic Function, 153
remove-computed-value Side effect, 154
remove-dependents Generic Function, 155
remove-other-interesting-block Generic Function, 156
render Function, 157
render-object Function, 158
run Function, 159
run-away Function, 160
run-interpreter Function, 161
run-script Function, 162
run-script-machines Function, 163

script machine, 164
script Protocol Class, 165
Scripts, 166
SECD machine, 167
serialized, 168
set specifiers, 169
setup-interpreter Function, 170
side effects, 171
simple-memory-system Class, 172
slot-netfarm-name Reader, 173
slot-parser Reader, 174
slot-renderer Reader, 175
socket-acceptor Class, 176
source, 177
standard-system Class, 178
start-acceptor :Before Method, 179
start-acceptor Generic Function, 180
start-acceptor Method, 181
start-system Generic Function, 182
stop-acceptor :Before Method, 183
stop-acceptor Generic Function, 184
stop-acceptor Method, 185
stop-connection Generic Function, 186
stop-system Generic Function, 187
strong eventual consistency, 188
subscription Message Type, 189
system, 190
system Protocol Class, 191
system-id Generic Function, 192
system-stopping-p Generic Function, 193

then Function, 194
then* Function, 195
threaded-acceptor Protocol Class, 196
threaded-connection Protocol Class, 197
translation, 198

uninteresting set, 199
uninteresting-block-p Generic Function, 200
uninteresting-block-p Method, 201
update-system-for-new-interesting-block-predicate Generic Function, 202

vague object, 203
vague-object Class, 204
vague-object-stored-p Generic Function, 205

with-hash-cache Macro, 206
with-lexical-hash-cache Macro, 207
with-restored-lexical-hash-cache Macro, 208
write-message Generic Function, 209

Listings

Bibliography

[Bak73]   Mikhail Bakunin. Statism and anarchy, 1873.

[Cic84]    Eugene C. Ciccarelli. Presentation based user interface. https://dspace.mit.edu/handle/1721.1/6946, 1984.

[Eth19]   Ethereum developers. Sharding faq. https://github.com/ethereum/wiki/wiki/Sharding-FAQ, 2019.

[Rou17]   Indhi Rousseau. Mastodon instances. https://instances.social/list/advanced, 2017.

[Wag02]   David Wagner. A generalised birthday problem. https://www.iacr.org/archive/crypto2002/24420288/24420288.pdf, 2002.

1currently reachable on Instagram at https://www.instagram.com/luke_is_frog/

1I believe the Common Lisp babel library preserves code points, as the encoding and decoding functions are quite simple, but some guarantees on how code points are treated would be nice.

1We do need a more unique and less vague name for this...

1This may, for example, be useful to separate internal and external representations; it would be wise to represent a mapping of objects as a hash table in Lisp, but provide an association list for Netfarm to work with.

2We don’t want to end up on https://accidentallyquadratic.tumblr.com/!

1The instruction set is neither overly minimalist, as it provides instructions for list, number and string processing, nor does it provide any particular “shortcut” instructions, such as retrieving and then calling a procedure, like a real CISC processor may have.