The Netfarm Book

The Cooperative of pplied Language

October 25, 2020

Contents

 0.1 Introduction
  0.1.1 “Abstract”
  0.1.2 How Netfarm deviates from existing designs
  0.1.3 The structure of Netfarm
  0.1.4 Thanks
  0.1.5 What to do next
1 Conventions
 1.1 Block structure
 1.2 Set specifiers
 1.3 Strings
 1.4 Integers
2 decentralise2
 2.1 Systems
  2.1.1 Database protocol
  2.1.2 Synchronisation protocol
  2.1.3 Leader loop protocol
  2.1.4 Tuning the standard-system
 2.2 Acceptors
  2.2.1 Acceptor protocol
  2.2.2 Threaded acceptors
 2.3 Connections
  2.3.1 Connection types
  2.3.2 Threaded connections
  2.3.3 Netfarm wire format connection
  2.3.4 Netfarm binary connection
 2.4 Messages
  2.4.1 Built in message types
  2.4.2 Defining message types
  2.4.3 Translators
3 Netfarm
 3.1 Objects
  3.1.1 Slots
  3.1.2 Scripts
  3.1.3 Inheritance
 3.2 Inbuilt objects
 3.3 External formats
  3.3.1 Vague objects
  3.3.2 Character format
  3.3.3 Binary format
 3.4 Cryptographic operations
  3.4.1 Hashing
  3.4.2 Signing
 3.5 The client
4 Netfarm script machine
 4.1 Using the script machine
 4.2 Presentations
 4.3 Side effects
 4.4 The script machine
  4.4.1 Structures
  4.4.2 Notation
  4.4.3 Environment
  4.4.4 Control flow
  4.4.5 Forthisms
  4.4.6 Operators
5 The server
 5.1 Database protocol
  5.1.1 Implementations of the database protocol
  5.1.2 Vague object storage
  5.1.3 Presentable set
  5.1.4 Side effects
  5.1.5 Other interesting set
  5.1.6 Dependency graph
A A description of a voting-web system
B Index
 Listings
Bibliography

0.1 Introduction

0.1.1 “Abstract”

Netfarm is an attempt at creating a distributed, trustless object system. We develop Netfarm to fill a perceived void in the rapidly-expanding “decentralisation” bubble, to utilise the greater fault tolerance of a distributed hash table and simplify programming via a client language’s meta object protocol, and to devise less hierarchical moderation techniques that do not leave them at the whim of their servers’ operators and administrators.

The design of Netfarm is not ideal due to historical and planning issues; but it should serve well as an introduction to trustless object systems, which we believe can be easier to program and have better throughput than some of the models that are used frequently today, such as blockchains (demonstrated to be usable for programming in Ethereum) and various federated protocols (such as ActivityPub, notably used in Mastodon among other “fediverse” blogging servers).

To summarise the design, Netfarm implements a synchronous message-passing object system (like Smalltalk or Self) hosted on a distributed hash table. A Netfarm network can reach eventual consistency as state is either so unlikely to collide, such as objects which are stored by hashes, which are probably less likely to collide than for a computer to incorrectly compute a hash; or it is simple to unify, such as computed values. However, Netfarm does not decide on an ordering of computed values, and so has fairly weak consistency guarantees.

0.1.2 How Netfarm deviates from existing designs

Some of these points are made redundant by the preceding paragraphs, but it may be worthwhile to explain why we made those claims, and why they are desirable.

0.1.3 The structure of Netfarm

We see the Netfarm system as a synthesis of various components:

However, a user should be able to modify and use any component separately, and should have extensive control over many aspects of the system, to implement any changes while still acting as a “client” of the system, that does not modify the system itself.

0.1.4 Thanks

We would like to thank Luke Nuttall1 for providing a drawing of the Netfarm mascot Demeter.

We would also like to thank Robert Strandh for providing advice on optimising the Netfarm script machine and how Netfarm should interact with client code, and Gilbert Baumann for also suggesting optimisations for the script machine and several stylistic concerns, among other many other insightful discussions.

0.1.5 What to do next

The boundaries that separate a Netfarm implementation and its host language are not yet clear. Several questions remain, such as how much a Netfarm node must know about the data it is verifying, and what it is expected to do for its clients.

We forsee that a new distributed object system will have to be created, to overcome flaws that were introduced to Netfarm through its age and attempts to persue multiple goals. One such problem was that we chose the means of interaction with the network to be sending and receiving data, which creates an awkward split in mechanisms, between the creation of an object and objects sending messages to each other.

As such, we would like to remind the reader that they should take care to form their own opinions on how such a system should be structured from any problems they have noticed, and that Netfarm is not the end-all of distributed object systems. We would hope for something like Pierre-Joseph Proudhon’s dream in this quotation:

Je rêve une société où je serai guillotiné comme conservateur.

(I dream of a world in which I would be executed as a reactionary.)

We hope the approach we have developed for Netfarm, of identifying and constructing alternatives to many forms of centralisation, will be commonplace and we will be the ones developing centralised systems.

Chapter 1
Conventions

1.1 Block structure

A “block” in decentralise2 is represented in two ways, depending on how it is used:

multiple-value-call can be used to convert from the second representation to the first, but there are currently no reasons why that should be useful.

1.2 Set specifiers

The interesting set and uninteresting set for a decentralise2 system are represented as predicates, which usually should not change.

When the interesting set must be changed, the system is notified using a function which takes two set specifiers. A set specifier is either:

The latter is very useful when sets are almost “infinite”, such as in a distributed hash table where the change (from “rehashing” some types of distributed hash tables, or the operator deciding they can commit to storing more objects) can well exceed 2100, and is impractical to represent as a pair of lists.

1.3 Strings

Strings are always encoded using UTF-8 in the Netfarm systems. No buts, no ifs there. The sequence of code points used to form a string must be preserved in encoding and decoding.1

1.4 Integers

Integers are encoded in binary data in big endian format, i.e. such that the most significant octet (or other unit) is written first, and the least significant is written last. This format is most common in networking applications, and is the most convenient for users of human languages that write left-to-right to read.

Octet vectors are written in hexadecimal, with the first octet to be written or read appearing at the leftmost position in a vector, and the last at the right.

For example, 10 01 is the vector consisting of the octets 16 and 1, and can be interpreted as the integer 4097.

Chapter 2
decentralise2

The first component that was developed for Netfarm was the decentralise1 library, which allows a programmer to implement some kind of distributed network by writing the components that they are most interested in, and using provided implementations of the components they are not interested in writing.

A decentralise user may implement one or more of:

In a peer-to-peer context, servers and clients do not exist, as there is no distinction between them. This is still true for decentralise2, as servers do not distinguish between servers, clients, and any other program creating connections and sending and receiving messages. However, the client is written to retrieve specific objects for a user and provide a synchronous interface to them, and a system retrieves many objects asynchronously, and stores them in some kind of database; so it may be argued that there is something that looks like a client, and something that looks like a server.

On the other hand, a server also may handle accepting connections and parsing messages, and thus a system would not be a server, as it does not do those things; so we should stick with the term system to be precise in what a component does.

2.1 Systems

A system handles the requests sent to a node in a distributed system, and sends requests for blocks it has not yet retrieved to synchronise itself. A block is a tuple consisting of a name, version, channel set, and some data; of which the metadata can be used by a system to determine what it must retrieve and update.

system [Protocol Class]

The protocol class for a system. A system is maintained by some maintenance threads. One of which is the leader thread, which runs a leader loop that schedules requests for synchronisation and stops itself when the system is stopped.

TODO: The standard-system should track maintenance threads like it tracks the leader thread, and eventually, this should all be integrated into the bailout thread supervision library.

standard-system [Class]

A system that uses the database and synchronisation protocols to implement the basic behaviour of a node.

echo-system [Class]

A silly system that responds to any messages other than invalid syntax with the message sent.

:id [Initarg]

system-id system [Generic Function]

The ID of a system, as an integer in the range [0,2256)

:acceptors [Initarg]

The acceptors of a system.

2.1.1 Database protocol

The database protocol is part of the system protocol, because a system may process the data sent over the wire into some other format that should be stored and serialised by some other protocol. For example, the Netfarm system implements get-block in terms of netfarm-system:get-vague-object, and provides the rendered vague object and all the object names that affected it via computed values.

It is possible to separate the synchronisation and database protocol implementations, and combine them into a complete system implementation using multiple inheritance when using the reference implementation:

Listing 2.1: Using multiple inheritance to create a system
(defclass foobar-sql-database () 
  ((database-host :initarg :host 
                  :reader foobar-database-host) 
   ...)) 
 
(defclass my-system 
  (decentralise-system:standard-system 
  foobar-sql-database) 
  ()) 
 
(defvar *system* 
 (make-instance my-system 
  ; an initarg for FOOBAR-SQL-DATABASE 
  :host "sql-server.local" 
  ; an initarg for STANDARD-SYSTEM 
  :concurrent-requests 20))

memory-database-mixin [Class]

The simplest database implementation, which stores block data in a concurrent hash table.

not-found [Condition]

A condition type that is raised when get-block or get-block-metadata are called with the name of a block that is not stored by the system.

The following generic functions must be implemented by a database:

get-block system name [Generic Function]

Return block structure for the block named name from the system, or signal a condition of type not-found.

put-block system name version channels data [Generic Function]

Store a block with the provided block structure in the system, or signal a condition of type error.

map-blocks function system [Generic Function]

Call function repeatedly with block structure for the metadata of every object in the system.

get-block-metadata system name [Generic Function]

Return block structure for the metadata of the block named name from the system, or signal a condition of type not-found.

This generic function will be called for each block’s metadata sent to a system, so adding a method which does not have to load a block’s data is highly recommended. Otherwise, for the implementors’ convenience, the following method will be used:

get-block-metadata (system system) name [Method]

The default method for get-block-metadata calls get-block, and returns the metadata from the data it retrieved, but implementations which are even slightly concerned with efficiency should implement a method for get-block-metadata that does not have to read the data for a block.

2.1.2 Synchronisation protocol

When metadata about a block is received, it is categorised into either the interesting set or the uninteresting set, or discarded.

These two sets are represented by predicates, and changes between the sets can be represented in several ways (see Set specifiers for a reference of these set specifiers). The interesting and uninteresting set predicates must usually remain referentially transparent and not change, to allow the system to cache block metadata and schedule requests; but the sets can be modified when the system is notified, using update-system-for-new-interesting-block-predicate.

interesting-block-p system name version channels [Generic Function]

Determine if a block should be requested, based on its metadata.

interesting-block-p (system standard-system) name version channels [Method]

Everything is interesting by default.

uninteresting-block-p system name version channels [Generic Function]

Determine if a block’s metadata should be kept, in case it may become interesting later, due to update-system-for-new-interesting-block-predicate.

uninteresting-block-p (system standard-system) name version channels [Method]

Everything that has a newer version than the system stores is uninteresting to the system.

update-system-for-new-interesting-block-predicate system interesting-set uninteresting-set [Generic Function]

Update the system’s cache of the interesting set, adding every member of the interesting-set to the interesting set and removing every member of the uninteresting-set from it.

2.1.3 Leader loop protocol

The leader loop of a system performs tasks that are not easy to delegate to a connection, such as watching for requests that timed out, and replacing them with new requests.

leader-loop system [Generic Function]

Run the leader loop for the system. This is expected to return only when the system is stopped.

leader-loop (system standard-system) [Method]

The leader loop of a standard-system implements the description we gave of a leader loop.

start-system system [Generic Function]

Start a system, by starting all its acceptors and a leader thread.

stop-system system [Generic Function]

Stop a system by closing all its acceptors and connections, and advising maintenance threads to stop.

system-stopping-p system [Generic Function]

Should the threads maintaining the system stop now?

2.1.4 Tuning the standard-system

:scheduler-timeout [Initarg]

The time (in seconds) that the scheduler should wait for a request to be responded to. This defaults to 10 seconds.

:scheduler-concurrent-requests [Initarg]

The number of requests that the scheduler can wait for at a time. This defaults to 80 requests.

TODO: It may be useful to add a(n optional) limit on how many requests can be sent to one connection at a time.

2.2 Acceptors

An acceptor creates connections by accepting them from something that faces the outside world, such as a server socket, and gives them to a system.

acceptor [Protocol Class]

The protocol class for an acceptor.

threaded-acceptor [Protocol Class]

The protocol class for an acceptor that uses a thread to give a system connections.

2.2.1 Acceptor protocol

give-system-connection system connection [Generic Function]

Give a connection to the system the acceptor accepts connections for.

An acceptor may have to define :after methods on:

start-acceptor acceptor system [Generic Function]

Start the acceptor, allowing it to create connections on behalf of the system.

start-acceptor (acceptor threaded-acceptor) system [Method]

Start a thread that repeatedly accepts connections by calling acceptor-loop.

stop-acceptor acceptor [Generic Function]

stop-acceptor (acceptor acceptor) [Method]

2.2.2 Threaded acceptors

An acceptor must define a method on:

accept-connection (acceptor) [Generic Function]

Return a new connection that should be added to the acceptor’s system.

An acceptor should usually only have to define a primary method on:

acceptor-loop acceptor [Generic Function]

acceptor-loop (acceptor acceptor) [Method]

Forever accept connections and add them to a system.

Socket acceptor

socket-acceptor [Class]

An acceptor that accepts connections from a socket, also wrapping the connections in SSL.

:connection-class [Initarg]

The class to make instances of. This must be a subclass of either character-connection or binary-connection.

:host [Initarg]

:port [Initarg]

The hostname and port to bind to. These default to "0.0.0.0" and 1892, respectively.

:certificate [Initarg]

:key [Initarg]

Pathnames for the certificate and key files.

start-acceptor (acceptor socket-acceptor) system [:Before Method]

This method starts listening on the given hostname and port.

stop-acceptor (acceptor socket-acceptor) system [:Before Method]

This method stops listening on the given hostname and port.

accept-connection (acceptor socket-acceptor) system [Method]

A socket method accepts a connection by accepting a connection from the socket, and instantiating an instance of the acceptor’s connection class.

Connections are instantiated with initargs :socket and :stream, with values for the raw socket and the SSL-wrapped stream, respectively.

2.3 Connections

A connection connects a client and a server together, allowing the two to exchange messages between each other. decentralise2 is able to handle many different types of connections, including in-memory passing connections and socketed connections that use the operating system’s networking capabilities, which only have to implement a simple protocol.


PIC

Figure 2.1: A diagram of provided connection classes.


connection [Protocol Class]

Note that a connection is implicitly started when it is instantiated.

stop-connection connection [Generic Function]

Stop the connection, which should free any resources that creating the connection acquired, such as threads and sockets.

handle-message connection message [Generic Function]

Handle a message that was sent to the connection.

handle-message (connection connection) message [Method]

The default method calls the message handler of the connection with the message.

:message-handler [Initarg]

connection-message-handler connection [Accessor]

A function that should be called with each received message. The default handler will wait for a new handler to be set, in order to (hopefully) prevent some race conditions, where a message could be received before the client has set a message handler.

write-message connection message [Generic Function]

Write a message to a connection. The implementation of this must be thread safe, such that two threads can call write-message simultaneously, and will correctly write both messages.

2.3.1 Connection types

The types of data that a connection can send are represented by a connection’s class, not unlike the stream classes such as fundamental-character-stream and fundamental-binary-stream in the de-facto Gray streams standard.

Note that, unlike Gray streams, all connections are bidirectional.

character-connection [Protocol Class]

A connection that can send blocks with vectors of characters (strings) as data.

binary-connection [Protocol Class]

A connection that can send blocks with vectors of octets as data.

2.3.2 Threaded connections

threaded-connection [Protocol Class]

A connection that uses a thread to read messages.

read-message connection [Generic Function]

Read a message from the connection, blocking until one is present, then returning it and T, or return something and NIL if a message cannot be read ever again (e.g the other node closed the connection).

listener-loop connection [Generic Function]

The thread of the connection will call this function, and stop when it returns.

listener-loop (connection threaded-connection) [Method]

Repeatedly calls read-message, handling its return values appropriately.

passing-connection [Class]

A connection that passes its messages to another connection.

:target [Initarg]

The connection that the passing connection should pass its messages to.

2.3.3 Netfarm wire format connection

netfarm-format-connection [Class]

A character connection that exchanges messages using a simple textual format, based on the wire protocol Nettle used.

As the name suggests, this was intended to be the default connection class for Netfarm; but the creation of Netfarm’s much more efficient binary format suggests that we should use a binary connection class as default.

Nonetheless, it is quite easy to read, and superficially looks like the HTTP/1.0 wire protocol. Messages begin with unique “verbs”, and are separated by new lines.


boolean ::= yes no
integer ::= (0 9)*
length-prefixed-string ::= integer : character*



block-name ::= length-prefixed-string
channel-name ::= length-prefixed-string
id ::= length-prefixed-string
line-count ::= integer
reason ::= length-prefixed-string
uri ::= length-prefixed-string
version ::= integer



announce ::= announce boolean
block-header ::= block block-name version line-count channel-name*
error ::= error block-name reason
get ::= get block-name*
ok ::= ok block-name
node ::= node uri id
subscribe ::= subscribe channel-name*
subscription ::= subscription block-name version channel-name*

Length prefixed strings are preceded by the length of the string (written in base 10), and a colon.
Figure 2.2: An Extended Backus-Naur form-like syntax description of the Netfarm wire format


The base decentralise2 messages are represented as follows:

2.3.4 Netfarm binary connection


boolean ::= 00 01
byte ::= 00 ff
long-integer ::= short-integer × 8
short-integer ::= byte
long-bytes ::= long-integer byte*
short-bytes ::= short-integer byte*
long-string ::= long-bytes
short-string ::= short-bytes
name-list ::= long-integer short-string*



block-name ::= short-string
block-list ::= name-list
channel-list ::= name-list
version ::= long-integer
metadata ::= block-name long-integer channel-list
uri ::= short-string
id ::= short-string



get ::= 01 block-list
character-block ::= 02 metadata long-string
binary-block ::= 03 metadata long-bytes
ok ::= 04 block-name
error ::= 05 block-name long-string
subscribe ::= 06 block-list
subscription ::= 07 metadata
allow-announcement ::= 08 boolean
announce ::= 09 short-string short-string

Byte vectors are preceded by the length of the vector. Strings are byte vectors, that are decoded and encoded as character vectors. Name lists are lists of names, that are preceded by the length of the list.
Figure 2.3: An Extended Backus-Naur form-like syntax description of the Netfarm binary wire format


2.4 Messages

A message is an abstract object that is sent over a connection, and can be considered the lowest-level communication protocol decentralise2 is concerned with.

message-case message &body case* [Macro]

case ::= ( pattern form* )
pattern ::= ( keyword variable* )
dont-care ::= - _ NIL
variable ::= dont-care symbol

Match the value of message against some patterns, which look like the arguments to the function message, and evaluate the forms of the first matching case, with all variables that don’t match the rule dont-care bound to the arguments.

message keyword &rest values* [Function]

Create a message from the type designated by the keyword with the given values. The number of values must be the same as the number of accessors of the type.

2.4.1 Built in message types

get-blocks (:get names) [Message Type]

put-block (:block name version channels data) [Message Type]

ok-response (:ok name) [Message Type]

error-response (:error name reason) [Message Type]

new-subscriptions (:subscription name version channels) [Message Type]

subscription (:subscribe channels) [Message Type]

announcement-control (:allow-announcement allow?) [Message Type]

node-announcement (:announce uri id) [Message Type]

2.4.2 Defining message types

define-message-type keyword type-specifier constructor-name &rest accessors [Macro]

Define a message type named by the given keyword, which corresponds to the given Common Lisp type specifier, can be constructed by calling the function named by the constructor-name, and can be destructured using all the functions named in accessors.

2.4.3 Translators

object->data object target source [Generic Function]

“Serialize” or “render” an object (which came from, or represents an instance of the source), into a value that can be sent as data on the target.

data->object data source target [Generic Function]

“Deserialize” or “parse” the data (which represents an instance of, or something the target can consume), into an object, which was received from the source.

Chapter 3
Netfarm

The second component that was developed was the Netfarm object system. The Netfarm object system introduces a text format for representing an object, as well as code to serialize and deserialize objects.

The part of the Netfarm system that implements an object system is also named Netfarm. It may be helpful to describe this component as the “Netfarm object system”, and the entire system as the “Netfarm software stack” or words to those meanings.

As an example of how Netfarm enables its users to exchange objects with little friction, we will imagine that there are two car enthusiasts, Adrian and Bob, who want to discuss their cars. Here is a class definition for a car that they could have used in a client program:

Listing 3.1: A class definition for a car
(defclass car () 
  ((colour :initarg :colour :reader car-colour) 
   (year   :initarg :year   :reader car-year) 
   (make   :initarg :make   :reader car-make) 
   (model  :initarg :model  :reader car-model)) 
  (:metaclass netfarm:netfarm-class))

This class functions identically to any other class defined using the Common Lisp Object System:

Listing 3.2: Instantiating and describing an instance of a Netfarm class
Adrian> (make-instance car :year 1952 :make "Studebaker" 
                            :model "Starlight" :colour "blue") 
#<CAR> 
Adrian> (defvar *my-car* *) 
*MY-CAR* 
Adrian> (car-model *my-car*) 
"Starlight"

We will discuss how Adrian can send that description of a car to Bob later.

3.1 Objects

The Netfarm object system has objects, which are instances of schemas, which are themselves objects.

This system maps well to the Common Lisp object system, and we can describe schemas using classes that are instances of a special metaclass.

netfarm-class [Metaclass]

The class of a Netfarm class.

object [Class]

The class of a Netfarm object.

A Netfarm object also has a list of signatures, and a source function, that is called to retrieve referenced objects lazily.

:signatures [Initarg]

object-signatures object [Accessor]

The signatures of an object, stored as an association list of user object-signature data pairs.

:source [Initarg]

object-source object [Accessor]

A function that takes an object name, and returns the object with that object name. This function typically requests the object from a client.

3.1.1 Slots

Much like the Common Lisp meta-object protocol, Netfarm objects are represented as vectors of slots; an object is comprised of a vector of “normal” slots, and a vector of computed slots.

Each Netfarm slot maps to a Common Lisp slot, but some Common Lisp slots don’t map to Netfarm slots1.

To avoid some potential non-determinism in how slot lists are ordered when read by Netfarm, slots are sorted by string< on their names in slot vectors.

:netfarm-name [Initarg]

slot-netfarm-name slot-definition [Reader]

The name of a slot, a string or nil if the slot should not be present in the Netfarm object. This defaults to the downcased name of the slot-definition-name of the slot.

Multiple slots of a class cannot have the same name.

3.1.2 Scripts

Three types of scripts describe the behaviour of instances of a schema:

3.1.3 Inheritance

Slots are inherited as usual for Common Lisp.

The presentation script, if not provided, is inherited from the first superclass of the class that has a presentation script. The message script, if not provided, is inherited from the first superclass of the class that has a message script.

The initialization scripts of a class are the union of the direct initialization scripts of a class, and the initialization scripts of the superclasses of the class, with duplicates removed.

3.2 Inbuilt objects

inbuilt@schema[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

()

documentation

To be defined by the implementor.

scripts

()

slots

(("computed-slots") ("documentation") ("scripts") ("slots"))



inbuilt@user[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

(("data"))

documentation

To be defined by the implementor.

scripts

(ref:inbuilt@user-script)

slots

(("ecdh-key") ("sign-key"))



inbuilt@script[Inbuilt instance of inbuilt@schema]



Slot name

Value



computed-slots

()

documentation

To be defined by the implementor.

scripts

()

slots

(("entry-points") ("methods") ("program") ("variables"))



inbuilt@user-script[Inbuilt instance of inbuilt@script]



Slot name

Value



entry-points

()

methods

(("add-datum" 2 0))

program

#(7 73 1 12 134 71 16 0 1 0 0 8 2 0 4 1 0 130 8)

variables

("data")



3.3 External formats

Objects are usually serialized in order to transmit them to another client that does not share memory with the origin of the object, and that client must then deserialize the serialized form to produce a usable object.

In Netfarm, serialization is done by converting an object to a vague object, which has as much context of the object removed as possible, which can then be serialized into a character or octet vector. These vectors can then be deserialized by deserializing the vector into a vague object, and then re-introducing the context that the origin of the object supplied.

The octet-vector representation is also used for generating and verifying signatures.

3.3.1 Vague objects

vague-object [Class]

The class of a vague object.

apply-class vague-object class [Function]

Recontextualize a vague object by “applying” a class to it, producing an object.

3.3.2 Character format

render value &optional stream [Function]

Render a value. If a stream is given, then write the value to that stream, else return the rendered value as a string.

render-object object &optional stream [Function]

Render an object. If a stream is given, then write the object to that stream, else return the rendered object as a string.

parse stream-or-string [Function]

Parse a value from a stream or a string.

parse-block string &key source name [Function]

Parse an object from a string. The returned vague object has the supplied source and name.

3.3.3 Binary format


byte ::= 00 ff
byte-count ::= byte



integer ::= byte-count byte*
length ::= integer



string ::= 01 length byte*
byte-vector ::= 02 length byte*
positive-integer ::= 03 integer
negative-integer ::= 04 integer
list ::= 05 length value*
reference ::= 06 length byte*
true ::= 07
false ::= 08
slot-unbound ::= 09



value ::= string byte-vector positive-integer
negative-integer list
reference true false
slot ::= value slot-unbound
slot-vector ::= length slot*



metadata-element ::= string value
metadata ::= length metadata-element*



slots ::= slot-vector
computed-slots ::= slot-vector
object ::= metadata slots computed-slots

Strings, byte vectors, slot vectors and references have as many bytes as the length represents, lists have as many values as the length represents, and metadata has as many elements as the length represents.

The integer 0 must be represented as a positive-integer, and integers must be represented with as few bytes as possible; for example, 12,345 must be represented as 03 02 30 39, and not 03 03 00 30 39 or with more zero bytes.

Figure 3.1: An Extended Backus-Naur form-like syntax description of the Netfarm object format


binary-render value &optional function [Function]

Render a value. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.

binary-render-object object &key function (emit-computed-values t) (emit-signatures t) [Function]

Render an object. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.

emit-computed-values and emit-signatures control if computed values and signatures will be rendered, respectively. The resulting vector will not be parseable if computed values and/or signatures are not rendered. These arguments are present to facilitate hashing and signing, and usually should be left to their defaults.

binary-parse function-or-vector [Function]

Parse a value from a function (called with no arguments, which returns octets) or a vector.

binary-parse-block function-or-vector &key source name [Function]

Parse an object from a function (as previously described for binary-parse) or a vector.

3.4 Cryptographic operations

3.4.1 Hashing

Netfarm uses cryptographic hashes to “name” objects, so that they can be referenced and retrieved by clients that do not already have them stored. Hashes are also used in the process of signing objects.

An object is hashed by rendering it in the binary format, and then hashing the resulting octet vector using the SHA-256 hash algorithm. Usually, computed values are not included in the render, as computed values must not affect object identity. Furthermore, when hashes are used in signing objects, signatures are not included, as they would need to be already present in the object to be signed, or break identity (by changing the resultant hash, or by allowing signatures to be added or removed to an object while preserving hash value).

hash-object object &key (emit-signatures t) [Function]

Produce a binary hash. Note that inbuilt objects cannot be hashed, and thus hashing them will signal an error. emit-signatures will control if signatures are included in the hash.

The most obvious inbuilt object that could not possibly be hashed is inbuilt@schema, which is its own schema. Of course, you could hash it by referencing its inbuilt name and not its hash, but the renderer would have to have a special case to use inbuilt objects’ names instead of the result of hash-object*, which felt worse than disallowing hashing inbuilt objects somehow.

hash-object* object [Function]

Produce a textual hash, equivalent to (bytes->base64 (hash-object object)) for non-inbuilt objects, and in the form inbuilt@name for inbuilt objects.

with-hash-cache () &body body [Macro]

Cache hashes produced with both hashing functions in the body.

Hashing can result in accidentally quadratic runtime2 when traversing graphs of objects and hashing them (such as saving a graph of objects with the Netfarm client). with-hash-cache will reduce hashing operations that have already been done (usually by recursive calls between hashing and the renderer) to hash-table lookups, reducing that kind of pattern to roughly linear runtime.


PIC

Figure 3.2: Hashing a graph of objects naively requires O(n2) hashes


Hashes used in signing are currently not hashed (TODO: though they probably should be.)

If objects that have been hashed are modified, then the results of hash-object and hash-object* are undefined; but the dynamic extent of with-hash-cache should allow for fairly easy analysis of where hashes can be cached.

Object hashes are stored in a weak hash table, so they will be removed when the objects are garbage, and the garbage collector runs.

with-lexical-hash-cache () &body body [Macro]

Bind the hash caches in a way that allows for them to be closed over, and restored using the local macro with-restored-lexical-hash-cache. This is intended to be used to share cached hashes between threads; which takes some more deliberation to ensure correctness, but can be beneficial.

The Netfarm server uses a hash cache shared between all threads; and this is safe as objects are never mutated inside the server.

with-restored-lexical-hash-cache () &body body [Macro]

Restore the hash caches from with-lexical-hash-cache.

3.4.2 Signing

3.5 The client

The netfarm-client system provides functions to read and write Netfarm objects from a decentralise client, as well as a client class which can search through a network for an object.

Continuing our car example, Adrian can create a client and use it to save his car:

Listing 3.3: Creating a client and saving an object using it
Adrian> (defvar *client* 
                (make-instance netfarm-client:client 
                               :bootstrap-uris ’("netfarm:a-server"))) 
*CLIENT* 
Adrian> (netfarm-client:save-object *client* *my-car*) 
"Lb6TI6j/GdkLsyCqt20xZhz7PohttpXejZTCKQEjT58="

save-object returns the hash of the object, which can be used by Bob to retrieve the object. Having to pass around a hash is not very convienent, and we can attach our car to some other object Bob may already know about, but we will begin by using this hash to retrieve the car.

Listing 3.4: Retrieving an object using its hash
Bob> (netfarm-client:find-object *client* 
      "Lb6TI6j/GdkLsyCqt20xZhz7PohttpXejZTCKQEjT58=") 
#<CAR> 
Bob> (describe *) 
#<CAR> 
  [standard-object] 
 
Slots with :INSTANCE allocation: 
  ... 
  COLOUR                         = "blue" 
  YEAR                           = 1952 
  MAKE                           = "Studebaker" 
  MODEL                          = "Starlight"

Chapter 4
Netfarm script machine

While it is a component that is packaged with the Netfarm object system, the Netfarm script machine is a sufficiently large and complex component that it deserves its own chapter. Scripts are portable programs that a Netfarm node can run to handle the behaviour of objects.

The script machine is based off the abstract SECD machine, which may be familiar to functional language compiler writers. The machine has several registers:

A simple assembler is provided with the reference Netfarm implementation, which can compile programs such as:

Listing 4.1: A recursive program that computes a Fibonacci number
(netfarm-scripts:define-script *fib* (1 2) 
  (:procedure 1) 
  (get-env 0 0) (get-value 1) < 
  (jump-cond 0 0 0 4) 
  (get-env 0 0) return 
  (get-env 0 0) (get-value 0) - (get-proc* 0) (call 1) 
  (get-env 0 0) (get-value 1) - (get-proc* 0) (call 1) 
  + return)

While this (symbolic) assembly code looks much different to a program written in a high-level language (and also unlike assembly code for real processors), it is similar to the bytecode generated by CLISP for Common Lisp programs, and CPython for Python programs.

This example program doesn’t really have a purpose in a Netfarm system. Scripts in Netfarm have three purposes (currently):

4.1 Using the script machine

setup-interpreter script arguments &optional capabilities [Function]

run-interpreter interpreter &key cons-limit cycle-limit print-cycles step [Function]

run-script script &rest arguments [Function]

TODO: Add a dynamic variable holding the capabilities the script machine should have.

run-script-machines object &key apply-side-effects recipient-filter [Function]

4.2 Presentations

TODO: This entire section.

Presentations are the other use of scripts in Netfarm. A presentation is a graphical representation of an object that retains its identity. A presentation model allows Netfarm to exchange machine-readable objects between clients, while also being able to present human-readable descriptions of objects. This model was introduced by [Cic84], and has been used in the Common Lisp Interface Manager to great effect.

Presentations are generated by presentation methods that are implemented using the script machine.

4.3 Side effects

4.4 The script machine

The script machine is a small stack machine, based off the SECD machine, with additional registers for convenience and to facilitate calling between multiple scripts, and a more common bytecode-based instruction format (rather than S-expression based).

4.4.1 Structures

The script machine has these registers:




Variable Name

Purpose




S Store

A stack containing intermediate data.

E Environment

The lexical environment, represented as a list of vector “frames”.

P Program counter

The position the interpreter is reading instructions from.

D Dump

A stack telling the interpreter where to return to.




G Globals

A vector of global variables.

C Control

The script the interpreter is current reading instructions from.

A Capabilities

Flags that allow the machine to perform operations that only make sense in some situations.

O Self

The current object, or the empty list, if we have not called a method.

O Sender

The last object, which was the value of O before calling a method.




A script has these values:




Variable Name

Purpose




P Program

An octet vector that encodes instructions to execute.

E Entry points

A vector of descriptions for procedures, which are combined with environments by some instructions to produce procedures.

M Methods

A vector of descriptions for methods.

G Initial globals

A vector of the initial values of global variables.




4.4.2 Notation

We write the state of a running interpreter as the tuple S,E,P,D. We sometimes reference the registers G, C, A, O and Ooutside the tuple, because they are seldom used by instructions, and so they are just tedious to write most of the time.

We write the states of a halted interpreter as halt[S], should it halt correctly, or error[message, cause] should it fail. (Only the data stack is preserved when an interpreter stops.)

We write the effects of interpreting an instruction as

old state new state

Should an instruction concern itself with only the Store, we may write effects in the form

i1imo1on

(borrowing from Forth conventions) which can be translated to the other notation as

(imi1 . S),E,P,D⟩ → ⟨(ono1 . S),E,P + length,D

where length is the total length of the instruction. Note that the equivalent transition has the stack prefixes “reversed”; the last input is popped first, and the first output is pushed first. In Forth, we may write a program 3 2 - (1) and the stack before evaluating - as a list would be (2 3). To maintain a natural argument order for the programmer, the machine handles arguments “backwards” when described using lists.

To access a value of a structure without destructuring it, we use the notation SA to get the value of the variable (slot) A of S. We also use subscripts to retrieve values from lists and vectors; Sn denotes the nth value in S. The first element of a sequence is retrieved with a zero subscript, as with most programming languages; and retrieving an element that would be past the last element of a sequence causes interpretation to fail.

4.4.3 Environment

getenv[(F._),variable,0] = Fvariable
getenv[(_.E),variable,frame] = getenv[E,variable,frame 1]

1: get-proc n [Instruction]

S,E,P,D⟩ → ⟨(proc[P,E,P0,A] . S),E,P + 2,D

where (P0 A) = (CE)n

This closes over the current environment, whereas get-proc* does not.

2: get-value n [Instruction]

S,E,P,D⟩ → ⟨(Gn . S),E,P + 2,D

3: set-value! n [Instruction]

(value.S),E,P,D⟩ → ⟨S,E,P + 2,D

where Gn value

4: get-env variable frame [Instruction]

S,E,P,D⟩ → ⟨S,E,P + 3,(value.D)

where value = getenv[E,variable,frame]

5: set-env! variable frame [Instruction]

(value . S),E,P,D⟩ → ⟨S,E,P + 3,D

where getenv[E,variable,frame] value

6: get-proc* n [Instruction]

S,E,P,D⟩ → ⟨(proc[P,(),P0,A] . S),E,P + 2,D

where (P0 A) = (CE)n

11: byte n [Instruction]

— n

4.4.4 Control flow

8: return [Instruction]

S,E,P,()⟩ → halt[S]

S,E,P,((E.C).D)⟩ → ⟨S,E,C,D

9: call n [Instruction]

(proc[C,E,count] AnA1 . S),E,P,D⟩ → S,((A1An) . E),C,((E . P + 2) . D)

when count = n

Note that the return address must be placed at the instruction after the call instruction.

10: tail-call n [Instruction]

(proc[C,E] AnA1 . S),E,P,D⟩ → ⟨S,((A1An) . E),C,D

16: jump-cond thenhigh thenlow elsehigh elselow [Instruction]

(#f.S),E,P,D⟩ → ⟨S,E,P + 5 + else,D

where else = elsehigh × 256 + elselow

(v.S),E,P,D⟩ → ⟨S,E,P + 5 + then,D

where then = thenhigh × 256 + thenlow,v#f

Jumps are relative to the end of the instruction, so jumping 0 bytes basically does nothing.

17: jump high low [Instruction]

S,E,P,D⟩ → ⟨S,E,P + 3 + high × 256 + low,D

24: error [Instruction]

(cause message.S),E,P,D⟩ → error[message, cause]

4.4.5 Forthisms

34: drop [Instruction]

x —

35: dup [Instruction]

x — x x

36: swap [Instruction]

x y — y x

4.4.6 Operators

64: cons [Instruction]

car cdr — (car . cdr)

cdr must be a list; we can’t represent improper lists in Netfarm, so we don’t give the script machine improper lists.

65: car [Instruction]

(car . _) — car

66: cdr [Instruction]

(_ . cdr) — cdr

67: null [Instruction]

() — #t

_ — #f

68: consp [Instruction]

(_ . _) — #t

_ — #f

69: append [Instruction]

a b — a++b

71: equal [Instruction]

a b — equal[a,b]

equal[(aa . ad),(ba . bd)] = equal[aa,ba] equal[ad,bd]
equal[(),()] = #t
equal[a Integer,b Integer] = a = b
equal[a String,b String] = a = b
equal[a Object,b Object] = hash[a] = hash[b]
equal[proc[aa,ae,ac],proc[ba,be,bc]] = aa = ba ae = be ac = bc
equal[_, _] = #f

72: string= [Instruction]

a b — equal[a,b]

when a String,b String

73: list n [Instruction]

e1en(e1en)

96: + [Instruction]

a b — a+b

97: - [Instruction]

a b — a-b

98: * [Instruction]

a b — a × b

99: / [Instruction]

a b — ab

The results of division are rounded to the closest integer below the actual value; i.e -5 2 / ⇒−3

100: abs [Instruction]

a — |a|

112: = [Instruction]

a b — a = b

when a Integer,b Integer

Chapter 5
The server

The Netfarm server is a decentralise2 system, which retrieves a partition of all the objects available to it, and makes them available to its connections.

In the introduction to decentralise2, we argued that the terms “client” and “server” do not have enough meaning to be useful when discussing Netfarm software; but we ended up calling this module a “server”.

netfarm-system [Class]

The class of a Netfarm server.

5.1 Database protocol

5.1.1 Implementations of the database protocol

memory-system [Class]

A system that stores everything in memory. Compare this to decentralise2’s memory-database-mixin.

filesystem-mixin [Class]

A mixin that stores objects in a filesystem.

Objects are stored in a “trie” constructed of directories, to avoid slowdowns that occur on some file systems, should one directory contain many files. For each object, three files are stored:

Some files are also stored in the storage directory:

:directory [Initarg]

The directory that objects should be stored in.

caching-mixin [Class]

A mixin that caches some objects in memory, but defers to another database implementation when it does not have an object stored.

TODO: Somehow, including caching makes the system slower. There already is a cache of real objects (rather than vague objects) in the system (in the form of a weak hash table, and thus gets GCed and doesn’t hold objects that aren’t live somewhere else for long), but another cache does significantly reduce the reads in some tests. Perhaps these tests are too write-heavy, and the cache isn’t really able to do much, or perhaps we’re doing something wrong with cacle.

:cache-size [Initarg]

The maximum rendered size of objects that can be cached at once, in bytes. This defaults to 100 megabytes (108 bytes).

:fallback-vague-object-size [Initarg]

The size that vague objects should be assumed to take when rendered, should the real database not provide a size.

5.1.2 Vague object storage

put-vague-object system vague-object [Generic Function]

get-vague-object system name [Generic Function]

map-vague-objects function system [Generic Function]

vague-object-stored-p system name [Generic Function]

count-vague-objects system [Generic Function]

5.1.3 Presentable set

presentable-name-p system name [Accessor]

count-vague-objects system [Generic Function]

5.1.4 Side effects

apply-side-effect system name cause side-effect-type &key  [Generic Function]

map-computed-values-caused-by function cause-name system [Generic Function]

objects-affected-by system name [Generic Function]

5.1.5 Other interesting set

add-other-interesting-block system name [Generic Function]

remove-other-interesting-block system name [Generic Function]

other-interesting-block-p system name [Generic Function]

5.1.6 Dependency graph

add-dependency-vertex system dependent dependency [Generic Function]

remove-dependents system dependency [Generic Function]

map-dependents function system dependency [Generic Function]

map-dependencies function system dependencies [Generic Function]

no-dependencies-p system name [Generic Function]

Appendix A
A description of a voting-web system

We have not thoroughly tested this voting-web system, but we should describe it, because it is a very useful but infrequently applied technique for moderating an online system. This system may be seen as an extension of many Usenet clients’ kill lists, which contain a set of patterns that match posts the user does not want to see; but it can also be used as a whitelist, or just to rank some objects in some order.

It is very important that a distributed system has a means of distributed “moderation”. We wonder why distributed systems are built without distributed moderation of some form, and we believe that using centralised moderation on such a system would be an equivalent of the “people’s stick” described in [Bak73]. Unlike some kinds of moderation in the real world, we are not required to give computers some kind of “objective” truth, and we do not attempt that with Netfarm. Freeing ourselves from that obligation allows us to approximate the desires of each user with greater precision.

This system allows a user to maintain a map of “facts” that they assert to be true or false. A presentation script may query the voting-web system while rendering an object, to decide what related objects it should render, or a client program may just refuse to render some presentation forms of an object that has certain facts about it asserted.

In the absence of a fact asserted by the user, we believe it is possible to determine the probability that the user would assert the fact, as a function of some delegates’ assertions. Suppose Pr(F is true to U) is the probability that a user U takes the fact F to be true, and that delegates(U) denotes the set of delegates of the user U. We can define the probability to be:

Pr(F is true to U) = 1 (if U asserts F to be true)
Pr(F is true to U) = 0 (if U asserts F to be false)
Pr(F is true to U) =     ∑      P r(F is true to D)correlation(U,D)
D ∈delegates(U)
------------∑----------------------------
                   |correlation(U,D )|
        D∈delegates(U )

We also introduced a function correlation(U,D) which estimates the correlation coefficient between two users, which we have appropriated from the memory-based collaborative filtering techniques used in recommender systems. This could be defined as:

correlation(U,D) =        ∑   support(U,f )support(D, f)
     F ∈facts
∘--∑---------------∘---∑-----------------
       support(U,F )2       support(D, F)2
 F ∈facts              F ∈facts
where support(U,F) = (|1    (if U asserts F to be true)
{
|(− 1  (if U asserts F to be false)
 0    (if U does not assert F )
and facts = the rules U asserts the rules D asserts

In the absence of a large number of common facts, setting the correlation coefficient to be a small constant may yield results that are not very good, but are better than not attempting to calculate a probability.

It may also be wise to bias the correlation and resultant probabilities towards zero, to produce less confident values for fewer common facts, and fewer delegates, respectively. This can be done by adding a small positive integer to the denominators of our definitions of correlation(U,D) and Pr(F is true to U).

Appendix B
Index

Index

:acceptors Initarg, 1
:cache-size Initarg, 2
:certificate Initarg, 3
:connection-class Initarg, 4
:directory Initarg, 5
:fallback-vague-object-size Initarg, 6
:host Initarg, 7
:id Initarg, 8
:key Initarg, 9
:message-handler Initarg, 10
:netfarm-name Initarg, 11
:port Initarg, 12
:scheduler-concurrent-requests Initarg, 13
:scheduler-timeout Initarg, 14
:signatures Initarg, 15
:source Initarg, 16
:target Initarg, 17
accept-connection Generic Function, 18
accept-connection Method, 19
acceptor-loop Generic Function, 20
acceptor-loop Method, 21
acceptor Protocol Class, 22
add-dependency-vertex Generic Function, 23
add-other-interesting-block Generic Function, 24
announcement-control Message Type, 25
apply-class Function, 26
apply-side-effect Generic Function, 27
binary-connection Protocol Class, 28
binary-parse-block Function, 29
binary-parse Function, 30
binary-render-object Function, 31
binary-render Function, 32
caching-mixin Class, 33
character-connection Protocol Class, 34
connection-message-handler Accessor, 35
connection Protocol Class, 36
count-vague-objects Generic Function, 37, 38
data->object Generic Function, 39
define-message-type Macro, 40
echo-system Class, 41
error-response Message Type, 42
filesystem-mixin Class, 43
get-block-metadata Generic Function, 44
get-block-metadata Method, 45
get-blocks Message Type, 46
get-block Generic Function, 47
get-vague-object Generic Function, 48
give-system-connection Generic Function, 49
handle-message Generic Function, 50
handle-message Method, 51
hash-object* Function, 52
hash-object Function, 53
inbuilt@schema Inbuilt object, 54
inbuilt@script Inbuilt object, 55
inbuilt@user-script Inbuilt object, 56
inbuilt@user Inbuilt object, 57
interesting-block-p Generic Function, 58
interesting-block-p Method, 59
leader-loop Generic Function, 60
leader-loop Method, 61
listener-loop Generic Function, 62
listener-loop Method, 63
map-blocks Generic Function, 64
map-computed-values-caused-by Generic Function, 65
map-dependencies Generic Function, 66
map-dependents Generic Function, 67
map-vague-objects Generic Function, 68
memory-database-mixin Class, 69
memory-system Class, 70
message-case Macro, 71
message Function, 72
netfarm-class Metaclass, 73
netfarm-format-connection Class, 74
netfarm-system Class, 75
new-subscriptions Message Type, 76
no-dependencies-p Generic Function, 77
node-announcement Message Type, 78
not-found Condition, 79
object->data Generic Function, 80
object-signatures Accessor, 81
object-source Accessor, 82
objects-affected-by Generic Function, 83
object Class, 84
ok-response Message Type, 85
other-interesting-block-p Generic Function, 86
parse-block Function, 87
parse Function, 88
passing-connection Class, 89
presentable-name-p Accessor, 90
put-block Generic Function, 91
put-block Message Type, 92
put-vague-object Generic Function, 93
read-message Generic Function, 94
remove-dependents Generic Function, 95
remove-other-interesting-block Generic Function, 96
render-object Function, 97
render Function, 98
run-interpreter Function, 99
run-script-machines Function, 100
run-script Function, 101
setup-interpreter Function, 102
slot-netfarm-name Reader, 103
socket-acceptor Class, 104
standard-system Class, 105
start-acceptor :Before Method, 106
start-acceptor Generic Function, 107
start-acceptor Method, 108
start-system Generic Function, 109
stop-acceptor :Before Method, 110
stop-acceptor Generic Function, 111
stop-acceptor Method, 112
stop-connection Generic Function, 113
stop-system Generic Function, 114
subscription Message Type, 115
system-id Generic Function, 116
system-stopping-p Generic Function, 117
system Protocol Class, 118
threaded-acceptor Protocol Class, 119
threaded-connection Protocol Class, 120
uninteresting-block-p Generic Function, 121
uninteresting-block-p Method, 122
update-system-for-new-interesting-block-predicate Generic Function, 123
vague-object-stored-p Generic Function, 124
vague-object Class, 125
with-hash-cache Macro, 126
with-lexical-hash-cache Macro, 127
with-restored-lexical-hash-cache Macro, 128
write-message Generic Function, 129

acceptor, 130

block, 131

connection, 132

deserialize, 133

hashes, 134

Initialization scripts, 135
interesting set, 136

leader loop, 137
leader thread, 138

maintenance threads, 139
message, 140
Message scripts, 141
metadata, 142

Netfarm object system, 143
Netfarm server, 144

presentation, 145, 146, 147
presentation methods, 148
Presentation scripts, 149

script machine, 150
Scripts, 151
SECD machine, 152
serialized, 153
set specifiers, 154
source, 155
system, 156

uninteresting set, 157

vague object, 158

Listings

Bibliography

[Bak73]   Mikhail Bakunin. Statism and anarchy, 1873.

[Cic84]    Eugene C. Ciccarelli. Presentation based user interface. https://dspace.mit.edu/handle/1721.1/6946, 1984.

[Eth19]   Ethereum developers. Sharding faq. https://github.com/ethereum/wiki/wiki/Sharding-FAQ, 2019.

[Rou17]   Indhi Rousseau. Mastodon instances. https://instances.social/list/advanced, 2017.

1currently reachable on Instagram at https://www.instagram.com/luke_is_frog/

1I believe the Common Lisp babel library preserves code points, as the encoding and decoding functions are quite simple, but some guarantees on how code points are treated would be nice.

1We do need a more unique and less vague name for this...

1This may, for example, be useful to separate internal and external representations; it would be wise to represent a mapping of objects as a hash table in Lisp, but provide an association list for Netfarm to work with.

2We don’t want to end up on https://accidentallyquadratic.tumblr.com/!