Netfarm is an attempt at creating a distributed, trustless object system. We develop Netfarm to ﬁll a perceived void in the rapidly-expanding “decentralisation” bubble, to utilise the greater fault tolerance of a distributed hash table and simplify programming via a client language’s meta object protocol, and to devise less hierarchical moderation techniques that do not leave them at the whim of their servers’ operators and administrators.
The design of Netfarm is not ideal due to historical and planning issues; but it should serve well as an introduction to trustless object systems, which we believe can be easier to program and have better throughput than some of the models that are used frequently today, such as blockchains (demonstrated to be usable for programming in Ethereum) and various federated protocols (such as ActivityPub, notably used in Mastodon among other “fediverse” blogging servers).
To summarise the design, Netfarm implements a synchronous message-passing object system (like Smalltalk or Self) hosted on a distributed hash table. A Netfarm network can reach eventual consistency as state is either so unlikely to collide, such as objects which are stored by hashes, which are probably less likely to collide than for a computer to incorrectly compute a hash; or it is simple to unify, such as computed values. However, Netfarm does not decide on an ordering of computed values, and so has fairly weak consistency guarantees.
Some of these points are made redundant by the preceding paragraphs, but it may be worthwhile to explain why we made those claims, and why they are desirable.
It is also not uncommon for federated systems to have clustering problems. If the developers of a system run a server in that system, it is likely they will encourage users to use their server, or not encourage users to use other servers. In one egregious system, the developers sell servers for their network. It’s likely that they’re all hosted in the same data centres; what happens if something there goes down? Ouch.
If one is unhappy with the way their server is moderated, some ever-so-wise users remind them they are “free” to leave for another, which is incredibly impolite. We feel obligated to state that it’s analogous to "suggesting" that one should leave their country if they are not happy with how it is governed, which is usually considered by them to be a very lousy argument, and one we consider to be similar in sentiment to theirs (except that moving country is often harder than moving server).
The questions of node operation and moderation in Netfarm are fully separated, allowing users to decide with case-by-case granularity on what content they want to see, and to aggregate their peers’ decisions with various techniques.
We see the Netfarm system as a synthesis of various components:
However, a user should be able to modify and use any component separately, and should have extensive control over many aspects of the system, to implement any changes while still acting as a “client” of the system, that does not modify the system itself.
We would like to thank Luke Nuttall1 for providing a drawing of the Netfarm mascot Demeter.
We would also like to thank Robert Strandh for providing advice on optimising the Netfarm script machine and how Netfarm should interact with client code, and Gilbert Baumann for also suggesting optimisations for the script machine and several stylistic concerns, among other many other insightful discussions.
The boundaries that separate a Netfarm implementation and its host language are not yet clear. Several questions remain, such as how much a Netfarm node must know about the data it is verifying, and what it is expected to do for its clients.
We forsee that a new distributed object system will have to be created, to overcome ﬂaws that were introduced to Netfarm through its age and attempts to persue multiple goals. One such problem was that we chose the means of interaction with the network to be sending and receiving data, which creates an awkward split in mechanisms, between the creation of an object and objects sending messages to each other.
As such, we would like to remind the reader that they should take care to form their own opinions on how such a system should be structured from any problems they have noticed, and that Netfarm is not the end-all of distributed object systems. We would hope for something like Pierre-Joseph Proudhon’s dream in this quotation:
Je rêve une société où je serai guillotiné comme conservateur.
(I dream of a world in which I would be executed as a reactionary.)
A “block” in decentralise2 is represented in two ways, depending on how it is used:
multiple-value-call can be used to convert from the second representation to the ﬁrst, but there are currently no reasons why that should be useful.
The interesting set and uninteresting set for a decentralise2 system are represented as predicates, which usually should not change.
When the interesting set must be changed, the system is notiﬁed using a function which takes two set speciﬁers. A set speciﬁer is either:
The latter is very useful when sets are almost “inﬁnite”, such as in a distributed hash table where the change (from “rehashing” some types of distributed hash tables, or the operator deciding they can commit to storing more objects) can well exceed 2100, and is impractical to represent as a pair of lists.
Strings are always encoded using UTF-8 in the Netfarm systems. No buts, no ifs there. The sequence of code points used to form a string must be preserved in encoding and decoding.1
Integers are encoded in binary data in big endian format, i.e. such that the most signiﬁcant octet (or other unit) is written ﬁrst, and the least signiﬁcant is written last. This format is most common in networking applications, and is the most convenient for users of human languages that write left-to-right to read.
Octet vectors are written in hexadecimal, with the ﬁrst octet to be written or read appearing at the leftmost position in a vector, and the last at the right.
For example, 10 01 is the vector consisting of the octets 16 and 1, and can be interpreted as the integer 4097.
The ﬁrst component that was developed for Netfarm was the decentralise1 library, which allows a programmer to implement some kind of distributed network by writing the components that they are most interested in, and using provided implementations of the components they are not interested in writing.
A decentralise user may implement one or more of:
In a peer-to-peer context, servers and clients do not exist, as there is no distinction between them. This is still true for decentralise2, as servers do not distinguish between servers, clients, and any other program creating connections and sending and receiving messages. However, the client is written to retrieve speciﬁc objects for a user and provide a synchronous interface to them, and a system retrieves many objects asynchronously, and stores them in some kind of database; so it may be argued that there is something that looks like a client, and something that looks like a server.
On the other hand, a server also may handle accepting connections and parsing messages, and thus a system would not be a server, as it does not do those things; so we should stick with the term system to be precise in what a component does.
A system handles the requests sent to a node in a distributed system, and sends requests for blocks it has not yet retrieved to synchronise itself. A block is a tuple consisting of a name, version, channel set, and some data; of which the metadata can be used by a system to determine what it must retrieve and update.
⇒system [Protocol Class]
The protocol class for a system. A system is maintained by some maintenance threads. One of which is the leader thread, which runs a leader loop that schedules requests for synchronisation and stops itself when the system is stopped.
TODO: The standard-system should track maintenance threads like it tracks the leader thread, and eventually, this should all be integrated into the bailout thread supervision library.
A system that uses the database and synchronisation protocols to implement the basic behaviour of a node.
A silly system that responds to any messages other than invalid syntax with the message sent.
⇒system-id system [Generic Function]
The ID of a system, as an integer in the range [0,2256)
The acceptors of a system.
The database protocol is part of the system protocol, because a system may process the data sent over the wire into some other format that should be stored and serialised by some other protocol. For example, the Netfarm system implements get-block in terms of netfarm-system:get-vague-object, and provides the rendered vague object and all the object names that aﬀected it via computed values.
It is possible to separate the synchronisation and database protocol implementations,
and combine them into a complete system implementation using multiple inheritance
when using the reference implementation:
The simplest database implementation, which stores block data in a concurrent hash table.
A condition type that is raised when get-block or get-block-metadata are called with the name of a block that is not stored by the system.
The following generic functions must be implemented by a database:
⇒get-block system name [Generic Function]
Return block structure for the block named name from the system, or signal a condition of type not-found.
⇒put-block system name version channels data [Generic Function]
Store a block with the provided block structure in the system, or signal a condition of type error.
⇒map-blocks function system [Generic Function]
Call function repeatedly with block structure for the metadata of every object in the system.
⇒get-block-metadata system name [Generic Function]
Return block structure for the metadata of the block named name from the system, or signal a condition of type not-found.
This generic function will be called for each block’s metadata sent to a system, so adding a method which does not have to load a block’s data is highly recommended. Otherwise, for the implementors’ convenience, the following method will be used:
⇒get-block-metadata (system system) name [Method]
The default method for get-block-metadata calls get-block, and returns the metadata from the data it retrieved, but implementations which are even slightly concerned with eﬃciency should implement a method for get-block-metadata that does not have to read the data for a block.
When metadata about a block is received, it is categorised into either the interesting set or the uninteresting set, or discarded.
These two sets are represented by predicates, and changes between the sets can be represented in several ways (see Set speciﬁers for a reference of these set speciﬁers). The interesting and uninteresting set predicates must usually remain referentially transparent and not change, to allow the system to cache block metadata and schedule requests; but the sets can be modiﬁed when the system is notiﬁed, using update-system-for-new-interesting-block-predicate.
⇒interesting-block-p system name version channels [Generic Function]
Determine if a block should be requested, based on its metadata.
⇒interesting-block-p (system standard-system) name version channels [Method]
Everything is interesting by default.
⇒uninteresting-block-p system name version channels [Generic Function]
Determine if a block’s metadata should be kept, in case it may become interesting later, due to update-system-for-new-interesting-block-predicate.
⇒uninteresting-block-p (system standard-system) name version channels [Method]
Everything that has a newer version than the system stores is uninteresting to the system.
⇒update-system-for-new-interesting-block-predicate system interesting-set uninteresting-set [Generic Function]
Update the system’s cache of the interesting set, adding every member of the interesting-set to the interesting set and removing every member of the uninteresting-set from it.
The leader loop of a system performs tasks that are not easy to delegate to a connection, such as watching for requests that timed out, and replacing them with new requests.
⇒leader-loop system [Generic Function]
Run the leader loop for the system. This is expected to return only when the system is stopped.
⇒leader-loop (system standard-system) [Method]
The leader loop of a standard-system implements the description we gave of a leader loop.
⇒start-system system [Generic Function]
Start a system, by starting all its acceptors and a leader thread.
⇒stop-system system [Generic Function]
Stop a system by closing all its acceptors and connections, and advising maintenance threads to stop.
⇒system-stopping-p system [Generic Function]
Should the threads maintaining the system stop now?
The time (in seconds) that the scheduler should wait for a request to be responded to. This defaults to 10 seconds.
The number of requests that the scheduler can wait for at a time. This defaults to 80 requests.
TODO: It may be useful to add a(n optional) limit on how many requests can be sent to one connection at a time.
An acceptor creates connections by accepting them from something that faces the outside world, such as a server socket, and gives them to a system.
⇒acceptor [Protocol Class]
The protocol class for an acceptor.
⇒threaded-acceptor [Protocol Class]
The protocol class for an acceptor that uses a thread to give a system connections.
⇒give-system-connection system connection [Generic Function]
Give a connection to the system the acceptor accepts connections for.
An acceptor may have to deﬁne :after methods on:
⇒start-acceptor acceptor system [Generic Function]
Start the acceptor, allowing it to create connections on behalf of the system.
⇒start-acceptor (acceptor threaded-acceptor) system [Method]
Start a thread that repeatedly accepts connections by calling acceptor-loop.
⇒stop-acceptor acceptor [Generic Function]
⇒stop-acceptor (acceptor acceptor) [Method]
An acceptor must deﬁne a method on:
⇒accept-connection (acceptor) [Generic Function]
Return a new connection that should be added to the acceptor’s system.
An acceptor should usually only have to deﬁne a primary method on:
⇒acceptor-loop acceptor [Generic Function]
⇒acceptor-loop (acceptor acceptor) [Method]
Forever accept connections and add them to a system.
An acceptor that accepts connections from a socket, also wrapping the connections in SSL.
The class to make instances of. This must be a subclass of either character-connection or binary-connection.
The hostname and port to bind to. These default to "0.0.0.0" and 1892, respectively.
Pathnames for the certiﬁcate and key ﬁles.
⇒start-acceptor (acceptor socket-acceptor) system [:Before Method]
This method starts listening on the given hostname and port.
⇒stop-acceptor (acceptor socket-acceptor) system [:Before Method]
This method stops listening on the given hostname and port.
⇒accept-connection (acceptor socket-acceptor) system [Method]
A socket method accepts a connection by accepting a connection from the socket, and instantiating an instance of the acceptor’s connection class.
Connections are instantiated with initargs :socket and :stream, with values for the raw socket and the SSL-wrapped stream, respectively.
A connection connects a client and a server together, allowing the two to exchange messages between each other. decentralise2 is able to handle many diﬀerent types of connections, including in-memory passing connections and socketed connections that use the operating system’s networking capabilities, which only have to implement a simple protocol.
⇒connection [Protocol Class]
Note that a connection is implicitly started when it is instantiated.
⇒stop-connection connection [Generic Function]
Stop the connection, which should free any resources that creating the connection acquired, such as threads and sockets.
⇒handle-message connection message [Generic Function]
Handle a message that was sent to the connection.
⇒handle-message (connection connection) message [Method]
The default method calls the message handler of the connection with the message.
⇒connection-message-handler connection [Accessor]
A function that should be called with each received message. The default handler will wait for a new handler to be set, in order to (hopefully) prevent some race conditions, where a message could be received before the client has set a message handler.
⇒write-message connection message [Generic Function]
Write a message to a connection. The implementation of this must be thread safe, such that two threads can call write-message simultaneously, and will correctly write both messages.
The types of data that a connection can send are represented by a connection’s class, not unlike the stream classes such as fundamental-character-stream and fundamental-binary-stream in the de-facto Gray streams standard.
Note that, unlike Gray streams, all connections are bidirectional.
⇒character-connection [Protocol Class]
A connection that can send blocks with vectors of characters (strings) as data.
⇒binary-connection [Protocol Class]
A connection that can send blocks with vectors of octets as data.
⇒threaded-connection [Protocol Class]
A connection that uses a thread to read messages.
⇒read-message connection [Generic Function]
Read a message from the connection, blocking until one is present, then returning it and T, or return something and NIL if a message cannot be read ever again (e.g the other node closed the connection).
⇒listener-loop connection [Generic Function]
The thread of the connection will call this function, and stop when it returns.
⇒listener-loop (connection threaded-connection) [Method]
Repeatedly calls read-message, handling its return values appropriately.
A connection that passes its messages to another connection.
The connection that the passing connection should pass its messages to.
A character connection that exchanges messages using a simple textual format, based on the wire protocol Nettle used.
As the name suggests, this was intended to be the default connection class for Netfarm; but the creation of Netfarm’s much more eﬃcient binary format suggests that we should use a binary connection class as default.
Nonetheless, it is quite easy to read, and superﬁcially looks like the HTTP/1.0 wire protocol. Messages begin with unique “verbs”, and are separated by new lines.
|boolean||::=||yes ∣ no|
|integer||::=||(0 … 9)*|
|length-preﬁxed-string||::=||integer : character*|
|block-header||::=||block block-name version line-count channel-name*|
|error||::=||error block-name reason|
|node||::=||node uri id|
|subscription||::=||subscription block-name version channel-name*|
The base decentralise2 messages are represented as follows:
|boolean||::=||00 ∣ 01|
|byte||::=||00 … ff|
|long-integer||::=||short-integer × 8|
|metadata||::=||block-name long-integer channel-list|
|character-block||::=||02 metadata long-string|
|binary-block||::=||03 metadata long-bytes|
|error||::=||05 block-name long-string|
|announce||::=||09 short-string short-string|
A message is an abstract object that is sent over a connection, and can be considered the lowest-level communication protocol decentralise2 is concerned with.
⇒message-case message &body case* [Macro]
|case||::=||( pattern form* )|
|pattern||::=||( keyword variable* )|
|dont-care||::=||∼ ∣ - ∣ _ ∣ NIL|
|variable||::=||dont-care ∣ symbol|
Match the value of message against some patterns, which look like the arguments to the function message, and evaluate the forms of the ﬁrst matching case, with all variables that don’t match the rule dont-care bound to the arguments.
⇒message keyword &rest values* [Function]
Create a message from the type designated by the keyword with the given values. The number of values must be the same as the number of accessors of the type.
⇒get-blocks (:get names) [Message Type]
⇒put-block (:block name version channels data) [Message Type]
⇒ok-response (:ok name) [Message Type]
⇒error-response (:error name reason) [Message Type]
⇒new-subscriptions (:subscription name version channels) [Message Type]
⇒subscription (:subscribe channels) [Message Type]
⇒announcement-control (:allow-announcement allow?) [Message Type]
⇒node-announcement (:announce uri id) [Message Type]
⇒define-message-type keyword type-speciﬁer constructor-name &rest accessors [Macro]
Deﬁne a message type named by the given keyword, which corresponds to the given Common Lisp type speciﬁer, can be constructed by calling the function named by the constructor-name, and can be destructured using all the functions named in accessors.
⇒object->data object target source [Generic Function]
“Serialize” or “render” an object (which came from, or represents an instance of the source), into a value that can be sent as data on the target.
⇒data->object data source target [Generic Function]
“Deserialize” or “parse” the data (which represents an instance of, or something the target can consume), into an object, which was received from the source.
The second component that was developed was the Netfarm object system. The Netfarm object system introduces a text format for representing an object, as well as code to serialize and deserialize objects.
The part of the Netfarm system that implements an object system is also named Netfarm. It may be helpful to describe this component as the “Netfarm object system”, and the entire system as the “Netfarm software stack” or words to those meanings.
As an example of how Netfarm enables its users to exchange objects with little
friction, we will imagine that there are two car enthusiasts, Adrian and Bob, who
want to discuss their cars. Here is a class deﬁnition for a car that they could have
used in a client program:
This class functions identically to any other class deﬁned using the Common Lisp
We will discuss how Adrian can send that description of a car to Bob later.
The Netfarm object system has objects, which are instances of schemas, which are themselves objects.
This system maps well to the Common Lisp object system, and we can describe schemas using classes that are instances of a special metaclass.
The class of a Netfarm class.
The class of a Netfarm object.
A Netfarm object also has a list of signatures, and a source function, that is called to retrieve referenced objects lazily.
⇒object-signatures object [Accessor]
The signatures of an object, stored as an association list of user object-signature data pairs.
⇒object-source object [Accessor]
A function that takes an object name, and returns the object with that object name. This function typically requests the object from a client.
Much like the Common Lisp meta-object protocol, Netfarm objects are represented as vectors of slots; an object is comprised of a vector of “normal” slots, and a vector of computed slots.
Each Netfarm slot maps to a Common Lisp slot, but some Common Lisp slots don’t map to Netfarm slots1.
To avoid some potential non-determinism in how slot lists are ordered when read by Netfarm, slots are sorted by string< on their names in slot vectors.
⇒slot-netfarm-name slot-deﬁnition [Reader]
The name of a slot, a string or nil if the slot should not be present in the Netfarm object. This defaults to the downcased name of the slot-definition-name of the slot.
Multiple slots of a class cannot have the same name.
Three types of scripts describe the behaviour of instances of a schema:
Slots are inherited as usual for Common Lisp.
The presentation script, if not provided, is inherited from the ﬁrst superclass of the class that has a presentation script. The message script, if not provided, is inherited from the ﬁrst superclass of the class that has a message script.
The initialization scripts of a class are the union of the direct initialization scripts of a class, and the initialization scripts of the superclasses of the class, with duplicates removed.
⇒inbuilt@schema[Inbuilt instance of inbuilt@schema]
|Slot name|| |
To be deﬁned by the implementor.
(("computed-slots") ("documentation") ("scripts") ("slots"))
⇒inbuilt@user[Inbuilt instance of inbuilt@schema]
|Slot name|| |
To be deﬁned by the implementor.
⇒inbuilt@script[Inbuilt instance of inbuilt@schema]
|Slot name|| |
To be deﬁned by the implementor.
(("entry-points") ("methods") ("program") ("variables"))
⇒inbuilt@user-script[Inbuilt instance of inbuilt@script]
|Slot name|| |
(("add-datum" 2 0))
#(7 73 1 12 134 71 16 0 1 0 0 8 2 0 4 1 0 130 8)
Objects are usually serialized in order to transmit them to another client that does not share memory with the origin of the object, and that client must then deserialize the serialized form to produce a usable object.
In Netfarm, serialization is done by converting an object to a vague object, which has as much context of the object removed as possible, which can then be serialized into a character or octet vector. These vectors can then be deserialized by deserializing the vector into a vague object, and then re-introducing the context that the origin of the object supplied.
The octet-vector representation is also used for generating and verifying signatures.
The class of a vague object.
⇒apply-class vague-object class [Function]
Recontextualize a vague object by “applying” a class to it, producing an object.
⇒render value &optional stream [Function]
Render a value. If a stream is given, then write the value to that stream, else return the rendered value as a string.
⇒render-object object &optional stream [Function]
Render an object. If a stream is given, then write the object to that stream, else return the rendered object as a string.
⇒parse stream-or-string [Function]
Parse a value from a stream or a string.
⇒parse-block string &key source name [Function]
Parse an object from a string. The returned vague object has the supplied source and name.
|byte||::=||00 … ff|
|string||::=||01 length byte*|
|byte-vector||::=||02 length byte*|
|list||::=||05 length value*|
|reference||::=||06 length byte*|
|value||::=||string ∣ byte-vector ∣ positive-integer|
|∣ negative-integer ∣ list|
|∣ reference ∣ true ∣ false|
|slot||::=||value ∣ slot-unbound|
|metadata-element||::=||string ∣ value|
|object||::=||metadata slots computed-slots|
The integer 0 must be represented as a positive-integer, and integers must
be represented with as few bytes as possible; for example, 12,345 must be
represented as 03 02 30 39, and not 03 03 00 30 39 or with more zero bytes.
⇒binary-render value &optional function [Function]
Render a value. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.
⇒binary-render-object object &key function (emit-computed-values t) (emit-signatures t) [Function]
Render an object. If a function is given, then write the value by calling it with octet vectors that it should write, else return the rendered value as an octet vector.
emit-computed-values and emit-signatures control if computed values and signatures will be rendered, respectively. The resulting vector will not be parseable if computed values and/or signatures are not rendered. These arguments are present to facilitate hashing and signing, and usually should be left to their defaults.
⇒binary-parse function-or-vector [Function]
Parse a value from a function (called with no arguments, which returns octets) or a vector.
⇒binary-parse-block function-or-vector &key source name [Function]
Parse an object from a function (as previously described for binary-parse) or a vector.
Netfarm uses cryptographic hashes to “name” objects, so that they can be referenced and retrieved by clients that do not already have them stored. Hashes are also used in the process of signing objects.
An object is hashed by rendering it in the binary format, and then hashing the resulting octet vector using the SHA-256 hash algorithm. Usually, computed values are not included in the render, as computed values must not aﬀect object identity. Furthermore, when hashes are used in signing objects, signatures are not included, as they would need to be already present in the object to be signed, or break identity (by changing the resultant hash, or by allowing signatures to be added or removed to an object while preserving hash value).
⇒hash-object object &key (emit-signatures t) [Function]
Produce a binary hash. Note that inbuilt objects cannot be hashed, and thus hashing them will signal an error. emit-signatures will control if signatures are included in the hash.
The most obvious inbuilt object that could not possibly be hashed is inbuilt@schema, which is its own schema. Of course, you could hash it by referencing its inbuilt name and not its hash, but the renderer would have to have a special case to use inbuilt objects’ names instead of the result of hash-object*, which felt worse than disallowing hashing inbuilt objects somehow.
⇒hash-object* object [Function]
Produce a textual hash, equivalent to (bytes->base64 (hash-object object)) for non-inbuilt objects, and in the form inbuilt@name for inbuilt objects.
⇒with-hash-cache () &body body [Macro]
Cache hashes produced with both hashing functions in the body.
Hashing can result in accidentally quadratic runtime2 when traversing graphs of objects and hashing them (such as saving a graph of objects with the Netfarm client). with-hash-cache will reduce hashing operations that have already been done (usually by recursive calls between hashing and the renderer) to hash-table lookups, reducing that kind of pattern to roughly linear runtime.
Hashes used in signing are currently not hashed (TODO: though they probably should be.)
If objects that have been hashed are modiﬁed, then the results of hash-object and hash-object* are undeﬁned; but the dynamic extent of with-hash-cache should allow for fairly easy analysis of where hashes can be cached.
Object hashes are stored in a weak hash table, so they will be removed when the objects are garbage, and the garbage collector runs.
⇒with-lexical-hash-cache () &body body [Macro]
Bind the hash caches in a way that allows for them to be closed over, and restored using the local macro with-restored-lexical-hash-cache. This is intended to be used to share cached hashes between threads; which takes some more deliberation to ensure correctness, but can be beneﬁcial.
The Netfarm server uses a hash cache shared between all threads; and this is safe as objects are never mutated inside the server.
⇒with-restored-lexical-hash-cache () &body body [Macro]
Restore the hash caches from with-lexical-hash-cache.
The netfarm-client system provides functions to read and write Netfarm objects from a decentralise client, as well as a client class which can search through a network for an object.
Continuing our car example, Adrian can create a client and use it to save his
save-object returns the hash of the object, which can be used by Bob to retrieve
the object. Having to pass around a hash is not very convienent, and we can attach
our car to some other object Bob may already know about, but we will begin by
using this hash to retrieve the car.
While it is a component that is packaged with the Netfarm object system, the Netfarm script machine is a suﬃciently large and complex component that it deserves its own chapter. Scripts are portable programs that a Netfarm node can run to handle the behaviour of objects.
The script machine is based oﬀ the abstract SECD machine, which may be familiar to functional language compiler writers. The machine has several registers:
A simple assembler is provided with the reference Netfarm implementation, which
can compile programs such as:
While this (symbolic) assembly code looks much diﬀerent to a program written in a high-level language (and also unlike assembly code for real processors), it is similar to the bytecode generated by CLISP for Common Lisp programs, and CPython for Python programs.
This example program doesn’t really have a purpose in a Netfarm system. Scripts in Netfarm have three purposes (currently):
⇒setup-interpreter script arguments &optional capabilities [Function]
⇒run-interpreter interpreter &key cons-limit cycle-limit print-cycles step [Function]
⇒run-script script &rest arguments [Function]
TODO: Add a dynamic variable holding the capabilities the script machine should have.
⇒run-script-machines object &key apply-side-eﬀects recipient-ﬁlter [Function]
TODO: This entire section.
Presentations are the other use of scripts in Netfarm. A presentation is a graphical representation of an object that retains its identity. A presentation model allows Netfarm to exchange machine-readable objects between clients, while also being able to present human-readable descriptions of objects. This model was introduced by [Cic84], and has been used in the Common Lisp Interface Manager to great eﬀect.
Presentations are generated by presentation methods that are implemented using the script machine.
The script machine is a small stack machine, based oﬀ the SECD machine, with additional registers for convenience and to facilitate calling between multiple scripts, and a more common bytecode-based instruction format (rather than S-expression based).
The script machine has these registers:
A stack containing intermediate data.
The lexical environment, represented as a list of vector “frames”.
The position the interpreter is reading instructions from.
A stack telling the interpreter where to return to.
A vector of global variables.
The script the interpreter is current reading instructions from.
Flags that allow the machine to perform operations that only make sense in some situations.
The current object, or the empty list, if we have not called a method.
The last object, which was the value of O before calling a method.
A script has these values:
An octet vector that encodes instructions to execute.
A vector of descriptions for procedures, which are combined with environments by some instructions to produce procedures.
A vector of descriptions for methods.
A vector of the initial values of global variables.
We write the state of a running interpreter as the tuple ⟨S,E,P,D⟩. We sometimes reference the registers G, C, A, O and O′ outside the tuple, because they are seldom used by instructions, and so they are just tedious to write most of the time.
We write the states of a halted interpreter as halt[S], should it halt correctly, or error[message, cause] should it fail. (Only the data stack is preserved when an interpreter stops.)
We write the eﬀects of interpreting an instruction as
old state → new state
Should an instruction concern itself with only the Store, we may write eﬀects in the form
i1…im — o1…on
(borrowing from Forth conventions) which can be translated to the other notation as
⟨(im…i1 . S),E,P,D⟩ → ⟨(on…o1 . S),E,P + length,D⟩
where length is the total length of the instruction. Note that the equivalent transition has the stack preﬁxes “reversed”; the last input is popped ﬁrst, and the ﬁrst output is pushed ﬁrst. In Forth, we may write a program 3 2 - (⇒ 1) and the stack before evaluating - as a list would be (2 3). To maintain a natural argument order for the programmer, the machine handles arguments “backwards” when described using lists.
To access a value of a structure without destructuring it, we use the notation SA to get the value of the variable (slot) A of S. We also use subscripts to retrieve values from lists and vectors; Sn denotes the nth value in S. The ﬁrst element of a sequence is retrieved with a zero subscript, as with most programming languages; and retrieving an element that would be past the last element of a sequence causes interpretation to fail.
|getenv[(_.E),variable,frame]||= getenv[E,variable,frame − 1]|
⇒1: get-proc n [Instruction]
⟨S,E,P,D⟩ → ⟨(proc[P,E,P0,A] . S),E,P + 2,D⟩
where (P0 A) = (CE)n
This closes over the current environment, whereas get-proc* does not.
⇒2: get-value n [Instruction]
⟨S,E,P,D⟩ → ⟨(Gn . S),E,P + 2,D⟩
⇒3: set-value! n [Instruction]
⟨(value.S),E,P,D⟩ → ⟨S,E,P + 2,D⟩
where Gn ← value
⇒4: get-env variable frame [Instruction]
⟨S,E,P,D⟩ → ⟨S,E,P + 3,(value.D)⟩
where value = getenv[E,variable,frame]
⇒5: set-env! variable frame [Instruction]
⟨(value . S),E,P,D⟩ → ⟨S,E′,P + 3,D⟩
where getenv[E,variable,frame] ← value
⇒6: get-proc* n [Instruction]
⟨S,E,P,D⟩ → ⟨(proc[P,(),P0,A] . S),E,P + 2,D⟩
where (P0 A) = (CE)n
⇒11: byte n [Instruction]
⇒8: return [Instruction]
⟨S,E,P,()⟩ → halt[S]
⟨S,E,P,((E′.C′).D)⟩ → ⟨S,E′,C′,D⟩
⇒9: call n [Instruction]
⟨(proc[C′,E′,count] An…A1 . S),E,P,D⟩ → ⟨S,((A1…An) . E′),C′,((E . P + 2) . D)⟩
when count = n
Note that the return address must be placed at the instruction after the call instruction.
⇒10: tail-call n [Instruction]
⟨(proc[C′,E′] An…A1 . S),E,P,D⟩ → ⟨S,((A1…An) . E′),C′,D⟩
⇒16: jump-cond thenhigh thenlow elsehigh elselow [Instruction]
⟨(#f.S),E,P,D⟩ → ⟨S,E,P + 5 + else,D⟩
where else = elsehigh × 256 + elselow
⟨(v.S),E,P,D⟩ → ⟨S,E,P + 5 + then,D⟩
where then = thenhigh × 256 + thenlow,v≠#f
Jumps are relative to the end of the instruction, so jumping 0 bytes basically does nothing.
⇒17: jump high low [Instruction]
⟨S,E,P,D⟩ → ⟨S,E,P + 3 + high × 256 + low,D⟩
⇒24: error [Instruction]
⟨(cause message.S),E,P,D⟩ → error[message, cause]
⇒34: drop [Instruction]
⇒35: dup [Instruction]
x — x x
⇒36: swap [Instruction]
x y — y x
⇒64: cons [Instruction]
car cdr — (car . cdr)
cdr must be a list; we can’t represent improper lists in Netfarm, so we don’t give the script machine improper lists.
⇒65: car [Instruction]
(car . _) — car
⇒66: cdr [Instruction]
(_ . cdr) — cdr
⇒67: null [Instruction]
() — #t
_ — #f
⇒68: consp [Instruction]
(_ . _) — #t
_ — #f
⇒69: append [Instruction]
a b — a++b
⇒71: equal [Instruction]
a b — equal[a,b]
|equal[(aa . ad),(ba . bd)]||= equal[aa,ba] ∧ equal[ad,bd]|
|equal[a ∈ Integer,b ∈ Integer]||= a = b|
|equal[a ∈ String,b ∈ String]||= a = b|
|equal[a ∈ Object,b ∈ Object]||= hash[a] = hash[b]|
|equal[proc[aa,ae,ac],proc[ba,be,bc]]||= aa = ba ∧ ae = be ∧ ac = bc|
|equal[_, _]||= #f|
⇒72: string= [Instruction]
a b — equal[a,b]
when a ∈ String,b ∈ String
⇒73: list n [Instruction]
e1…en — (e1…en)
⇒96: + [Instruction]
a b — a+b
⇒97: - [Instruction]
a b — a-b
⇒98: * [Instruction]
a b — a × b
⇒99: / [Instruction]
a b — ⌊⌋
The results of division are rounded to the closest integer below the actual value; i.e -5 2 / ⇒−3
⇒100: abs [Instruction]
a — |a|
⇒112: = [Instruction]
a b — a = b
when a ∈ Integer,b ∈ Integer
The Netfarm server is a decentralise2 system, which retrieves a partition of all the objects available to it, and makes them available to its connections.
In the introduction to decentralise2, we argued that the terms “client” and “server” do not have enough meaning to be useful when discussing Netfarm software; but we ended up calling this module a “server”.
The class of a Netfarm server.
A system that stores everything in memory. Compare this to decentralise2’s memory-database-mixin.
A mixin that stores objects in a ﬁlesystem.
Objects are stored in a “trie” constructed of directories, to avoid slowdowns that occur on some ﬁle systems, should one directory contain many ﬁles. For each object, three ﬁles are stored:
Some ﬁles are also stored in the storage directory:
The directory that objects should be stored in.
A mixin that caches some objects in memory, but defers to another database implementation when it does not have an object stored.
TODO: Somehow, including caching makes the system slower. There already is a cache of real objects (rather than vague objects) in the system (in the form of a weak hash table, and thus gets GCed and doesn’t hold objects that aren’t live somewhere else for long), but another cache does signiﬁcantly reduce the reads in some tests. Perhaps these tests are too write-heavy, and the cache isn’t really able to do much, or perhaps we’re doing something wrong with cacle.
The maximum rendered size of objects that can be cached at once, in bytes. This defaults to 100 megabytes (108 bytes).
The size that vague objects should be assumed to take when rendered, should the real database not provide a size.
⇒put-vague-object system vague-object [Generic Function]
⇒get-vague-object system name [Generic Function]
⇒map-vague-objects function system [Generic Function]
⇒vague-object-stored-p system name [Generic Function]
⇒count-vague-objects system [Generic Function]
⇒presentable-name-p system name [Accessor]
⇒count-vague-objects system [Generic Function]
⇒apply-side-effect system name cause side-eﬀect-type &key [Generic Function]
⇒map-computed-values-caused-by function cause-name system [Generic Function]
⇒objects-affected-by system name [Generic Function]
⇒add-other-interesting-block system name [Generic Function]
⇒remove-other-interesting-block system name [Generic Function]
⇒other-interesting-block-p system name [Generic Function]
⇒add-dependency-vertex system dependent dependency [Generic Function]
⇒remove-dependents system dependency [Generic Function]
⇒map-dependents function system dependency [Generic Function]
⇒map-dependencies function system dependencies [Generic Function]
⇒no-dependencies-p system name [Generic Function]
We have not thoroughly tested this voting-web system, but we should describe it, because it is a very useful but infrequently applied technique for moderating an online system. This system may be seen as an extension of many Usenet clients’ kill lists, which contain a set of patterns that match posts the user does not want to see; but it can also be used as a whitelist, or just to rank some objects in some order.
It is very important that a distributed system has a means of distributed “moderation”. We wonder why distributed systems are built without distributed moderation of some form, and we believe that using centralised moderation on such a system would be an equivalent of the “people’s stick” described in [Bak73]. Unlike some kinds of moderation in the real world, we are not required to give computers some kind of “objective” truth, and we do not attempt that with Netfarm. Freeing ourselves from that obligation allows us to approximate the desires of each user with greater precision.
This system allows a user to maintain a map of “facts” that they assert to be true or false. A presentation script may query the voting-web system while rendering an object, to decide what related objects it should render, or a client program may just refuse to render some presentation forms of an object that has certain facts about it asserted.
In the absence of a fact asserted by the user, we believe it is possible to determine the probability that the user would assert the fact, as a function of some delegates’ assertions. Suppose Pr(F is true to U) is the probability that a user U takes the fact F to be true, and that delegates(U) denotes the set of delegates of the user U. We can deﬁne the probability to be:
|Pr(F is true to U)||= 1||(if U asserts F to be true)|
|Pr(F is true to U)||= 0||(if U asserts F to be false)|
|Pr(F is true to U)||=|
We also introduced a function correlation(U,D) which estimates the correlation coeﬃcient between two users, which we have appropriated from the memory-based collaborative ﬁltering techniques used in recommender systems. This could be deﬁned as:
|and facts||= the rules U asserts ∩ the rules D asserts|
In the absence of a large number of common facts, setting the correlation coeﬃcient to be a small constant may yield results that are not very good, but are better than not attempting to calculate a probability.
It may also be wise to bias the correlation and resultant probabilities towards zero, to produce less conﬁdent values for fewer common facts, and fewer delegates, respectively. This can be done by adding a small positive integer to the denominators of our deﬁnitions of correlation(U,D) and Pr(F is true to U).
uninteresting set, 157
vague object, 158
[Cic84] Eugene C. Ciccarelli. Presentation based user interface. https://dspace.mit.edu/handle/1721.1/6946, 1984.
[Eth19] Ethereum developers. Sharding faq. https://github.com/ethereum/wiki/wiki/Sharding-FAQ, 2019.
[Rou17] Indhi Rousseau. Mastodon instances. https://instances.social/list/advanced, 2017.
1I believe the Common Lisp babel library preserves code points, as the encoding and decoding functions are quite simple, but some guarantees on how code points are treated would be nice.
1We do need a more unique and less vague name for this...
1This may, for example, be useful to separate internal and external representations; it would be wise to represent a mapping of objects as a hash table in Lisp, but provide an association list for Netfarm to work with.