This is a sketch of the basic idea, starting from the simplest example: a pipeline connecting two endpoints.
Consider a single source (of "sensory input" coming from the external environment) and a single sink (a "motor" that can accept data and affect the external world in some way). For this most basic example (coded up in the examples), two xterms are used. You can type into one (this is the "sensory input") while the second xterm displays the text generated by an agent. For this demo, the agent is just a single pipe, from source to sink, and all it does is copy the data. It's the minimal non-trivial agent: it passes input directly to output. You type into one terminal, and whatever you type shows up in the other terminal. (The truly minimal agent would be the null agent: it ignores everything and does nothing.)
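The copying behavior is easy to sketch in ordinary Scheme. The snippet below is only a conceptual sketch, not the actual Atomese demo in the examples directory: the two xterms are stood in for by plain Scheme ports, and the agent is just a read/write loop.

    (use-modules (ice-9 rdelim))   ; for read-line

    ; Toy stand-in for the copy agent: read a line from the source,
    ; write it to the sink, repeat until the source closes.
    (define (copy-agent source sink)
      (let loop ((line (read-line source)))
        (unless (eof-object? line)
          (display line sink)
          (newline sink)
          (loop (read-line source)))))

    ; For instance, echo stdin back to stdout:
    ; (copy-agent (current-input-port) (current-output-port))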
The hard part in implementing the above is mapping capabilities to actions.
The xterm has four capabilities: describe, open, read, write. The "describe" capability is "god-given": it always exists, and so the "lookat" action (cog-execute! (LookatLink ...)) will always return a description. For the xterm, the description is:

    (OPEN- & TXT+) or (WRITE- & TXT-)
The above uses the abstract Link Grammar (LG) notation for connectors and disjuncts. The OPEN- connector says that you may send the "open" command as a valid message to the xterm device. The (OPEN- & TXT+) disjunct says that, if you send the "open" message, you will get back a text-source, a stream that can spew endless amounts of text. The WRITE- connector says that you can send the "write" message. The (WRITE- & TXT-) disjunct says that if you send the "write" message, you must also provide a text source: a readable text-pipe from which text-data can be sucked out and written into the external environment.
Clearly, TXT- can be hooked up to TXT+ to form an LG link. However, the linkage is incomplete, because OPEN- and WRITE- remain unconnected. For that, an agent is needed: the agent is able to provide the connectors needed to obtain a full linkage, completing the diagram. By examination, the agent needs to be (OPEN+ & WRITE+). Thus, the Link Grammar dictionary is:

    agent: (OPEN+ & WRITE+);
    xterm: (OPEN- & TXT+) or (WRITE- & TXT-);
The "sentence" to parse is
"xterm agent xterm"
and the parse is the LG diagram
+-------------> TXT ------>+
+<-- OPEN <--+--> WRITE -->+
| | |
xterm agent xterm
The above diagram describes a system consisting of one sensory device (the xterm on the left), one motor device (the xterm on the right) and an agent that is capable of triggering the open and write commands, so as to make text flow from the input device to the output device.
The above is a linkage, and it is the only possible linkage for the bag of two xterms and this particular kind of agent. If the Link Grammar generator were set to run free on this bag-of-parts list, this is the only valid diagram that can result.
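To make the "bag of parts" idea concrete, here is a toy sketch in Scheme. It is emphatically not the Link Grammar parser or generator (it ignores word order, planarity, and the "or" choices between disjuncts); it only does the bookkeeping part of the idea: every FOO+ connector in the bag must be pairable with a matching FOO-.

    ; Toy "netlist" check: a bag of parts is fully linked only when every
    ; connector type has as many + connectors as - connectors.
    (define (connector-counts parts)
      ; parts is a list of disjuncts; each disjunct is a list of
      ; connector strings such as "OPEN+" or "TXT-".
      (let ((tally (make-hash-table)))
        (for-each
          (lambda (disjunct)
            (for-each
              (lambda (conn)
                (let* ((n (string-length conn))
                       (label (substring conn 0 (- n 1)))
                       (dir (substring conn (- n 1) n))
                       (delta (if (string=? dir "+") 1 -1)))
                  (hash-set! tally label
                    (+ delta (hash-ref tally label 0)))))
              disjunct))
          parts)
        tally))

    (define (fully-linked? parts)
      ; True only if every connector type balances out to zero.
      (let ((ok #t))
        (hash-for-each
          (lambda (label count) (when (not (zero? count)) (set! ok #f)))
          (connector-counts parts))
        ok))

    ; The bag from this section: two xterms and the agent.
    (fully-linked?
      (list '("OPEN-" "TXT+")       ; xterm, acting as a text source
            '("WRITE-" "TXT-")      ; xterm, acting as a text sink
            '("OPEN+" "WRITE+")))   ; the agent
    ; => #t

Drop the agent from the list, and the check returns #f, since OPEN- and WRITE- would be left dangling.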
Using an electronics analogy, the parts list is the BOM (Bill of Material) and it's like you have a bag with resistors, capacitors, transistors, and you shake it around, and circuits fall out. The only valid circuits are those which have all wires fully connected (soldered together). The Link Grammar linkage is the same thing as the electronics EDA netlist.
Of course, most random electronic circuits are going to be useless. If only there were some way of having a training corpus of electronic circuits, and some way of using deep learning to create a LECM (Large Electronic Circuit Model) ... but alas, no such system exists.
In biology, there is the concept of "assembly theory". There is a concept of autopoiesis, of autopoietic, self-assembling systems. You throw a bunch of lipids into a bag, shake it, and out pops a lipid bilayer. Throw a bunch of amino acids into a bag, shake it, and out pops a protein. Throw in some ribose sugars, and out pops a DNA strand. Things connect up to other things. The connection process sometimes seems mysterious. There are (electron) affinities on each connector.

Each connector (each chemical atom) is endowed with a collection (set, vector) of Bayesian priors. This set of priors is called a quantum "mixed state"; it is a weighted collection of "pure states" (unconnected, free connectors, in the form of bra and ket). The mixed state assigns a likelihood (a "Bayesian prior", a single number) to each pure state in the mixture (to each possibility). The actual hookup, where atoms/molecules/electronic bonds actually connect, is a selection of one of these possible connectors from the set of disjuncts, weighted by the "cost" (it's called "cost" in link-grammar, and "enthalpy" in chemistry). The "big idea" from Karl Friston for how to build AI can be understood as a mixed state of Bayesian priors. Hooking up is tensor contraction, weighted as a mixed state, so a different hookup for each "possible world" of the "many worlds". But I digress.
At any rate, the concept of an Action is a powerful organizing principle. The action for electronic circuits and assembly-language instructions remains unknown. The generic action for sensori-motor systems done up in Atomese remains equally out of reach.
Returning to the base example: consider adding a chatbot to this mix. A chatbot is a processing device that can accept text as input and generate text as output, and so is decorated with (described by) a disjunct that is (TXT- & TXT+). The corresponding LG dictionary is

    ioagent: (OPEN+ & WRITE+);
    chatbot: (TXT- & TXT+);
    xterm: (OPEN- & TXT+) or (WRITE- & TXT-);
and one possible linkage (circuit diagram) is:
    +------------> TXT --+----> TXT --->+
    +<-- OPEN <--+-------|---> WRITE -->+
    |            |       |              |
    xterm     ioagent chatbot         xterm
This is perhaps the simplest example of hooking up one sensory device (source of data from the environment) and one motor (device capable of acting upon the environment) to some machine (the chatbot) that does some "data processing" in the middle.
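In the toy fully-linked? sketch from earlier (again, just connector bookkeeping, not a real parse), this bag also balances:

    (fully-linked?
      (list '("OPEN-" "TXT+")       ; source xterm
            '("WRITE-" "TXT-")      ; sink xterm
            '("TXT-" "TXT+")        ; chatbot: text in, text out
            '("OPEN+" "WRITE+")))   ; ioagent
    ; => #t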
Some subtle points: in general, for some sensory device, we don't know what kind of data it delivers or accepts. That's why TXT+ and TXT- are needed. Here, TXT is a type-theory type: a class-label stuck onto the kinds of messages that can flow about. Likewise, OPEN and WRITE are also types. Sensory devices will typically accept OPEN messages, and motors will typically accept WRITE, but it is not obvious that this is always the case. This is why these messages have to be explicitly described in the device description.
Whether or not any given device has "open" and "write" methods on it is "described" by the dictionary, which encodes the message types that can be sent & received. So "open" and "write" become messages.
The number of "arguments" and argument types for a given message is also
specified by LG connectors. For example, "open" might need a URL as an
argument. For an IRC (Internet Relay Chat) chatbot, you need two
arguments: the name of the IRC network and also the chat channel to
join. In this case, the disjunct for IRC would be (OPEN- & NET- & CHAN-)
where the NET
type is a subtype of TXT
and CHAN
is also a subtype
of TXT
, and any agent attempting to open an IRC chat needs to provide
text strings for these two. Those text strings must "come from
somewhere", they don't just magically appear.
Those strings might, for example, come from an xterm, where a human user types them in. This is just like a web browser, where the URL bar is like the xterm: it allows a human user to type in a URL, which then gets piped along. Just to be clear, the GUI on a web browser would also have a disjunct like (OPEN- & FLOAT+ & FLOAT+), which says that, after opening, the web browser promises to generate a click-stream of x,y window coords. There's also a CSS agent that's got a (FLOAT- & FLOAT- & URL+) that says it will accept a pair of window x,y coordinates, and return the URL that was actually clicked on by the user.
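The browser example can be run through the same toy check. The third entry below, an agent that opens the browser and consumes the resulting URL, is hypothetical and is added only so that the linkage closes:

    (fully-linked?
      (list '("OPEN-" "FLOAT+" "FLOAT+")   ; browser GUI: emits x,y clicks
            '("FLOAT-" "FLOAT-" "URL+")    ; CSS agent: clicks -> URL
            '("OPEN+" "URL-")))            ; hypothetical URL-consuming agent
    ; => #t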
The above examples use heterosexual +/- marks on the connectors, which here stand for "input" and "output". Reality is more complicated, and so the Atomese connectors have a SexNode, which can be + or - but can also be other things with different kinds of mating rules. The +/- rules are enough to implement classical lambda calculus and beta reduction, so such a language is "Turing complete". Although lambda calculus is great for conventional software programming (e.g. the Lisp and Scheme programming languages), it is not appropriate for network descriptions.
To belabor the point: classical Link Grammar uses +/- connector directions in place of beta-reduction. A classical LG disjunct like S- & O+, denoting a transitive verb that connects to a subject S and an object O, could have been written using typed lambdas as lambda:O.x:S, which is just lambda.x where x is typed to be S and the result of the beta-reduction with x (a common noun) is of type O. Clearly, using typed lambda calculus is awkward for expressing grammatical rules.
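For a concrete, textbook-style LG example (ignoring the left-wall and other details of the real English dictionary): in "Mary saw John", the subject carries S+, the transitive verb carries S- & O+, and the object carries O-. In the toy fully-linked? sketch from earlier, the counts balance:

    (fully-linked?
      (list '("S+")          ; Mary (subject)
            '("S-" "O+")     ; saw  (transitive verb)
            '("O-")))        ; John (object)
    ; => #t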
A different solution to this directionality problem is seen in pregroup grammars and in combinatory categorial grammars (CCG), which use forward and backward slashes to indicate direction. Thus, notations like S/NP or VP\S show up in those notational systems. The slashes are type constructors, and VP\S is the construction of a type that indicates the object of a transitive verb. In Link Grammar, this is isomorphic to the O+ connector. The isomorphism is explicitly given in this PDF.
There are only two slash directions in CCG for the same reason that there are only + and - in classical Link Grammar: it works just fine for the English language. For languages that have freer word-order, subjects, objects, verbs and adjectives can appear in fairly loose order. Classical Link Grammar was patched up to also provide h and t connector types, denoting head and tail in a conventional, classical dependency-grammar formulation. And so now, instead of two connector-direction types, there are four. Each set is heterosexual, so + mates to - and h mates to t, but the combinations are more complicated: h+ can mate to t- but not to t+ or h-, and so on.
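A toy sketch of these mating rules, following the description above rather than the actual link-grammar matching code: each mark maps to its mate (+ with -, h with t), and two connector-direction strings mate only when they are exact mirror images.

    (define (mate-of ch)
      (case ch
        ((#\+) #\-) ((#\-) #\+)
        ((#\h) #\t) ((#\t) #\h)
        (else #f)))

    (define (mates? a b)
      (and (= (string-length a) (string-length b))
           (let loop ((i 0))
             (or (= i (string-length a))
                 (and (eqv? (mate-of (string-ref a i)) (string-ref b i))
                      (loop (+ i 1)))))))

    ; (mates? "+" "-")    => #t
    ; (mates? "h+" "t-")  => #t
    ; (mates? "h+" "t+")  => #f
    ; (mates? "h+" "h-")  => #f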
To deal with more complex mating rules than just "input" and "output" or "command" and "result", while also avoiding the need to "Schonfinkelize" or "curry" the "arguments" to a lambda, Atomese introduces the SexNode.

To recap: lambdas and beta-reduction are sufficient to get a Turing-complete system, but lead to awkward notation for dagger-symmetric monoidal categories and linear type systems, where two directions need to be acknowledged. For most natural language, e.g. English, it is enough to throw away the "symmetric" part of the category, and two directions are still enough. For free-word-order languages (or freer) such as Turkish, Lithuanian, Finnish, etc., it is convenient to have more than two connection possibilities. The SexNode provides a good way to do this, in Atomese.
The ideas outlined above are widespread in modern software. There are already many systems in the software world that do most (but not all) of the things mentioned above. Here's a lightning review of some of these systems, focusing on the similar components.
The Robot Operating System (ROS) already implements half of the ideas described above. It has explicit sensor devices, and explicit motors, each of which can be hooked up to others for perception and action. The hookups take the form of network sockets (UDP or TCP), although recent ROS has moved to a message-passing system. Once a network pipe is connected (linked) and started, the data flows continuously, forever, until the pipe is closed.
Both sensors and motors are described with YAML files. These are the analogs of the Link Grammar dictionary. Other YAML files provide the netlist of what hooks up to what: they describe the actual robot. They list what sensors and motors are actually used in a given robot, and how they are attached to one another. The robot engineer designs these linkages. Hitting "run" hooks up all the devices, allowing data to flow on the network pipes.
One can write the netlist YAML files by hand; there are also GUI tools that can be used to drag-n-drop sensors/motors and to create links between them.
The primary issue here, from the perspective of this project, is that none of this is done in Atomese, and there's no particular sense of Link Grammar connectors or linkages. These concepts are "ad hoc" in ROS. The robot designer implicitly knows them and uses them, but there's no explicit manipulable system or API for dealing with this. Yes, the GUI designer can open a YAML file and "parse" what's inside of it, but the "parsing" is naive and ad hoc. The YAML files are stored as text files in a conventional file system.
This presents a challenge for an AGI system looking to have a robot embodiment: it has to navigate a file system, find a bunch of YAML files, edit those YAML files, examine megabytes of debug output dumped into log files in yet another location of the file system ... and there's no YAML file that describes the logfile, that says "here's how you can read a logfile and debug it". The autopoietic aspect is missing in ROS.
Off-topic remark: writing and tweaking ROS YAML is hard. It is comparable to the difficulty of writing Atomese. Both take time and effort to learn.
If you use GitHub Actions or CircleCI, some of the above might remind you of their YAML workflow files. This is not an accident: the CircleCI files describe a process network flow to perform unit tests, where the sensory device is a read of a git repo, the agents and actions compile & run the unit tests, and the output is whether the unit tests passed or failed.
If you've ever screwed around with JavaScript and node.js, then you know package.json and package-lock.json. These are sensory-device description files (or, more accurately, agent-description files), in that they describe what a particular node.js device provides: the inputs and the outputs of the "node". You then use npm to parse the system and perform all the hookups. You will also be familiar with the carpet burns you get from screwing with Electron. Hooking together open connectors into a functioning network is difficult.
The idea of node.js is hardly an accident. Shortly after the very first-ever websites were created, and the very first cgi-bin scripts were written, it became clear that a sophisticated website needed a complex process flow: user logins flowing to an authentication agent, which authorizes user access to a shopping cart and enables the movement of that shopping cart to the one-click checkout page. This is a complex functional flow, where lots of things have to flow into other things. So, of course, you have to solve a network design and dataflow problem. It has to be solved in such a way that the devops people can do their job. And so onwards ho the march of technology.
The Unified Modeling Language (UML) provides a diagrammatic system for describing process flows that occur in software design. It is broad enough to have been extended to other industrial process flows, but with limited success. As a graphical, visual notation for describing systems, helping human engineers understand what they are working with, it has enjoyed some success.
One issue is that there's no automation per se: software does not turn into UML, or vice versa. There's no one-to-one correspondence, and the diagrams are ultimately hand-drawn. The diagrams are also a bit inscrutable: yes, there are labelled boxes with interconnecting lines drawn between them, but what do those circles and triangles mean? God forbid you draw a circle where a triangle was intended. Expending this kind of mental effort to draw UML seems a bit much. Compare to drawing electrical or electronic circuits correctly: the effort is worth it, because there is an actual physical system that corresponds, more or less one-to-one, to the diagram.
But even then: designing electronics with diagrams is hard, once one gets past a certain size. Both VHDL and Verilog turn electronic circuit design into an act of programming resembling conventional software programming. Using UML to convert software into diagrams is a technology that runs counter to natural human abilities. Yes, we think visually, but only up to a certain scale.
For large systems, the visual field gets too busy, and the forest gets lost for the trees. For small systems, where one is an expert, the compact notation provided by programming languages is easier to work with, precisely because it is more compact and dense.
The above systems seem to be missing several desirable properties:
- Configuration is done with files, which can only be edited by hand or with GUI tools. By contrast, Atomese is stored in the AtomSpace, where it is searchable. Also, Atomese is "visible" to other Atomese, and is thus controllable, editable and manipulable by Atomese.
- The syntax of YAML files is not self-describing. If you want to know what some particular YAML does, you have to RTFM, and the docs are written in English. There's no way of opening a YAML file and finding a machine-readable description of what's in there. The description is in English, and is usually on a website far away from the actual control file.
The goal of this project is to plug those gaps: to provide I/O devices that come with descriptions of how to actually use them, and to have those descriptions in a form such that logical reasoning can be performed.
Both of the above complaints can be waved away by noting that GPT-type systems can kind of deal with these complexities. A properly-trained GPT system will have had the ROS documentation and the node.js documentation in its training set, and so it kind-of-ish already knows how this stuff works. Microsoft's Copilot can already write semi-coherent snippets of code that are grammatically correct, and kind-of do sort-of what you want them to do, in a way. So maybe all this effort to design a low-level system that behaves correctly for sensori-motor processing is a waste of time, and I should just kick back and wait for GPT and OpenAI to figure it out. Who the hell knows. It is a plausible answer.
There's no "one ring to rule them all". I am not aware of any generic comp-sci theory for exploring the autogeneration of networks. Doing Atomese Sensory is sort of my personal best-guess for the generic theory/system of "making things hook together into a sensible autopoetic system". The "basal cognition" for agent-environment interaction.
I'm not sure if any of this is useful, or if this is just a screwball low-level academic exercise in comp-sci trivia. That's why this whole git repo is labelled "experiment". Build it, and see what happens.