Architectural Overview

This is a sketch of the basic idea, starting from the simplest example.

A single pipeline

The simplest example is a pipeline connecting two endpoints.

Consider a single source (of "sensory input" coming from the external environment) and a single sink (a "motor" that can accept data and affect the external world in some way). For this most basic example (coded up in the examples), two xterms are used. You can type into one (this is the "sensory input") while the second xterm displays the text generated by an agent. For this demo, the agent is just a single pipe, from source to sink, and all it does is copy the data. It's the minimal non-trivial agent: it passes input directly to output. You type into one terminal, and whatever you type shows up in the other terminal. (The truly minimal agent would be the null agent: it ignores everything and does nothing.)

The hard part in implementing the above is mapping capabilities to actions.

The xterm has four capabilities: describe, open, read, write. The "describe" capability is "god-given": it always exists, and so the "lookat" action (cog-execute! (LookatLink ...)) will always return a description. For the xterm, the description is:

   (OPEN- & TXT+) or (WRITE- & TXT-)

The above uses the abstract Link Grammar (LG) notation for connectors and disjuncts. The OPEN- connector says that you may send the "open" command as a valid message to the xterm device. The (OPEN- & TXT+) disjunct says that, if you send the "open" message, you will get back a text-source, a stream that can spew endless amounts of text.

The WRITE- connector says that you can send the "write" message. The (WRITE- & TXT-) disjunct says that if you send the "write" message, you must also provide a text source: a readable text-pipe from which text data can be sucked out and written into the external environment.

Clearly, TXT- can be hooked up to TXT+ to form an LG link. However, the linkage is incomplete, because OPEN- and WRITE- remain unconnected. For that, an agent is needed: the agent provides the connectors needed to obtain a full linkage, completing the diagram. By examination, the agent needs to be (OPEN+ & WRITE+). Thus, the Link Grammar dictionary is:

   agent: (OPEN+ & WRITE+);
   xterm: (OPEN- & TXT+) or (WRITE- & TXT-);

The "sentence" to parse is

   "xterm agent xterm"

and the parse is the LG diagram

      +-------------> TXT ------>+
      +<-- OPEN <--+--> WRITE -->+
      |            |             |
    xterm        agent         xterm

The above diagram describes a system consisting of one sensory device (the xterm on the left), one motor device (the xterm on the right) and an agent that is capable of triggering the open and write commands, so as to make text flow, from the input device to the output device.

The above is a linkage, and it is the only possible linkage for the bag of two xterms and this particular kind of agent. If the Link Grammar generator were set to run free on this bag-of-parts list, this is the only valid diagram that can result.
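To make the connector bookkeeping concrete, here is a toy sketch in plain Guile Scheme. This is not the Link Grammar or Atomese API; the list encodings and names (mates?, valid-linkage? and so on) are invented for illustration. It encodes the two-xterm dictionary above and checks that the three links in the diagram use up every connector of the chosen disjuncts exactly once:

   (use-modules (srfi srfi-1))  ; first, second, every, append-map, filter-map, lset=

   ;; A connector is a (TYPE DIR) list; a disjunct is a list of connectors.
   (define xterm-dict '(((OPEN -) (TXT +))      ; sensory use: open, then emit text
                        ((WRITE -) (TXT -))))   ; motor use: write, consuming text
   (define agent-dict '(((OPEN +) (WRITE +))))

   ;; Two connectors mate when the types match and the directions differ.
   (define (mates? a b)
     (and (eq? (first a) (first b))
          (not (eq? (second a) (second b)))))

   ;; One disjunct per part, in left-to-right order, as in the diagram.
   (define parts (list (first xterm-dict)        ; left xterm:  (OPEN- & TXT+)
                       (first agent-dict)        ; agent:       (OPEN+ & WRITE+)
                       (second xterm-dict)))     ; right xterm: (WRITE- & TXT-)

   ;; Each link joins two (part-index connector) endpoints.
   (define links '(((0 (OPEN -))  (1 (OPEN +)))    ; agent opens the left xterm
                   ((0 (TXT +))   (2 (TXT -)))     ; text flows left to right
                   ((1 (WRITE +)) (2 (WRITE -))))) ; agent writes the right xterm

   (define (link-ok? lk)
     (mates? (second (first lk)) (second (second lk))))

   ;; All connectors that the links attach to part number IDX.
   (define (uses idx)
     (append-map (lambda (lk)
                   (filter-map (lambda (end)
                                 (and (= (first end) idx) (second end)))
                               lk))
                 links))

   ;; Valid if every link mates, and every connector of every chosen
   ;; disjunct is used by exactly one link end.
   (define (valid-linkage?)
     (and (every link-ok? links)
          (every (lambda (idx)
                   (and (= (length (uses idx)) (length (list-ref parts idx)))
                        (lset= equal? (uses idx) (list-ref parts idx))))
                 '(0 1 2))))

   (display (valid-linkage?)) (newline)   ; prints #t

A real generator would search over all choices of disjuncts and all pairings of connectors; for this bag of parts, the linkage checked above is the only one that survives.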

Generic linkages

Using an electronics analogy, the parts list is the BOM (Bill of Materials): it's like you have a bag of resistors, capacitors and transistors, you shake it around, and circuits fall out. The only valid circuits are those which have all wires fully connected (soldered together). The Link Grammar linkage is the same thing as the electronics EDA netlist.

Of course, most random electronic circuits are going to be useless. If only there was some way of having a training corpus of electronic circuits, and some way of using deep learning to create a LECM (Large Electronic Circuit Model) ... but alas, no such system exists.

In biology, there is this concept of "assembly theory". There is a concept of autopoiesis, of autopoietic, self-assembling systems. You throw a bunch of lipids into a bag, shake it, and out pops a bilipid layer. Throw a bunch of amino acids into a bag, shake it, and out pops a protein. Throw in some ribose sugars, out pops a DNA strand. Things connect up to other things. The connection process sometimes seems mysterious. There are (electron) affinities on each connector. Each connector (each chemical atom) is endowed with a collection (set, vector) of Bayesian priors. This set of priors is called a quantum "mixed state"; it is a weighted collection of "pure states" (unconnected, free connectors, in the form of bra and ket). The mixed state assigns a likelihood (a "Bayesian prior", a single number) to each pure state in the mixture (to each possibility). The actual hookup, where atoms/molecules/electronic bonds actually connect, is a selection of one of these possible connectors from the set of disjuncts, weighted by the "cost" (it's called "cost" in Link Grammar and "enthalpy" in chemistry). The "big idea" from Karl Friston for how to build AI can be understood as a mixed state of Bayesian priors. Hooking up is tensor contraction, weighted as a mixed state, so a different hookup for each "possible world" of the "many worlds". But I digress.
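As a toy illustration of that weighted selection, here is a sketch in plain Guile Scheme. The exp(-cost) weighting and the names are my own choices for the sketch; Link Grammar itself simply sums costs and ranks the resulting linkages.

   ;; Toy sketch: choose one disjunct from a set of weighted alternatives,
   ;; in the spirit of the "mixed state" of priors described above.
   ;; Each alternative is (COST DISJUNCT); lower cost means more likely.
   (define alternatives
     '((1.0 ((OPEN -) (TXT +)))      ; cost 1.0: act as a text source
       (0.5 ((WRITE -) (TXT -)))))   ; cost 0.5: act as a text sink

   (define (weight alt) (exp (- (car alt))))   ; Boltzmann-style weight

   (define (pick alts)
     (let* ((total (apply + (map weight alts)))
            (r (* total (random:uniform))))    ; uniform sample in [0, total)
       (let loop ((as alts) (acc 0.0))
         (let ((acc (+ acc (weight (car as)))))
           (if (or (null? (cdr as)) (< r acc))
               (cadr (car as))                 ; return the chosen disjunct
               (loop (cdr as) acc))))))

   (display (pick alternatives)) (newline)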

At any rate, the concept of an Action is a powerful organizing principle. The action for electronic circuits and assembly-language instructions remains unknown. The generic action for sensori-motor systems done up in Atomese remains equally out of reach.

Chatbots and Agents

Returning to the base example, consider adding a chatbot to this mix. A chatbot is a processing device that can accept text as input and generate text as output, and so is decorated with (described by) a disjunct that is (TXT- & TXT+).

The corresponding LG dictionary is

   ioagent: (OPEN+ & WRITE+);
   chatbot: (TXT- & TXT+);
   xterm: (OPEN- & TXT+) or (WRITE- & TXT-);

and one possible linkage (circuit diagram) is:

      +------------> TXT --+----> TXT --->+
      +<-- OPEN <--+-------|---> WRITE -->+
      |            |       |              |
    xterm       ioagent  chatbot       xterm

This is perhaps the simplest example of hooking up one sensory device (source of data from the environment) and one motor (device capable of acting upon the environment) to some machine (the chatbot) that does some "data processing" in the middle.
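Read as plumbing, the chatbot is just a function from a text stream to a text stream, spliced between the source and the sink. Here is a toy sketch of that plumbing in plain Guile Scheme; the names and the echoing "chatbot" are made up for illustration, and none of this is the actual agent code:

   ;; Toy plumbing sketch.  A source is a thunk that yields the next line
   ;; of text (or #f when exhausted), a sink consumes one line, and the
   ;; chatbot is a line-to-line transformer spliced between them -- the
   ;; (TXT- & TXT+) box in the diagram.
   (define (make-source lines)           ; stand-in for the left xterm
     (lambda ()
       (if (null? lines) #f
           (let ((line (car lines))) (set! lines (cdr lines)) line))))

   (define (sink line)                   ; stand-in for the right xterm
     (display line) (newline))

   (define (chatbot line)                ; the (TXT- & TXT+) processor
     (string-append "echo: " line))

   (define (run source xform sink)       ; what the ioagent sets in motion
     (let loop ((line (source)))
       (when line (sink (xform line)) (loop (source)))))

   (run (make-source '("hello" "world")) chatbot sink)   ; prints two echoed lines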

Some subtle points: in general, for some sensory device, we don't know what kind of data it delivers or accepts. That's why TXT+ and TXT- are needed. Here, TXT is a type-theory type. It's a class-label stuck onto the kinds of messages that can flow about. Likewise, OPEN and WRITE are also types. Sensory devices will typically accept OPEN messages, and motors will typically accept WRITE, but it is not obvious that this is always the case. This is why these messages have to be explicitly described in the device description.

Whether or not any given device has "open" and "write" methods on it is "described" by the dictionary, which encodes the message types that can be sent & received. So "open" and "write" become messages.

Argument Lists

The number of "arguments" and the argument types for a given message are also specified by LG connectors. For example, "open" might need a URL as an argument. For an IRC (Internet Relay Chat) chatbot, you need two arguments: the name of the IRC network and the chat channel to join. In this case, the disjunct for IRC would be (OPEN- & NET- & CHAN-), where both NET and CHAN are subtypes of TXT, and any agent attempting to open an IRC chat needs to provide text strings for these two. Those text strings must "come from somewhere"; they don't just magically appear.
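Written out in the same dictionary style as before (this is a sketch, not an actual dictionary from this repo; the entry and agent names are invented), the IRC device and an agent able to open it might look like:

   irc:      (OPEN- & NET- & CHAN-);
   ircagent: (OPEN+ & NET+ & CHAN+);

The ircagent, in turn, still has to obtain those two text strings from somewhere, since NET and CHAN are just subtypes of TXT.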

Those strings might, for example, come from an xterm, where a human user types them in. This is just like a web browser, where the URL bar plays the role of the xterm: it allows a human user to type in a URL, which then gets piped along. Just to be clear, the GUI on a web browser would also have a disjunct like (OPEN- & FLOAT+ & FLOAT+), which says that, after opening, the web browser promises to generate a click-stream of x,y window coordinates. There's also a CSS agent with a (FLOAT- & FLOAT- & URL+) disjunct, which says it will accept a pair of window x,y coordinates and return the URL that was actually clicked on by the user.
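In the same sketch style (again, these are illustrative entries, not dictionaries taken from the code), the browser GUI and the CSS agent would be:

   browser-gui: (OPEN- & FLOAT+ & FLOAT+);
   css-agent:   (FLOAT- & FLOAT- & URL+);

Linking the two FLOAT pairs turns raw click coordinates into the clicked URL, with the URL+ connector still looking for some downstream consumer.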

Connector Sex

The above examples use heterosexual +/- marks on the connectors, which, in the above examples, stand for "input" and "output". Reality is more complicated, and so the Atomese connectors have a SexNode which can be + or - but can also be other things with different kinds of mating rules. The +/- rules are enough to implement classical lambda calculus and beta reduction, so such a language is "Turing complete". Although lambda calculus is great for conventional software programming (e.g. the Lisp, Scheme programming languages), it is not appropriate for network descriptions.

To belabor the point: classical Link Grammar uses +/- connector directions in place of beta-reduction. A classical LG disjunct like S- & O+, denoting a transitive verb that connects to a subject S and an object O, could have been written using typed lambdas as

   lambda:O.x:S

which is just lambda x, where x is typed to be S, and the result of the beta-reduction with x (a common noun) is of type O. Clearly, using typed lambda calculus is awkward for expressing grammatical rules.

A different solution to this problem is seen in pregroup grammars and in combinatory categorial grammars (CCG), which use forward and backward slashes to indicate direction. Thus, notations like S/NP or VP\S show up in those notational systems. The slashes are type constructors, and VP\S is the construction of a type that indicates the object of a transitive verb. In Link Grammar, this is isomorphic to the O+ connector. The isomorphism is explicitly given in this PDF.

There are only two slash directions in CCG for the same reason that there are only + and - in classical Link Grammar: it works just fine for the English language. For languages with freer word order, subjects, objects, verbs and adjectives can appear in a fairly loose order. Classical Link Grammar was patched up to also provide h and t connector types, denoting head and tail in a conventional, classical dependency-grammar formulation. And so now, instead of two connector-direction types, there are four. Each set is heterosexual, so + mates to - and h mates to t, but the combinations are more complicated: h+ can mate to t- but not to t+ or h-, and so on.
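That mating rule is small enough to write down directly. Here is a toy sketch in plain Guile Scheme of exactly the rule just quoted (this is not the Atomese SexNode machinery; the names are invented):

   ;; Toy sketch of the extended mating rule: a direction is a two-symbol
   ;; list, a head/tail mark (h or t) and a word-order mark (+ or -).
   ;; Two directions mate only when both marks are opposite, so h+ mates
   ;; with t-, but not with t+ (same order mark) or h- (same head mark).
   (define (opposite? a b pairing)
     (or (equal? (list a b) pairing) (equal? (list b a) pairing)))

   (define (dir-mates? da db)
     (and (opposite? (car da)  (car db)  '(h t))
          (opposite? (cadr da) (cadr db) '(+ -))))

   (display (dir-mates? '(h +) '(t -))) (newline)   ; prints #t
   (display (dir-mates? '(h +) '(t +))) (newline)   ; prints #f
   (display (dir-mates? '(h +) '(h -))) (newline)   ; prints #f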

To deal with mating rules more complex than just "input" and "output" or "command" and "result", while also avoiding the need to "Schonfinkelize" or "curry" the "arguments" to a lambda, Atomese introduces the SexNode to carry these richer mating rules.

To recap: lambdas and beta-reduction are sufficient to get a Turing-complete system, but they lead to awkward notation for dagger-symmetric monoidal categories and linear type systems, where two directions need to be acknowledged. For most natural language, e.g. English, it is enough to throw away the "symmetric" part of the category, and two directions are still enough. For free-word-order languages (or freer) such as Turkish, Lithuanian, Finnish, etc., it is convenient to have more than two connection possibilities. The SexNode provides a good way to do this in Atomese.

Similar Systems

The ideas outlined above are widespread in modern software. There are many systems in the software world that already do most (but not all) of the things mentioned above. Here's a lightning review of some of these systems, focusing on the similar components.

ROS

The Robot Operating System (ROS) already implements half of the ideas described above. It has explicit sensor devices, and explicit motors, each of which can be hooked up to others for perception and action. The hookups take the form of unix UDP pipes, although recent ROS has moved to a message-passing system. Once a network pipe is connected (linked) and started, the data flows continuously, forever, until the pipe is closed.

Both sensors and motors are described with YAML files. These are the analogs of the Link Grammar dictionary. Other YAML files provide the netlist of what hooks up to what: they describe the actual robot. They list what sensors and motors are actually used in a given robot, and how they are attached to one another. The robot engineer designs these linkages. Hitting "run" hooks up all the devices, allowing data to flow on the network pipes.

One can write the netlist YAML files by hand; there are also GUI tools that can be used to drag-n-drop sensors/motors and to create links between them.

The primary issue here, from the perspective of this project, is that none of this is done in Atomese, and there's no particular sense of Link Grammar connectors or linkages. These concepts are "ad hoc" in ROS. The robot designer implicitly knows them and uses them, but there's no explicit, manipulable system or API for dealing with this. Yes, the GUI designer can open a YAML file, and "parse" what's inside of it, but the "parsing" is naive and ad hoc. The YAML files are stored as text files in a conventional file system.

This presents a challenge for an AGI system looking to have a robot embodiment: it has to navigate a file system, find a bunch of YAML files, edit those YAML files, examine megabytes of debug output dumped into log files in yet another location of the file system ... there's no YAML file that describes the logfile, that says "here's how you can read a logfile and debug it". The autopoietic aspect is missing in ROS.

Off-topic remark: writing and tweaking ROS YAML is hard. It is comparable to the difficulty of writing Atomese. Both take time and effort to learn.

GitHub Actions, CircleCI

If you use GitHub Actions, some of the above might remind you of the GitHub Actions or CircleCI YAML files. This is not an accident: the CircleCI files describe a process-network flow for performing unit tests, where the sensory device is a read of a git repo, the agents and actions compile and run the unit tests, and the output is whether the unit tests passed or failed.

Node.js

If you've ever screwed around with JavaScript and node.js, then you know package.json and package-lock.json. These are sensory-device description files (or, more accurately, agent-description files), in that they describe what a particular node.js device provides: the inputs and the outputs of the "node". You then use npm run make to parse the system and perform all the hookups. You will also be familiar with the carpet burns you get from screwing with electron. Hooking together open connectors into a functioning network is difficult.

The idea of node.js is hardly an accident. Shortly after the very first websites were created, and the very first cgi-bin scripts were written, it became clear that a sophisticated website needed a complex process flow, with user logins flowing to an authentication agent, which authorizes user access to a shopping cart and enables the movement of that shopping cart to the one-click checkout page. This is a complex functional flow, where lots of things have to flow into other things. So, of course, you have to solve a network design and dataflow problem. It has to be solved in such a way that the devops people can do their job. And so onwards ho the march of technology.

UML

The Unified Modeling Language provides a diagrammatic system for describing process flows that occur in software design. It is broad enough to have been extended to other industrial process flows, but with limited success. As a graphical, visual notation for describing systems, and for helping human engineers understand the systems they are working with, it has enjoyed some limited success.

One issue is that there's no automation per se: software does not turn into UML, or vice versa. There's no one-to-one correspondence, and the diagrams are ultimately hand-drawn. The diagrams are also a bit inscrutable: yes, there are labelled boxes with interconnecting lines drawn between them, but what do those circles and triangles mean? God forbid you draw a circle where a triangle was intended. Expending this kind of mental effort to draw UML seems a bit much. Compare this to drawing electrical or electronic circuits correctly: there the effort is worth it, because there is an actual physical system that corresponds, more-or-less one-to-one, to the diagram.

But even then: designing electronics with diagrams is hard, once one gets past a certain size. Both VHDL and Verilog turn electronic circuit design into an act of programming resembling conventional software programming. Using UML to convert software into diagrams is a technology that runs counter to natural human abilities. Yes, we think visually, but only up to a certain scale.

For large systems, the visual field gets too busy, and the forest gets lost for the trees. For small systems, where one is an expert, the compact notation provided by programming languages is easier to work with, precisely because it is more compact and dense.

Missing Concepts

The above systems seem to be missing several desirable properties:

  • Configuration is done with files, edited by hand or with GUI tools. By contrast, Atomese is stored in the AtomSpace, where it is searchable. Also, Atomese can be controlled, edited and altered by other Atomese. Atomese is "visible" to Atomese, and is thus controllable, editable and manipulable by Atomese.

  • The syntax of YAML files is not self-describing. If you want to know what some particular YAML does, you have to RTFM, and the docs are written in English. There's no way of opening a YAML file, and finding a machine-readable description of what's in there. The description is in English, and is usually on a website far away from the actual control file.

The goal of this project is to plug those gaps: to provide I/O devices that come with descriptions of how to actually use them, and to have those descriptions in a form such that logical reasoning can be performed.

Both of the above complaints can be waved away by noting that GPT-type systems can kind of deal with these complexities. A properly-trained GPT system will have had the ROS documentation and the node.js documentation in its training set, and so it kind-of-ish already knows how this stuff works. Microsoft Copilot can already write semi-coherent snippets of code that are grammatically correct, and kind-of do sort-of what you want them to do, in a way. So maybe all this effort to design a low-level system that behaves correctly for sensori-motor processing is a waste of time, and I should just kick back and wait for GPT and OpenAI to figure it out. Who the hell knows. It is a plausible answer.

Conclusion

There's no "one ring to rule them all". I am not aware of any generic comp-sci theory for exploring the autogeneration of networks. Doing Atomese Sensory is sort of my personal best-guess for the generic theory/system of "making things hook together into a sensible autopoetic system". The "basal cognition" for agent-environment interaction.

I'm not sure if any of this is useful, or if this is just a screwball low-level academic exercise in comp-sci trivia. That's why this whole git repo is labelled "experiment". Build it, and see what happens.