In this article I will survey the types and definitions of agents, eventually focusing on those useful for engineering. Because it is simply silly to discuss software agents without distinguishing them from other known types of software, I will venture to offer a definition. It will be iconoclastic and perhaps applicable only to a certain type of engineering agent. But it will be useful in identifying some technical implementation issues.
The Franklin and Graesser paper is a good paper because it 1) surveys various agents, 2) presents a reasoned taxonomy based on features, and 3) avoids assigning any meaning to the word "intelligent". However, it proposes a "mathematically formal" definition: "An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future." This is, of course, not a definition any mathematician would recognize as formal. The idea of "senses in the future" is just too open to interpretation to be an objective, much less formal, definition. Moreover, it equates being an agent with this quality of being "autonomous". [Only autonomous agents were defined - other kinds of agents may exist. - Private communication from Stan Franklin, June 1996.]
For Foner, an agent is necessarily "intelligent" and "autonomy" is just one crucial characteristic. His definition of autonomy comes a bit closer to an operational semantics: "This requires aspects of periodic action, spontaneous execution, and initiative, in that the agent must be able to take preemptive or independent actions that will eventually benefit the user."
There are three major problems with attempts to define "agents" as "intelligent". First, as I have alluded to above, the adjectives "intelligent" and "autonomous" are, so far, subjective labels. The Foner definition suggests that there might be a test for autonomy, but saying that some action is "preemptive" or "independent" does not get us far. This definition of intelligence, like all such definitions, depends upon the opinion of an intelligent observer after interacting with the candidate agent.
Furthermore, the example agent, Julia, does not exhibit much initiative. The fact that Julia maps a maze without direction from the users with whom she interacts does not distinguish Julia from almost any other software that performs a background task while answering queries from users and performing other tasks when directed, such as message forwarding. In fact, Julia never interrupts to volunteer information except to deliver a message as directed: she speaks only when spoken to. Julia's claim to intelligence is much more of the Eliza sort: Julia strikes users as a person. And indeed, the implementation and documentation suggest that Julia is intended to pass a Turing test just above the level of Eliza.
Second, these subjective labels are applicable only to an epiphenomenon rather than to a design objective. Except to pass a Turing test, no one sets out to build an "intelligent agent", as that is a poor target for software. One sets out to build an agent that accomplishes a task, in hopes that the task is so difficult, or is so well accomplished, that the agent might be considered intelligent or somehow self-directed. This begs the question of why the agent is an agent, and not some other kind of software.
Third, various definitions of intelligence exist, but the main deficiency of such a label is that it does not sufficiently distinguish the resulting software from other technologies that may also claim intelligence as an attribute. One can take any definition of intelligent software that covers the work in Artificial Intelligence and find that it does not serve to distinguish "agents" as a kind of software. The point is that if it is claimed that to be an agent is to be intelligent, then we have still begged the question of what is an "agent" apart from all of the other intelligent software that has been developed.
Nevertheless, autonomy seems to be central to agenthood. For instance, Pattie Maes' Autonomous Agents Group clearly has identified a group of good research projects under a common theme. And Reddy, Foner, Franklin, and Graesser all point to autonomy as critical to the notion of an agent. But what is an operational, objective definition of autonomy? Is there even a subjective Turing test for autonomy? Is there autonomy without intelligence?
When the terms "autonomous" or "intelligent" is used it is clear the user means the software to be something more than a mere server, mobile or not. Often, the term is only a reference to a context of a community and technology. With respect to agents, the "intelligent" label often refers to a concern with abstract, domain-independent theories of agent architecture and communication and/or aspects of human characteristics. That "autonomous" is emerging as an important characteristic does not mean that it is yet sufficiently well-defined a term to have a formal technical meaning.
Just to drive this into the ground, one might say that intelligent software that is accessible via the Internet is an agent, but this would then include a continuously running expert system to which one could open a remote display. Where's the novel technology in this? Very simply, the term "intelligence" does not sufficiently specify agenthood. The term "autonomy" gets at something more, something that expert systems are not. But how did I know that? What's the objective, operational definition? And what difference would it make in software design? These are the sorts of questions one needs to answer in designing agents for engineering.
Finally, we note that there are formal definitions of autonomy and agenthood. Wooldridge and Jennings give a comprehensive overview of theories of "strong" agenthood in their paper "Intelligent Agents: Theory and Practice". It is just that there are many theories, and this is why using subjective terms (e.g., "intention" and "belief") makes agenthood debatable. Similarly, as previously noted, the Franklin and Graesser paper uses subjective terms in its formal definition of autonomous agents. This is not to say that such terms should not be used or formal theories not developed: just that "intelligent/autonomous agents" is a term that, for the moment, is not of obvious utility, and competing theories are best left to the research literature. These definitions have nothing to do with the World-Wide Web and are not very helpful for the integration and interaction of engineering agents, as we shall see.
Some examples of these agents include the "BargainFinder Agent" and Cyber Yenta, which perform searches for the user. The first is incredibly simpleminded, and the latter is an incredibly simplified version of the original MIT Yenta work, which is still in progress. The claim to intelligence here is basically string matching. Along the same lines, but somewhat more "AIish", CompassWare offers an Intelligent News Filter that parses natural language to perform a search.
Certainly it is far from clear that any of this web-based software should be described as "intelligent", regardless of the definition of "agent". This point is well made in Foner's paper as well as in articles in the trade press [Griswold 96]. Rather than dwell on the fact that this software is not so clever, I would like to note that there are more useful descriptions of this kind of software than "agent".
BargainFinder, Yenta, and CompassWare are essentially one-time query answering mechanisms, much like the "MetaCrawler Multi-Threaded Web Search Service" (and the very useful Ahoy!). And even though AlphaConnect searches even legacy systems and translates the results into a variety of formats, it is still a search service. It is notable in that updating is automatic, but since this happens according to a user-defined schedule, much like the automatic timer in your house, it is also not very autonomous.
The term "agent" may connote that these software services contact other sources of information and compile it according to the parameters set by the user, but we already have a perfectly good word in computer engineering for such mechanisms: "server". Notice also that these servers also do not move far from the familiar database servers that answer carefully formatted questions. Calling these servers "agents" may be good marketing but obscures the technical understanding of the mechanisms. While I am against proscribing the use of the term "agent" in general, I find it helpful to understand that these are servers in the same sense that a (perhaps distributed) database server is: I send a query and get back a response. No other behavior is implied by "server" and none is exhibited by these "search agents".
Then there is the software claiming to be "intelligent agents" because the software is mobile and can go from machine to machine performing tasks on behalf of the human that spawned the agents. One of the better-known examples of enabling technology for these kinds of agents is General Magic's Telescript. Sun's Java is often also touted as this kind of agent development technology, though its "applets" are even less likely candidates for agenthood than Telescript's remote processes, and a characterization of these as "agents" is highly controversial among writers to the agents email list. However, at least one vendor has used the Java technology to build a competitor to Telescript: CyberAgent (not to mention research efforts such as Bill Li's Java-To-Go framework). Let us agree to call some software applications built upon this technology "mobile agents", but understand that the crucial technical meaning is an infrastructure (e.g., a "Listener" for CyberAgent) that allows processes to run securely on foreign machines. That this functionality has previously existed in other computer engineering mechanisms (e.g., RPCs, telnetting, and distributed computing) does not detract from the utility of these new mechanisms. Let us understand that this is what is meant by "mobile agent", though "mobile process" would be less confusing.
We follow Genesereth's approach but differ somewhat from the definition in that paper in light of our experience with Next-Link agents and comparisons with other KQML-like agents: our Typed-Message Agents are defined in terms of communities of agents. (We may also call these "ACL Agents" after Genesereth.) The community must exchange messages in order to accomplish a task. They must use a shared message protocol, such as KQML, in which some of the message semantics are typed and independent of the application. And the semantics of the message protocol necessitate that the transport protocol be not merely client/server but peer-to-peer. An individual software module is not an agent at all if it can communicate with the other candidate agents using only a client/server protocol without degradation of the collective task performance.
The typed-message requirement also differentiates agents from object-oriented programming technologies that also use message passing to collectively perform tasks. The difference is the commitment to an application-independent protocol of typed messages. This is not to say that, for example, KQML agents could not be implemented using OO techniques. But OO programming does not make any commitment to such a protocol. CORBA does not make such commitments, though it makes others. It is the commitments that define a technology. (Whether these are good or bad commitments is another issue.)
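To make this commitment concrete, the following is a minimal sketch, in Python, of a typed message in the general style of KQML: the performative types are fixed by the protocol and application-independent, while only the content is domain-specific. The class and field names here are illustrative, not the API of any actual KQML implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Application-independent message types (performatives), in the spirit of KQML.
# The protocol fixes the semantics of these types; only the content varies
# with the application.
PERFORMATIVES = {"ask", "tell", "subscribe", "reply", "sorry"}

@dataclass
class TypedMessage:
    performative: str                  # typed, domain-independent semantics
    sender: str
    receiver: str
    content: str                       # domain-specific payload, e.g., in KIF
    reply_with: Optional[str] = None   # label a query so replies can cite it
    in_reply_to: Optional[str] = None  # tie a (possibly volunteered) reply back

    def __post_init__(self):
        if self.performative not in PERFORMATIVES:
            raise ValueError(f"unknown message type: {self.performative}")

# A domain-specific payload inside a domain-independent envelope:
query = TypedMessage("ask", "designer-1", "stress-analyzer",
                     "(max-stress beam-7 ?s)", reply_with="q42")
```

An object message such as a hypothetical `beam.getMaxStress()` call carries no such envelope: its meaning is fixed entirely by the application's class design, which is exactly the commitment that object-oriented messaging does not make.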
Such printer daemons as agents are still a counterexample - they go against the intuition, especially that of autonomy, which continues to lurk in the background. Agents should be more, but more what? Imagine a printer daemon that not only sends you a "Sorry" message but remembers why the request did not work. Suppose the fault lay with the inability to fetch a file on a remote machine that was down. Suppose the next day, the remote machine comes up and the printer daemon sends you a "Reply" or a notification referring to your previously denied request and asking whether you would like that file printed after all. This daemon begins to feel more correctly labeled an agent. What can one point to from a systems standpoint that makes a difference?
This last daemon/agent sent a message that was not a simple one-time response to a request. Instead, it seemed to volunteer information. It initiated a message. If it had been a mere server, it could not have done this. A client/server protocol, admitting one reply to one request, would not have permitted this transaction. The point is that client/server protocols do not allow servers to initiate messages, later volunteering surprising but useful information. The protocol must be peer-to-peer to allow this. Thus a peer-to-peer protocol is a necessary condition for at least typed-message agenthood.
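To pin down what changed from a systems standpoint, here is a minimal sketch, in Python, of the behavior just described. All of the names (the PrinterDaemon class, its send callback, on_host_up) are hypothetical illustrations, not any existing print spooler's interface.

```python
from typing import Callable, List, Tuple

class PrinterDaemon:
    """A daemon that remembers failed requests and later volunteers a follow-up."""

    def __init__(self, send: Callable[[str, str], None]):
        self.send = send   # peer-to-peer: the daemon may initiate messages
        self.pending: List[Tuple[str, str]] = []  # remembered failures: (user, path)

    def handle_print_request(self, user: str, path: str) -> None:
        if not self.fetch(path):
            # The one-time "Sorry" reply: this much a mere server could do.
            self.send(user, f"Sorry, could not fetch {path}")
            self.pending.append((user, path))

    def on_host_up(self, host: str) -> None:
        # A volunteered message, initiated by the daemon rather than prompted
        # by any request. A strict client/server protocol has no slot for it.
        for user, path in list(self.pending):
            if path.startswith(host + ":"):
                self.send(user, f"{path} is reachable again; print it after all?")
                self.pending.remove((user, path))

    def fetch(self, path: str) -> bool:
        return False  # stub: pretend the remote machine holding the file is down
```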
Behind this emphasis on protocol is the intuition that "real agents" save the state of part of the collective problem solving task and contribute to the task by reasoning about changes in that state. This is a very real criterion, relevant to autonomy, but difficult to demonstrate. However, the initiation of multiple messages that are relevant to an earlier query is obvious and can be objectively determined. It may not be a sufficient condition for "autonomy", but we claim, as a thesis, that it is an important necessary condition. Moreover, it is a useful condition, because this protocol criterion also has computational consequences.
As a technical example, anyone who has tried to use a CGI-bin program to communicate with a KQML agent will quickly discover the inadequacy of HTTP for this purpose. Such an example can be found in the Redux' Trip Agent demonstration. Multiple messages from the KQML agent are lost with the standard HTTP protocol of request/reply. The only remedy is to hold open the connection and use the advanced HTTP function of "server push". The addition of "client pull" and "server push" with HTML 3.0 effectively makes HTTP a peer-to-peer protocol and thus useful for agent communications, though this is an awkward "workaround" for a basically client/server protocol.
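For concreteness, the "server push" workaround amounts to holding the connection open and streaming successive message bodies as parts of one long response. A minimal Python sketch of the idea follows, using the multipart/x-mixed-replace content type that Netscape-era browsers understood for server push; it is an illustration of the mechanism, not a production server.

```python
import socket

BOUNDARY = "kqml-part"

def push_messages(conn: socket.socket, messages) -> None:
    """Hold one HTTP connection open and push each message as a new part."""
    conn.sendall(
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: multipart/x-mixed-replace; boundary=" +
        BOUNDARY.encode() + b"\r\n\r\n"
    )
    # Under plain request/reply, every message after the first would be lost.
    for msg in messages:
        part = f"--{BOUNDARY}\r\nContent-Type: text/plain\r\n\r\n{msg}\r\n"
        conn.sendall(part.encode())
    conn.sendall(f"--{BOUNDARY}--\r\n".encode())
```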
There may be further requirements on agenthood not covered by this necessity for peer-to-peer protocols. For example, the use of a public explicit ontology that allows term usage to be reasoned about for collaboration makes for stronger agenthood. Perhaps others will develop more counterexamples that will sharpen the intuition about surprise and message exchange. For instance, database servers that have publish-subscribe notions with peer-to-peer message protocols, if implemented with message types such as "Subscribe", could be considered typed-message agents by this definition (see the sketch below), yet database systems are usually considered too simple to be agents. Whether this is because of ignorance of advanced database functionality or whether further sharpening is required, the omission of mere "servers" is sufficient for our purposes. We also note that our criterion is a continuum: an agent is an agent to the degree that it collaborates with other agents using volunteered messages.
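In terms of the TypedMessage sketch above, such a publish-subscribe server would qualify because a single "subscribe" message licenses many later volunteered "tell" messages. The example below is illustrative only.

```python
# One subscription, many later volunteered replies: impossible under a strict
# one-reply-per-request client/server protocol.
sub = TypedMessage("subscribe", "designer-1", "parts-db",
                   "(price bearing-204 ?p)", reply_with="q7")
# ... later, whenever the price changes, the server itself initiates:
update = TypedMessage("tell", "parts-db", "designer-1",
                      "(price bearing-204 12.40)", in_reply_to="q7")
```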
Engineering agents are typically research projects such as the Lockheed COSMOS system and the MACE application agents, the CONCUR examples, the CIFE agent projects (especially the ACL effort), the STRAND finite element analysis system of agents, and the Next-Link agent framework. All of these examples are typed-message agents; in fact, they are KQML agents, though the "flavor" of KQML varies. These agents also make varying degrees of commitment to a content language, such as KIF, and to ontologies. Perhaps most important, they all allow engineers in multiple disciplines to collaborate with other engineers and software services. The engineer is generally aware of these other agents and exchanges messages with them.
Most of these engineering agents are focused on a particular engineering project application, following the PACT example that generated the KQML infrastructure. The MACE and Next-Link applications have similar features to PACT. All of the engineering agents of these examples have one thing in common - none of them use web-based interfaces or agents, unlike the engineering servers. That is, the distinction between servers and agents is not merely academic - there has so far been a real schism between web-based servers and agents in at least the engineering domain.
A major reason for this is the requirement for peer-to-peer protocols as discussed in the previous section. Server push capabilities and Java are new and were not originally part of these projects. But there is another major factor as well.
Consider the successful MADEFAST experiment in collaborative design. Several universities and companies collaboratively designed, from scratch, and built a prototype missile seeker in six months. Several Internet-based collaborative tools were tried, and the WWW was clearly the most effective common technology. But no agents were used, nor have they been since.
Rather than protocol, in this case, the problem is the structure of information. Web pages are structured to the extent that HTML is, but none of the HTML tags correspond to the type of structure required by the engineering typed-message agents. HTML tags describe format. Agents need task-based semantics. Knowing that a word is bold-faced does no good to an agent that needs a task-level computable structure. The extensive web pages that document the MADEFAST design cannot be read by agents, and so agents cannot participate.
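The contrast can be made concrete with a small Python illustration. The semantic tag shown is hypothetical, in the spirit of the semantic HTML extensions discussed below, not an actual HTML tag.

```python
import re

# What the page says, versus what an agent can compute with.
format_markup = "<b>5000</b> N maximum load"             # HTML: a bold-faced number
semantic_markup = '<max-load units="N">5000</max-load>'  # hypothetical task-level tag

# From the format tag an agent learns only typography; from the semantic tag
# it can recover a task-level quantity.
m = re.search(r'<max-load units="(\w+)">([\d.]+)</max-load>', semantic_markup)
if m:
    units, value = m.group(1), float(m.group(2))
    print(f"maximum load = {value} {units}")   # -> maximum load = 5000.0 N
```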
The reverse is also true. Agents typically don't produce web pages. It's not that they could not. But doing so is not part of the commitment that defines the typed-message agent technology and is not needed for computation. Thus the web and these agents tend not to interact, but rather go their separate ways.
Servers use forms and values to obtain structure (and do generate web pages). But, in general, this approach is not very different from the OO programming paradigm. The messages being sent, consisting of named values, are entirely domain-specific. There is no shared common protocol, such as KQML, with some typed semantics that are domain-independent. Thus the web environment, with its fundamentally client/server nature and unstructured data, is not conducive to agents - it might even be called "hostile".
Some non-engineering agents have been connected to the web and do not have the problem of multiple messages from agents. The initiation of messages, for collaborative tasks, will be found in AI-style intelligent agents such as collaborative email filtering. The distinction between the previously discussed web servers and collaborative agents initiating messages shows up strikingly in the recent commercialization of some of the MIT Autonomous Agent research, such as "Firefly", an intelligent music recommendation service, "Webdoggie", a personalized document filtering system, and the Cyber Yenta. Each of these is built upon the notion that agents that help people can help better if they learn from each other. So an agent recommending music for you, The Similarity Engine, for example, will try to find agents that are recommending music for other people like you. Thus, in the background, messages are being exchanged and, presumably, volunteered in an autonomous manner, though this is not clear from the documentation. However, for the user, this appears to be just a server. A query is made and an answer is returned.
The nature of web communications, in some sense, trivializes the agent collaboration process. The web is client/server-based, and the autonomous agents require peer-to-peer communications. Thus agents become an underlying technology for a web server, but it is easy to distinguish the behavior on the basis of message exchange. The result is that there is no agent behavior observable to the user. Users are not aware of other users or of other agents. Thus there is no problem with multiple messages and clients.
The web-based problems remain with engineering agents that do need to connect users in a collaborative task. Java seems to offer a more advanced, flexible approach for web-based agents, especially Rob Frost's Java Agent Template (JAT), which facilitates writing Java agents that send KQML messages. This approach is very promising in that it will allow people to interact with agents through browsers with nice interfaces. (NOTE: JAT was superseded after publication of this article by JATLite.)
However, our group is already working on version 3.0 because we have found fundamental problems, mostly with respect to the need to have open peer-to-peer connections between the agents and browser clients. In particular, the client nature of browsers is reflected in the limitation that an applet can only open a connection to the server that spawned it. In order to send messages to multiple agents elsewhere on the Internet, it is necessary to write an agent router. This agent router must also keep connections open, another deviation from the web connection paradigm of one connection per request, and one that requires substantial changes to the first versions of the JAT.
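A minimal sketch of such an agent router in Python follows. The design (a registry of held-open connections, forwarding by receiver name) reflects the requirement just described; the class and method names are illustrative and are not the actual JAT interface.

```python
import socket
import threading
from typing import Dict

class AgentRouter:
    """Relays messages between browser applets and agents elsewhere on the net.

    Because an applet may only connect back to the server that spawned it,
    that server runs this router, which holds connections open and forwards
    messages onward by receiver name."""

    def __init__(self):
        self.connections: Dict[str, socket.socket] = {}  # name -> held-open socket
        self.lock = threading.Lock()

    def register(self, name: str, conn: socket.socket) -> None:
        # Keeping the socket open deviates from the web's
        # one-connection-per-request paradigm.
        with self.lock:
            self.connections[name] = conn

    def route(self, receiver: str, message: bytes) -> None:
        with self.lock:
            conn = self.connections.get(receiver)
        if conn is not None:
            conn.sendall(message)
        # A fuller router would queue messages for receivers not yet connected.
```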
After the JAT allows multiple client browsers to connect to multiple agents, we will have a first integration of the web and agents at the level of access to agents. We still will have no access by agents to web pages. That is, we will have solved most of the protocol problem and will have some nice demonstrations, but web pages will still be unreadable by agents. Something more is needed.
There are two promising approaches to the structure problem. One is the conversion of web pages to relational databases in the Stanford Infomaster project. (There is related work in the reverse direction [Dossick and Kaiser 96].) Another approach starts at the data authoring. The ABSML approach is to extend HTML with tags that carry semantics. ABSML is currently used in Lockheed's MECE engineering design documentation system. We are working with Lockheed to extend this approach to include tags that can also be meaningful to Next-Link agents. It is not clear that a mark-up language like HTML is sufficient for this purpose, and we may have to convert to a more specialized design language.
The same method of extending HTML is being used to document decision argumentation in Zeno. A similar but more flexible way to add semantics to HTML documents is to add tags that refer to ontologies. This seems the most promising approach for providing agents with access to web pages.
These two fundamental sources of incompatibility must be addressed for each technology to leverage the other. The JAT work seems to offer hope for overcoming the protocol problem. The lack of semantic structure in HTML documents is an even larger problem but may be addressed in the future by advanced authoring tools and by programs that can extract semantics from web documents. Relatively simple examples of both approaches exist today. However, much work remains before useful engineering agents emerge on the web.
Finally, we note that many of the agents being developed for engineering applications are of the "weak" kind in that there is no commitment to powerful reasoning by the individual agents. In fact, "dumb" legacy systems can be accommodated by the typed-message approach, which commits only to an application-independent protocol. The protocol may be derived from a "strong" theory of agents, as advocated by Haddadi [Haddadi 96], or from a theory of design, as with the Next-Link agent protocol. In both cases, the result is that typed-message agent-based systems can add value to engineering systems and even integrate heterogeneous services, though no individual agent might be characterized as "intelligent". In short, weak agents can be powerful as well as well-defined.
[Griswold 96] Griswold, S., "Unleashing Agents: The first wave of products incorporating software agent technology has hit the market. See what's afoot," Internet World, 7:5, May 1996.
[Haddadi 96] Haddadi, A., _Communication and Cooperation in Agent Systems: A Pragmatic Theory_, Springer Verlag, Lecture Notes in Computer Science, No. 1056, 1996.
[Reddy 96] Reddy, R., "To Dream the Possible Dream," 1996 Turing Award Lecture in Communications of the ACM, 39:5, May, 1996.