This white paper is a (DRAFT) companion to the more formal
"Agent-Based Project Management" paper.

Process Coordination

Introduction

Businesses today understand the importance of process to the extent that they will invest in process reengineering, based upon enterprise modeling and organizational models such as VDT, the Process Handbook, and PIF. These address static process and organizational design but do not provide process execution management.

Industry understands integration to the extent that it is willing to invest in SAP, PeopleSoft, and similar systems for linking software applications, but these systems do not provide runtime coordination of the applications they link.

Companies understand information to the extent that they invest in data warehousing and similar knowledge management techniques in order to facilitate access to information by the people who need it. But these techniques do not provide any support for runtime notification of the people who need the information.

There is another, active, level of process integration that is missing. It is one that spans organizations, directly connects peers, and provides change notification exactly to those who need it the most, when they need it. Process Coordination is the management of dependencies among tasks and agents in an enterprise in order to reduce time and costs and improve the outcome. The people who do the work and have the most knowledge are placed into contact with one another on an as-needed basis.

Isn't that Workflow? Project Management?

For a good overview of traditional workflow, see http://cne.gmu.edu/modules/workflow/. Basic workflow systems manage documents and their processing. A designer determines what kinds of actions need to be taken, under what conditions, and by whom, and encodes this into a workflow process. People assume roles in processing the documents.

Some of the newer systems can be entirely web-based, such as the Meteor system, while older ones are based on proprietary software clients. An example of the move toward the web is Fujitsu's incompatible change from TeamWARE to I-Flow, the latter being based on Java and HTTP. Most vendors now offer web-based products, but these have been largely proprietary and not interoperable.

Sun, Netscape, Hewlett-Packard, and others, including many workflow vendors, originally supported an emerging standard called SWAP (Simple Workflow Access Protocol) as part of the Workflow Management Coalition. This proposed protocol was a simple extension of the forthcoming HTTP 1.1 and XML standards. It was intended to provide a vendor-independent way to define processes, actions, and documents, allowing heterogeneous workflow systems to interoperate. However, it was eventually recognized that a) doing this in the HTTP protocol was a bad idea, and b) XML provided a much better platform for such a standard. In 1999, the Coalition released a first draft of an XML-based interoperability standard.


Moving Documents

All of these systems facilitate the invocation of various actions on the documents and their processing. This emphasis on documents comes from the origins of workflow in document digitizing and forms routing. It is a very powerful notion: paper can be electronically processed according to a general plan of who should do what under what conditions. But this is more a matter of "collaborative" document sharing than real-time coordination of tasks.

One step up the chain of management technology is BidCom, which focuses on integrating construction project participants with an elaborate and carefully designed system for sharing information and views using Internet technologies. This is not a task-based execution management system, but it does help people share documents and information, and it seems to be a dramatic jump in standard practice. Still, there is no systematic technology for the coordination of tasks beyond that of standard project management and workflow technologies.


Flores/Winograd Loop

Action Technologies and others have elaborated a workflow system that moves more in the direction of coordination, based upon an insight by Fernando Flores and Terry Winograd in the late 1970's. This is the idea that one can analyze processes in terms of loops of requesting, making, and fulfilling commitments between people. Again, this is a simple and powerful idea that has proven to be very useful.


A Static Process Model

But like the previous workflow systems, this coordination-biased workflow methodology is a static analysis of the process that generates a runtime support system based upon relatively limited conditions. Moreover, static process analysis has come a long way since then and is now much more sophisticated. For instance, consider the design structure matrix work by S. D. Eppinger at MIT.


A Simple Project Task Flow

If one looks at the state of the art in managing the execution of a process of people and software trying to accomplish tasks, one moves into the domain of project management. Such systems permit a more thorough job of analyzing, planning, and scheduling the process before it begins.


Complex Project Task Flow with Roles


Project Scheduling

These support systems generally take a task-oriented approach rather than the workflow document-oriented approach. This means that documents are only one kind of input/output exchanged between tasks. And tasks have important attributes like duration, precedence, and start and stop times. This information is most often used to optimize the process, especially if the organization can be taken into account.
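
To make this task-oriented model concrete, the following is a minimal sketch (in Python, with hypothetical names; no particular product is implied) in which each task carries a duration and its predecessors, and the start and stop times are derived from them:

    from dataclasses import dataclass, field

    # Hypothetical, minimal task model: duration and precedence are given;
    # start and stop times are derived rather than fixed up front.
    @dataclass
    class Task:
        name: str
        duration: int                        # e.g. in days
        predecessors: list = field(default_factory=list)
        start: int = 0
        stop: int = 0

    def forward_pass(tasks):
        """Compute earliest start/stop times from durations and precedence."""
        for t in tasks:                      # assumes tasks are listed in precedence order
            t.start = max((p.stop for p in t.predecessors), default=0)
            t.stop = t.start + t.duration
        return tasks

    roof = Task("roof", duration=10)
    plaster = Task("plaster walls", duration=5, predecessors=[roof])
    forward_pass([roof, plaster])
    print(plaster.start, plaster.stop)       # 10 15

The point is only that tasks, not documents, are the unit of exchange, and that start and stop times can be computed, and re-computed, from the plan.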


Task Monitoring

However, both workflow and project management techniques can be used not only for static analysis but also, to a limited extent, to monitor the actual execution of the process. At least the status of tasks, the consumption of resources, and whether tasks have been accomplished on time can be tracked. One knows to whom the results of an accomplished task need to be sent. Such runtime support for managing process execution is the minimum for Process Coordination.

But these technologies still concentrate on analysis of the process prior to execution. The real complexity of projects comes in trying to carry out a plan under changing conditions, especially when the project has many players. Even though the model of tasks in Project Management is more sophisticated than that of Workflow, it is still insufficient.

To see this, consider a situation in which it was planned that the roof of a building had to be put on before the wall could be plastered. This is an example of task precedence, used in project management. But suppose the reason for this is that the planned roof tiles were so heavy that they would buckle the wall, cracking the plaster if it was applied first. And then suppose the architect changed the roofing tiles to a much lighter material that would not have this effect. The plan should now be re-evaluated for the possibility of working on the roof and wall in parallel. No project management tool, much less workflow, will take care of such cases of process execution change.
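
One way to see what is missing is to imagine that each precedence constraint recorded the design assumption that justifies it. The sketch below uses hypothetical names throughout and is not a feature of any existing tool; it shows only how a change to a design parameter could be traced to the orderings that depend on it:

    from dataclasses import dataclass

    # Hypothetical: a precedence constraint that carries its rationale and the
    # design parameter that rationale depends on.
    @dataclass
    class Precedence:
        before: str
        after: str
        rationale: str
        depends_on: str

    constraints = [
        Precedence("build roof", "plaster walls",
                   rationale="heavy tiles would buckle the wall and crack new plaster",
                   depends_on="roof_tile_weight"),
    ]

    def review_constraints(changed_parameter):
        """Find precedence links whose rationale rests on a changed design parameter."""
        return [c for c in constraints if c.depends_on == changed_parameter]

    # The architect switches to much lighter tiles, so the ordering must be re-examined.
    for c in review_constraints("roof_tile_weight"):
        print(f"Re-evaluate: '{c.before}' before '{c.after}' ({c.rationale})")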

Moreover, tasks change as companies and people change. Traditional workflow systems do not allow people to decide on the fly how best to accomplish the current task, including who should be notified next and what documents to provide to them.

The key innovation that is needed for real Process Coordination is change propagation for distributed projects.

Process Coordination

Process coordination is not process control. Rather than the process being tightly or centrally controlled, process coordination allows individual agents to make good decisions and to be notified of changes. Process coordination is more than simple workflow management that determines to whom a document should be routed next.

Process coordination is the runtime determination of who should be notified of what effect of the last change in the project. Computer support for this means some automation of the notification and tracking of task properties as they are created or changed. Process Coordination can be as simple as informing someone that, as a result of a new order, the shipments are being aggregated in another warehouse and therefore a forklift is not needed at the previously assigned shipping dock.
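
As an illustration only (none of the names below come from an actual system), the core bookkeeping can be sketched as a table of dependencies from task attributes to the agents who rely on them; a change then notifies exactly those agents, and no one else:

    from collections import defaultdict

    # Hypothetical dependency table: (task, attribute) -> agents who rely on it.
    dependencies = defaultdict(list)

    def depends_on(agent, task, attribute):
        dependencies[(task, attribute)].append(agent)

    def notify(agent, message):
        print(f"to {agent}: {message}")      # stand-in for email, a task inbox, etc.

    def propagate_change(task, attribute, new_value, reason):
        """Runtime determination of who should be told about the latest change."""
        for agent in dependencies[(task, attribute)]:
            notify(agent, f"{task}.{attribute} is now '{new_value}': {reason}")

    # The forklift example above: the dock assignment is what the operator depends on.
    depends_on("forklift operator", "order 1138", "shipping dock")
    propagate_change("order 1138", "shipping dock", "warehouse B",
                     "shipments aggregated at another warehouse; no forklift needed here")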

Distributed Integrated Project Management (DPIM) is an extreme form of Process Coordination in which design, planning, scheduling, and execution are interleaved across distributed organizations and engineering disciplines as well as computer tools.

The Trouble with Large Projects

Process coordination is most complex in domains that require artifact design and construction planning, especially when the artifacts are large and many people and software tools must be coordinated and managed.

When an artifact, such as an airplane, is designed and constructed by a federation of companies and individuals who work in different places, and perhaps even different time zones, the problem of determining who should be notified of what change becomes more complex. Originally, airplanes were designed by a group of engineers working in a single room. This is no longer the case. In fact, while thousands of companies may work on a plane, fewer than a hundred people may ever touch it prior to completion.

The international space station shows the problem in its extreme form. Multiple companies in different countries, in different engineering disciplines, must cooperate and coordinate their plans and activities. Typically this is done under the aegis of a master contractor, such as NASA. All engineering changes filter up through an extensive management hierarchy until they arrive at a sufficient level of abstraction that, either by prescribed procedure or by the intelligence of some engineering manager, a connection with some other part of the project is perceived and approval can be requested.

Such processes and procedures are time-consuming, costly, and generally inefficient. This way of doing business places a severe upper bound on the complexity of projects that can be successfully completed, even with time and cost overruns.

When design changes cause plan and schedule changes, the problem is worse than simply modifying the design. Somehow, all the people assigned to the affected tasks, and no one else, should be notified of the change and how it affects them. If the task of procuring spare parts is negated by a change in design so that the part is no longer necessary, then procurement should be notified. Such notification sometimes fails to occur, causing wasteful expenditures, especially in large projects.

This difficulty is reflected in the expense of coordinating such projects and in suboptimal results. The former is revealed by a measurement showing that 51% of labor is interactive [McKinsey], and a survey at Boeing shows that 60% of labor is devoted to coordination work [Benda].

The problem is not lack of connectivity. With the Internet and intranets, combinations of email and groupware have enabled everyone to reach everyone else with ease. In fact, one could make a case that the ability to task each other so easily is actually making the problem worse. The problem is that there is insufficient structure supporting the distributed task interactions of modern enterprises, especially for project management.

Let us examine three characteristics of design engineering, planning, development, and maintenance of modern planes, buildings, ships, and similar complex artifacts that make them especially difficult:

  1. the inevitability of change in both design and planning,
  2. the complexity of task interactions, and
  3. the distribution of knowledge and tasks.

Change Inevitability

In general, most project plans can be tossed out the door about a week after they are first made. Only the largest tasks can be tracked and rescheduled. And often project planning becomes constant rescheduling because the interactions among the tasks cannot be seen and managed by all the participants. And there is no escaping such change.

Even if one plans very carefully and makes the process as efficient as possible, and even if all possible task interactions were correctly and completely analyzed, change will be necessary because of incomplete information and events that occur in the world that are not controlled by the planner. Planning is always a form of "guessing"; the plan and schedule will have to change as execution goes on and more information becomes available.

In addition to simple uncertainty, the sources of plan change are typically changes in the design, unexpected conditions in the environment such as poor weather or delays in transportation, and tasks completing faster or more slowly than scheduled. Resources may be unexpectedly unavailable or, on the other hand, may turn out better than expected. If the crane subcontractor has bought a new crane that can lift the steel to the higher floors faster than planned, the plan should be optimized to take advantage of this.

But even if one could conceive of a large project without change orders, change is still inevitable for large artifacts.

If we consider artifacts such as planes, space stations, and even office buildings, we can see that they are long-lived and important to maintain. However, the longevity makes maintenance and change even more difficult as new people replace original team members. The longer-lived the artifact, the worse the problem.

Thus, we can summarize the sources of change as:

  1. changes in the design,
  2. unexpected conditions in the environment,
  3. tasks completing faster or more slowly than scheduled,
  4. unexpected changes in resource availability,
  5. incomplete information at planning time, and
  6. the long life of the artifact and the turnover of the people who maintain it.

Interaction Complexity

Plans also change in order to resolve conflicts arising from unexpected task interactions, such as the discovery that planned work will completely cut off access to an office that must remain in service.

The complexity of task interaction makes change in design alone difficult to manage. A change in operating voltage necessitates a change in transformer size, which necessitates a change in mechanical design, which then propagates further to other engineering disciplines in the project. That is, just at the level of design, artifact feature interaction requires management.

Actually developing an artifact, even a prototype, much less a finished product such as a building, further complicates the problem. One of the problems is that it is almost always the case that one must plan, schedule, and execute the plan simultaneously.

Planning generally refers to task definition, including precedence relationships and duration. Scheduling involves setting start and stop dates for the tasks. If the tasks change at the plan level, then some rescheduling must occur. Conversely, changes in the schedule may cause the planner to consider different ways to accomplish the task. One cause of rescheduling is that the time of completion of the task was unexpected. If the time was later than expected, this causes a problem that must be solved. If the time was earlier than expected, it may be possible to shorten the entire project. Thus, there are interactions between task planning, scheduling, and task execution.

Design changes also feed back into the plan. This can occur simply because conditions force design changes prior to completion. Or the design may not be complete by the time construction begins.

"Fast track" construction allows the architect to finish or change the design after construction has started. These design changes frequently necessitate a change in the construction plan and/or schedule. As a simple example, suppose that the design calls for concrete roof tiles. Then the wall plaster must be applied after the roof is built, or the heavy tiles will deform the walls, causing the plaster to crack. But if it is decided later that lighter fiberglass tiles should be used instead, it is no longer necessary to wait to plaster the walls: the two tasks can proceed more concurrently and the schedule can be shortened. Thus there are interactions between the design and the plan and schedule.

Distributed Knowledge and Tasks

The problem is worse than simply changes due to unexpected contingencies and task interactions.

For example, suppose that we are designing and building a prototype of a new gyroscope for use in navigation equipment. This involves designers with different engineering expertise as well as those who know how to plan and construct parts of the device. Suppose further that an engineer decides that high-resolution encoders would be a better design than the rate sensors plus low-resolution encoders in the current design. Certainly the machinist who is designing the frame to hold the components should be notified of this change.

Furthermore, there can be no central planner/scheduler in charge of all tasks, as the machinist alone has the necessary expertise, and knowledge of machining resources, to plan and schedule the complex machining jobs necessary to complete the prototype. So somehow we have to allow for distributed planning and scheduling in which each player can refine locally assigned abstract tasks. This also means that the overall plan and schedule need to be modified.
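
A minimal sketch of such local refinement (the plan structure, durations, and task names below are hypothetical) is that the machinist expands an abstract task into subtasks only he or she can plan, and the rolled-up duration then forces a change in the overall plan and schedule:

    # Hypothetical flat plan: each abstract task has an owner who may refine it.
    plan = {
        "machine frame": {"owner": "machinist", "duration": 10, "subtasks": []},
    }

    def refine(task_name, subtasks):
        """The task's owner breaks it into subtasks; the new duration changes the global plan."""
        task = plan[task_name]
        task["subtasks"] = subtasks
        task["duration"] = sum(days for _, days in subtasks)
        return task["duration"]

    new_duration = refine("machine frame", [
        ("rough-cut stock", 2),
        ("mill encoder receptacles", 5),
        ("finish and inspect", 4),
    ])
    print(f"'machine frame' now takes {new_duration} days; downstream tasks must be rescheduled")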

Planning simply cannot be done by a single person in a single place for large distributed projects. No one person has all the knowledge needed to refine the plan and manage changes. Planning of tasks and subtasks must be done distributively by the individuals best qualified, just as design is distributed among engineers with expertise in different disciplines.

So the problem is how to manage the interactions among the design, plan, schedule and execution when all four are distributed throughout the life of the project.

What is Required


In looking at these difficulties, the reader may see a pattern emerging: in all cases, what is lacking is a structured way to notify the right people about the effects of changes. This is what a DPIM system would provide.

The key functionality required to manage development of distributed projects is the proper notification of participants given any change in design, planning, scheduling, or execution of the artifact development. This, in turn, requires a good model of the dependencies between various elements of the design, plan, schedule, and state of execution.

Notice that this does not mean just sending someone a message that something has changed. The model must help the participants understand the effect of the change. It is important to send people the right message. For instance, suppose the machinist has been given the task of including in the gyroscope frame the receptacles for parts of the frameless electric motors. It is insufficient to send this machinist the message that the selection of motors has changed. If framed motors are to be used instead, then the machinist should be told that the task of machining such receptacles is now redundant, and why.
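
A minimal sketch of this idea (hypothetical names; not a description of any existing system) is to record, with each task, the design decision that justifies it, so that a change can be translated into its effect on the task's owner:

    # Hypothetical: each task records the design decision that justifies its existence.
    tasks = {
        "machine motor receptacles into frame": {
            "assigned_to": "machinist",
            "justified_by": ("motor type", "frameless"),
        },
    }

    def change_design(decision, new_value):
        """Tell exactly the affected owners what the change means for their tasks, and why."""
        for name, task in tasks.items():
            key, assumed = task["justified_by"]
            if key == decision and assumed != new_value:
                print(f"to {task['assigned_to']}: task '{name}' is now redundant "
                      f"because {decision} changed from '{assumed}' to '{new_value}'")

    # Framed motors replace the frameless motors the frame design assumed.
    change_design("motor type", "framed")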

What is Missing

Traditional project management methods are not sufficient for managing the many tasks in the design and development process and fall short of providing DPIM. They do not take into account all the sources of change, the task interactions, and the necessity for distributed planning. They do not provide proper change notification: notifying the right agents (people or software) of the effects at the right time in the process.

If one considers single-user project management tools, such as MacProject or MS Project, one can see immediately that they are completely inadequate. The model is that a single general contractor makes decisions and changes and then somehow notifies the people involved. Notifications to the general contractor that cause changes, and the notification of the people affected by the change, are simply not part of the computer support, though they are very much a part of any project. The only distributed support from such products is that email may be sent when a task is assigned or completed.

AutoPlan, in contrast, is a distributed project management tool that provides an electronic blackboard for change notification. This allows people to receive email when a pre-specified type of event occurs, such as the assignment of a task. This definitely takes the computer support of Distributed Project Management (DPM) one step further, but it is still based upon a single-user model of planning, and the change notification is still primitive. That is, the general contractor is still responsible for all changes, and all of the change notification must be pre-specified, usually by the users.

The current lack of technology for coordinating design decisions and managing change over the life of a product creates higher costs, longer cycle times, and poorer quality than would otherwise be possible. Occasionally, this lack of technology is even dangerous, whether one is maintaining an older passenger plane or decommissioning a nuclear weapon.

Today we do not have good computer support for such design change notification. Instead, in small or medium-sized projects, a general contractor may track the hundreds of informal change orders with a cork bulletin board and notes. That we do not provide better support for such projects limits the complexity of a project to that which can be managed by a single person.

But worse is the cumbersome formal change order process required on large engineering projects, which requires multiple levels of management approval, with increased time and cost, in an attempt to catch most interactions. Usually, many more people are notified of a given design change than is actually necessary, burdening the whole design process.

As a result, plans are kept simple and rigid, and many opportunities to improve the plan, or even to avoid mistakes, are missed.

Now imagine building the international space station with hundreds or even thousands of companies and engineers and contractors of all sorts. How can this mass of design decisions and changes be coordinated, efficiently and effectively?

In imagining the value of having a real DPIM system, one could ask this basic question in different ways. One is: How much time and money would we save with DPIM in building today's cars, airplanes, and spacecraft? But another is: What could we build with real process coordination that we cannot build today? Initially, we could not see the applications for the early computers; we did not imagine the credit card and airline reservation systems we have today, much less the tertiary effects such as electronic commerce. DPIM also involves bookkeeping, but of a kind we have not yet explored.

We call for a new technology for managing engineering and construction projects. We have made a start with the ProcessLink Plan Manager research, but this novel work is still a small part of a very large effort needed to address this crucial problem, which limits the complexity of the projects we can attempt as the century turns.

References

[Benda] Benda, M., internal survey of Boeing managers, 1998.

[McKinsey] McKinsey Quarterly, No. 1, 1997.


Copyright (c) 1998 Charles J. Petrie, Jr.