Apollo On-Board Flight Software Effort: Lessons Learned, Development Before the Fact, and Development Before the Fact Theory

Apollo On-Board Flight Software Effort: Lessons Learned

The original purpose of the empirical study, which had its beginnings in 1968, was to learn from Apollo's flight software and its development in order to make use of this effort for future Apollo missions as well as for the then-upcoming Skylab and Shuttle missions. A better way to define and develop software systems was needed, because the existing methods (just like the traditional ones today) did not solve the pressing problems. There was a driving desire to learn from experience: what could be done better for future systems, and what should the designers and developers keep doing because they were doing it right. The search was in particular for a means to build ultra-reliable software.

The results of the study took on multiple dimensions, not just for space missions but for systems in general, some of which did not become apparent for many years to come. The Apollo software, given what had to be accomplished and the time within which it had to be accomplished, was as complex as software could get, causing later software projects to look (and be) less daunting than they might otherwise have been in comparison. The thought was that if problems from a worst-case scenario could be solved and simplified, the solutions might be applicable to all kinds of systems.

In hindsight, this was an ideal setting from which to understand systems and software, the kinds of problems that can occur, and the issues that need to be addressed. Because of the nature of the Apollo software, there was the opportunity to make just about every kind of error possible, especially since the software was being developed concurrently with the planning of the space missions, the hardware, the simulator, and the training of the astronauts (demonstrating how much the flight software was part of a larger system of other software, hardware, and peopleware); and since no one had been to the moon before, there were many unknowns. In addition, the developers were under the gun, working to what today would be considered unrealistic expectations and schedules. All of this, together with what was accomplished (or not accomplished), provided a wealth of information from which to learn. Here are some examples:

It was found during the Apollo study that interface errors (errors resulting from ambiguous relationships, mismatches, conflicts in the system, poor communication, lack of coordination, and inability to integrate) accounted for approximately 75% of all the errors found in the software during final testing, the Verification and Validation (V&V) phase (using traditional development methods, the figure can be as high as 90%). Interface errors include data flow, priority, and timing errors, from the highest levels of a system down to the lowest level of detail. It was also determined that 44% of the errors were found by manual means (suggesting more areas for automation) and that 60% of the errors had unwittingly existed in earlier missions that had already flown, though no errors occurred (made themselves known) during any flights. The fact that this many errors existed in earlier missions was downright frightening: it meant lives were at stake during every mission that was flown, and it meant more needed to be done in the area of reliability. Although no software problem ever occurred (or was known to occur) on any Apollo mission, this was only because of the dedication of the software team and the methods they used to develop and test the software.

A more detailed analysis of interface errors followed, especially since they not only accounted for the majority of errors but were often the most subtle and therefore the hardest to find. The realization was made that if interface problems could be solved, integration problems could be solved; and if integration problems could be solved, there would be traceability. Each interface error was placed into a category according to the means that could have been taken to prevent it by the very way the system was defined. It was then established that certain errors could be prevented from happening simply by changing the rules of definition. This work led to a theory and a methodology for defining a system in a way that would eliminate all interface errors.

It quickly became apparent that everything, including software, is a system, and that the issues of system design were one and the same as those of software. Many things contributed to this understanding of what is meant by the concept of "system." During development of the flight software, once the software group implemented the requirements ("thrown over the wall" by the system design experts to the software people), the software people necessarily became the new experts. This phenomenon forced the software experts to become system experts (and vice versa), suggesting that a system was a system whether in the form of higher-level algorithms or of software that implemented those algorithms.

Everyone learned to always expect the unexpected, and this was reflected in every design; that is, they learned to think, plan, and design in terms of error detection and recovery and reconfiguration in real time, and to always have backup systems. The recognition of the importance of determining, assigning, and ensuring unique priorities for processes was established as part of this philosophy. Toward this end it was also established that a "kill and start over again" restart approach to error detection and recovery was far superior to a "pick up from where you left off" approach; it also simplified both the operation of the software and its development and testing. The major impact a design decision in one part of the software could have on another part further emphasized that everything involved was part of a system (the "what goes around comes around" syndrome). For example, choosing an asynchronous executive (one where higher-priority processes can interrupt lower-priority processes) for the flight software not only allowed for more flexibility during the actual flights but also made it possible to change the software more safely and in a more modular fashion during development.
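
As a concrete illustration, here is a minimal sketch in Python of those two ideas together: a priority-driven executive in which a higher-priority arrival preempts a lower-priority job, and error recovery that kills the job and starts it over rather than picking up where it left off. The `Job` and `Executive` names, and the whole structure, are illustrative assumptions, not the actual Apollo executive:

```python
import heapq

class Job:
    def __init__(self, name, priority, steps):
        # Lower number = higher priority; priorities are assumed unique.
        self.name, self.priority, self.steps = name, priority, steps

class Executive:
    def __init__(self):
        self.ready = []  # min-heap of (priority, name, job)

    def schedule(self, job):
        heapq.heappush(self.ready, (job.priority, job.name, job))

    def run(self):
        while self.ready:
            _, _, job = heapq.heappop(self.ready)
            for step in job.steps:
                try:
                    step()  # each step is one unit of work
                except Exception:
                    # "Kill and start over": discard partial state and
                    # restart the whole job rather than resuming mid-stream.
                    # (A real executive would bound the number of restarts.)
                    self.schedule(job)
                    break
                # Preemption point: if a higher-priority job became ready,
                # yield to it; this job will be started over later.
                if self.ready and self.ready[0][0] < job.priority:
                    self.schedule(job)
                    break
```

Note how the restart policy keeps even this sketch simple: there is no saved mid-job state to manage, which mirrors the observation above that "kill and start over" simplified both operation and testing.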

It became obvious that testing was never over (even with many more levels of testing than other kinds of applications receive, and with added testing by independent verification and validation organizations). This prompted a search for better methods of testing; again, traditional methods were not solving the problem. With the realization that most of the system design and software development processes could be mechanized (the same generic processes were being performed throughout all phases of development), it became clear they could be automated. This suggested an environment that would do just that: automate what traditional systems to this day still do manually. In addition, for the space missions, software was being developed for many missions at once and in various stages, leading the way to learning how to successfully perform distributed development.

It was learned during this process of evaluating space flight software that traditional methods did not support developers in many areas, allowed too much freedom in some areas (such as the freedom to make errors, both subtle and serious), and allowed not enough freedom in others (such as those that required an open architecture and the ability to reconfigure in real time to be successful). It further became clear that a new kind of language was needed to define a system with certain "built-in" properties not available in traditional languages, such as inherent reliability, integration, reuse, and open-architecture capabilities provided simply by the use of the language itself. In addition, it was realized that a set of tools could be developed to support such a language and to take over tedious mechanical processes that could be automated, thus avoiding further errors and reducing the need for much after-the-fact testing. It was only later understood the degree to which systems with "built-in reliability" could increase the productivity of their development, resulting in "built-in productivity."

Lessons learned from this effort continue today. Key aspects are that systems are asynchronous in nature, and this should be reflected inherently in the language used to define them. Systems should be assumed to be event driven, and every function to be applied should have a unique priority; real-time, event- and priority-driven behavior should be part of the way one specifies a system in a systems language, not defined case by case in different programming languages with special-purpose data types.
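
A minimal sketch, assuming a toy textual specification (001AXES itself is far richer, and graphical), of what it means for unique priorities to be part of the definition rather than a run-time convention: the rule is enforced when the specification is assembled.

```python
def define_system(functions):
    # functions: list of (name, priority) pairs making up the definition.
    priorities = [p for _, p in functions]
    if len(set(priorities)) != len(priorities):
        raise ValueError("every function must have a unique priority")
    return dict(functions)

# Accepted: all priorities unique.
spec = define_system([("guidance", 1), ("telemetry", 2), ("display", 3)])

# Rejected at definition time, before anything runs:
# define_system([("guidance", 1), ("telemetry", 1)])
```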

Rather than having a language whose purpose is to run a machine, the language should naturally describe what objects there are and what actions they are taking. Objects are inherently distributed, and their interactions are asynchronous, with real-time, event-driven behavior. This implies that one could define a system whose own definition carries the behaviors needed to characterize natural behavior in terms of real-time execution semantics. Application developers would no longer need to explicitly define schedules of when events were to occur; events would instead occur when objects interact with other objects. By describing the interactions between objects, the schedule of events is inherently defined.
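
A small Python sketch of that idea, with invented object names: no schedule is written anywhere, yet the order of events falls out of which objects interact with which.

```python
from collections import deque

def display(message):
    print("display:", message)
    return []  # terminal object: its reaction raises no further events

# Each handler returns the (target, message) events its reaction raises.
handlers = {
    "sensor": lambda m: [("filter", m + 0.5)],
    "filter": lambda m: [("display", round(m))],
    "display": display,
}

def run(initial_events):
    queue = deque(initial_events)
    while queue:
        target, message = queue.popleft()
        # The "schedule" is implicit: events exist only because objects
        # interact with other objects.
        queue.extend(handlers[target](message))

run([("sensor", 41.0)])  # prints "display: 42"
```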

The result of this revelation was that a universal systems language could be used to tackle all aspects of a system: its planning, development, deployment, and evolution. This means that all the systems that work together during the system’s lifecycle can be defined and understood using the same semantics.

Development Before the Fact

Once the analysis of the Apollo effort was completed, the next step was to create (and evolve) a new mathematical paradigm from the "heart and soul" of Apollo, one that was preventative instead of curative in its approach. A theory was derived for defining a system such that the entire class of errors known as interface errors would be eliminated. The first-generation technology derived from this theory concentrated on defining and building reliable systems in terms of functional hierarchies (Hamilton, 1986). Having realized the benefits of addressing one major issue, reliability, just by the way a system is defined, the research effort continued over many years (and still continues) to evolve this philosophy, addressing that issue further as well as other issues in the same way, that is, with language mechanisms that inherently eliminate software problems. The result is a new-generation technology called development before the fact (DBTF) (see Fig. 19.12), where systems are designed and built with preventative properties integrating all aspects of a system's definition, including the inherent integration of functional and type hierarchical networks (Hamilton and Hackler, in press; Hamilton, 1994; Keyes, 2001; http://www.htius.com/).

[Fig. 19.12: Development before the fact (DBTF)]

Development before the fact is a system-oriented object (SOO) approach based on a concept of control that is lacking in other software engineering paradigms. At the base of the theory that embodies every system is a set of axioms (universally recognized truths), and the design for every DBTF system is based on these axioms and on the assumption of a universal set of objects. Each axiom defines a relation of immediate domination; the union of the relations defined by the axioms is control. Among other things, the axioms establish an object's control relationships for invocation, input and output, input and output access rights, error detection and recovery, and ordering during its developmental and operational states. Table 19.1 summarizes some of the properties of objects within these systems.
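
The axioms themselves are formal; as a loose structural illustration only (not Hamilton's actual axioms), the following Python sketch shows control as immediate domination: each object has one parent, and only the parent invokes it, orders it among its siblings, passes its input, receives its output, and recovers from its errors.

```python
class ControlledObject:
    def __init__(self, name, action, children=()):
        self.name, self.action, self.children = name, action, list(children)

    def invoke(self, value):
        # Invocation and ordering are control relations: only the parent
        # invokes its children, in a fixed order, threading the output of
        # one child into the input of the next.
        for child in self.children:
            try:
                value = child.invoke(value)
            except Exception:
                # Error recovery is also a control relation: the parent,
                # not the failing child, decides what happens next.
                value = self.recover(child, value)
        return self.action(value)

    def recover(self, child, value):
        return value  # placeholder policy: skip the failing child

flaky = ControlledObject("flaky", lambda v: 1 / 0)
root = ControlledObject("root", lambda v: v * 2,
                        [ControlledObject("inc", lambda v: v + 1), flaky])
print(root.invoke(20))  # flaky fails; root recovers; prints 42
```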

Combined with further research, it became clear that the root problem with traditional approaches is that they support users in "fixing wrong things up" rather than in "doing things the right way in the first place." Instead of testing software to look for errors after the software is developed, with the new paradigm software could be defined so as not to allow errors in, in the first place; correctness could be accomplished by the very way software is defined, by "built-in" language properties. What had been created was a universal semantics for defining not just software systems but systems in general.
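
A toy Python sketch of "not allowing errors in, in the first place," using invented names and ordinary Python types as stand-ins: interface compatibility is checked at the moment the system is defined, so a mismatch can never survive into testing.

```python
class Component:
    def __init__(self, name, in_type, out_type, fn):
        self.name, self.in_type = name, in_type
        self.out_type, self.fn = out_type, fn

def connect(producer, consumer):
    # The interface error is rejected when the system is *defined*,
    # not discovered after the fact by testing.
    if producer.out_type is not consumer.in_type:
        raise TypeError(f"{producer.name} -> {consumer.name}: interface mismatch")
    return Component(f"{producer.name}|{consumer.name}",
                     producer.in_type, consumer.out_type,
                     lambda x: consumer.fn(producer.fn(x)))

sense = Component("sense", float, float, lambda x: x + 0.5)
shout = Component("shout", str, str, str.upper)
pipeline = connect(sense, shout)  # raises TypeError: interface mismatch
```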

Once understood, it became clear that the characteristics of good design can be reused by incorporating them into a language for defining any system (not just a software system). The language is a formalism for representing the mathematics of systems. This language, actually a meta-language, is the key to DBTF. Its main attribute is to help the designer reduce complexity and bring clarity into the thinking process, eventually turning that process into the ultimate reusable: wisdom itself. It is a universal systems language for defining systems, each of which can be incorporated into the meta-language and then used to define other systems. A system defined with this language has properties that come along "for the ride" and that in essence control its own destiny. Based on a theory (DBTF) that extends the traditional mathematics of systems with a unique concept of control, this formal but friendly language has embodied within it a natural representation of the physics of time and space. 001AXES evolved as DBTF's formal universal systems language, and the 001 Tool Suite as its automation.

[Table 19.1: Properties of objects within DBTF systems]

The concept of reusable definition scenarios, created during Apollo to save development time and memory space in the software, was a predecessor of the kind of higher-level language statements used within the systems language. This included horizontal and vertical reuse, which led to a flexible, open architecture within the DBTF environment.

Understanding the importance of metrics, and the influence they could have on future software, has been a major focus throughout this endeavor. What this approach represents, in essence, is the current state of a "reusable" originally based on and learned from Apollo, followed by what was learned from each of its evolving states. It reuses and builds on what was (and is) observed to be beneficial to inherit, and avoids what was observed not to be beneficial. What was learned from history was, and will continue to be, put into practice for future projects. Someone once said, "it is never surprising when something developed empirically turns out to have intimate connections with theory." Such is the case with DBTF.

Development Before the Fact Theory

Mathematical approaches have been known to be difficult to understand and use. They have also been known to be limited in their use for nontrivial systems, as well as for much of the life cycle of a given system. What makes DBTF different in this respect is that the mathematics it is based on has been extended to handle the class of systems that software falls into; the formalism, along with its unfriendliness, is "hidden" under the covers by language mechanisms derived in terms of that formalism; and the technology based on this formalism has been put to practical use.

With DBTF, all models are defined using its language (and developed using its automated environment) as SOOs. A SOO is understood the same way, without ambiguity, by all other objects within its environment, including all users, models, and automated tools. Each SOO is made up of other SOOs. Every aspect of a SOO is integrated, not least the integration of its object-oriented parts with its function-oriented parts and its timing-oriented parts. Instead of systems being object-oriented, objects are system-oriented. All systems are objects; all objects are systems. Because of this, many things heretofore not believed possible with traditional methods are possible; much of what seems counterintuitive with traditional approaches, which tend to be software centric, becomes intuitive with DBTF, which is system centric. DBTF's automation provides the means to support a system designer or software developer in following its paradigm throughout a system's design or software development life cycle. Take, for example, testing: the more a paradigm prevents errors from being made in the first place, the less the need for testing and the higher the productivity. Before-the-fact testing is inherently part of every development step, and most errors are prevented because of that which is inherent or automated.
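
A brief sketch, with hypothetical names, of the recursion in "all systems are objects; all objects are systems": every object is composed of other objects, and the same mechanics apply at every level, so a whole system can be dropped unchanged into a larger one as a single object.

```python
class SOO:
    def __init__(self, name, parts=()):
        self.name, self.parts = name, list(parts)

def describe(obj, depth=0):
    # One traversal works at every level: an object is inspected exactly
    # the way the system containing it is.
    print("  " * depth + obj.name)
    for part in obj.parts:
        describe(part, depth + 1)

navigation = SOO("navigation", [SOO("imu"), SOO("star_tracker")])
gnc = SOO("gnc", [navigation, SOO("guidance"), SOO("control")])
describe(gnc)  # the navigation *system* appears as one *object* of gnc
```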

Unlike a language such as UML or Java that is limited to software, 001AXES is a systems language and as such can be used to state not just software phenomena but any given system phenomena. UML is a graphical software specification language; 001AXES is a graphical system specification language. They have different purposes. At some point UML users have to program in some programming language, but 001AXES users do not: a system can be completely specified and automatically generated, with all of its functionality and object and timing behavior, without a line of code being written. The intent of UML is to run a machine. The intent of 001AXES is to define (and, if applicable, execute) a system, software or otherwise.

Because 001AXES is a systems language, its semantics is more general than that of a software language. For example, 001AXES-defined systems could run on a computer, a person, an autonomous robot, or an organization; whereas the content of the types (or classes) in a software language is based on a computer that runs software.

Because 001AXES is systemic in nature, it capitalizes on that which is common to all systems, including software, financial, political, biological, and physical systems. For software, the semantics of 001AXES is mapped to appropriate constructs for each of a possible set of underlying programming-language implementations. Unlike the DBTF approach, more traditional development approaches generally emphasize a fragmented approach to the integrated specification of a system and its software, shortchanging integration and automation in the development process. A more detailed discussion of 001AXES and UML can be found in Hamilton and Hackler (2000).
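
A minimal sketch of that mapping idea: one abstract definition, several generated implementations. The tiny dictionary "definition" and the code templates are invented for illustration; 001AXES itself is graphical and far richer than this.

```python
# One system-level definition, independent of any target language.
definition = {"name": "double", "input": "x", "body": "x + x"}

# Per-language construct mappings (illustrative only).
templates = {
    "c":      "int {name}(int {input}) {{ return {body}; }}",
    "python": "def {name}({input}):\n    return {body}",
}

for language, template in templates.items():
    print(f"--- {language} ---")
    print(template.format(**definition))
```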
