There is something to be said for a low-voltage fan that can run for years without any maintenance. These fans usually come in 12 to 24 volt flavors. They cool
the power supplies that convert AC power to DC in our computers, they cool the CPU and GPU, and they move air through the computer case. They usually cost no more
than a dollar, and their mean time to failure is three to four years. If there were a Reliability Hall of Fame, I think an electric motor that runs at a
near-constant RPM would deserve to be its first inductee.
Historically, we have not seen many software packages that do this: software that exposes a simple interface other components can talk to, stays reliable,
and runs for years without any kind of babysitting.
Perhaps the only way to come close to this sort of reliability in software, with a mean time to failure measured in years, is to limit how many things a
software package does. Software should be designed to do one thing and one thing only, and later be integrated with other software components to achieve more.
We can, however, point to pieces of software that do emulate this high reliability by doing as little as possible. Think about an operating system boot loader.
It performs one function: it is a small piece of code placed in a reserved section of the hard drive, and the hardware loads it into a specific section of
memory and runs it so that it can start the operating system. As long as the underlying hard drive does not fail, the boot loader will continue to work.
A mail server, by contrast, is an example of software that is far less reliable. It can start malfunctioning when too many people connect to it, when the
volume of spam traffic increases, when the system runs out of disk space, and so on.
Think about the piece of software that puts your computer or phone display into low-power mode by dimming it or putting it on standby. This software is
highly reliable because it does one specific thing: when your system has been inactive for long enough (inactivity being measured by the absence of keystrokes
or mouse activity), it activates itself and partially shuts off the screen.
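To show how little such a component needs to do, here is a minimal Python sketch of an inactivity-based dimming policy. The class name `IdleDimmer`, its method names, and the 60 s and 300 s thresholds are all hypothetical; a real display manager would receive input events from the operating system rather than through explicit calls.

```python
from dataclasses import dataclass


@dataclass
class IdleDimmer:
    """Hypothetical display-dimming policy: its only input is the time of the
    last keystroke or mouse event, and its only output is a screen state."""

    dim_after: float = 60.0    # seconds of inactivity before dimming (assumed value)
    off_after: float = 300.0   # seconds of inactivity before standby (assumed value)
    last_activity: float = 0.0

    def on_input(self, now: float) -> None:
        # Any keystroke or mouse movement counts as activity and resets the timer.
        self.last_activity = now

    def screen_state(self, now: float) -> str:
        # The state is derived from a single timestamp and never stored, so there
        # is no "stuck" state to recover from: the next input restores "on".
        idle = now - self.last_activity
        if idle >= self.off_after:
            return "standby"
        if idle >= self.dim_after:
            return "dimmed"
        return "on"


if __name__ == "__main__":
    dimmer = IdleDimmer()
    dimmer.on_input(now=0.0)
    print(dimmer.screen_state(now=30.0))   # on
    print(dimmer.screen_state(now=90.0))   # dimmed
    print(dimmer.screen_state(now=400.0))  # standby
    dimmer.on_input(now=401.0)
    print(dimmer.screen_state(now=402.0))  # on
```

Passing `now` in explicitly keeps the policy deterministic and easy to test, which is part of what makes such a small component dependable.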
I think the lesson we can draw is that software expected to do a lot, such as implementing accounting business logic, will have a hard time achieving
robustness. It helps to separate robustness from reliability. Most pieces of software are reliable: they behave correctly under expected conditions, which for
accounting software means that you enter all the figures correctly. Unfortunately, most pieces of software are not robust, which means that when some unexpected
condition occurs, they misbehave. The worse news is that once a failure occurs, they stay stuck in a persistent state of misbehavior. In the case of our mail
server, for instance, it will certainly fail to deliver mail if it is out of space. But can our software at least remain reliable when it has failed to be
robust? What do I mean? Say my mail server has run out of space and can no longer accept mail over SMTP. It should still support the IMAP, POP, or ActiveSync
protocols so that I can access my existing mail.
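One way to get this behavior is to keep the write path and the read path separate, so that a failure in one does not take down the other. Below is a minimal Python sketch under that assumption; `MailStore` and its methods are hypothetical stand-ins for the storage layer behind the SMTP and IMAP/POP front ends, and the fixed byte budget stands in for free disk space.

```python
from __future__ import annotations


class MailStore:
    """Hypothetical mailbox store with a fixed byte budget standing in for disk space."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.messages: list[bytes] = []

    def used_bytes(self) -> int:
        return sum(len(m) for m in self.messages)

    def deliver(self, message: bytes) -> str:
        # Write path (SMTP): when there is no room, reject this one message with
        # a temporary failure instead of crashing. 452 is the SMTP reply code
        # for "insufficient system storage" (RFC 5321), so senders retry later.
        if self.used_bytes() + len(message) > self.capacity_bytes:
            return "452 Requested action not taken: insufficient system storage"
        self.messages.append(message)
        return "250 OK"

    def list_messages(self) -> list[bytes]:
        # Read path (IMAP/POP): needs no free space, so it keeps working when full.
        return list(self.messages)

    def delete(self, index: int) -> None:
        # Deleting through the read path frees space, which re-enables delivery.
        del self.messages[index]


if __name__ == "__main__":
    store = MailStore(capacity_bytes=10)
    print(store.deliver(b"hello"))    # 250 OK
    print(store.deliver(b"world!"))   # 452 ... (5 + 6 bytes exceeds the budget)
    print(store.list_messages())      # [b'hello']
    store.delete(0)
    print(store.deliver(b"world!"))   # 250 OK
```

The design choice here is that running out of space is treated as a per-request error on one path rather than as a global failure of the whole server.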
Going back to the accounting software example: if I entered an incorrect figure somewhere in the third quarter, that mistake should not affect my ability to
edit my third-quarter data.
So one thing we should aim for is this: if our software enters a state where it is malfunctioning, it should easily come out of that state once the condition
that caused the error is removed.
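A common way to miss this goal is to latch a "failed" flag on the first error and never clear it. The minimal Python sketch below contrasts that with a guard that re-checks its precondition on every call. Both class names are hypothetical, and the `precondition` callable stands in for whatever the real condition is (free disk space, a reachable database, and so on).

```python
from typing import Callable


class LatchingGate:
    """Anti-pattern: after the first failure it stays failed until restarted."""

    def __init__(self, precondition: Callable[[], bool]) -> None:
        self.precondition = precondition
        self.failed = False

    def call(self, operation: Callable[[], str]) -> str:
        if self.failed or not self.precondition():
            self.failed = True  # sticky: never cleared, even if the cause goes away
            return "unavailable"
        return operation()


class SelfHealingGate:
    """Re-evaluates the precondition on every call and keeps no failure state,
    so the operation works again as soon as the cause is removed."""

    def __init__(self, precondition: Callable[[], bool]) -> None:
        self.precondition = precondition

    def call(self, operation: Callable[[], str]) -> str:
        if not self.precondition():
            return "unavailable"  # fail only this request
        return operation()


if __name__ == "__main__":
    free_space = {"bytes": 0}

    def has_space() -> bool:
        return free_space["bytes"] > 0

    def deliver() -> str:
        return "delivered"

    latching, healing = LatchingGate(has_space), SelfHealingGate(has_space)
    print(latching.call(deliver), healing.call(deliver))  # unavailable unavailable
    free_space["bytes"] = 1024  # the condition that caused the error is removed
    print(latching.call(deliver), healing.call(deliver))  # unavailable delivered
```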
For instance, if a browser crashes when it loads “www.someBadSite.com”, it would make sense for it to have logic that keeps it from automatically reopening the
crashed site the next time the browser starts. The browser should have a built-in warning system that notifies me that this site has previously caused a
malfunction.
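One simple way to build this is a crash marker: record the URL before loading it, clear the record once the page has rendered, and treat a leftover record at startup as evidence that the previous session crashed on that URL. The sketch below assumes that design; `CrashGuard`, its methods, and the two state file names are hypothetical, and a real browser would also have to handle several tabs loading at once.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class CrashGuard:
    """Hypothetical crash-loop guard for a single-tab, browser-like app."""

    def __init__(self, state_dir: Path) -> None:
        self.marker = state_dir / "pending_load.txt"
        self.suspects_file = state_dir / "crashed_sites.json"
        self.current: str | None = None
        self.suspects: set[str] = set()
        if self.suspects_file.exists():
            self.suspects = set(json.loads(self.suspects_file.read_text()))
        # A leftover marker means the last load never finished: the process
        # died while rendering that URL, so remember it as a suspect.
        if self.marker.exists():
            self.suspects.add(self.marker.read_text())
            self._save()
            self.marker.unlink()

    def _save(self) -> None:
        self.suspects_file.write_text(json.dumps(sorted(self.suspects)))

    def should_warn(self, url: str) -> bool:
        # Warn (and do not auto-reopen) if this URL crashed us before.
        return url in self.suspects

    def begin_load(self, url: str) -> None:
        self.current = url
        self.marker.write_text(url)

    def end_load(self) -> None:
        # The page rendered without crashing, so it is no longer a suspect:
        # the guard leaves its warning mode once the problem is gone.
        if self.current in self.suspects:
            self.suspects.discard(self.current)
            self._save()
        self.current = None
        self.marker.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp)
        CrashGuard(state).begin_load("www.someBadSite.com")  # "crash": no end_load
        restarted = CrashGuard(state)
        print(restarted.should_warn("www.someBadSite.com"))  # True
```

Clearing the suspect in `end_load` follows the principle above: once the site loads cleanly, the browser stops treating it as dangerous.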