Thursday, February 5, 2015

Inheritance Models

Inheritance models are a controversial subject in Object Oriented Programming language design.  The big controversy is multiple inheritance.  Python adds a minor controversy on the value of non-public variables and methods.

Multiple inheritance is controversial because multiple parents with methods of the same name implemented can result in naming collisions.  Most OOP languages deal with this using priorities.  The order of inheritance determines which method is used.  The problem with this occurs when the method needs to do different things depending on context.  The programmer can always add flexibility, but this is not always desirable.  The argument in favor of multiple inheritance is that this is very rare, and when it does occur, it is probably a sign that the program is poorly designed and should be re-factored before continuing.

Java argues that multiple inheritance is rarely necessary, so it eliminates it with a solution that proves that multiple inheritance is indeed frequently necessary.  Java solves the multiple inheritance problem by adding a new kind of class to inherit from, called an interface.  Any number of interfaces can be attached to a class.  Interfaces have some important restrictions.  They cannot contain any implemented methods, and all of their methods and member variables must be public.  They are essentially fully abstract classes, where all methods are abstract, with a few added restrictions.  In reality, Java does support multiple inheritance, it just limits it.  Any class can inherit from any number of other classes, on the one condition that only one of those classes is allowed to have implemented code in it.  In theory, this solves the naming collision problem.  In reality though, it only solves part of it.  Two implemented methods with the same name will never be inherited by the same class in Java, so it will never have to determine priority.  One implemented method can collide with any number of abstract methods though, and any number of abstract methods can collide with each other.  The problem here occurs when two methods with the same name are expected by the interface to do completely different things.  This, perhaps the more common problem, is still unresolved.  Java not only fails to eliminate multiple inheritance, it fails to eliminate the biggest problem with it.

The evidence that multiple inheritance is necessary and valuable is seen clearly in the fact that Java classes frequently implement one or more interfaces, in addition to inheriting from a parent class.  Java's solution to the collision problem does reveal one thing about multiple inheritance: It is probably better to only inherit from one class with implemented methods, and even when violating this, it is almost always better to choose a design that does not require inheriting in ways that cause naming collisions.  Java reveals another thing though: There is no reasonable way to entirely eliminate the possibility of naming collisions caused by inheritance.  Naming collisions caused by inheritance could be handled by always requiring some kind of prefix for accessing inherited methods and variables, but this would compromise polymorphism, dramatically reducing the value of OOP.

Java's solution to the problems of multiple inheritance is not necessarily bad.  It is misleading to claim that it solves all or even most of the problems, but it does solve one, and it encourages the use of naming conventions that are less likely to result in collisions.  Overall, it mitigates the problems more than offering a real solution.  The problems with multiple inheritance are inherent.  They are not related to how it is implemented.  Java's mistake was in trying to solve an unsolvable problem, but there is significant learning value in the attempt, so it was not a waste of time or effort.


The value of non-public variables is a different but related matter.  What creates the controversy is that it is an entirely artificial limitation.  Private and protected variables exist solely to enforce best practices.  There is no platform level support for them, and there never will be, because it is already expensive enough to design processors with one protected mode (and sometimes a third access level exists for hardware interrupts).  Adding widespread access control to hardware just to enforce good programming practices would be absurd.  As such, variable access control in OOP languages is enforced by the compiler or the interpreter.  The reason the artificiality matters is that this kind of access control only restricts accessing variables by name.  A private variable is no different from any other variable.  The only difference is that it can only be accessed by name when it is in scope, and it is only in scope in the methods of the class it belongs to.  In any language that allows direct memory manipulation (including through external libraries), accessing private variables directly is fairly simple, if the programmer has a half decent understanding of the object containing it.  In reference to the inheritance discussion above, this means that requiring interface methods and variables to be public is merely a semantic thing, not a significant difference between fully abstract methods and interfaces.

The artificiality of variable access restriction by no means devalues it though.  Offering programmers some guidance by making it difficult to violate best practices does have value.  Access restriction is not the only way to accomplish this though.  C++ and Java use access restriction, but Python does not.  Nothing in Python is ever strictly private.  If you know its name, you can access it.  Python does have some ways of making it hard to violate best practices though.  A single leading underscore in a variable or method name in Python indicates that it should not be accessed directly, because it is an implementation detail that is subject to change.  Two leading underscores, with one or no trailing underscores in Python invokes name mangling that makes direct access from outside the class difficult.  Every method or variable reference inside the class definition with the two leading underscores and less than two following will be mangled, which will allow internal references to work properly, but it will break external references.  Even this does not make the variable or method private though.  The mangling is predictable, so a determined programmer could still access mangled references from outside the class, by using the mangled name.  This requires very deliberate action though, and thus will never occur accidentally.  A programmer that is willing to go this far to violate best practices would probably also use direct memory manipulation, so the fact that it is easier this way hardly matters.

To be clear, method and instance variable access restriction should never be regarded as a security measure.  In any well designed language, there will always be a way around it.  Access restriction should be regarded as a means to encourage best practices and provide a warning for attempts to violate them.  It should never be treated as anything more than this.


The connection between these two is the fact that access restriction is not a significant difference between two meta data-types.  Despite the difference in access restriction between fully abstract classes and interfaces in Java, they are functionally the same thing.  Java differentiates to hide the multiple inheritance, and that is it.  In fact, considering the value of access restricted variables and methods in fully abstract classes (hint: there is no value), it makes perfect sense to disallow anything but public variables and methods in interfaces.  To do anything else would be to restrict or dictate the implementation of the abstract variables and methods, which would directly violate the purpose of fully abstract classes as well as interfaces.  This is not a major difference between the two.  It is merely another case of the language enforcing best practices.


In the end, Java does not solve the multiple inheritance problem, and access restricted variables and methods are not strictly necessary.  In both cases, this does not mean that time or energy was wasted though.  They both have value, but they are related in that they are both misunderstood, often even by professionals.