Sunday, July 1, 2018

Soft Safety in Flexible Programming Languages

For almost half a century now, there has been an ongoing war between type safety and flexibility in programming languages.  It is time for that war to end, because Python 3.5 has demonstrated the mechanics for a solution.

The argument goes like this: One side says that static typing it critical to writing correct programs, because without compile time type checking, it is impossible to guarantee type consistency.  In practice, this means that serious bugs can be easily missed and come up rarely enough to not get noticed and fixed but often enough to cause serious problems.  In addition, type mismatches are frequently a symptom of the programmer not fully understanding the problem.  The most strictly and statically typed languages often produce almost or even completely bug free programs, once the program will compile.  On the other hand though, dynamic typing provides some extremely powerful advantages.  The ability to add attributes and methods to objects dynamically in Python is an example of dynamic typing that has great value.  Likewise, the ability to give a collection of unrelated objects methods with the same names and use them as if they were similar, without using some inheritance mechanic to invoke polymorphism, is also a function of Python's dynamic typing.  Dynamic typing is essential to meta-programming, which can allow for writing very complex programs in very little code.  Static typing advocates assert that static typing is necessary to avoid obvious errors, while dynamic typing advocates point out research showing that skilled programmers do not generally make those errors even without static typing.  It is all a trade off, but the consequences are very real.  Static typing is significantly less powerful, often making it take significantly longer to program in statically typed languages than dynamically typed languages, but the stronger guarantee of correctness has the potential to cut significant time out of debugging.  The fact is, there is no clear consensus whether one or the other is objectively better, but there are definitely cases where the cost of one is significantly higher than the other.  This is especially true in cases where meta-programming would be beneficial, because static typing does not allow for meta-programming.

The ideal language would probably be one that can handle both dynamic and static typing.  Unfortunately, no such language exists.  Yes, it is technically possible to use something like dynamic typing in C and C++, with liberal use of type casting, unions, structs, and void pointers, but it is ugly and time consuming to do this.  In languages like Python, one might enforce a sort of static typing by constantly checking types and throwing exceptions when the wrong type is used, but this misses the benefits of compile time checking, and it comes at a very high performance cost.  Like dynamic typing in C/C++, it also makes for ugly, time consuming coding.

The important thing about static typing is that there is a compile time type consistency check.  Haskell uses implicit typing.  In other words, it looks at how things are used, and it guesses their types based on that.  Sometimes it is necessary to explicitly specify types, but most of the time it guesses correctly, and type errors are typically a sign of deeper bugs.  C and C++ require the programmer to always specify the type when a variable or function is declared.  The similarity, however, is that all of these languages verify type consistency at compile time.  A dynamically typed language like Python instead keeps track of types during runtime, only verifying types when there are interactions.  Strong runtime type consistency checks are called strict or strong typing (not static typing).  Lose or weak typing is when a language uses implicit type casting to avoid type mismatches.  There is a general consensus that strict or strong typing is better than lose or weak typing, because lose typing frequently leads to unexpected and undesired behaviors (PHP is a widely criticized language, in part due to the negative side effects of loose typing).  Aside from JavaScript (which has also faced significant criticism over this), pretty much all common modern languages have fairly strict typing, limiting implicit casting almost exclusively to casts to floats in math where floats and ints are mixed.  (Python also allows some list-like objects to be multiplied by scalars, to repeat the list that many times, but this involves no actual type casting.)

Python 3.5 added type annotations to the language.  Prior to this, some Python programmers would place comments after initial variable declarations, specifying the type, to help them keep track of types.  The creator of Python decided that this kind of informal annotation was valuable enough to some people that it deserved to be formalized.  Type annotations were made a formal part of the language in Python 3.5.  Note however, that type annotations are nothing more than a new, formal comment syntax for noting types.  They don't look like comments, but they function like comments.  According to the document introducing type annotations, the language does not and never will enforce them.  They are also known as "type hints".  They are not intended for runtime type checking (you still have to do this manually if you want/need it), and aside from making sure the syntax is correct, the Python interpreter totally ignores them.

What is the point of formal type annotations that are ignored?  How are they better than comments?  The reasoning mentioned in the PEP document introducing the feature, is that comments annotating types make the code less readable and can interfere where other comments are needed.  Python has been carefully designed to be as self documenting as possible, and adding a formal type annotation syntax helps with this.  That said, there is perhaps a more important use for type annotations.  The Python interpreter does not and will likely never enforce static typing.  All static typing is, however, is a compile time check, to ensure type consistency.  In standard Python, creating such a check would require some type inference mechanic,  which is complex and difficult to write.  With type annotations, however, one could easily write a linter that will use the annotations to determine type and then verify that types remain static throughout the program.  There are two important things though.  The first thing is that compile time checks do not actually have to happen at compile time.  Compile time is merely the latest they can happen.  The second thing is, we do not actually need annotations to be part of the language.  This merely makes the code more readable.  If the Python community had come up with a consistent comment annotation syntax for this, it would have been just as useful as formal annotations for pre-compile static type checking.  It turns out that dynamic typing has never been a problem.  Anyone with the skill could have developed a comment syntax and written a linter to verify that typing remains static, for any language with dynamic typing.  If no one cared enough to do that, then clearly static typing in any particular language was never a priority for them.

There is a massive advantage to this soft type safety though, and that is that it is possible to mix static and dynamic typing.  Most programs written in dynamically typed languages only rarely take advantage of the dynamic typing.  Most variables keep their initial type for their entire lifetime.  Enforcing static typing on most variables is a valuable strategy for catching and eliminating bugs.  But, when you need dynamic typing, the cost of working around that in a statically typed language can be enormous.  When this happens, the additional debugging time that might be required in a dynamically typed language pales in comparison to the extra time spent working within static typing instead of using meta-programming.  So, what if we could make some variables static and some dynamic?  This is precisely what a soft type checker in a dynamically type language can allow.  With Python's type annotations, we can use annotations on variables where we know the type should never change, but we can also not use annotations where we need dynamic typing to use our time more economically.  This provides the advantages of both!

Type safety is not the only place where this might have value though.  One of the things that makes C such a powerful language is its freedom to allocate and use memory dynamically and freely.  Unfortunately, this freedom also makes it prone to memory leaks and segmentation faults.  Rust introduces an ownership mechanic that allows for most of the power available in C, with the minimal restrictions required to ensure memory safety.  This mechanic also ensures thread safety.  Except, there are a few places where this safety is not necessary or where it would be better for the programmer to deal with the memory safety instead of the language.  Like dynamic memory, these places are rare, but they do exist, and when they come up, it can make a huge difference.  Could we perhaps implement a kind of soft memory safety for C, just like type annotations allow for soft type safety in Python?  I believe so.

C already has a built-in annotation mechanic.  Pragmas are used by the C preprocessor as a sort of annotation that tells C to alter its behavior in a particular way.  They can be used to silence certain warnings, for example warnings saying that a variable is never used may be irrelevant in embedded systems programming.  In GCC, compiler plugins can also use pragmas.  For example, pragmas are used to tell OpenMP where and how to use threads in the program.  Pragmas that are not used by anything are generally just ignored.  It should be fairly simple to define a pragma syntax that can be used to indicate where and how ownership mechanics should be used, and then a linter could easily be made to interpret those pragmas and verify that the ownership mechanic is being used coherently.  Of course, the program will compile whether ownership is being violated or not, but if the linter is used consistently and all errors fixed, a C program using this mechanic should be just as memory safe as a Rust program using language enforced ownership.  The best part is that if you have some memory where you know memory safety is not an issue or where you need to ensure memory safety manually for whatever reason, you can just indicate that ownership mechanics should not be applied to it.  With this, C itself does not have to guarantee memory safety for your program to have a memory safe guarantee.  With a little extra syntax, memory safety can be guaranteed before the compiler even touches the code.

In my opinion, soft safety is actually better than hard safety, because it does not eliminate the benefits provided by more flexible and powerful programming languages.  It allows us to take the best aspects of very strict languages and apply them to more dynamic and flexible languages, without losing any of the flexibility.  What we need is not type, memory, and thread safety built into languages.  What we need is syntax and software that can analyze our source code and guarantee exactly the amount of safety we want and need, without eliminating the features that make many languages so powerful.  This kind of soft safety could be another step in the evolution of quality for software development.

2 comments:

  1. So it's the TOOL of type safety checking and not the RULE imposing the check that is useful. If the rules for applying tools can be customized for particular projects, productivity or quality could be increased without sacrificing much of the other.

    ReplyDelete
  2. Precisely. This cannot necessarily be applied in 100% of cases, for example, the reason C cares about type is that it does not dynamically track it, so it needs to know beforehand what instructions it needs to use for each variable. But there are plenty of valuable applications for this in other places and other languages.

    ReplyDelete