Saturday, January 23, 2021

Dynamic Typing in Python

Python is an extremely powerful language, but with great power comes great capacity to really screw things up.  Overall, Python is an extremely well designed language, but there are a few aspects of it that get attacked a lot.  Using style for control flow is a big one, though some of us actually really like this feature.  Another common complaint is with Python's dynamic typing.  Dynamic typing allows programmers to really screw things up in ways that can be really hard to debug.  Supposedly, it is common for less experienced Python programmers to accidentally have variable naming collisions, and because Python has dynamic typing, there is no type safety system to warn the programmer (though, I should note that no one has ever presented me with any evidence that this is a common problem, and even when I was new to Python, I never encountered this problem in my own code, and when I taught Python to undergrad students none of them ever reported having problems or frustration with this).  Some opponents of dynamic typing even suggest that the ability to change variable types has no practical value and thus should not exist in any programming language.  As someone who has made extensive use of Python's dynamic typing, to great advantage, I would like to present some highly valuable applications of it that have saved me enormous amounts of time.

The one that always comes to mind for me, is the ability to dynamically change objects.  Wait, objects?  How do objects have anything to do with it?  Objects are (composite) variables.  They have a type, that is defined by their class.  Wait, so you can change the class of an object dynamically?  Not exactly, and this is the wrong way to think about it (in this context...).  Most programmers think of types in terms of primitives, which is why many opponents of dynamic typing see no value in it.  If you see objects as collections rather than variables with their own types (defined by the names and types of their contents), it's easy to miss this.  (In Python, functions are also variables, so objects in Python are literally types defined by the variables they contain and their names.  Classes are merely formal definitions of those types, but it's trivial to create undefined object types as well.)  What does it mean to change the type of a primitive?  It means you are storing a value of a different type in it.  If you have a variable containing a string, and you store an integer in it, the string is completely replaced.  Objects are not primitive types though.  They are composite types, that can contain both data and code (or, to be more precise, pointers to code, in nearly all implementations), and multiple elements of each.  While writing a primitive type to a variable containing an object will overwrite the object (or rather, the pointer to it), dynamic typing is more than just overwriting the existing data with data of a different type.  Changing the data type of a primitive, without overwriting it with new data doesn't make a lot of sense.  Changing the type of an object, without overwriting it, does make sense, so long as you don't think in terms of changing the class of the object to another fully defined class.

In Python, most objects (excluding built-in objects) can be dynamically changed, by adding methods and member variables.  Thus, if you have created an instance of a user defined class (aka, an object), and it needs to have an additional member variable and/or method, in Python it is easy to just add it.  Now, it might be tempting to think even this is useless, but in Python it is incredibly powerful.  It allows for macro-like ability in the language, and it is fairly well known that macros are what makes Lisp such a powerful language.  Python's dynamic typing isn't quite as powerful as Lisp macros, but it allows for many of the same kinds of advanced metaprogramming.  In addition, because it is also possible in Python to create empty objects, by instantiating objects that inherit only from the object() class (object() itself is built-in and thus its direct instances are not dynamic), it is also possible to build completely arbitrary objects at runtime.

One of my favorite uses of dynamic typing in Python is adding methods to existing objects, to give those objects additional abilities.  In the context of video games, this means I can add a "render()" method to existing objects, if I want.  Of course it is more practical to just include render() in the original class definition, which is what I actually do.  The practical value of this is much greater, when I am using 3rd party libraries with objects that I want to change, without changing those libraries.

Another handy use of dynamic typing in Python is to assign extra metadata to existing objects.  For example, say I need an identification system for a collection of objects.  I could easily add a universal identifier to all objects put into that collection.  If I need to find or verify a particular object by UID, I can easily do that now, without having to change the class definitions (which I may not even have access to) of all of the objects.  And, now my system can take arbitrary object types, tack on the UID, and use them, without having to know anything about the objects.  Since Python can have collections of different types of objects (which we will talk about in a moment), this makes certain kinds of dispatch systems really easy, and adding UIDs and other metadata to objects can facilitate very advanced security protocols far more easily than languages that don't have dynamic typing.  Sure, I could write a wrapper object, that holds an object being passed in, and contains all of the metadata, and in some cases, this would be the optimal approach, however, this is less efficient (more layers of objects), and it would be significantly harder to add to an existing system.  In addition, a special authentication wrapper could lead to increased coupling between the authorization model and other modules that would have to unwrap objects when accessing them.  If I need to add an authorization protocol to an existing system, or if I need a very cheap authorization protocol that takes minimal time to develop, taking advantage of Python's dynamic typing will easily allow it, with no risk of potentially harmful coupling between the security system and the other systems.

The most complex way I have ever applied Python's dynamic typing system was in automated web site QA testing with Selenium.  This is an especially complicated application, because each page has its own unique set of properties and actions.  Initially, I used a non-object oriented approach, but this proved problematic, as each test needs to be painstakingly created, entirely from scratch, often with complicated CSS selector strings to interact with all of the elements on the page.  Even with three other employees writing tests, we were only writing two or three tests a day, to test usage paths that would only take us a couple minutes each testing manually.  In addition, logging is problematic, as either each individual test needs a bunch of custom logging code, or a global logger only logs minimal data about test failures.  The company I worked for at the time had used PHP for Selenium testing a little before I was hired, and they had attempted to get around these problems by writing a framework to handle a lot of this, but it constantly needed modification, because everything was hardcoded.  Using Python, I wrote a semi-complicated program, using Python's metaprogramming capabilities to dynamically generate a custom object for each page, with methods for all of the actions on the page.  The initial iteration still required test developers to go through each page, adding actions and any input elements to a JSON file, that my program would generate the objects from, but the long term plan was for the program to scan the HTML, looking for interactive elements and their input fields, and generate the page objects from that.  The finished product also would have scraped the page for information to include in logs, when tests failed (the page title, error messages, and such).  This way, a test writer could go to a web site, get the first page object, and then navigate purely by calling action methods on page objects, and then using the returned page object (representing the new page), and when a test failed, the log would automatically include detailed error information, rather than just the URL, the assertion that failed, and whatever description the test writer put in the assertion.  If a dev got to a new page and didn't know what actions were available, Python's introspection capabilities could be used to see all of the actions associated with the page object, to see what options the page presented.  While I quit that job when they decided to migrate their automated testing to Java (making test development much slower and eliminating any metaprogramming capabilities), my program would have allowed amateur programmers (we hired a lot of interns in their first few semesters of CS undergrad studies) to write tests very quickly, without having to even know HTML or CSS, and without having to spend hours going through web pages to construct complicated CSS selectors.

The truth is, most Python programmers use dynamic typing quite often.  It is not typically used for changing data types dynamically though.  It is used in heterogenous lists.  Heterogenous lists (Python's default array/list/collection type) can hold objects of any type, in any combination.  This is really handy, because it means metadata can exist as type, rather than as an explicit part of a variable.  For example, say you are creating an event queue.  It needs to hold KeyboardEvent objects, MouseEvent objects, and a bunch of other event types.  In C or C++, you will have to make an Event type (object or struct) and then have the subtypes inherit (or use a union), and then you will need to hold metadata about what type each object is, within each object (or struct), and then you have to cast Events going into and out of the queue, and from there you need separate code for each type.  (The degree of run-on-ness of that sentence is a good indicator of the complexity of the work involved.)  In Java, type data is handled automatically, but you still have to inherit, cast, and handle each type separately.  In Python, I don't need to inherit from a common class (less coding and greater maintainability), and I don't strictly need to check type.  I can use separate code for each type if I want, but I can also just check if properties exist directly and act appropriately.  For example, if I get an event, I might check if it has a "keydown" property, and if it does, I can check which key it was.  I don't need to know that the object is a KeyboardEvent object, and I don't need to cast objects coming out of the queue to their original type.  And in fact, say I want to use the event system to control an AI opponent.  I can make an AIEvent object with keyup, keydown, and whatever I am calling mouse event parameters, and toss that onto the queue, and so long as I have some way for clients of the queue to tell what events belong to them, I can use the same code I am using for the playable character for AI units!  Now, my AI units can take only AIEvent objects off the queue (yeah, they have to check type for that, and that's fine), and then they can feed the event into the same unit control code the playable characters use.  If this use sounds similar to Java's interfaces, it can be used similarly, to add similar capabilities to a collection of otherwise completely different objects.  So sure, this can be done with other object oriented languages (indeed, this is a common game programming design pattern), but with Python, I can do it much faster, in far less code, because I don't need a special queue, type casting, or as much type checking.  For comparison, SDL, a video game programming library, uses structs and unions, with type metadata variables (that the user must check), to achieve similar behavior.  (C unions are really powerful, and they can be used to create fully dynamic types in C, but with more hazards than Python, because it's easier to overflow a memory location using unions.)

The fact is, dynamic typing is far more than just being able to change the type of a variable at runtime.  It's being able to dynamically redefine objects.  It's being able to store elements with different data types in the same list construct.  It's being able to use objects of different types with related functionality with the same code, without having to do any complicated type casting.  Sure, dynamic typing increases the potential for bugs, but it also decreases potential for bugs, by making code simpler and shorter.  The hazards of dynamic typing are generally far outweighed by the benefits of increasing simplicity and decreasing code volume.

No comments:

Post a Comment