Wednesday, June 21, 2023

The Exclusion of the Original Makers

Making goes back a long time.  Lone inventors have popped up here and there throughout history, and one might even argue that the very first makers existed well before recorded history.  After all, copper working appears to predate written language by so long that the only record we have of its discovery is archaeological examples of early copper work.

That's not really what "makers" means though.  The term "makers" is fairly recent.  The modern term "makers" is actually a replacement for the term "hackers", which meant exactly the same thing until the 1990s and 2000s, when it was co-opted by outsiders and redefined to refer to malicious infiltrators of computer systems.  By the 2010s, the "hacking" community had rebranded under the name "makers", to avoid association with criminals.  The thing that distinguishes makers from early inventors is community.  No, this does not include workplace community, like Edison's invention factory.  Making has always been hobby first.  People do it for the love of doing it; maybe there's profit and maybe not, but profit isn't the goal.  The first real maker community in the U.S., and probably the world, was actually government sanctioned, and its potential for profit was restricted and regulated.

HAM radio was the first maker community.  In the U.S. it was and still is governed by the FCC.  When the FCC was founded to regulate the airwaves, to prevent them from becoming so cluttered and noisy that they would be useless for practical applications, it was recognized that the invention of radio technologies required a certain degree of free access to the radio spectrum, and that very tight regulation would ultimately halt the development of better radio technologies.  So Congress determined that a collection of bands scattered across the usable EM spectrum would be reserved exclusively for hobby use.  These bands could not legally be used for any commercial purposes, on pain of heavy fines (which are still regularly levied against unlicensed and commercial users).  To use these bands, one must obtain a license, which requires learning enough about radio physics and technology to at least potentially contribute and "make".  The licensing process was delegated to volunteer radio associations.  They did not just conduct the tests; they also developed them.  When an applicant passed a test, they would notify the FCC and request a license for the applicant.  This made HAM radio inherently a community.  To obtain a license, one must interact with a volunteer organization that is a HAM radio community in its own right.  Testing proctors come from that same community, and the test graders must be licensed members of that community.  The early HAM radio licensees generally made their own radios.  It took some time before radios designed for HAM bands became commercially available.  Many people who decided to become licensed were inventors or hardware "hackers" in their own right, and thus the first real maker community was born.

Fast forward to today though, and HAM radio has largely been excluded from maker communities.  The interest in electronics and hardware in general is still there, but the HAMs, the original makers, largely are not.  Why is this?  Why have the founders of making been excluded?

I suspect the answer is commercial incentive.  To be clear, I'm not talking about the makers themselves.  I'm talking about the companies selling products to the makers.  HAM radio kind of got old as it went into the 1990s.  Commercial radio technology had become too complex for the average hobbyist to DIY.  FCC regulations have gotten tighter and tighter, and that has made it a lot more expensive to build a homemade radio, because now you need elaborate filters and expensive testing equipment to ensure that you do not accidentally produce signals that interfere with other frequencies.  As the hobby has gotten more expensive, fewer and fewer young people have been able to even afford to build or buy radios.  HAM radio began to be viewed as an "old man" hobby, because the high cost threshold meant there were more older people involved than younger people.  Through the 2000s, the situation stayed largely the same.  Radios were expensive, building radios was impractical, and while more younger HAMs started to enter the scene, they were less interested in the community aspect, probably because of the cultural alienation of the young from the old in most developed countries.  As HAM became less popular, its profitability grew significantly more slowly than that of other markets.  RadioShack, originally built around the HAM radio hobby, was mostly out of the radio business by the 2000s and has all but disappeared since.  Other companies that sold hardware intended for HAMs have either gone out of business or dropped or marginalized their HAM product lines.  It is hard to find HAM hardware nowadays, and many of the companies that do still sell it charge absurdly high prices, because there is very little competition.

The 1990s-2000s dearth of affordable hobby radio technology has flipped now though.  With the advent of companies like Adafruit, producing very low cost microcontrollers with radios built in, radio has become a lot more affordable.  Unfortunately, the offerings are very limited, and there is nothing for HAMs, just things that HAMs can hack to fit their uses, if they can afford the expensive bits required to meet their needs.  Audio radios are pretty cheap.  A decent dual band, 8 watt Baofeng can be bought off Amazon for under $30.  The range is not great.  You can get ten to twenty miles in a flat, fairly low population area.  In a big city, you would be lucky to get even a few miles.  With repeaters, you can extend your range significantly.  HAMs are authorized to transmit up to 1,500 watts on many bands though, and at least 200 watts on the majority.  That's enough to communicate with people several states away, and under the right conditions, it is enough to communicate around the world.  Even 200 watt transmitting hardware is pretty expensive though.  Most HAMs spend most of their time under 100 watts, even if they do have equipment capable of far more.  (In fact, one of the legal restrictions for HAM radio is that you only use as much power as necessary for your current purpose.  If you are hoping to reach stations around the world, that completely justifies 1,500 watts or whatever the maximum for the particular band is, but for regular short to medium range communication 100 watts is generally plenty.)  Adafruit's microcontrollers, on the other hand, transmit mainly in the milliwatts range.  They are designed for very short range communication, not for HAM level communication.  They use unlicensed bands restricted to very low output power, but those unlicensed bands often overlap with HAM bands, where licensed operators can transmit with much higher power.  For example, Adafruit's Radio FeatherWing 900MHz (not a sponsor) can transmit at 868 or 915 MHz.  915 MHz is right in the middle of the 33cm HAM band, which allows transmission up to 1,500 watts (I cannot find anything saying there is a lower restriction on this band, and the default is 1.5 kW) or 10 watts for spread spectrum (which may be necessary around industrial, scientific, and medical complexes that radiate on this band).  To do this though, you will need an amplifier, and decent amplifiers are hard to find, especially when looking for 5v ones that can run off the same USB power as the device itself.  Adafruit also has offerings that operate at 433 MHz, which is in the middle of the 70 cm HAM band, but again, while we can legally transmit up to 1.5 kW, amplifying the output requires devices that are just hard to find commercially, especially in the context of small, low cost projects.  In addition, these are just two bands, and very narrow parts of them on top of that.  915 MHz is great, unless hundreds of HAM licensed makers are all trying to use the exact same channel or narrow range of channels at the same time.  (I am actually not sure if "868 or 915 MHz" means a few channels around those or only those exact frequencies, because if it is the latter, that is lame and makes these practically useless for HAM; if more than a few people are transmitting at even a few watts at a time within a few miles, the interference will just ruin the whole thing.)
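To put those power levels in perspective, decibel math makes the gap obvious.  Here is a rough comparison of my own (assuming free-space propagation, where every +6 dB roughly doubles range; real range depends heavily on terrain, antennas, and noise):

    import math

    def watts_to_dbm(watts):
        # dBm is decibels relative to 1 milliwatt
        return 10 * math.log10(watts * 1000)

    for label, watts in [("100 mW maker module", 0.1),
                         ("8 W Baofeng", 8),
                         ("100 W typical HAM rig", 100),
                         ("1,500 W legal limit", 1500)]:
        print(f"{label:>22}: {watts_to_dbm(watts):5.1f} dBm")

The legal limit comes out about 42 dB above a 100 mW module, which in free space is roughly seven doublings of range.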

What do HAMs need for this technology to be useful to them?

First, we need hardware that can be tuned to a range of frequencies within the desired bands.  If 915 MHz and 433 MHz are single monolithic frequencies that the devices can transmit on, as I said before, they are practically useless for HAM applications.  At their default power, the potential for cross talk is limited to operating mostly in the same building or maybe one building over.  For unlicensed use, this might be reasonable.  For HAMs, though, we are looking at transmitting at 5 watts at the absolute minimum.  That will cover an entire neighborhood and the next one over, and under the best conditions, we are talking miles.  Even in a moderate sized town, there are likely to be enough HAMs for that to be a serious problem.

Second, we really need low voltage amplifiers.  I mean 5 volts.  Adafruit's microcontrollers almost entirely run off of USB power.  Requiring an additional 12 volt power supply for a 5 to 10 watt RF amp is just plain unreasonable, especially for small devices designed to accommodate mobile applications.  Ideally these devices should be able to run off of USB "phone charger" batteries.  In fact, some Adafruit devices run off of 3.3v lithium batteries, so it might be even better if they can run off of 3.3v and have a regulator onboard to allow 5v input, just like the Adafruit boards that use 3.3v internally.  Now sure, I know that this is a tall order.  There's a reason you cannot find 5 watt RF amps that run off 5 volts.  Voltage is the critical part of RF amplification, because the voltage is what essentially forces the current out of the antenna.  It's very likely that RF amplifier ICs do not support less than 12v for 5 watt output.  In theory it should be possible to design and fabricate ICs that do support this, but companies like Adafruit are not in the IC development game, so it is probably not reasonable to ask them to do this.  They might be able to find such a chip if it already exists though.  They are quite good at that!  And 10 watts at 5 volts is probably even less reasonable.  But, there is a solution that companies like Adafruit can manage.  While they do not do their own ICs, their bread and butter is doing PCBs, and a PCB could easily be made that integrates a 12v 5 watt or 10 watt RF amp with a 5v to 12v boost converter.  Now, of course this will reduce efficiency, and it will take a considerable amount of current at 5v.  That's a given, but it will at least provide some good medium range power output options for licensed HAMs, that will work directly with Adafruit's other hardware.  Imagine a 5 watt 5v RF amp with a U.FL cable that can be connected to the Adafruit Feather RP2040 RFM69 Packet Radio.  Now you have a very HAM friendly option.  If Adafruit wants to avoid selling things to people who are not authorized to use them, they can require purchasers to have an Adafruit account, and they can request and verify the buyer's HAM call sign to make sure they are legally licensed.  I am totally willing to give Adafruit my call sign, so that they can verify that I am legally allowed to use what I am buying, and I am sure most HAMs are.  I am not certain how this would work for buyers outside of the U.S., but if their countries have online registries where you can look up someone by their call sign, the same mechanic could be used.  And because Adafruit specializes in hobby volume production, they would be ideal for serving a smaller hobby market (which frankly is a lot of what they already do).  Of course, Adafruit is just one example.  Sparkfun could do this as well, as they have a similar business model as well as products in the same domain.  I especially like the idea of something like the above RP2040 packet radio though, because it has a built in U.FL port for an antenna or RF amp and a STEMMA port for simple interface with other peripherals.  It also has plenty of GPIO pins on top of that.
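Some back-of-the-envelope numbers for the boost converter idea support this (my own sketch, assuming roughly 50% amplifier efficiency and 90% converter efficiency, both of which vary widely between real parts):

    def input_current_5v(rf_watts, amp_efficiency=0.5, boost_efficiency=0.9):
        dc_watts_12v = rf_watts / amp_efficiency       # DC power the amp draws at 12v
        dc_watts_5v = dc_watts_12v / boost_efficiency  # power pulled through the boost converter
        return dc_watts_5v / 5.0                       # current drawn from the 5v rail

    for rf in (5, 10):
        print(f"{rf} W RF -> about {input_current_5v(rf):.1f} A at 5 V")

Roughly 2.2 amps at 5v for 5 watts out is within reach of a good USB battery pack; roughly 4.4 amps for 10 watts is not, for ordinary USB hardware, which is part of why 10 watts is less reasonable.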

Third, we need tunable modular filters.  Many devices can produce RF signals with PWM, but PWM is incredibly dirty.  PWM is a sort of phase modulated square wave signal, and square waves are composed of an infinite series of harmonics.  This means that a signal generated using PWM will interfere with an enormous number of other frequencies, which is illegal.  Radio in general, and HAM in particular, has regulations requiring such spurious signals to be kept below certain thresholds, to avoid harmful interference both within and outside of the band being transmitted on.  A great example is the Raspberry Pi.  At least from the Pi 2 onward, PWM can be used to generate signals from 5 kHz to 1,500 MHz.  That covers the vast majority of the practical HAM bands.  The problem, as mentioned before, is that PWM is incredibly dirty in terms of signal purity.  To legally transmit with a Raspberry Pi using PWM to generate the signal, it is necessary to have a very good filter between the Pi and the RF amplifier, to filter out all of the undesired harmonics.  But, you cannot just have a filter for, say, the 70 cm band.  I have not looked at the frequency output range for a Pi being used this way, but I would guess that there are also some spurious emissions in the transmission band, so the filter would likely need to be narrow and tunable, to filter out spurious emissions on frequencies near the transmission frequency.  (It is possible I'm wrong here.  Square waves generally only contain frequencies higher than the base frequency, so there may not actually be spurious emissions in the HAM bands below the PWM frequency.  That said, because the duty cycle is changing to generate the RF signal and does not change with infinite precision, it is likely that there is some spurious emission within or very near the transmission band.  Maybe a manually tunable filter that is narrower than the band could be used, allowing the user to tune it to the desired portion of the band, while being narrow enough to filter out any nearby harmonics.)  Anyhow, what it comes down to is that for microcontrollers with PWM, it is possible to generate RF signals without needing any radio at all, and this is generally more flexible than a dedicated radio IC, which typically only transmits in a very narrow range, but a fairly good filter is required.  These are not terribly difficult to build if you have the right equipment for very precisely measuring capacitance and impedance, but if companies like Adafruit sold high quality RF filter ICs, that would really help the HAM community.  And if they had some very narrow band I2C tunable versions, that would be even better.  It would not make building radios entirely from scratch much easier, but it would definitely make it feasible within existing FCC regulations, and it would significantly reduce the need for the expensive analysis hardware that is normally required to make sure your hardware adheres to regulations.
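To make the harmonics problem concrete, here is a quick illustration using an idealized model (a perfect 50% duty cycle square wave contains only odd harmonics; real PWM with a changing duty cycle is messier, which is the whole point about needing filters):

    fundamental_mhz = 433.0          # a 70 cm band frequency
    for n in (3, 5, 7, 9):           # an ideal square wave contains only odd harmonics
        print(f"harmonic {n}: {n * fundamental_mhz:.0f} MHz")

Every one of those harmonics lands far outside the 70 cm band, in spectrum someone else is licensed to use, and each must be attenuated below the FCC's spurious emission limits before the signal reaches the antenna.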

I think this is important and valuable.  The 1990s and 2000s were not a great time for HAM radio, but we have pushed through that, and modern technology is better than ever before for HAM.  Licenses are much easier to get now, with the removal of the Morse code requirement.  Even just a Technician license (the lowest level available) grants permission to operate on the cheapest bands (the HF bands are heavily restricted for Technicians, but they also require very large antennas that can be very expensive to install).  In addition, all of the VHF and higher frequency bands available to Technicians allow them to transmit data, making those bands ideal for technological applications.  Companies like Adafruit could really help the HAM community by putting some resources into creating product lines exclusively intended for HAM, but they could also help revive HAM radio and make it interesting for younger people by bringing it into the realm of modern hobby technology.  HAM was once cutting edge radio technology.  It lost that status because the companies supporting it disappeared.  Companies devoted to serving makers and hobbyists not only have a perfect market just waiting for someone to serve it, they also have an opportunity to turn HAM into a much bigger and more interesting hobby.

I personally became a HAM radio operator because I was interested in the technology.  I am far less interested in the personal communication aspect of the hobby.  I recently got my Amateur Extra license, and shortly after that I got my first radio.  I got my General license in 2010 (I skipped Technician, passing the Technician and General tests in a single sitting), and it took me over 10 years just to get my first radio, largely because my primary interest was in the technology and less in the personal communication, and no one was serving the market I represent.  Sadly they still are not, but 13 years later there are a lot more people like me, who are interested in HAM for the technology (including a surprising number of older HAMs, like the one who taught the Technician licensing course I recently attended with some of my children), but who have little motivation to get involved, because the technology just is not available.  It does exist, but even companies who claim to be devoted to serving the maker market are failing to serve the original makers.

I want to see HAM radio return and take its place at the head of the maker movement.  Old people are not just washed up has-beens.  They are troves of valuable knowledge.  HAM radio itself could evolve into far more than just radio.  I want to go to HAM club meetings and talk about programming radio enabled microcontrollers.  I want to go to HAM meetings and talk about using radio communications for more than just talking to each other, accessing very limited portions of the internet, or gathering weather telemetry.  HAM clubs should be learning about adding RF technology to your home and your appliances, so that you can control them remotely.  But when the majority of HAM technology is handheld radios that are only good for talking to people you seem to have little in common with, it frankly sounds, and is, kind of boring.

So I guess this is my challenge to Adafruit, Sparkfun, and anyone else who claims to be dedicated to serving the maker community: Don't forget the HAMs.  Start producing products designed for them.  Start using whatever learning platforms you have to encourage makers to get their HAM radio licenses and start learning to use more advanced radio technologies.  You can turn HAM radio into the hobby it was always intended to be.  All you need to do is serve the original maker community as well as you serve the rest, and we can bring them together as an incredible power for the development and progress of not only radio technology but also the broader maker movement and technology in general.  HAM should not just be about radio.  It should also be about applications.  If we do not have the appropriate hardware though, we cannot build applications around HAM radio.  Please, please fix this problem.

Thursday, March 17, 2022

Modern Clustering

Back in the late 2000s, I discovered OpenMOSIX, a Linux software suite and kernel patch for making clusters.  In case you aren't already aware, a cluster is a group of computers networked together in a way that allows heavy workload tasks to be shared across the systems in the cluster.  OpenMOSIX was fairly unique, being a Single System Image (SSI) cluster.  The kernel patch provided OS level support for load sharing, and it produced a composite system that appeared to be a single computer to user space programs running on an OpenMOSIX cluster.  The patch allowed the kernel to automatically migrate processes between systems, for automatic load balancing, and the user space software allowed the user to set priorities and manually migrate processes.  This made OpenMOSIX extremely powerful.  A high performance computing application could merely be coded to run across multiple processes, and OpenMOSIX would handle the rest.

In contrast, other clustering software requires programs to use special libraries to explicitly farm tasks out to other nodes, to trigger migrations, and so forth, and they don't provide software that allows users to easily control migration.  And some earlier clustering software even requires all of the systems in the cluster to have nearly identical hardware.  OpenMOSIX made clustering trivially easy, but in 2007, the project was officially ended.  Of course, it hadn't seen work for several years, so in reality, it ended around 2004, the year of the last release.  Only three years later did one of the founders announce that OpenMOSIX was officially done.  He justified this with the assertion that multi-core CPUs made clusters obsolete.  This severely annoyed me.  I respect open source developers, and I understand that they won't always have time to continue work on complex projects, but I lose most of that respect when they lie about things like this.  The real reason OpenMOSIX was terminated was that the developers did not have the time or motivation to port it to Linux 2.6.  To their credit, they tried.  There's a beta for one of the first 2.6 kernels.  If that's their reason for giving up, I'm fine with that.  Yeah, I'm sad and disappointed, but it's their time and their lives, and I can't expect them to keep working on it for free, merely because they used to do so.  But, clusters were not made obsolete by multi-core CPUs.  In fact, they were made far more valuable, because now we have processors with far stronger computational resources, allowing for cheaper clusters that are also immensely more powerful.  Before multi-core CPUs became common, clusters were often severely limited by network bandwidth.  Beyond a certain number of systems, network congestion would cause lag in the cluster, resulting in rapidly diminishing returns.  Increasing network speeds helped to mitigate this, but it was still a serious issue.  The solution was custom systems with a huge number of separate CPUs, but even then bus congestion could become an issue quite quickly.  In addition, the master node could be a bottleneck, when node traffic piled up and many nodes were stuck waiting for the master node to become free to service their requests.  Multi-core processors solved many of these issues.  Each network connection could be serving 2, 4, 6, or more cores, reducing congestion and collisions dramatically.  A master node with several cores can service nodes faster by spreading the load between cores.  And of course, more computing power is available from significantly fewer systems, reducing the cost of the cluster and even reducing the need for expensive custom systems.  OpenMOSIX on a cluster of multi-core systems could have been incredibly powerful, and the need for high performance computing has only increased.  Multi-core systems made clustering more valuable, not obsolete.

Without OpenMOSIX, modern clustering is a mess once again.  OpenMPI provides clustering over SSH (and OpenMP provides similar parallelism within a single system), but programs have to be custom made for these, and they have to manage significant parts of the load balancing themselves.  Processes can't be migrated.  Tasks can be terminated and restarted somewhere else, but there is no migration.  Other SSI clustering software has popped up, but it's all dead as well.  A promising replacement for OpenMOSIX, Kerrighed, did make it to the 2.6 kernel, but it died in 2010, with the last release being in June of that year.  MOSIX, the proprietary clustering software OpenMOSIX was originally forked from, does still exist and has seen continued development into 2017, with the latest release being in October of that year.  The latest version doesn't even require a kernel patch, but it uses some virtualization, which increases overhead.  In addition, it isn't open source, which creates significant security concerns.  The LinuxPMI project picked up OpenMOSIX development when Moshe Bar announced it was being abandoned, but while it appears to have made significant progress, there are still no pre-packaged kernels or distros using it, and if you want to use it, you have to patch and compile the kernel yourself, which is no trivial task.  It really seems like the cancellation of the OpenMOSIX project ultimately killed SSI clusters, the most useful and easiest to use kind of cluster.

What now?  Is clustering just a dead technology that is mainly only used by nostalgic nerds?  Not quite.  It's still possible to do informal clustering, by writing programs that handle the elements of clustering themselves.  This is only better than OpenMPI/OpenMP because you have more control, and you don't have to set up all of the OSs for clustering.  Distributed computing has remained alive and well, even as clustering has largely collapsed, but unlike with OpenMOSIX and SSI clustering, the developer of the software still has to know how to do it.

The most popular use of clustering is probably distributed rendering.  Back in the day, Linux distributions like Dyne:Bolic used a 2.4 Linux kernel with OpenMOSIX, to make distributed rendering with Blender trivial.  Blender already had the capability to distribute rendering across multiple processes, and an OpenMOSIX kernel would automatically migrate those processes across the cluster to balance the load.  Modern Blender has network rendering plugins that can be used to accomplish the same thing, but unlike with an OpenMOSIX cluster, Blender has to be installed on all of the systems and set up to connect to the master instance.  And of course, someone had to write all of the networking code for the plugins to communicate.  This is not as good as OpenMOSIX, but it is better than nothing, and it is 100% portable.

Modern clustering doesn't require special operating system support.  SSI clusters are still incredibly valuable, and they do things that are difficult or impossible to do without centralized management, but for your average distributed workload, they aren't strictly necessary.

During my undergrad training, I wrote a distributed prime number finder for my networking course.  If this sounds like it is way beyond undergrad level work, that's because it is.  I already had 20 years of experience, so I had to find ways to keep things interesting.  I often did this by doing far more advanced projects than were required, and in addition to keeping things interesting, this also allowed me to learn at my own level, giving my degree more meaning than a mere piece of paper saying I knew what I already knew before I even started it.  Anyhow, this distributed prime number finder was a challenge.  The concept was simple.  Counting by twos starting from 3 (aka, skipping even numbers), it used trial division up to the square root of each number, storing primes in a file.  I had written the single threaded version of this around 8 years earlier, so the concept was very simple for me.  I used Python this time (I used C the first time, and ran into C's fixed size integer limits), which has solid built-in networking libraries.  This made the network communication fairly trivial.  So I wrote two programs, using a client/server architecture, where the server is a cluster master node program, and the client is a worker node program.  The master sends out ranges of numbers to test, and the clients return lists of the primes in those ranges.  And this is fast, despite being written in Python.  (That said, if I was to do it again, and I might, I would write the testing math in C and turn that into a Python module, to accelerate the math.)  This was very successful, and I ended up with a solid prime finding system that worked in a clustered fashion without needing a formal OS level cluster.
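For illustration, here is a minimal sketch of the core idea (my own reconstruction, not the original code; the real version sent the ranges over sockets, while here the "protocol" is reduced to a function call):

    import math

    def primes_in_range(start, end):
        # Worker-node job: return the primes in [start, end) by trial
        # division, testing only odd numbers.
        found = []
        for n in range(start | 1, end, 2):   # start | 1 forces an odd starting point
            if n < 3:
                continue
            if all(n % d for d in range(3, math.isqrt(n) + 1, 2)):
                found.append(n)
        return found

    def master(total, chunk=10_000):
        # Master-node dispatch, serialized here for clarity; the real master
        # handed each chunk to whichever client was idle.
        primes = []
        for start in range(3, total, chunk):
            primes.extend(primes_in_range(start, min(start + chunk, total)))
        return primes

    print(master(100))   # [3, 5, 7, 11, ..., 97]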

This wasn't without some challenges though.  There was one in particular.  That challenge was ordering.  The primes were still just written to a file, but the program was no longer linear.  It might farm out different ranges to five different systems, but the results might not come back in order, and the file needs to be ordered.  The master created client threads similar to modern HTTP servers, and those threads would store the results in objects passed into them.  When a client completed its range, a new range needed to be sent out to it, so I couldn't just make a queue, wait for the head client to finish, give it a new range, and send it to the end.  I needed to be able to give finished clients new work, even if they finished before the next one in order.  I also needed to be able to handle cases where a client connection closed unexpectedly, so that ranges wouldn't get skipped in those cases.

To solve this problem, I started with the queue.  The queue contained objects that tracked their range and the output of their associated threads, as well as the client for that thread.  When a client completed its section and returned the results, the thread recorded the results and terminated.  The master went through a loop, scanning the entire queue and creating new threads and assignments for any idle client.  When this was complete, it would check the first object in the queue, to see if it was done.  If it was, the results would be written to the file, and the object removed.  Of course, this required mutexes, to avoid race conditions with reads and writes, but this was trivial in Python.  Compared to my single threaded, non-distributed version, this was incredibly fast on just a handful of machines.  In addition, this model was able to take advantage of multi-core CPUs in a highly controllable way.  If your system has multiple cores, you run one instance of the client for each core you want to use.  If you only want to use half of your cores, so that you can use the computer for something else without lag, run half as many client instances as you have cores.  This is something I did in testing, and it worked very well.
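In rough code, the scheme looked something like this (a simplified sketch with illustrative names, not the original program):

    import threading
    from collections import deque

    class Assignment:
        def __init__(self, start, end):
            self.start, self.end = start, end
            self.result = None
            self.done = False

    lock = threading.Lock()   # the mutex guarding the queue and results
    pending = deque()         # assignments, appended in range order

    def complete(assignment, primes):
        # Called from a client-handler thread when its range comes back.
        with lock:
            assignment.result = primes
            assignment.done = True

    def drain(outfile):
        # Master loop step: results may finish out of order, but they are
        # only written once the head of the queue is done, preserving order.
        with lock:
            while pending and pending[0].done:
                a = pending.popleft()
                outfile.writelines(f"{p}\n" for p in a.result)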

My final cluster was a quad core desktop machine, a Dell Xeon server with two CPUs (Pentium 4 class, so both single core), my dual core laptop, and a few Raspberry Pis.  This was before multi-core Pis, so I ran the master on one Pi along with a single client node.  On my desktop, I ran four nodes.  On my laptop, I ran two nodes, and I ran two nodes on my dual CPU server.  On the other Pis, I also ran one node each.  This worked exactly as expected.  Overnight I ended up with several GB of prime numbers in a file on my Raspberry Pi master node, on a microSD card with very limited capacity.

What did I learn from this?

First, formal clusters aren't strictly necessary for doing high performance computing.  It's not that hard to write a client/server application for distributed computing.  And even load balancing isn't that hard.  My distributed prime finder automatically load balances, because it sends small tasks and only assigns new ones when the previous ones are complete.  This inherently gives more powerful systems more work and less powerful ones less work.

Second, client/server distributed computing models can be complex when order matters, but while this can be a challenge, there are simple solutions that most experienced developers should be able to figure out.

Third, SSI clusters can simplify things by eliminating the networking stuff, but things like order dependence will still be a challenge whenever you are distributing work across multiple processes.

Fourth, if I was to do this again or wanted to make this into a polished product, there are a few things I would do differently.  Using Python for the master node is perfect, but I might use C or a C module for Python for the clients, to further improve performance.  What's the point of distributed computing, if you aren't going to optimize the program itself?  Python is an excellent language, but I would almost certainly get a big performance boost using C instead, for the heavy lifting.  (That said, for this particular application, part of the reason I used Python was for its arbitrary precision integers, which C does not have.  I suppose I could use GMP for that though, and 64 bit systems provide a much larger range, especially with GCC's support for 128 bit variables and math.)  I would also want to make this more easily tunable.  Currently, if I want to control the number of cores used, I have to run new instances or kill running instances.  It would be better if the clients were threaded and the number of threads used could be controlled from the master node, so that only one client instance runs on each machine, and the whole thing can be controlled from a single node.  And maybe for use on untrusted systems, a redundancy level could be specified, such that each range is calculated on multiple different machines and then the collective results are used to verify accuracy.


The advantage SSI clusters have over this is that people with limited programming experience and knowledge can use them to distribute tasks of existing programs that are already multi-processed, while this informal clustering requires specially designed applications.  This strategy is certainly better (in my opinion) than OpenMPI, but it still requires copying the client program to all of the worker nodes and running them, and it requires the application to be specially designed for this.

But, there might still be a solution.  Python can parse and execute code at runtime, which means it should be possible to write a simple library with a client portion that can just be run as-is, given an IP address and port, and a server portion with ordered, unordered, and perhaps other modes, that takes an "assignment" function that gives out tasks and a string that is sent to the clients as the compute program.  The downside is that you wouldn't want to be a client to an untrusted master node, because the program sent might be malicious, and Python sandboxing is notoriously difficult.
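In miniature, that library might look like this (a hypothetical sketch; all names are illustrative):

    import textwrap

    # The "compute program" the master ships to clients, as a string.
    COMPUTE_SRC = textwrap.dedent("""
        def compute(task):
            start, end = task
            return [n for n in range(start, end) if n % 7 == 0]  # stand-in workload
    """)

    def client_run(source, task):
        # Client side: parse and load the shipped code, then apply it to a task.
        namespace = {}
        exec(source, namespace)
        return namespace["compute"](task)

    print(client_run(COMPUTE_SRC, (1, 50)))   # [7, 14, 21, 28, 35, 42, 49]

As noted, this is only safe with a trusted master node, since exec() will run whatever it is given.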

Maybe the value of SSI clusters is diminished, but if so, it's not because multi-core CPUs reduced their value.  It's because distributed computing isn't that hard even without SSI clusters.  That said, SSI clusters are still incredibly valuable, and their value is only increased by multi-core CPUs and other improvements to hardware.  There will always be problems that take very large amounts of time to compute, where SSI clusters are significantly easier to use than networked distributed applications.

Saturday, January 23, 2021

Dynamic Typing in Python

Python is an extremely powerful language, but with great power comes great capacity to really screw things up.  Overall, Python is an extremely well designed language, but there are a few aspects of it that get attacked a lot.  Using whitespace (indentation) for control flow is a big one, though some of us actually really like this feature.  Another common complaint is with Python's dynamic typing.  Dynamic typing allows programmers to really screw things up in ways that can be really hard to debug.  Supposedly, it is common for less experienced Python programmers to accidentally have variable naming collisions, and because Python has dynamic typing, there is no type safety system to warn the programmer.  (I should note, though, that no one has ever presented me with any evidence that this is a common problem.  Even when I was new to Python, I never encountered this problem in my own code, and when I taught Python to undergrad students, none of them ever reported having problems or frustration with this.)  Some opponents of dynamic typing even suggest that the ability to change variable types has no practical value and thus should not exist in any programming language.  As someone who has made extensive use of Python's dynamic typing, to great advantage, I would like to present some highly valuable applications of it that have saved me enormous amounts of time.

The one that always comes to mind for me, is the ability to dynamically change objects.  Wait, objects?  How do objects have anything to do with it?  Objects are (composite) variables.  They have a type that is defined by their class.  Wait, so you can change the class of an object dynamically?  Not exactly, and this is the wrong way to think about it (in this context...).  Most programmers think of types in terms of primitives, which is why many opponents of dynamic typing see no value in it.  If you see objects only as variables with their own types, rather than as collections defined by the names and types of their contents, it's easy to miss this.  (In Python, functions are also variables, so objects in Python are literally types defined by the variables they contain and their names.  Classes are merely formal definitions of those types, but it's trivial to create undefined object types as well.)  What does it mean to change the type of a primitive?  It means you are storing a value of a different type in it.  If you have a variable containing a string, and you store an integer in it, the string is completely replaced.  Objects are not primitive types though.  They are composite types, that can contain both data and code (or, to be more precise, pointers to code, in nearly all implementations), and multiple elements of each.  While writing a primitive type to a variable containing an object will overwrite the object (or rather, the pointer to it), dynamic typing is more than just overwriting the existing data with data of a different type.  Changing the data type of a primitive without overwriting it with new data doesn't make a lot of sense.  Changing the type of an object without overwriting it does make sense, so long as you don't think in terms of changing the class of the object to another fully defined class.

In Python, most objects (excluding built-in objects) can be dynamically changed, by adding methods and member variables.  Thus, if you have created an instance of a user defined class (aka, an object), and it needs to have an additional member variable and/or method, in Python it is easy to just add it.  Now, it might be tempting to think even this is useless, but in Python it is incredibly powerful.  It allows for macro-like ability in the language, and it is fairly well known that macros are what makes Lisp such a powerful language.  Python's dynamic typing isn't quite as powerful as Lisp macros, but it allows for many of the same kinds of advanced metaprogramming.  In addition, because it is also possible in Python to create empty objects, by instantiating classes that inherit only from object (object itself is built-in, and thus its direct instances are not dynamic), it is also possible to build completely arbitrary objects at runtime.
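A tiny example of what this looks like in practice (the names here are illustrative):

    import types

    class Blank:
        # A minimal user defined class; its instances accept new attributes.
        pass

    b = Blank()
    b.uid = 42                      # add a member variable at runtime

    def describe(self):
        return f"object with uid {self.uid}"

    b.describe = types.MethodType(describe, b)   # attach a bound method at runtime
    print(b.describe())             # -> object with uid 42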

One of my favorite uses of dynamic typing in Python is adding methods to existing objects, to give those objects additional abilities.  In the context of video games, this means I can add a "render()" method to existing objects, if I want.  Of course it is more practical to just include render() in the original class definition, which is what I actually do.  The practical value of this is much greater, when I am using 3rd party libraries with objects that I want to change, without changing those libraries.

Another handy use of dynamic typing in Python is to assign extra metadata to existing objects.  For example, say I need an identification system for a collection of objects.  I could easily add a universal identifier to all objects put into that collection.  If I need to find or verify a particular object by UID, I can easily do that now, without having to change the class definitions (which I may not even have access to) of all of the objects.  And, now my system can take arbitrary object types, tack on the UID, and use them, without having to know anything about the objects.  Since Python can have collections of different types of objects (which we will talk about in a moment), this makes certain kinds of dispatch systems really easy, and adding UIDs and other metadata to objects can facilitate very advanced security protocols far more easily than languages that don't have dynamic typing.  Sure, I could write a wrapper object that holds the object being passed in and contains all of the metadata, and in some cases, this would be the optimal approach.  However, this is less efficient (more layers of objects), and it would be significantly harder to add to an existing system.  In addition, a special authentication wrapper could lead to increased coupling between the authorization model and other modules, which would have to unwrap objects when accessing them.  If I need to add an authorization protocol to an existing system, or if I need a very cheap authorization protocol that takes minimal time to develop, taking advantage of Python's dynamic typing will easily allow it, with no risk of potentially harmful coupling between the security system and the other systems.

The most complex way I have ever applied Python's dynamic typing system was in automated web site QA testing with Selenium.  This is an especially complicated application, because each page has its own unique set of properties and actions.  Initially, I used a non-object oriented approach, but this proved problematic, as each test needed to be painstakingly created, entirely from scratch, often with complicated CSS selector strings to interact with all of the elements on the page.  Even with three other employees writing tests, we were only writing two or three tests a day, to test usage paths that would only take us a couple minutes each to test manually.  In addition, logging was problematic, as either each individual test needed a bunch of custom logging code, or a global logger only logged minimal data about test failures.  The company I worked for at the time had used PHP for Selenium testing a little before I was hired, and they had attempted to get around these problems by writing a framework to handle a lot of this, but it constantly needed modification, because everything was hardcoded.  Using Python, I wrote a semi-complicated program, using Python's metaprogramming capabilities to dynamically generate a custom object for each page, with methods for all of the actions on the page.  The initial iteration still required test developers to go through each page, adding actions and any input elements to a JSON file that my program would generate the objects from, but the long term plan was for the program to scan the HTML, looking for interactive elements and their input fields, and generate the page objects from that.  The finished product also would have scraped the page for information to include in logs when tests failed (the page title, error messages, and such).  This way, a test writer could go to a web site, get the first page object, and then navigate purely by calling action methods on page objects and using the returned page objects (representing the new pages), and when a test failed, the log would automatically include detailed error information, rather than just the URL, the assertion that failed, and whatever description the test writer put in the assertion.  If a dev got to a new page and didn't know what actions were available, Python's introspection capabilities could be used to see all of the actions associated with the page object, to see what options the page presented.  While I quit that job when they decided to migrate their automated testing to Java (making test development much slower and eliminating any metaprogramming capabilities), my program would have allowed amateur programmers (we hired a lot of interns in their first few semesters of CS undergrad studies) to write tests very quickly, without having to even know HTML or CSS, and without having to spend hours going through web pages to construct complicated CSS selectors.
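The core trick is simple with Python's metaprogramming.  Here is a toy version (hypothetical names and selectors; the real program also wired in Selenium calls and logging):

    # Build a page object class at runtime from a JSON-style description.
    PAGE_SPEC = {
        "name": "LoginPage",
        "actions": {
            "username": "input#user",
            "submit": "button#login",
        },
    }

    def make_page_class(spec):
        def make_action(selector):
            def action(self, *args):
                # The real version would find the element by CSS selector
                # with Selenium and click it or send keys to it.
                print(f"interacting with {selector}")
            return action
        methods = {name: make_action(sel) for name, sel in spec["actions"].items()}
        return type(spec["name"], (object,), methods)

    LoginPage = make_page_class(PAGE_SPEC)
    page = LoginPage()
    page.submit()        # -> interacting with button#login

From there, dir(page) is all a test writer needs to discover the actions available on any page.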

The truth is, most Python programmers use dynamic typing quite often.  It is not typically used for changing data types dynamically though.  It is used in heterogeneous lists.  Heterogeneous lists (Python's default array/list/collection type) can hold objects of any type, in any combination.  This is really handy, because it means metadata can exist as type, rather than as an explicit part of a variable.  For example, say you are creating an event queue.  It needs to hold KeyboardEvent objects, MouseEvent objects, and a bunch of other event types.  In C or C++, you will have to make an Event type (object or struct) and then have the subtypes inherit (or use a union), and then you will need to hold metadata about what type each object is, within each object (or struct), and then you have to cast Events going into and out of the queue, and from there you need separate code for each type.  (The degree of run-on-ness of that sentence is a good indicator of the complexity of the work involved.)  In Java, type data is handled automatically, but you still have to inherit, cast, and handle each type separately.  In Python, I don't need to inherit from a common class (less coding and greater maintainability), and I don't strictly need to check type.  I can use separate code for each type if I want, but I can also just check if properties exist directly and act appropriately.  For example, if I get an event, I might check if it has a "keydown" property, and if it does, I can check which key it was.  I don't need to know that the object is a KeyboardEvent object, and I don't need to cast objects coming out of the queue to their original type.  And in fact, say I want to use the event system to control an AI opponent.  I can make an AIEvent object with keyup, keydown, and whatever I am calling mouse event parameters, and toss that onto the queue, and so long as I have some way for clients of the queue to tell what events belong to them, I can use the same code I am using for the playable character for AI units!  Now, my AI units can take only AIEvent objects off the queue (yeah, they have to check type for that, and that's fine), and then they can feed the event into the same unit control code the playable characters use.  If this use sounds similar to Java's interfaces, it can be used similarly, to add similar capabilities to a collection of otherwise completely different objects.  So sure, this can be done with other object oriented languages (indeed, this is a common game programming design pattern), but with Python, I can do it much faster, in far less code, because I don't need a special queue, type casting, or as much type checking.  For comparison, SDL, a video game programming library, uses structs and unions, with type metadata variables (that the user must check), to achieve similar behavior.  (C unions are really powerful, and they can be used to create fully dynamic types in C, but with more hazards than Python, because it's easier to overflow a memory location using unions.)
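Here is roughly what that looks like (a minimal sketch; the class and property names are illustrative):

    from collections import deque

    class KeyboardEvent:
        def __init__(self, keydown):
            self.keydown = keydown

    class AIEvent:
        # Same control properties as KeyboardEvent, with no shared base class.
        def __init__(self, keydown, owner):
            self.keydown = keydown
            self.owner = owner

    queue = deque([KeyboardEvent("w"), AIEvent("s", owner="ai-1")])

    while queue:
        event = queue.popleft()
        if hasattr(event, "keydown"):    # act on the property, not the class
            print("key pressed:", event.keydown)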

The fact is, dynamic typing is far more than just being able to change the type of a variable at runtime.  It's being able to dynamically redefine objects.  It's being able to store elements with different data types in the same list construct.  It's being able to use objects of different types with related functionality with the same code, without having to do any complicated type casting.  Sure, dynamic typing increases the potential for bugs, but it also decreases potential for bugs, by making code simpler and shorter.  The hazards of dynamic typing are generally far outweighed by the benefits of increasing simplicity and decreasing code volume.

Wednesday, January 20, 2021

Complex Evens and Odds

I have sometimes found myself enjoying experimental math with questionable practicality.  For example, fractional bases can be fun to play with, and negative bases can be really confusing.  The practical value of fractional bases probably does not exist, and negative bases may have some practical uses, but the complexity is high enough that there probably aren't many practical uses for them.  In writing the previous article on the terms "even" and "odd", I ended up being forced to consider the implications of those terms in complex numbers, and I have discovered some interesting things.  So, I am going to write about it!


Initially, I assumed even and odd cannot apply to complex numbers; however, it turns out I was wrong, and I even managed to prove my wrongness with an example.  There are a few problems that make this hard to understand.  First, complex numbers aren't just two independent numbers.  A complex number is essentially a 2D number, represented as a real and an imaginary component.  It's still one number though.  Now, the magnitude of a complex number isn't just its components added together.  It's more complicated than that.  It's actually the length of the hypotenuse of the right triangle that would be made, if the other two sides were the lengths of the two components.  This means that a complex number with magnitude 2 might be represented as sqrt(2) + sqrt(2)i.  The components don't necessarily have to be integers for the actual magnitude to be an integer value, and that really complicates things.  Another problem is that even and odd only make sense in the context of discrete math, thus we can't call sqrt(2) + sqrt(2)i even, because its components are not integers and thus don't fall into a domain governed by discrete math.  This is severely limiting, because it means that complex numbers that fall into the integer domain must have integer components and must have integer magnitude.  That excludes a ton of complex numbers that have integer components, because most don't have integer magnitude.

I am not sure how useful this is.  I think it clearly falls into the realm of experimental math, with questionable practicality, but I find this interesting, and if it even might have practical value, it is worth putting some time into exploring.  The rest of this will be some examples and exploration, to get a better feel for what we are looking at, and to see if anything interesting arises from it.


So first, 1 + 1i doesn't have an integer magnitude, so it isn't a valid complex integer.  Now, on a graph, complex integers can only exist on grid vertices.  This will help narrow down the possible options in a way that can be understood visually.  Since we know 1 + 1i isn't an integer (its magnitude is about 1.41), we know that not all vertices are complex integers.  I am thinking we could take a numbered grid, and then put circles on it at integer distances from the origin, and anywhere a circle precisely intersects a grid vertex, there is a complex integer.  We would, of course, want to verify any integers we find, as we may see an apparent intersection that is merely extremely close and only appears to intersect due to resolution limitations.
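This search can also be done numerically, without the resolution problem, by checking whether the squared magnitude is a perfect square using exact integer math.  A quick sketch (my own, just to illustrate the idea):

    import math

    def complex_integers(limit):
        # Find a + bi with positive integer components whose magnitude
        # sqrt(a^2 + b^2) is an exact integer.
        found = []
        for a in range(1, limit + 1):
            for b in range(1, limit + 1):
                mag_sq = a * a + b * b
                root = math.isqrt(mag_sq)
                if root * root == mag_sq:
                    found.append((a, b, root))
        return found

    for a, b, mag in complex_integers(12):
        print(f"{a} + {b}i has magnitude {mag}")
    # 3 + 4i and 4 + 3i have magnitude 5, 6 + 8i has magnitude 10,
    # 5 + 12i has magnitude 13, and so on.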

There are a few intersection points I know off the top of my head.  For example, the last article used 3 + 4i as an example.  This is the common Pythagorean theorem example, with a magnitude of 5.  We can also swap components, giving 4 + 3i, also with magnitude 5, and we can swap signs as well.  This means we have rotational symmetry at 90 degree (quadrant) intervals, and within each quadrant we have reflective symmetry across the 45 degree quadrant bisector.  This is similar to algebraic rules in real math, like the Commutative property, except that because magnitudes are always positive, negative and positive values can be switched without changing the magnitude.  (This does bring up the question of if and how angle plays into this.  Not sure I want to deal with that right now, and I suspect that adding angles will break the whole integer thing, because I seriously doubt any/many complex integers will land on integer angles.  That said, the scale for angles is arbitrary, so "integer" really doesn't apply in any real way...unless maybe this exercise reveals a discrete scale of angles that has thus far remained undiscovered...  Maybe I need to print off this graph with circles and draw lines from complex integers to the origin, and see if that yields any interesting patterns...)  Any integer multiple of either of these is also going to have an integer magnitude.  And that leads to the realization that we are actually looking at vector math here.  We can also represent complex numbers as vectors, <3, 4>, or coordinates, (3, 4), and realizing this, it is now obvious to me that the complex domain uses vector algebra for its operations, and this makes me wonder how vector algebra would work in the complex integer domain.  Honestly, I am beginning to think I may have just gotten myself way deeper than I did with negative bases.  I am hoping this isn't more complex than non-reciprocal fractional bases...  (Reciprocal bases, i.e., fractional bases where the numerator is always 1, are trivially easy.  The one time I tried a fractional base with a numerator higher than 1, however, was a disaster.  That's one of the most complicated mathematical things I have tried and failed to do.  And I should note, I got an A in multivariable calculus.)

So, I think I am going to make the graph with circles at integer distances from the origin and see if I can find more complex integers.  From there, I can look for patterns.  I am sure I will find some, because that's just what happens with math, but the question now is whether I will find anything significant or not!

I guess I know where I am going from here.  I have convinced myself, through this line of thought, that this might have some practical value after all.  It might only serve to move math theory forward, but that is practical value, as most elements of math theory seem to eventually lead to something of significant value being discovered or created.  I am seeing a lot of potential in discovering interesting and perhaps undiscovered patterns from this, so maybe...

Evens and Odds

Many years ago, a friend of mine questioned the common practice of always rounding halves up.  That led me to write this article, just a few minutes ago.  It also led me to question elements of math theory and education in general.  That led me down a line of reasoning that motivated me to write this article, and I ended up writing that one first as an example.  (If that logic is confusing, it's because it is.  Basically, I was going to precede the following with a paragraph about the question my friend raised, but by the time I finished, I found I had written a whole article...  Yeah, that happens to me sometimes.)  Anyhow, on to the topic at hand.


In elementary school math, at least in the U.S., there is a point where a lot of emphasis is put on classifying numbers as even or odd.  We are taught, implicitly if not explicitly, that oddness and evenness are fundamental properties of numbers, and that telling the difference is a critical math skill.  Rather than exploring these ideas, let's just cut to the chase: This is all wrong.  Oddness and evenness are not fundamental properties of numbers, and there is very limited value in being able to classify numbers as even or odd on sight.

The first problem is that the vast majority of numbers are neither even nor odd.  I hear you saying, "But half of all numbers are even, aren't they?"  Nope, not even close.  In fact, so few numbers are even that one might reasonably claim that evenness doesn't exist, statistically.  How can I say this, when you can just count by twos indefinitely or even provide a mathematical proof that there are an infinite number of even numbers?  Consider, how many numbers are there between 0 and 1?  The answer is an uncountably infinite number.  0.1, 0.01, 0.001, 0.2, and so on, and not a single one is even.  In addition, they are not odd either!  On the other hand, even numbers (and odd numbers) are only countably infinite, and for all practical purposes, countably infinite divided by uncountably infinite is zero, thus the percentage of all numbers that are even or odd is 0%.  It's almost (but not quite) like even and odd numbers don't even exist.

Now, if we add some qualifiers, we can make even and odd numbers relevant.  Instead of saying that evenness and oddness are fundamental properties of numbers, let's instead say they are fundamental properties of integers, aka whole numbers.  This is actually true, and every whole number is either even or odd.  Well, that may not be precisely true...  Let's further qualify that: every whole real number is either even or odd.  (Imaginary numbers...  There are a few ways you might try to qualify complex numbers as even or odd, but they are unintuitive.  Purely imaginary integers, with no real component, could be even or odd the same way real integers are.  For example, 2i is even.  Complex numbers, however, really break when it comes to even and odd.  For example, technically sqrt(2) + sqrt(2)i is even, because its magnitude is 2, and thus, when divided by 2, its magnitude is 1.  Also, 6 + 8i has magnitude 10, and dividing it by 2 gives 3 + 4i, with magnitude 5, and if those look familiar, it's because they are the common example for the Pythagorean theorem.)  So, if we limit our set of possible values to real integers, even and odd are relevant, and every possible value is either even or odd.  Now it is a fundamental property.

We can make even and odd relevant by limiting our set of values, but are these designations useful?  We first need to define "even" and "odd" before we can really determine their value.  In schools, we are taught that numbers that divide "evenly" by 2 are even, and those that don't are odd.  What does this mean though?  In real numbers, I can divide, say, 3 by 2 and get 1.5.  Where's the problem?  3 seems to divide by 2 evenly.  Again, we have to limit our domain to integers for this to make sense.  At the deepest level, what "even" means in this context is that when we divide a number by 2, all of the groups this creates have an equal (or even) number of elements.  So, if I divide 3 by 2, I get a group of 2 and a group of 1, thus 3 isn't even, but if I divide 4 by 2, I get two groups of 2, which are of equal size, thus 4 is even.

We can define evenness more mathematically though, using discrete math concepts.  Discrete math is merely math using only whole numbers, aka "discrete" values, rather than continuous values (real numbers).  Discrete math has different mathematical operations than continuous math.  The prime example is division.  In continuous math, you just split numbers into smaller parts, for example, 3 / 2 = 1.5, but in discrete math, division gives two outputs.  One is the number of equal groups the numerator gets split into, and the other is the size of the remaining unequal group, otherwise known as the remainder.  Thus, in discrete math, 3 / 2 = 1R1, that is, 1 with a remainder of 1.  Discrete math also has an additional operation, a partial division that yields only the remainder, which is called the "modulus" operation.  In programming, the modulus operation is often represented with the percent sign, thus 3 % 2 = 1.  Now we have a foundation for defining "even" and "odd".  An even number is a number where the modulus is 0 when it is divided by 2, or, x is even if and only if x % 2 = 0.  A number x is odd if and only if x % 2 = 1.  So even and odd are actually precisely defined by the remainder of dividing a number by 2.  Odd and even are thus a really quick way to see if a number is divisible by 2, and since we divide by 2 so much more often than any other number, that's really useful...right?  Do we though?  Sure, I think we do divide by 2 more often, but not that much more often.
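Here is what those definitions look like in Python, which exposes both discrete division and the modulus operation directly:

    # Discrete division produces a quotient and a remainder.
    q, r = divmod(3, 2)
    print(q, r)  # 1 1 -- that is, 3 / 2 = 1R1

    # Even and odd, defined by the remainder of division by 2.
    def is_even(x):
        return x % 2 == 0

    def is_odd(x):
        return x % 2 == 1  # works for negatives in Python, where -3 % 2 == 1

    print(is_even(4), is_odd(3))  # True True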

So, here's the problem with this, in my mind: What about division by 3 or 4 or 5...?  How often do we find we need to split things into 3 or more groups?  Plenty often!  So why don't we have an equivalent of even and odd for division by 3?  First, there are infinitely many numbers we could divide by, so it isn't possible to have special terminology for all potential divisors.  Second, the number of possible outputs of the modulus operation scales with the magnitude of the divisor.  For example, 3 % 3 = 0, 4 % 3 = 1, and 5 % 3 = 2.  So 3 wouldn't just have even and odd.  It would have an analog of even, for x % 3 = 0, and then it would need two terms for x % 3 != 0, one for x % 3 = 1 and one for x % 3 = 2.  Sure, we might just combine the two uneven ones, but that's what we already do when we say something is or isn't divisible by 3, so there really wouldn't be any value in it.  And the kicker: If this is true, then how do "even" and "odd" have more value than just saying something is or isn't divisible by 2?
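A quick loop makes the three-way split visible:

    # Integers fall into three residue classes mod 3, the three-way
    # analog of even and odd.
    for x in range(9):
        print(x, x % 3)  # remainders cycle 0, 1, 2, 0, 1, 2, ...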

The answer is this: even and odd are just terminology for saying whether a number is divisible by 2 or not.  It's not consistent terminology though, because 2 isn't the only number we ever want to divide by, and we don't have any equivalent terminology for any other number.  Further, even and odd only exist within the domain of integers.  In the domain of real numbers, the numbers that are even or odd in the domain of integers are evenly divisible by every nonzero value in that domain.  Thus, in the domain of real numbers, either 2 and 4 aren't even, or 3 and 5 are even, because fractional parts allow them all to divide into perfectly equal groups always.  Mathematically, we can assert that evenness and oddness cannot exist within a domain that does not have a modulus operation.

What it all comes down to is that even and odd are merely terminology limited to domains within which all operations and values are discrete.  Outside of computer science, discrete math is limited mostly to casual, day-to-day math involving whole, countable objects, and these terms are only useful when dividing by 2, which is common, but not so much more common that it needs special treatment.  Given that, I would assert that "odd" and "even", while legitimate properties of whole numbers, are not unique properties that justify their own terminology.

The Problem of Rounding Halves

While the vast majority of posts on this blog are about computer science topics, it was created as a technical blog, not specifically a programming blog.  As such, I may, from time to time, write posts on other technical topics.  For example, the topic of this post is math theory.


I have a friend who questions the common practice, taught in public schools in most, if not all, Western countries, of rounding up halves.  For example, say you are rounding to the closest whole number, and you are presented with 0.5.  You round up.  Why?  Is it because 0.5 is closer to 1 than it is to 0?  No, because it isn't.  It's exactly equally distant from both.  The bias for rounding down is exactly equal to the bias for rounding up, so why do we round up on halves?  The answer is that someone decided it should be so.

In theory, math is based on pure, logical rules, but in the case of rounding, the boundary rules are purely arbitrary.  If you are on the boundary of equidistance between your rounding options, you always round up, not because it is a logical rule defined by math itself, but because someone decided it should be done that way, probably to avoid answering what turns out to be a pretty hard question.  But it doesn't really matter does it?  The direction we round on the boundary doesn't need to be logical does it?  Well, let's find out.

Say we have a data set with 10 samples, between 0 and 1.  We want to take an average of the data set, so we can use it to make a decision, and we are treating the samples as votes.  The samples were generated by rating like/dislike for the proposal between 0 and 10, and the results are stored as decimal fractions on a percentage scale (1 is 0.1, 2 is 0.2, etc...).  Because we are treating them like votes, we are rounding them to the nearest whole number.  Any ranking below 0.5 is treated as a vote opposed to the proposal, and any ranking 0.5 or higher is treated as being in favor of the proposal.  This makes sense, right?  It's almost exactly how we vote in government elections.  The votes are taken as whole numbers, even though the actual voters are almost never 100% for or against any given candidate.  The vote has to fit into a discrete value.  Now, let's generate a sample set: 0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5.  Reducing these to votes, we get 0, 1, 0, 1, 0, 1, 1, 1, 0, 1.  That's 6 votes for the proposal versus 4 votes against, so we would say the proposal is supported by a 60% majority.  But the raw data actually shows a very different result.  Four responses are opposed to the proposal, one is in favor of it, and five are balanced.  The average of the raw data is 40%.  So wait, when we round the data to turn the responses into votes, using the arbitrary round-up-halves rule, we get 60% for, but when we average the unrounded data, we get 60% against?  (This is not contrived; I just came up with values where half are on the boundary and the other half are mostly against, and this exact juxtaposition just happened.)  The rounding rule actually flipped the results, so they are the opposite of the true outcome.
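You can verify the flip in a few lines of Python (my sketch of the scenario above):

    samples = [0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5]

    # Apply the round-up-halves rule to turn ratings into votes.  Note that
    # Python's built-in round() rounds halves to even, so the schoolbook
    # rule has to be applied explicitly.
    votes = [1 if s >= 0.5 else 0 for s in samples]

    print(sum(votes) / len(votes))      # 0.6 -- "60% in favor"
    print(sum(samples) / len(samples))  # 0.4 -- the raw data is 60% against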

So, is there a way to overcome this issue?  Is there a mathematically logical solution for rounding halves?  There is, but you aren't going to like it.  The answer is: half of the time round down, and the other half round up.  You might be tempted to say we should just throw out the halves, but when we do that, we get 4 to 1 against (80% against, rather than the true 60%).  Sure, rounding will never perfectly match the raw data, and at least this way makes the most democratic decision, but 20% is a huge deviation from the raw data.  It makes it look like an overwhelming majority opposes the proposal, rather than a moderate one.  On top of that, there are cases where throwing out the halves will make the wrong decision (for example, in decisions with more than two options).

We have a new problem now though: how do we decide which halves to round which way, and what happens when we have an odd number of halves?  We can't just alternate, because when we do have an odd number of halves, we run into the same problem.  If we always round the first one down, we are favoring rounding down (because on odd counts one more rounds down than up), and if we always round the first one up, we are favoring rounding up.  Sure, the problem only arises when there is an odd number of halves, and even then the vast majority (all but the final one) round evenly, which significantly reduces the problem, but it doesn't solve it.

There is only one solution: non-determinism.  Wait, that's just randomness, and randomness isn't logical, is it?  Professional mathematicians may be unaware of this or just plain reject it, but at the quantum level, the universe runs on randomness, and thus if randomness isn't logical, the universe isn't logical, and math is an abstraction that doesn't even apply to the universe.  In other words, if math can describe the universe, then yes, randomness is a logical part of math.  The correct answer to the problem of rounding halves is that the direction of rounding should be random, with equal probability.  In education, we could simulate this with die rolls, and on very simple rounding problems, students could be expected to provide all possible solutions.  In well crafted simulations, we already do this, often without even realizing it, by using integer data types or abstracting numerical properties in ways that eliminate the rounding problem.  In places where the results are critical, we would want to use true quantum randomness when rounding halves.  And yeah, sometimes we would still get the wrong outcome, but this is the best math can do when it comes to rounding halves, and in the long run, at least it will all average out.
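Here is a minimal sketch of random half-rounding in Python, using the standard pseudorandom generator as a stand-in for a true quantum source:

    import math
    import random

    def round_half_random(x):
        # Round to the nearest integer, breaking exact .5 ties with a coin flip.
        lo = math.floor(x)
        frac = x - lo
        if frac == 0.5:  # exactly on the boundary
            return lo + random.randint(0, 1)
        return lo + (1 if frac > 0.5 else 0)

    # Over many trials, the halves round up about as often as they round down.
    results = [round_half_random(0.5) for _ in range(10000)]
    print(sum(results) / len(results))  # ~0.5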

Thursday, October 3, 2019

Advanced Password Security

A while back I wrote an article about password security, discussing common password requirements and how to create better passwords.  The takeaway of that article is that common wisdom concerning password security is wrong, and the best passwords are long (16 characters or more) and use memorable made up words.  Patreon recently sent out a link to some password advice, which includes more common wisdom on passwords and authentication in general that I want to challenge.

Patreon makes a number of claims about password security.  One is that the more random a password is, the more secure it is.  This is actually false.  Increasing the randomness of a password only makes a difference if it changes classes.  Suppose a password uses the birth date of the account owner.  This is a highly insecure class of password, because anyone who knows sufficient identifying information about the account owner can learn his or her birthday.  A name may be sufficient if it is fairly unique, and name and state/country of residence can be sufficient even if the name isn't terribly unique.  Name and place of birth may be enough to gain access to birth certificates.  Even if this is not enough, an attacker might be able to narrow it down to the birth dates of only two or three people, which is still easy enough to guess.  Birthdays and other personally identifiable information form one class, which also includes things like names of children and pets, other dates of personal significance, personal catch phrases, and so on.  This is an incredibly weak class of password, as those familiar with you, or who have access to content you have published, will likely have access to the information used in your password.  Another class is common words.  This category is much larger and thus more secure, though it is not more random, and it is still very insecure.  Both of these categories are easy to crack, because the total entropy in the password is less than the sum of the entropy of the individual characters.  Increasing randomness within these classes won't make a significant difference.  It does not matter how random the common words or elements of personal information are; passwords in these classes will be approximately as insecure as all other passwords in the class, given that they have about the same number of elements.  (In this instance, elements are not characters but rather words or coherent pieces of information.)

Going from common words to uncommon words is a solid jump into a more secure category, but entropy is still lower than the sum of the entropy of the individual characters.  The impact of increasing randomness is still negligible here.  This might be represented as the difference between a phrase made of uncommon words and a series of random uncommon words.  If the phrase is significantly more common than the words, increasing randomness will make the password more secure, but if the phrase is less common, it will not make any difference.  Security is based on the most common coherent element.  A phrase is more coherent than a series of words, but if the phrase is less common than the words, an attacker is far more likely to attack the words than the phrase.

Once you get into classes where there is no coherency above the character level though, increasing randomness will have no effect.  In other words, a 16 character password composed of made up words is not going to be any more secure than a random password made of characters from the same classes.  Thus a 16 character password of made up words, all lowercase, is no more secure than 16 randomly selected lowercase characters, but the random characters are many times harder to memorize.  Adding additional classes of characters (numbers, special characters, uppercase letters) does increase security, but only negligibly, at the cost of making the password even harder to memorize.  Increasing randomness only increases the security of a password if it pushes the password into a more secure class.  Otherwise it just makes the password harder to remember without having a significant impact on security.  Once a password is in a class where strategic attacks either will not work or are not worth the cost, increasing randomness will not increase security even if it does push the password into a theoretically more secure class.  Suggesting that increasing randomness increases the security of a password is rather irresponsible, because in real life applications doing so makes far less difference than merely increasing the length of the password by a few characters, and making a password less memorable motivates the user to write it down, making it far less secure.

Another recommendation of Patreon is to use a password manager.  Patreon claims that password managers are highly secure, but this is not actually true.  A password manager is no more secure than the authentication it uses, which is typically no more secure than the password a user would make up for another account.  Password managers have three advantages.

One is that they can generate more secure passwords than the average user would make up, without the user having to memorize them.  This cannot make passwords significantly more secure than long passwords of made up words though, because there is no universal way to do that.  Even going from all lowercase to both cases, numbers, and symbols increases password entropy by less than 2 bits per character, while each extra lowercase character adds almost 5 bits, so a handful of additional characters makes up the entire difference.  In short, this is not a significant advantage over just making up a few words for a password.

Another advantage is having different passwords for different accounts.  This is a significant security advantage, but it comes at a heavy cost.  Reusing passwords is a significant security issue.  In theory, if one account is compromised, reusing passwords can compromise all accounts with that password.  In practice, it is more complicated than that.  To leverage this, an attacker would have to learn the usernames for all accounts sharing that password.  That might be easy if the compromised account is an email account that the other accounts report to, with emails that mention the account usernames.  It might also be easy if the attacker has gained access to your computer, though this is less relevant, because in that case the attacker has already compromised something more valuable than a single account and could just track your logins directly.  Further, compromising one account is only sufficient to obtain a password if the security on that account is already poor.  Any competent organization will use obfuscation techniques to protect passwords, such that even if an attacker did gain access, your password would be in little danger.  There are still companies that store passwords in plaintext, but this is becoming far less common, and major companies have pretty much all been scared into using good password security by the massive liabilities doing otherwise would entail.

The third benefit of a password manager is not having to memorize passwords.  Typically, a password manager has a single password used to access all of the others.  The passwords are all encrypted, to maintain security, and they are only decrypted when being used.  Tracking passwords for you is quite convenient, but it does not increase security.  The only place where password managers improve security more than just using easily memorized made up words is in making it easier to avoid password reuse.  This is certainly valuable, but as internet security in general improves, the value of this diminishes rapidly.  The vast majority of people reuse passwords all over the place.  If this were as big an issue as companies like Patreon make it out to be, large scale compromises would be happening constantly, and every large scale security breach would be followed by large scale financial fraud.  The fact that we periodically see large scale compromises without immediate large scale financial fraud suggests that current password security measures are sufficiently mitigating the threat of password reuse.  This is not to suggest that password reuse is entirely without risk, but rather that the evidence suggests it is no more risky, and perhaps significantly less risky, than merely using the weakest allowed password.
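Those entropy numbers are easy to check; here is my own back-of-the-envelope calculation:

    import math

    # Bits of entropy per character of a truly random password.
    lower = math.log2(26)                # ~4.70 bits: lowercase only
    full = math.log2(26 + 26 + 10 + 33)  # ~6.57 bits: all 95 printable ASCII

    print(full - lower)                 # ~1.87 -- under 2 bits per character
    print(16 * (full - lower) / lower)  # ~6.4 -- extra lowercase characters
                                        # matching the total gain for a
                                        # 16 character password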

Password managers come with a serious downside though.  If you use a password manager, that means you do not know any of your passwords.  It is easy to forget that security has two goals.  One is to protect assets from those who should not have access to them.  The other is to grant access to those who should have it.  Imagine you have a trusted secretary who memorizes all of your passwords for you.  What are the risks involved with that?  The risks for a password manager are the same.  If your secretary dies, you lose access to your accounts.  If your password manager quits working, maybe because your hard drive was corrupted, you will also lose access to your accounts.  If your secretary is compromised, all of your passwords and usernames could get leaked.  If your password manager is compromised, the same could happen.  A password manager is a single point of failure in both cases.  A password manager that backs up your passwords to the cloud might be able to avoid the first issue, but that is like your secretary telling your passwords to someone she trusts, or writing them down and storing them somewhere.  This mitigates the problem of losing your passwords at the expense of increasing the attack surface, making them more likely to be obtained by someone who should not have them.  It can actually increase the attack surface a lot.  Commercial password managers store the password data on commercial servers managed by multiple people.  This means that backing up the data to the internet could give a lot of people you have never even met access to your data.  True, it may be encrypted, but encryption is inherently reversible.  Secure web sites typically hash passwords, which is close to impossible to reverse.  A password manager needs to be able to reverse its obfuscation though, in order to use the passwords, meaning it has to store the passwords in an inherently less secure form.  Password managers can be quite valuable, but suggesting that they significantly increase security is also rather irresponsible, without at least discussing how they do that and where they can reduce security.
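The hash versus encryption distinction is easy to demonstrate.  This sketch uses Python's standard hashlib and the Fernet recipe from the third-party cryptography package as stand-ins for whatever schemes a real site or password manager actually uses:

    import hashlib
    from cryptography.fernet import Fernet

    password = b"corehT7quahpha"  # a made up example password

    # A hash is one-way: no key exists that turns the digest back into the
    # password.  (Real sites should use a salted, slow hash like bcrypt or
    # scrypt; sha256 here just illustrates the one-wayness.)
    digest = hashlib.sha256(password).hexdigest()
    print(digest)

    # Encryption is reversible by design: a password manager must be able to
    # recover the plaintext, so anyone who obtains the key can too.
    key = Fernet.generate_key()
    token = Fernet(key).encrypt(password)
    print(Fernet(key).decrypt(token))  # b'corehT7quahpha'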

The final recommendation that needs some discussion is multi-factor authentication.  This legitimately increases security, fairly significantly.  Multi-factor authentication uses two or more independent authentication methods.  Typically one is a password.  Entering the correct password will generally trigger a second step, where some other strategy is used to verify that the person logging in is the person who created the account.  The most common form of multi-factor authentication is through text messaging.  The user will enter the username and password, and the site will send a text to the user's phone and prompt the user to enter some data included in that text.  Another form of multi-factor authentication is the use of a separate device that generates temporary keys, which can then be used for a limited time to prove possession of the device.  All common forms of multi-factor authentication rely on the possession of some physical device associated with the user; however, this is also their weakness.  If the user loses the device or the device is damaged, access to the account can be lost.  Any form of authentication used to recover such an account is either less secure than the multi-factor authentication, rendering it useless, or so complex and time consuming that it denies access to the legitimate owner of the account for a potentially unacceptable length of time.  Multi-factor authentication can be a very powerful means of increasing security, but the risks associated with it are quite high.
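The separate key-generating devices (and most authenticator apps) typically implement TOTP, the time-based one-time password algorithm from RFC 6238.  A minimal Python version looks something like this; the secret is a made up example:

    import base64
    import hashlib
    import hmac
    import struct
    import time

    def totp(secret_b32, period=30, digits=6):
        # Generate the current time-based one-time password (RFC 6238).
        key = base64.b32decode(secret_b32)
        counter = int(time.time()) // period  # 30 second time step
        msg = struct.pack(">Q", counter)      # big-endian 8 byte counter
        digest = hmac.new(key, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F            # dynamic truncation
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    # Both the server and the device hold a copy of the shared secret.
    print(totp("JBSWY3DPEHPK3PXP"))

The code is derived entirely from a shared secret and the current time, which is exactly why losing the device, and the secret stored on it, can mean losing access to the account.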


One of the biggest security recommendations for passwords is to never write them down, or to keep them in a highly secure location if they are written down.  All three of these recommendations either create motivation to write down passwords or are equivalent to writing down passwords.  Making a password more random makes it harder to remember, which is why people write passwords down in the first place.  A password manager literally writes down your passwords in digital form.  In multi-factor authentication, your phone or key fob becomes a tool for accessing a password.  (The key texted or generated is nothing more than a second password.)  The risk with all of these is the same as well.  Every single one carries a high risk of loss.  A random password is difficult to remember and thus easy to forget.  A password manager relies on the integrity of the system or systems the passwords are stored on.  A multi-factor authentication device can easily be lost or destroyed.  And of course, password managers increase the attack surface, actually decreasing overall security compared to a strong password that is used only on trusted accounts.


Password security is far more complex than even most internet companies understand.  Security should both protect from unauthorized access and guarantee authorized access.  This is why passwords were chosen as the default form of digital security in the first place.  A well memorized password is very difficult to lose or steal, but it can grant very quick access when needed.  The problem is that "experts" who are too clever for their own good have been recommending bad password generation practices for decades, and this has led to security issues that have prompted the creation of products that really should not be necessary for anything short of national security, where denial of access may be vastly favorable to a leak of information to unauthorized parties.

A good password has two properties.  One is that it is hard for someone who is not authorized to know it to figure out.  The other is that it is easy for those who are authorized to know it to remember.  Security is not just preventing unauthorized access.  If that were the case, perfect security could be achieved merely by destroying the assets we want to protect.  Providing timely and convenient access to those who are authorized is no less important than preventing unauthorized access.  This means that any form of security that significantly hinders or compromises authorized access isn't very secure, even if it does perfectly prevent unauthorized access.  A password that is hard to memorize cannot reasonably be considered secure, because it does not provide access the way it should.  A tool that makes accessing secured assets significantly more difficult is a compromise of security.  Any security strategy that creates a significant risk of denial of access to authorized parties cannot reasonably be considered secure.  When the assets being protected have extremely high value, it may be worth trading some of the accessibility side of security for protection, but this is not the same as increasing security.  It is exchanging accessibility for protection.  And this should be a decision made by the owner or owners of the assets, not a decision the owners are pressured into by second or third parties.  Patreon should not be telling its users that they should adopt these security measures.  Patreon should be telling its users about these security measures, including the risks associated with them, and then it should leave the decision up to its users, without pressuring them into it.  Patreon should also be informing its users of alternatives; for example, making your password longer is a far more effective strategy than making it more random.

Password security is a poorly understood topic, especially among those who have the biggest voice on it.  It is a lot simpler than many people seem to believe, and many of the strategies that are recommended actually reduce security, either by increasing the odds of denial of access or by motivating users to circumvent more legitimate protections.