Monday, June 19, 2017

Why Are Pipes Hard?

I learned something amazing today.  I have done a few small multi-processing tasks in the past, some using Python and one using Java.  The most appropriate form of interprocess communication for these was pipes.  The experience was always very painful.  I had to figure out how to cobble together the right objects to produce a usable communication system.  I had to learn some special uses of objects to allow the parent process to continue executing, instead of waiting for the child to finish.  Working out the bugs was very difficult, because it was hard to tell if the problem was my communication protocol, if I had connected my objects wrong, or if maybe I had given an advanced function or method some incorrect parameter.  Of course, I expect this sort of difficulty.  I had heard from others just how hard interprocess communication is, especially with pipes.  That is just the way it is, I guess.

Today, a friend sent me a link to this.  That document discusses several types of interprocess communication.  There is one chapter on pipes, and the author starts the chapter, "There is no form of IPC that is simpler than pipes."  I found this rather amusing and slightly confusing, as this was not my experience.  Shortly following that, he adds that the document will not spend much time on pipes because they are so easy, which only added to my confusion.  By the end of the section, I knew two important things.  First, the author is right; pipes are incredibly easy.  Second, everyone else is doing pipes wrong.

In Python, you have to create and pass pipes into the function call that starts a new process.  I forget the exact details, but I seem to recall having to use a different function, or at least put some obscure parameter into the regular one, to get it to return before the child process was completed.  Then I had to use some method of the pipe object to interact with the child, and this was rather difficult to get right.  In one case I did have some issues with my communication protocol, but I thought it was yet another issue with how I was trying to use the pipe.  It took half an hour to figure out that the problem was some trivial thing with how I was formatting the data I was passing through the pipe.  In short, the system was completely opaque and very difficult to use.  Java was not much better (in fact, I think I gave up on communicating while the child process was still running, and just made the program call the child multiple times).  The friend who sent me the link told me he had the same experience when he read that section, except with the Go programming language.

Now, that document describes how to use pipes in C.  First, you create a pipe.  This requires merely making a single function call, pipe(), passing in a two-element integer array.  The array is populated with a pair of file descriptors for the reading and writing ends of the pipe, respectively.  Then, you treat those file descriptors like you would any regular file!  The main difference is that they are more like standard in and standard out than a file on disk (you cannot seek, and a read blocks until data arrives), but any programmer worth his or her salt should know how to deal with that!  The document goes on to use read and write calls to send and receive data, but you could just as easily wrap the descriptors with fdopen and use the usual stdio functions.  If there is anything complicated about pipes, it is only that they have limited buffer space, so a write blocks once the pipe is full until the other side reads some of it, but the limit is pretty big (64 KiB by default on modern Linux).  Once you have a pipe, you just fork the process, which will provide the child process with copies of the file descriptors to the pipe.  Each process closes the end it is not using, and now the two processes can communicate with read and write calls (or other file I/O calls) on the pipe.  Of course, if you need two way communication, you will need two pipes, but that is trivial.

The fact is, pipes are incredibly easy!  For some reason though, only low level languages can manage to get them right.  I don't often have something bad to say about Python, but this is one place where Python got it totally wrong!  I have plenty of bad things to say about Java, so it does not really surprise me that Java missed the mark on this one.  So why are pipes so hard in languages that are higher level than C or C++?  In fact, I am almost certain that pipes are easier in assembly language than in higher level languages (and I know what I am talking about here).  Clearly the operating system has it right.  Why can't high level languages get it right?  They are supposed to make programming easier, if I am not mistaken, but when I start considering writing a C module for Python because that would be so much easier than just using the built-in, you know something is wrong.

The problem is not terribly difficult for me to identify, but much of the programming world is so in love with Object Oriented Programming that they don't want to hear anything bad about it, let alone admit that it might have a serious problem.  The problem here is that the OS gets it right in the first place, so it does not need complicated wrappers to make it easier.  At the OS level, pipes are about as simple as it gets.  C is wise to just leave it alone and let it do its thing.  Python, Java, and other languages, however, feel obligated to wrap everything in objects, because hey, object oriented.  The fact is pipes don't need objects.  They are perfectly fine just as they are.  Wrapping them in objects makes them absurdly more complicated and difficult to use, without adding anything of value.  I was actually stunned when I saw the C code in that document.  I had a hard time believing that it was correct, because my experience did not support that conclusion.  Using objects where they are unnecessary is bad design, and we really need to teach programmers to stop doing it.  There are good reasons to use objects, but there are also good reasons not to.

I have talked about this before, but the problem here is OOP, and that is just as absurd as Integer Oriented Programming (or Bit Oriented Programming, as a friend suggested).  When we decide to orient all of our programming around a single data structure (or metadata structure, in this case), we limit ourselves.  We destroy our ability to make good decisions about what tools will be the best for each problem.  Objects are a great tool, but we don't use a chain saw to screw in a screw or hammer in a nail.  Honestly, I feel like multi-processing in Python or Java is just like following some really complicated instructions for how to push a peg into a hole with a table saw without ever touching the peg or injuring yourself.  It would be really nice if those languages would just let me pick up the peg and push it into the hole with my finger, instead of trying to build a huge complex machine around such a simple process.
