Thursday, March 31, 2016

Building a Basic Render Farm


Parallel computing is really neat. Obviously, it's important in science, where you can break computational tasks into smaller parts and distribute the work among many discrete systems. But for rendering, it's incredibly important. Take Pixar, for example. They have a massive rendering farm, and yet it sometimes takes them more than a day to render a single frame because of the complexity of the scene (hundreds of lights, shadows, etc.). This project clearly won't reach those levels; it's just to show that it's fairly easy to tween images across the screen using ImageMagick on multiple computers.


Our goal will be to make a 10-second, 30 fps, 640x480 video from image parts and tweening. It's possible to go larger (we could push this to however much memory and storage space we have), but it's faster to do it smaller. We'll use the power of multiple computers on a network to process the frames and assemble them at the end using ffmpeg. Here's how the pipeline is going to work:

  1. The head node will parse a file that describes what moves where and make sure that all of the assets exist.
  2. The head node will parse a list of nodes available for processing (manually generated, since we're just using plain HTTP rather than any discovery library) and decide how to split up the job between the nodes. In this case it'll be evenly distributed in chunks (sketched below).
  3. The head node submits the job and all of the assets to the nodes and the processing begins. The head node will give each node a callback.
  4. Each node will render its frames and video fragment and send them back to the callback provided by the head node.
  5. Once each fragment is accounted for, the head node will feed them through ffmpeg to create a final product.
  6. All of the nodes clean up and remove the temporary frames and fragments.
The pipeline is fairly straightforward, but we're not done with design.
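
As a rough sketch of how step 2's chunking could look (this isn't the exact code from the repo; the variable names and the example endpoint URLs are just placeholders), the head node might split the 300 frames evenly like this:

<?php
// Sketch: split the 300 frames of a 10-second, 30 fps video evenly among nodes.
// $endpoints is a placeholder for the parsed node list (parsing shown later).
$endpoints = [
    'http://node1.example/rendernodeinterface.php',
    'http://node2.example/rendernodeinterface.php',
];

$totalFrames = 10 * 30;                               // 300 frames
$chunkSize   = (int) ceil($totalFrames / count($endpoints));

$chunks = [];
foreach ($endpoints as $i => $endpoint) {
    $start = $i * $chunkSize;
    if ($start >= $totalFrames) break;                // more nodes than frames
    $end = min($start + $chunkSize, $totalFrames) - 1;
    $chunks[$endpoint] = ['startFrame' => $start, 'endFrame' => $end];
}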

To define how images should move across the screen, I've developed a JSON format that describes just that. Here's an example:
{
  "background.png": {"z":0},
  "happyface.png": {
    "z":1,
    "motion": [
      {
        "st": 0,
        "sx": 0,
        "sy": 0,
        "ex": 1000,
        "ey": 500,
        "et": 10
      },
      {
        "st": 10,
        "sx": 1000,
        "sy": 500,
        "ex": 20,
        "ey": 30,
        "et": 20
      }
    ]
  }
}
Each asset is listed in the root object and maps to an object describing what it does. The z property describes the ordering: the lower the number, the farther back the object sits, and 0 is the background. Next, a motion array describes the path an object takes as a list of segments. We're going to assume linearity here and not deal with easing. Each segment defines a start time (st), start x (sx), start y (sy), end x (ex), end y (ey), and end time (et). The workers will interpolate the position from this data given the frame's location in time.
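
Here's a minimal sketch of how the head node might parse and sanity-check a file like this (the file name "motiontest.json" and the variable names are mine, not necessarily what's in the repo):

<?php
// Sketch: load the motion description and verify every asset exists on disk.
$job = json_decode(file_get_contents('motiontest.json'), true);
if ($job === null) {
    die("Could not parse the motion file\n");
}

foreach ($job as $asset => $description) {
    if (!file_exists($asset)) {
        die("Missing asset: $asset\n");
    }
    // Anything that moves has a "motion" array; the background does not.
    if (isset($description['motion'])) {
        foreach ($description['motion'] as $segment) {
            // st/et are seconds, sx/sy/ex/ey are pixel coordinates.
            if ($segment['et'] <= $segment['st']) {
                die("A segment for $asset ends before it starts\n");
            }
        }
    }
}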

To do this interpolation, we use the linear easing equation, position = b + c * (t / d), where t is the time elapsed in the animation, d is how long the animation lasts, b is where the object started, and c is how much it will change (end - start). This equation returns the position of the element. This will need to be calculated for both the x and y dimensions.
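
In PHP, that comes out to something like this (a sketch; the function name is mine):

<?php
// Linear easing: position = b + c * (t / d)
//   $t = time elapsed within the motion segment
//   $d = how long the segment lasts
//   $b = starting coordinate
//   $c = total change (end - start)
function linearTween($t, $b, $c, $d) {
    return $b + $c * ($t / $d);
}

// Example: frame 150 of a 30 fps video sits at t = 5.0 seconds.
// For the happy face's first segment, (0, 0) to (1000, 500) over 10 seconds:
$t = 150 / 30;
$x = linearTween($t, 0, 1000 - 0, 10);   // 500
$y = linearTween($t, 0, 500 - 0, 10);    // 250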

So let's write the head node first. We'll call it "masternode.php". It has to decide how to delegate given the number of nodes it has, so we'll give it a plaintext list of where the endpoints can be contacted. It'll append "rendernodeinterface.php" to the end of each entry, so a node can be located virtually anywhere in the world. You'll see what I mean in a second.
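
For example, the list could be one base URL per line, and the head node could build the endpoints like this (the file name "nodelist.txt" is just a placeholder, not necessarily what the repo calls it):

<?php
// Sketch: read the plaintext node list (one base URL per line) and build
// the full endpoint URLs by appending rendernodeinterface.php to each.
$lines = file('nodelist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$endpoints = [];
foreach ($lines as $base) {
    $endpoints[] = rtrim(trim($base), '/') . '/rendernodeinterface.php';
}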

We'll have four code files. "masternode.php" is what you execute to start the job. It sends everything to "rendernodeinterface.php", which collects the data and starts "rendernode.php", which does the actual rendering. On completion of a fragment, the node posts the file back to "masternodeinterface.php", which collects the fragment and stores it. "masternode.php" waits until all of the fragments are collected, then assembles the movie and returns when it's done.
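
The hand-off from "masternode.php" to "rendernodeinterface.php" is just an HTTP POST. Here's a sketch using cURL (the field names are illustrative; I'm not claiming they match the repo):

<?php
// Sketch: POST the job description, the frame range, the callback URL,
// and the image assets to one render node. Field names are illustrative.
function submitJob($endpoint, $jobJson, $startFrame, $endFrame, $callback, array $assets) {
    $fields = [
        'job'        => $jobJson,
        'startframe' => $startFrame,
        'endframe'   => $endFrame,
        'callback'   => $callback,    // where the node POSTs its fragment back
    ];
    foreach ($assets as $i => $path) {
        $fields["asset$i"] = new CURLFile($path);
    }

    $ch = curl_init($endpoint);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}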

An important thing to change is the maximum file upload size PHP will support. I set mine to a whole GB on each node and also allowed 200 files per request. There's going to be a lot of data passed between these machines, and in setting this up we need to reduce overhead. However, being the hypocrite I am, my original plan was to send each frame back to the head node as it was made. It was here that I made a design decision: instead, I'd render chunks of video on each node and send those back to the head node to be concatenated.
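
Concretely, that means bumping a few values in php.ini on every node (the head node receives fragments, so it needs this too); the directive names below are the standard PHP ones, and the values are the ones I mention above:

; php.ini on each node
upload_max_filesize = 1G
post_max_size = 1G        ; the total POST body has to be allowed through as well
max_file_uploads = 200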

I wrote the files in order of the pipeline: first the Master Node, then the Render Node Interface, then the Render Node, then the Master Node Interface. Right now they only support motion, but the system can move multiple objects at once and run multiple render jobs at once (it distributes everything to every node, so if you only want one node working on one job at a time, you need to change the node list).

Once I got it working with one node (the computer the master node code was running on), it was time to try it with multiple nodes and see if the division of labor messed with it. I started 5 micro instances on Google Cloud to test it. I added another element with an animation path that starts and stops at different times than the happy face's and eventually overlaps it. That file is called "motiontest2". You can download it all from the GitHub repo listed further down.

I'd like to make a note about this system's resource usage. As it's written right now, it uses almost all of the processor it can be given, as well as whatever memory it needs. With my test video that's not much at all: it tops out at about 50 MB on one node, but it really depends on the size of the render job. I free up all of the ImageMagick data the instant I'm done using it, which cuts down on memory usage a lot because we don't wait for PHP to deal with it. The system also makes use of the file system to keep track of jobs and to pass things between processes occasionally. Probably the worst thing I did here was to poll every two seconds, checking whether all of the video fragments have arrived before assembling them.
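
For reference, the cleanup and the fragment check look roughly like this (a sketch assuming the Imagick extension; the paths, file names, and fragment format are placeholders):

<?php
// Render-node side: free the ImageMagick data as soon as a frame is written,
// instead of waiting for PHP's garbage collector.
$frame = new Imagick('background.png');
// ... composite the moving assets onto $frame at their interpolated positions ...
$frame->writeImage('frame_0001.png');
$frame->clear();                       // release the pixel data immediately

// Head-node side: poll every two seconds until every fragment has arrived,
// then concatenate them with ffmpeg. $expected is the number of chunks sent out.
$expected = 5;
while (count(glob('fragments/*.mp4')) < $expected) {
    sleep(2);
}
// fragments.txt lists the fragments in order, one "file 'fragments/chunkN.mp4'" per line.
exec("ffmpeg -f concat -i fragments.txt -c copy output.mp4");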

Also, I've hardcoded the callback URL, so I'd definitely make that configurable. Maybe make it part of the node list? There are a lot of things I'd change about this, but I didn't want it to get too complicated.

The speed improvement is obvious using micro cloud instances. It went from 4 minutes with one node to 1 minute with five nodes.

Here are some things that I'd like to implement later on if I want to make this better:
  • Implement other tweenable things like opacity, rotation, size, etc.
  • Make the input format contain more information.
  • Actually respect the "z" attribute. It's never read right now; objects are just layered in the order they appear in the file.
So, without further ado, the code and the output!

And here's the GitHub repo! Make sure to change the callback URL as well as the node list. You can look at my test list for an example of how to do that. Have fun extending this!
