Using PyMek

This chapter gives a tutorial on the use of PyMek, and introduces some of the concepts that are vital for understanding what PyMek does. It also doubles as the tutorial distributed with PyMek, and can be read as a standalone document in the doc directory of the distribution.

A more detailed reference may be found in the appendices of the thesis, or in the file reference.rst in the doc directory of the distribution.

First, let us have a look at our interface to PyMek. PyMek can be used as a package from any Python program, but most people will use it as a standalone program. In this chapter we will focus on the use of PyMek, the program, but in so doing we will also take a look at how the various modules of PyMek work together.

Getting started

Let us invoke the built-in help command:

mortenlj@atlas pymek $ pymek.py --help
usage: pymek.py [options] <target1> <target2> ...

options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --loglevel=LEVEL      Set level of logging to LEVEL. Level is one of:
                        OFF, DEBUG, INFO [default], WARNING or ERROR.
  --logfile=FILE        Write log to FILE, according to loglevel.
                        [default: No logfile]
  -f FILE, --file=FILE  Build project according to FILE. [default:
                        pymekfile.xml]
  -t NUM, --tasks=NUM   Execute at most NUM tasks at once. [default: 2]
  -s OPTION VALUE, --set=OPTION VALUE
                        Set an OPTION to a VALUE.

As we can see, PyMek does not have all that many options for use on the commandline. This is because most of its functionality is controlled using the PyMekfile. Still, there are a few options here, and we should know what they do.

The --loglevel option sets the verbosity level of PyMek. DEBUG will print a wealth of information, telling you everything PyMek is doing. On the other end of the scale is OFF, where it will keep completely quiet, not even printing error messages. The default is INFO.

With the extreme amounts of debugging output, it is sometimes useful to save the output to a file. This is accomplished with the --logfile option, which sets a filename for PyMek to write the log to. The log will contain the same as the screen output that PyMek generates. If a task calls an external program, however, the output from that program will not end up in the logfile unless the task has taken special steps to capture it.
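To make the two logging options concrete, here is a minimal sketch of how they could map onto the logging module in the Python standard library. This is not how PyMek itself is implemented; the function configure_logging, the logger name pymek_sketch and the treatment of OFF as a level above CRITICAL are all assumptions made for illustration.

```python
import logging
import sys

# Map the level names from the help text to standard logging levels.
# OFF is modelled (assumption) as a threshold above every standard level.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "OFF": logging.CRITICAL + 1,
}


def configure_logging(loglevel="INFO", logfile=None):
    """Return a logger that writes to the screen and, optionally, to logfile."""
    logger = logging.getLogger("pymek_sketch")  # hypothetical logger name
    logger.setLevel(LEVELS[loglevel])
    logger.propagate = False
    # Remove handlers from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    formatter = logging.Formatter("%(levelname)s: %(message)s")
    handlers = [logging.StreamHandler(sys.stdout)]
    if logfile is not None:
        # The file receives the same records as the screen.
        handlers.append(logging.FileHandler(logfile))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Because both handlers are attached to the same logger, the file and the screen receive identical messages, which matches the behaviour described above. Output from external programs would bypass this logger entirely unless a task captured it and logged it explicitly.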

When PyMek is executed, it will look for a file called pymekfile.xml in its current directory, unless the --file option is used to tell it otherwise. The PyMekfile contains information about the project, and is where PyMek gets its instructions from. A valid PyMekfile is required for doing anything with PyMek besides printing the help and version.

If you use a multi-processor computer, the default settings for PyMek might not be to your liking. Normally, a single CPU can handle two simultaneous tasks for maximum efficiency, which matches the default of 2. On a computer with more than one CPU, you should adjust the number of tasks by using the --tasks option. Normally, you would set it to Number-of-CPUs + 1.
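The rule of thumb above is easy to compute. The helper below is a small sketch, not part of PyMek; suggested_tasks is a hypothetical name, and it relies on os.cpu_count() from the standard library to find the number of CPUs.

```python
import os


def suggested_tasks(cpu_count=None):
    """Return the rule-of-thumb value for --tasks: number of CPUs plus one."""
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return cpu_count + 1
```

On a single-CPU machine this gives 2, which is the same as the default, and on a four-CPU machine it gives 5, which you would pass as --tasks=5.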

The final option, --set, is something of a workhorse. It allows you to set any configuration variable to any value. This means that the commandline does not have to know about all the variables any given task will accept; it only propagates the given variable to the main configuration, so that the task can pick it up from there.
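For readers who want to experiment with a similar commandline from their own Python programs, the options described in this section can be approximated with argparse from the standard library. This is only a sketch under stated assumptions: build_parser is a hypothetical name, and nothing here is taken from the PyMek source.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="pymek.py")
    parser.add_argument("--loglevel", metavar="LEVEL", default="INFO",
                        choices=["OFF", "DEBUG", "INFO", "WARNING", "ERROR"])
    parser.add_argument("--logfile", metavar="FILE", default=None)
    parser.add_argument("-f", "--file", metavar="FILE",
                        default="pymekfile.xml")
    parser.add_argument("-t", "--tasks", metavar="NUM", type=int, default=2)
    # --set takes two values and may be repeated; every use is collected
    # as one [OPTION, VALUE] pair without the parser knowing the names.
    parser.add_argument("-s", "--set", nargs=2, action="append", default=[],
                        metavar=("OPTION", "VALUE"))
    parser.add_argument("targets", nargs="*")
    return parser


args = build_parser().parse_args(
    ["--loglevel=DEBUG", "-t", "3", "-s", "Java.compiler", "jikes",
     "Java_hello"])
```

The point of the design is visible in the --set line: the parser stores the pairs as opaque strings, so new tasks can introduce new variables without any change to the commandline handling.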

Configuration

The --set option is a powerful way of adding arbitrary configuration variables to PyMek on each run, but if you want to save your settings, that is possible as well.

PyMek will look in a few pre-determined locations for a configuration file. There is no platform-independent way of locating the default system configuration directory, and the same applies to a user's configuration directory. For that reason, PyMek will simply use any and all files that match the following six locations on the current system, expanding the values of %(variable)s-expressions:

/etc/pymek.conf
%(home)s/.pymekrc
/Library/Preferences/pymek.conf
%(home)s/Library/Preferences/pymek.conf
%(profile)s/Application Data/pymek.conf
%(appdata)s/pymek.conf

The variables are expanded according to this table:

Variable      Value
%(home)s      The value of the HOME environment variable
%(profile)s   The value of the ALLUSERSPROFILE environment variable
%(appdata)s   The value of the APPDATA environment variable
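The expansion of the six locations can be sketched in a few lines of Python. This is an illustration, not PyMek's actual code: the names CANDIDATES, ENV_FOR and candidate_paths are hypothetical, and skipping a location whose environment variable is unset is an assumption about how such cases would be handled.

```python
import os

# The six locations listed above, in the same order.
CANDIDATES = [
    "/etc/pymek.conf",
    "%(home)s/.pymekrc",
    "/Library/Preferences/pymek.conf",
    "%(home)s/Library/Preferences/pymek.conf",
    "%(profile)s/Application Data/pymek.conf",
    "%(appdata)s/pymek.conf",
]

# Which environment variable supplies each %(variable)s value.
ENV_FOR = {"home": "HOME", "profile": "ALLUSERSPROFILE", "appdata": "APPDATA"}


def candidate_paths(environ=os.environ):
    """Expand the templates, skipping those whose variable is not set."""
    values = {name: environ.get(var) for name, var in ENV_FOR.items()}
    paths = []
    for template in CANDIDATES:
        needed = [name for name in values if "%(" + name + ")s" in template]
        if any(values[name] is None for name in needed):
            continue  # assumption: an unset variable rules the location out
        paths.append(template % values)
    return paths
```

The resulting list can be handed directly to the read method of configparser.ConfigParser, which silently ignores files that do not exist. That gives the "use any and all files that match" behaviour without any platform checks.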

This file uses the so-called INI-syntax that was made popular by Microsoft Windows. The file is divided into sections, where each section contains a number of variables with corresponding values. PyMek itself only cares about the PyMek section, but tasks are allowed to have their own section. By convention, a task should use a section by the same name as itself, but there is no enforcement of this, which allows several versions of a Java task to share some configuration.

The options set in this configuration file will be overridden by the ones on the commandline, either by the regular options, or by the --set option. Unless specified by the use of dot-notation, the --set option sets variables in the PyMek section. In order to set a variable in another section, simply prefix the variable with the section name and a period, for example --set Java.compiler jikes.
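The dot-notation and the override order can be illustrated with configparser from the standard library. The sketch below is not PyMek's implementation: apply_set is a hypothetical helper, the exact spelling of the PyMek section name is assumed, and the sample file contents (tasks = 2, compiler = javac) are invented for the example.

```python
import configparser


def apply_set(config, option, value, default_section="PyMek"):
    """Apply one --set OPTION VALUE pair on top of the loaded configuration."""
    section, dot, name = option.partition(".")
    if not dot:
        # No period: the whole string is a variable in the default section.
        section, name = default_section, option
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, name, value)


config = configparser.ConfigParser()
# Stand-in for the contents of a configuration file (invented values).
config.read_string("[PyMek]\ntasks = 2\n\n[Java]\ncompiler = javac\n")

# Commandline values are applied last, so they win over the file.
apply_set(config, "Java.compiler", "jikes")
apply_set(config, "tasks", "3")
```

After these calls the Java section holds compiler = jikes and the PyMek section holds tasks = 3, so a task reading its own section sees the commandline value rather than the one from the file.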

As a last resort, some tasks will allow the use of parameters in the PyMekfile, which we will return to shortly.

A simple example

To see how PyMek works, let us look at a small example of a project that does a complex version of the "Hello World!" example for Java. We split our Java program in two files, so that PyMek can actually do some work.

First, the sourcecode for our example. This class takes care of printing:

class out
{
    public static void print(String txt)
    {
        System.out.println(txt);
    }
}

Second, the main program:

class hello
{
    public static void main(String args[])
    {
        out.print("Hello world!");
    }
}

In order to combine these two into a project, we have the following pymekfile to define the project and the dependencies between the two files:

<?xml version="1.0" ?>
<pymek xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://folk.uio.no/mortenjo/PyMek
        http://folk.uio.no/mortenjo/PyMek/pymekfile.xsd">

<node>
    <name>Java_hello</name>
    <filename>hello.class</filename>
    <children>
        <node>
            <filename>hello.java</filename>
        </node>
        <node>
            <filename>out.class</filename>
            <children>
                <node>
                    <filename>out.java</filename>
                </node>
            </children>
            <tasks>
                <task>
                    <command>Java</command>
                </task>
            </tasks>
        </node>
    </children>
    <tasks>
        <task>
            <command>Java</command>
        </task>
    </tasks>
</node>

</pymek>

Here we have the target of the project, in a <node>-element called Java_hello. It will have the filename hello.class when it has been built. This target also has some children, and a task, defined in the <tasks> part of the <node>.

Each child node is a new target, defined in its own <node>. We see that the Java_hello target depends on two targets, hello.java and out.class. out.class is very similar to Java_hello, depending on out.java, and using the same task.

The task is defined as the name of a task, in this case the built-in task called Java. Both Java_hello and out.class can be used as targets when we invoke PyMek from the commandline, building the targets necessary to create the given target. A test run of PyMek building the Java_hello target:

mortenlj@atlas pymek $ pymek.py Java_hello
INFO: Executing Java for target out.class...
INFO: Executing Java for target Java_hello...
INFO: Success!

...and we can run our program after a successful build:

mortenlj@atlas pymek $ java hello
Hello world!

A look at the PyMekfile

The previous example might look overly verbose, and it is, for such a small project. The benefits of using XML do not shine through in such a small example, but once the project expands, the advantages become more apparent.

The complete XML Schema for the PyMekfile can be found in the appendices. We will however give an introduction to the important elements here.

The basic building block of a PyMekfile is the <node>. A node describes a target for PyMek to work on. The minimal node has at least one of either <name> or <filename>, like this:

<node>
    <filename>somefile.c</filename>
</node>

Before we explore the details of nodes, we should look at the only other way of referring to a target. The <noderef> element is also a way to refer to a target, but it has no contents. Instead, it simply points to a node that has been defined elsewhere in the PyMekfile.

In addition to a name and/or filename, a node can have a list of <children>, which lists the dependencies of this target. The children are other nodes or noderefs that describe targets. Each of the nodes in this list can have the same elements as any other node. This is how we define the dependencies throughout the PyMekfile.

Because of the way we can use noderefs, it might be tempting to define each node at the toplevel and use noderefs to list dependencies. This has the side effect that whenever someone runs PyMek with this PyMekfile, all nodes are checked and updated, as PyMek will build all toplevel nodes if not given a specific target.

Obviously, a noderef that points to a node that directly or indirectly depends on the noderef creates a cyclic dependency. As previously discussed, PyMek needs a Directed Acyclic Graph to be able to work, so such a PyMekfile is not valid.
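To see why a cycle is a problem, and how one can be detected, consider the following sketch of a depth-first search over a dependency mapping. It is a generic technique, not PyMek's own validation code; find_cycle and the dictionary representation of the graph are assumptions made for the example.

```python
def find_cycle(children):
    """Return a list of target names forming a cycle, or None if acyclic.

    children maps a target name to the names of the targets it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(name):
        state[name] = GREY
        path.append(name)
        for child in children.get(name, ()):
            if state.get(child, WHITE) == GREY:
                # The child is already on the current path: a cycle.
                return path[path.index(child):] + [child]
            if state.get(child, WHITE) == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in children:
        if state.get(name, WHITE) == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

For the graph {"a": ["b"], "b": ["c"], "c": ["a"]} the function returns the path a, b, c, a, while the graph from the Java example earlier in this chapter yields None.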

Most nodes will have a <tasks> element. This lists the tasks that are needed to update this target. Each task is executed once, in the order listed. We will return to how tasks are defined later.

The final element that may be present in a node is <MD5>. This is included by PyMek, and is a storage for the MD5 checksum used by PyMek to track changes. If PyMek does not find an MD5 element, it treats the target as changed.
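The idea of tracking changes with a checksum can be sketched with hashlib from the standard library. The sketch assumes that the checksum is computed over the contents of the target's file, which the text above does not state explicitly; md5_of_file and has_changed are hypothetical names.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 checksum of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, stored_md5):
    """Compare the current checksum with the stored one.

    A missing stored checksum (None) means the target counts as changed.
    """
    if stored_md5 is None:
        return True
    return md5_of_file(path) != stored_md5
```

Compared with the timestamp comparison used by tools such as Make, a checksum only reports a change when the contents actually differ, so merely touching a file does not trigger a rebuild.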

By using all we have learned, we can now list most of a typical node:

<node>
    <name>Target</name>
    <filename>a.out</filename>
    <children>
        <noderef>somefile.c</noderef>
    </children>
    <tasks>
        <!-- some tasks here -->
    </tasks>
</node>

In order to create complete and useful PyMekfiles, we need to use tasks. A <task> element defines a task to be executed in order to update the current target. A task looks like this:

<task>
    <command>C_compile</command>
    <param>include_dirs</param>
</task>

Here, the contents of <command> is the name of the task to use, and the contents of <param> is a parameter to that task. A task has only one command, but can have as many params as it likes.
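The structure of nodes and tasks maps naturally onto an XML parser. As a sketch of how a node like the ones above could be read from a Python program, the example below uses xml.etree.ElementTree from the standard library. It does not use PyMek's own modules, and read_tasks and NODE_XML are hypothetical names.

```python
import xml.etree.ElementTree as ET

# A node combining the typical node and the task shown above.
NODE_XML = """
<node>
    <name>Target</name>
    <filename>a.out</filename>
    <children>
        <noderef>somefile.c</noderef>
    </children>
    <tasks>
        <task>
            <command>C_compile</command>
            <param>include_dirs</param>
        </task>
    </tasks>
</node>
"""


def read_tasks(node):
    """Return (command, [params]) pairs for the tasks of one node element."""
    result = []
    # Only look at the node's own <tasks>, not those of nested child nodes.
    for task in node.findall("./tasks/task"):
        command = task.findtext("command")
        params = [param.text for param in task.findall("param")]
        result.append((command, params))
    return result


node = ET.fromstring(NODE_XML.strip())
tasks = read_tasks(node)
```

Here tasks holds one pair, with the command C_compile and a single parameter include_dirs. The list preserves document order, which matters because tasks are executed in the order listed.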

So what does it actually do?

When you run PyMek, it will read the PyMekfile, and create what we call a buildgraph, or buildtree. This is made up of the targets described in the PyMekfile, and each target is associated with tasks and names as you would expect.

If invoked without any targetname on the commandline, PyMek will try to build all toplevel targets in the buildgraph, which is different from the default behaviour of Make. If one or more targets are listed, PyMek will try to build those. We can call these targets the destinations.

Whether we explicitly give PyMek targets to build or rely on the default behaviour, the process is the same. For each destination target, PyMek will first process any targets listed as a child of the current one, recursively repeating the process until it reaches a target that has no children.

At that point, PyMek will start working. For each target, it will check if any of the child-targets has changed, using an MD5 checksum. If a child has changed, PyMek will execute the tasks associated with this target, and proceed like this until reaching and eventually rebuilding the destination target.

PyMek is not stupid, so if it is started with multiple destinations that somewhere down the line depend on the same target, that target will only be processed once.
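The process described in this section can be summarised in a short sketch. It is a simplified, sequential model and not PyMek's actual code: it ignores the --tasks option, assumes the graph is acyclic, and assumes that a target which has just been rebuilt counts as changed for its parents. The names build, children, changed and run_tasks are hypothetical.

```python
def build(destinations, children, changed, run_tasks):
    """Process each destination depth-first, handling every target once.

    children:  dict mapping a target name to the names of its children
    changed:   function(name) -> bool, e.g. a checksum comparison
    run_tasks: function(name), called when a target must be rebuilt
    """
    dirty = {}  # target name -> True if it was changed or rebuilt

    def process(name):
        if name in dirty:
            # Shared dependency: already processed, do not repeat the work.
            return dirty[name]
        kids = children.get(name, [])
        if kids:
            # Process every child first, then rebuild if any of them changed.
            results = [process(kid) for kid in kids]
            dirty[name] = any(results)
            if dirty[name]:
                run_tasks(name)
        else:
            # A target without children: just check it for changes.
            dirty[name] = changed(name)
        return dirty[name]

    for destination in destinations:
        process(destination)
    return dirty
```

Applied to the Java example with only out.java changed, and with both Java_hello and out.class given as destinations, the tasks run for out.class first and then for Java_hello, and out.class is not processed a second time.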