Time being the limited resource that it is, it took a little while to wrap up, but BenchPress is now open source.
BenchPress is intended to represent many different types of payloads via simple JSON configuration, but the project is still new and doesn’t (yet) offer much flexibility in what users can do with the existing task definition language. Fortunately, it’s pretty straightforward to make your own custom task types, so in this post I’ll show how to make a “hello world” custom task type. You can also check out the sample code on GitHub.
BenchPress basics
The basic structure of the JSON you submit to the job controller is simple.
    {
      "task": {
        "type": "HELLO-WORLD",
        "config": {
          # whatever you want
        }
      }
    }
The config can be any JSON you want for your task type. The type is a semi-magical string that identifies the few classes that make up a specific type of task; you’ll see how that string is used later.
TaskFactory and friends
I’ll go from the bottom up to explain the task execution structure. There are two types of nodes in BenchPress: worker and controller. Typically there is only one controller, but in theory there could be several. A job is submitted to the controller, which splits its sole task among the available workers. Each worker gets its own partition of the overall work.
Fundamentally, the work that a worker does is just a collection of Runnable instances. The Runnables are made on each worker by a TaskFactory instance. This is the relevant method of the TaskFactory interface:
Collection<Runnable> getRunnables(UUID jobId, int partitionId, UUID workerId, TaskProgressClient taskProgressClient, AtomicInteger reportSequenceCounter) throws IOException;
The method parameters represent the generic information available to every task: its parent job id, its partition id, the id of the worker it’s running on, and some necessities for reporting progress back to the controller. To keep TaskFactory simple, the work of creating a TaskFactory has been pushed off to another interface, the TaskFactoryFactory (which is sure to drive Joel Spolsky nuts). The TaskFactoryFactory’s job is to create a TaskFactory given the JSON config, so its sole method is simply this:
TaskFactory getTaskFactory(ObjectReader objectReader, JsonNode configNode) throws IOException;
It’s up to you to read whatever you want out of the JSON and construct your flavor of TaskFactory.
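To make the relationship between the two interfaces concrete, here is a minimal sketch of a “hello world” factory pair. It is not the sample code from GitHub. To stay self-contained it uses simplified stand-in interfaces (SketchTaskFactory and SketchTaskFactoryFactory) rather than the real ones, a plain Map in place of Jackson’s JsonNode, and a hypothetical "greeting" config key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified stand-ins for the BenchPress interfaces (assumptions, not the real API).
// The real getRunnables also takes a TaskProgressClient and an AtomicInteger, and the
// real getTaskFactory takes a Jackson ObjectReader and JsonNode instead of a Map.
interface SketchTaskFactory {
    Collection<Runnable> getRunnables(UUID jobId, int partitionId, UUID workerId);
}

interface SketchTaskFactoryFactory {
    SketchTaskFactory getTaskFactory(Map<String, Object> config);
}

final class HelloWorldSketchFactory implements SketchTaskFactory {
    private final String greeting;
    // Collects output so the sketch can be checked; a real task would log instead.
    final List<String> output = new CopyOnWriteArrayList<>();

    HelloWorldSketchFactory(String greeting) {
        this.greeting = greeting;
    }

    @Override
    public Collection<Runnable> getRunnables(UUID jobId, int partitionId, UUID workerId) {
        // One Runnable is enough for a greeting; a load test would create many.
        List<Runnable> runnables = new ArrayList<>();
        runnables.add(() -> output.add("Greeting: " + greeting));
        return runnables;
    }
}

final class HelloWorldSketchFactoryFactory implements SketchTaskFactoryFactory {
    @Override
    public SketchTaskFactory getTaskFactory(Map<String, Object> config) {
        // "greeting" is a hypothetical config key chosen for this sketch.
        Object greeting = config.getOrDefault("greeting", "Hello, world!");
        return new HelloWorldSketchFactory(String.valueOf(greeting));
    }
}
```

The factory factory reads whatever it needs out of the config once, and the factory it returns carries that state into every Runnable it creates.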
TaskPartitioner
The JSON passed to getTaskFactory is the JSON pertaining to one individual worker’s partition of the overall work. Since the task JSON is specific to each task type, the code that splits the original task into per-worker partitions must also be provided by the task type. So, we have the TaskPartitioner interface:
List<Partition> partition(UUID jobId, int workers, String progressUrl, String finishedUrl, ObjectReader objectReader, JsonNode configNode, ObjectWriter objectWriter) throws IOException;
The workers param is how many workers the task should be split across. The two URLs are needed to create a Partition, and the ObjectReader, JsonNode and ObjectWriter params let the implementation deserialize its configuration info, split it as desired, and re-serialize.
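The “split as desired” step is where the task-specific logic lives. As a rough illustration, suppose a task’s config carried a total number of operations to perform (a hypothetical field; the hello world task needs nothing of the sort). A partitioner might divide that total across workers like this, before serializing each share into its own Partition. The class and method names here are made up for the sketch, and building the actual Partition objects is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    private PartitionSketch() {}

    /**
     * Splits a total amount of work into one share per worker.
     * Shares differ by at most one; the first (total % workers) partitions
     * get the extra unit.
     */
    static List<Integer> splitEvenly(int total, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        if (total < 0) {
            throw new IllegalArgumentException("total must not be negative");
        }
        int base = total / workers;
        int remainder = total % workers;
        List<Integer> shares = new ArrayList<>(workers);
        for (int partitionId = 0; partitionId < workers; partitionId++) {
            shares.add(base + (partitionId < remainder ? 1 : 0));
        }
        return shares;
    }
}
```

For example, splitEvenly(10, 3) yields shares of 4, 3 and 3, so no work is lost to integer division.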
Hooking up a custom task type
BenchPress needs to know which TaskFactoryFactory and TaskPartitioner to hand the config JSON to based on the contents of the type JSON field. This is done with the com.palominolabs.benchpress.job.id.Id annotation and Guice multibindings. Annotate your TaskFactoryFactory and TaskPartitioner implementations (which might be just one class):
    @Id("HELLO-WORLD")
    final class HelloWorldTaskFactoryFactory implements TaskFactoryFactory {
        ...

    @Id("HELLO-WORLD")
    final class HelloWorldTaskPartitioner implements TaskPartitioner {
        ...
and add the Guice bindings:
    public final class HelloWorldModule extends AbstractModule {
        @Override
        protected void configure() {
            Multibinder.newSetBinder(binder(), TaskFactoryFactory.class)
                .addBinding().to(HelloWorldTaskFactoryFactory.class);
            Multibinder.newSetBinder(binder(), TaskPartitioner.class)
                .addBinding().to(HelloWorldTaskPartitioner.class);
        }
    }
Note that since your classes are instantiated by Guice, you are free to use @Inject on your TaskFactoryFactory and TaskPartitioner constructors if you need anything beyond the provided ObjectReader, etc.
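A Guice set multibinding makes a Set of all bound implementations available for injection, so one plausible way to resolve the type string is to scan that set for the implementation whose annotation value matches. I haven’t shown how BenchPress does this internally, so treat the following as a sketch of the idea only. SketchId, SketchPlugin and IdLookupSketch are stand-ins invented for this example, not BenchPress classes, and the Set is built by hand instead of being injected.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Optional;
import java.util.Set;

// Stand-in for the Id annotation; it must be retained at runtime to be readable via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchId {
    String value();
}

// Stand-in for an interface such as TaskFactoryFactory or TaskPartitioner.
interface SketchPlugin {}

@SketchId("HELLO-WORLD")
final class HelloWorldPlugin implements SketchPlugin {}

@SketchId("OTHER")
final class OtherPlugin implements SketchPlugin {}

final class IdLookupSketch {
    private IdLookupSketch() {}

    /** Returns the first candidate whose class carries a SketchId equal to the given type. */
    static <T> Optional<T> findById(Set<T> candidates, String type) {
        for (T candidate : candidates) {
            SketchId id = candidate.getClass().getAnnotation(SketchId.class);
            if (id != null && id.value().equals(type)) {
                return Optional.of(candidate);
            }
        }
        // No implementation registered for this type string.
        return Optional.empty();
    }
}
```

The appeal of this arrangement is that adding a task type never touches a central registry: a new annotated class plus one more binding is enough.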
Finally, you’ll need to tell BenchPress to use your custom module. You can do so by adding the jar for your custom code to the lib directories in the worker and controller tarballs, then starting each service with an extra system property set to a comma-separated list of extra module names:
-Dbenchpress.plugin.module-names=com.foo.benchpress.helloworld.HelloWorldModule
Since both the controller and worker need the custom code (for the TaskPartitioner and TaskFactoryFactory, respectively), you’ll need to do this for both services.
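If you’re curious what consuming such a property could look like, here is a small sketch that splits a comma-separated value into class names and instantiates each one reflectively. This is my own illustration rather than BenchPress’s actual startup code: the class and method names are hypothetical, and the whitespace trimming and skipping of empty entries are assumptions about desirable behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class ModuleNamesSketch {
    // Property name taken from the command line shown above.
    static final String PROPERTY = "benchpress.plugin.module-names";

    private ModuleNamesSketch() {}

    /** Splits a comma-separated list of class names, ignoring blanks; null yields an empty list. */
    static List<String> parseModuleNames(String raw) {
        List<String> names = new ArrayList<>();
        if (raw == null) {
            return names;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    /** Instantiates each named class via its no-arg constructor. */
    static List<Object> instantiate(List<String> classNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : classNames) {
            try {
                instances.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load module " + name, e);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // Reads the property if it was set with -D on the command line.
        List<String> names = parseModuleNames(System.getProperty(PROPERTY));
        System.out.println("Extra modules requested: " + names);
    }
}
```

In a Guice application the instantiated objects would presumably be modules handed to the injector alongside the built-in ones, which is why the jar has to be on the classpath of both services.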
Once that’s all done, you should be able to submit your job JSON to the controller and have it work. In the case of the sample “HELLO-WORLD” task type, you should see a logging message like this:
2012-08-17 14:32:43,111 [pool-5-thread-2] INFO MDC[] c.p.b.e.h.HelloWorldTaskFactory - Greeting: Hello, world!