PHP Workers with Redis & Solo

I’ve come across an awesome combination of tools for managing PHP Workers, and thought I’d share.

Why Workers?

Sometimes you want to process things in parallel. Other times you might have a list of tasks to accomplish, and you don’t want to make the user wait after pressing a button. This is where “Workers” come in. They are independent scripts that run alongside your application, performing tasks, or “jobs.”

An example is with Dating DNA and our score system. We generate scores between users to show how compatible they are with each other. When a user signs up, or makes a significant change to their profile questionnaire, we need to run a job to query our database, build a list of potential users, and generate scores. This takes 10-20 seconds, and while it is pretty fast, we don’t want to make the user wait for that. So we queue up a job for the user, divide up the work among several workers, and process the work.

General Concept

For this post, we’ll use the example of generating reports. Let’s say on your internal website there is a button that you can click to have a report emailed to you, and the report takes 2-3 minutes to generate. When the button is clicked, your code will insert the job into the queue. Meanwhile, workers are monitoring the queue. A worker script will pull the job off the queue, process the report, and send the email when it’s done.

For the queue management, we’ll use Redis. To let PHP read and write data in Redis, we’ll use the PHP library predis. In our examples we’ll use PHP 5.3; however, predis has a PHP 5.2 backport if you are not running 5.3.

Adding Jobs

To add jobs, we’ll need to connect to our Redis server:

* Connecting to Redis

// (assumes the predis library has already been loaded)
const REDIS_HOST = ''; // your Redis server host
const REDIS_PORT = 6379;

$predis = new Predis\Client(array(
    'scheme' => 'tcp',
    'host'   => REDIS_HOST,
    'port'   => REDIS_PORT,
));

We’ll assume in all of the following examples that we’ve done the above and connected to Redis.

Now, to manage our queues we’ll use the Redis datatype LIST. What’s awesome about lists is that regardless of size, adding or removing at the start or end of a list is extremely fast. So whether your queue has 10 items or 10,000,000 items, Redis will be able to push and pop entries quickly.

We’ll have three queues, one for each priority: high, normal, and low. For the Redis key names, we’ll use queue.priority.high, queue.priority.normal, etc. When interacting with lists, you work with the two ends, one called right, the other called left. So we’ll add items on the right with the RPUSH (Right Push) command, and we’ll pull items off the left with the BLPOP (Blocking Left Pop) command. We won’t worry about pulling items just yet.

Lists store strings as their values. My personal preference is to store JSON objects so you can easily pass the variables needed to perform the job.

* Adding items to the queue

$job = new stdClass();
$job->id = 1;
$job->report = 'general';
$job->email = '';

// Add the job to the high priority queue
$predis->rpush('queue.priority.high', json_encode($job));

// Or, you could add it to the normal or low priority queue.
$predis->rpush('queue.priority.normal', json_encode($job));
$predis->rpush('queue.priority.low', json_encode($job));

Simple enough! Having different queue priorities is very beneficial in managing which jobs should get done first. For example, you might have an Executive’s request go into the high priority queue so they get the report quickly. You might also have a weekly cron that queues up reports to be sent automatically, and those can go in the low priority queue so as not to disrupt people trying to get a manual report.
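Since the three queues differ only by key name, it can be handy to wrap the enqueue step in a small helper. This is a hypothetical sketch (queue_key() and add_job() are not part of the original example); it assumes a connected $predis client as above.

```php
<?php
// Hypothetical helper: maps a priority name to its Redis list key,
// falling back to the normal queue for unknown priorities.
function queue_key($priority)
{
    $valid = array('high', 'normal', 'low');
    if (!in_array($priority, $valid, true)) {
        $priority = 'normal';
    }
    return 'queue.priority.' . $priority;
}

// Encode the job as JSON and RPUSH it onto the chosen queue.
function add_job($predis, $job, $priority = 'normal')
{
    return $predis->rpush(queue_key($priority), json_encode($job));
}
```

With this in place, adding the report job above becomes add_job($predis, $job, 'high');.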

Now, on to the worker’s code.

Processing Jobs

For now, let’s say we have a script running in the PHP CLI (Command Line Interface) that you started by running this command on the server:

php /path/to/worker.php

First, we want this worker to run continuously, so we can use a while loop:

* Simple Continuous While Loop

while (1) { // Always True
    /* … perform tasks here … */
}

We’ll worry about making them more intelligent later. Now, let’s have our worker check the queue. You can do so with the BLPOP command:

* Checking the Queue
$job = $predis->blpop('queue.priority.high',
                      'queue.priority.normal',
                      'queue.priority.low',
                      10);

We’re telling Redis to check each queue in order of priority: high, normal, and then low. If it finds an item, it immediately returns an array with the name of the queue the item came from and the string of data that was pulled.

The B in BLPOP is “blocking.” What that means is that Redis will wait until either an item enters one of the queues, or the timeout is reached. In this case, the timeout is 10 seconds. So instead of polling (checking every few seconds in a loop), we check and wait, and after 10 seconds it will return null and we can check again.

What this gives us is near instantaneous queues. As soon as something is available, it is passed to the workers that are listening. You can also have multiple workers, and it will pass jobs to the first listening worker, and the next job to the next worker, so you don’t have to worry about multiple workers getting the same queued item.

After $predis->blpop() returns, if we have an array, an item was returned. If not, the timeout was reached. We can check whether a job was returned and, if so, process it:

* Checking to see if a Job was returned

if ($job) {
    // Index 0 of the array holds which queue was returned
    $queue_name = $job[0];
    // Index 1 of the array holds the string value of the job.
    // Since we are passing it JSON, we'll decode it:
    $details = json_decode($job[1]);

    /* … do job work … */
}

Now we can have multiple workers listening to the same queues and scale our workload. Redis is very fast and efficient, and you could have hundreds or even thousands of workers listening to a single Redis server.
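Putting the pieces so far together, a worker’s main loop is only a few lines. parse_job() is a hypothetical helper (not from the original post) that unpacks the [queue name, payload] array BLPOP returns; the loop itself assumes a connected $predis client.

```php
<?php
// Hypothetical helper: BLPOP returns array($queue_name, $payload) when a
// job was found, or null when the timeout was reached.
function parse_job($result)
{
    if (!is_array($result) || count($result) < 2) {
        return null; // timeout, nothing to do
    }
    return array(
        'queue'   => $result[0],
        'details' => json_decode($result[1]),
    );
}

// The worker loop (requires a connected $predis instance):
// while (1) {
//     $result = $predis->blpop('queue.priority.high',
//                              'queue.priority.normal',
//                              'queue.priority.low', 10);
//     $job = parse_job($result);
//     if ($job !== null) {
//         /* … generate the report and email it … */
//     }
// }
```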

Continuously Running Workers

There are a lot of options when it comes to deploying these workers. You can use a framework like Gearman, but for simple things, I like very simple solutions. I came across a blog post by Joseph Scott about a little 10-line perl script called solo. It runs a command and, to ensure that no other copy of that exact command is running, first locks a configurable port. This is awesome because you don’t have to worry about lock files or filesystem tricks; the kernel handles it all.

So what you can do is create a cronjob that uses solo to execute your script. First copy solo somewhere; I put it in /usr/local/bin on my Linux server. Then add this to your crontab using the command "crontab -e -u (which user to use)":

* * * * * /usr/local/bin/solo -port=5001 php /path/to/worker.php

What this will do is try to run this command every minute. Solo will check to see if the port is already in use, and if it is, it will exit. Otherwise, it will lock the port and then execute the command. The port will stay locked as long as the command is executing. Once the command terminates, the port will unlock.
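To see why the port trick works, here is the same idea sketched in PHP. This is an illustration only, not how solo itself is implemented (solo is perl): binding a listening socket on a port is exclusive, so a second process attempting the same bind fails, and the kernel releases the port automatically when the process exits. The port number here is arbitrary.

```php
<?php
// Try to take the lock by binding a local port. Returns the server
// socket on success, or null if another process already holds the port.
function acquire_port_lock($port)
{
    $server = @stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
    return ($server === false) ? null : $server;
}

$lock = acquire_port_lock(5001);
if ($lock === null) {
    exit(0); // another copy of this worker is already running
}
/* … run the worker loop; the kernel frees the port when we exit … */
```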

Now, PHP is a great language, but it has been known to leak some memory when a single instance runs for a long time. So we can have our scripts exit periodically and be restarted by our cron job. Let’s make our “while(1)” statement a little smarter:

* A Smarter While Statement

// Set the time limit for php to 0 seconds (no limit)
set_time_limit(0);

/*
 * We'll set our base time, which is one hour (in seconds).
 * Once we have our base time, we'll add anywhere between 0
 * to 10 minutes randomly, so all workers won't quit at the
 * same time.
 */
$time_limit = 60 * 60 * 1; // Minimum of 1 hour
$time_limit += rand(0, 60 * 10); // Adding additional time

// Set the start time
$start_time = time();

// Continue looping as long as we don't go past the time limit
while (time() < $start_time + $time_limit) {
    /* … perform BLPOP command … */
    /* … process jobs when received … */
}

/* … will quit once the time limit has been reached … */

One key thing to note is randomly shifting the time limit for the script. I like to do this because you don’t want all your workers stopping and starting at the same time. So if I have 8 workers, one might quit, but the other 7 will continue until the 8th starts back up again via the cron job.

Bells & Whistles

After using workers for a while, here are a couple of ideas to enhance your workers and the system managing them. First off, you can add some monitoring. Using a Redis HASH, you can store the state of your workers.

* Assigning Worker IDs & Monitoring
* Usage: php worker.php 1

// Gets the worker ID from the command line argument
$worker_id = $argv[1];

// Setting the Worker's Status
$predis->hset('worker.status', $worker_id, 'Started');

// Set the last time this worker checked in, use this to
// help determine when scripts die
$predis->hset('worker.status.last_time', $worker_id, time());
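With those hashes in place, you can build a tiny monitor script. This is a hypothetical sketch of what a monitor.php might look like (the example repo’s monitor.php may differ); format_status() is a made-up helper, kept pure so the formatting can be shown without a live Redis connection.

```php
<?php
// Hypothetical helper: one status line per worker, showing how long
// ago it last checked in.
function format_status($worker_id, $status, $last_time)
{
    $age = time() - (int) $last_time;
    return sprintf('worker %s: %s (last seen %ds ago)', $worker_id, $status, $age);
}

// With a connected $predis instance:
// $statuses = $predis->hgetall('worker.status');
// $times    = $predis->hgetall('worker.status.last_time');
// foreach ($statuses as $id => $status) {
//     $last = isset($times[$id]) ? $times[$id] : 0;
//     echo format_status($id, $status, $last), "\n";
// }
```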

Another problem with workers that run for a long time (several hours) is that when you make a change to their code, they won’t pick up that change until they exit. What I’ve found works for restarting them is a “version” number set in Redis that is checked at the end of every loop:

* Using Versions to Check for Reloads

$version = $predis->get('worker.version'); // i.e. number: 6

while (time() < $start_time + $time_limit) {
    /* … check for jobs and process them … */

    /* … then, at the very end of the while … */
    if ($predis->get('worker.version') != $version) {
        echo "New Version Detected…\n";
        echo "Reloading…\n";
        break;
    }
}
You would simply INCR (increment) worker.version, and after finishing its current job, each worker would exit, and solo would start it up again.

You can also kill specific workers by having them check for their ID in a hash:

* Using Kill Switches to Check for Reloads

while (time() < $start_time + $time_limit) {
    /* … check for jobs and process them … */

    /* … then, at the very end of the while … */
    // Check to see if a kill has been set.
    if ($predis->hget('worker.kill', $worker_id)) {
        // Make sure to unset the kill request before exiting, or
        // your worker will just keep restarting.
        $predis->hdel('worker.kill', $worker_id);

        echo "Kill Request Detected…\n";
        echo "Reloading…\n";
        break;
    }
}

Tweak to Solo & Logging

I made one small tweak to my version of solo to enable logging. Let’s say I had three workers in my crontab:

# crontab for user to run workers
* * * * * /usr/local/bin/solo -port=5001 php /path/to/worker.php 1 >> /tmp/worker.log.1
* * * * * /usr/local/bin/solo -port=5002 php /path/to/worker.php 2 >> /tmp/worker.log.2
* * * * * /usr/local/bin/solo -port=5003 php /path/to/worker.php 3 >> /tmp/worker.log.3

The “>> /tmp/worker.log.1” appends the command’s output to a tmp file that I can tail to monitor the workers’ progress. This is great for debugging problems. However, when I did this, only solo’s own output would be written to the tmp file, not the output from my script. To overcome this I changed the last line of solo:

# old
exec @ARGV;
# new
exec "@ARGV";

This ensures my script’s output is written to the tmp file, not just solo’s.


I’ve created an example on GitHub that you can clone on your own machine. All you will need is PHP 5.3 and Redis installed.

To install Redis, download the source tarball and simply run these commands on your unix-based system:

tar -xzvf redis-2.4.5.tar.gz
cd redis-2.4.5
make install

It will copy the redis binaries to /usr/local/bin.

To get a copy of the code, you can download it here. HOWEVER, it doesn’t include predis! You’ll have to download predis and copy it in there via this link. It is much easier to clone it like so:

git clone git:// php_example/
cd php_example
git submodule init
git submodule update

Then, using different terminal windows (or using screen), you can run different worker.php instances, use creator.php to insert jobs, and monitor.php to watch the progress. This is all done from the command line.

If you’re using Windows, I suggest installing an Ubuntu VM and using that. If you really want to use Redis on Windows, there are some Windows binaries you can google and download. Good luck!

Here is a video where I demo the example:

(sorry for the poor mic quality)

Final Thoughts

I’ll post here shortly about how to run Redis in production with init.d scripts and configuration files. One caveat to using solo: if your server runs an application that randomly selects ports to use (i.e. VoIP, FTP), it might select one of your workers’ ports. But on a production server, you should have a good feel for which ports are available for locking.

If you want to learn more about Redis, check out their website.

Hopefully this will be helpful for anyone looking to use PHP Workers in an easy, simple way.

9 thoughts on “PHP Workers with Redis & Solo”

  1. I am going to try this out on a new project. Thanks.


  2. Great article, but I’m facing a problem setting up the queue in real time:
    how do I make a worker’s result available to the creator when multiple
    creators are running? We can push each worker’s result onto another list and
    BLPOP it from the creator, but with multiple workers you don’t always get
    the matching result.


  3. Justin,

    This was unbelievably helpful. Thanks for posting!


  4. This, Sir,

    was very helpful,

    Thank you


  5. Thanks for this really nice article.

    Just one question: what if a worker crashes after it gets a job via BLPOP and before it finishes? Will that job disappear forever?

    Do you have any idea how to ensure every job is delivered and done by workers? Thanks

    (sorry if this posts twice, as my Internet seems to be having some problems)


  6. Vincent van Dijk December 4, 2013, 9:45 am

    @Vic: the cronjob will start every minute. So, when the worker fails and exits, it will run again within at most one minute.


  7. What is the point of the timeout on blpop? Isn’t the loop going to start this over anyway?

