Monday, May 31, 2010

Mac OS X: No timed semaphore waits between processes

There are a few very clunky things that the average developer might run into when trying to use IPC primitives on OS X.

For one, there are gaping holes in the documentation: some of the functions aren't documented at all. Even a Google search won't turn anything up.

Second, it's extremely hard to figure out how to do a timed wait on a semaphore shared between processes. There is no timed wait implementation for the System V semaphores created using semget(), and while the native Mach semaphores do include a timed wait implementation, it's too hard to figure out how to share one between processes.

What's the deal, Apple? Why am I forced to read off-topic documentation in detail just to get a timed wait between processes? When I realized I was reading and re-reading about bootstrap contexts and ports in the Kernel Programming Guide, I knew I'd gone too far.

Backing up, all I'm trying to do is signal my daemon when a message is ready, and have the daemon signal my parent process when the request is complete. Considering the response time will always be very small, I'd like to have a timeout on both sides to detect when either process has crashed.

I've tried installing a SIGALRM handler, which works, but that's process-wide and extremely clunky when all I want is a timed wait.
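
A minimal sketch of that SIGALRM workaround shows why it is clunky. It is an illustration under assumptions, not the code from the daemon: it assumes a System V semaphore set whose id is already known, and the helper name timed_wait_sigalrm() is made up for this example. The timer and the handler are both process-wide, so the helper has to save and restore them around a single blocking call.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/time.h>

// Empty handler: its only job is to make the blocking semop() fail with EINTR.
static void on_alarm(int) {}

// Hypothetical helper: wait on semaphore 0 of the set `semid` for at most
// `ms` milliseconds. Returns true if the semaphore was acquired, false on
// timeout or any other error.
bool timed_wait_sigalrm(int semid, long ms)
{
  // A zero interval would disarm the timer instead of expiring immediately.
  assert(ms > 0);

  // Install the handler without SA_RESTART and remember the previous one.
  struct sigaction sa = {}, old_sa = {};
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction(SIGALRM, &sa, &old_sa) != 0)
    return false;

  // Arm a one-shot real-time timer that raises SIGALRM after `ms`.
  struct itimerval timer = {};
  timer.it_value.tv_sec = ms / 1000;
  timer.it_value.tv_usec = (ms % 1000) * 1000;
  setitimer(ITIMER_REAL, &timer, nullptr);

  // Block until the semaphore can be decremented or a signal interrupts us.
  // Known race: if the timer fires before semop() is entered, the wait is
  // not interrupted at all.
  struct sembuf op = {};
  op.sem_num = 0;
  op.sem_op = -1;
  op.sem_flg = 0;
  int rc = semop(semid, &op, 1);

  // Disarm the timer first, then put the previous handler back.
  struct itimerval off = {};
  setitimer(ITIMER_REAL, &off, nullptr);
  sigaction(SIGALRM, &old_sa, nullptr);

  return rc == 0;
}
```

A call such as timed_wait_sigalrm(semid, 100) returns false after roughly 100 ms if nobody signals the semaphore. Any other code in the process that relies on SIGALRM or ITIMER_REAL is disturbed while the wait is in progress, which is exactly the process-wide problem.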

Simple enough? Apparently not...

What's the deal, Apple?

=================

30 minutes pass...

=====================

Sometimes all it takes is writing about a problem to help you solve it. Here's what I found, after reading all the Mach documentation and the JACK source code (Thank you, once again, Paul):

It is possible to register a native unnamed Mach semaphore (created with semaphore_create()) under a name that another process can use to attach to the same semaphore and do a timed wait (using semaphore_timedwait()). What you have to do is acquire the bootstrap context of the current process and register the semaphore with a name there so that another process that you start can see it. A bootstrap context is like a scope or namespace, and the context in question is the login context, which means that all processes that your user starts use that namespace.

I created some example code that shows how to create a semaphore and do a timed wait.

/** parent.cpp: Create and register a named semaphore, and wait for 
    child.cpp to attach to and signal it, allowing this process to terminate.
*/
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach.h>
#include <mach/mach_error.h>
#include <mach/semaphore.h>
#include <servers/bootstrap.h>

void sig(int){}

int main()
{
  semaphore_t sem;
  mach_port_t task = mach_task_self();
  mach_port_t boot_port;
  kern_return_t err;

  err = task_get_bootstrap_port(task, &boot_port);
  if(err != KERN_SUCCESS)
    {
      printf("BOOTSTRAP: %s\n", mach_error_string(err));
      exit(1);
    }

  err = semaphore_create(task, &sem, SYNC_POLICY_FIFO, 0);
  if(err != KERN_SUCCESS)
    {
      printf("semaphore_create: %s\n", mach_error_string(err));
      exit(1);
    }
  printf("Created semaphore\n");
  
  err = bootstrap_register(boot_port, "pksem", sem);
  if(err != KERN_SUCCESS)
    {
      // Without a registered name, child.cpp cannot look the semaphore up.
      switch(err)
        {
        case BOOTSTRAP_NOT_PRIVILEGED:
          printf("bootstrap_register(): bootstrap not privileged\n");
          break;
        case BOOTSTRAP_SERVICE_ACTIVE:
          /* the name is already registered */
          printf("bootstrap_register(): bootstrap service active\n");
          break;
        default:
          printf("bootstrap_register() err = %s\n", mach_error_string(err));
          break;
        }
      exit(1);
    }


  // Untimed alternative: semaphore_wait(sem);

  printf("semaphore_timedwait()\n");
  // 1750000 ms is 1750 s (about 29 minutes), which leaves plenty of time
  // to start child.cpp by hand.
  const int ms = 1750000;
  mach_timespec_t ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms % 1000) * 1000000;
  bool wait = true;
  while(wait)
    {
      err = semaphore_timedwait(sem, ts);
      switch(err)
        {
        case KERN_SUCCESS:
          printf("signaled\n");
          wait = false;
          break;
        case KERN_OPERATION_TIMED_OUT:
          printf("timed out\n");
          wait = false;
          break;
        case KERN_ABORTED:
          // Interrupted, for example by a signal: retry with the full timeout.
          printf("caught signal, trying again\n");
          break;
        default:
          // Unexpected error: stop instead of looping forever.
          printf("default: %s\n", mach_error_string(err));
          wait = false;
          break;
        }
    }
}


/** child.cpp: Attach to the semaphore by name and release it.
 */

#include <stdio.h>
#include <stdlib.h>
#include <mach/mach.h>
#include <mach/semaphore.h>
#include <servers/bootstrap.h>


int main()
{
  semaphore_t sem;
  kern_return_t err;
  mach_port_t boot_port;

  err = task_get_bootstrap_port(mach_task_self(), &boot_port);
  if(err != KERN_SUCCESS)
    {
      printf("task_get_bootstrap_port(): %s\n", mach_error_string(err));
      exit(1);
    }

  err = bootstrap_look_up(boot_port, "pksem", &sem);
  if(err != KERN_SUCCESS)
    {
      printf("bootstrap_look_up(): %s\n", mach_error_string(err));
      exit(1);
    }

  semaphore_signal(sem);
  printf("success\n");
}

Unfortunately, I can't find any documentation for the semaphore functions, or for mach_task_self(), task_get_bootstrap_port(), bootstrap_register(), and bootstrap_look_up(). In fact, bootstrap_register() is deprecated! Unbelievable.

But, as far as I know, using these native unnamed Mach semaphores is faster than the System V semaphores created with semget() and managed via semctl(). The native Mach semaphores also go away when you kill the process that created them. That means I can get rid of all of my code to manage and clean up orphaned semaphores based on key files on the disk. What a waste of time that was...
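
For contrast, a System V semaphore set outlives the process that created it, so something has to remove it explicitly. The sketch below shows roughly what key-file based cleanup involves. It is not the original code: the helper names and the project id 'p' are made up for this example, and it assumes the key file already exists on disk.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

// Hypothetical helper: create (or attach to) a one-semaphore set whose key is
// derived from an existing file. Returns the set id, or -1 on failure.
int open_keyed_semaphore(const char* key_file)
{
  assert(key_file != nullptr);
  key_t key = ftok(key_file, 'p');
  if (key == static_cast<key_t>(-1))
    return -1;
  return semget(key, 1, IPC_CREAT | 0600);
}

// Hypothetical helper: remove the set left behind for `key_file`, for example
// after the process that created it has crashed. Returns true if a set was
// found and removed, false if there was nothing to remove or removal failed.
bool remove_orphaned_semaphore(const char* key_file)
{
  assert(key_file != nullptr);
  key_t key = ftok(key_file, 'p');
  if (key == static_cast<key_t>(-1))
    return false;

  // Look up only (no IPC_CREAT): fails if no set exists for this key.
  int semid = semget(key, 1, 0);
  if (semid == -1)
    return false;

  // The set stays in the kernel until someone issues IPC_RMID.
  return semctl(semid, 0, IPC_RMID) == 0;
}
```

With Mach semaphores none of this bookkeeping is needed, because the semaphore disappears together with the task that created it.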

8 comments:

Anonymous said...

Cool. But sitting on 10.6 I cannot find any header containing bootstrap_register/bootstrap_look_up. Your posting lacks the includes due to formatting I guess.

Justin said...

I am trying to do the same thing, however it appears you are using deprecated functionality (which may explain poor documentation).

I am wondering if pthread_cond_timedwait with a mutex is also a viable solution. This Stack Overflow answer suggests it.

Patrick Stinson said...

Yep, that looks like it would do it. I wish I had that two years ago.

Anonymous said...

Grame (www.grame.fr) wrote this OSX JACK code, not Paul...

Dan said...

In Xcode 5, the boostrap.h header is in .