Sunday, April 29, 2012

The Cost of Mutexes By Darryl Gove

Sometimes, when we care about system performance at the microsecond level, we wonder what a mutex lock and unlock actually cost.

Darryl Gove ran some interesting experiments on Sun Solaris systems. The same idea can be applied to other operating systems, such as Linux.

His blog post used to be at http://blogs.sun.com/d/entry/the_cost_of_mutexes and has since moved to https://blogs.oracle.com/d/entry/the_cost_of_mutexes

His conclusion is that a mutex lock and unlock is about 3 times slower than an atomic operation.
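To try the comparison on Linux or another POSIX system, something like the following could serve as a starting point. This is only a minimal sketch, not Darryl Gove's test program: it assumes POSIX threads and C11 atomics, it is single-threaded (so the mutex is never contended), and the function and variable names are my own.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical names, chosen for this sketch only. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long locked_counter = 0;
static atomic_ulong atomic_counter = 0;

/* Increment a plain counter n times under a mutex.
   Returns the CPU time used, in seconds. */
double time_mutex_increments(unsigned long n)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Increment an atomic counter n times without any lock.
   Returns the CPU time used, in seconds. */
double time_atomic_increments(unsigned long n)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < n; i++) {
        atomic_fetch_add(&atomic_counter, 1);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with a large n (say 100 million) and dividing the two results gives a rough mutex-to-atomic ratio for your own machine. With GCC or Clang it can be built with something like cc -O2 -pthread. The ratio will depend on the CPU, the libc and the compiler, so do not expect exactly the Solaris numbers.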

Another point worth noting is that he also compared the cost of a normal function call with an inlined function, and measured about 10ns for a normal function call. This cost will probably keep shrinking with newer generations of CPUs, RAM and compilers.
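A rough way to repeat the call-versus-inline measurement yourself is sketched below. It is not the code from the original post. It assumes a GCC or Clang compiler (the noinline attribute is a compiler extension), and all function names are made up for illustration.

```c
#include <time.h>

/* noinline is a GCC/Clang extension: it forces a real call instruction. */
__attribute__((noinline)) unsigned add_one_called(unsigned x) { return x + 1; }

/* The compiler is free to inline this one into the loop. */
static inline unsigned add_one_inlined(unsigned x) { return x + 1; }

/* Run n calls through the non-inlined function.
   volatile keeps the accumulator in memory so the loop is not folded away. */
double time_called(unsigned long n, unsigned *result)
{
    volatile unsigned acc = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < n; i++)
        acc = add_one_called(acc);
    *result = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop, but through the inlinable function. */
double time_inlined(unsigned long n, unsigned *result)
{
    volatile unsigned acc = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < n; i++)
        acc = add_one_inlined(acc);
    *result = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The per-call overhead can then be estimated as (time_called - time_inlined) / n for a large n. Treat the result as a ballpark figure only, since the optimizer and the CPU's branch prediction both influence it.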

Here I quote only the parts that interested me most; you can read his full post at the links above.

Excl.     Incl.      Name  
User CPU  User CPU         
 sec.      sec.       
3.973     3.973      <Total>
1.341     3.973      count
1.331     1.331      mutex_unlock
0.781     0.781      mutex_lock_impl
0.490     0.490      atomic_add_32
0.030     0.030      mutex_lock

The profile clearly shows that, when a mutex is used, the time is spent in atomic_add_32, mutex_lock_impl and mutex_unlock. Since the hot loops of mutex_lock_impl and mutex_unlock are very similar in assembly to atomic_add_32 (each is built around a cas instruction), it is no surprise that the mutex takes around 3x the time of an atomic add.
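For reference, the ratio can be worked out directly from the exclusive times in the profile above. The small sketch below only does that arithmetic; the constant and function names are my own, and it takes the profile numbers at face value.

```c
/* Exclusive user CPU seconds copied from the profile above. */
static const double MUTEX_UNLOCK_SEC    = 1.331;
static const double MUTEX_LOCK_IMPL_SEC = 0.781;
static const double MUTEX_LOCK_SEC      = 0.030;
static const double ATOMIC_ADD_32_SEC   = 0.490;

/* Time in the three mutex routines divided by time in atomic_add_32. */
double mutex_to_atomic_ratio(void)
{
    double mutex_total = MUTEX_UNLOCK_SEC + MUTEX_LOCK_IMPL_SEC + MUTEX_LOCK_SEC;
    return mutex_total / ATOMIC_ADD_32_SEC;
}

/* mutex_unlock alone compared with atomic_add_32. */
double unlock_to_atomic_ratio(void)
{
    return MUTEX_UNLOCK_SEC / ATOMIC_ADD_32_SEC;
}
```

With these numbers the lock and unlock routines together come to about 2.14 seconds against 0.49 seconds for atomic_add_32, a ratio of roughly 4.4, while mutex_unlock on its own is about 2.7x. So "around 3x" should be read as a rounded figure rather than an exact one.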

Below is the annotated assembly code:

Time spent in the atomic_add_32 call:
   0.010     0.010              [?]    2ecb8:  ld          [%o0], %o2
   0.040     0.040              [?]    2ecbc:  add         %o2, %o1, %o3
   0.010     0.010              [?]    2ecc0:  cas         [%o0] , %o2, %o3
## 0.370     0.370              [?]    2ecc4:  cmp         %o2, %o3
   0.        0.                 [?]    2ecc8:  bne,a,pn    %icc,0x2ecbc
   0.        0.                 [?]    2eccc:  mov         %o3, %o2
   0.050     0.050              [?]    2ecd0:  retl        
   0.010     0.010              [?]    2ecd4:  add         %o2, %o1, %o0
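In C terms, the listing above is a load, an add, and a compare-and-swap that retries until no other thread has changed the value in between. Here is a minimal sketch of the same pattern using C11 atomics. It is not the Solaris source; cas_add_32 is a name I made up, and returning the new value simply mirrors the final add in the listing.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of an atomic add built from a compare-and-swap retry loop.
   Returns the value after the addition. */
uint32_t cas_add_32(_Atomic uint32_t *target, int32_t delta)
{
    uint32_t old = atomic_load(target);            /* ld  */
    uint32_t desired;
    do {
        desired = old + (uint32_t)delta;           /* add */
        /* cas: if *target no longer equals old, the call fails,
           old is refreshed with the current value, and we retry. */
    } while (!atomic_compare_exchange_weak(target, &old, desired));
    return desired;
}
```

For example, starting from a counter holding 5, cas_add_32(&counter, 3) leaves 8 in the counter and returns 8.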

Time spent in mutex_unlock and mutex_lock_impl is similar:
   0.        0.                 [?]    beff8:  mov         %o1, %o3
   0.020     0.020              [?]    beffc:  cas         [%o0] , %o2, %o3
## 0.560     0.560              [?]    bf000:  cmp         %o2, %o3
   0.        0.                 [?]    bf004:  bne,a       0xbeff8
   0.        0.                 [?]    bf008:  mov         %o3, %o2
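This second loop is essentially an atomic swap: it keeps trying to store a new value with cas until the store succeeds. The sketch below shows that pattern with C11 atomics, plus a toy lock built on top of it to show why a lock and an unlock each cost about one atomic operation. This is purely illustrative and is not how the Solaris mutex is actually implemented; all names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically store new_value and return the previous contents,
   retrying the compare-and-swap until it succeeds. */
uint32_t cas_swap_32(_Atomic uint32_t *target, uint32_t new_value)
{
    uint32_t old = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &old, new_value)) {
        /* old now holds the value currently in *target; try again */
    }
    return old;
}

/* Toy spin lock: 0 means free, 1 means held. */
void toy_lock(_Atomic uint32_t *word)
{
    while (cas_swap_32(word, 1) != 0) {
        /* someone else holds the lock; keep spinning */
    }
}

void toy_unlock(_Atomic uint32_t *word)
{
    cas_swap_32(word, 0);
}
```

A real mutex also has to handle waiters, ownership and sleeping in the kernel, so it does more work than this toy version.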