Tag Archives: Kernel Synchronization mechanisms

Synchronization without Locking

<< Previous Article

We have covered the various synchronization mechanisms in the previous articles. One of the thing common among them was that they put the process to sleep, if the lock is not available. Also, all those are prone to deadlock, if not implemented carefully. Sometimes however, we require to protect a simple variable like integer. It can be as simple as setting a flag. Using semaphore or spinlock to protect such a variable may be overhead. So, does kernel provide any synchronization mechanism without locking? Read on to explore more on this.

Atomic Operations

Atomic operations are indivisible and uninterruptible. Each of these compile into a single machine instruction as far as possible and are guaranteed to be atomic. Kernel provides a atomic integer type atomic_t for atomic operations. Below are the operations:

#include <asm/atomic.h>

void atomic_set(atomic_t *a, int i); // Set the atomic variable a to integer value i
int atomic_read(atomic *a); // Return the value of atomic variable a
void atomic_add(int i, atomic_t *a); // Add i to atomic variable a
void atomic_sub(i, atomic_t *a); // Subtract i from atomic variable a
void atomic_inc(atomic_t *a); // Increment operation
void atomic_dec(atomic_t *a); // Decrement operation

Atomic Bit Operations

Many a times, the requirement is to flag some condition. For this, a single bit may serve the purpose well. However, atomic_t type variable doesn’t work well for manipulating the bits. For this, the kernel provides a set of operations as listed below:

#include <asm/bitops.h>

void set_bit(int nr, void *a); // Set the bit number nr in value pointed by a
void clear_bit(int nr, void *a); // Clear the bit number nr in value pointed by a
void change_bit(int nr, void *a); // Toggle the bit at position nr

Conclusion

So, these are the simple, yet powerful mechanisms to provide the synchronization without locking. These can be quite useful while dealing with integer and bit operations, respectively, and involve far less overhead as compared to the usual synchronization mechanisms such as semaphore and mutex. However, these might not be useful in achieving the critical sections.

With this, we are now familiar with most of the synchronization mechanisms provided by the kernel. As you understand, synchronization mechanisms come with its own pros and cons and we need to be very careful while selecting the right one.

Next Article >>

   Send article as PDF   

Concurrency Management Part – 3

<< Previous Article

In the last two articles, we have discussed some of the commonly used synchronization mechanisms in kernel. It was observed that these synchronization mechanisms restrict the access to the resource, irrespective of the operation which thread/process wants to perform on the resource. This in turn, mean that even though one thread has acquired the resource for read access, another thread can’t access the  same resource for reading. In most of the cases, it is quite desirable to have two or more threads having the read access to the resource as far as they are not modifying the resource data structure.  This will result into the improved system performance. Read on to find out the mechanism provided by kernel to achieve this.

Reader / Writer Semaphore

This is the type of semaphore, which provides the access depending on operation which thread/process wants to perform on the data structure. With this, multiple readers can have the access to the resource at the same time, while only one writer gets access at a time. So, will reader be allowed if write operation is in progress? Definitely not. At a time, there can be either read or write operation in progress as usual, but there can be multiple read operations. So, let’s look at the data structures associated with the reader / writer semaphores:

#include <linux/rwsem.h>

// Data Structure
structure rw_semaphore rw_sem;

// Initialization
void init_rwsem(&rw_sem);

// Operations for reader
void down_read(&rw_sem);
void up_read(&rw_sem);

// Operations for writer
void down_write(&rw_sem);
void up_write(&rw_sem);

As seen above, initialization operation is similar to what we do with the regular semaphore, but key difference lies in the fact that we have separate operations for readers and writers.

Below is an example usage of reader / writer semaphore:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h> 
#include <linux/cdev.h>
#include <linux/device.h>
#include <asm/uaccess.h>
#include <linux/semaphore.h>
#include <linux/sched.h>
#include <linux/delay.h>

#define FIRST_MINOR 0
#define MINOR_CNT 1

static dev_t dev;
static struct cdev c_dev;
static struct class *cl;
static struct task_struct *task;
static struct rw_semaphore rwsem;

int open(struct inode *inode, struct file *filp)
{
    printk(KERN_INFO "Inside open\n");
    task = current;
    return 0;
}

int release(struct inode *inode, struct file *filp)
{
    printk(KERN_INFO "Inside close\n");
    return 0;
}

ssize_t read(struct file *filp, char *buff, size_t count, loff_t *offp)
{
    printk("Inside read\n");
    down_read(&rwsem);
    printk(KERN_INFO "Got the Semaphore in Read\n");
    printk("Going to Sleep\n");
    ssleep(30);
    up_read(&rwsem);
    return 0;
}

ssize_t write(struct file *filp, const char *buff, size_t count, loff_t *offp)
{
    printk(KERN_INFO "Inside write. Waiting for Semaphore...\n");
    down_write(&rwsem);
    printk(KERN_INFO "Got the Semaphore in Write\n");
    up_write(&rwsem);
    return count;
}

struct file_operations fops =
{
    read:    read,
    write:   write,
    open:    open,
    release: release
};

int rw_sem_init(void)
{
    int ret;
    struct device *dev_ret;

    if ((ret = alloc_chrdev_region(&dev, FIRST_MINOR, MINOR_CNT, "rws")) < 0)
    {
        return ret;
    }
    printk("Major Nr: %d\n", MAJOR(dev));

    cdev_init(&c_dev, &fops);

    if ((ret = cdev_add(&c_dev, dev, MINOR_CNT)) < 0)
    {
        unregister_chrdev_region(dev, MINOR_CNT);
        return ret;
    }

    if (IS_ERR(cl = class_create(THIS_MODULE, "chardrv")))
    {
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(cl);
    }
    if (IS_ERR(dev_ret = device_create(cl, NULL, dev, NULL, "mychar%d", 0)))
    {
        class_destroy(cl);
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(dev_ret);
    }

    init_rwsem(&rwsem);

    return 0;
}

void rw_sem_cleanup(void)
{
    printk(KERN_INFO "Inside cleanup_module\n");
    device_destroy(cl, dev);
    class_destroy(cl);
    cdev_del(&c_dev);
    unregister_chrdev_region(dev, MINOR_CNT);
}

module_init(rw_sem_init);
module_exit(rw_sem_cleanup);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("SysPlay Workshops <workshop@sysplay.in>");
MODULE_DESCRIPTION("Reader Writer Semaphore Demo");

Below is the sample run:

cat /dev/mychar0
Inside Open
Inside Read
Got the Semaphore in Read
Going to sleep

cat /dev/mychar0 (In different shell)
Inside Open
Inside Read
Got the Semaphore in Read
Going to sleep

echo 1 > /dev/mychar0 (In different shell)
Inside Write. Waiting for semaphore...

As seen above, multiple reader processes are able to access the resource simultaneously. However, writer process gets blocked, while the readers are accessing the resource.

Conclusion

With this, we have covered most of the commonly used synchronization mechanisms in the kernel. Apart from these, kernel provides some atomic operations, which provides instructions that execute atomically without interruption. Atomic operators are indivisible instructions. These are useful when we need to do some operations on integers and bits.

Next Article >>

   Send article as PDF   

Concurrency Management Part – 2

<< Previous Article

In the previous article, we discussed about the basic synchronization mechanisms such as mutex and semaphores. As a part of that, there came up a couple of questions. If binary semaphore can achieve the synchronization as provided by mutex, then why do we need mutex at all? Another question was, can we use semaphore / mutex in interrupt handlers? To find the answer to these questions, read on.

Mutex and Binary Semaphore

Below is the simple example using the binary semaphore:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <asm/uaccess.h>
#include <linux/semaphore.h>

#define FIRST_MINOR 0
#define MINOR_CNT 1

static dev_t dev;
static struct cdev c_dev;
static struct class *cl;

static int my_open(struct inode *i, struct file *f)
{
    return 0;
}
static int my_close(struct inode *i, struct file *f)
{
    return 0;
}

static char c = 'A';
static struct semaphore my_sem;

static ssize_t my_read(struct file *f, char __user *buf, size_t len, loff_t *off)
{
    // Acquire the Semaphore
    if (down_interruptible(&my_sem))
    {
        printk("Unable to acquire Semaphore\n");
        return -1;
    }
    return 0;
}
static ssize_t my_write(struct file *f, const char __user *buf, size_t len,
        loff_t *off)
{
    // Release the semaphore
    up(&my_sem);
    if (copy_from_user(&c, buf + len - 1, 1))
    {
        return -EFAULT;
    }
    return len;
}

static struct file_operations driver_fops =
{
 .owner = THIS_MODULE,
 .open = my_open,
 .release = my_close,
 .read = my_read,
 .write = my_write
};

static int __init sem_init(void)
{
    int ret;
    struct device *dev_ret;

    if ((ret = alloc_chrdev_region(&dev, FIRST_MINOR, MINOR_CNT, "my_sem")) < 0)
    {
        return ret;
    }

    cdev_init(&c_dev, &driver_fops);

    if ((ret = cdev_add(&c_dev, dev, MINOR_CNT)) < 0)
    {
        unregister_chrdev_region(dev, MINOR_CNT);
        return ret;
    }

    if (IS_ERR(cl = class_create(THIS_MODULE, "char")))
    {
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(cl);
    }

    if (IS_ERR(dev_ret = device_create(cl, NULL, dev, NULL, "mysem%d", FIRST_MINOR)))
    {
        class_destroy(cl);
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(dev_ret);
    }

    sema_init(&my_sem, 0);
    return 0;
}

static void __exit sem_exit(void)
{
    device_destroy(cl, dev);
    class_destroy(cl);
    cdev_del(&c_dev);
    unregister_chrdev_region(dev, MINOR_CNT);
}

module_init(sem_init);
module_exit(sem_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pradeep");
MODULE_DESCRIPTION("Binary Semaphore Demonstration");

In the above example, we initialize the semaphore with the value of 0 with sem_init(). In my_read(), we decrement the semaphore and in my_write(), we increment the semaphore. Below is the sample run:

insmod sem.ko
cat /dev/mysem0 - This will block
echo 1 > /dev/mysem0 - Will unblock the cat process

Now, let’s try achieving the same with mutex. Below is the example for the same.

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <asm/uaccess.h>
#include <linux/mutex.h>

#define FIRST_MINOR 0
#define MINOR_CNT 1

DEFINE_MUTEX(my_mutex);

static dev_t dev;
static struct cdev c_dev;
static struct class *cl;

static int my_open(struct inode *i, struct file *f)
{
    return 0;
}
static int my_close(struct inode *i, struct file *f)
{
    return 0;
}

static char c = 'A';

static ssize_t my_read(struct file *f, char __user *buf, size_t len, loff_t *off)
{
    if (mutex_lock_interruptible(&my_mutex))
    {
        printk("Unable to acquire Semaphore\n");
        return -1;
    }
    return 0;
}
static ssize_t my_write(struct file *f, const char __user *buf, size_t len,
        loff_t *off)
{
    mutex_unlock(&my_mutex);
    if (copy_from_user(&c, buf + len - 1, 1))
    {
        return -EFAULT;
    }
    return len;
}

static struct file_operations driver_fops =
{
    .owner = THIS_MODULE,
    .open = my_open,
    .release = my_close,
    .read = my_read,
    .write = my_write
};

static int __init init_mutex(void)
{
    int ret;
    struct device *dev_ret;

    if ((ret = alloc_chrdev_region(&dev, FIRST_MINOR, MINOR_CNT, "my_mutex")) < 0)
    {
        return ret;
    }

    cdev_init(&c_dev, &driver_fops);

    if ((ret = cdev_add(&c_dev, dev, MINOR_CNT)) < 0)
    {
        unregister_chrdev_region(dev, MINOR_CNT);
        return ret;
    }

    if (IS_ERR(cl = class_create(THIS_MODULE, "char")))
    {
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(cl);
    }

    if (IS_ERR(dev_ret = device_create(cl, NULL, dev, NULL, "mymutex%d",
        FIRST_MINOR)))
    {
        class_destroy(cl);
        cdev_del(&c_dev);
        unregister_chrdev_region(dev, MINOR_CNT);
        return PTR_ERR(dev_ret);
    }

    return 0;
}

static void __exit exit_mutex(void)
{
    device_destroy(cl, dev);
    class_destroy(cl);
    cdev_del(&c_dev);
    unregister_chrdev_region(dev, MINOR_CNT);
}

module_init(init_mutex);
module_exit(exit_mutex);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pradeep");
MODULE_DESCRIPTION("Mutex Demonstration");

In the above example, I have replaced the semaphore with mutex. Below is the sample run:

cat /dev/mymutex0 - This will acquire the mutex
cat /dev/mymutex0 - This will block
echo 1 > /dev/mymutex0

So, what do you get after executing the echo command? I get the warning as below:

DEBUG_LOCKS_WARN_ON(lock->owner != current)

So, what does this warning mean? It warns that the process that is trying to unlock the mutex is not the owner of the same. But same thing worked without any warning with semaphore. What does this mean? This brings us to the important difference between the mutex and semaphore. Mutex have ownership associated with it. The process that acquires the lock is the one that should unlock the mutex. While such ownership didn’t exist with the semaphore. While using the semaphores for synchronization, its completely upto the user to ensure that the down & up are always called in pairs. But, mutex is designed in a way that lock and unlock must always be called in pairs.

Spinlock

Now, let’s come to the second question – can we use the semaphore / mutex in interrupt handlers. The answer is yes and no. I mean you can use the up and unlock, but can’t use down and lock, as these are blocking calls which put the process to sleep and we are not supposed to sleep in interrupt handlers. So, what if I want to achieve the synchronization in interrupt handlers? For this, there is a mechanism called spinlock. Spinlock is a lock which never yields. Similar to mutex, it has two operations – lock and unlock. If the lock is available, process will acquire it and will continue in the critical section and unlock it, once its done. This is pretty much similar to mutex. But, what if lock is not available? Here, comes the interesting difference. With mutex, the process will sleep, until the lock is available. But, in case of spinlock, it goes into the tight loop, where it continuously checks for a lock, until it becomes available. This is the spinning part of the spin lock. This was designed for multiprocessor systems. But, with the preemptible kernel, even a uniprocessor system behaves like an SMP. Below are the data structures associated with the spinlock:

#include <linux/spinlock.h>

// Data structure
struct spinlock_t my_slock

// Initialization
spinlock_init(&my_slock)

// Operations
spin_lock(&my_slock)
spin_unlock(&my_slock)

Now, let’s try to understand the complications associated with the spinlock. Let’s say, thread T1 acquires the spinlock and enters the critical section. Meanwhile, some high priority thread T2 becomes runnable and preempts the thread T1. Now, thread T2 also tries to acquire the spinlock and since the lock is not available, T2 will spin. Now, since T2 has a higher priority, T1 won’t run ever and this in turn will result in deadlock. So, how do we avoid such scenarios? Spinlock code is designed in such a way that any time kernel code holds a spinlock, the preemption is disabled on the local processor. Therefore, its very important to hold a spinlock for minimum possible time. What if the spinlock is shared between the thread T1 and interrupt handler? For this, there is a variant of spinlock, which disables the interrupts on local processor.

Conclusion

One common thing which we observed with mutex and semaphore is that they block the process, irrespective of the operation it wants to perform on the data structure. As you understand, there are two different operations a process can perform on the data structure – read and write. In most of the cases, it is innocuous to allow multiple readers at a time as far as they don’t modify the data structure. Such a parallelism would improve the performance. So, how do we achieve this? To find the answer to this, stay tuned to my next article. Till then, good bye!

Next Article >>

   Send article as PDF   

Concurrency Management in Linux Kernel

<< Previous Article

In the previous article, we discussed about the kernel threads, wherein we discussed various aspects of threads such as creation, stopping, signalling and so on. Threads provide one of the ways to achieve multitasking in the kernel. While multitasking brings the definite improvement in the system performance, but it comes with its own side effects. So, what are the side effects of multitasking? How can we overcome these? Read on to get the answer to all these questions.

Concurrency Management

In order to achieve the optimized system performance, kernel provides the multitasking, where multiple threads can execute in parallel and thereby utilizing the CPU in optimum way. Though useful, but multitasking, if not implemented cautiously can lead to concurrency issues, which can be very difficult to handle. So, let’s take an example to understand the concurrency issues. Let’s say there are two threads – T1 and T2. Among them is the shared resource called A. Both the threads execute the code as below:

int function()
{
    A++;
    printf("Value of i is %d\n", i);
}

Just imagine, when the thread T1 was in the middle of modifying the variable, it was pre-empted and thread T2 started to execute and it tries to modify the variable A. So, what will be result? Inconsistent value of variable A. These kind of scenarios where multiple threads are contending for the same resources is called race condition. These bugs are easy to create, but difficult to debug.

So, what’s the best way to avoid the concurrency issues? One thing is to avoid the global variables. But, its not always possible to do so. As you know, hardware resources are in nature globally shared. So, in order to deal with such scenarios, kernel provides us the various synchronization mechanisms such as mutex, semaphores and so on.

Mutex

Mutex stands for MUTual EXclusion. Its a compartment with a single key. Whoever, enters inside the compartment, locks it and takes the key with him. By the time, if someone else tries to acquire the compartment, he will have to wait. Its only when he comes outside, gives the key, would the other person be able to enter inside. Similar is the thing with the mutex. If one thread of execution acquires the mutex lock, other threads trying to acquire the same lock would be blocked. Its only when the first thread releases the mutex lock, would the other thread be able to acquire it. Below are the data structures for mutex:

#include <linux/mutex.h>

struct mutex /* Mutex data structure */

// Mutex Initialization
// Statically
DEFINE_MUTEX(my_mutex);
// Dynamically
struct mutex my_mutex;
mutex_init(&my_mutex);

// Operations
void mutex_lock(&my_mutex);
void mutex_unlock(&my_mutex);
int mutex_lock_interruptible(&my_mutex);
int mutex_trylock(&my_mutex);

Here, there are two versions for lock – interruptible and uninterruptible. mutex_lock_interruptible()  puts the current process in TASK_INTERRUPTIBLE state. So, the current process sleeps until the state is changed to TASK_RUNNING. For the process in TASK_INTERRUPTIBLE, there are two possible events which may change the process state to TASK_RUNNING. First event is obviously, when the mutex is available and another thing is, if any signal is delivered to process. But, if the process is put into the TASK_UNINTERRUPTIBLE state, which is the case when we invoke mutex_lock(), the only event which can wake up the process is the availability of resource. In almost all the scenarios we use mutex_lock_interruptible().

Below is the simple example to demonstrate the usage of mutex.

static int thread_fn(void *unused)
{
    while (!kthread_should_stop())
    {
        counter++;
        printk(KERN_INFO "Job %d started\n", counter);
        get_random_bytes(&i, sizeof(i));
        ssleep(i % 5);
        printk(KERN_INFO "Job %d finished\n", counter);
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
}

Here, we have global variable counter, which is being shared between two threads. Each thread increments the counter, prints the value and then sleeps for random number of seconds. This is obvious entity for race condition and can result into the corruption of variable counter. So, in order to protect the variable, we use the mutex as below:

#include <linux/mutex.h>

DEFINE_MUTEX(my_mutex);
static int counter = 0;

static int thread_fn(void *unused)
{
    while (!kthread_should_stop())
    {
        mutex_lock(&my_mutex);
        counter++;
        printk(KERN_INFO "Job %d started\n", counter);
        get_random_bytes(&i, sizeof(i));
        ssleep(i % 5);
        printk(KERN_INFO "Job %d finished\n", counter);
        mutex_unlock(&my_mutex);
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
}

As seen in the above code, we declare a variable my_mutex of type struct mutex which protects the global variable counter. Thread will be able to access the variable only if it is not in use by another thread. In this way, mutex synchronizes the access to the global variable counter.

Semaphore

Semaphore is a counter. It is mostly used, when we need to maintain the count of the resources. Let’s say we want to implement the memory manager, where we need to maintain the count of available memory pages. To start with, say we have 10 pages, so the initial value of the semaphore will be 10.  So, if the thread 1 comes and asks for the 5 pages, we will decrement the value of semaphore to 5 (10 – 5). Likewise, say thread 2 asks for 5 pages, this will further decrement the semaphore value to 0 (5 – 5). At this point, if there is an another thread say thread 3 and asks for 3 pages, it will have to wait, since we don’t any pages left. Meanwhile, if the thread 1 is done with the pages, it will release 5 pages, which in turn will increment the semaphore value to 5 (0 + 5). This, in turn will unblock the thread-3, which will decrement the semaphore value to 2 (5 – 3). So, as you understand, there are two possible operations with the semaphore – increment and decrement. Accordingly, we have two APIs – up and down. Below are the data structures and APIs for semaphore.

#include <linux/semaphore.h>

struct semaphore /* Semaphore data structure */

// Initialization
// Statically
DEFINE_SEMAPHORE(my_sem);
// Dynamically
struct semaphore my_sem;
sema_init(&my_sem, val);

// Operations
void down((&sem);
int down_interruptible(&sem);
int down_trylock(&sem);
void up(&sem);

As with the mutex, we have two versions of down – interruptible and uninterruptible. Initialization function sema_init() takes two arguments – pointer to the struct semaphore and initial value of semaphore. If semaphore value is greater than 1, we call it as counting semaphore and if the value of semaphore is restricted to 1, it operates in way similar to mutex. The semaphore for which the maximum count value is 1, is called as binary semaphore. Below is example, where the mutex is replaced with a semaphore:

#include <linux/semaphore.h>

static struct my_sem;
static int counter = 0;

static int thread_fn(void *unused)
{
    while (!kthread_should_stop())
    {
        if (down_interruptible(&my_sem))
            break;
        counter++;
        printk(KERN_INFO "Job %d started\n", counter);
        get_random_bytes(&i, sizeof(i));
        ssleep(i % 5);
        printk(KERN_INFO "Job %d finished\n", counter);
        up(&my_sem);
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
}

static int init_module(void)
{
    // Binary semaphore
    sema_init(&my_sem, 1);
}

The code is same as with mutex. It provides the synchronized access to the global variable counter.

Conclusion

In this article, we discussed two most commonly used synchronization mechanisms. As seen from the above examples, the synchronization achieved  with mutex, can be achieved with the binary semaphore as well. Apart from this, semaphore can also operate in counting mode. So, why do we need mutex at all? Also, can we use the semaphore in interrupt handler? To find the answers to these questions, stay tuned to my next article on concurrency management. Till then, good bye.

Next Article >>

   Send article as PDF