The Python GIL Implementation on Linux

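《Python源码剖析:深度探索动态语言核心技术》 is a really good book: it explains deep material in an accessible way, and the writing is witty. My one gripe is that its research platform is Windows. Usually that is not a problem, but for platform-specific features such as threads, the book no longer matches what I see on my machine. Today I read Chapter 15, on Python's multithreading mechanism, where the explanation of the GIL is based entirely on Windows, so I studied the Linux source myself and took some notes.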

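When Python initializes the multithreading environment, Linux behaves the same as Windows: it sets the initialized variable to 1 and does nothing else. On Linux there are two GIL implementations, selected by the USE_SEMAPHORES macro. As the name suggests, one uses a semaphore and the other does not. The main code for both is in Python/thread_pthread.h.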
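
To make the compile-time switch concrete, here is a minimal sketch of how such a selection could look. It assumes the choice depends on whether the platform advertises POSIX semaphores through _POSIX_SEMAPHORES; the exact condition in thread_pthread.h may differ between Python versions. DEMO_USE_SEMAPHORES, demo_lock_backend and demo_backend_is_known are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>   /* defines _POSIX_SEMAPHORES where supported */

/* DEMO_USE_SEMAPHORES is a hypothetical stand-in for USE_SEMAPHORES.
 * Assumption: the semaphore variant is chosen when the platform
 * advertises POSIX semaphores. */
#if defined(_POSIX_SEMAPHORES) && (_POSIX_SEMAPHORES + 0) > 0
#  define DEMO_USE_SEMAPHORES 1
#else
#  define DEMO_USE_SEMAPHORES 0
#endif

/* Report which lock backend this translation unit was compiled for. */
const char *demo_lock_backend(void)
{
#if DEMO_USE_SEMAPHORES
	return "semaphore";
#else
	return "mutex + condition variable";
#endif
}

/* Sanity check: the backend is one of the two variants discussed here. */
int demo_backend_is_known(void)
{
	const char *name = demo_lock_backend();
	int known = strcmp(name, "semaphore") == 0 ||
	            strcmp(name, "mutex + condition variable") == 0;

	assert(known);
	return known;
}
```

The point of the sketch is only that the decision is made by the preprocessor, so a given build contains exactly one of the two implementations.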

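In Python, the GIL's data type is PyThread_type_lock, a pointer to void. The implementation differs from platform to platform, so no concrete type can be fixed. There are four operations on the GIL: PyThread_allocate_lock, PyThread_free_lock, PyThread_acquire_lock, and PyThread_release_lock, which create, destroy, lock, and unlock it respectively. One more value inside Python affects how a thread behaves when it requests the GIL: the waitflag parameter. If waitflag is nonzero, the thread blocks in the acquire call until it gets the lock; otherwise the call is non-blocking. The two implementations are analyzed in turn below.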

1. Using a semaphore.
The semaphore-based implementation is simple and clear: the GIL is just a semaphore (sem_t). Straight to the code (trimmed).

PyThread_type_lock
PyThread_allocate_lock(void)
{
	sem_t *lock;
	int status, error = 0;

	lock = (sem_t *)malloc(sizeof(sem_t));

	if (lock) {
		status = sem_init(lock,0,1);
	}

	return (PyThread_type_lock)lock;
}

void
PyThread_free_lock(PyThread_type_lock lock)
{
	sem_t *thelock = (sem_t *)lock;
	int status, error = 0;

	status = sem_destroy(thelock);

	free((void *)thelock);
}

/*
 * As of February 2002, Cygwin thread implementations mistakenly report error
 * codes in the return value of the sem_ calls (like the pthread_ functions).
 * Correct implementations return -1 and put the code in errno. This supports
 * either.
 */
static int
fix_status(int status)
{
	return (status == -1) ? errno : status;
}

int
PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
{
	int success;
	sem_t *thelock = (sem_t *)lock;
	int status, error = 0;

	/* [1] */
	do {
		if (waitflag)
			status = fix_status(sem_wait(thelock));
		else
			status = fix_status(sem_trywait(thelock));
	} while (status == EINTR); /* Retry if interrupted by a signal */

	if (waitflag) {
		CHECK_STATUS("sem_wait");
	} else if (status != EAGAIN) {
		CHECK_STATUS("sem_trywait");
	}

	success = (status == 0) ? 1 : 0;

	return success;
}

void
PyThread_release_lock(PyThread_type_lock lock)
{
	sem_t *thelock = (sem_t *)lock;
	int status, error = 0;

	status = sem_post(thelock);
}

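The sem_* functions in the code are the POSIX semaphore operations, so Python is really just wrapping them. The loop at [1] makes sure that a lock request is not cut short by a signal: if sem_wait or sem_trywait fails with EINTR, the call is retried. Anyone who has read Unix Network Programming should find this idiom familiar.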
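
The retry idiom and the effect of waitflag can be tried in isolation. The following is a minimal sketch, not the CPython code: demo_sem_acquire and demo_sem_lock are hypothetical names, and the sketch assumes a platform where unnamed POSIX semaphores (sem_init) are supported, as on Linux.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: acquire a binary semaphore, retrying when the
 * call is interrupted by a signal. Returns 1 on success, 0 when the
 * semaphore is busy (non-blocking mode) or on any other error. */
int demo_sem_acquire(sem_t *sem, int waitflag)
{
	int rc;

	do {
		rc = waitflag ? sem_wait(sem) : sem_trywait(sem);
	} while (rc == -1 && errno == EINTR);

	return rc == 0;
}

/* Exercise the helper on a semaphore initialised to 1 ("unlocked").
 * Returns 0 when the run completed, -1 if sem_init is unavailable. */
int demo_sem_lock(void)
{
	sem_t sem;
	int first, busy, posted, again;

	if (sem_init(&sem, 0, 1) != 0)
		return -1;

	first  = demo_sem_acquire(&sem, 1);  /* free lock: acquired */
	busy   = demo_sem_acquire(&sem, 0);  /* held lock: EAGAIN, so 0 */
	posted = sem_post(&sem);             /* release */
	again  = demo_sem_acquire(&sem, 0);  /* free again: acquired */

	sem_post(&sem);
	sem_destroy(&sem);

	assert(first == 1);
	assert(busy == 0);
	assert(posted == 0);
	assert(again == 1);
	return 0;
}
```

Calling demo_sem_lock() from a small main is enough to run it; on older glibc versions the program may need to be linked with -pthread.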

2. Using a custom struct. This case is slightly more complicated. First, let's see what the GIL actually is here.

/* A pthread mutex isn't sufficient to model the Python lock type
 * because, according to Draft 5 of the docs (P1003.4a/D5), both of the
 * following are undefined:
 *  -> a thread tries to lock a mutex it already has locked
 *  -> a thread tries to unlock a mutex locked by a different thread
 * pthread mutexes are designed for serializing threads over short pieces
 * of code anyway, so wouldn't be an appropriate implementation of
 * Python's locks regardless.
 *
 * The pthread_lock struct implements a Python lock as a "locked?" bit
 * and a <condition, mutex> pair.  In general, if the bit can be acquired
 * instantly, it is, else the pair is used to block the thread until the
 * bit is cleared.     9 May 1994 tim@ksr.com
 */

typedef struct {
	char             locked; /* 0=unlocked, 1=locked */
	/* a <cond, mutex> pair to handle an acquire of a locked lock */
	pthread_cond_t   lock_released;
	pthread_mutex_t  mut;
} pthread_lock;

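As you can see, Python defines its own struct. The locked field records whether the GIL is held, and lock_released and mut together implement unlocking and locking. Why do it this way? Look at the comment in the code: the behavior of a pthread mutex is undefined in two cases, when a thread requests a lock it already holds and when a thread unlocks a lock held by another thread. I looked for more material on this (link) and found that a pthread mutex has a kind attribute, which may be fast, recursive, or error checking. Mutexes of different kinds behave differently under these two operations, and in that material the kinds are all marked NP, that is, Non-Portable, so they cannot be relied on everywhere. The comment also gives a design rationale (pthread mutexes are meant for serializing threads over short pieces of code), but I have not looked into that much. Next, let's look at the concrete GIL operations.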
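
How the mutex kind changes the outcome of a second lock attempt can be observed directly. The sketch below is an illustration under assumptions, not part of the Python source: it uses the standardized type names PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE rather than the older _NP spellings, and demo_relock_status and demo_mutex_kinds are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* request pthread_mutexattr_settype */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a mutex of the given type twice from the same thread and return
 * the status of the second attempt. Only pass ERRORCHECK or RECURSIVE:
 * with a normal ("fast") mutex the second lock would deadlock. */
int demo_relock_status(int type)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t mut;
	int second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, type);
	pthread_mutex_init(&mut, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&mut);
	second = pthread_mutex_lock(&mut);   /* same thread, second lock */

	if (second == 0)
		pthread_mutex_unlock(&mut);  /* undo the recursive lock */
	pthread_mutex_unlock(&mut);
	pthread_mutex_destroy(&mut);
	return second;
}

/* Compare the two kinds that can be relocked without hanging. */
int demo_mutex_kinds(void)
{
	int errcheck  = demo_relock_status(PTHREAD_MUTEX_ERRORCHECK);
	int recursive = demo_relock_status(PTHREAD_MUTEX_RECURSIVE);

	assert(errcheck == EDEADLK);  /* error checking: relock is refused */
	assert(recursive == 0);       /* recursive: relock succeeds */
	return 0;
}
```

Since the result depends on an attribute chosen when the mutex is created, a plain default mutex gives no guarantee for these cases, which is the motivation the source comment gives for the custom struct.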

PyThread_type_lock
PyThread_allocate_lock(void)
{
	pthread_lock *lock;
	int status, error = 0;

	lock = (pthread_lock *) malloc(sizeof(pthread_lock));
	if (lock) {
		memset((void *)lock, '\0', sizeof(pthread_lock));
		lock->locked = 0;

		status = pthread_mutex_init(&lock->mut,
					    pthread_mutexattr_default);

		status = pthread_cond_init(&lock->lock_released,
					   pthread_condattr_default);
	}

	return (PyThread_type_lock) lock;
}

void
PyThread_free_lock(PyThread_type_lock lock)
{
	pthread_lock *thelock = (pthread_lock *)lock;
	int status, error = 0;

	status = pthread_mutex_destroy( &thelock->mut );

	status = pthread_cond_destroy( &thelock->lock_released );

	free((void *)thelock);
}

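The code that creates and destroys the GIL is very straightforward: it simply creates and destroys the lock_released and mut members. Acquiring and releasing the lock takes a slightly more roundabout route, as follows.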


int
PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
{
	int success;
	pthread_lock *thelock = (pthread_lock *)lock;
	int status, error = 0;

	status = pthread_mutex_lock( &thelock->mut );
	success = thelock->locked == 0;

	if ( !success && waitflag ) {
		/* continue trying until we get the lock */

		/* mut must be locked by me -- part of the condition
		 * protocol */
		while ( thelock->locked ) {
			status = pthread_cond_wait(&thelock->lock_released,
						   &thelock->mut);
		}
		success = 1;
	}
	if (success) thelock->locked = 1;
	status = pthread_mutex_unlock( &thelock->mut );

	if (error) success = 0;
	return success;
}

void
PyThread_release_lock(PyThread_type_lock lock)
{
	pthread_lock *thelock = (pthread_lock *)lock;
	int status, error = 0;

	status = pthread_mutex_lock( &thelock->mut );

	thelock->locked = 0;

	status = pthread_mutex_unlock( &thelock->mut );

	/* wake up someone (anyone, if any) waiting on the lock */
	status = pthread_cond_signal( &thelock->lock_released );
}

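You can trace by hand on a piece of paper what each step returns; it is easy to work out. Here is the conclusion directly. In the two situations above, the GIL behaves as follows. When a thread tries to lock a GIL that it has already locked, it blocks (waitflag == 1) or returns failure (waitflag == 0), similar to a pthread mutex of kind fast and of kind error checking respectively. When a thread tries to unlock a GIL that it did not lock, the call returns immediately and the unlock succeeds, similar to a pthread mutex of kind fast. That completes the analysis.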
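
For readers who would rather run the experiment than trace it on paper, here is a minimal sketch that checks the two non-blocking conclusions. It uses a simplified copy of the struct and of the acquire/release logic shown above, with error handling omitted; demo_lock, demo_acquire, demo_release, demo_foreign_release and demo_lock_semantics are hypothetical names, not CPython identifiers.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <pthread.h>

/* Simplified copy of the "locked? bit + <cond, mutex> pair" lock. */
typedef struct {
	char            locked;         /* 0=unlocked, 1=locked */
	pthread_cond_t  lock_released;
	pthread_mutex_t mut;
} demo_lock;

static int demo_acquire(demo_lock *l, int waitflag)
{
	int success;

	pthread_mutex_lock(&l->mut);
	success = (l->locked == 0);
	if (!success && waitflag) {
		/* wait until whoever holds the lock clears the bit */
		while (l->locked)
			pthread_cond_wait(&l->lock_released, &l->mut);
		success = 1;
	}
	if (success)
		l->locked = 1;
	pthread_mutex_unlock(&l->mut);
	return success;
}

static void demo_release(demo_lock *l)
{
	pthread_mutex_lock(&l->mut);
	l->locked = 0;
	pthread_mutex_unlock(&l->mut);
	pthread_cond_signal(&l->lock_released);
}

/* Thread body: release a lock that this thread never acquired. */
static void *demo_foreign_release(void *arg)
{
	demo_release((demo_lock *)arg);
	return NULL;
}

/* Returns 0 when the run completed, -1 if no thread could be created. */
int demo_lock_semantics(void)
{
	demo_lock l;
	pthread_t other;
	int first, again, after;

	l.locked = 0;
	pthread_mutex_init(&l.mut, NULL);
	pthread_cond_init(&l.lock_released, NULL);

	first = demo_acquire(&l, 1);  /* free lock: acquired */
	again = demo_acquire(&l, 0);  /* same thread, non-blocking: fails */

	if (pthread_create(&other, NULL, demo_foreign_release, &l) != 0)
		return -1;
	pthread_join(other, NULL);    /* the other thread has unlocked it */

	after = demo_acquire(&l, 0);  /* lock is free again: acquired */
	demo_release(&l);

	pthread_cond_destroy(&l.lock_released);
	pthread_mutex_destroy(&l.mut);

	assert(first == 1);
	assert(again == 0);
	assert(after == 1);
	return 0;
}
```

The blocking case (waitflag == 1 on a lock the same thread already holds) is left out on purpose, because it would hang the demo. On older glibc versions the program may need to be linked with -pthread.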
