Python Concurrency Decorators (github.com/alex-sherman)
201 points by orangepenguin on May 12, 2016 | hide | past | favorite | 46 comments


Another page you might be interested in bookmarking:

https://wiki.python.org/moin/PythonDecoratorLibrary

A similar decorator to thread function calls for concurrency:

https://wiki.python.org/moin/PythonDecoratorLibrary#Lazy_Thu...


Doesn't this second decorator only work with functions doing I/O or releasing the GIL in some other way (like the time.sleep() they use in the example)? If all the function does is actual computation (i.e. using the CPU), no other code can run in parallel with it because of the GIL.


Yes, that's right, it is best used for I/O-bound applications. You could use it for:

* running multiple database queries

* SSH-ing into multiple devices to run a command

* loading multiple web pages

* calling multiple APIs
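A minimal sketch of such a thread-based decorator (a hypothetical @threaded, not the wiki's exact code); because these calls block on I/O, the threads overlap despite the GIL:

```python
import threading

def threaded(fn):
    """Hypothetical decorator: run fn in a background thread, return a handle."""
    def wrapper(*args, **kwargs):
        box = {}
        def target():
            box["value"] = fn(*args, **kwargs)
        t = threading.Thread(target=target)
        t.start()
        class Handle:
            def result(self):
                t.join()  # block until the worker thread finishes
                return box["value"]
        return Handle()
    return wrapper

@threaded
def fetch(url):
    # stand-in for an I/O-bound call such as urllib.request.urlopen(url)
    return "response from " + url

# all three "requests" start immediately; .result() joins each thread
handles = [fetch(u) for u in ["a", "b", "c"]]
results = [h.result() for h in handles]
```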


A lot of these need to be updated. For one, they don't follow PEP 8.

Secondly, unfortunately, the signal example doesn't work on Windows, and the threading one is a bad example: it simulates work with multiplication, but if you actually had a CPU-intensive task, there would be no performance benefit.



I've never heard this name for this pattern (thunk). Isn't it the same as a Future/Promise object?


"Thunk" is a term from Haskell.


So, for me, replacing imap with pool.imap is the easy part. The hard part is dealing with things like handling exceptions, catching keyboard interrupts, and so on. Does this module do anything to address these issues?


I have a hack for dealing with KeyboardInterrupts on a ProcessPoolExecutor: https://github.com/tgbugs/desc/blob/master/util/process_fixe.... I used it in concert with asyncio's run_in_executor, which helps with some of the exception handling.
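For context, the usual trick behind hacks like this is to make pool workers ignore SIGINT, so that Ctrl-C reaches only the parent, which can then shut the pool down cleanly. A generic sketch with multiprocessing.Pool (not the linked code):

```python
import signal
from multiprocessing import Pool

def init_worker():
    # Workers ignore SIGINT; only the parent sees Ctrl-C and handles cleanup.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def work(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(2, initializer=init_worker)
    try:
        print(pool.map(work, range(5)))
    except KeyboardInterrupt:
        pool.terminate()  # kill workers immediately on Ctrl-C
    else:
        pool.close()
    pool.join()
```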


That's a nice hack when you have only a few elements in the iterable, but Futures are pretty heavyweight, so this is unfortunately a lot slower than multiprocessing.Pool when you have many elements.

The other thing is that this won't work on Windows. Efficient multiprocessing is always a pain in Python, in my experience.


A couple of things offhand:

- It has Python 2 and 3 support

- It's a wrapper for the Python built-in "multiprocessing" library

- It spreads out work over all cores (so the abstraction hides the ability to control the pool)

Seems like a great way to get your feet wet with multiprocessing in Python, but it likely has limited use in production... although certain infrastructures, like resource-limited containers, might be able to accommodate it.


It's not Py3 compatible yet, but from the commit log it seems like they're working on it (we could all help?).


I would love some help on this project! I have managed to get Python 3 support working in the 0.3 release for at least a few examples, and any further help with bug reporting would be very much appreciated.


I wonder what is keeping them from Python 3 support. I'll take a look, although I'm stuck on Python 2.7 for my current project.



Does anyone know how this library addresses the Global Interpreter Lock (GIL)? Multiprocessing really isn't that great, and in many versions of Python it's worse than running on a single thread.


> the Global Interpreter Lock (GIL) issue

This applies to programs that run Python bytecode in multiple threads in the same process. Multiprocessing forks multiple processes, so there is no GIL issue.
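A quick illustration of the distinction: the same pure-Python function that threads would serialize on the GIL runs genuinely in parallel under a process pool, because each worker process has its own interpreter and its own GIL.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python computation: threads in one process would serialize
    # on the GIL here, but separate processes do not share a GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:
        # Four CPU-bound tasks, spread across worker processes in parallel
        totals = pool.map(cpu_bound, [10_000] * 4)
    print(totals)
```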


Is there no way to specify the number of cores to use?



Interesting, so when they claim it automatically scales out to all cores what they mean is it defaults to 3 unless overridden.


Should really be multiprocessing.cpu_count() / 2 or something like that.


I found out recently that some tasks do better if there are slightly more processes than cores.
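One way to express that sizing rule (a hypothetical helper; the factor is workload-dependent, with values above 1 helping when some workers block):

```python
import multiprocessing

def pool_size(oversubscribe=1.5):
    # Hypothetical helper: size the pool relative to the core count.
    # A factor > 1 can keep cores busy while some workers are blocked.
    return max(1, int(multiprocessing.cpu_count() * oversubscribe))
```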


If you check out the test file[1] in the repository for deco, it looks like you can specify the number of cores by setting the "processes" attribute of the concurrent-decorated function [object].

1. https://github.com/alex-sherman/deco/blob/master/conc_test.p...

https://github.com/alex-sherman/deco/blob/cee63391bf4c6d66ee...


You could certainly limit the process pool in the multiprocessing library itself: https://docs.python.org/2/library/multiprocessing.html ... but it doesn't seem like this decorator abstraction accounts for it.

> That's it, two lines of changes is all we need in order to parallelize this program. Now this program will make use of all the cores on the machine it's running on, allowing it to run significantly faster.

> As an overview, DECO is mainly just a smart wrapper for Python's multiprocessing.pool. When @concurrent is applied to a function it replaces it with calls to pool.apply_async. Additionally when arguments are passed to pool.apply_async, DECO replaces any index mutable objects with proxies, allowing it to detect and synchronize mutations of these objects. The results of these calls can then be obtained by calling wait() on the concurrent function, invoking a synchronization event.

I haven't really dug around the source code, but it sounds like not really.
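For reference, the apply_async/wait() pattern the quoted README describes can be sketched with plain multiprocessing (no deco, and none of its proxy-based mutation tracking):

```python
from multiprocessing import Pool

def process(item):
    return item * 2

if __name__ == "__main__":
    pool = Pool()
    # @concurrent-style: each call is dispatched via apply_async
    pending = [pool.apply_async(process, (i,)) for i in range(4)]
    # wait()-style barrier: collect every result before moving on
    results = [p.get() for p in pending]
    pool.close()
    pool.join()
    print(results)  # [0, 2, 4, 6]
```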


There is in the latest version: using it like @concurrent(...) passes all arguments directly through to Pool(...).



With Python 3.5 there is native support for concurrency via the async and await keywords.

For simple usage if you are familiar with Go there is this library: https://github.com/pothos/awaitchannel
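For comparison, the bare async/await version of concurrent I/O looks like this (asyncio.run shown here is 3.7+; on 3.5 you would use loop.run_until_complete):

```python
import asyncio

async def fetch(name):
    # Stand-in for an I/O-bound call; await yields control to the event loop.
    await asyncio.sleep(0.01)
    return "done: " + name

async def main():
    # Run both awaitables concurrently on a single thread.
    return await asyncio.gather(fetch("a"), fetch("b"))

results = asyncio.run(main())
```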


Async and concurrency are very different. Python 3.5 allows first-class asynchronous calls, but concurrency is still "hard".


Of note is that the version of deco on PyPI[1] (0.2) is incompatible with Python 3.

There have been a few commits to fix compatibility, but it's not there yet.

1. https://pypi.python.org/pypi/deco


"Synchronized" is perhaps a strange choice of word: coming from Java, it typically implies a critical section. Here it seems to initialize a multiprocessing pool for use in the function labelled concurrent (perhaps)?


I think it's like sync in Cilk, meaning that all concurrent jobs must have finished before the part annotated synchronized is left. So it's not without precedent.


"We have proposed DECO, a simplification of concurrent programming techniques targeted at programmers with little understanding of concurrent programming." (from paper)

gave me the chills



this is such a good idea! the @synchronized decorator to collect the parallelized task at the end of a parent call is very very smart & simple.


It also has the potential to slow down your code if used willy-nilly, of course.

Multiprocessing gets pretty useless for anything outside of independent CPU-bound tasks with little IPC and simple data types that can be stuck into shared memory.

If you're using multiprocessing pools so often that you think you need a decorator to clean up your code, then wow, I'd like to see what you're up to. ;)


> If you're using multiprocessing pools so often that you think you need a decorator to clean up your code, then wow, I'd like to see what you're up to. ;)

... and I think once you're doing that in Python you should probably use NumPy.

I've been thinking that there should be a way to just program kernels in modern Fortran, because it's the easiest way to interact with NumPy data structures (much of the numeric heavy lifting under the NumPy/SciPy stack is very efficient Fortran code). f2py [1] basically does that, but I've never had the chance to set up a project like this.

[1] http://docs.scipy.org/doc/numpy-1.10.1/user/c-info.python-as...
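To illustrate the NumPy point above: a vectorized operation pushes the loop into compiled code, which often beats parallelizing the Python-level loop outright. A toy comparison:

```python
import numpy as np

def slow_norms(rows):
    # Python-level loop: interpreter overhead on every element
    return [sum(x * x for x in row) ** 0.5 for row in rows]

def fast_norms(arr):
    # Vectorized: the whole loop runs in compiled code, no processes needed
    return np.sqrt((arr ** 2).sum(axis=1))

data = np.array([[3.0, 4.0], [6.0, 8.0]])
```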


you guessed right that this isn't directly useful to my life. all my parallelism is already taken care of by a framework.

that said I really like the idea of using inner and outer function calls as a hook for spawning and collecting promises. it doesn't only have to be a join on a CPU-bound worker pool; this feels like a cleaner way to abstract IO waits than the yield statements I saw in an early prototype of tulip.

In general, this feels like a clean way to compose any library logic that involves an event loop or execution plan rather than just a function call.


The source looks surprising; for instance, the decorators parse the decorated function's code and build an AST (I still have to find out why).


My best guess is that this is to circumvent issues with naming encountered by using decorated functions in conjunction with multiprocessing pools. Typically you would run into serialization errors.


It turns out that the decorator rewrites the function's code; that's at least how the synchronized decorator does its magic. Once I saw that in the source code (and a few other hacks) I was sure I was never going to use this lib :)


This reminds me of AppEngine's ndb tasklets.


I've been looking for a way to replace functools.partial and pool.map with something that could cause me to make bad architectural decisions. This could be the ticket.


Haha, having mucked around with similar mechanisms for dealing with concurrency via decorators I have to agree with you that this is likely to cause hard to debug behavior (especially since direct modification of the AST is going on behind the scenes here). That being said, it's an interesting thought experiment and likely an excellent class project.


Let me sum up my feelings towards this: neat!


Then I can easily parallelize existing code without touching anything?


I love developing in Flask because of the way its decorators are written.



