Python process pool map

Содержание

Multiprocessing Pool.map() in Python
Need a Parallel Version of map()
How to Use Pool.map()

Multiprocessing Pool.map() in Python

You can apply a function to each item in an iterable in parallel using the Pool map() method.

In this tutorial you will discover how to use a parallel version of map() with the process pool in Python.

Need a Parallel Version of map()

The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.

A process pool can be configured when it is created, which will prepare the child workers.

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

— multiprocessing — Process-based parallelism

The built-in map() function allows you to apply a function to each item in an iterable.

The Python process pool provides a parallel version of the map() function.

How can we use the parallel version of map() with the process pool?

Run your loops using all CPUs, download my FREE book to learn how.

How to Use Pool.map()

The process pool provides a parallel map function via Pool.map().

Recall that the built-in map() function will apply a given function to each item in a given iterable.

Return an iterator that applies function to every item of iterable, yielding the results.

— Built-in Functions

It yields one result returned from the given target function called with one item from a given iterable. It is common to call map and iterate the results in a for-loop.

The multiprocessing.pool.Pool process pool provides a version of the map() function where the target function is called for each item in the provided iterable in parallel.

A parallel equivalent of the map() built-in function […]. It blocks until the result is ready.

— multiprocessing — Process-based parallelism

Each item in the iterable is taken as a separate task in the process pool.

Like the built-in map() function, the returned iterator of results will be in the order of the provided iterable. This means that tasks are issued (and perhaps executed) in the same order as the results are returned.

Unlike the built-in map() function, the Pool.map() function only takes one iterable as an argument. This means that the target function executed in the process can only take a single argument.

The iterable of items that is passed is iterated in order to issue all tasks to the process pool. Therefore, if the iterable is very long, it may result in many tasks waiting in memory to execute, e.g. one per worker process.

It is possible to split up the items in the iterable evenly to worker processes.

For example, if we had a process pool with 4 child worker processes and an iterable with 40 items, we can split up the items into 4 chunks of 10 items, with one chunk allocated to each worker process.

The effect is less overhead in transmitting tasks to worker processes and collecting results.

This can be achieved via the “chunksize” argument to map().

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.

— multiprocessing — Process-based parallelism

Источник