Python thread or multiprocessing

Demystifying Python Multiprocessing and Multithreading

Personally, I've tried to understand multiprocessing and multithreading several times over the years but never fully grasped these concepts. To do so, we need to understand several important terms and also something specific to Python (more precisely, to its reference implementation, CPython): the Global Interpreter Lock (GIL).

If you already know these terms, feel free to skip to the next section.

Terminology

Core: A physical processing unit inside the CPU. This term refers to a hardware component. A single core works on one task at a time; multi-core processors can perform multiple tasks at once.

Thread: The virtual (logical) component that manages the tasks. Each CPU core can have up to two threads if your CPU has multi/hyper-threading enabled. You can look up your CPU model to find out more. Mac users can check under About > System Report. For example, my 6-core i7 processor has 6 cores and can have up to 12 threads.

We can use htop to confirm the number of threads my machine has, numbered from 0 to 11.
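The same count can also be read from Python itself. This is a minimal sketch using only the standard library; the variable name logical_cpus is just an illustrative choice. The number reported is the count of logical CPUs (hardware threads), not physical cores.

```python
import os

# Number of logical CPUs (hardware threads) visible to the operating system.
# os.cpu_count() returns None if the count cannot be determined.
logical_cpus = os.cpu_count()
print(f"Logical CPUs: {logical_cpus}")
```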

Process: An instance of a computer program that is being executed by one or many threads. Depending on the operating system, a process may be made up of multiple threads of execution that execute instructions concurrently [1][2].

Multithreading: The ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system [3].

Multiprocessing: The use of two or more CPUs within a single computer system [4][5]. The term also refers to the ability of a system to support more…

Python Performance Showdown: Threading vs. Multiprocessing

Python Threading vs Multiprocessing

Python is a prevalent language for writing concurrent and parallel applications. In this article, we will look at the differences between Python threading vs. multiprocessing. We will focus on how both of these methods can be used to improve concurrency in your applications. We will also look at some of the differences between them and how they can be used together to create better solutions.

Python offers two different mechanisms for concurrency: threading and multiprocessing. Both are implemented as modules in the Python standard library, so they need no separate installation and are easy to use.

Threads, Threading, Multiprocessing

Threads

A thread is a separate flow of execution within a process; a program can use several threads to work on multiple tasks concurrently. This is useful when a long-running task, such as reading a file or fetching data from an API, should not block the rest of the program.

The most common way to create threads in Python is the Thread class from the threading module. Its two most frequently used methods are start() and join(). The start() method starts the new thread, while join() blocks the calling thread until the thread it was called on has finished.
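As an illustration of start() and join(), here is a minimal sketch. The function task, its argument, and the results list are hypothetical names chosen for this example.

```python
import threading

results = []

def task(name):
    # Runs in the new thread and records a message in the shared list.
    results.append(f"hello from {name}")

thread = threading.Thread(target=task, args=("worker",))
thread.start()  # begin running task() in a separate thread
thread.join()   # block the main thread until task() has finished

print(results)  # ['hello from worker']
```

Because join() returns only after the thread has finished, the list is guaranteed to contain the message by the time it is printed.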

More about Threads/ Threading

Threads are lightweight units of execution that run inside a single process on one machine. They suit programs made up of many short tasks that spend most of their time waiting on input/output. Threads have much lower overhead than processes: threads in the same process share its memory, so a new thread needs little more than its own stack.

However, they have several drawbacks:
– Threads are difficult to debug because they all share the same state; when one fails or corrupts shared data, it is hard to figure out why.
– Access to shared data must be coordinated so that threads do not interfere with each other; uncoordinated access leads to bugs known as "race conditions".

Threading is a mechanism for creating independent threads within a process. Each thread has its own call stack but shares the process's memory, open files, and other resources. If threads need to modify shared resources such as files or database connections, they should do so under locks that let only one thread use the resource at a time. This makes multithreaded programs harder to write, because every piece of shared mutable state needs locking around it.
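To make the locking idea concrete, here is a minimal sketch in which four threads update one shared counter. The names counter, counter_lock, and increment are illustrative assumptions, not part of any library.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # The read-modify-write on counter is the critical section,
        # so only one thread at a time may execute it.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, two threads could read the same old value and one of the increments could be lost, which is exactly the kind of race condition described above.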

from threading import Thread  # the threading module provides the Thread class

thread = Thread(target=task)  # task is the function the thread will run
thread.start()  # start the new thread

However, only one thread can execute Python code at any given moment. This is due to the Global Interpreter Lock (GIL). (The limitation is lifted in a few cases, for example while a thread is blocked waiting on I/O.) For multiple I/O-bound tasks, threading therefore still works well. In Python, threading provides concurrency rather than true parallelism.
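The following sketch shows why threads still help with I/O-bound work. It uses time.sleep as a stand-in for a blocking network or disk call (an assumption made purely for illustration); like real blocking I/O, sleeping releases the GIL so the other threads can proceed.

```python
import threading
import time

def fake_io(seconds):
    # time.sleep releases the GIL, much like a blocking I/O call would.
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four waits overlap, so the total is close to one wait, not four.
print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
```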

Multiprocessing

The multiprocessing module allows you to run multiple processes on your computer at once, each with its own memory space and access to shared resources such as files or databases. Instead of having all tasks compete for the same CPU core, which would slow things down, the operating system can schedule the processes on different cores so that they run in parallel. This allows you to do more work in less time!

It offers an API modeled on the threading API but uses processes rather than threads to do the work.

A running Python program is a process. Every process has at least one thread, the main thread, which executes its code.

The class used here is multiprocessing.Process. This example demonstrates how you can create a process in Python.

from multiprocessing import Process

process = Process(target=task)  # task is the function the process will run
process.start()  # start the child process, which runs the target function
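A fuller, runnable version might look like the sketch below. The function task is a hypothetical example; the `if __name__ == "__main__":` guard is needed on platforms that start child processes by importing the main module (the "spawn" start method).

```python
import multiprocessing
import os

def task():
    # Runs in the child process, which has its own PID and memory space.
    print(f"child process PID: {os.getpid()}")

if __name__ == "__main__":
    process = multiprocessing.Process(target=task)
    process.start()  # launch the child process, which calls task()
    process.join()   # wait for the child process to exit
    print(f"parent PID: {os.getpid()}, child exit code: {process.exitcode}")
```

The two PIDs printed differ, which shows that the target function really ran in a separate process.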

Some other functionalities of multiprocessing are:

  • multiprocessing.Manager API
  • multiprocessing.Value
  • multiprocessing.Array
  • multiprocessing.Pipe
  • multiprocessing.connection.Connection

Similarities between Threading and Multiprocessing

  • They enable concurrency.
  • Their APIs closely match.
  • Their concurrency primitives are similar too.

The start() method works the same way in both: it starts a new thread or process. The design of the threading module is loosely based on Java's threading model, and multiprocessing was written to mirror the threading API.

Differences

Multiprocessing is similar to threading, but because processes do not share memory, it adds tools that threading does not need:

– It allows for communication between multiple processes

– It allows for sharing of data between multiple processes

The two also differ in several ways.

  • Threading works with threads, whereas multiprocessing works with operating-system processes. Note that a thread is part of a process: every process contains at least one thread.
  • Threads in the same process share memory, so they can share any object directly. Processes do not share memory, so sharing between them is limited and follows stricter rules.
    • With multiprocessing, tools such as multiprocessing.Pipe and multiprocessing.Queue enable sharing.
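As a sketch of sharing through a queue, the example below sends results from a child process back to the parent. The helper names square_worker and collect, and the use of None as an end-of-data marker, are assumptions made for this illustration.

```python
import multiprocessing

def square_worker(numbers, queue):
    # Runs in the child process; results travel back through the queue.
    for n in numbers:
        queue.put(n * n)
    queue.put(None)  # sentinel: no more results

def collect(queue):
    # Read results until the sentinel arrives.
    results = []
    while True:
        item = queue.get(timeout=10)
        if item is None:
            return results
        results.append(item)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=square_worker, args=([1, 2, 3], queue))
    worker.start()
    squares = collect(queue)
    worker.join()
    print(squares)  # [1, 4, 9]
```

Objects placed on the queue are pickled and copied between processes, which is one of the sharing rules that threads do not have to follow.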

Here is a summary:

Multiprocessing                          Multithreading
Suits CPU-bound tasks                    Suits I/O-bound tasks
Built on processes                       Built on threads
Supports parallelism across cores        Does not run Python code in parallel (GIL)
Processes are heavyweight                Threads are lightweight
Processes are slower to start            Threads are quick to start
Sharing is limited                       Any object can be shared

Performance comparison

Switching between threads is faster than switching between processes, and starting a thread is faster than starting a process.

However, your choice should depend on the type of task you need to perform. As mentioned above, CPU-bound tasks run well with the multiprocessing module, whereas multithreading works well for I/O-bound tasks.

CPU-bound tasks are computation-heavy, and multiprocessing lets them run on several cores at once.
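A CPU-bound workload can be spread over several processes with multiprocessing.Pool, as in this minimal sketch. The function sum_of_squares and the input sizes are hypothetical; actual speed-ups depend on the machine and the amount of work per task.

```python
import multiprocessing

def sum_of_squares(n):
    # Pure-Python arithmetic: CPU-bound work that threads could not
    # run in parallel because of the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000, 200_000, 300_000, 400_000]
    # Pool() starts one worker process per logical CPU by default.
    with multiprocessing.Pool() as pool:
        totals = pool.map(sum_of_squares, inputs)
    print(totals)
```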

Multiprocessing vs. Multithreading vs. Async IO in Python

Python asyncio library

Async IO is another way to run tasks concurrently in Python. The asyncio library runs coroutines on an event loop, normally in a single thread: when one coroutine waits on I/O, the loop switches to another, so one task does not have to finish before the next can make progress.
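Here is a minimal sketch of concurrent coroutines. The coroutine fetch is a hypothetical stand-in for a network request, with asyncio.sleep simulating the wait.

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a slow network call; while this
    # coroutine waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather() runs the coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1), fetch("c", 0.3))

results = asyncio.run(main())
print(results)  # ['a done', 'b done', 'c done']
```

The three waits overlap, so the whole run takes roughly as long as the longest delay rather than the sum of all three.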

The Right Choice

If the workload is CPU-bound, opt for multiprocessing.

If it is I/O-bound with fast I/O and a limited number of connections, multithreading will be the best option.

Lastly, if it is I/O-bound with slow I/O and many connections, go for the asyncio library.

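Multiprocessing vs. Multithreading vs. concurrent.futures in Python

The concurrent.futures API provides a simpler, higher-level way to use threads and processes through ThreadPoolExecutor and ProcessPoolExecutor, which share the same interface. It reduces the amount of code you need to write.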
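As a small sketch of the executor interface, the example below maps a function over a list using a thread pool. The function word_length and the sample words are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def word_length(word):
    return len(word)

words = ["thread", "process", "asyncio"]

# executor.map runs word_length in worker threads and yields the
# results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as executor:
    lengths = list(executor.map(word_length, words))

print(lengths)  # [6, 7, 7]
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor would run the same code in processes instead; in that case the executor code should sit under an `if __name__ == "__main__":` guard.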

Threading Lock and Multiprocessing Lock

Both locks protect critical sections, the parts of the code that touch shared state. A threading.Lock can be held by only one thread at a time; any other thread that tries to acquire it waits until it is released.

A multiprocessing.Lock is the equivalent mutual exclusion lock for processes. A process acquires the lock before it enters the critical section and releases it once it is done.

Note: Use the threading synchronization primitives for threads and the multiprocessing ones for processes.
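The sketch below shows a multiprocessing.Lock guarding a shared counter stored in a multiprocessing.Value. The function add_many and the counts used are assumptions for illustration.

```python
import multiprocessing

def add_many(shared_total, lock, times):
    for _ in range(times):
        # Acquire the lock before the critical section; the with
        # statement releases it afterwards, even on error.
        with lock:
            shared_total.value += 1

if __name__ == "__main__":
    total = multiprocessing.Value("i", 0)  # an integer in shared memory
    lock = multiprocessing.Lock()
    workers = [
        multiprocessing.Process(target=add_many, args=(total, lock, 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total.value)  # 4000
```

The pattern matches the threading version; only the lock and the shared value come from multiprocessing instead of threading.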

    FAQs on Python Threading vs Multiprocessing

    Multiprocessing is considered to be a better option here.

The GIL allows only one thread to execute Python bytecode at any point in time.

Conclusion

We explained how Python threading works, followed by how multiprocessing works. Finally, we discussed the ways in which they differ, i.e., Python threading vs. multiprocessing.
