
Multiprocessing in Python (Part 1)

A lot of you have probably been in a situation where you need to carry out multiple tasks, or repeat the same action on multiple items, like doing your homework or something as mundane as doing your laundry. It's so much easier when we can do multiple things at the same time, like having multiple washing machines for our laundry, or five people doing our homework.

The same principle applies to computing. There are times when we have lots of data and we would like to perform the same action on every item. The problem is that the action takes time, and we have a lot of data, so running it sequentially slows our program down.

import time

def our_function():
    print("Processing stuff...")
    time.sleep(5)
    print("Done")

def normal_linear_method():
    our_function()
    our_function()
    our_function()

normal_linear_method()
# Time taken: about 15 seconds

Let's assume it takes exactly 5 seconds to run the function on one unit of data. If we have 100 units of data to process, it's going to take us 500 seconds, which is over 8 minutes of our time. What if I told you there was a way to bring that 8 minutes back down to our unit time of 5 seconds?

Multithreading in Python

The first technique we will use to solve our problem is called multithreading. Multithreading works by constantly switching context (basically, the state of the task being worked on at the moment), so that an illusion of parallel processing is achieved. This concept is also known as concurrency. In CPython, the Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time, which is why threads give a speedup mainly when tasks spend their time waiting.

# Example of task speed-up using multithreading
from threading import Thread
import time

def using_multithreading():
    # Our threads (our_function as defined above)
    t1 = Thread(target=our_function)
    t2 = Thread(target=our_function)
    t3 = Thread(target=our_function)

    # Starting our threads
    t1.start()
    t2.start()
    t3.start()

    # We join the threads so our main thread
    # can wait for them to complete before terminating
    t1.join()
    t2.join()
    t3.join()

using_multithreading()
# Time taken: about 5 seconds

Multiprocessing in Python

The second technique we will use to solve our problem is multiprocessing. While multithreading in Python relies on context switching, multiprocessing runs each task as a separate process, truly in parallel. Each process has its own copy of the entire program's memory and can run on its own CPU core.

# Example of task speed-up using multiprocessing
import time
from multiprocessing import Process

def using_multiprocessing():
    # Our processes (our_function as defined above)
    p1 = Process(target=our_function)
    p2 = Process(target=our_function)

    # Starting our processes
    p1.start()
    p2.start()

    # Wait for both processes to finish
    p1.join()
    p2.join()

if __name__ == '__main__':
    start = time.perf_counter()
    using_multiprocessing()
    stop = time.perf_counter()
    print("Time taken {}".format(stop - start))
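To see that separate-memory point in action, here is a minimal sketch (the counter variable is illustrative, not from the original article): the child process increments its own copy of a global, and the parent's copy stays untouched.

from multiprocessing import Process

counter = 0

def increment():
    global counter
    counter += 1
    print("In the child process:", counter)    # prints 1

if __name__ == '__main__':
    p = Process(target=increment)
    p.start()
    p.join()
    print("In the parent process:", counter)   # still prints 0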

Multiprocessing vs Multithreading: Parallelism vs Concurrency

Both multiprocessing and multithreading come in handy. The question is when we should use which; the sketch after the list below contrasts the two.

  • We use multithreading for IO-bound operations, like reading data from a file or polling a server for data.
  • We use multiprocessing for CPU-bound operations, like image processing, training a machine learning model, big data processing, etc.
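To make the split concrete, here is a minimal sketch contrasting the two; fetch, crunch, and the inputs are illustrative stand-ins, not from the original article.

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch(url):
    # IO-bound: the thread mostly waits, so threads overlap well
    time.sleep(1)              # stand-in for a network request
    return url

def crunch(n):
    # CPU-bound: pure computation, so separate processes help
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ThreadPoolExecutor() as executor:
        pages = list(executor.map(fetch, ["a.com", "b.com", "c.com"]))
    with ProcessPoolExecutor() as executor:
        totals = list(executor.map(crunch, [10**6] * 4))
    print(pages, totals)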

Running multiple processes at once

There are times when we want to run a function on a sequence of data. Say we have a list of 100 units of data, and we would like to apply our function to all of them in parallel or concurrently. There are different approaches we can take:

Approach 1: Iteratively create processes and start them

In this approach, we've used a loop to create a process for each unit of data and start them all. The problem with this approach is that we can't easily get the output of the processes back into the main process; one workaround using a shared queue is sketched after the code.

import time
from multiprocessing import Process

# Assumed definitions (not shown in the original):
# a 5-second operation and 100 units of data
def operation(x):
    time.sleep(5)
    return x

data = list(range(100))

def multiple_processes():
    # Spawn our processes iteratively
    processes = [
        Process(target=operation, args=(x,))
        for x in data
    ]
    for process in processes:
        # Iteratively start all processes
        process.start()
    for process in processes:
        process.join()

if __name__ == '__main__':
    start = time.perf_counter()
    multiple_processes()
    stop = time.perf_counter()
    print("Time taken {}".format(stop - start))
    # Time taken: about 8 seconds
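For completeness, here is one way to get results back from raw Process objects, using a multiprocessing.Queue. This is a minimal sketch, not from the original article; operation_q and the doubling are illustrative.

import time
from multiprocessing import Process, Queue

def operation_q(x, q):
    time.sleep(1)              # stand-in for real work
    q.put(x * 2)               # send the result to the parent

if __name__ == '__main__':
    q = Queue()
    processes = [Process(target=operation_q, args=(x, q)) for x in range(5)]
    for p in processes:
        p.start()
    # Drain the queue before joining to avoid blocking on full pipes
    results = [q.get() for _ in processes]
    for p in processes:
        p.join()
    print(results)             # note: completion order, not input order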

Approach 2: The ProcessPoolExecutor

In this approach, we've used something called a pool, which is an easier and neater way to manage our computing resources. By default a pool only runs as many worker processes at a time as you have CPU cores, so with 100 five-second tasks it is slower than spawning all the processes at once, but it's far neater and lets us use the output of those processes in our main process.

# Using multiprocessing with ProcessPoolExecutor
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

# operation and data as defined in Approach 1

def multiple_processes_pooling():
    with ProcessPoolExecutor() as executor:
        process_futures = [
            executor.submit(operation, x)
            for x in data
        ]
        results = [
            p.result()
            for p in as_completed(process_futures)
        ]
        print(results)

if __name__ == '__main__':
    start = time.perf_counter()
    multiple_processes_pooling()
    stop = time.perf_counter()
    print("Time taken {}".format(stop - start))
    # Time taken: about 50 seconds

Approach 3: ProcessPoolExecutor().map

In this approach, instead of submitting tasks to our pool executor one by one, we've used the executor.map method to submit the whole list of data at once. It returns the results of all completed tasks, in the same order as the input (unlike as_completed, which yields them as they finish).

# Using executor.map
import time
from concurrent.futures import ProcessPoolExecutor

# operation and data as defined in Approach 1

def pooling_map():
    with ProcessPoolExecutor() as executor:
        results = executor.map(operation, data)
        print([res for res in results])

if __name__ == '__main__':
    start = time.perf_counter()
    pooling_map()
    stop = time.perf_counter()
    print("Time taken {}".format(stop - start))
    # Time taken: about 50 seconds
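A small follow-up, not from the original article: ProcessPoolExecutor also accepts a max_workers argument if you want to cap how many worker processes run at once (the function double and the value 4 below are illustrative).

import time
from concurrent.futures import ProcessPoolExecutor

def double(x):
    time.sleep(1)      # stand-in for real work
    return x * 2

if __name__ == '__main__':
    # At most 4 worker processes run at any one time
    with ProcessPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(double, range(8))))
    # Expect roughly 2 seconds: two waves of 4 one-second tasks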

Very important to remember

If you look at the time output, you'd notice that the time taken isn't exactly the unit time. There are four main factors that affect this (you can measure the overhead yourself with the sketch after this list):

  • The computer in use affects the timing, as do other programs running on your PC. The code was tested on an Intel Core i5 (7th generation) machine.

  • It takes a little time (typically milliseconds) for our program to properly set up our processes and start them.

  • When there are more processes than CPU cores, the operating system queues the pending processes and schedules them for us.

  • And finally, it takes a little time for our program to properly shut the processes down.
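If you're curious, you can measure that setup and teardown cost yourself by timing a process whose target does nothing. A minimal sketch (noop is an illustrative name, not from the original article):

import time
from multiprocessing import Process

def noop():
    pass

if __name__ == '__main__':
    start = time.perf_counter()
    p = Process(target=noop)    # create
    p.start()                   # spawn the child process
    p.join()                    # wait for it to exit
    print("Overhead: {:.4f}s".format(time.perf_counter() - start))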

That being said, it's important to note that we should only reach for multiprocessing when there's a lot of data and the operation takes a long time to complete.

Conclusion

  • Multiprocessing and Multithreading help us to speed up our programs.

  • Multiprocessing is most suited for CPU-bound operations, like machine learning and data processing.

  • Multithreading is most suited for IO-bound operations, like communicating with servers, or the file system.

  • Multiprocessing is not a magic wand; don't use it unless you have to, or it could actually slow down your code.


Original Link: https://dev.to/tecnosam/multiprocessing-in-python-3g8p