
How to do parallel programming in Python

big-blog 2020. 7. 8. 07:22



For C++, we can use OpenMP to do parallel programming; however, OpenMP will not work for Python. What should I do if I want to parallelize some parts of my Python program?

The structure of the code may be considered as:

 solve1(A)
 solve2(B)

where solve1 and solve2 are two independent functions. How can I run this kind of code in parallel rather than sequentially, in order to reduce the running time? I hope someone can help me; thanks in advance. The code is as follows:

def solve(Q, G, n):
    i = 0
    tol = 10 ** -4

    while i < 1000:
        inneropt, partition, x = setinner(Q, G, n)
        outeropt = setouter(Q, G, n)

        if (outeropt - inneropt) / (1 + abs(outeropt) + abs(inneropt)) < tol:
            break

        node1 = partition[0]
        node2 = partition[1]

        G = updateGraph(G, node1, node2)

        if i == 999:
            print("Maximum iteration reached")
        i += 1
    print(inneropt)

setinner and setouter are two independent functions. That is where I want to parallelize...


You can use the multiprocessing module. For this case I might use a processing pool:

from multiprocessing import Pool
pool = Pool()
result1 = pool.apply_async(solve1, [A])    # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B])    # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)

This will spawn processes that can do generic work for you. Because we did not pass the processes argument, one process is spawned for each CPU core on your machine, and each core can execute one process simultaneously.

If you want to map a list to a single function you would do this:

args = [A, B]
results = pool.map(solve1, args)
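
If each call needs several arguments, pool.starmap (available since Python 3.3) unpacks tuples of arguments. A minimal sketch, assuming a hypothetical two-argument variant of solve1:

args = [(A, 1), (B, 2)]               # hypothetical argument tuples
results = pool.starmap(solve1, args)  # calls solve1(A, 1) and solve1(B, 2)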

Do not use threads, because the GIL locks any operations on Python objects.
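
Applied to the question's loop, a minimal sketch of the same pool idea (assuming setinner, setouter, and updateGraph are picklable, module-level functions; note that Q and G are re-sent to the worker processes on every iteration):

from multiprocessing import Pool

def solve(Q, G, n):
    tol = 10 ** -4
    with Pool(processes=2) as pool:
        for i in range(1000):
            # Submit both independent calls at once, then wait for both.
            inner = pool.apply_async(setinner, (Q, G, n))
            outer = pool.apply_async(setouter, (Q, G, n))
            inneropt, partition, x = inner.get()
            outeropt = outer.get()
            if (outeropt - inneropt) / (1 + abs(outeropt) + abs(inneropt)) < tol:
                break
            G = updateGraph(G, partition[0], partition[1])
    print(inneropt)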


This can be done very elegantly with Ray.

To parallelize the example, you would define the functions with the @ray.remote decorator and then invoke them with .remote:

import ray

ray.init()

# Define the functions.

@ray.remote
def solve1(a):
    return 1

@ray.remote
def solve2(b):
    return 2

# Start two tasks in the background.
x_id = solve1.remote(0)
y_id = solve2.remote(1)

# Block until the tasks are done and get the results.
x, y = ray.get([x_id, y_id])

This has a number of advantages over the multiprocessing module:

  1. The same code runs on a multicore machine as well as a cluster of machines.
  2. Processes share data efficiently through shared memory and zero-copy serialization.
  3. Error messages are propagated nicely.
  4. These function calls can be composed together, e.g.,

    @ray.remote
    def f(x):
        return x + 1
    
    x_id = f.remote(1)
    y_id = f.remote(x_id)
    z_id = f.remote(y_id)
    ray.get(z_id)  # returns 4
    
  5. In addition to invoking functions remotely, classes can be instantiated remotely as actors; see the sketch after this list.
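
A minimal actor sketch, using a hypothetical Counter class and assuming ray.init() has already been called as above:

@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        # State persists inside the actor's process between calls.
        self.value += 1
        return self.value

counter = Counter.remote()                             # start the actor in its own process
ids = [counter.increment.remote() for _ in range(3)]   # queue three method calls
print(ray.get(ids))                                    # [1, 2, 3]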

Note that Ray is a framework I've been helping develop.


CPython uses the Global Interpreter Lock, which makes parallel programming a bit more interesting than in C++.

This topic has several useful examples and descriptions of the challenge:

Python Global Interpreter Lock (GIL) workaround on multi-core systems using taskset on Linux?
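
As a quick illustration of the difference, a small benchmark sketch (timings are machine-dependent) comparing threads and processes on a CPU-bound loop:

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # A pure-Python loop holds the GIL for its entire duration.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [5_000_000] * 4
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(cpu_bound, work))
        # Threads finish in roughly serial time because of the GIL;
        # processes can use all four cores.
        print(executor_cls.__name__, time.perf_counter() - start)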


The solution, as others have said, is to use multiple processes. Which framework is more appropriate, however, depends on many factors. In addition to the ones already mentioned, there is also charm4py and mpi4py (I am the developer of charm4py).

There is a more efficient way to implement the above example than using the worker pool abstraction. The main loop sends the same parameters (including the complete graph G) over and over to workers in each of the 1000 iterations. Since at least one worker will reside on a different process, this involves copying and sending the arguments to the other process(es). This could be very costly depending on the size of the objects. Instead, it makes sense to have workers store state and simply send the updated information.

For example, in charm4py this can be done like this:

class Worker(Chare):

    def __init__(self, Q, G, n):
        self.G = G
        ...

    def setinner(self, node1, node2):
        self.updateGraph(node1, node2)
        ...


def solve(Q, G, n):
    # create 2 workers, each on a different process, passing the initial state
    worker_a = Chare(Worker, onPE=0, args=[Q, G, n])
    worker_b = Chare(Worker, onPE=1, args=[Q, G, n])
    for i in range(1000):
        result_a = worker_a.setinner(node1, node2, ret=True)  # execute setinner on worker A
        result_b = worker_b.setouter(node1, node2, ret=True)  # execute setouter on worker B

        inneropt, partition, x = result_a.get()  # wait for result from worker A
        outeropt = result_b.get()  # wait for result from worker B
        ...

Note that for this example we really only need one worker. The main loop could execute one of the functions, and have the worker execute the other. But my code helps to illustrate a couple of things:

  1. Worker A runs in process 0 (same as the main loop). While result_a.get() is blocked waiting on the result, worker A does the computation in the same process.
  2. Arguments are automatically passed by reference to worker A, since it is in the same process (there is no copying involved).

In some cases, it's possible to automatically parallelize loops using Numba, though it only works with a small subset of Python:

from numba import njit, prange

@njit(parallel=True)
def prange_test(A):
    s = 0
    # Without "parallel=True" in the jit-decorator
    # the prange statement is equivalent to range
    for i in prange(A.shape[0]):
        s += A[i]
    return s
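
For example, calling it on a NumPy array (the first call includes one-time JIT compilation overhead):

import numpy as np

A = np.arange(10_000_000, dtype=np.float64)
print(prange_test(A))  # the summation loop is split across CPU cores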

Unfortunately, it seems that Numba only works with NumPy arrays, but not with other Python objects. In theory, it might also be possible to compile Python to C++ and then automatically parallelize it using the Intel C++ compiler, though I haven't tried this yet.

Source: https://stackoverflow.com/questions/20548628/how-to-do-parallel-programming-in-python
