# More MPI

Using what we've learned so far, we can now sum up an array in parallel.

suma.py:

```python
import mpi
a = range(10)

# divide up the work: each rank gets n elements,
# and the last rank also takes any leftover elements
n = len(a)/mpi.size
ilo = mpi.rank*n
ihi = (mpi.rank+1)*n-1
if mpi.rank+1 == mpi.size:
    ihi = len(a)-1

# sum one piece of the array
s = 0
for i in range(ilo,ihi+1):
    s += a[i]

# call allreduce and print
s = mpi.allreduce(s,mpi.SUM)
if mpi.rank == 0:
    print 'sum=',s
```

```
$ mpiexec -np 4 python ./suma.py
sum= 45
```
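The work division in suma.py is plain arithmetic, so you can check it without MPI. The sketch below is a serial stand-in that does not import `mpi`. It applies the same `ilo`/`ihi` rule for every rank in turn, with the helper name `piece_bounds` chosen only for illustration. It then adds up the per-rank partial sums, which is the combining step that `allreduce` with `mpi.SUM` performs across the processes.

```python
def piece_bounds(length, size, rank):
    """Return (ilo, ihi), both inclusive, for one rank using suma.py's rule:
    every rank gets length // size elements, and the last rank also
    takes whatever is left over."""
    n = length // size
    ilo = rank * n
    ihi = (rank + 1) * n - 1
    if rank + 1 == size:
        ihi = length - 1
    return ilo, ihi

a = list(range(10))
size = 4          # pretend we ran with mpiexec -np 4
partial = []
for rank in range(size):
    ilo, ihi = piece_bounds(len(a), size, rank)
    partial.append(sum(a[ilo:ihi + 1]))
    print("rank %d sums a[%d..%d] -> %d" % (rank, ilo, ihi, partial[-1]))

# allreduce with SUM combines the partial sums; every rank receives the total
print("sum= %d" % sum(partial))
```

Running it prints the partial sums 1, 5, 9 and 30, followed by `sum= 45`. Note that with 10 elements and 4 ranks, the last rank handles 4 elements while each of the others handles 2.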

Besides allreduce, you can send a message directly from one process to another using send and recv. The send function takes the data and a destination rank (here, process 1). The recv function takes the rank of the source it will receive from (here, process 0). Note that recv returns two values: the data and a variable named 'rc'. We'll talk more about rc in a moment.

send.py:

```python
import random
import mpi

if mpi.rank==0:
    n = random.randint(1,100)
    print 'sending',n
    mpi.send(n,1)
elif mpi.rank==1:
    n,rc = mpi.recv(0)
    print 'received',n
```

```
$ mpiexec -np 2 python ./send.py
sending 71
received 71
```

Problem: Note that we print on both rank 0 and rank 1. Could this produce overlapping output? Why, or why not?

If you don't care which process the data comes from, you can say so by passing mpi.ANY_SOURCE as the source. Afterwards, you can use the rc variable to find out which process actually sent it.

send2.py:

```python
import random
import mpi

if mpi.rank==0:
    n = random.randint(1,100)
    print 'sending',n
    mpi.send(n,1)
elif mpi.rank==1:
    n,rc = mpi.recv(mpi.ANY_SOURCE)
    print 'received',n,'source=',rc.source
```

```
$ mpiexec -np 2 python ./send2.py
sending 1
received 1 source= 0
```

In the example above, the variable 'rc.source' contains the rank of the mpi process that sent the message.
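The following toy, in-process model makes the semantics of send, recv and ANY_SOURCE concrete without an MPI installation, using threads in place of ranks. Everything in it is a hypothetical stand-in and not pyMPI's API, including `Mailboxes`, `Status` and this local `ANY_SOURCE`. Unlike real MPI, its send only buffers the message and never blocks. It does mirror the behavior described above in two ways. First, recv waits until a matching message arrives. Second, it returns the data together with a status whose `source` field names the sender.

```python
import threading
from collections import namedtuple

# Hypothetical stand-ins for illustration only; NOT pyMPI's API.
ANY_SOURCE = -1
Status = namedtuple("Status", ["source"])

class Mailboxes(object):
    """Toy model of point-to-point messages between 'ranks' (threads)."""
    def __init__(self, size):
        self.cond = threading.Condition()
        # one list of pending (source, data) pairs per destination rank
        self.boxes = [[] for _ in range(size)]

    def send(self, source, data, dest):
        # buffer the message and wake up any rank waiting in recv
        with self.cond:
            self.boxes[dest].append((source, data))
            self.cond.notify_all()

    def recv(self, rank, source):
        # block until a message from `source` (or from anyone, if
        # source is ANY_SOURCE) is waiting for `rank`, then return it
        with self.cond:
            while True:
                for i, (src, data) in enumerate(self.boxes[rank]):
                    if source == ANY_SOURCE or src == source:
                        del self.boxes[rank][i]
                        return data, Status(src)
                self.cond.wait()

# Usage: ranks 1 and 2 each send one value to rank 0,
# which accepts them in whatever order they arrive.
box = Mailboxes(3)
results = []

def worker(rank):
    box.send(rank, rank * 10, 0)

threads = [threading.Thread(target=worker, args=(r,)) for r in (1, 2)]
for t in threads:
    t.start()
for _ in range(2):
    data, st = box.recv(0, ANY_SOURCE)
    results.append((st.source, data))
for t in threads:
    t.join()
print(sorted(results))
```

The arrival order can differ from run to run, which is why the usage code sorts the results before printing them. Within this model, messages from the same sender are always delivered in the order they were sent, because each mailbox is a simple list.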

Using suma.py above, we were able to sum up an array in parallel. We "cheated" a little, though, because each process built the array "a" independently. Here's how rank 0 can compute the array and communicate it to the other processes.

suma2.py:

```python
import mpi

if mpi.rank == 0:
    # compute the array
    a = range(10)
    # send the array to everyone else
    for i in range(1,mpi.size):
        mpi.send(a,i)
else:
    # receive the array
    a,rc = mpi.recv(0)

# divide up the work
n = len(a)/mpi.size
ilo = mpi.rank*n
ihi = (mpi.rank+1)*n-1
if mpi.rank+1 == mpi.size:
    ihi = len(a)-1

# sum one piece of the array
s = 0
for i in range(ilo,ihi+1):
    s += a[i]

# call allreduce and print
s = mpi.allreduce(s,mpi.SUM)
if mpi.rank == 0:
    print 'sum=',s
```

```
$ mpiexec -np 4 python ./suma2.py
sum= 45
```

Of course, the problem with the program above is that rank 0 sends the entire array to every other process, even though each one needs only its own piece. Rank 0 should send each of the other ranks only what it needs. The program below fixes that problem and introduces a new piece of Python syntax: the array slice `a[ilo:ihi+1]` in the `mpi.send` call, which extracts elements `ilo` through `ihi`.

suma3.py:

```python
import mpi

if mpi.rank == 0:
    # compute the array
    a = range(10)
    n = len(a)/mpi.size
    # send each of the other ranks only its own piece
    for r in range(1,mpi.size):
        ilo = r*n
        ihi = (r+1)*n-1
        if r+1 == mpi.size:
            ihi = len(a)-1
        mpi.send(a[ilo:ihi+1],r)
    # rank 0 keeps the first piece, a[0:n]
else:
    # receive this rank's piece of the array
    a,rc = mpi.recv(0)
    n = len(a)

# sum one piece of the array (the first n elements of a)
s = 0
for i in range(n):
    s += a[i]

# call allreduce and print
s = mpi.allreduce(s,mpi.SUM)
if mpi.rank == 0:
    print 'sum=',s
```

```
$ mpiexec -np 4 python ./suma3.py
sum= 45
```
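The serial sketch below makes the slice `a[ilo:ihi+1]` in suma3.py concrete. It does not use `mpi` and simply pretends there are 4 ranks. It prints the piece rank 0 would send to each rank. The stop index of a slice is exclusive, which is why the code writes `ihi+1`. The sketch also counts how many elements rank 0 transmits under suma2.py's approach (the whole array to every other rank) and under suma3.py's approach (one slice per rank). The names `pieces`, `sent_whole` and `sent_slices` are illustrative only.

```python
a = list(range(10))
size = 4                    # pretend we ran with mpiexec -np 4
n = len(a) // size

pieces = {0: a[0:n]}        # rank 0 keeps the first piece
for r in range(1, size):
    ilo = r * n
    ihi = (r + 1) * n - 1
    if r + 1 == size:
        ihi = len(a) - 1
    pieces[r] = a[ilo:ihi + 1]   # stop index is exclusive, hence ihi+1

for r in range(size):
    print("rank %d gets %s" % (r, pieces[r]))

# elements rank 0 transmits: whole array to each other rank (suma2.py)
# versus one slice per other rank (suma3.py)
sent_whole = len(a) * (size - 1)
sent_slices = sum(len(pieces[r]) for r in range(1, size))
print("suma2 sends %d elements, suma3 sends %d" % (sent_whole, sent_slices))
```

With 10 elements and 4 ranks, the slices are `[0, 1]`, `[2, 3]`, `[4, 5]` and `[6, 7, 8, 9]`, and rank 0 sends 8 elements instead of 30.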

This seems like a long program for such a simple task, but that is the price of parallel programming: you have to spell out how the data is divided and which process sends what to whom.

Problems:

1. What's wrong with this program?

   bad60.py:

   ```python
   import mpi
   a = range(10)

   # divide up the work
   n = len(a)/mpi.size
   ilo = mpi.rank*n
   ihi = (mpi.rank-1)*n+1
   if mpi.rank+1 == mpi.size:
       ihi = len(a)-1

   # sum one piece of the array
   s = 0
   for i in range(ilo,ihi+1):
       s += a[i]

   # call allreduce and print
   s = mpi.allreduce(s,mpi.SUM)
   if mpi.rank == 0:
       print 'sum=',s
   ```
2. Modify suma3.py so that the array `a` contains random numbers instead of being set by `range(10)`. After computing the sum in parallel, compute it again serially on process 0 and check that the two answers agree.

3. Ping-pong test: process 0 should send a message and then receive one, 100 times in a row. Process 1 should receive a message and then send one, also 100 times. At the end, print how long all that sending and receiving took.

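You can measure how long a piece of code takes with the `time()` function from Python's `time` module. It returns the current time in seconds as a floating-point number, so you record it before and after the code and subtract.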

timed.py:

```python
import time

tstart = time.time()
### measuring time below
s = 0
for i in range(1,10**7):
    s += i
### measuring time above
tend = time.time()
print 'time=',tend - tstart
```

```
$ python ./timed.py
time= 1.32153916359
```
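The sketch below is a small reusable variant of timed.py that wraps the start/stop bookkeeping in a helper. The names `elapsed` and `count_up` are illustrative only. On Python 3.3 and later it uses `time.perf_counter()`, which the Python documentation describes as the highest-resolution clock available for measuring short durations. On older versions it falls back to `time.time()`, as timed.py does. The same pattern applies to the ping-pong test: read the clock once before the loop of sends and receives and once after it.

```python
import time

try:
    clock = time.perf_counter   # Python 3.3+: high-resolution interval clock
except AttributeError:
    clock = time.time           # older Pythons: wall-clock time, as in timed.py

def elapsed(func, *args):
    """Call func(*args) and return (its result, seconds it took)."""
    tstart = clock()
    result = func(*args)
    return result, clock() - tstart

def count_up(limit):
    # the same loop that timed.py measures, with a configurable limit
    s = 0
    for i in range(1, limit):
        s += i
    return s

total, secs = elapsed(count_up, 10**6)
print("sum=%d time=%.3f" % (total, secs))
```

The measured time will vary from machine to machine and from run to run. Only the computed sum, 499999500000, is fixed.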
