Poogle - Web Search

Our next application is a web search program.

It takes a series of search terms (the items in the list called "inputs"), searches the web pages named in the "pages" list, and, for each term, counts the number of pages that contain it.

You can edit the list of pages and/or search terms to find your own results.
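Before bringing MPI into it, the counting step described above can be sketched on a single processor. This is only an illustration: the page texts in `page_text` are made-up stand-ins for downloaded pages, and the helper name `count_matching_pages` is hypothetical, not part of poogle.py.

```python
# Hypothetical stand-ins for downloaded page text, keyed by page id.
page_text = {
    2: "Solar panels and robots",
    6: "An agent walked in",
    9: "Robots in the solar system",
}

# Search terms, playing the role of the "inputs" list.
terms = ["agent", "solar", "robots", "necromancer"]

def count_matching_pages(term, texts):
    """Return how many page texts contain term, ignoring case."""
    term = term.lower()
    return sum(1 for text in texts.values() if text.lower().find(term) >= 0)

# One count per search term.
counts = dict((term, count_matching_pages(term, page_text)) for term in terms)
for term in terms:
    print("%s %d" % (term, counts[term]))
```

Note that a page counts once no matter how many times the term appears in it. The parallel version does the same work, but each processor only looks at its own share of the pages.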

poogle.py
# Challenge, return number of matches within a page
import mpi
import urllib

# what we are searching for
inputs = [
    "agent",
    "cancer",
    "solar",
    "necromancer",
    "robots"
    ]

# collection of web pages our search engine knows about
pages = [2, 6, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20,
    21, 22, 23, 27, 29, 31, 32, 33, 36, 38, 39, 41, 42,
    44, 48, 50, 51, 54, 55, 57, 63, 64, 66, 68, 69, 70,
    71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 83, 84, 85,
    86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 99, 100,
    101, 102, 103, 104, 106, 109, 110, 113, 114, 115,
    118, 119, 122, 123, 125]

# split the pages into equal blocks, one per mpi proc;
# the last proc also takes any leftover pages
n = len(pages)//mpi.size
ilo = mpi.rank*n
ihi = (mpi.rank+1)*n-1
if mpi.rank+1 == mpi.size:
    ihi = len(pages)-1

# each mpi proc searches a subset of pages from ilo to ihi
# c is a placeholder list indexed like pages; slots ilo..ihi get the page text
c = range(ihi+1)
for i in range(ilo,ihi+1):
    page = 'http://stevenrbrandt.com/wordpress/?p='+str(pages[i])
    c[i] = urllib.urlopen(page).read().lower()

for input in inputs:
    matches = []

    # collect the pages in this proc's block that contain the search term
    for i in range(ilo,ihi+1):
        if c[i].find(input) >= 0:
            matches.append(pages[i])

    # proc zero receives the results of all searches
    if mpi.rank == 0:
        for i in range(1,mpi.size):
            other_matches = mpi.recv(mpi.ANY_SOURCE)[0]
            for match in other_matches:
                matches.append(match)
    else:
        mpi.send(matches,0)

    mpi.barrier()

    if mpi.rank == 0:
        print input,len(matches)
$ pwd
/var/www/html/cios/work/python
$ mpiexec -np 2 python poogle.py
agent 2
cancer 6
solar 12
necromancer 9
robots 2
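
To see how the pages are shared out among processors, the ilo/ihi arithmetic from the listing can be tried on its own, without MPI. The sketch below wraps that arithmetic in a helper called `block_range` (a hypothetical name, not part of poogle.py) and uses 10 items and 3 processors as made-up numbers.

```python
def block_range(rank, size, n_items):
    """Return (ilo, ihi), the inclusive index range handled by one rank.

    Mirrors the arithmetic in poogle.py: equal blocks of n_items // size,
    with the last rank also taking any leftover items.
    """
    n = n_items // size
    ilo = rank * n
    ihi = (rank + 1) * n - 1
    if rank + 1 == size:
        ihi = n_items - 1
    return ilo, ihi

# 10 items shared among 3 ranks: the last rank picks up the extra item.
ranges = [block_range(rank, 3, 10) for rank in range(3)]
print(ranges)  # [(0, 2), (3, 5), (6, 9)]
```

Because the leftovers all go to the last processor, it can end up with noticeably more work than the others when the number of pages is not a multiple of the number of processors.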