Tuesday, February 25, 2014
Sunday, February 9, 2014
Fast and Efficient Computation of PageRank
Team Members
Rakesh Bhatia, Christian Reifferscheid
Introduction
Nowadays everyone uses search engines and hence their underlying
algorithms. One of the most popular search engines is Google. Along with
other algorithms Google uses the PageRank to determine the importance
of a website. The computation of the PageRank for billions of websites
can be done fast and efficient via parallel computing.
be a website then the PageRank
is calculated by the sum
of other websites which are linking to
divided by
,
all outgoing hostlinks to other domains. To prevent websites with no
outgoing links to gain more PageRank than necessary, there is also a
damping factor
which is added to the equation. (In general it turns out that
is a good damping factor). This leads to the following equation where
is the number of websites.

Another more mathematic way to look at the PageRank is to see it as a probability distrubution which reprents the likelihood that a real person randomly surfing on the web reaches the current page
.
We can rearrange the equation into matrizes and see that the PageRank values are entries of an eigenvector:

with
as the solution of

Because not every website is linked to another website, the big
matrix is a sparse matrix.
The PageRank computation
The web contains billions of websites with billions of links which
are changing everyday. PageRank computation is an excellent application
for parallel computing, because the PageRank has to be calculated very
often and very fast on a huge set of data. Google allows you nowadays
also to choose preferenced sites which probably gain more PageRank. The
computation of PageRank can be done in different ways ( it’s solving a
linear algebra eigenvector equation). We can choose an iterative way for
solving the equation e.g. with the Jacobi-Method until it converges.
we assume

and can compute
.
The computation ends when, for some small
, the equation

fulfilled. Depending on
the computation needs usually 50-100 iterations.
The PageRank formula
The PageRank algorithms weighs websites according to their popularity. The more incoming links from other domains a web page has, the more popular it is and the higher the PageRank. LetAnother more mathematic way to look at the PageRank is to see it as a probability distrubution which reprents the likelihood that a real person randomly surfing on the web reaches the current page
We can rearrange the equation into matrizes and see that the PageRank values are entries of an eigenvector:
with
Because not every website is linked to another website, the big
The PageRank computation
The web contains billions of websites with billions of links which
are changing everyday. PageRank computation is an excellent application
for parallel computing, because the PageRank has to be calculated very
often and very fast on a huge set of data. Google allows you nowadays
also to choose preferenced sites which probably gain more PageRank. The
computation of PageRank can be done in different ways ( it’s solving a
linear algebra eigenvector equation). We can choose an iterative way for
solving the equation e.g. with the Jacobi-Method until it converges.Iterative computation
As initial probability distribution atand can compute
The computation ends when, for some small
fulfilled. Depending on
Parallel Approach:
We plan to create a graph structure representing multiple websites, each with multiple links to other websites. Then, we will split this graph among multiple processors, where each processor gets a chunk of the graph (a sub-graph). Each processor will do a PageRank computation on its sub-graph, and send this value to the master processor. At the moment, we are still trying to determine how to communicate the requisite data values between processors, and how to do the final "housekeeping" with our master processor.References:
[1] Fast Parallel PageRank: A Linear System Approach. Yahoo! Technical Report, 2004[2] “Efficient Parallel Computation of PageRank“, Christian Kohlschütter, Paul-
[2] http://en.wikipedia.org/wiki/PageRank
[3] The Anatomy of a Large-Scale Hypertextual, Web Search Engine , Sergey Brin, Lawrence Page, Stanford University
Subscribe to:
Posts (Atom)