Matrix Multiplication with MapReduce

Big Data possibly now has become the most used term in the tech world for this decade. Everybody is talking about it and everybody have their own understanding towards it which has made its definition quite ambiguous. Let us deconstruct this term with a situation. Suppose you are creating a database for movie ratings where rows indicate user IDs, columns indicate movies and the values of the cells indicates rating(0-5) given by user to the corresponding movie. Now this data is likely to be sparse as you can’t have a situation where all users have rated all movies. In real world situation you can conceive the sparsity of this database and the cost it takes to store this huge database/matrix.

Hence, in data centers data is stored in sparse representation i.e (row_index,col_index,value). This way no extra memory is used to store NULL values. Now algorithms which data scientists use to develop insights from the data involve matrix multiplications. So there is a need to develop some new algorithms to perform these matrix multiplications over this humongous data as traditional algorithm doesn’t work for data stored in sparse representation.

Hence data becomes big data when traditional  algorithms doesn’t work and new algorithms are needed to be devised to do the job of these traditional algorithms. These algorithms are built on Mapreduce. If you what to get a basic understanding of Mapreduce then check this article- MapReduce explained.

Let us now understand how Matrix Multiplication is implemented with MapReduce

C= A X B

A has dimensions L,M

B has dimensions M,N

Download the data set matrix.json.

  • In the map phase:

– for each element (i,j) of A, emit ((i,k),[j,A(i,j)]) for k in 1…N

– for each element (j,k) of B, emit ((i,k),[j,B(j,k)]) for i in 1..L

12169199_10207357172466027_1714891067_o

  • In the reduce phase, emit

– key(i,k)

– value= Sum(A[i,j]*B[j,k])

12169199_10207357172466027_1714891067_o (1)

Essentially we are replicating all values of A for corresponding columns in B and are replicating all values of B for corresponding rows in A.

Define this MapReduce class separately, which will take the Map and Reduce functions defined above.

12169928_10207357172426026_1105343388_o

You will obtain the results in the same sparse representation in self.result. That’s how matrix multiplication takes place when matrices are in sparse representation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s