2 years ago
#25857
Lavender Lee
Why is the output of ItemSimilarityJob missing the similarities for some item-ID pairs?
I have the following ratings.csv:
userId,itemId,rating
1,1,1
1,2,2
1,3,3
2,2,4
2,3,2
2,5,4
2,6,5
3,1,5
3,3,1
3,6,2
4,4,4
I run org.apache.mahout.cf.taste.hadoop.item.RecommenderJob as follows:
hadoop jar /mahout-examples-0.13.0-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
--input /item-cf/ratings --output /item-cf/recommend \
--similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE \
--tempDir /item-cf/temp \
--outputPathForSimilarityMatrix /item-cf/similarity-matrix
Then Hadoop gives the following results:
# similarity matrix
1       2       0.13367660240019172
1       3       0.16952084719853724
1       6       0.14459058185587106
2       3       0.28989794855663564
2       5       0.3333333333333333
2       6       0.25
3       5       0.21089672205953397
3       6       0.18660549686337075
5       6       0.3090169943749474
# recommendations (user 4 is missing)
1       [5:2.3875139,6:2.0722904]
2       [1:3.565752]
3       [2:2.1649883,5:1.5943621]
To validate Mahout's results, I also ran the following Python script:
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
# rows = users 1-4, columns = items 1-6; 0 marks a missing rating
ratings = np.array(
  [
    [1, 2, 3, 0, 0, 0],
    [0, 4, 2, 0, 4, 5],
    [5, 0, 1, 0, 0, 2],
    [0, 0, 0, 4, 0, 0]
  ]
)
# similarity = 1 / (1 + Euclidean distance between item columns)
1 / (1 + euclidean_distances(ratings.T))
# Output:
#array([[1,       0.13368, 0.16952, 0.13368, 0.13368, 0.14459],
#       [0.13368, 1,       0.2899,  0.14286, 0.33333, 0.25   ],
#       [0.16952, 0.2899,  1,       0.15439, 0.2109,  0.18661],
#       [0.13368, 0.14286, 0.15439, 1,       0.15022, 0.12973],
#       [0.13368, 0.33333, 0.2109,  0.15022, 1,       0.30902],
#       [0.14459, 0.25,    0.18661, 0.12973, 0.30902, 1      ]])
Comparing the two, I think Mahout's similarity matrix is wrong, so I have the following questions:
- Why is it missing the similarities for the item-ID pairs (1, 4), (1, 5), (2, 4), (3, 4), (4, 5), and (4, 6)? How should I interpret Mahout's similarity matrix?
- Why do the recommendations contain no result for user 4?
# similarity matrix (--similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE); missing pairs are marked with ?
1       2       0.13367660240019172
1       3       0.16952084719853724
1       4           ?
1       5           ?
1       6       0.14459058185587106
2       3       0.28989794855663564
2       4           ?
2       5       0.3333333333333333
2       6       0.25
3       4           ?
3       5       0.21089672205953397
3       6       0.18660549686337075
4       5           ?
4       6           ?
5       6       0.3090169943749474
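One pattern that can be checked directly from ratings.csv: the pairs marked ? are exactly the item pairs that no single user rated together. The sketch below only demonstrates that pattern in the data. Whether this is the actual reason Mahout omits those pairs is an assumption I have not confirmed against the Mahout source. The variable names (items_by_user, co_rated, not_co_rated) are made up for illustration.

```python
from itertools import combinations

# (userId, itemId, rating) rows copied from ratings.csv above
ratings = [
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 2, 4), (2, 3, 2), (2, 5, 4), (2, 6, 5),
    (3, 1, 5), (3, 3, 1), (3, 6, 2),
    (4, 4, 4),
]

# Collect the set of items each user has rated
items_by_user = {}
for user, item, _ in ratings:
    items_by_user.setdefault(user, set()).add(item)

# An item pair is "co-rated" if at least one user rated both items
co_rated = set()
for items in items_by_user.values():
    co_rated.update(combinations(sorted(items), 2))

# Every possible item pair, minus the co-rated ones
all_items = sorted({item for _, item, _ in ratings})
not_co_rated = sorted(set(combinations(all_items, 2)) - co_rated)

print("co-rated pairs:    ", sorted(co_rated))
print("not co-rated pairs:", not_co_rated)
```

The co-rated pairs coincide with the nine rows Mahout writes, and the remaining six coincide with the ? rows. User 4 rated only item 4, which appears in no co-rated pair, so that may also be related to the missing recommendations for user 4.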
similarity
mahout
euclidean-distance
mahout-recommender