1 year ago
#25857
Lavender Lee
Why is the output of ItemSimilarityJob missing the similarities for some item-ID pairs?
Suppose I have the following ratings.csv:
userId,itemId,rating
1,1,1
1,2,2
1,3,3
2,2,4
2,3,2
2,5,4
2,6,5
3,1,5
3,3,1
3,6,2
4,4,4
Running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on it, we have
hadoop jar /mahout-examples-0.13.0-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
--input /item-cf/ratings --output /item-cf/recommend \
--similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE \
--tempDir /item-cf/temp \
--outputPathForSimilarityMatrix /item-cf/similarity-matrix
Then Hadoop gives the following results:
# similarity matrix
1 2 0.13367660240019172
1 3 0.16952084719853724
1 6 0.14459058185587106
2 3 0.28989794855663564
2 5 0.3333333333333333
2 6 0.25
3 5 0.21089672205953397
3 6 0.18660549686337075
5 6 0.3090169943749474
# recommendations (no entry for user 4)
1 [5:2.3875139,6:2.0722904]
2 [1:3.565752]
3 [2:2.1649883,5:1.5943621]
To validate Mahout's results, I also ran the following Python script:
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
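# Rows are users 1-4, columns are items 1-6; 0 means "not rated".
ratings = np.array(
    [
        [1, 2, 3, 0, 0, 0],
        [0, 4, 2, 0, 4, 5],
        [5, 0, 1, 0, 0, 2],
        [0, 0, 0, 4, 0, 0]
    ]
)
# Turn pairwise Euclidean distances between item columns into similarities.
1 / (1 + euclidean_distances(ratings.T))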
#
#array([[1, 0.13368, 0.16952, 0.13368, 0.13368, 0.14459],
# [0.13368, 1, 0.2899, 0.14286, 0.33333, 0.25 ],
# [0.16952, 0.2899, 1, 0.15439, 0.2109, 0.18661],
# [0.13368, 0.14286, 0.15439, 1, 0.15022, 0.12973],
# [0.13368, 0.33333, 0.2109, 0.15022, 1, 0.30902],
# [0.14459, 0.25, 0.18661, 0.12973, 0.30902, 1 ]])
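As a dependency-free cross-check, the sketch below recomputes 1 / (1 + d) in plain Python for the nine pairs Mahout reported, treating unrated items as 0 exactly as the NumPy matrix does. The names `item_vectors`, `mahout_pairs`, and `similarity` are illustrative only, and the zero-fill convention is an assumption carried over from the script rather than something confirmed from Mahout's documentation.

```python
import math

# Item columns of the ratings matrix (users 1-4); 0 means "not rated".
item_vectors = {
    1: [1, 0, 5, 0],
    2: [2, 4, 0, 0],
    3: [3, 2, 1, 0],
    4: [0, 0, 0, 4],
    5: [0, 4, 0, 0],
    6: [0, 5, 2, 0],
}

# The nine pairs and values copied from Mahout's similarity matrix output.
mahout_pairs = {
    (1, 2): 0.13367660240019172,
    (1, 3): 0.16952084719853724,
    (1, 6): 0.14459058185587106,
    (2, 3): 0.28989794855663564,
    (2, 5): 0.3333333333333333,
    (2, 6): 0.25,
    (3, 5): 0.21089672205953397,
    (3, 6): 0.18660549686337075,
    (5, 6): 0.3090169943749474,
}


def similarity(a, b):
    """Return 1 / (1 + Euclidean distance) between two item columns."""
    squared = sum((x - y) ** 2 for x, y in zip(item_vectors[a], item_vectors[b]))
    return 1.0 / (1.0 + math.sqrt(squared))


# Collect every reported pair whose value differs from the recomputed one.
mismatches = [
    pair
    for pair, value in mahout_pairs.items()
    if not math.isclose(similarity(*pair), value, abs_tol=1e-6)
]
print("pairs that disagree with Mahout:", mismatches)
```

An empty list here means the values Mahout does report agree with the script, so the open issue is only the pairs it leaves out.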
However, Mahout's similarity matrix looks incomplete to me, so I have the following questions:
Why is it missing the similarities of the item-ID pairs (1, 4), (1, 5), (2, 4), (3, 4), (4, 5), (4, 6)? How should Mahout's similarity matrix be interpreted?
In addition, why do the recommendations contain no result for user 4?
# similarity matrix --similarityClassname=[SIMILARITY_EUCLIDEAN_DISTANCE]
1 2 0.13367660240019172
1 3 0.16952084719853724
1 4 ?
1 5 ?
1 6 0.14459058185587106
2 3 0.28989794855663564
2 4 ?
2 5 0.3333333333333333
2 6 0.25
3 4 ?
3 5 0.21089672205953397
3 6 0.18660549686337075
4 5 ?
4 6 ?
5 6 0.3090169943749474
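One thing the ratings themselves show is which item pairs share at least one user. The sketch below lists the pairs with and without a common rater, using only the rows of ratings.csv. The names `users_by_item`, `co_rated`, and `not_co_rated` are illustrative, and whether the absence of a common rater is the actual reason Mahout omits a pair is an assumption here, not something confirmed by the output.

```python
from itertools import combinations

# Rows of ratings.csv from the question: (userId, itemId, rating).
ratings = [
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 2, 4), (2, 3, 2), (2, 5, 4), (2, 6, 5),
    (3, 1, 5), (3, 3, 1), (3, 6, 2),
    (4, 4, 4),
]

# Map each item to the set of users who rated it.
users_by_item = {}
for user, item, _ in ratings:
    users_by_item.setdefault(item, set()).add(user)

items = sorted(users_by_item)

# Split all item pairs by whether any user rated both items.
co_rated = [
    (a, b) for a, b in combinations(items, 2)
    if users_by_item[a] & users_by_item[b]
]
not_co_rated = [
    (a, b) for a, b in combinations(items, 2)
    if not users_by_item[a] & users_by_item[b]
]

print("pairs with a common rater:", co_rated)
print("pairs with no common rater:", not_co_rated)
```

On this data the pairs with no common rater are exactly the six marked with "?" in the listing, and the nine co-rated pairs are exactly the ones Mahout reports. User 4 rated only item 4, which appears in none of the co-rated pairs, which would be consistent with the missing recommendations for that user.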
similarity
mahout
euclidean-distance
mahout-recommender