A New Machine Learning Algorithm for Users’ Movie Recommendation

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
03/25/2022 - 03/27/2022 at Wuhan, China

Proceedings: CIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Liu, Zhongyuan (Faculty of Rail Transit, Wuxi University, Wuxi, China)
Wang, Xuefei (Shanghai Fuxing Senior High School, Shanghai, China)
Zhu, Hongzheng (College of Computer and Information Science, Faculty of Automation, Southwest University, Chongqing, China)

Abstract:
In the information era, even a movie lover may feel overwhelmed by the number of movies available on a movie website, so an efficient movie recommendation system is needed. This paper compares two recommendation systems, one based on user-based collaborative filtering and the other on content-based collaborative filtering. Drawing on several related studies, the work uses a large dataset from MovieLens. In pre-processing, the data are transformed into a matrix of each user’s ratings for the corresponding movies, so that each row of the rating matrix serves as that user’s feature vector. Cosine similarity between these vectors gives the similarity between users. The K-Nearest Neighbor (KNN) algorithm then selects a certain number of users most similar to the target user; their ratings are used to predict how the target would rate a movie, and the movie with the highest predicted score is recommended to the target. In the content-based approach, films in each category are ranked by the average and the number of their ratings, and each user’s favorite movie types are obtained from a user-movie genre preference matrix built from the scores the user gave to each type of movie. The highest-scoring movies in each target user’s favorite types are then recommended to the target. Experimental results indicate that both algorithms generally meet users’ needs, despite differences in their results.
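
The user-based pipeline described in the abstract (rating vectors, cosine similarity, nearest neighbors, predicted score) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy `ratings` matrix, the function names, the choice of `k`, the convention that 0 means "unrated", the restriction of neighbors to users who rated the movie, and the similarity-weighted average used for prediction are all assumptions made for illustration; the abstract does not specify these details.

```python
import math

# Hypothetical toy data: rows are users, columns are movies, 0 means "not rated".
ratings = [
    [5, 4, 0, 0],  # user 0 (the target user in this example)
    [5, 5, 4, 1],  # user 1
    [1, 0, 2, 5],  # user 2
    [4, 5, 5, 0],  # user 3
]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_u * norm_v)


def predict_rating(ratings, target, movie, k=2):
    """Predict the target's rating for a movie from the k most similar raters.

    Assumption: only users who rated the movie are considered, and the
    prediction is a similarity-weighted average of their ratings.
    """
    candidates = []
    for user, row in enumerate(ratings):
        if user == target or row[movie] == 0:
            continue
        candidates.append((cosine_similarity(ratings[target], row), row[movie]))
    # Keep the k nearest neighbors (highest similarity first).
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    total_weight = sum(sim for sim, _ in top)
    if total_weight == 0:
        return 0.0
    return sum(sim * rating for sim, rating in top) / total_weight


def recommend(ratings, target, k=2):
    """Return the index of the unrated movie with the highest predicted score."""
    unseen = [m for m, r in enumerate(ratings[target]) if r == 0]
    if not unseen:
        return None
    return max(unseen, key=lambda m: predict_rating(ratings, target, m, k))


best_movie = recommend(ratings, target=0)
```

With this toy data, users 1 and 3 are the target's nearest neighbors, so the movie they both rated highly (index 2) is recommended over the one favored by the dissimilar user 2.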