Data Mining of Mass Storage Based on Cloud Computing


Cloud computing is an elastic computing model in which users lease resources from a rentable infrastructure. It is gaining popularity because of its low cost, high reliability, and high availability. To exploit this capacity, this paper applies cloud computing to data mining and machine learning. The Netflix Prize, one of the most influential open competitions in machine learning, came with a massive dataset and drove thousands of teams across the world to attack the problem. The final winner was the BellKor's Pragmatic Chaos team, which bested Netflix's own rating-prediction algorithm by 10%. Their solution is an ensemble of a large number of models, each of which specializes in a different aspect of the data. Among these models, k-nearest neighbors (KNN) and the Restricted Boltzmann Machine (RBM) are reported to be the two most important and successful. We therefore build two predictors, one based on each model, in order to test their performance on cloud computing platforms. The results show that KNN achieves a root-mean-square deviation (RMSE) of 0.9468 after Global Effect (GE) data preprocessing, which is better than Cinematch's RMSE of 0.951. The RMSE of the RBM algorithm is about 0.9670 on the raw dataset, which can be further improved by the KNN model.
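For readers unfamiliar with the metric, the following is a minimal sketch of how RMSE is computed and how two RMSE values can be compared. It is an illustration only, not the paper's implementation: the function names `rmse` and `relative_improvement` and the toy ratings are assumptions introduced here, and only the figures 0.951 and 0.9468 come from the abstract.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square deviation between predicted and actual ratings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which model_rmse is lower than baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical toy ratings, not taken from the Netflix dataset.
    toy_predicted = [3.5, 4.0, 2.0]
    toy_actual = [4.0, 4.0, 1.0]
    print("toy RMSE:", rmse(toy_predicted, toy_actual))

    # RMSE values reported in the abstract: Cinematch vs. KNN after GE.
    print("KNN vs. Cinematch:", relative_improvement(0.951, 0.9468))
```

A lower RMSE means predictions are closer to the true ratings, so a positive relative improvement indicates that the model outperforms the baseline.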

The Ninth International Conference on Grid and Cloud Computing