Classifying Amazon Product Reviews into Positive and Negative Classes Using TF-IDF Features
Abstract
Text mining and its approaches are useful in a variety of real-world applications, including medical information retrieval, text analysis for fraud detection, and customer management in corporate intelligence applications. Data mining and natural language processing techniques help process the data and identify the required patterns in these applications. In this project, text mining algorithms are used to classify Amazon product reviews. This analysis may help purchasers make purchasing decisions and may give producers feedback on their products. The Amazon product review dataset from Kaggle is used. Preprocessing removes stop words and special characters from the text data. Two steps follow. First, Part-of-Speech (POS) tagging based on a Natural Language Processing (NLP) parser is used for feature extraction. Second, Term Frequency-Inverse Document Frequency (TF-IDF) based feature selection is used to identify probable keywords from the reviews. Both feature sets are then combined, and the combined features are used to create training and testing datasets. A Support Vector Machine (SVM) is trained on the training dataset, and the test dataset is used to validate its performance. Experiments are conducted with samples of increasing size. Performance is assessed in terms of precision, recall, F1-score, time, and memory.
The model achieves higher accuracy while using fewer resources. Finally, some future extensions of the work are proposed based on the experimental investigation.
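
As a rough illustration of the preprocessing and TF-IDF keyword steps summarized in this abstract, the following minimal Python sketch removes stop words and special characters and then ranks the terms of each review by TF-IDF weight. The stop-word list, the sample reviews, the function names, and the exact weighting variant (length-normalized term frequency multiplied by log(N/df)) are all assumptions made for illustration; they are not taken from the study, and the POS tagging and SVM stages are omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "i", "to", "of", "very"}


def preprocess(text):
    """Lowercase the text, replace special characters with spaces, drop stop words."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(weights, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample reviews, invented for this sketch.
reviews = [
    "This phone is great, and the battery is great!",
    "The battery died after a week. Terrible phone.",
    "Great price and fast delivery.",
]

tokenized = [preprocess(r) for r in reviews]
scores = tfidf(tokenized)

for review, weights in zip(reviews, scores):
    print(review, "->", top_keywords(weights))
```

In a fuller pipeline, weights like these would be combined with POS-based features and passed to a classifier; that part is left out here because the abstract does not specify how the two feature sets are merged.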