Fraud Prediction in Movie Theater Credit Card Transactions using Machine Learning

This paper highlights how the proliferation of online transactions, especially credit card transactions, has given rise to new security flaws that threaten customers and enterprises worldwide. E-commerce and other forms of online monetary transactions have become essential in the manufacturing and service sectors, driving the global economy. The widespread, interdependent connectivity of mobile payment systems that rely on credit card transactions creates opportunities for fraud, risk, and security breaches. Given the importance of accurately predicting fraud in payment procedures, this study investigated credit card payments for movie tickets, using the machine learning logistic regression method to analyze and predict fraudulent transactions. The study used a dataset of cinema ticket credit card transactions made by European cardholders over two days in September 2013, comprising 284,807 transactions, of which 492 were fraudulent. The proposed method achieved a prediction accuracy of 99%, indicating high prediction performance.


I. INTRODUCTION
Credit card fraud harms not only customers but also businesses [1]. As credit card use and online transactions have grown in popularity, fraudulent activity has increased as well. It is therefore vital to detect fraudulent credit card use to prevent monetary losses and to protect consumers' personal and financial data [2]. Fraudulent credit card activity can be identified with several methods and from various angles. Rule-based systems or machine learning algorithms can examine patterns of normal behavior and identify deviations from the norm [3]. Trust is another issue that must be addressed in relation to credit cards [4]. Payment systems are the engine that drives e-commerce, and they require a substantial level of trust in, and security of, cutting-edge technology [5]. Furthermore, mobile users commonly switch to new solutions, adjust their payment routines, and encounter new types of security challenges [6][7].
This study used a machine learning model to predict fraudulent activity in credit card payment transactions. Electronic payment methods are growing rapidly and are becoming increasingly vital and convenient [8][9]. Several studies have shown that phone-based payments are a crucial addition to the available retail payment options [10]. Electronic payment systems are particularly vulnerable to fraud and cyber security breaches because they store sensitive consumer information such as ID, card number, cardholder name, and purchase history [11]. Because these factors are interconnected, credit card fraud must be examined from several perspectives to determine how it can be addressed. This study also considered additional dimensions of electronic payment operations that can reveal unusual or suspicious behavior, such as the geographic location of transactions. For instance, if a credit card is unexpectedly used multiple times in a short period, this could indicate fraudulent activity [12][13]. Another dimension is the analysis of user behavior to identify unusual patterns: when a user suddenly starts making large transactions or uses the card at irregular times, this may indicate fraud [14]. Machine learning, which relies on algorithms that can boost an organization's productivity and performance, is one of the most important tools in this area. There are four main machine learning paradigms: supervised, unsupervised, semi-supervised, and reinforcement learning [15][16][17][18].
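The "multiple uses in a short period" signal mentioned above is the kind of pattern a rule-based system can check directly. The sketch below is a hypothetical velocity rule, not the logistic regression method used in this study: it flags a card when more than `max_count` transactions fall inside any sliding time window. The function name `flag_rapid_use` and both thresholds are illustrative assumptions.

```python
from datetime import datetime, timedelta


def flag_rapid_use(timestamps, max_count=3, window=timedelta(minutes=10)):
    """Return True if more than max_count transactions occur within any window.

    Hypothetical rule for illustration; max_count and window are not taken
    from the study.
    """
    times = sorted(timestamps)
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the window spans at most `window`.
        while times[end] - times[start] > window:
            start += 1
        # Number of transactions currently inside the window.
        if end - start + 1 > max_count:
            return True
    return False


base = datetime(2013, 9, 1, 12, 0)
burst = [base + timedelta(minutes=i) for i in range(4)]   # 4 uses in 3 minutes
spread = [base + timedelta(hours=i) for i in range(4)]    # 4 uses over 3 hours
print(flag_rapid_use(burst), flag_rapid_use(spread))
```

A fixed rule like this is easy to audit but must be tuned by hand, which is one reason learned models such as LR are used to capture such patterns from labeled data.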
This study used Logistic Regression (LR) as a classification algorithm to assign the data to a discrete set of classes. LR resembles the Linear Regression model, but instead of using the linear combination of the inputs directly, it passes that combination through the sigmoid (logistic) function [19]. LR is therefore a prediction method whose outputs are probability values.

www.etasr.com Alshutayri: Fraud Prediction in Movie Theater Credit Card Transactions using Machine Learning

The LR hypothesis must produce values between 0 and 1. A linear function cannot describe such a hypothesis adequately, because its output can be greater than 1 or less than 0, both of which the LR hypothesis prohibits [19]. In a classification setting, LR models the probability that each observation belongs to one of the discrete classes and uses this probability for prediction. To estimate these probabilities, the sigmoid function maps any real value z into the interval (0, 1):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Applying it to a linear combination of the features x with parameters θ keeps the hypothesis between 0 and 1:

$$h_\theta(x) = \sigma(\theta^{T}x) = \frac{1}{1 + e^{-\theta^{T}x}}, \qquad 0 \le h_\theta(x) \le 1$$

When the classifier applies this prediction function, it outputs a probability for each observation, and a selected threshold value (for example, 0.5) converts these probabilities into class labels, which establishes the decision boundary. The model parameters are optimized to minimize the cost function, which measures the prediction error over the m training examples:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Gradient descent reduces the cost by repeatedly updating each parameter θ_j with a learning rate α:

$$\theta_j := \theta_j - \alpha\frac{\partial J(\theta)}{\partial \theta_j} = \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

Several previous studies used LR for detection and prediction. The need for highly effective security in e-commerce has given rise to the concepts of asset protection, security prediction, and vulnerability detection, which protect end users and resources that could otherwise be exploited. LR was a key component in the development of deep learning and machine learning models on the Google Cloud Platform [20]. Similarly, LR was used to forecast student performance in community colleges [21]. In [22], road crash zones were modeled using LR to determine the nature of the accidents and the types of people involved. Likewise, a privacy-preserving LR-based diagnosis method for digital healthcare was presented in [23].
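The sigmoid hypothesis, the cross-entropy cost, the gradient descent update, and the threshold rule described above can be traced in a few lines of NumPy. This is a minimal from-scratch sketch on hypothetical toy data; the learning rate `alpha`, the iteration count, and the 0.5 threshold are illustrative assumptions, and the study itself used scikit-learn rather than this code.

```python
import numpy as np


def sigmoid(z):
    # Maps any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    # Average cross-entropy (log-loss) J(theta) over the m training examples.
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


def fit(X, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent on J(theta); alpha and n_iters are illustrative."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m  # dJ/dtheta_j = (1/m) * sum((h - y) * x_j)
        theta -= alpha * grad
    return theta


def predict(theta, X, threshold=0.5):
    # Threshold the predicted probabilities to obtain class labels 0/1.
    return (sigmoid(X @ theta) >= threshold).astype(int)


# Hypothetical toy data: one feature plus a bias column of ones.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 0, 1, 1, 1])

theta = fit(X, y)
print(predict(theta, X))  # expected: [0 0 0 1 1 1]
```

The bias column of ones lets θ₀ act as the intercept, so the decision boundary is the point where θᵀx = 0, that is, where the predicted probability equals 0.5.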
In [24], the effectiveness of machine learning LR for tomography processing was demonstrated. In [25], LR and a survival model were used to analyze Russian exports. In [26], an intelligent classification of coal structure was presented using multinomial LR. In [27], LR was used effectively in nursing. In [28], a prediction model for clinical applications was developed using LR, demonstrating its feasibility. In [29], an SK-Tree method for detecting malicious events on a portion of a publicly available dataset achieved an AUROC score of 98%. In [19], several machine learning algorithms were investigated to detect and analyze fraud in online transactions using a European credit card dataset, and a novel fraud detection method was proposed for streaming transaction data that analyzes customers' past transaction details and extracts behavioral patterns. The study in [30] focused on fraud in real-world transactions and evaluated a series of machine learning algorithms, including Support Vector Machine, Naive Bayes, K-Nearest Neighbor, and LR. In [31], a credit card fraud detection system based on machine learning algorithms was presented. Past credit card transactions, including those labeled as fraudulent, were modeled, and the resulting model was used to decide whether a new transaction is fraudulent. That study treated fraud detection as a classification problem and focused on preprocessing and analyzing the data with anomaly detection algorithms, such as local outlier factor and isolation forest, on a credit card transaction dataset transformed with Principal Component Analysis (PCA). PCA reduces the feature dimensionality [32] while retaining the most informative components, which can improve the prediction process [33].
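The PCA-based dimensionality reduction mentioned above can be illustrated with scikit-learn's `PCA`. The sketch below uses synthetic, strongly correlated features (an assumption standing in for real transaction attributes) and keeps the smallest number of principal components that together explain at least 95% of the variance; the 95% cut-off is an illustrative choice, not a value reported in [31]-[33].

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 10 observed features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

# A float n_components asks PCA for the fewest components reaching that
# fraction of explained variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.round(3))
```

Because the ten columns are driven by only two underlying factors, most of the variance concentrates in very few components, which is the property that makes PCA useful for compressing correlated transaction features.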
The studies above applied different machine learning algorithms in the e-commerce security domain to determine which best fit their datasets and yielded the highest accuracy. In other words, they show the importance of choosing machine learning algorithms that are compatible with the underlying dataset and achieve the best accuracy.

II. METHODOLOGY
This study applied machine learning to predict fraudulent credit card transactions in movie theaters. Figure 1 shows the flow chart of the proposed method. LR was used as the machine learning model for making the predictions, so the two were combined into a single approach. The procedure started with the construction of the model, followed by the collection and preprocessing of the dataset. The model was then trained and tested for its prediction accuracy. The most important phase of the method was the generation and mapping of credit card fraud incidents in movie theater transactions. The study began by isolating movie theater credit card transactions from all the credit card transactions in the dataset. Only the subset of credit card payments for movie tickets was retained for the best-fit test. These data were then imported and cleaned, divided into training and test sets, and finally the LR algorithm was fitted to obtain the best fit for making predictions.
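The workflow described above (import, cleaning, train/test split, LR fitting, evaluation) can be sketched with pandas and scikit-learn as follows. This is a hedged illustration, not the author's code: the function name `run_pipeline`, the cleaning steps, the 80/20 stratified split, the feature scaling, and the synthetic data are all assumptions. The commented-out lines show how the real data might be loaded; the file name is assumed, and the `merchant_category` column used to isolate movie-theater purchases is hypothetical, since the source does not say how that subset was identified.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_pipeline(df, label_col="Class", test_size=0.2, seed=42):
    """Clean, split, fit LR, and return test-set accuracy (illustrative settings)."""
    # Cleaning: drop rows with missing values and exact duplicates.
    df = df.dropna().drop_duplicates()
    X = df.drop(columns=[label_col])      # independent variables
    y = df[label_col].astype(int)         # dependent variable: 1 = fraud, 0 = legitimate
    # Stratify so the rare fraud class keeps the same proportion in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Scaling is an added assumption; it helps the LR solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# With the real data one might start from something like:
#   df = pd.read_csv("creditcard.csv")               # file name assumed
#   df = df[df["merchant_category"] == "cinema"]     # hypothetical column
# Here a synthetic stand-in is used instead (hypothetical data).
rng = np.random.default_rng(0)
n = 2000
fraud = (rng.random(n) < 0.05).astype(int)
df = pd.DataFrame({
    "V1": rng.normal(size=n) + 3 * fraud,   # fraud shifts this feature
    "Amount": rng.exponential(50, size=n),
    "Class": fraud,
})
print(run_pipeline(df))
```

Stratifying the split matters for heavily imbalanced data: without it, a random test set could contain very few or no fraud cases.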
The data were obtained from Kaggle [34], as shown in Table I. The dataset contains a total of 284,807 transactions made by European cardholders over two days in September 2013, 492 of which are fraudulent. The dataset is highly imbalanced: the positive class (frauds) accounts for only 0.172% of all transactions. Features V1-V28 are numerical values obtained with PCA; "Time" and "Amount" are the only features not transformed by PCA. "Time" contains the number of seconds elapsed between each transaction and the first transaction in the dataset. "Amount" is the transaction amount, and "Class" is the response variable, which takes the value 1 for fraud and 0 otherwise. This study selected only transactions that involved purchasing a ticket and additional services for a cinema. The data include not only numerical values but also a description and a "Label". Figure 2 shows the steps for importing the dataset and initializing the fraud variable.

Fig. 2. Experimental import and initialization of the fraud variable.
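A short calculation makes the degree of imbalance concrete. Using only the counts stated above, it computes the fraud rate and the accuracy that a trivial classifier would reach by labeling every transaction as legitimate (class 0). This baseline is an added illustration, not a result reported in the study.

```python
# Counts from the dataset description: 284,807 transactions, 492 fraudulent.
total, fraud = 284_807, 492

fraud_rate = fraud / total

# A classifier that always predicts "legitimate" is correct on every
# non-fraud transaction, so its accuracy is 1 - fraud_rate.
majority_accuracy = 1 - fraud_rate

print(f"fraud rate: {fraud_rate:.3%}")
print(f"all-legitimate accuracy: {majority_accuracy:.2%}")
```

Because this baseline is already above 99.8%, accuracy figures on this dataset are usually read alongside class-sensitive measures such as precision, recall, or AUROC.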
Python, pandas, and scikit-learn were used to implement the proposed method. In the first stage, the dataset was loaded with pandas, and the fraud variable was initialized to 0 or 1 according to the dataset labels. In the second stage, the test sample was defined and the algorithm was applied using scikit-learn, as shown in Figure 3. In the third stage, LR was used to perform the classification and the accuracy of the model was assessed, as shown in Figure 4. The independent variable X denotes the credit card transaction features, while the dependent variable Y denotes the transaction class (the fraud flag). Figure 4 shows the results of the proposed method, with an accuracy of 0.9988.

Fig. 3. Selecting the scikit-learn algorithm, importing the model, and defining the test sample.

The accuracy of this study was compared with that of similar studies that used LR to detect fraudulent credit card transactions, as shown in Table II. The study in [35] used a credit card transaction dataset from the UCI Machine Learning Repository, while the studies in [36][37][38] used the same dataset as this study. All of these studies achieved high accuracy with LR, despite differences in their individual approaches. Among them, this study achieved the highest accuracy.

IV. CONCLUSION

The results of this study indicate that machine learning approaches are important for detecting and preventing fraudulent activities in payment operations. This study proposed a logistic regression model for investigating fraudulent credit card transactions used to purchase movie tickets. The raw data were divided into training and testing samples. The ability to use historical data to predict future outcomes accurately is a major driver of the widespread adoption of machine learning across sectors. This study also highlighted how the widespread adoption of digital financial transactions, particularly credit card payments, has given rise to new security risks that threaten businesses and customers worldwide. E-commerce and other electronic forms of monetary transactions play a crucial role in the manufacturing and service sectors, thereby advancing the global economy.
The study was motivated by the recognition that the ubiquitous and interdependent connectivity of mobile credit card payment systems creates opportunities for fraud, other hazards, and security breaches across all industries.

This study investigated credit card transactions for cinema ticket purchases to detect potentially fraudulent activities. Logistic regression provided accurate predictions. The dataset used in this study contained credit card transactions made by European cardholders in September 2013 [38]. Cinema ticket purchases were separated from the remaining transactions and analyzed independently. The dataset contained 284,807 transactions, 492 of which were identified as fraudulent. The data were analyzed using logistic regression, and the findings showed a predictive accuracy of 99.88%, indicating high predictive performance.