Developing an intelligent trip recommender system by data mining methods

The Internet is widely used in almost every sector, and people continuously search it for information. Narrowing search results down to the relevant ones is not a simple task, so recommender systems are used in almost every search-related area, including the tourism domain. This study proposes an implementation of an expert system framework that can accurately classify users and predict user classes in order to recommend tourism-related services. The proposed approach predicts clusters for system users; based on these clusters, trips, hotels and similar services can be recommended individually or as a campaign to a target user or user group.


Introduction
The Internet is widely used in almost every sector, and people continuously search it for information. Narrowing search results down to the relevant ones is not a simple task. Recommender systems (RS) are gaining popularity as a way to filter and provide relevant information about a person's search on a specific topic. A recommender system tries to predict the rating that a target user would give to an item. To make an accurate prediction, such systems use the profiles of their members and the members' behavior on the system [1].
Recommender systems are used in almost every search-related area, and the tourism domain is one of them. Most recommender system applications in the tourism domain propose travel destinations, trips, activities and hotels at a destination within a given set of user-defined constraints. Users can define budget limits, time intervals, interests, desired locations or similar requirements. After retrieving such data, a recommender system analyzes the user input and proposes a relevant output [2]. Researchers have tested many approaches for generating accurate predictions. Most of these approaches are based on acquiring a set of parameters that can be used as constraints for the recommendation system. This study proposes an implementation of an expert system framework that can accurately classify users and predict user classes. The proposed approach predicts clusters for system users; based on these clusters, trips, hotels and similar services can be recommended individually or as a campaign to a target user or user group.
Many recommender systems are available in the tourism domain. The main purpose of the proposed approach is to increase classification accuracy. The following steps summarize the proposed approach: (1) Discretizing the initial travel data set. (2) Discovering user clusters for the prediction model. (3) Training a data mining model for prediction of user classes. (4) Predicting the class for a target user. (5) Recommending services suitable for the predicted user class.
The remainder of this paper is organized as follows: Section 2 reviews recent studies on travel recommender systems, Section 3 describes the materials and methods used in the proposed approach, Section 4 presents comparison results for the proposed approach and Section 5 contains the conclusion and summary of this study.

Related Works
Many studies have been conducted on recommender systems. This section focuses mostly on recommender systems previously implemented in the tourism domain, in chronological order.
In 2008, a mobile recommendation system was developed that generates a user profile by considering users with similar interests in items. The system determines a list of activities for the target user and generates trip plans based on these activities. The approach contains an ontology model, and recommendations are generated from the system's past experience with similar users [3].
In another study, researchers developed a system that generates personalized recommendations of tourist attractions. The system integrates heterogeneous online travel information using a tourism ontology. The travel behavior of the target user and of similar users was analyzed to generate recommendations. A Bayesian network technique and the analytic hierarchy process method were used for the recommendation engine [4].
An expert travel agent was developed to assist tourists by suggesting package holidays and tours. The method employs a hybrid approach containing both content-based and collaborative filtering methods, and demographic data was also used in the recommendation system. The authors emphasized that the hybrid approach was chosen to cover the shortcomings of each individual recommendation method [5].
In another study, a semantic hotel recommender system was developed. To generate recommendations, a hotel ontology was combined with a fuzzy logic approach. To take customer experience into account, the system contains a feedback mechanism that allows users to rate the generated recommendations. These ratings were used to update the fuzzy rules in order to generate more accurate recommendations [6].
Another recommender engine study covers trip recommendations for both individuals and tourist groups. The group recommendation mechanism aggregates and intersects the individual recommendations made for every member of a given group. The recommendation engine employs both demographic and content-based filtering methods [7].
In 2012, a decision support system for tourist attractions was implemented by combining the Engel-Blackwell-Miniard model and Bayesian network approaches. Data published by the Tourism Bureau of Taiwan was used to build the recommendation system. The generated recommendations were displayed on Google Maps to provide more detailed information for tourists [8].
A hybrid recommendation system containing both content-based and collaborative filtering methods was built to offer better personalization for recommender systems in tourism. The system was implemented using an association-based classification approach, in which concepts from association and classification were combined to use association rules in a prediction context [9].
Another mobile tourism recommendation system was implemented using a location-based collaborative filtering method. The system generates recommendations by considering other tourists' ratings of the attractions they visited. Users exchange their ratings through a mobile peer-to-peer connection. Three data exchange methods were proposed for exchanging ratings of visited attractions effectively [10].
Recommender systems related to social media are also gaining popularity. From this perspective, the authors of [11] introduced a recommendation system that analyzes geo-tagged social media to recommend landmarks for customized travel planning. The system obtains the spatial and temporal properties of a trip and uses them to compute the significance of landmarks. Landmark clusters are generated for similar themes, and these clusters are recommended to system users [11].
A different research group proposed a travel schedule planning algorithm that generates customized recommendations based on user requirements. Through a user-adapted interface, users can change the recommendation results, and the feedback mechanism improves the system's accuracy for later recommendations [12].
Another study focusing on social-media-based recommendation presents a method for a city travel recommendation system. The researchers applied principles from both content-based and collaborative filtering techniques. User preferences were mined from an archive of community-contributed geotagged photos, and user similarities were taken into account to improve the accuracy of the model [13].
Table 1 lists all of the mentioned studies by publication year, title, authors and methods.

Material and Methods
This section provides details about the data gathering and preprocessing steps. The algorithms used to process the data set are also explained.

Data gathering and preprocessing
The initial data set of customer flights and their details was obtained from the database of an existing travel platform. Because the travel platform works with several data sources, the retrieved data had to be extracted from a nested XML structure. XPath and T-SQL expressions were used to convert the nested XML structure to a tabular format. After the conversion, all identity columns were removed from the data set. As a result of this processing, 12 attributes were retrieved. These include the city from which the passenger departs and the following attributes:

ArrivalCity: The city at which the passenger arrives.
DepartureAirline: Airline company for the departure flight.
DepartureAirlineClass: Flight class for the departure flight.
ReturningDepartureCity: The city from which the passenger returns.
ReturningArrivalCity: The city to which the passenger returns.
ReturningAirlline: Airline company for the returning flight.
ReturningAirlineClass: Flight class for the returning flight.
ReturnDate: Returning date of the trip.
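The conversion from nested XML to a tabular format can be illustrated with a minimal Python sketch. The study itself used XPath together with T-SQL; the version below only shows the same idea with the limited XPath support of the Python standard library. The XML layout, the element nesting and the sample values are hypothetical, since the structure of the platform's data is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested booking records; the nesting and values are
# illustrative assumptions, not the travel platform's real schema.
SAMPLE_XML = """
<Bookings>
  <Booking>
    <Departure>
      <ArrivalCity>CityA</ArrivalCity>
      <DepartureAirline>AirlineA</DepartureAirline>
    </Departure>
    <Return>
      <ReturnDate>2014-05-12</ReturnDate>
    </Return>
  </Booking>
  <Booking>
    <Departure>
      <ArrivalCity>CityB</ArrivalCity>
      <DepartureAirline>AirlineB</DepartureAirline>
    </Departure>
    <Return>
      <ReturnDate>2014-06-01</ReturnDate>
    </Return>
  </Booking>
</Bookings>
"""

# Column name -> XPath expression relative to each <Booking> element.
COLUMN_PATHS = {
    "ArrivalCity": "./Departure/ArrivalCity",
    "DepartureAirline": "./Departure/DepartureAirline",
    "ReturnDate": "./Return/ReturnDate",
}


def flatten_bookings(xml_text):
    """Return one flat dict per <Booking>; missing fields become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for booking in root.findall("./Booking"):
        rows.append({col: booking.findtext(path)
                     for col, path in COLUMN_PATHS.items()})
    return rows


rows = flatten_bookings(SAMPLE_XML)
for row in rows:
    print(row)
```

Each resulting dictionary corresponds to one row of the tabular data set. Fields that are absent in the XML come back as None, which makes records with missing attributes easy to filter out afterwards.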
Records with missing attributes and duplicate entries were also removed from the data set, leaving a total of 3213 records for processing. So that the data could be used with the Xmeans and ANFIS algorithms, nominal data was converted to numeric data and normalized to a common range (between 0 and 1).
The final data set was used to discover clusters. After a cluster was obtained for each record, 66% of the data was used for training and the remaining 34% for testing the prediction models.
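The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the integer coding of nominal values, min-max scaling as the normalization method, and a shuffled split with a fixed seed are choices made here for the example, because the text does not specify how the encoding or the split was actually performed. The function names and the city labels are hypothetical.

```python
import random


def encode_nominal(values):
    """Map each distinct nominal value to an integer code (sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def min_max_normalize(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(records, train_fraction=0.66, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical nominal column.
cities = ["CityA", "CityC", "CityB", "CityA"]
normalized = min_max_normalize(encode_nominal(cities))
print(normalized)                     # [0.0, 1.0, 0.5, 0.0]

# Split indices standing in for the 3213 records of the data set.
train, test = train_test_split(range(3213))
print(len(train), len(test))          # 2121 1092
```

Integer coding imposes an artificial order on nominal values such as city names. That is a known limitation of this simple encoding and is accepted here only to keep the sketch short.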

Used methods
This section contains brief information about the methods which were used in this study.

Xmeans clustering
K-means clustering is a simple but popular approach for finding clusters in a given data set. It has important shortcomings, however, such as the need to provide the number of clusters in advance and the random placement of the initial cluster centers. Xmeans clustering was proposed in [14] to overcome these drawbacks. It extends K-means with an efficient estimation of the number of clusters: the algorithm searches the space of cluster locations and numbers of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure [14].
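For orientation, the BIC score used to compare candidate clusterings is commonly written in the form below. The symbols are introduced here for illustration and are not defined elsewhere in this paper: M_j is a candidate model (a set of cluster centers), \hat{\ell}_j(D) is the log-likelihood of the data D at the maximum-likelihood point, p_j is the number of free parameters of the model and R is the number of data points. Under this sign convention, the model with the highest score is preferred.

```latex
\mathrm{BIC}(M_j) = \hat{\ell}_j(D) - \frac{p_j}{2}\,\log R
```

The second term penalizes models with more parameters, so adding a cluster is accepted only when the gain in likelihood outweighs the added complexity.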

Fuzzy c-means clustering
In fuzzy c-means clustering, every point has a degree of belonging to each cluster rather than belonging to exactly one. Points on the edge of a cluster may therefore be shared with other clusters. For any point x, a set of coefficients gives its degree of membership in each cluster. The centroid of a cluster is the mean of all points, weighted by their degree of membership in that cluster. The degree of membership is inversely related to the distance from x to the cluster center, and it also depends on a parameter that controls how much weight is given to the closest center [15,16].
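The two update rules described above (membership-weighted centroids and distance-based memberships) can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: the fuzzifier value m = 2, the fixed number of iterations, the random initialization and the sample points are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples.

    m is the fuzzifier (> 1); it controls how much weight the closest
    center receives when memberships are computed.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centroid: mean of all points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Membership: inversely related to the distance to each center.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            for j in range(n_clusters):
                u[i][j] = 1.0 / sum(
                    (dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists
                )
    return centers, u


# Hypothetical normalized points forming two well-separated groups.
points = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
          (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(labels)
```

Taking the cluster with the highest membership for each record, as in the last lines, is one simple way to turn fuzzy memberships into the crisp cluster labels that a classifier can be trained on.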

Adaptive neuro fuzzy inference system (ANFIS)
ANFIS is a neuro-fuzzy system that combines neural networks and fuzzy systems. A fuzzy logic system can be described as a non-linear mapping from the input space to the output space. This mapping is performed by converting the inputs from the numerical domain to the fuzzy domain. First, fuzzy sets and fuzzifiers are used to convert the inputs. Fuzzy rules and the fuzzy inference engine are then applied in the fuzzy domain, and the result is transformed back to the numerical domain by defuzzifiers. In ANFIS, Gaussian functions are used for the fuzzy sets and linear functions for the rule outputs. The means and standard deviations of the membership functions and the coefficients of the output linear functions serve as the network parameters. The last (rightmost) node of the network computes the summation of the outputs. The rules are fuzzy if-then rules of the Sugeno fuzzy model [17,18,19].
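The forward pass of such a system can be sketched as follows. The example evaluates a first-order Sugeno model with Gaussian membership functions and linear rule outputs, as described above. It does not include the training procedure that tunes the parameters. The rule structure, the parameter values and the name `sugeno_forward` are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def sugeno_forward(x, rules):
    """Evaluate a first-order Sugeno fuzzy system for the input vector x.

    Each rule is a dict with
      'mf':   one (mean, sigma) pair per input (Gaussian fuzzy sets),
      'coef': linear output coefficients [p_1, ..., p_n, bias].
    """
    strengths, outputs = [], []
    for rule in rules:
        # Firing strength: product of the membership degrees of all inputs.
        w = 1.0
        for xi, (mean, sigma) in zip(x, rule["mf"]):
            w *= gaussian(xi, mean, sigma)
        strengths.append(w)
        # Rule output: linear function of the inputs.
        *weights, bias = rule["coef"]
        outputs.append(sum(p * xi for p, xi in zip(weights, x)) + bias)
    # Final node: sum of rule outputs weighted by normalized strengths.
    total = sum(strengths)
    return sum(w * f for w, f in zip(strengths, outputs)) / total


# Two hypothetical rules over two normalized inputs.
rules = [
    {"mf": [(0.0, 0.3), (0.0, 0.3)], "coef": [1.0, 1.0, 0.0]},
    {"mf": [(1.0, 0.3), (1.0, 0.3)], "coef": [0.0, 0.0, 2.0]},
]
y_low = sugeno_forward([0.0, 0.0], rules)
y_mid = sugeno_forward([0.5, 0.5], rules)
y_high = sugeno_forward([1.0, 1.0], rules)
print(y_low, y_mid, y_high)
```

The means and sigmas in 'mf' and the coefficients in 'coef' are the network parameters mentioned in the text. Training would adjust them so that the output matches the target cluster labels.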

Radial basis function networks (RBFN)
A radial basis function network (RBFN) is a neural network model that uses radial basis functions (RBF) as its activation functions. The network output is a linear combination of the radial basis functions of the inputs, weighted by the neuron parameters. These networks contain three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The input of the network can be a vector of real numbers [20].
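A minimal sketch of the three-layer computation is given below. It assumes Gaussian basis functions with a single shared width, which is a common choice but is not stated in the text. The centers, width and weights are hypothetical values and not trained parameters.

```python
import math


def rbf_forward(x, centers, width, weights, bias=0.0):
    """Compute the output of a radial basis function network.

    Hidden layer: one Gaussian RBF per center, applied to the distance
    between the input vector and that center.
    Output layer: linear combination of the hidden activations.
    """
    hidden = [
        math.exp(-math.dist(x, c) ** 2 / (2.0 * width ** 2))
        for c in centers
    ]
    return sum(w * h for w, h in zip(weights, hidden)) + bias


# Hypothetical network with two hidden units.
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [1.0, -1.0]
out_a = rbf_forward((0.0, 0.0), centers, width=0.2, weights=weights)
out_b = rbf_forward((1.0, 1.0), centers, width=0.2, weights=weights)
print(out_a, out_b)
```

An input that lies on a center activates mainly that hidden unit, so the output is close to the corresponding weight. Inputs far from every center produce an output close to the bias.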

Bayesian network
Like logistic regression models, Bayesian networks produce probability estimates rather than plain predictions as their output. For each class value, the network estimates the probability that a given instance belongs to that class. Probability estimates are more useful than plain predictions because they allow the predictions to be ranked. In a Bayesian network, the conditional probability distribution of the class attribute is predicted given the values of the other attributes [20].
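The idea of ranking classes by probability estimates can be illustrated with naive Bayes, the simplest Bayesian network structure, in which the class attribute is the only parent of every other attribute. This is a swapped-in simplification: the network structure used in the study is not described here, and the Laplace smoothing, the attribute values and the class names below are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> counts
    domains = defaultdict(set)            # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def class_probabilities(row, model):
    """Return a normalized probability estimate for every class."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        p = count / n                      # prior probability of the class
        for i, value in enumerate(row):
            # Conditional probability with Laplace (add-one) smoothing.
            p *= (value_counts[(i, label)][value] + 1) / (count + len(domains[i]))
        scores[label] = p
    total = sum(scores.values())
    return {label: p / total for label, p in scores.items()}


# Hypothetical records: (flight class, trip length) with cluster labels.
rows = [("economy", "short"), ("economy", "short"),
        ("business", "long"), ("business", "short")]
labels = ["cluster0", "cluster0", "cluster1", "cluster1"]
model = train_naive_bayes(rows, labels)
probs = class_probabilities(("economy", "short"), model)
ranking = sorted(probs, key=probs.get, reverse=True)
print(probs, ranking)
```

Because the output is a probability per class rather than a single label, the classes can be sorted by likelihood, which is the ranking property mentioned in the text.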

ID3
ID3 is an algorithm for generating decision trees. Each node of the tree represents a non-categorical attribute, and each arc corresponds to a possible value of that attribute. A leaf specifies the expected value of the categorical (class) attribute for the records described by the path from the root to that leaf. Entropy is used to measure how informative a node is: each node should be associated with the most informative non-categorical attribute not yet considered on the path from the root [20].
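The attribute selection step can be sketched with entropy and information gain. The sketch covers only the choice of the most informative attribute at a single node, not the recursive construction of the whole tree. The attribute names and records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the records on attr."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, candidates):
    """Pick the most informative attribute among the candidates."""
    return max(candidates, key=lambda a: information_gain(rows, labels, a))


# Hypothetical records with two attributes and a cluster label each.
rows = [
    {"flight_class": "economy", "airline": "A"},
    {"flight_class": "economy", "airline": "B"},
    {"flight_class": "business", "airline": "A"},
    {"flight_class": "business", "airline": "B"},
]
labels = ["cluster0", "cluster0", "cluster1", "cluster1"]
gain_class = information_gain(rows, labels, "flight_class")
gain_airline = information_gain(rows, labels, "airline")
chosen = best_attribute(rows, labels, ["flight_class", "airline"])
print(gain_class, gain_airline, chosen)
```

In this toy data the flight class separates the two clusters perfectly, so it has the highest gain and would be placed at the root. ID3 then repeats the same selection on each branch with the remaining attributes.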

Results and Discussions
The Xmeans and fuzzy c-means clustering algorithms were applied separately to the same data set. After clusters were obtained with each algorithm, the ANFIS, RBFN, ID3 and Bayesian network algorithms were run on each clustered data set. According to the results in Table 3, the lowest RMSE and highest correctness values are obtained by combining ANFIS with fuzzy c-means clustering, whereas the highest sensitivity value is obtained by combining ID3 with fuzzy c-means clustering. Comparing the two clustering algorithms, fuzzy c-means yields better results, which suggests that a prediction model built on its clusters will be more accurate.
The results indicate that combining the fuzzy c-means clustering algorithm with the ANFIS method is a suitable approach for implementing an expert system that classifies users and proposes flight destinations to them. Based on the system output, a recommender engine can propose hotels and related services to a target user. Figure 1 shows the architecture of such a recommender system. As shown in the diagram, data is extracted from various databases. The customer clustering, classification and rule generation steps are performed on the extracted data set. The product recommendation engine can then be used to propose products or to plan specific campaigns for desired user groups.
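The evaluation measures named in this section can be computed as in the following sketch. It assumes that "correctness" denotes the share of correctly classified records (accuracy) and that sensitivity, specificity and precision are computed for one class treated as positive. The paper does not state how the measures were aggregated over several clusters, and the label vectors are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two numeric sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def binary_metrics(actual, predicted, positive):
    """Confusion-matrix measures with one class treated as positive.

    Assumes every denominator is non-zero, i.e. both classes occur in
    the actual labels and the positive class is predicted at least once.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "correctness": (tp + tn) / len(pairs),
    }


# Hypothetical cluster labels for eight test records.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
error = rmse(actual, predicted)
metrics = binary_metrics(actual, predicted, positive=1)
print(error, metrics)
```

With more than two clusters, the same function can be applied once per cluster in a one-versus-rest fashion and the results averaged. That is one possible convention and not necessarily the one behind Table 3.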

Conclusion
In this study, various algorithms were compared for clustering and classifying a customer flight data set. An architecture was also proposed for an expert system that can generate recommendations or plan specific campaigns for desired user groups. As stated in the previous section, the best clustering and classification results are obtained by combining the ANFIS and fuzzy c-means algorithms. An expert system that produces efficient recommendations can therefore be implemented by using these two algorithms together.

Table 1. Publications by years

Table 2 lists the data set attributes.

Table 2. Data set attributes

Table 3 lists root mean squared error (RMSE), sensitivity, specificity, precision and correctness values for each of the methods mentioned above.