Flight Passenger Prediction
The number of people using airports grows every year, which makes passenger handling, for example at check-in or at the security check, increasingly challenging. Passenger dissatisfaction is mostly due to problems with the handling of passengers and baggage at the airport. This applies in particular to delays at check-in, problems with seat allocation, lost or slowly delivered luggage, an inadequate flow of information, and poor treatment during unusual events such as missed flights, strikes and bad weather.
To reduce the dissatisfaction caused by delays at check-in and security screening, airports need to plan the required security personnel in advance. The aim is to have enough employees on hand to handle passengers as quickly as possible without long queues, while keeping employee utilization as high as possible. This requires an intelligent passenger forecast system that automatically provides the airport staff with all necessary information.
The most important information for personnel planning is the expected number of passengers for each time instance with a forecast horizon of several weeks or even months. This means that a demand forecast is required for the entire airport, including all routes and airlines.
As part of this project, Knowtion has developed a state-of-the-art algorithm pipeline that uses all airport information and data from the past and predicts the expected number of passengers for each flight in the next few months. This is used to calculate the expected number of passengers at the airport for each future time instance.
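The step from per-flight forecasts to an airport-wide load per time instance can be illustrated with a minimal sketch. It is not the pipeline described here; the 30-minute slot size, the 90-minute lead time before departure, and the names `slot_start` and `passengers_per_slot` are assumptions chosen for illustration, and the forecast values are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slot_start(ts, slot_minutes=30):
    """Round a timestamp down to the start of its time slot."""
    minutes = (ts.hour * 60 + ts.minute) // slot_minutes * slot_minutes
    return ts.replace(hour=minutes // 60, minute=minutes % 60,
                      second=0, microsecond=0)


def passengers_per_slot(flight_forecasts,
                        lead_time=timedelta(minutes=90),
                        slot_minutes=30):
    """Sum predicted passengers per time slot.

    flight_forecasts: iterable of (scheduled_departure, predicted_passengers).
    lead_time: assumed time before departure at which passengers
               reach the security check (hypothetical value).
    """
    totals = defaultdict(float)
    for departure, predicted in flight_forecasts:
        totals[slot_start(departure - lead_time, slot_minutes)] += predicted
    return dict(sorted(totals.items()))


# Made-up per-flight forecasts for one morning.
forecasts = [
    (datetime(2024, 7, 1, 8, 10), 150.0),
    (datetime(2024, 7, 1, 8, 25), 90.0),
    (datetime(2024, 7, 1, 9, 5), 120.0),
]
load = passengers_per_slot(forecasts)
```

In this sketch every passenger of a flight arrives in the same slot; a more realistic version would spread each flight's passengers over several slots with an arrival-time distribution.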
From a data processing point of view, several challenges must be solved to obtain a system that accurately and efficiently predicts the number of passengers for each individual flight. The algorithm pipeline has to take into account many uncertainties and special events such as strikes, weather conditions, holidays, unknown routes and schedule changes. Taking all these factors and their dependencies into account leads to a large feature vector for each flight. The algorithm pipeline must automatically find the factors that are relevant to an accurate prediction. For this purpose, modern data-driven machine learning models and algorithms are used, for example random forests and neural networks.
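To make the idea of a per-flight feature vector concrete, the following sketch encodes a few of the factors named above (day of week, holidays, strikes, unknown routes) as numbers. The calendars, the function name `flight_features` and the choice of features are hypothetical; the actual pipeline uses far more factors, and which of them matter is left to the model to determine.

```python
from datetime import date

# Hypothetical calendars; a real system would load these from data sources.
HOLIDAYS = {date(2024, 12, 25), date(2024, 12, 26)}
STRIKE_DAYS = {date(2024, 3, 7)}


def flight_features(flight_date, departure_hour, route_history_mean):
    """Encode one flight as a flat numeric feature vector.

    route_history_mean: mean passenger count on this route in the past,
                        or None for a route with no history.
    """
    weekday = [0.0] * 7
    weekday[flight_date.weekday()] = 1.0  # one-hot day of week, Monday first
    return weekday + [
        float(departure_hour),
        1.0 if flight_date in HOLIDAYS else 0.0,
        1.0 if flight_date in STRIKE_DAYS else 0.0,
        # Unknown routes have no history: use a neutral value and flag them.
        0.0 if route_history_mean is None else float(route_history_mean),
        1.0 if route_history_mean is None else 0.0,
    ]


# A 9 o'clock flight on a holiday, on a route without history.
features = flight_features(date(2024, 12, 25), 9, None)
```

Vectors of this kind could then be passed to a model such as a random forest, which can weigh the individual entries by their relevance to the prediction.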
The large feature vector and the large number of flights lead to a huge data set that has to be processed. Modern big data technologies were used to handle this amount of data efficiently. For example, to process all flights in parallel on multiple distributed cluster nodes, the algorithm pipeline was implemented on a distributed processing framework such as Spark. This enables an efficient passenger forecast for all flights.
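The underlying pattern of splitting the flights into partitions, predicting each partition independently and collecting the results can be sketched on a single machine. This example deliberately uses Python's standard `concurrent.futures` module as a local stand-in and not Spark; the placeholder model (a fixed 80 % load factor), the flight records and all names are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_partition(flights):
    """Placeholder model: seats times a fixed load factor, rounded."""
    return [(f["flight_id"], round(f["seats"] * 0.8)) for f in flights]


def partition(items, n_parts):
    """Split items into n_parts roughly equal chunks (round-robin)."""
    return [items[i::n_parts] for i in range(n_parts)]


# Made-up flight records.
flights = [{"flight_id": f"XX{i}", "seats": 100 + 10 * i} for i in range(6)]

# Each partition is handled by its own worker; in a cluster setting the
# workers would be separate nodes instead of local threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(predict_partition, partition(flights, 3)))

# Collect the per-partition results into one mapping.
predictions = dict(pair for chunk in results for pair in chunk)
```

Because the prediction for one flight does not depend on the prediction for another, the partitions can be processed in any order and on any number of workers, which is what makes this workload suitable for a distributed framework.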