Model to Predict Survival of Transportation and Shipping Companies

This article addresses the evaluation of the economic sustainability of transportation and shipping companies. The economic crisis, which is now abating in many countries, has affected all industries, including transportation and shipping. This important, multifunctional sector of the national economy was hit very hard by the recession, and the consequences are still visible. For this reason there is a growing demand for methods that allow comprehensive evaluation of businesses. The article contributes to the topic by describing a procedure for developing a model for comprehensive evaluation of transportation and shipping companies. The model makes it possible to monitor, evaluate and signal the financial health and performance of such companies. Finally, the model is validated and its predictive power is assessed. The proposed model is an output of research conducted by the Institute of Technology and Business in České Budějovice, Czech Republic.


INTRODUCTION
In 2008, the economic crisis spread worldwide from the United States. Companies got into financial distress and started going bankrupt. A domino effect then led to a situation in which companies and managers in practically all sectors had to deal with these problems. Transportation and shipping was one of the industries struck most severely by the crisis. This was caused, among other factors, by the multifunctional character of this important sector of the national economy and by demonstrable negative synergic impacts of other industries on transportation and shipping activities. As a result, demand from these companies for models capable of predicting their ability to survive financial distress increased significantly.
Existing methods, such as the Altman Z-score, Neumaier's indexes, the Tamari model, the Taffler index, the Grünwald index and the Kralicek quick test, treat companies from various industries identically. They do not distinguish between production companies, building contractors and transportation and shipping companies. However, application in the transportation sector has shown the need to reflect the specific features of the sector in which a company operates.
Therefore a hypothesis can be formulated that a model developed for transportation and shipping companies will be more accurate and its predictive power will be stronger.

ANALYSIS OF THE CURRENT STATE OF THE RESEARCHED PROBLEM
Traditional statistical methods were for many years widely used for one-dimensional discriminant analyses [1], [2], [3], which made it possible to divide companies into so-called failing and prospering ones. A much more widespread statistical method is multi-dimensional discriminant analysis, followed by logit analysis [4]. Numerous classical statistical methods were developed for bankruptcy prediction; they were described in particular by the following authors: Mohd-Sulaiman [1], Garcia-Gallego & Mures-Quintana [5], Teng & Bhatia & Anwar [6], Sun & Huang & He [7], Waszkowski [8], Yazdanfar [9], De Andrés & Landajo & Lorca [10], Lin & Yeh & Lee [11] and Wu & Hsu [4].

The methods can be briefly characterized as follows. Recent bankruptcy prediction models were developed using logit analysis, probit analysis and linear probability models (LPM). These methods were used to develop conditional probability models [1], [12], consisting of the combination of variables that best discriminates between the groups of failing and prospering companies. Non-linear maximum likelihood estimation was used to estimate the parameters of the following logit model [13], [14] (Eq. 1):

$$P_1(X_i) = \frac{1}{1 + \exp\left[-(B_0 + B_1 X_{i1} + B_2 X_{i2} + \dots + B_n X_{in})\right]} = \frac{1}{1 + \exp(-D_i)} \tag{1}$$

where $P_1(X_i)$ = probability of failure given the vector of attributes $X_i$; $B_j$ = coefficient of attribute $j$, for $j = 1, \dots, n$, and $B_0$ = intercept; $X_{ij}$ = value of attribute $j$ (for $j = 1, \dots, n$) for company $i$; $D_i$ = "logit" for company $i$.
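As a minimal sketch of how the logit model in Eq. 1 turns attribute values into a failure probability, the following Python fragment evaluates $P_1(X_i)$ for one company. The function name, the coefficients and the attribute values are hypothetical and chosen only for illustration; they are not taken from any of the cited models.

```python
import math

def failure_probability(coefficients, intercept, attributes):
    """Evaluate P1(X_i) = 1 / (1 + exp(-D_i)), D_i = B0 + sum_j Bj * X_ij."""
    if len(coefficients) != len(attributes):
        raise ValueError("coefficients and attributes must have equal length")
    logit = intercept + sum(b * x for b, x in zip(coefficients, attributes))
    # Two algebraically equal forms keep exp() from overflowing.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

# Hypothetical coefficients B0, B1, B2 and attribute values for one company.
B0 = -1.0
B = [0.8, -0.5]
X_i = [1.5, 0.4]
p = failure_probability(B, B0, X_i)
print(round(p, 3))
```

With these illustrative numbers the logit $D_i$ is approximately zero, so the sketch returns a probability close to one half.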

OBJECTIVE, WORK PROCEDURE AND METHODOLOGY
The objective is to propose a procedure for developing a model to predict the survival of transportation and shipping companies. The testing set consisted of transportation companies and was made up of absolute and relative indicators of all companies dealing with transport in the Czech Republic (section H, classification CZ-NACE) in 2003-2012. The data set was generated from the Albertina database and contained 12,930 rows. Each row contained 150 characteristics of a company (figures from the financial statements, indicators of profitability, activity, liquidity, indebtedness etc.).
The following assumptions were used to devise the new model [15][16][17][18]:
- A bankruptcy prediction model will be developed (the dependent variable will take only the values 0 or 1).
- Absolute indicators will be used for the model.
- The analyzed data do not have to follow the normal distribution.
- Model development will use an iterative process repeated in cycles to ensure its further improvement.
- The analyzed group is a representative sample of the investigated population.
- The objective is to create a model that is as simple as possible and that explains the behavior of the dependent variable sufficiently well. The dependent variable is binary.
- The model will have generalizing properties.
- Quantitative variables of a discrete nature (e.g. numbers of employees) will be treated as continuous variables (the software used does not permit a better approach).
- Every model is, in its own way and to a smaller or greater extent, insufficient, inaccurate and distorting of reality, but some models are (or can be) more useful than others.

MATHEMATICAL MODEL OF THE PROBLEM AND ITS SOLUTION
With regard to the characteristic and specific features of transportation and shipping companies, attention was focused on binary logistic regression.

MATHEMATICAL MODEL OF THE ASSIGNMENT PROBLEM
In this case, the dependent variable can take only two values. The binary dependent variable in our context is defined as follows:

$$Z = \begin{cases} 1 & \text{if the outcome is a "success"} \\ 0 & \text{otherwise} \end{cases}$$

with the probabilities $\Pr(Z=1)=\pi$ and $\Pr(Z=0)=1-\pi$. If we have $n$ such mutually independent quantities $Z_1, Z_2, \dots, Z_n$, where $\Pr(Z_j=1)=\pi$ for $j=1,2,\dots,n$, then a random quantity $Y$ can be defined as the sum of all $n$ random quantities $Z_j$, $j=1,2,\dots,n$. The quantity $Y$ therefore represents the total number of "successes" in $n$ completed trials. In this case the probability function of $Y$ can be expressed as follows (Eq. 2):

$$\Pr(Y = y) = \binom{n}{y} \pi^{y} (1-\pi)^{n-y}, \quad y = 0, 1, \dots, n \tag{2}$$

We can consider an even more general situation in which we obtain, from experiments or observations, $n$ such independent values $y_1, y_2, \dots, y_n$. These values represent the numbers of "successes" in $n$ different groups. If we then want to study the relative frequencies of "successes" in the individual groups as a function of various explanatory variables, we can use generalized linear models and determine the probability $\pi_i$ through the model (Eq. 3):

$$g(\pi_i) = \beta' x_i \tag{3}$$

The symbol $x_i$ represents a column vector of explanatory variables for the $i$-th observation and $\beta$ is a column vector of the parameters we look for. The function $g(\cdot)$ is the so-called link function [14]. The simplest way to predict the relative frequency of success is to assume that the link function $g(\cdot)$ is the identity function. This results in the classical linear model $\pi_i = \beta' x_i$. However, this model is not very suitable, because its fitted values are not restricted to the interval $[0, 1]$; instead one can use a cumulative distribution function [17], [18] (Eq. 4):

$$\pi = \int_{-\infty}^{t} f(s)\, ds, \quad \text{where } f(s) \ge 0 \text{ and } \int_{-\infty}^{\infty} f(s)\, ds = 1 \tag{4}$$

The probability density function $f(s)$ is called the tolerance distribution. Choosing a suitable tolerance distribution yields, for example, logit or probit models [15]. For probability modeling it is convenient to use the logit link function, and the resulting model can then be written as follows (Eq. 5):

$$\log\frac{\pi_i}{1-\pi_i} = \beta' x_i \tag{5}$$

The goal of logistic regression is to model the conditional mean of the dependent variable $y$ at given values of $x$, i.e. $E(y \mid x)$, by means of a logistic function. This can be formally expressed as follows (Eq. 6):

$$E(y \mid x) = \pi(x) = \frac{\exp(\beta' x)}{1 + \exp(\beta' x)} \tag{6}$$

To estimate the unknown parameters $\beta$ of the logistic model, it is possible to use the maximum likelihood method [16]. The principle of the method is as follows. If we consider one pair of measurements obtained by observation or experiment, i.e. $(x_i, y_i)$, then the contribution of the information contained in this pair to the likelihood function can be expressed as follows (Eq. 7):

$$\pi(x_i)^{y_i}\,\left[1 - \pi(x_i)\right]^{1 - y_i} \tag{7}$$

because $y_i \sim \mathrm{Bi}(1, \pi(x_i))$. If we further assume that the individual measurements are mutually independent, then the likelihood function can be expressed as the product of all those contributions, because it is essentially a joint probability function. We obtain the following likelihood function (Eq. 8):

$$L(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i}\,\left[1 - \pi(x_i)\right]^{1 - y_i} \tag{8}$$

[...] No. 1; the variables were then selected again and further likelihood ratio tests followed (see Tables 2 and 3). The resulting model to predict the survival of transportation and shipping companies was then formulated as follows (Eq. 9):

$$\begin{aligned}
& 0.095064582 \\
& - 0.061965429 \times \text{equity capital in thousands CZK}_i \\
& - 0.44632997 \times \text{registered capital in thousands CZK}_i \\
& - 1.062014871 \times \text{current liabilities}_i \\
& - 0.002490906 \times \text{profit/loss from ordinary activities}_i \\
& - 0.536248625 \times \text{share of receivables on current assets in \%}_i \\
& - 0.25903599 \times \text{payment time of payables from trade in days}_i \\
& - 0.095270924 \times \text{quick test}_i \\
& + 0.194857318 \times \text{index of working capital in \%}_i \\
& + 0.205345429 \times \text{capital coefficient of added value in \%}_i \\
& - 0.149379695 \times \text{long-term liabilities in percent of liabilities in \%}_i \\
& - 0.213213898 \times \text{current liabilities in percent of liabilities in \%}_i \\
& - 0.191000612 \times \text{long-term credits and loans in percent of liabilities in \%}_i
\end{aligned} \tag{9}$$
The threshold value 0.523748 was determined on the basis of a sensitivity analysis. Transportation companies with a model value of 0.523748 or higher will probably survive potential financial distress. Companies with a lower model value are expected to experience problems and go bankrupt.
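The estimation and classification steps described in this section can be illustrated by a short Python sketch. It maximizes the likelihood of Eq. 8 by Newton-Raphson iterations for a logistic model with an intercept and a single explanatory variable, and then applies the threshold 0.523748. Everything else here is an assumption made for illustration: the data are synthetic, the function names are hypothetical, the outcome 1 is taken to mean "survives", and the threshold is applied on the probability scale, which the text does not state explicitly. This is not the authors' 12-variable model or their software.

```python
import math

def sigmoid(t):
    """Logistic function exp(t) / (1 + exp(t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_likelihood(beta, xs, ys):
    """Logarithm of the likelihood in Eq. 8 for the model beta0 + beta1 * x."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def score(beta, xs, ys):
    """Gradient of the log-likelihood: sum over i of (y_i - pi_i) * (1, x_i)."""
    b0, b1 = beta
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        residual = y - sigmoid(b0 + b1 * x)
        g0 += residual
        g1 += residual * x
    return g0, g1

def fit_newton(xs, ys, iterations=25):
    """Maximum likelihood estimate of (beta0, beta1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score((b0, b1), xs, ys)
        # Information matrix [[a, b], [b, d]] with weights pi_i * (1 - pi_i).
        a = b = d = 0.0
        for x in xs:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            a += w
            b += w * x
            d += w * x * x
        det = a * d - b * b
        # Newton step: inverse of the 2x2 information matrix times the score.
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

# Synthetic, non-separable data (hypothetical): 1 = survived, 0 = bankrupt.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
beta_hat = fit_newton(xs, ys)

THRESHOLD = 0.523748  # value reported in the text; its scale is assumed here
survives = [sigmoid(beta_hat[0] + beta_hat[1] * x) >= THRESHOLD for x in xs]
print(beta_hat, survives)
```

The synthetic outcomes overlap along `x` on purpose: with perfectly separated data the maximum likelihood estimate does not exist and the iterations would diverge.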

VALIDATION OF THE MODEL ON A SPECIFIC CASE SOLUTION IN PRACTICE
The quality of the model was evaluated with the Hosmer-Lemeshow test, the confusion matrix and the Receiver Operating Characteristic (ROC) curve. The following items had the best results in the likelihood ratio test of type 1 (probability less than 0.05):
1. Registered capital - thous. CZK
2. Payables from trade
3. Current liabilities total
4. Financial profit/loss - thous. CZK
5. Profit/loss from ordinary activities - thous. CZK
6. Share of receivables on current assets - %
7. Repayment time of payables from trade - days
8. Payroll expenses per employee - thous. CZK/month
9. Quick test
10. Working capital index - %
11. Capital coefficient of added value - %
12. Other assets in % of assets - %
13. Current liabilities in % of liabilities - %
14. Bank credits and loans in % of liabilities - %
The results of the Hosmer-Lemeshow test and the other characteristics indicate that the proposed model is suitable and informative for the transportation and shipping sector. The confusion matrix for the training data is provided in Table No. 3. Its elements are used to establish relative indicators of classification effectiveness. The first, and probably the most important, indicator is the so-called accuracy of classification, defined as the ratio of the sum of the absolute frequencies on the main diagonal to the total number of classified cases. This can be formally expressed as follows (Eq. 10):

$$\text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{10}$$

As implied by Table No. 3, the ratio for our model is (379+10)/(379+4+31+10) = 0.917453, i.e. 91.7453 %.

Table 3 Confusion matrix for training data

The confusion matrix obtained from the training data provides a slightly biased estimate of the efficiency of the developed model, and therefore it is necessary to use a set of validation data. The confusion matrix for the validation data is provided in Table No. 4, which shows that the estimated classification efficiency of our regression model is 90.9 %.
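The accuracy calculation above can be reproduced with a few lines of Python. The sketch only uses the four counts quoted in the text: 379 and 10 on the main diagonal, 4 and 31 off it. Because Table 3 itself is not reproduced here, the counts are not assigned to individual cells, and the function name is hypothetical.

```python
def classification_accuracy(correct_counts, incorrect_counts):
    """Ratio of correctly classified cases to all classified cases."""
    correct = sum(correct_counts)
    total = correct + sum(incorrect_counts)
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return correct / total

# Main-diagonal and off-diagonal counts quoted for the training data.
accuracy = classification_accuracy([379, 10], [4, 31])
print(f"{accuracy:.6f}")  # 0.917453
```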

RECEIVER OPERATING CHARACTERISTIC
Another way to evaluate the predictive power of our model is the so-called Receiver Operating Characteristic (ROC) curve. The predictive power of the model is to a certain extent also affected by the selected threshold value; the threshold should therefore be chosen so that the corresponding point on the ROC curve is as close as possible to the point [0;1]. The diagram shows that the ROC curve and the confusion matrix lead to the same conclusion, namely that the efficiency of the model is 91.7453 %.
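The threshold selection rule described above, choosing the ROC point closest to [0;1], can be sketched as follows. The scores and labels are hypothetical and serve only to show the mechanics; a label of 1 marks the positive class and a case is classified as positive when its score is at least the threshold. The function names are illustrative and do not come from the authors' software.

```python
import math

def roc_points(scores, labels):
    """Return (threshold, false positive rate, true positive rate) triples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points

def closest_to_ideal(points):
    """Pick the ROC point with the smallest Euclidean distance to (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], p[2] - 1.0))

# Hypothetical model scores and observed classes.
scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1]
best = closest_to_ideal(roc_points(scores, labels))
print(best)  # (0.6, 0.0, 0.75)
```

Euclidean distance to the corner is only one possible criterion; it weights false positives and false negatives equally, which may not match the costs faced by a creditor or an owner.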

ASSESSING THE BENEFITS OF THE SOLUTIONS FOR TRANSPORT PRACTICE AND SCIENTIFIC KNOWLEDGE
The structure of the proposed model for transportation and shipping companies respects their specific features and their differences from other industries of the national economy. This is reflected in the efficiency of the developed model, which is nearly 92 %. In agreement with the achieved results we can conclude that the proposed model is able to predict the financial development of transportation and shipping companies in the Czech Republic [16][17][18][19].
To a certain extent the model offers a methodical procedure for potential implementation in other countries. This is not to say that the model presented herein is applicable worldwide; however, the proposed procedure for constructing it is. If the predictive power obtained with a model is greater than 50 %, we can conclude that the model is valid. The developed methodology can be used to devise a model applicable worldwide for predicting the survival of transportation and shipping companies, bearing in mind that the predictive power of such a model will be significantly limited by the selected data set.

CONCLUSION
The review of specialized literature has shown that no method has been available to quickly predict the survival of transportation and shipping companies. The solution followed the formulated hypothesis that a model taking into account the specific features and characteristics of transportation and shipping companies will be more accurate and its predictive power will be higher. Our results have shown that the new model is highly efficient and has high predictive power, and therefore it is valid. When developing the model we used binary logistic regression, analyzed a data set from Czech transportation and shipping companies and defined the assumptions used for model development.
The selected methodology was used to find a model with a high predictive power of nearly 92 %, compared with the originally established requirement of predictive power greater than 50 %. We can conclude with near certainty that the proposed model is able to predict the survival of companies in the Czech transportation and shipping sector. Outputs from the model can be used both by the management of the companies concerned, as high-level indicators for company management, and by other stakeholders, i.e. creditors, owners, competitors and others [18]-[20].
A challenge to be addressed in the future is to develop similar models for individual countries in Europe and worldwide, which need to take into account national, regional and local conditions. However, it would be much more complicated to devise a model applicable worldwide that would overcome the specific geographic features of transportation and shipping companies. Such a solution would require a high-quality selection of the data set.
... of the data set
b) Primary screening of variables
c) Model development

Logistic regression was used and its results were analyzed by means of significance tests of the regression coefficients. Subsequently, the model was evaluated for suitability and predictive power. The tests of likelihood ratio type 1 are shown in Table ...

... the confusion matrix - contingency tables have their own names:
a) TP - True positive (Observed YES; Predicted YES): number of correctly classified cases of bankruptcy.
b) FP - False positive (Observed NO; Predicted YES): number of incorrectly classified cases of bankruptcy.
c) TN - True negative (Observed NO; Predicted NO): number of correctly classified cases in which the company is OK.
d) FN - False negative (Observed YES; Predicted NO): number of incorrectly classified cases in which the company is OK.
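As a small illustration of these four cells, the following Python sketch counts them from paired observed and predicted outcomes. It assumes that bankruptcy is the positive class (YES), so that FP counts companies predicted to go bankrupt that in fact did not, and FN counts companies predicted to be OK that in fact went bankrupt. The function name and the sample data are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Count TP, FP, TN and FN with bankruptcy (True) as the positive class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred and obs:
            tp += 1  # predicted bankruptcy, bankruptcy observed
        elif pred and not obs:
            fp += 1  # predicted bankruptcy, company was OK
        elif not pred and not obs:
            tn += 1  # predicted OK, company was OK
        else:
            fn += 1  # predicted OK, bankruptcy observed
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical outcomes for six companies (True = bankruptcy).
observed = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
counts = confusion_counts(observed, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```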

Figure 1 Diagram of the Receiver Operating Characteristic (ROC) curve for the training data set. Source: authors

Table 4 Confusion matrix for the validation data