DiscreteChoiceModels.jl: High-performance scalable discrete choice models in Julia

Julia is a relatively new high-level dynamic programming language for numerical computing, with performance approaching C (Bezanson, Edelman, Karpinski, & Shah, 2017). This article introduces DiscreteChoiceModels.jl, a new open-source package for estimating discrete choice models in Julia.

DiscreteChoiceModels.jl has an intuitive syntax for specifying models, allowing users to write out their utility functions directly. For instance, the code below specifies the Swissmetro mode-choice example distributed with Biogeme (Bierlaire, 2020):

using DiscreteChoiceModels

multinomial_logit(
    @utility(begin
        # one utility function per alternative, keyed by the values of CHOICE
        1 ~ αtrain + βtravel_time * TRAIN_TT / 100 + βcost * (TRAIN_CO * (GA == 0)) / 100
        2 ~ αswissmetro + βtravel_time * SM_TT / 100 + βcost * SM_CO * (GA == 0) / 100
        3 ~ αcar + βtravel_time * CAR_TT / 100 + βcost * CAR_CO / 100

        # hold the Swissmetro ASC fixed at 0 rather than estimating it
        αswissmetro = 0, fixed
    end),
    :CHOICE,  # column containing the chosen alternative
    data,
    availability=[  # columns indicating which alternatives are available
        1 => :avtr,
        2 => :avsm,
        3 => :avcar,
    ]
)

Within the utility function specification (@utility), the first three lines give the utility functions for the three modes identified by the CHOICE variable: train, the hypothetical Swissmetro, and car. Any variable starting with α or β (easily entered in Julia as \alpha and \beta) is treated as a coefficient to be estimated, while other variables are assumed to be data columns. The final line specifies that the ASC for Swissmetro should have a starting value of 0 and be fixed rather than estimated. The remainder of the model specification gives the column containing the chosen alternative (CHOICE), the data to use, and, optionally, the columns indicating the availability of each alternative.
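The data argument can be prepared with standard Julia tooling; for example, a minimal sketch reading the Swissmetro data with CSV.jl and DataFrames.jl (the file name here is an assumption; Biogeme distributes the data as a tab-separated file):

using CSV, DataFrames

# hypothetical file name for the Swissmetro data distributed with Biogeme
data = CSV.read("swissmetro.dat", DataFrame, delim='\t')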

Features

DiscreteChoiceModels.jl currently supports estimating multinomial logit models; support for nested and mixed logit models, as well as prediction, is forthcoming. All optimization methods in Optim.jl (Mogensen & Riseth, 2018) are supported, including BFGS (the default), BHHH, Newton’s method, and Gradient Descent. Derivatives for optimization and for computing variance-covariance matrices are calculated exactly using automatic differentiation (Revels, Lubin, & Papamarkou, 2016), providing both performance and accuracy improvements over finite-difference approximations. Data can be read using either DataFrames.jl (most common) or Dagger, which provides the ability to scale model estimation across multiple nodes in a compute cluster. Both backends allow scaling across cores within a single machine.
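To illustrate this approach (a sketch using Optim.jl and ForwardDiff.jl directly, not DiscreteChoiceModels.jl internals), exact gradients and a variance-covariance matrix can be obtained as follows, with a toy objective standing in for the model’s negative log-likelihood:

using Optim, ForwardDiff, LinearAlgebra

# toy negative log-likelihood standing in for the model's objective
nll(β) = (β[1] - 1.0)^2 + (β[2] + 2.0)^2

# BFGS with exact gradients from forward-mode automatic differentiation,
# rather than finite-difference approximations
result = optimize(nll, zeros(2), BFGS(); autodiff = :forward)
β̂ = Optim.minimizer(result)

# variance-covariance matrix from the exact Hessian at the optimum
vcov = inv(ForwardDiff.hessian(nll, β̂))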

To help ensure algorithm correctness, DiscreteChoiceModels.jl has an automated test suite that compares estimation results against ground-truth results for the same models from other software. This test suite is run automatically on each change to the DiscreteChoiceModels.jl source code.
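A minimal sketch of the kind of check such a test performs (the numbers here are purely illustrative, not the package’s actual test code):

using Test

# illustrative values: a coefficient estimated by DiscreteChoiceModels.jl
# and the ground-truth value reported by other software for the same model
estimated = -0.70094
reference = -0.70100
@test estimated ≈ reference atol = 1e-3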

Performance

Julia is designed for high-performance computing, so a major goal of DiscreteChoiceModels.jl is to estimate models more quickly than other modeling packages. To that end, two models were developed and benchmarked in three packages: DiscreteChoiceModels.jl, Biogeme (Bierlaire, 2020), and Apollo (Hess & Palma, 2019), using default settings for each. The first model is the Swissmetro example from Biogeme, with 6,768 observations, 3 alternatives, and 4 free parameters. The second is a vehicle ownership model using the 2017 US National Household Travel Survey, with 129,696 observations, 5 alternatives, and 35 free parameters. All runtimes are the median of 10 runs, executed serially on a lightly loaded circa-2014 quad-core Intel i7 with 16 GB of RAM running Debian Linux 11.1. DiscreteChoiceModels.jl outperforms the other packages when used with a DataFrame; using Dagger is slower due to the overhead of a distributed computing system for a small model on a single machine.

                      DiscreteChoiceModels.jl
Model                 DataFrame    Dagger       Biogeme    Apollo
Swissmetro            188 ms       2,047 ms     252 ms     824 ms
Vehicle ownership     35.1 s       46.9 s       163.4 s    227.2 s

Table 1: Comparison of model runtimes from DiscreteChoiceModels.jl and other packages. Julia runtimes include time to interpret the model specification, but not time to compile the DiscreteChoiceModels.jl package.
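For reference, a minimal sketch of the median-of-10 timing protocol (a placeholder workload stands in for the estimation calls above; an initial warm-up run keeps compilation out of the timings):

using Statistics

# placeholder workload standing in for a model estimation call
run_model() = sum(abs2, rand(10^7))

run_model()  # warm-up so compilation is not included in the timings
runtimes = [@elapsed run_model() for _ in 1:10]
println(median(runtimes))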

Scalability

For extremely large models, a single machine may not be powerful enough to estimate the model, due to either RAM or processing constraints. Using the Dagger backend and Julia’s built-in distributed computing capabilities, it is possible to scale model estimation across multiple nodes in a compute cluster. This is expected to be especially valuable for computationally intensive mixed logit models.

Others have implemented distributed estimation by modifying the optimization algorithm (Gopal & Yang, 2013; Shi, Wang, & Zhang, 2019). DiscreteChoiceModels.jl takes a simpler approach: the data are divided into chunks, one per node in the cluster; for a given set of parameters, the log-likelihood of each chunk is computed where it resides, and the partial results are transmitted back to the main node and summed to produce the overall log-likelihood. This approach was also used by Zwaenepoel & Van de Peer (2019) in a model of gene duplication in tree species.
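A minimal sketch of this chunked computation using Julia’s built-in Distributed standard library (a toy binary logit likelihood, not DiscreteChoiceModels.jl internals; in the package itself, chunks remain resident on the workers via Dagger rather than being shipped with each call):

using Distributed
addprocs(4)  # local worker processes; on a cluster, a ClusterManager would be used

@everywhere function chunk_loglik(β, X, y)
    # log-likelihood of one data chunk under a toy binary logit model
    u = X * β
    return sum(y .* u .- log1p.(exp.(u)))
end

# illustrative data, split into one chunk per worker
X, y = randn(10_000, 3), rand(Bool, 10_000)
chunks = [(X[r, :], y[r]) for r in Iterators.partition(1:10_000, 2_500)]

# each chunk's log-likelihood is computed on a worker, transmitted back,
# and summed on the main node to give the overall log-likelihood
total_loglik(β) = sum(pmap(c -> chunk_loglik(β, c...), chunks))
total_loglik(zeros(3))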

References

Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A Fresh Approach to Numerical Computing. SIAM Review, 59(1), 65–98. https://doi.org/10.1137/141000671
Bierlaire, M. (2020). A short introduction to PandasBiogeme (Technical Report No. TRANSP-OR 200605). Lausanne: École Polytechnique Fédérale de Lausanne. Retrieved from https://transp-or.epfl.ch/documents/technicalReports/Bier20.pdf
Gopal, S., & Yang, Y. (2013). Distributed training of large-scale logistic models. In Proceedings of the 30th International Conference on Machine Learning. Atlanta, GA: PMLR.
Hess, S., & Palma, D. (2019). Apollo: A flexible, powerful and customisable freeware package for choice model estimation and application. Journal of Choice Modelling, 32, 100170. https://doi.org/10.1016/j.jocm.2019.100170
Mogensen, P. K., & Riseth, A. N. (2018). Optim: A mathematical optimization package for Julia. Journal of Open Source Software, 3(24), 615. https://doi.org/10.21105/joss.00615
Revels, J., Lubin, M., & Papamarkou, T. (2016). Forward-mode automatic differentiation in Julia. arXiv:1607.07892 [cs.MS]. Retrieved from https://arxiv.org/abs/1607.07892
Shi, P., Wang, P., & Zhang, H. (2019). Distributed Logistic Regression for Separated Massive Data. In H. Jin, X. Lin, X. Cheng, X. Shi, N. Xiao, & Y. Huang (Eds.), Big Data (pp. 285–296). Singapore: Springer. https://doi.org/10.1007/978-981-15-1899-7_20
Zwaenepoel, A., & Van de Peer, Y. (2019). Inference of Ancient Whole-Genome Duplications and the Evolution of Gene Duplication and Loss Rates. Molecular Biology and Evolution, 36(7), 1384–1404. https://doi.org/10.1093/molbev/msz088