{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1079054",
   "metadata": {},
   "outputs": [],
   "source": [
    "import emat, numpy, pandas\n",
    "emat.versions()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58840996",
   "metadata": {},
   "source": [
    "# Optimization Tools"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c685552",
   "metadata": {},
   "source": [
    "Typically, transportation policy planning models will be used to\n",
    "try to find policies that provide the \"best\" outcomes.  In a traditional\n",
    "analytical environment, that typically means using models to find \n",
    "optimal outcomes for performance measures.\n",
    "\n",
    "Transportation models as used in the TMIP-EMAT framework are \n",
    "generally characterized by two important features: they are \n",
    "subject to significant exogenous uncertainties about the future\n",
    "state of the world, and they include numerous performance measures\n",
    "for which decision makers would like to see good outcomes. Therefore,\n",
    "optimization tools applied to these models should be \n",
    "flexible to consider *multiple objectives*,\n",
    "as well as be *robust* against uncertainty."
   ]
  },
  {
   "cell_type": "raw",
   "id": "198f71c9",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _methodology-multiobjective-optimization:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "838ecff8",
   "metadata": {},
   "source": [
    "## Multi-Objective Optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36a74489",
   "metadata": {},
   "source": [
    "With exploratory modeling, optimization is also often undertaken as a \n",
    "[multi-objective optimization](https://en.wikipedia.org/wiki/Multi-objective_optimization) exercise, where multiple\n",
    "and possibly conflicting performance measures need to be addressed simultaneously. \n",
    "A road capacity expansion project is a good example of a multi-objective optimization \n",
    "problem. In such a situation, we want to expand the capacity of a roadway, both minimizing\n",
    "the costs and maximizing the travel time benefits.  A smaller expansion project will cost less\n",
    "but also provide lesser benefits.  Funding it with variable rate debt might \n",
    "decrease expected future costs but doing so entails more risk than fixed-rate debt."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c8defda",
   "metadata": {},
   "source": [
    "One approach to managing a multi-objective optimization problem is to distill it \n",
    "into a single objective problem, by assigning relative weights to the various \n",
    "objectives.  For a variety of reasons, this can be difficult to accomplish in public policy environments that are common in transportation planning. \n",
    "Multiple stakeholders may have different\n",
    "priorities and may not be able to agree on a relative weighting structure.  \n",
    "Certain small improvements in a performance measure may be valued very differently \n",
    "if they tip the measure over a regulated threshold (e.g. to attain a particular \n",
    "mandated level of emissions or air quality).\n",
    "\n",
    "Instead of trying to simplify a multi-objective into a simple-objective one,\n",
    "an alternate approach is to preserve the multi-objective nature of the problem\n",
    "and find a set or spectrum of different solutions, each of which solves the problem\n",
    "at a different weighting of the various objectives.  Decision makers can then\n",
    "review the various different solutions, and make judgements about the various\n",
    "trade-offs implicit in choosing one path over another.\n",
    "\n",
    "Within a set of solutions for this kind of problem, each individual solution is \n",
    "\"[Pareto optimal](https://en.wikipedia.org/wiki/Pareto_efficiency)\", such that\n",
    "no individual objective can be improved without degrading at least one other\n",
    "objective by some amount.  Thus, each of these solutions might be the \"best\"\n",
    "policy to adopt, and exactly which is the best is left as a subjective judgement\n",
    "to decision makers, instead of being a concretely objective evaluation based \n",
    "on mathematics alone."
   ]
  },
  {
   "cell_type": "raw",
   "id": "189f31ea",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _methodology-robust-optimization:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c821821b",
   "metadata": {},
   "source": [
    "## Robust Optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07efd8d3",
   "metadata": {},
   "source": [
    "Robust optimization is a variant of the more traditional optimization\n",
    "problem, where we will try to find policies that yield *good* outcomes\n",
    "across a range of possible futures, instead of trying to find a policy\n",
    "that delivers the *best* outcome for a particular future.\n",
    "\n",
    "To conceptualize this, let us consider a decision where there are four\n",
    "possible policies to choose among, a single exogenous uncertainty that\n",
    "will impact the future, and a single performance measure that we would\n",
    "like to minimize.  We have a model that can forecast the performance \n",
    "measure, conditional on the chosen policy and the future value of the\n",
    "exogenous uncertainty, and which gives us a forecasts as shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aeb5ce5c",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "import numpy\n",
    "from matplotlib import pyplot as plt\n",
    "x = numpy.linspace(0,1)\n",
    "y1 = x**3 + numpy.cos((x-3)*23)*.01\n",
    "y2 = x**3 + .1 + numpy.sin((x-3)*23)*.01\n",
    "y3 = 1.3*(1-x)**17 + numpy.cos((x-3)*23)*.01 + .1\n",
    "y4 = numpy.sin((x-3)*23)*.01+0.16 + .1*(x-0.5)\n",
    "\n",
    "linestyles = [\n",
    "    dict(ls='-', c='blue'),\n",
    "    dict(ls=':', c='orange'),\n",
    "    dict(ls='-.', c='green'),\n",
    "    dict(ls='--', c='red'),\n",
    "]\n",
    "\n",
    "fig, ax = plt.subplots(1,1)\n",
    "ax.plot(x, y1, **linestyles[0], label=\"Policy 1\")\n",
    "ax.plot(x, y2, **linestyles[1], label=\"Policy 2\")\n",
    "ax.plot(x, y3, **linestyles[2], label=\"Policy 3\")\n",
    "ax.plot(x, y4, **linestyles[3], label=\"Policy 4\")\n",
    "ax.set_ylabel(\"Performance Measure\\n← Lower is Better ←\")\n",
    "ax.set_xlabel(\"Exogenous Uncertainty\")\n",
    "ax.legend()\n",
    "ax.set_title(\"Example Simple Forecasting Experiment\")\n",
    "\n",
    "plt.savefig(\"robust_example.png\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "77d7ce18",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. image:: robust_example.png"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43f28d31",
   "metadata": {},
   "source": [
    "In a naive optimization approach, if we want to minimize the performance\n",
    "measure, we can do so by selecting Policy 1 and setting the exogenous\n",
    "uncertainty to 0.1.  Of course, in application we are able to select\n",
    "Policy 1, but we are unable to actually control the exogenous uncertainty\n",
    "(hence, \"exogenous\") and we may very well end up with a very bad result\n",
    "on the right side of the figure.\n",
    "\n",
    "We can see from the figure that, depending on the ultimate value for \n",
    "the exogenous uncertainty, either Policy 1 or Policy 3 might yield the\n",
    "best possible value of the performance measure.  However, both of these\n",
    "policies come with substantial risks as well -- in each Policy there are\n",
    "some futures where the results are optimal, but there are also some \n",
    "futures where the results are exceptionally poor.\n",
    "\n",
    "In contrast with these optimal policies, Policy 4 may be considered a \n",
    "\"robust\" solution.  Although there is no value of the exogenous \n",
    "uncertainty where Policy 4 yields the best possible outcome, there is\n",
    "also no future where Policy 4 yields a very poor outcome.  Instead, across\n",
    "all futures it is always generating a \"pretty good\" outcome.\n",
    "\n",
    "Different expectations for the future may lead to different policy \n",
    "choices.  If the decision maker feels that low values of the exogenous\n",
    "uncertainty are much more likely than high values, Policy 1 might be\n",
    "the best policy to choose.  If high values of the exogenous uncertainty\n",
    "are expected, then Policy 3 might be the best choice.  If there is not\n",
    "much agreement on the probable future values of the exogenous uncertainty,\n",
    "or if decision makers want to adopt a risk-averse stance, then Policy 4\n",
    "might be the best choice.  \n",
    "\n",
    "The remaining policy shown in the figure, Policy 2, is the lone holdout \n",
    "in this example -- there is no set of expectations about the future, or\n",
    "attitudes toward risk, that can make this policy the best choice.  This\n",
    "is because, no matter what the future value of the exogenous uncertainty\n",
    "may be, the performance measure has a better outcome from Policy 1 than\n",
    "from Policy 2.  In this circumstance, we can say that Policy 2 is \n",
    "\"dominated\" and should never be chosen by decision makers."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07fdd73e",
   "metadata": {},
   "source": [
    "### Robustness Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "643c0364",
   "metadata": {},
   "source": [
    "To perform robust optimization in EMAT, we need a core model (or meta-model)\n",
    "with defined input and output parameters, as well a set of functions called\n",
    "\"robustness measures\" that define what a \"robust\" measure represents.  \n",
    "As noted above, different expectations about future states of the world\n",
    "can lead to different rankings of policy options.  In addition, different\n",
    "attitudes towards risk can result in different robustness measures that are\n",
    "derived from the same underlying modeled performance measures.  \n",
    "\n",
    "For example, consider the *Example Simple Forecasting Experiment* shown above.\n",
    "In this example, we could compute a robustness measure for each policy where\n",
    "we calculate the maximum value of the performance measure across any possible\n",
    "value of the exogenous uncertainty.  This value is shown in the first column\n",
    "of robustness measures in the figure below, and under this measure Policy 4 is\n",
    "far and away the best choice. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "606c92d7",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(\n",
    "    1,6, sharey=True, \n",
    "    gridspec_kw=dict(width_ratios=[7,1,1,1,1,1], wspace=0.05,), \n",
    "    figsize=[8, 4.],\n",
    ")\n",
    "\n",
    "\n",
    "ax[0].plot(x, y1, **linestyles[0], label=\"Policy 1\")\n",
    "ax[0].plot(x, y2, **linestyles[1], label=\"Policy 2\")\n",
    "ax[0].plot(x, y3, **linestyles[2], label=\"Policy 3\")\n",
    "ax[0].plot(x, y4, **linestyles[3], label=\"Policy 4\")\n",
    "ax[1].plot([0,1], [y1.max()]*2, **linestyles[0], lw=2)\n",
    "ax[1].plot([0,1], [y2.max()]*2, **linestyles[1], lw=2)\n",
    "ax[1].plot([0,1],[y3.max()]*2, **linestyles[2],  lw=2)\n",
    "ax[1].plot([0,1], [y4.max()]*2, **linestyles[3], lw=2)\n",
    "\n",
    "ax[2].plot([0,1], [y1.mean()]*2, **linestyles[0], lw=2)\n",
    "ax[2].plot([0,1], [y2.mean()]*2, **linestyles[1], lw=2)\n",
    "ax[2].plot([0,1],[y3.mean()]*2, **linestyles[2],  lw=2)\n",
    "ax[2].plot([0,1], [y4.mean()]*2, **linestyles[3], lw=2)\n",
    "\n",
    "ax[3].plot([0,1], [numpy.median(y1)]*2, **linestyles[0], lw=2)\n",
    "ax[3].plot([0,1], [numpy.median(y2)]*2, **linestyles[1], lw=2)\n",
    "ax[3].plot([0,1], [numpy.median(y3)]*2, **linestyles[2],  lw=2)\n",
    "ax[3].plot([0,1], [numpy.median(y4)]*2, **linestyles[3], lw=2)\n",
    "\n",
    "ax[4].plot([0,1], [numpy.percentile(y1, 90)]*2, **linestyles[0], lw=2)\n",
    "ax[4].plot([0,1], [numpy.percentile(y2, 90)]*2, **linestyles[1], lw=2)\n",
    "ax[4].plot([0,1], [numpy.percentile(y3, 90)]*2, **linestyles[2],  lw=2)\n",
    "ax[4].plot([0,1], [numpy.percentile(y4, 90)]*2, **linestyles[3], lw=2)\n",
    "\n",
    "ax[5].plot([0,1], [y1.min()]*2, **linestyles[0], lw=2)\n",
    "ax[5].plot([0,1], [y2.min()]*2, **linestyles[1], lw=2)\n",
    "ax[5].plot([0,1], [y3.min()]*2, **linestyles[2], lw=2)\n",
    "ax[5].plot([0,1], [y4.min()]*2, **linestyles[3], lw=2)\n",
    "\n",
    "\n",
    "ax[0].set_ylabel(\"Performance Measure\\n← Lower is Better ←\")\n",
    "ax[0].set_xlabel(\"Exogenous Uncertainty\")\n",
    "ax[0].legend()\n",
    "ax[0].set_title(\"Example Simple Forecasting Experiment\")\n",
    "ax[3].set_title(\"Robustness Measures\")\n",
    "ax[1].set_xlabel(\"Max\\nPM\")\n",
    "ax[2].set_xlabel(\"Mean\\nPM\")\n",
    "ax[3].set_xlabel(\"Median\\nPM\")\n",
    "ax[4].set_xlabel(\"90%ile\\nPM\")\n",
    "ax[5].set_xlabel(\"Min\\nPM\")\n",
    "ax[-1].yaxis.set_label_position(\"right\")\n",
    "for a in [1,2,3,4,5]:\n",
    "    ax[a].set_xticks([])\n",
    "    ax[a].set_yticks([])\n",
    "plt.savefig(\"robust_measures.png\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "03b8a77d",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. image:: robust_measures.png"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c22e3d3d",
   "metadata": {},
   "source": [
    "The \"maximum performance measure result\" robustness function is a very\n",
    "risk averse approach, as no consideration is given to the shape or distribution\n",
    "of performance measure values other than the maximum.  Consider these same\n",
    "policies shown if Policy 4 is not available. In this case, the next best \n",
    "policy under this robustness function is Policy 1, as it has the next lowest\n",
    "maximum value.  However, when we look at\n",
    "a comparison between Policies 1 and 3 in aggregate, we might easily conclude\n",
    "that Policy 3 is a better choice overall: it is better than Policy 1 on average,\n",
    "as judged by the mean, median, and 90th percentile measures.  The only reason \n",
    "Policy 3 appears worse than Policy 1 on the initial robustness function is that \n",
    "it has an especially poor outcome at one extreme end of the uncertainty \n",
    "distribution.  Depending on our attitude towards risk on this performance measure,\n",
    "we may want to consider using some of these alternative robustness functions.\n",
    "An additional consideration here is that the various robustness measures\n",
    "in this example case are all unweighted measures: they implicitly assume a \n",
    "uniform probability distribution for the entire range of possible values for\n",
    "the exogenous uncertainty.  If we are able to develop a probability distribution\n",
    "on our expectations for the future values of the exogenous uncertainties,\n",
    "we can use that probability distribution to weight the robustness functions\n",
    "appropriately, creating more meaningful values, particularly for the non-extreme\n",
    "value robustness functions (i.e., everything except the min and max)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e94d82dc",
   "metadata": {},
   "source": [
    "# Mechanics of Using Optimization"
   ]
  },
  {
   "cell_type": "raw",
   "id": "ca310cde",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _methodology-search-over-levers:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c23b8a58",
   "metadata": {},
   "source": [
    "## Policy Optimization: Search over Levers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b39c3e67",
   "metadata": {},
   "source": [
    "The simplest optimization tool available for TMIP-EMAT users is \n",
    "a *search over policy levers*, which represents multi-objective\n",
    "optimization, manipulating policy lever values to find a Pareto\n",
    "optimal set of solutions, holding the exogenous uncertainties\n",
    "fixed at a particular value for each uncertainty (typically at \n",
    "the default values).  This is often a useful first step\n",
    "in exploratory analysis, even if your ultimate goal is to \n",
    "eventually undertake a robust optimization analysis.  This less\n",
    "complex optimization can give insights into tradeoffs between\n",
    "performance measures and reasonable combinations of policy levers.\n",
    "\n",
    "To demonstrate a search over levers, we'll use the Road Test \n",
    "example model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97b3598e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import emat.examples\n",
    "scope, db, model = emat.examples.road_test()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "787097f9",
   "metadata": {},
   "source": [
    "The scope defined for a model in TMIP-EMAT will already provide\n",
    "information about the preferred directionality of performance\n",
    "measures (i.e., 'maximize' when larger values are better,\n",
    "'minimize' when smaller values are better, or 'info' when we\n",
    "do not have a preference for bigger or smaller values, but we \n",
    "just want to be tracking the measure).  We can see these\n",
    "preferences for any particular performance measure by inspecting\n",
    "the scope definition file, or by using the `info` method of\n",
    "`emat.Measure` instances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65833963",
   "metadata": {},
   "outputs": [],
   "source": [
    "scope['net_benefits'].info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e9066f6",
   "metadata": {},
   "source": [
    "To conduct an optimization search over levers, we'll use the \n",
    "`optimize` method of the TMIP-EMAT model class, setting the\n",
    "`search_over` argument to `'levers'`.  In this example, we will\n",
    "set the number of function evaluations (`nfe`) to 10,000, although\n",
    "for other models you may need more or fewer to achieve a good, \n",
    "well converged result.  In a Jupyter notebook environment, \n",
    "we can monitor convergence visually in real time in the figures\n",
    "that will appear automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3cbe3632",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = model.optimize(\n",
    "    nfe=10_000, \n",
    "    searchover='levers', \n",
    "    check_extremes=1,\n",
    "    cache_file='./optimization_cache/road_test_search_over_levers.gz',\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96fe5d7a",
   "metadata": {},
   "source": [
    "The `optimize` method returns an `OptimizationResult` object, \n",
    "which contains the resulting solutions, as well as some information\n",
    "about how they were derived.  We can review the raw values of the\n",
    "solutions as a pandas DataFrame, or see the scenario values used \n",
    "to generate these solutions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf89e497",
   "metadata": {},
   "outputs": [],
   "source": [
    "result.result.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2f9e63d",
   "metadata": {},
   "outputs": [],
   "source": [
    "result.scenario"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e6d611d",
   "metadata": {},
   "source": [
    "We can visualize the set of solutions using a\n",
    "[parallel coordinates](https://en.wikipedia.org/wiki/Parallel_coordinates) \n",
    "plot.  This figure is composed of a number of vertical axes, one for each \n",
    "column of data in the results DataFrame.  Each row of the DataFrame is\n",
    "represented by a chord that connects across each of the vertical axes.\n",
    "\n",
    "By default, the axes representing performance measures to be minimized are\n",
    "inverted in the parallel coordinates, such that moving up along any \n",
    "performance measure axis results in a \"better\" outcome for that performance\n",
    "measure.\n",
    "\n",
    "In the figure below, we can quickly see that pretty much all the all of the Pareto\n",
    "optimal policy solutions for our reference scenario share an amortization\n",
    "period of 50 years, and all share a debt type of 'Paygo'.  By contrast, the set of \n",
    "solutions include multiple different values for the expand capacity lever,\n",
    "ranging from 0 to 100.  These different values offer possible tradeoffs \n",
    "among the performance measures: lower levels of capacity expansion (shown\n",
    "in yellow) will maximize net benefits and minimize the cost of the project,\n",
    "but they will also fail to provide much travel time savings.  Conversely,\n",
    "larger levels of capacity expansion will provide better travel time savings, \n",
    "but will not perform as well on costs.  It is left up to the analysts and\n",
    "decision makers to judge what tradeoffs to make between these conflicting\n",
    "goals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3aa9b65a",
   "metadata": {},
   "outputs": [],
   "source": [
    "q=result.par_coords()\n",
    "q"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bf643ed",
   "metadata": {},
   "source": [
    "As noted above, nearly all but not exactly all of the identified Pareto optimal\n",
    "solutions in this figure share an amortization period of 50 years.  We can review a table \n",
    "of a subset of particular solutions using the `query` method of a pandas DataFrame.  \n",
    "In this example, we may want to see a table of the instances where the amortization period \n",
    "is not 50 years."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5f04d1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "result.result.query(\"amortization_period != 50\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce963c92",
   "metadata": {},
   "source": [
    "The first row in this table shows a particular edge case: when the capacity expansion is\n",
    "exactly zero, all of the remaining policy levers have no effect -- the details of debt\n",
    "financing are irrelevant when there is no debt at all, and thus no values of the other\n",
    "levers result in a Pareto-optimal solution that would dominate any other such solution.\n",
    "In cases such as these, an analyst with domain knowledge, who understands the underlying system being modeled,\n",
    "will be able to bring a more nuanced understanding of the results than can be achieved\n",
    "merely by applying the mathematical algorithms in TMIP-EMAT, and correctly infer\n",
    "that the 50 year amortization is always an optimal solution, and that the outlier\n",
    "solutions are not important."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a21b41e9",
   "metadata": {},
   "source": [
    "Lastly, a note on the interpretation of the visualization of parallel coordinates\n",
    "plots: for numerical parameters and measures, the range of values shown in each \n",
    "vertical axis of plot is determined not by the full range of possible values, but instead\n",
    "it only displays the range of values included in the solutions being visualized.\n",
    "For example, the `amortization_period` axis only shows values between 47 and 50, even\n",
    "though the actual range of values is defined in the scope to be between 15 and 50 years.\n",
    "Similarly, the range of values for `net_benefits` is between 0.08 and -194.18.\n",
    "Because the solutions being displayed are optimal, the top value of 0.08 is \n",
    "(barring numerical problems) the best value of `net_benefits` that might be obtained,\n",
    "but the bottom value of -194.18 is by no means the worst possible `net_benefits`\n",
    "outcome that could arise from various different policies.  Instead, this bottom\n",
    "value is not a worst outcome but *also an optimal value*, except it is conditional \n",
    "on also achieving some particular other desirable outcome.  In the example shown, this \n",
    "other desirable outcome is a high level of travel time savings.  It is left entirely\n",
    "up to the analyst and policy makers to judge whether this outcome is \"bad\" or not,\n",
    "relative to the other possible outcomes."
   ]
  },
  {
   "cell_type": "raw",
   "id": "d6024ecf",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _methodology-worst-case-discovery:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79b2c8f8",
   "metadata": {},
   "source": [
    "## Worst Case Discovery: Search over Uncertainties"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9089526b",
   "metadata": {},
   "source": [
    "We can apply the same multiobjective optimization tool in reverse to \n",
    "study the worst case outcomes from any particular set of policy lever\n",
    "settings.  To do so, we switch out the `searchover` argument from\n",
    "`'levers'` to `'uncertainties'`, and set `reverse_targets` to True,\n",
    "which will tell the optimization engine to search for the worst outcomes\n",
    "instead of the best ones.  We'll often also want to override the reference\n",
    "policy values with selected particular values, although it's possible\n",
    "to omit this reference argument to search for the worst case scenarios\n",
    "under the default policy lever settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d44e2128",
   "metadata": {},
   "outputs": [],
   "source": [
    "worst = model.optimize(\n",
    "    nfe=10_000, \n",
    "    searchover='uncertainties', \n",
    "    reverse_targets = True,\n",
    "    check_extremes=1,\n",
    "    cache_file='./optimization_cache/road_test_search_over_uncs.gz',\n",
    "    reference={\n",
    "        'expand_capacity': 100.0, \n",
    "        'amortization_period': 50, \n",
    "        'debt_type': 'PayGo', \n",
    "        'interest_rate_lock': False,\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1bdc7a2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "worst.par_coords()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c473756",
   "metadata": {},
   "source": [
    "## Using Robust Optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dde3d896",
   "metadata": {},
   "source": [
    "As discussed above, implementing robust optimization requires the\n",
    "definition of relevant robustness functions.  Because the functional\n",
    "form of these functions can be so many different things depending on\n",
    "the particular application, TMIP-EMAT does not implement a mechanism\n",
    "to generate them automatically.  Instead, it is left to the analyst to\n",
    "develop a set of robustness functions that are appropriate for each\n",
    "application.\n",
    "\n",
    "Each robustness function represents an aggregate performance measure,\n",
    "calculated based on the results from a group of individual runs of the\n",
    "underlying core model (or meta-model).  These runs are conducted with\n",
    "a common set of policy lever settings, and a variety of different \n",
    "exogenous uncertainty scenarios.  This allows us to create aggregate \n",
    "measures that encapsulate information from the distribution of possible\n",
    "outcomes, instead of for just one particular future scenario.\n",
    "\n",
    "A robust measure is created in TMIP-EMAT using the same `Measure` class\n",
    "used for performance measures that are direct model outputs.  Like any \n",
    "other measure, they have a `name` and `kind` (minimize, maximize, or info).\n",
    "The names used for robust measures must be unique new names that are \n",
    "not otherwise used in the model's scope, so you cannot use the same name\n",
    "as an existing performance measure. Instead, the names can (and usually \n",
    "should) be descriptive variants of the existing performance measures. \n",
    "For example, if an existing performance measure is `'net_benefits'`,\n",
    "you can name a robust measure `'min_net_benefits'`.\n",
    "\n",
    "In addition to the `name` and `kind`, robust measures have two important\n",
    "additional attributes: a `variable_name`, which names the underlying\n",
    "performance measure upon which this robust measure is based, and a\n",
    "`function` that describes how to aggregate the results.  The function\n",
    "should be a callable function, which accepts an array of performance\n",
    "measure values as its single argument, and returns a single numeric\n",
    "value that is the robust measure.  For example, the code below will \n",
    "create a robust measure that represents the minimum net benefit across \n",
    "all exogenous uncertainty scenarios."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b4c2d03",
   "metadata": {},
   "outputs": [],
   "source": [
    "from emat import Measure\n",
    "\n",
    "minimum_net_benefit = Measure(\n",
    "    name='Minimum Net Benefits',\n",
    "    kind=Measure.MAXIMIZE,\n",
    "    variable_name='net_benefits',\n",
    "    function=min,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4380beef",
   "metadata": {},
   "source": [
    "As suggested earlier, this measure might be too sensitive to outliers\n",
    "in the set of exogenous uncertainty scenarios.  We can address this \n",
    "by creating a different robust measure, based on the same underlying\n",
    "performance measure, but which is based on the mean instead of the \n",
    "minimum value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65c1242a",
   "metadata": {},
   "outputs": [],
   "source": [
    "expected_net_benefit = Measure(\n",
    "    name='Mean Net Benefits',\n",
    "    kind=Measure.MAXIMIZE,\n",
    "    variable_name='net_benefits',\n",
    "    function=numpy.mean,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ad925c7",
   "metadata": {},
   "source": [
    "Or we can adopt an intermediate approach, focusing on the 5th percentile\n",
    "instead of the minimum, which avoids being overly sensitive to the \n",
    "most extreme tail ends of the distribution, but maintains a fairly \n",
    "risk-averse robustness approach.\n",
    "\n",
    "Note that normally, the `numpy.percentile` function requires two arguments\n",
    "instead of one: the array of values, and the target percentile value.\n",
    "Since the `function` of the robust measure needs to accept only a single \n",
    "argument, we can inject the `q=5` argument here using \n",
    "[functools.partial](https://docs.python.org/3/library/functools.html#functools.partial)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ff008b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import functools\n",
    "\n",
    "pct5_net_benefit = Measure(\n",
    "    '5%ile Net Benefits',\n",
    "    kind = Measure.MAXIMIZE,\n",
    "    variable_name = 'net_benefits', \n",
    "    function = functools.partial(numpy.percentile, q=5), \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8657284f",
   "metadata": {},
   "source": [
    "We can also capture robustness measures that are not statistical versions\n",
    "of the performance measure (that can be contrasted directly with the \n",
    "performance measure outputs, like the mean or median), but rather more abstract \n",
    "measures, like the percentage of scenarios where the performance measure \n",
    "meets some target value.  For example, we can compute the percentage of scenarios\n",
    "for the road test example where the net benefits are negative.  To do so,\n",
    "we will use the `percentileofscore` function from the `scipy.stats` package.\n",
    "For this function, set the `kind` argument to `'strict'` to count only strictly\n",
    "negative results -- not scenarios where the net benefits are exactly zero -- or\n",
    "to `'weak'` to count all non-positive results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84e3e97f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import percentileofscore\n",
    "\n",
    "neg_net_benefit = Measure(\n",
    "    'Possibility of Negative Net Benefits',\n",
    "    kind = Measure.MINIMIZE,\n",
    "    variable_name = 'net_benefits', \n",
    "    function = functools.partial(percentileofscore, score=0, kind='strict'), \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdc8c886",
   "metadata": {},
   "source": [
    "We can of course also create robust measures based on other \n",
    "performance measures in the core model.  For example, in the \n",
    "Road Test model the total cost of the capacity expansion is\n",
    "subject to some uncertainty, and we may want to make policy\n",
    "choices not just to maximize net benefits but also trying to\n",
    "keep costs in check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c2d750c",
   "metadata": {},
   "outputs": [],
   "source": [
    "pct95_cost = Measure(\n",
    "    '95%ile Capacity Expansion Cost',\n",
    "    kind = Measure.MINIMIZE,\n",
    "    variable_name = 'cost_of_capacity_expansion', \n",
    "    function = functools.partial(numpy.percentile, q = 95), \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f803f45",
   "metadata": {},
   "source": [
    "We may also be interested in finding policies that will maximize the expected time savings.  \n",
    "Although we can easily conceptualize that these two goals are in opposition (increasing time\n",
    "savings pretty obviously goes hand in hand with increasing cost) we will be able to use\n",
    "the results of this robust optimization to visualize the tradeoffs and try to find an \n",
    "appropriate balance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "298bf701",
   "metadata": {},
   "outputs": [],
   "source": [
    "expected_time_savings = Measure(\n",
    "    'Expected Time Savings',\n",
    "    kind = Measure.MAXIMIZE, \n",
    "    variable_name = 'time_savings', \n",
    "    function = numpy.mean,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38732119",
   "metadata": {},
   "outputs": [],
   "source": [
    "from emat.util.distributed import get_client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5dedf5e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "robust_result = model.robust_optimize(\n",
    "    robustness_functions=[\n",
    "        expected_net_benefit,\n",
    "        pct5_net_benefit,\n",
    "        neg_net_benefit,\n",
    "        pct95_cost,\n",
    "        expected_time_savings,\n",
    "    ],\n",
    "    scenarios=250,\n",
    "    nfe=25_000,\n",
    "    check_extremes=1,\n",
    "    evaluator=get_client(),\n",
    "    cache_file='./optimization_cache/road_test_robust_search.gz',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "819a7a77",
   "metadata": {},
   "outputs": [],
   "source": [
    "robust_result.par_coords()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71a852c7",
   "metadata": {},
   "source": [
    "### Constraints\n",
    "\n",
    "The robust optimization process can be constrained to only include \n",
    "solutions that satisfy certain constraints. These constraints can \n",
    "be based on the policy lever parameters that are contained in the \n",
    "core model, the aggregate performance measures identified in the \n",
    "list of robustness functions, or some combination of levers and \n",
    "aggregate measures.  Importantly, the constraints *cannot* be imposed \n",
    "on the exogenous uncertainties, nor directly on the output measures \n",
    "from the core models (or the equivalent meta-model). This is because \n",
    "the robust version of the model aggregates a number of individual \n",
    "core model runs, and effectively hides these two components from \n",
    "the optimization engine.  \n",
    "\n",
    "One way to work around this limitation, at least on the output measures, \n",
    "is to write robustness functions that transmit the relevant output \n",
    "measures through the aggregation process. For example, to constrain \n",
    "the robust search only to instances where a particular output measure \n",
    "is always positive, then write a robustness function that takes the \n",
    "*minimum* value of the targeted performance measure, and write a \n",
    "constraint that ensures that the minimum value is always positive.  \n",
    "This approach should be used with caution, however, as it may severely \n",
    "limit the search space.\n",
    "\n",
    "For the road test example, we can define some constraints to consider \n",
    "solutions that are within the limited search space. To do so, we will use the \n",
    "`Constraint` class.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a392b0fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "from emat import Constraint"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "023faf02",
   "metadata": {},
   "source": [
    "Each `Constraint` needs to have a unique name\n",
    "(i.e. not the same as anything else in the scope or any robust\n",
    "measure).  Each `Constraint` is also defined by one or more \n",
    "`parameter_names` and/or `outcome_names`, plus a `function` that will be used to \n",
    "determine whether the constraint is violated.  The `function`\n",
    "should accept positional values for each of the `parameter_names` \n",
    "and `outcome_names`, in order, and return 0 if the constraint is not\n",
    "violated, and a positive number if it is violated.\n",
    "\n",
    "Two convenient class methods are provided within the `Constraint` class:\n",
    "`must_be_less_than` and `must_be_greater_than`, which can simplify\n",
    "the creation and legibility of simple constraints on a single\n",
    "parameter or outcome.  Each take a single argument, the threshold\n",
    "of the constraint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab018c4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "c_min_expansion = Constraint(\n",
    "    \"Minimum Capacity Expansion\",\n",
    "    parameter_names=\"expand_capacity\",\n",
    "    function=Constraint.must_be_greater_than(10),\n",
    ")\n",
    "\n",
    "c_positive_mean_net_benefit = Constraint(\n",
    "    \"Minimum Net Benefit\",\n",
    "    outcome_names = \"Mean Net Benefits\",\n",
    "    function = Constraint.must_be_greater_than(0),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd31497c",
   "metadata": {},
   "source": [
    "It is also possible to impose constraints based on a combination \n",
    "of inputs and outputs. For example, suppose that the total funds \n",
    "available for pay-as-you-go financing are only 3,000. We may thus \n",
    "want to restrict the robust search to only solutions that are almost \n",
    "certainly within the available funds at 99% confidence (a robustness\n",
    "measure that is an output we can construct) \n",
    "but only if the Paygo financing option is used (a model input). \n",
    "This kind of constraint can be created by giving both \n",
    "`parameter_names` and `outcomes_names`, and writing a constraint \n",
    "function that takes two arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09a0c21b",
   "metadata": {},
   "outputs": [],
   "source": [
    "pct99_present_cost = Measure(\n",
    "    '99%ile Present Cost',\n",
    "    kind=Measure.INFO,\n",
    "    variable_name='present_cost_expansion',\n",
    "    function=functools.partial(numpy.percentile, q=99),\n",
    ")\n",
    "\n",
    "c_max_paygo = Constraint(\n",
    "    \"Maximum Paygo\",\n",
    "    parameter_names='debt_type',\n",
    "    outcome_names='99%ile Present Cost',\n",
    "    function=lambda i,j: max(0, j-3000) if i=='Paygo' else 0,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2a555fc",
   "metadata": {},
   "source": [
    "The constraints are then passed to the `robust_optimize` method in addition to the \n",
    "other arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bfd59f60",
   "metadata": {},
   "outputs": [],
   "source": [
    "robust_constrained = model.robust_optimize(\n",
    "    robustness_functions=[\n",
    "        expected_net_benefit,\n",
    "        pct5_net_benefit,\n",
    "        neg_net_benefit,\n",
    "        pct95_cost,\n",
    "        expected_time_savings,\n",
    "        pct99_present_cost,\n",
    "    ],\n",
    "    constraints = [\n",
    "        c_min_expansion, \n",
    "        c_positive_mean_net_benefit, \n",
    "        c_max_paygo,\n",
    "    ],\n",
    "    scenarios=250,\n",
    "    nfe=10_000,\n",
    "    check_extremes=1,\n",
    "    evaluator=get_client(),\n",
    "    cache_file='./optimization_cache/road_test_robust_search_constrained.gz',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9572f2e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "robust_constrained.par_coords()"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_json": true,
   "encoding": "# -*- coding: utf-8 -*-",
   "formats": "ipynb,py:percent"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}