
Tuesday, August 25, 2015

Common techniques in optimization

Optimization is a frequently encountered problem in real life.  We need to make a decision to achieve something within a set of constraints, and we want to maximize or minimize our objective based on some measurement.  For example, a restaurant may need to decide how many workers (of each position) to hire to serve its customers, with the constraint that workers cannot work overtime, and with the objective of minimizing cost.  A car manufacturer may need to decide how many units of each model to produce, within the storage capacity of its warehouse, while maximizing the profit of its sales.

Exhaustive Search

If there aren't many choices, an exhaustive search (e.g. breadth-first, depth-first, best-first, A* ... etc.) can be employed to evaluate each option, check whether it meets all the constraints (i.e. is a feasible solution), and then compute its objective value.  We then sort all feasible solutions by their objective values and pick the one with the max (or min) objective value as our decision.  Unfortunately, real-world problems usually involve a large number of choices (exponentially large, due to combinatorial explosion), making exhaustive search impractical in many cases.

When this happens, two other solution approaches are commonly used:
1) Mathematical Programming
2) Greedy Local Search

Mathematical Programming

Mathematical programming is a classical way to solve optimization problems.  It is a family of approaches including linear programming, integer programming, quadratic programming and even non-linear programming.  The development process usually goes through the following steps ...

From a problem description, the modeler expresses the problem as a mathematical structure containing 3 parts.
  • Variables:  there are two types of variables.  Data variables contain the current values of your business environment (e.g. cost of hiring a waiter, price of a car), and decision variables hold the decisions you make to optimize your objective (e.g. how many staff to hire, how many cars to make).
  • Constraints:  a set of rules that you cannot break.  Effectively, constraints disallow certain combinations of your decision variables and are mainly used to filter out infeasible solutions.  In a typical setting, constraints are expressed as a system of linear inequalities, where a linear expression of the decision variables appears on the left hand side and a value on the right hand side of an inequality comparison.
  • Objective function: it encapsulates the quantitative measure of how well our goal has been achieved.  In a typical setting, the objective function is expressed as a single linear (or quadratic) combination of the decision variables.
After the mathematical structure is defined, the modeler submits it to a solver, which outputs the best solution.  The process can be depicted as follows.


Expressing the problem in this mathematical structure is the key design step of the solution.  There are many elements to be considered, which we describe below.

The first consideration is how to represent your decision, especially whether the decision is a quantity (real number), a number of units (integer), or a binary decision (integer 0 or 1).

The next step is to represent the constraints as inequality equations of linear combinations of decision variables.  You need to think about whether each constraint is a hard or a soft constraint.  Hard constraints should be expressed in the constraint part of the problem.  Notice the solver will not consider any solution once it violates any constraint.  Therefore, if the solver cannot find a feasible solution that fulfills all constraints, it will simply abort.  In other words, it won't return the solution that violates the smallest number of constraints.  If you have room to relax some constraints and want the solver to exploit that, you should model these soft constraints in the objective function rather than in the constraint section.  Typically, you define an objective function term that quantifies the degree of violation.  The solver will then give you the optimal solution (violating the least number of constraints) rather than just telling you no solution is found.
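Below is a minimal sketch of this technique using the lpSolveAPI package (which also appears later in this post).  All the staffing numbers, costs, and the skill-mix rule are made up: the idea is to add a non-negative slack variable that measures how much the soft constraint is violated, and to penalize that slack in the objective.

library(lpSolveAPI)

# Hypothetical staffing sketch: minimize cost 3*x1 + 5*x2.
# Soft constraint: x1 + x2 >= 10 staff, relaxed via slack s >= 0.
# Hard constraint: x1 <= 2*x2 (a made-up skill-mix rule).
model <- make.lp(0, 3)                        # variables: x1, x2, s
penalty <- 100                                # weight on the violation
set.objfn(model, c(3, 5, penalty))            # cost + penalty * s
add.constraint(model, c(1, 1, 1), ">=", 10)   # x1 + x2 + s >= 10
add.constraint(model, c(1, -2, 0), "<=", 0)   # x1 - 2*x2 <= 0 (hard)
solve(model)
get.variables(model)                          # x1, x2, and the violation s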

Finally you define the objective function.  In most real-life problems, you have multiple goals in mind (e.g. you want to minimize your customers' wait time in the queue, but you also want to minimize the cost of hiring staff).  First you express each goal as a linear expression of the decision variables and then take a weighted average of the different goals (which is also a linear combination of all decision variables) to form the final objective function.  How to choose the weights among the different goals is a business question, based on how much you are willing to trade off between conflicting goals.

There are some objectives that cannot be expressed as a linear expression of the decision variables.  One frequently encountered example is minimizing the absolute deviation from a target value (i.e. regardless of whether the deviation is positive or negative).  A common way is to minimize the sum of squares of the differences.  But after we square them, the objective is no longer a linear expression.  To address this requirement, there is a more powerful class of solver called "quadratic programming" which relaxes the objective function to allow a degree-2 polynomial expression.
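As a small hedged sketch with the quadprog package (which also appears later in this blog), here is how minimizing the squared deviation from a made-up target becomes a quadratic program: expanding (x - t)'(x - t) into x'x - 2t'x (plus a constant) matches the solver's standard form.

library(quadprog)

# Hypothetical example: keep (x1, x2) as close as possible to the
# target (3, 4) while satisfying x1 + x2 = 8.
# (x - t)'(x - t) = x'I x - 2t'x + const, i.e. D = 2*I and d = 2*t
# in solve.QP's (1/2)x'Dx - d'x form.
target <- c(3, 4)
Dmat   <- 2 * diag(2)
dvec   <- 2 * target
Amat   <- matrix(c(1, 1), ncol = 1)   # one constraint per column: x1 + x2
bvec   <- 8
out <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
out$solution                          # c(3.5, 4.5), the closest feasible point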

After you have expressed the problem in this mathematical structure, you can pass it to a solver (there are many open-source and commercial solvers available), which you can treat as a magical black box.  The solver will output an optimal solution (with a value assigned to each decision variable) that fulfills all constraints and maximizes (or minimizes) the objective function.

Once you receive the solution from the solver, you need to evaluate how "reliable" this optimal solution is.  There may be fluctuations in the data variables due to collection errors, or the data may have a volatile value that changes rapidly, or the data may be an estimate of another unknown quantity.




Ideally, the optimal solution doesn't change much when we fluctuate the data values within their error bounds.  In this case, our optimal solution is stable against errors in our data variables and is therefore a good one.  However, if the optimal solution changes drastically when the data variables fluctuate, we say the optimal solution is unreliable and cannot use it.  In that case, we usually modify each data variable one at a time to figure out which one is causing the big swing in the optimal solution, and try to reduce our estimation error for that variable.

This sensitivity analysis is an important step in evaluating the stability, and hence the quality, of our optimal solution.  It also provides guidance on where we need to invest effort to make the estimation more accurate.
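One crude but practical way to run this sensitivity analysis is to jitter the data variables within their estimated error bounds and re-solve.  Below is a minimal sketch using the lpSolve package on a made-up two-variable LP; all numbers are fabricated for illustration.

library(lpSolve)

# Made-up LP: maximize c'x  s.t.  x1 + x2 <= 10, x1 - x2 <= 4, x >= 0
const.mat <- matrix(c(1,  1,
                      1, -1), nrow = 2, byrow = TRUE)
const.rhs <- c(10, 4)
base_c    <- c(5, 4)               # estimated data variables

set.seed(1)
solutions <- replicate(20, {
  c_jittered <- base_c * runif(2, 0.95, 1.05)   # +/- 5% estimation error
  lp("max", c_jittered, const.mat, c("<=", "<="), const.rhs)$solution
})
apply(solutions, 1, range)         # narrow ranges => a stable, reliable solution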

Mathematical programming allows you to specify your optimization problem in a very declarative manner and outputs an optimal solution if one exists.  It should be the first approach to try.  The downside of mathematical programming is that it requires linear constraints and linear (or quadratic) objectives.  It also has limits on the number of decision variables and constraints it can handle (and this limitation varies among implementations).  Although there are non-linear solvers, the number of variables they can take is even smaller.

Usually the first thing to do is to test whether your problem is small enough to fit in a mathematical programming solver.  If not, you may need to use another approach.

Greedy Local Search

The key words here are "local" and "greedy".  Greedy local search starts at a particular assignment of the decision variables, looks around the surrounding neighborhood (hence the term "local"), and within that neighborhood finds the best solution (hence the term "greedy").  It then moves the current point to this best neighbor, and the process repeats until the solution stays at the same place (i.e. no better neighbor is found).

If the initial point is a non-feasible solution, the greedy search will first try to find a feasible solution by looking for neighbors with fewer constraint violations.  After a feasible solution is found, the greedy search will only look at neighbors that fulfill all constraints, and among those find the neighbor with the best objective value.  Another good initialization strategy is to choose the initial point to be a feasible (though of course not optimal) solution and start the local search from there.

Because local search limits its search to a neighborhood, it can control the degree of complexity simply by limiting the scope of the neighborhood.  Greedy local search can evaluate a large number of variables by walking across a chain of neighborhoods.  However, because local search only navigates towards the best neighbor within a limited scope, it loses the overall picture and relies on the search landscape being convex.  If there are valleys along the path, the local search will stop there and never reach the global optimum.  This is called the local optimum trap, because the search is trapped at a local maximum and unable to escape.  There are some techniques to escape from a local optimum, which I describe in my previous blog post.

When performing greedy local search, it is important to pick the right greedy function (also called the heuristic function) as well as the right neighborhood.  Although it is common to choose the greedy function to be the objective function itself, they don't have to be the same.  In practice, the greedy function is often chosen to be one that is highly correlated with the objective function but can be computed much faster.  On the other hand, evaluating the objective function of every point in the neighborhood can also be a very expensive operation, especially when the data has a high dimension and the number of neighbors can be exponentially large.  A good neighborhood function can be combined with a good greedy function such that we don't need to evaluate each neighbor to figure out which one has the best objective value.

Combining the two approaches

In my experience, combining the two approaches can be a very powerful solution in itself.  At the outer layer, we use greedy local search to navigate towards better solutions.  However, when we search for the best neighbor within a neighborhood, we use mathematical programming, taking the greedy function as the objective function and using the constraints directly.  This way, we have a very effective search for the best solution within each neighborhood, and we can use the flexible neighborhood scoping of local search to control the complexity of the mathematical programming solver.
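Below is a hedged, toy-sized sketch of this combination using the lpSolve package.  The outer loop is the greedy local search; each "neighborhood" is a random subset of k decision variables, and that subset is re-optimized exactly with a small LP while everything else stays fixed.  The problem and all numbers are made up for illustration.

library(lpSolve)

# Toy problem: maximize r'x with sum(x) = 1 and 0 <= x_i <= 0.3.
# Pretend n is too large for one monolithic solve, so we only ever
# hand the solver k variables at a time.
set.seed(42)
n <- 20; k <- 3
r <- runif(n, 0.01, 0.10)          # made-up per-unit returns
x <- rep(1 / n, n)                 # feasible starting point

for (iter in 1:200) {
  S <- sample(n, k)                # scope of the neighborhood
  budget <- 1 - sum(x[-S])         # mass the chosen subset must hold
  # Exact LP over just the k variables:
  #   maximize r_S'y  s.t.  sum(y) = budget, 0 <= y_i <= 0.3
  sol <- lp("max", r[S],
            rbind(rep(1, k), diag(k)),
            c("=", rep("<=", k)),
            c(budget, rep(0.3, k)))
  if (sol$status == 0 && sum(r[S] * sol$solution) > sum(r[S] * x[S]))
    x[S] <- sol$solution           # greedy move: accept only improvements
}
sum(r * x)                         # objective value after the local search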

Thursday, December 12, 2013

Escape local optimum trap

Optimization is a very common technique in computer science and machine learning to search for the best (or good enough) solution.  Optimization itself is a big topic and involves a wide range of mathematical techniques in different scenarios.


In this post, I will be focusing on local search, which is a very popular technique for finding an optimal solution through a series of greedy local moves.  The general setting of local search is as follows ...

1. Define an objective function
2. Pick an initial starting point
3. Repeat
     3.1 Find a neighborhood
     3.2 Locate a subset of neighbors that are candidate moves
     3.3 Select a candidate from the candidate set
     3.4 Move to the candidate

One requirement is that the optimal solution needs to be reachable by a sequence of moves.  Usually this requires a proof that any arbitrary state is reachable from any other arbitrary state through a sequence of moves.

Notice that there are many possible strategies for each of steps 3.1, 3.2, and 3.3.  For example, one strategy can examine all members within the neighborhood, pick the best one (in terms of the objective function), and move to it.  Another strategy can randomly pick a member within the neighborhood, and move to that member if it is better than the current state.

Regardless of the strategy, the general theme is to move towards members that improve the objective function, hence the greedy nature of this algorithm.

One downside of this algorithm is that it can be trapped in a local optimum, which is the best candidate within its neighborhood, but not the best candidate in a global sense.

In the following, we'll explore a couple of meta-heuristic techniques that can mitigate the local optimum trap.

Multiple rounds

We basically conduct k rounds of local search, each round reaching a local optimum, and then pick the best out of these k local optima.
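A minimal sketch of this multi-round strategy, using base R's optim() as the local search on a deliberately bumpy made-up function:

# k rounds of local search from random starting points; keep the best.
# optim()'s default Nelder-Mead search plays the role of the local search.
f <- function(x) sum(x^2) + 3 * sum(1 - cos(3 * x))   # many local minima

set.seed(7)
k <- 10
rounds <- lapply(1:k, function(i) optim(runif(2, -5, 5), f))
best <- rounds[[which.min(sapply(rounds, function(r) r$value))]]
best$par     # best of the k local optima, near the global minimum c(0, 0)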

Simulated Annealing

This strategy involves a dynamic combination of exploitation (better neighbor) and exploration (random walk to worse neighbor).  The algorithm works in the following way ...

1. Pick an initial starting point
2. Repeat until terminate condition
     2.1 Within neighborhood, pick a random member
     2.2 If neighbor is better than me
           move to the neighbor
         else
           With chance exp(-(myObj - neighborObj)/Temp)
               move to the worse neighbor
     2.3 Temp = alpha * Temp
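
Below is a minimal R transcription of this pseudocode, maximizing a bumpy made-up function.  (Base R's optim() also ships a simulated annealing variant as method="SANN".)

# Made-up bumpy objective with many local maxima; global max at c(0, 0)
f <- function(x) -sum(x^2) + 2 * sum(cos(5 * x))

set.seed(7)
x     <- runif(2, -3, 3)                  # 1. pick an initial starting point
temp  <- 1.0
alpha <- 0.99
for (i in 1:5000) {                       # 2. repeat until terminate condition
  neighbor <- x + rnorm(2, sd = 0.1)      # 2.1 pick a random member nearby
  if (f(neighbor) > f(x)) {
    x <- neighbor                         # 2.2 better neighbor: always move
  } else if (runif(1) < exp(-(f(x) - f(neighbor)) / temp)) {
    x <- neighbor                         # 2.2 worse neighbor: move by chance
  }
  temp <- alpha * temp                    # 2.3 cool the temperature down
}
x                                         # should end up near c(0, 0)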

Tabu Search

This strategy maintains a list of previously visited states (called the Tabu list) and makes sure these states are not re-visited in subsequent exploration.  The search keeps exploring the best move while skipping previously visited nodes, so the algorithm explores paths that haven't been visited before.  The search also remembers the best state obtained so far.

1. Initialization
     1.1 Pick an initial starting point S
     1.2 Initialize an empty Tabu list
     1.3 Set the best state to S
     1.4 Put S into the Tabu list
2. Repeat until terminate condition
     2.1 Find a neighborhood
     2.2 Locate a smaller subset of candidate moves
     2.3 Remove elements that are already in the Tabu list
     2.4 Select the best candidate and move there
     2.5 Add the new state to the Tabu list
     2.6 If the new state is better than the best state
          2.6.1 Set the best state to this state
     2.7 If the Tabu list is too large
          2.7.1 Trim the Tabu list by removing old items
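
A compact R sketch of this pseudocode on a made-up discrete problem: the states are the integers 1 to 100, the neighbors of a state are the states one or two steps away, and the score has several local maxima.

score <- function(s) sin(s / 5) * s       # made-up bumpy score to maximize

s    <- 10                                # 1.1 initial starting point S
tabu <- c(s)                              # 1.2/1.4 Tabu list containing S
best <- s                                 # 1.3 best state so far
for (i in 1:200) {                        # 2. repeat until terminate condition
  nbrs <- s + c(-2, -1, 1, 2)             # 2.1 find a neighborhood
  nbrs <- nbrs[nbrs >= 1 & nbrs <= 100]   # 2.2 keep the legal candidate moves
  nbrs <- setdiff(nbrs, tabu)             # 2.3 remove previously visited states
  if (length(nbrs) == 0) break
  s <- nbrs[which.max(score(nbrs))]       # 2.4 move to the best candidate
  tabu <- c(tabu, s)                      # 2.5 add the new state to the list
  if (score(s) > score(best)) best <- s   # 2.6 remember the best state
  if (length(tabu) > 50)
    tabu <- tail(tabu, 50)                # 2.7 trim old items from the list
}
best                                      # best state found despite the bumps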

Monday, January 14, 2013

Optimization in R

Optimization is a very common problem in data analytics.  Given a set of variables (which one has control over), how do we pick the right values such that the benefit is maximized?  More formally, optimization is about determining a set of variables x1, x2, … that maximizes or minimizes an objective function f(x1, x2, …).

Unconstrained optimization

In the unconstrained case, each variable can be freely selected within its full range.

A typical solution is to compute the gradient vector of the objective function [∂f/∂x1, ∂f/∂x2, …], set it to [0, 0, …], and solve this equation; the resulting x1, x2 … give a local optimum.

In R, this can be done numerically with the optim() function.

> # f has its minimum at (5, 6)
> f <- function(x){(x[1] - 5)^2 + (x[2] - 6)^2}
> # start the numerical search from an arbitrary point
> initial_x <- c(10, 11)
> # optim() minimizes by default; "CG" is the conjugate gradient method
> x_optimal <- optim(initial_x, f, method="CG")
> x_min <- x_optimal$par
> x_min
[1] 5 6

Equality constraint optimization

Moving on to the constrained case, let's say x1, x2 … are not independent and have to be related to each other in some particular way: g1(x1, x2, …) = 0,  g2(x1, x2, …) = 0.

The optimization problem can be expressed as …
Maximize objective function: f(x1, x2, …)
Subjected to equality constraints:
  • g1(x1, x2, …) = 0
  • g2(x1, x2, …) = 0
A typical solution is to turn the constrained optimization problem into an unconstrained one using Lagrange multipliers.

Define a new function F as follows ...
F(x1, x2, …, λ1, λ2, …) = f(x1, x2, …) + λ1.g1(x1, x2, …) + λ2.g2(x1, x2, …) + …

Then solve for ...
[∂F/∂x1, ∂F/∂x2, …, ∂F/∂λ1, ∂F/∂λ2, …] = [0, 0, ….]
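
As a worked sketch, take the earlier objective f(x) = (x1 - 5)^2 + (x2 - 6)^2 and add a made-up equality constraint x1 + x2 = 10.  The Lagrangian is F = f + λ.(x1 + x2 - 10), and setting its gradient to zero gives a linear system that base R solves directly:

# Gradient of F = 0:
#   2*x1 + lambda = 10      (from dF/dx1 = 2*(x1 - 5) + lambda = 0)
#   2*x2 + lambda = 12      (from dF/dx2 = 2*(x2 - 6) + lambda = 0)
#   x1 + x2       = 10      (from dF/dlambda = 0)
A <- matrix(c(2, 0, 1,
              0, 2, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)
b <- c(10, 12, 10)
solve(A, b)   # x1 = 4.5, x2 = 5.5, lambda = 1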

Inequality constraint optimization

In this case, the constraints are inequalities.  We cannot use the Lagrange multiplier technique directly because it requires equality constraints, and there is no general solution for arbitrary inequality constraints.

However, we can restrict the form of the constraints.  In the following, we study two models where each constraint is restricted to be a linear function of the variables:
w1.x1 + w2.x2 + … >= 0

Linear Programming

Linear programming is a model where both the objective function and the constraint functions are restricted to linear combinations of the variables.  The linear programming problem can be defined as follows ...

Maximize objective function: f(x1, x2) = c1.x1 + c2.x2

Subjected to inequality constraints:
  • a11.x1 + a12.x2 <= b1
  • a21.x1 + a22.x2 <= b2
  • a31.x1 + a32.x2 <= b3
  • x1 >= 0, x2 >=0
As an illustrative example, let's walk through a portfolio investment problem.  In this example, we want to find an optimal way to allocate the proportions of assets in our investment portfolio.
  • StockA gives 5% return on average
  • StockB gives 4% return on average
  • StockC gives 6% return on average
To set some constraints, let's say my investment in C must be less than the sum of A and B.  Also, the investment in A cannot be more than twice that of B.  Finally, at least 10% of the investment must be in each stock.

To formulate this problem:

Variable: x1 = % investment in A, x2 = % in B, x3 = % in C

Maximize expected return: f(x1, x2, x3) = x1*5% + x2*4% + x3*6%

Subjected to constraints:
  • 10% < x1, x2, x3 < 100%
  • x1 + x2 + x3 = 1
  • x3 < x1 + x2
  • x1 < 2 * x2

> library(lpSolve)
> library(lpSolveAPI)
> # Create a model with 0 constraints and 3 decision variables
> model <- make.lp(0, 3)
> # Define the objective function: lpSolveAPI minimizes by default,
> # so negate the returns to maximize
> set.objfn(model, c(-0.05, -0.04, -0.06))
> # Add the constraints
> add.constraint(model, c(1, 1, 1), "=", 1)    # x1 + x2 + x3 = 1
> add.constraint(model, c(1, 1, -1), ">", 0)   # x3 < x1 + x2
> add.constraint(model, c(1, -2, 0), "<", 0)   # x1 < 2*x2
> # Set the upper and lower bounds
> set.bounds(model, lower=c(0.1, 0.1, 0.1), upper=c(1, 1, 1))
> # Compute the optimized model (status 0 means success)
> solve(model)
[1] 0
> # Get the value of the optimized parameters
> get.variables(model)
[1] 0.3333333 0.1666667 0.5000000
> # Get the value of the objective function
> get.objective(model)
[1] -0.05333333
> # Get the value of the constraint
> get.constraints(model)
[1] 1 0 0


Quadratic Programming

Quadratic programming is a model where the objective function is quadratic (contains products of up to two variables) while the constraint functions are restricted to linear combinations of the variables.

The Quadratic Programming problem can be defined as follows ...

Minimize quadratic objective function:
f(x1, x2) = c1.x1^2 + c2.x1.x2 + c3.x2^2 - (d1.x1 + d2.x2)
Subject to constraints
  • a11.x1 + a12.x2 == b1
  • a21.x1 + a22.x2 == b2
  • a31.x1 + a32.x2 >= b3
  • a41.x1 + a42.x2 >= b4
  • a51.x1 + a52.x2 >= b5
Express the problem in Matrix form:

Minimize objective ½ xTDx - dTx
Subject to constraint ATx >= b
The first k columns of A correspond to the equality constraints

As an illustrative example, let's continue with the portfolio investment problem.  Again, we want to find an optimal way to allocate the proportions of assets in our investment portfolio.
  • StockA gives 5% return on average
  • StockB gives 4% return on average
  • StockC gives 6% return on average
We also look into the variance of each stock (known as risk) as well as the covariance between stocks.

For an investment, we not only want a high expected return, but also a low variance.  In other words, a stock with high return but also high variance is not very attractive.  Therefore, maximizing the expected return while minimizing the variance is the typical investment strategy.

One way to minimize variance is to pick multiple stocks (in a portfolio) to diversify the investment.  Diversification happens when the stock components within the portfolio move away from their averages in different directions (hence the variance is reduced).

The covariance matrix ∑ (between each pair of stocks) is given as follows:
1%, 0.2%, 0.5%
0.2%, 0.8%, 0.6%
0.5%, 0.6%, 1.2%

In this example, we want to achieve an overall return of at least 5.2% while minimizing the variance of the return.

To formulate the problem:

Variable: x1 = % investment in A, x2 = % in B, x3 = % in C

Minimize variance: xT∑x

Subjected to constraint:
  • x1 + x2 + x3 == 1
  • x1*5% + x2*4% + x3*6% >= 5.2%
  • 0% < x1, x2, x3 < 100%
> library(quadprog)
> mu_return_vector <- c(0.05, 0.04, 0.06) 
> sigma <- matrix(c(0.01, 0.002, 0.005, 
+                   0.002, 0.008, 0.006, 
+                   0.005, 0.006, 0.012), 
+                  nrow=3, ncol=3)
> # solve.QP minimizes (1/2)x'Dx - d'x, so D = 2*sigma targets x'(sigma)x
> D.Matrix <- 2*sigma
> d.Vector <- rep(0, 3)
> # Constraints are the columns of A: sum(x) = 1 (equality),
> # expected return >= 5.2%, and x >= 0
> A.Equality <- matrix(c(1,1,1), ncol=1)
> A.Matrix <- cbind(A.Equality, mu_return_vector, 
+                   diag(3))
> b.Vector <- c(1, 0.052, rep(0, 3))
> out <- solve.QP(Dmat=D.Matrix, dvec=d.Vector, 
+                 Amat=A.Matrix, bvec=b.Vector, 
+                 meq=1)
> out$solution
[1] 0.4 0.2 0.4
> out$value
[1] 0.00672
>

Integration with Predictive Analytics

Optimization is usually integrated with predictive analytics, which is another important part of data analytics.  For an overview of predictive analytics, see my previous blog post on the topic.

The overall processing can be depicted as follows:


In this diagram, we use machine learning to train a predictive model in batch mode.  Once the predictive model is available, observation data is fed into it in real time and a set of output variables is predicted.  These output variables are fed into the optimization model as the environment parameters (e.g. the return of each stock, the covariance ... etc.), from which a set of optimal decision variables is recommended.
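
To close with a hedged end-to-end sketch: below, a deliberately simple "predictive model" (historical averages over fabricated data) produces the expected returns and covariance matrix that feed the same quadprog optimization model from above.  In a real system, the batch-trained machine-learning model would replace the estimation step, and fresh observations would flow through it in real time.

library(quadprog)

set.seed(1)
# Fabricated history: 100 days of returns for three stocks
history <- matrix(rnorm(300, mean = c(0.05, 0.04, 0.06), sd = 0.02),
                  ncol = 3, byrow = TRUE)

# Predictive step (stand-in for the trained model): estimate the
# environment parameters from the observations
mu    <- colMeans(history)                 # predicted expected returns
sigma <- cov(history)                      # predicted covariance matrix

# Optimization step: same portfolio QP as above, fed with predictions
A <- cbind(rep(1, 3), mu, diag(3))         # sum(x) = 1, mu'x >= 5%, x >= 0
b <- c(1, 0.05, rep(0, 3))
out <- solve.QP(Dmat = 2 * sigma, dvec = rep(0, 3),
                Amat = A, bvec = b, meq = 1)
out$solution                               # recommended decision variables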