In this section we cover portfolio optimization. I thought it would be useful to add it this year, as it is becoming ever more important and might spur some interest in project applications.
While the list of possible optimizers to explore in R is vast, we will be focussing on setting up a simple optimization problem and covering this topic at a comparatively high level.
Below we will do a basic overview of available portfolio optimization techniques in R.
The purpose of doing portfolio optimization is to find some optimal strategy conditional upon some predefined target.
There are various types of optimizations, including e.g. a linear programming exercise of the following form:
\[\begin{equation} \max_x \quad c^Tx \quad \text{s.t.} \quad Ax \leq b, \quad x \geq 0 \end{equation}\]
From the above, \(x\) would, in a portfolio optimization setting, represent a vector of weights, with \(c\) and \(b\) known vectors, and \(A\) a matrix of coefficients. \(c^Tx\) is the objective function, with the inequalities the constraints subject to which the objective is maximized. Let’s translate this into intuitive terms: we simply want to find the weights, \(w\), which maximize expected returns, while being subject to specific constraints.
George Dantzig’s celebrated work on linear programming produced the simplex algorithm, which makes solving such problems straightforward.
A common example is a mean-variance optimizer (MVO, of the Markowitz type), where the idea is to find a weights vector which maximizes return (\(\mu\)) while minimizing risk (\(\sigma\), via the covariance matrix). This is a quadratic type of optimization. Additionally, we may add various box constraints (e.g. long-only, or \(w_i > 0\)) and group constraints (e.g. max exposure to a single sector of 20%, and max single-stock exposure of 10%) too.
Thus: MVO is a quadratic problem with linear constraints.
We can frame this problem as:
\[\begin{equation} \min_w \quad \bar{\sigma}^2 = w^T\hat{\Sigma}w \\ s.t. \\ w^T \hat{\mu} = \bar{r} \\ w^T \mathbf{1} = 1 \end{equation}\]
From the above, we find the weights vector, \(w\), which minimizes portfolio variance (built from the sample covariance matrix, \(\hat{\Sigma}\)), subject to the constraints that \(w^T \mathbf{1} = 1\), i.e. the weights sum to one (fully invested capital), and that \(w^T\hat{\mu} = \bar{r}\), i.e. the portfolio hits the target return \(\bar{r}\), with \(\hat{\mu}\) the vector of expected asset returns.
The solution we arrive at is unique, and can thus be labeled as the MV-optimized portfolio weights.
Of course the accuracy of this out-of-sample depends on various factors, including the quality of return forecasts and stability of the covariance matrix. Nonetheless, the solution is pareto efficient in-sample.
Contrasting in-sample (IS) efficiency with out-of-sample (OOS) inaccuracy is a fun pastime enjoyed by many a scholar.
Further, MVO is part of the subjective optimization model suite - i.e. where a required return level is provided. Another common example includes portfolio utility optimization, where an explicit risk preference (typically called \(\lambda\)) is provided and incorporated into the optimization as:
\(\max_w \quad U_P = E(R_P) - \lambda\sigma_P\)
For the above, the level of risk aversion is given, and return and risk are optimized around it. See below for several different levels of risk applied to 10 randomly constructed stocks, optimized using the above framework with linear constraints of long-only (\(w>0\)) and full investment (\(\sum w_i = 1\)):
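To make the trade-off concrete, here is a minimal unconstrained sketch (toy \(\mu\) and \(\Sigma\), and the variance form of the risk penalty, both assumed purely for illustration): the first-order condition of \(\max_w \; \mu^Tw - \lambda w^T\Sigma w\) gives \(w^* = \frac{1}{2\lambda}\Sigma^{-1}\mu\), so portfolio risk falls as the risk aversion \(\lambda\) rises.

```r
# Unconstrained utility maximization: w* = Sigma^{-1} mu / (2 * lambda)
# (toy inputs; a real exercise would add the long-only and full-investment constraints)
mu    <- c(0.08, 0.10)
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), 2, 2)
w_opt     <- function(lambda) solve(Sigma, mu) / (2 * lambda)
port_risk <- function(w) drop(t(w) %*% Sigma %*% w)
risks <- sapply(c(1, 2, 4), function(l) port_risk(w_opt(l)))
risks  # strictly decreasing in lambda
```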
Objective optimization techniques include minimum variance (MV), max return, or max Sharpe portfolios, which are comparatively easy to calculate and apply irrespective of risk or return preference (hence an objective optimal point, not one subjectively determined by a preferred level of risk).
We will next test some optimization techniques on German stock market data using R’s quadprog package.
Before testing the MVO optimization routine, let’s first learn the notation used in quadprog:
solve.QP(Dmat, dvec, Amat, bvec, meq)
So we have:
minimize: \(\frac{1}{2}w^T Dmat \, w - dvec^T w\)
s.t.: \(Amat^T w \geq bvec\)
with:
Dmat: matrix to be minimized (typically sample covariance matrix)
dvec: the vector in the linear term of the quadratic function above (the expected returns vector for MVO; since it enters with a negative sign, minimizing \(-dvec^Tw\) is akin to maximizing returns).
Amat: matrix of constraints to be respected when minimizing quadratic function
bvec: the \(b_0\) vector of constraint bounds (discussed later)
meq: the first meq constraints are treated as equalities (the rest as inequalities) - defaults to zero, i.e. all are inequalities.
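To fix the notation, here is a deliberately tiny toy call (an identity matrix standing in for Dmat, a zero dvec, and a single sum-to-one equality), which solve.QP dutifully solves to equal weights:

```r
library(quadprog)

# Minimize 1/2 w' Dmat w - dvec' w  s.t.  Amat' w >= bvec, first meq rows as equalities
Dmat <- diag(2)                        # stand-in for a covariance matrix
dvec <- rep(0, 2)                      # no return term: pure risk minimization
Amat <- matrix(1, nrow = 2, ncol = 1)  # one constraint column: w1 + w2
bvec <- 1
w <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
w  # c(0.5, 0.5)
```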
The complicated part of programming the above is planning the constraints matrix (the objective part could be as simple as the sample covariance matrix and a past-returns vector, although some shrinkage would probably be applied to \(\Sigma\)).
E.g., if we have 3 stocks and want the weights to sum to 1 (i.e. be fully invested), with a maximum weight of 50% and a minimum weight of 10% per stock (box constraints), we set Amat and bvec as follows (note that quadprog works with the transpose of Amat - multiply the following out to understand how we use meq as well):
\[\begin{equation} Amat^T w \geq bvec \\ \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ -1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \geq \begin{bmatrix} 1 \\ 0.1 \\ 0.1 \\ 0.1 \\ -0.5 \\ -0.5 \\ -0.5 \end{bmatrix} \end{equation}\]
We then set meq = 1: the first row is the equality \(w_1 + w_2 + w_3 = 1\), while the remaining rows are inequalities - the lower bounds \(w_i \geq 0.1\) and the upper bounds \(-w_i \geq -0.5\) (i.e. \(w_i \leq 0.5\)).
The reason for the minuses is that quadprog frames every inequality as \(\geq\), so upper bounds must be negated.
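Putting the 3-stock example into code (with an identity matrix standing in for the covariance matrix, purely for illustration):

```r
library(quadprog)

# Sum-to-one equality plus box constraints 0.1 <= w_i <= 0.5
Amat <- cbind(1, diag(3), -diag(3))  # columns: sum, lower bounds, upper bounds
bvec <- c(1, rep(0.1, 3), -rep(0.5, 3))
w <- solve.QP(Dmat = diag(3), dvec = rep(0, 3),
              Amat = Amat, bvec = bvec, meq = 1)$solution
w  # equal weights of 1/3 each, comfortably inside the box
```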
So… let’s go do this on some real data so that you can brag to your friends at your next braai.
We will be using the RiskPortfolios package to estimate our sample covariance matrix. We will also take a look at the fitHeavyTail package for further extensions to our covariance matrix.
Let’s now load some real data:
pacman::p_load(tidyverse);pacman::p_load(quadprog)
StartDate <- lubridate::ymd(20150101)
Dax <- fmxdat::DAX %>% filter(date > StartDate)
# Let's now pick and only use the top 50 stocks from the last date:
Top_50 <-
Dax %>% filter(date == last(date)) %>%
arrange(desc(weight)) %>% top_n(50, weight) %>% pull(Tickers)
# Right, with this let's subset our entire dataframe:
Dax <-
Dax %>% filter(Tickers %in% Top_50)

Now, for calculating a covariance matrix using RiskPortfolios (or other packages), you will find that we have to spread the data (make it wide):
return_mat <- Dax %>% select(date, Tickers, return) %>% spread(Tickers, return)
# Notice here - I do not spread with Sectors and weights in the df...
# This follows as it would create tons of duplicates - as the sectors and weights are now not assigned tidily, but effectively repeated for all stocks..
# Please check this yourself and be careful when gathering / spreading...

The first problem that you will encounter is getting a square (complete) data matrix. Stocks mostly enter and exit an index at odd intervals (delisting, listing, etc.) - implying they don’t have similar start and end dates.
Naturally, as you want to calculate a covariance matrix - this would require a full square matrix of returns, as NA values cannot be tolerated.
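To see the issue concretely (a toy matrix with a single missing value):

```r
# cov() propagates NAs by default
set.seed(1)
R <- matrix(rnorm(20), nrow = 10, ncol = 2)
R[1, 2] <- NA
any(is.na(cov(R)))                                 # TRUE
any(is.na(cov(R, use = "pairwise.complete.obs")))  # FALSE - but pairwise estimates can lose positive definiteness
```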
There are several ways to deal with this (I list the good, the bad and the absolute no-nos below):
- Better solution - as it preserves the distributional elements particular to the stock.
- Could also use a peer distribution (e.g. sector) as well.
Right, let’s use step 4 above to fill our missing data using a function that we create (doubling as an exercise in function creation too).
Note first my use of a seed - this is the randomness specifier. I like to make my seed something really random, like the exact time, e.g.
ALSO note (NB):
An example of this would be if you try to calculate the t.test for a group using mutate. See for yourself the following won’t work:
But the following will:
## # A tibble: 32 × 12
## # Groups: gear [3]
## mpg cyl disp hp drat wt qsec vs am gear carb ttest
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 <htest>
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 <htest>
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 <htest>
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 <htest>
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 <htest>
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 <htest>
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 <htest>
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 <htest>
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 <htest>
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 <htest>
## # … with 22 more rows
Now the ttest column contains an htest object as a list, which requires us to first unbox it (using [[1]]) to use it:
# Knowing that ttest works as follows (for first row):
tresult %>% slice(1) %>% pull(ttest) -> ttestex
ttestex[[1]]$p.value

## [1] 3.081667e-11
## # A tibble: 32 × 13
## # Groups: gear [3]
## mpg cyl disp hp drat wt qsec vs am gear carb ttest
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 <htest>
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 <htest>
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 <htest>
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 <htest>
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 <htest>
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 <htest>
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 <htest>
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 <htest>
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 <htest>
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 <htest>
## # … with 22 more rows, and 1 more variable: pval <dbl>
Work through the function below to understand its use of density and sample draws from it…
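As a warm-up, here are density() and sample() in isolation on made-up returns: we draw candidate values from the estimated density grid, weighted by the density heights, and the draws roughly reproduce the spread of the original series.

```r
set.seed(42)
x <- rnorm(500, mean = 0.01, sd = 0.05)   # toy 'observed' returns
d <- density(x, na.rm = TRUE)             # kernel density estimate
draws <- sample(d$x, 100, replace = TRUE, prob = d$y)
c(sd(x), sd(draws))  # broadly comparable (the kernel bandwidth inflates spread slightly)
```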
# Note I set impute_returns_method to NONE by default. This will produce a warning if there is indeed a missing return in the matrix...
impute_missing_returns <- function(return_mat, impute_returns_method = "NONE"){
# Make sure we have a date column called date:
if( !"date" %in% colnames(return_mat) ) stop("No 'date' column provided in return_mat. Try again please.")
# Note my use of 'any' below...
# Also note that I 'return' return_mat - which stops the function and returns return_mat.
if( impute_returns_method %in% c("NONE", "None", "none") ) {
if( any(is.na(return_mat)) ) warning("There are missing values in the return matrix.. Consider maybe using impute_returns_method = 'Drawn_Distribution_Own' / 'Drawn_Distribution_Collective'")
return(return_mat)
}
if( impute_returns_method == "Average") {
return_mat <-
return_mat %>% gather(Stocks, Returns, -date) %>%
group_by(date) %>%
mutate(Avg = mean(Returns, na.rm=T)) %>%
mutate(Avg = coalesce(Avg, 0)) %>% # date with no returns - set avg to zero
ungroup() %>%
mutate(Returns = coalesce(Returns, Avg)) %>% select(-Avg) %>% spread(Stocks, Returns)
# That is just so much easier when tidy right? See how I gathered and spread again to give back a wide df?
return(return_mat)
} else
if( impute_returns_method == "Drawn_Distribution_Own") {
N <- nrow(return_mat)
return_mat <-
# DIY: see what density function does
left_join(return_mat %>% gather(Stocks, Returns, -date),
return_mat %>% gather(Stocks, Returns, -date) %>% group_by(Stocks) %>%
mutate(Dens = list(density(Returns, na.rm=T))) %>%
summarise(set.seed(as.numeric(format( Sys.time(), format = "%s"))/1e3*sample(1:100)[1]), Random_Draws = list(sample(Dens[[1]]$x, N, replace = TRUE, prob=.$Dens[[1]]$y))),
by = "Stocks"
) %>% group_by(Stocks) %>%
# Random draw from sample:
mutate(Returns = coalesce(Returns, Random_Draws[[1]][row_number()])) %>%
select(-Random_Draws) %>% ungroup() %>% spread(Stocks, Returns)
return(return_mat)
} else
if( impute_returns_method == "Drawn_Distribution_Collective") {
NAll <- nrow(return_mat %>% gather(Stocks, Returns, -date))
# DIY: see what density function does
return_mat <-
bind_cols(
return_mat %>% gather(Stocks, Returns, -date),
return_mat %>% gather(Stocks, Returns, -date) %>%
mutate(Dens = list(density(Returns, na.rm=T))) %>%
summarise(set.seed(as.numeric(format( Sys.time(), format = "%s"))/1e3*sample(1:100)[1]), Random_Draws = list(sample(Dens[[1]]$x, NAll, replace = TRUE, prob=.$Dens[[1]]$y))) %>%
unnest(Random_Draws)
) %>%
mutate(Returns = coalesce(Returns, Random_Draws)) %>% select(-Random_Draws) %>% spread(Stocks, Returns)
return(return_mat)
} else
if( impute_returns_method == "Zero") {
warning("This is probably not the best idea but who am I to judge....")
return_mat[is.na(return_mat)] <- 0
return(return_mat)
} else
stop("Please provide a valid impute_returns_method method. Options include:\n'Average', 'Drawn_Distribution_Own', 'Drawn_Distribution_Collective' and 'Zero'.")
return_mat
}
# Now we will use this function as follows (after saving and sourcing it of course....):
# As a quick example:
return_mat_Quick = fmxdat::asisa %>% filter(Funds %in% paste0("Fund_", 1:5)) %>% select(date, Funds, Returns) %>% filter(date > lubridate::ymd(20150101)) %>% spread(Funds, Returns)
impute_missing_returns(return_mat_Quick, impute_returns_method = "NONE")

## Warning in impute_missing_returns(return_mat_Quick, impute_returns_method =
## "NONE"): There are missing values in the return matrix.. Consider maybe using
## impute_returns_method = 'Drawn_Distribution_Own' /
## 'Drawn_Distribution_Collective'
## # A tibble: 90 × 6
## date Fund_1 Fund_2 Fund_3 Fund_4 Fund_5
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015-01-31 0.00411 0.0107 0.0338 NA 0.000464
## 2 2015-02-28 0.0501 0.0209 0.0457 NA 0.0521
## 3 2015-03-31 -0.0182 -0.0352 0.0218 NA 0.00500
## 4 2015-04-30 0.0335 0.0388 0.0290 NA 0.0682
## 5 2015-05-31 0.000094 -0.0211 -0.0201 NA -0.0432
## 6 2015-06-30 -0.0164 -0.0263 -0.00556 NA -0.0181
## 7 2015-07-31 0.0576 -0.0132 0.0124 NA -0.0297
## 8 2015-08-31 -0.00583 -0.0265 -0.0289 -0.0305 -0.0336
## 9 2015-09-30 -0.0317 -0.0307 0.0104 0.00994 -0.0502
## 10 2015-10-31 0.0357 0.0587 0.0631 0.0765 0.0693
## # … with 80 more rows
## # A tibble: 90 × 6
## date Fund_1 Fund_2 Fund_3 Fund_4 Fund_5
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015-01-31 0.00411 0.0107 0.0338 0.0123 0.000464
## 2 2015-02-28 0.0501 0.0209 0.0457 0.0422 0.0521
## 3 2015-03-31 -0.0182 -0.0352 0.0218 -0.00663 0.00500
## 4 2015-04-30 0.0335 0.0388 0.0290 0.0423 0.0682
## 5 2015-05-31 0.000094 -0.0211 -0.0201 -0.0211 -0.0432
## 6 2015-06-30 -0.0164 -0.0263 -0.00556 -0.0166 -0.0181
## 7 2015-07-31 0.0576 -0.0132 0.0124 0.00680 -0.0297
## 8 2015-08-31 -0.00583 -0.0265 -0.0289 -0.0305 -0.0336
## 9 2015-09-30 -0.0317 -0.0307 0.0104 0.00994 -0.0502
## 10 2015-10-31 0.0357 0.0587 0.0631 0.0765 0.0693
## # … with 80 more rows
## # A tibble: 90 × 6
## date Fund_1 Fund_2 Fund_3 Fund_4 Fund_5
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015-01-31 0.00411 0.0107 0.0338 -0.0316 0.000464
## 2 2015-02-28 0.0501 0.0209 0.0457 0.0107 0.0521
## 3 2015-03-31 -0.0182 -0.0352 0.0218 0.0455 0.00500
## 4 2015-04-30 0.0335 0.0388 0.0290 0.00842 0.0682
## 5 2015-05-31 0.000094 -0.0211 -0.0201 -0.0536 -0.0432
## 6 2015-06-30 -0.0164 -0.0263 -0.00556 -0.0770 -0.0181
## 7 2015-07-31 0.0576 -0.0132 0.0124 -0.0188 -0.0297
## 8 2015-08-31 -0.00583 -0.0265 -0.0289 -0.0305 -0.0336
## 9 2015-09-30 -0.0317 -0.0307 0.0104 0.00994 -0.0502
## 10 2015-10-31 0.0357 0.0587 0.0631 0.0765 0.0693
## # … with 80 more rows
## # A tibble: 90 × 6
## date Fund_1 Fund_2 Fund_3 Fund_4 Fund_5
## <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015-01-31 0.00411 0.0107 0.0338 0.0189 0.000464
## 2 2015-02-28 0.0501 0.0209 0.0457 0.0341 0.0521
## 3 2015-03-31 -0.0182 -0.0352 0.0218 0.0364 0.00500
## 4 2015-04-30 0.0335 0.0388 0.0290 0.0402 0.0682
## 5 2015-05-31 0.000094 -0.0211 -0.0201 -0.0132 -0.0432
## 6 2015-06-30 -0.0164 -0.0263 -0.00556 -0.0155 -0.0181
## 7 2015-07-31 0.0576 -0.0132 0.0124 0.00130 -0.0297
## 8 2015-08-31 -0.00583 -0.0265 -0.0289 -0.0305 -0.0336
## 9 2015-09-30 -0.0317 -0.0307 0.0104 0.00994 -0.0502
## 10 2015-10-31 0.0357 0.0587 0.0631 0.0765 0.0693
## # … with 80 more rows
# The following should, broadly speaking, be close to one another:
impute_missing_returns(return_mat_Quick, impute_returns_method = "Drawn_Distribution_Own") %>% pull(Fund_4) %>% sd()

## [1] 0.04076856
## [1] 0.04166576
options(scipen = 999) # Stop scientific notation
return_mat <- impute_missing_returns(return_mat, impute_returns_method = "Drawn_Distribution_Collective")
# And it is pretty quick too....
# Right! So now we have a square matrix:
any(is.na(return_mat))

## [1] FALSE
Let’s get the simplest form of a covariance matrix, and then calculate the geometric means (remember - for optimizations, it is not correct to use simple arithmetic mean vectors… can you argue why?).
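As a quick argument: a +50% period followed by a -50% period has an arithmetic mean of zero, yet the investor has lost money - the geometric mean captures this.

```r
r <- c(0.5, -0.5)
mean(r)                            # 0: the arithmetic mean says you broke even
prod(1 + r)^(1 / length(r)) - 1    # about -0.134 per period: you did not
```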
# Drop date column for this...
return_mat_Nodate <- data.matrix(return_mat[, -1])
# Simple Sample covariance and mean:
Sigma <- RiskPortfolios::covEstimation(return_mat_Nodate)
# Simple means (should not be used here):
# Mu <- RiskPortfolios::meanEstimation(return_mat_Nodate)
# Rather use geometric means, which is the actual returns for each stock as opposed to the average:
Mu <- return_mat %>% summarise(across(-date, ~prod(1+.)^(1/n())-1)) %>% purrr::as_vector()
# Notice now that we need to raise to the power of 1/N to make it daily here.
# The important thing is that the periodicity between the mean and sigma is the same. NB!
# Also, purrr::as_vector() just makes a vector out of the Df that can be used downstream
# Btw the above is equivalent to:
# Mu <- return_mat %>% summarise(across(-date, ~exp(mean(log(1+.))) - 1 )) %>% purrr::as_vector()Great! now we have our sample covariance and geometric means!
Let’s digress momentarily on the covariance matrix. There is a long and interesting literature that has developed highlighting some of the shortcomings of using a simple covariance matrix as the one above.
E.g., looking at the SP500 returns:
prices <- fmxdat::SP500 %>% tbl_xts()
plot(log(prices), col = 'blue', lwd = 2, ylab = "log-price", main = "S&P 500 index")

R_daily <- diff(log(prices))[-1]
plot(R_daily, col = 'steelblue', lwd = 1, ylab = "log-return", main = "S&P 500 Index Returns")

h <- hist(as.vector(R_daily), breaks = 100, prob = TRUE, col = "lightgray",
xlab = "return", main = "Histogram of log-returns")
xfit <- seq(min(R_daily), max(R_daily), length = 100)
yfit <- dnorm(xfit, mean = mean(R_daily) + 5e-4, sd = 0.6*sd(R_daily))
lines(xfit, yfit, col = "blue", lwd=2)

Outliers have an outsized effect on the moments estimated from the data - causing covariance matrices to be heavily distorted.
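A quick toy illustration of the point: a single crash-sized observation can move the estimated standard deviation materially.

```r
set.seed(9)
x <- rnorm(250, sd = 0.01)    # 'normal' daily returns
sd_clean   <- sd(x)
sd_outlier <- sd(c(x, -0.2))  # add one -20% day
c(sd_clean, sd_outlier)       # the second is markedly larger
```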
A few possible solutions exist. Arguably the most prevalent is some form of shrinkage.1
The concept of shrinking a covariance matrix to reduce the impact of outliers is nicely summarized in this widely quoted paper as:
The central message of this article is that no one should use the sample covariance matrix for portfolio optimization. It is subject to estimation error of the kind most likely to perturb a mean-variance optimizer. Instead, a matrix can be obtained from the sample covariance matrix through a transformation called shrinkage. This tends to pull the most extreme coefficients toward more central values, systematically reducing estimation error when it matters most. Statistically, the challenge is to know the optimal shrinkage intensity. Shrinkage reduces portfolio tracking error relative to a benchmark index, and substantially raises the manager’s realized information ratio.
In particular, the authors, Ledoit and Wolf, used shrinking towards the identity matrix (drawing on random matrix theory), where:
\[ \hat\Sigma^{Shrunk} = (1-\rho)*\hat{\Sigma}+\rho*T\]
with:
\(T = \frac{1}{N}Tr(\hat\Sigma)\times I\)
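Spelled out in base R (toy data, and a hand-picked intensity \(\rho\) - the quantity Ledoit and Wolf actually derive optimally):

```r
set.seed(7)
R <- matrix(rnorm(200), nrow = 50, ncol = 4)  # toy return matrix
S <- cov(R)
rho <- 0.3                                    # assumed shrinkage intensity
Tgt <- mean(diag(S)) * diag(ncol(S))          # (1/N) Tr(S) * I
S_shrunk <- (1 - rho) * S + rho * Tgt
# off-diagonal covariances are pulled toward zero; the trace is preserved
```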
For our sample covariance matrix, this can simply be implemented as:
# Ledoit Wolf shrinkage:
Sigma_LW <- RiskPortfolios::covEstimation(return_mat_Nodate, control = list(type = "lw"))
# shrinkage using constant correlation matrix:
Sigma_const <- RiskPortfolios::covEstimation(return_mat_Nodate, control = list(type = "const"))

Other estimation techniques that are robust to heavier tails include those offered by the fitHeavyTail package.
E.g., fitHeavyTail::fit_mvt() assumes the data follows a multivariate t distribution and fits distributional objects accordingly. Notice, for these functions the data can actually include missing values.
HTT <- fitHeavyTail::fit_mvt(return_mat_Nodate)
mu <- return_mat %>% summarise(across(-date, ~prod(1+.)^(1/n())-1)) %>% purrr::as_vector()
# mu <- HTT$mu
Sigma <- HTT$cov
# Ensure order is the same for mu and Sigma (some optimizers are sensitive to ordering... :( )
mu <- mu[colnames(Sigma)]

Last thing: covariance matrices need to be positive definite for our optimizer. This may not always be the case (large N, high correlation, etc.).
To make it approximately positive definite, I normally resort to using:
# Purely for safety reasons, to avoid a non-positive definite matrix breaking your function...
Sigma <- as.matrix( Matrix::nearPD(Sigma)$mat)

Let’s cover some basics before delving into more advanced optimization routines with constraints imposed.
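As a quick sanity check of what nearPD does, take a hand-built 'correlation-like' matrix (values made up) that is not positive definite:

```r
M <- matrix(c( 1.0,  0.9,  0.7,
               0.9,  1.0, -0.9,
               0.7, -0.9,  1.0), 3, 3)
min(eigen(M)$values)                      # negative: not positive definite
M_pd <- as.matrix(Matrix::nearPD(M)$mat)  # Matrix ships with standard R distributions
min(eigen(M_pd)$values)                   # now (weakly) positive
```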
Now that we have our mean and variance estimates, it is trivial to specify certain optimal portfolios.
E.g. the minimum variance portfolio is the vector of portfolio weights that are the solution to
\[ \omega_\text{mvp} = \arg\min \omega'\Sigma \omega \text{ s.t. } \sum\limits_{i=1}^N\omega_i = 1. \]
The constraint that weights sum up to one simply implies that all funds are distributed across the available asset universe, i.e., there is no possibility to retain cash. It is easy to show analytically that (with \(I\) a matrix of ones):
\[ \omega_\text{mvp} = \frac{\Sigma^{-1}I}{I'\Sigma^{-1}I} \]
Tip: to solve the equation \(Ax = b\), we use solve(). For the minimum variance portfolio there is no ‘b’ provided; solve(Sigma) then simply returns \(\Sigma^{-1}\).
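A quick toy verification of the closed form above (simulated returns, assumed purely for illustration): the analytical MVP weights should have (weakly) lower variance than any other fully invested portfolio.

```r
set.seed(3)
R <- matrix(rnorm(300, sd = 0.02), nrow = 100, ncol = 3)  # toy daily returns
S <- cov(R)
ones  <- rep(1, 3)
w_mvp <- drop(solve(S, ones) / sum(solve(S, ones)))  # Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w_alt <- c(0.5, 0.3, 0.2)                            # arbitrary fully invested alternative
c(mvp = drop(t(w_mvp) %*% S %*% w_mvp),
  alt = drop(t(w_alt) %*% S %*% w_alt))              # mvp variance <= alt variance
```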
mu_ann <- (1+mu)^(252)-1
Sigma_ann <- Sigma * 252 # daily covariance scales linearly with the horizon
N <- ncol(Sigma_ann)
Imat <- rep(1, N)
sigma_inv <- solve(Sigma_ann)
mvp_weights_raw <- sigma_inv %*% Imat
mvp_weights <- as.vector(mvp_weights_raw / sum(mvp_weights_raw))
mvp_weights_df <- tibble(Tickers = rownames(mvp_weights_raw), minv_weights = mvp_weights)
# The annualised return and SD for this portfolio follow:
tibble(
average_ret_Ann = as.numeric(t(mvp_weights) %*% mu_ann),
volatility_Ann = as.numeric(sqrt(t(mvp_weights) %*% Sigma_ann %*% mvp_weights))
)
We can also add means to the above to do a mean-variance optimization, but note that this will now give us a frontier if we do not specify a specific return target (i.e., there’s no unique solution for MVO).
But we can of course specify return levels we want to target and then get a unique, efficient solution (think of balanced funds e.g. targeting CPI + 4%).
For simplification of notation, let:
\[ A = I'\Sigma^{-1}I \] \[ B = I'\Sigma^{-1}\mu \]
\[ C = \mu'\Sigma^{-1}\mu \]
\[ \lambda^* = 2\frac{\bar\mu - B/A}{C-B^2/A} \]
Then, it can be shown that the optimal weights vector for a given target return \(\bar\mu\) is:
\[ \omega_\text{eff}(\bar\mu) = \omega_\text{mvp} + \frac{\lambda^*}{2}\left(\Sigma^{-1}\mu -\frac{B}{A}\Sigma^{-1}I \right) \]

Putting this into code:
Imat <- rep(1, N)
# Let's say our target is 6% inflation + 4%:
mu_bar <- 0.1 # of course, we annualise the returns...
A <- as.numeric(t(Imat) %*% sigma_inv %*% Imat)
B <- as.numeric(t(Imat) %*% sigma_inv %*% mu_ann)
C <- as.numeric(t(mu_ann) %*% sigma_inv %*% mu_ann)
lambda_tilde <- as.numeric(2 * (mu_bar - B / A) / (C - B^2 / A))
efp_weights <- mvp_weights + lambda_tilde / 2 * (sigma_inv %*% mu_ann - B * mvp_weights)
# Note from the last line: (B/A) * sigma_inv %*% Imat ::> B * mvp_weights,
# since mvp_weights = sigma_inv %*% Imat / sum(sigma_inv %*% Imat) and A = sum(sigma_inv %*% Imat)
# Test, it sums to 1:
# efp_weights %>% as.vector() %>% sum()

Note the weights vector: we’ve not yet specified constraints, so it’s still long & short…
Now, let’s draw the entire efficiency curve with the condition that weights sum to 1 and that we optimize returns per unit of risk:
This means, we create a frontier such that:
\[ \omega_\text{eff}(\bar{\mu}) = \arg\min \omega'\Sigma \omega \text{ s.t. } \omega'I = 1 \text{ and } \omega'\mu \geq \bar{\mu}. \]

Now, considering we have two efficient portfolios (the minimum variance portfolio and the efficient portfolio for our target return), we can simply combine these to trace out the frontier.
Let’s loop across many combinations of the two portfolios to create this:
Mnths_Year <- 12
a <- seq(from = -10, to = 15, by = 0.01)
res <- tibble(
a = a,
mu = NA,
sd = NA
)
for (i in seq_along(a)) {
w <- (1 - a[i]) * mvp_weights + (a[i]) * efp_weights
res$mu[i] <- t(w) %*% mu_ann
res$sd[i] <- sqrt(t(w) %*% Sigma_ann %*% w) # Sigma_ann is already annualised
}
gg <-
res %>%
ggplot(aes(x = sd, y = mu)) +
geom_point() +
geom_point(
data = res %>% filter(a %in% c(0, 1)),
size = 4
) +
geom_point(
data = tibble(
mu = mu_ann,
sd = sqrt(diag(Sigma_ann))
),
aes(y = mu, x = sd), size = 3, color = "steelblue"
) +
labs(
x = "Annualized SD",
y = "Annualized Expected Returns",
title = "Efficient frontier"
) +
fmxdat::theme_fmx()
fmxdat::finplot(gg, x.pct = T, x.pct_acc = 1, y.pct = T, y.pct_acc = 1)

Now for the difficult part (warning: this requires planning).
Having now specified the Dmat and dvec objects above (Sigma and mu), we need to create the Amat, bvec and meq objects that encode our desired constraints.
Let’s implement the constraints we discussed above.
TIP: use diag for creating a diagonal sequence…
## [1] 1 1 1 1 1 1
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 0 0 0 0 0
## [2,] 0 1 0 0 0 0
## [3,] 0 0 1 0 0 0
## [4,] 0 0 0 1 0 0
## [5,] 0 0 0 0 1 0
## [6,] 0 0 0 0 0 1
# Let's now use this in designing Amat, et al...
NStox <- ncol( return_mat_Nodate )
LB = 0.01
UB = 0.25
meq = 1 # as only the first column of Amat is an equality (weight sum equals 1)
bvec <- c( 1, rep(LB, NStox), -rep(UB, NStox))
Amat <- cbind(1, diag(NStox), -diag(NStox))
# And now we are ready to combine this all into getting optimal weights given these constraints:
# we will use the quadprog package"
w.opt <-
quadprog::solve.QP(Dmat = Sigma,
dvec = mu,
Amat = Amat,
bvec = bvec,
meq = meq)$solution
result.QP <- tibble(stocks = colnames(Sigma), weight = w.opt)

Beware though: the solutions do not always converge and often require some tweaking… this can be quite frustrating in practice!
You can also use the CVXR package, which seeks to make the above simpler.2
What you lose in flexibility, you gain in ease and readability (so, horses for courses).
For the above example, we need to specify our optimization problem as:
pacman::p_load(CVXR)
NStox <- ncol( return_mat_Nodate )
LB = 0.01
UB = 0.25
w <- Variable(NStox)
ret <- t(mu) %*% w
risk <- quad_form(w, Sigma)
obj <- ret - risk
constr <- list(w >= LB, w <= UB, sum(w) == 1)
#constr <- list(p_norm(w,1) <= Lmax, sum(w) == 1)
prob <- Problem(Maximize(obj), constr)
result <- solve(prob)
result.CVXR <- tibble(stocks = colnames(Sigma), weight = result$getValue(w) %>% as.vector())
# result$getValue(risk)
# result$getValue(ret)

…with the above solutions similar.
Another possible optimized solution is to specify the maximum Sharpe portfolio.
This involves:
Maximizing: \(\frac{w^T\mu}{\sqrt{w^T\Sigma w}}\)
s.t. (long only, fully invested): \(\mathbf{1}^Tw = 1, \; w \geq LB, \; w \leq UB\)
Which can be put into convex form using:
Minimizing: \(\hat{w}^T\Sigma \hat{w}\)
s.t.: \(\hat{w}^T\mu = 1, \; \hat{w} \geq LB, \; \hat{w} \leq UB\)
with: \(w = \frac{\hat{w}}{1^T\hat{w}}\)
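As a toy check of this reformulation with quadprog (made-up \(\mu\) and a simulated \(\Sigma\), both assumed for illustration): solve for \(\hat{w}\), normalize, and the result should have at least the Sharpe ratio of an equal-weight portfolio.

```r
library(quadprog)

set.seed(5)
S  <- cov(matrix(rnorm(600, sd = 0.02), nrow = 200, ncol = 3))  # toy covariance
mu <- c(0.010, 0.015, 0.012)                                    # assumed expected returns
n  <- length(mu)
# min w_hat' S w_hat  s.t.  mu' w_hat = 1 (equality), w_hat >= 0
sol <- solve.QP(Dmat = S, dvec = rep(0, n),
                Amat = cbind(mu, diag(n)), bvec = c(1, rep(0, n)), meq = 1)
w <- sol$solution / sum(sol$solution)  # map w_hat back to portfolio weights
sharpe <- function(w) drop(t(w) %*% mu) / sqrt(drop(t(w) %*% S %*% w))
c(max_sharpe = sharpe(w), equal_weight = sharpe(rep(1/n, n)))
```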
Now it is a quadratic program, and we can again use either quadprog or CVXR to solve it (let’s create a generic function for doing this!).
MSRP <- function(mu, Sigma, LB = 0) {
w_ <- CVXR::Variable(nrow(Sigma))
prob <- CVXR::Problem(Minimize(quad_form(w_, Sigma)),
constraints = list(w_ >= LB, t(mu) %*% w_ == 1))
result <- solve(prob)
if(result$status == "infeasible") stop("Solution is infeasible. Please adjust bounds accordingly.")
w <- tibble( Stocks = colnames(Sigma), weights = as.vector(result$getValue(w_)/sum(result$getValue(w_))))
w
}
Solution <-
MSRP(mu, Sigma, LB = 0)

A simple alternative to the above (although you should definitely not simply default to simplicity) is to use the RiskPortfolios package in R.
It has some nice off-the-shelf functionalities that you can use.
See: ?RiskPortfolios::optimalPortfolio
E.g., in our example above:
pacman::p_load(RiskPortfolios)
Type = "minvol"
Opt_W <-
RiskPortfolios::optimalPortfolio(mu = mu, Sigma = Sigma,
control = list(type = Type, constraint = 'user',
LB = rep(LB, ncol(Sigma)),
UB = rep(UB, ncol(Sigma))))

See the Bonus practical for a nice example of using this to string together a rolling optimizer…
While the above is a very basic introduction, the possible applications of optimization techniques in designing portfolio weights are manifold.
Be careful, as these optimizers can (and often do) arrive at infeasible corner solutions. In practice, the problem needs to be carefully constructed to consider other targets and considerations too (e.g. active risk and sector exposures).
There are several more detailed texts3, but I hope you found the above useful as a high level overview.
O. Ledoit and M. Wolf, “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection”, Journal of Empirical Finance, vol. 10, no. 5, pp. 603–621, 2003; O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices”, Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004; D. Palomar, Shrinkage and Black-Litterman.↩︎
Fu, Anqi, Balasubramanian Narasimhan, and Stephen Boyd. 2017. “CVXR: An R Package for Disciplined Convex Optimization.” https://web.stanford.edu/~boyd/papers/cvxr_paper.html.↩︎
F. J. Fabozzi. Robust Portfolio Optimization and Management. Wiley, 2007.; G. Cornuejols and R. Tutuncu. Optimization Methods in Finance. Cambridge University Press, 2006.; D. Ruppert and D. Matteson. Statistics and Data Analysis for Financial Engineering: With R Examples. Springer, 2015.↩︎