Stock Selection Optimization

Pickstock is an optimization model from the finance sector. The goal is to pick a small subset of stocks, together with weights, such that this portfolio behaves similarly to the overall Dow Jones index.

Data:

Performance (price) data of all 30 shares of the Dow Jones index over a period of 1 year:

Date	Stock	Price
2016-01-04	AAPL	105.35
2016-01-04	AXP	67.59
2016-01-04	BA	140.5
2016-01-04	CAT	67.99
2016-01-04	CSCO	26.41
2016-01-04	CVX	88.85
2016-01-04	DD	63.07
2016-01-04	DIS	102.98
2016-01-04	GE	30.71
2016-01-04	GS	177.14
[…]	[…]	[…]

Goal:

The goal is to find a small selection of stocks that follows the Dow Jones as closely as possible. The interesting part is not how well the model performs on the data it knows about, but how well it approximates the Dow Jones index in the future. To simulate this, the stock data is divided into two parts: the training phase and the testing phase. The training-phase data is the model input, i.e. the data the model uses to find an index fund that approximates the Dow Jones index as closely as possible. Once the model has found a solution, the results can be evaluated on the testing phase.

Training and testing phase

The data of the testing phase is not used for the optimization, only for the evaluation. Since the objective function minimizes the absolute deviation between the DJ index and the portfolio of selected stocks, the area between the two curves indicates how good the solution is: the smaller the area, the better the fit.

Example

Optimization model:

Select a subset of at most maxstock Dow Jones stocks, along with weights, so that this portfolio behaves similarly to the overall index in the training phase. The model is based on a linear regression over the time series, but it minimizes the loss using the L1 norm (absolute value) and allows at most maxstock weights to take nonzero values.

$$\text{minimize} \qquad \text{obj} := \sum_{ds} \left(\text{slpos}_{ds} + \text{slneg}_{ds}\right)$$
$$\text{subject to} \qquad \sum_{s} \text{price}_{ds,s} \cdot w_{s} = \text{index}_{ds} + \text{slpos}_{ds} - \text{slneg}_{ds} \quad (\forall ds)$$
$$w_{s} \leq p_{s} \quad (\forall s)$$
$$\sum_{s} p_{s} \leq \text{maxstock}$$
$$w_{s} \geq 0, \qquad p_{s} \in \{0,1\} \quad (\forall s)$$
$$\text{slpos}_{ds} \geq 0, \qquad \text{slneg}_{ds} \geq 0 \quad (\forall ds)$$

where $ds$ ranges over the training days and $s$ over the stock symbols.
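To see why the slack variables implement the L1 norm, here is a minimal Python sketch (not part of the GAMS model) that evaluates the objective for fixed weights. For each training day, the smallest slacks satisfying the fit equation are the positive and negative parts of the residual, so obj reduces to the sum of absolute deviations. The function name `l1_objective` and the toy prices and index values are hypothetical.

```python
from typing import Dict, List, Tuple


def l1_objective(
    price: List[Dict[str, float]],  # price[t][s]: price of stock s on training day t
    index: List[float],             # index[t]: index value on training day t
    w: Dict[str, float],            # portfolio weights (a missing symbol means w = 0)
) -> Tuple[float, List[float], List[float]]:
    """Return obj and the smallest slacks satisfying the fit equation for fixed w."""
    slpos: List[float] = []
    slneg: List[float] = []
    for prices_on_day, idx in zip(price, index):
        fit = sum(p * w.get(s, 0.0) for s, p in prices_on_day.items())
        # fit = index + slpos - slneg, with at most one of the two slacks nonzero
        slpos.append(max(fit - idx, 0.0))
        slneg.append(max(idx - fit, 0.0))
    obj = sum(pos + neg for pos, neg in zip(slpos, slneg))
    return obj, slpos, slneg


# Hypothetical toy data: two stocks, two training days
obj, slpos, slneg = l1_objective(
    price=[{"A": 10.0, "B": 20.0}, {"A": 12.0, "B": 18.0}],
    index=[15.0, 16.0],
    w={"A": 0.5, "B": 0.5},
)
print(obj, slpos, slneg)  # 1.0 [0.0, 0.0] [0.0, 1.0]
```

A solution with both slacks positive on the same day could lower obj by decreasing both by the same amount, which is why the minimization drives obj to the L1 norm of the residual.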

Important Sets and Parameters:

Set       date                 'date'
          symbol               'stock symbol';

Parameter price(date<,symbol<) 'Price';

Scalar    maxstock             'maximum number of stocks to select'  /  2 /
          trainingdays         'number of days for training'         / 99 /;

Alias (d,date), (s,symbol);

The price data is provided by a CSV file:

$setNames "%gams.input%" fp fn fe
$if not set fileName $set fileName %fp%dowjones2016.csv
$call.errorlevel csv2gdx "%fileName%" output=stockdata.gdx ValueDim=0 id=price Index="(1,2)" Value=3 UseHeader=y
$gdxin stockdata
$load price
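For inspecting the data outside GAMS, a minimal Python sketch can read the long-format price table (one row per date and stock) into a nested mapping analogous to price(date,symbol). It assumes a comma-separated file whose header is Date,Stock,Price, as in the data excerpt above; the actual layout of dowjones2016.csv may differ, and `read_prices` is a hypothetical helper. The sample rows are copied from that excerpt.

```python
import csv
import io
from typing import Dict


def read_prices(text: str) -> Dict[str, Dict[str, float]]:
    """Map date -> symbol -> price, like price(date,symbol) in the GAMS model."""
    price: Dict[str, Dict[str, float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Columns 1 and 2 are the indices, column 3 is the value (cf. Index="(1,2)" Value=3)
        price.setdefault(row["Date"], {})[row["Stock"]] = float(row["Price"])
    return price


# Rows copied from the data excerpt above (assumed header: Date,Stock,Price)
sample = """Date,Stock,Price
2016-01-04,AAPL,105.35
2016-01-04,AXP,67.59
2016-01-04,BA,140.5
"""
prices = read_prices(sample)
print(prices["2016-01-04"]["AXP"])  # 67.59
```

For a real file, open it with `newline=""` and pass its contents to `read_prices`.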

Definition of the two phases, training days and testing (non-training) days:

Set td(date)    'training days'
    ntd(date)   'non-training days';

td(d) = ord(d)<=trainingdays;
ntd(d) = not td(d);
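In GAMS, ord(d) is the position of a date within the set, so the split is only chronological if the dates enter the set in chronological order. A minimal Python sketch of the same split, assuming the rows are already in chronological order and that the set order equals the order of first appearance; `split_days` and the placeholder labels t1, t2, t3 are hypothetical.

```python
from typing import List, Tuple


def split_days(row_dates: List[str], trainingdays: int) -> Tuple[List[str], List[str]]:
    """Mirror td(d) = ord(d) <= trainingdays and ntd(d) = not td(d)."""
    dates = list(dict.fromkeys(row_dates))  # unique dates in first-appearance order
    td = dates[:trainingdays]               # ord(d) <= trainingdays
    ntd = dates[trainingdays:]              # all remaining dates
    return td, ntd


# Hypothetical long-format date column: one entry per (date, stock) row
td, ntd = split_days(["t1", "t1", "t2", "t2", "t3", "t3"], trainingdays=2)
print(td, ntd)  # ['t1', 't2'] ['t3']
```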

The mean price of each stock over all dates is computed and used to calculate the weights:

Parameter
    avgprice(symbol)          'average price of stock'
    weight(symbol)            'weight of stock';

avgprice(s)       = sum(d, price(d,s))/card(d);
weight(symbol)    = avgprice(symbol)/sum(s, avgprice(s));
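The two assignments above can be mirrored in a minimal Python sketch: each stock's average price is taken over all dates (training and testing, as card(d) does), and the weights are the average prices normalized to sum to 1. `stock_weights` and the toy prices are hypothetical; a missing price is treated as 0, like a missing record in a GAMS parameter.

```python
from typing import Dict


def stock_weights(price: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """weight(s) = avgprice(s) / sum of all average prices."""
    n_dates = len(price)  # card(d): all dates, training and testing
    symbols = sorted({s for row in price.values() for s in row})
    # A missing price counts as 0, as for a missing record in a GAMS parameter
    avgprice = {s: sum(row.get(s, 0.0) for row in price.values()) / n_dates for s in symbols}
    total = sum(avgprice.values())
    return {s: avgprice[s] / total for s in symbols}


# Hypothetical toy prices for two stocks on two dates
weight = stock_weights({"t1": {"A": 10.0, "B": 30.0}, "t2": {"A": 30.0, "B": 30.0}})
print(weight)  # {'A': 0.4, 'B': 0.6}
```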

Computation of the contributions using weight and price:

Parameter contribution(date,symbol) 'contribution of stock on date';

contribution(d,s) = weight(s)*price(d,s);

Computation of index values:

Parameter index(date) 'Dow Jones index';

index(d)          = sum(s, contribution(d,s));
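The index used by the model is therefore the sum of the weighted prices on each date. A minimal Python sketch of the contribution and index computation; `index_values` and the toy data are hypothetical, and the weights shown are the ones implied by these toy prices.

```python
from typing import Dict


def index_values(price: Dict[str, Dict[str, float]], weight: Dict[str, float]) -> Dict[str, float]:
    """index(d) = sum over s of contribution(d,s), with contribution(d,s) = weight(s) * price(d,s)."""
    contribution = {d: {s: weight[s] * p for s, p in row.items()} for d, row in price.items()}
    return {d: sum(c.values()) for d, c in contribution.items()}


# Hypothetical toy data; weights 0.4 and 0.6 follow from these prices' averages
index = index_values(
    {"t1": {"A": 10.0, "B": 30.0}, "t2": {"A": 30.0, "B": 30.0}},
    {"A": 0.4, "B": 0.6},
)
print(index)
```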

Variables and equations:

Variable
    p(symbol)       'is stock included?'
    w(symbol)       'what part of the portfolio'
    slpos(date)     'positive slack'
    slneg(date)     'negative slack'
    obj             'objective';

Positive variables w, slpos, slneg;
Binary variable p;

Equation
    deffit(date)    'fit to Dow Jones index'
    defpick(symbol) 'can only use stock if picked'
    defnumstock     'few stocks allowed'
    defobj          'absolute violation (L1 norm) from index';

deffit(td)  ..  sum(s, price(td,s)*w(s)) =e= index(td) + slpos(td) - slneg(td);

defpick(s)  ..  w(s) =l= p(s);

defnumstock ..  sum(s, p(s)) =l= maxstock;

defobj      ..  obj =e= sum(td, slpos(td) + slneg(td));

Model declaration and solve statement:

Model pickStock /all/;

option optCR=0.01;

solve pickStock min obj using mip;
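The solve statement hands the model to a MIP solver. For the special case maxstock = 1 the problem is small enough to solve exactly without one, which can be handy for sanity-checking results on small instances. The sketch below swaps in a different technique: for a single stock with positive prices, the sum over days of |price·w − index| equals the sum of price·|w − index/price|, which is minimized by the price-weighted median of the ratios index/price; the result is then clipped to [0, 1] because defpick bounds w by the binary p. `best_single_stock` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def best_single_stock(
    price: Dict[str, Dict[str, float]],
    index: Dict[str, float],
    training_days: Iterable[str],
) -> Optional[Tuple[str, float, float]]:
    """Exact optimum of the pickstock model for maxstock = 1 (assumes all prices > 0)."""
    days = list(training_days)
    symbols = sorted({s for d in days for s in price[d]})
    best: Optional[Tuple[str, float, float]] = None
    for s in symbols:
        # sum_d |price*w - index| = sum_d price * |w - index/price|
        points = sorted((index[d] / price[d][s], price[d][s]) for d in days)
        half = sum(c for _, c in points) / 2.0
        w, cum = points[-1][0], 0.0
        for r, c in points:  # weighted median of the ratios
            cum += c
            if cum >= half:
                w = r
                break
        w = min(max(w, 0.0), 1.0)  # 0 <= w(s) <= p(s) = 1
        obj = sum(abs(price[d][s] * w - index[d]) for d in days)
        if best is None or obj < best[2]:
            best = (s, w, obj)
    return best


# Hypothetical toy data: stock A tracks the index exactly with w = 0.5
print(best_single_stock(
    {"t1": {"A": 10.0, "B": 20.0}, "t2": {"A": 20.0, "B": 20.0}},
    {"t1": 5.0, "t2": 10.0},
    ["t1", "t2"],
))  # ('A', 0.5, 0.0)
```

Because each per-stock objective is convex in w, clipping its unconstrained minimizer to [0, 1] gives the constrained minimizer; w = 0 (no stock picked) is covered by the clipping.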

Reporting parameters:

Parameter
    fund(date)                'Index fund report parameter'
    error(date)               'Absolute error';

fund(d)  = sum(s, price(d, s)*w.l(s));
error(d) = abs(index(d)-fund(d));
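Unlike the objective, the reporting parameters are computed on every date, so the same weights can be judged on the testing phase too. A minimal Python sketch of fund(d) and error(d); `fund_and_error` and the toy data are hypothetical.

```python
from typing import Dict, Tuple


def fund_and_error(
    price: Dict[str, Dict[str, float]],
    w: Dict[str, float],
    index: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """fund(d) = sum_s price(d,s) * w(s); error(d) = |index(d) - fund(d)| on every date."""
    fund = {d: sum(p * w.get(s, 0.0) for s, p in row.items()) for d, row in price.items()}
    error = {d: abs(index[d] - fund[d]) for d in price}
    return fund, error


# Hypothetical toy data
fund, error = fund_and_error(
    price={"t1": {"A": 10.0, "B": 20.0}, "t2": {"A": 12.0, "B": 18.0}},
    w={"A": 0.5, "B": 0.5},
    index={"t1": 15.0, "t2": 16.0},
)
print(fund, error)  # {'t1': 15.0, 't2': 15.0} {'t1': 0.0, 't2': 1.0}
```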

Set fHdr      'fund header'            / dj 'dow jones','index fund'  /
    errHdr    'error header'           / 'absolute error train', 'absolute error test' /;

Scalar error_train                     'Absolute error in entire training phase'
       error_test                      'Absolute error in entire testing phase'
       error_ratio                     'Ratio between error test and error train'
Parameter
       stock_weight(symbol)            'weight'
       dowVSindex(date,fHdr)           'dow jones vs. index fund'
       abserror(date,errHdr)           'absolute error'
       priceMerge(date,*)              'Price (stocks & dow jones)';

stock_weight(s)                        = w.l(s);
dowVSindex(d,'dj')                     = index(d);
dowVSindex(d,'index fund')             = fund(d);
abserror(td, 'absolute error train')   = error(td);
abserror(ntd,'absolute error test')    = error(ntd);
priceMerge(d,symbol)                   = price(d,symbol);
priceMerge(d,'DowJones')               = index(d);
error_train                            = obj.l;
error_test                             = sum(ntd, error(ntd));
if(error_train > 0,
   error_ratio = error_test/error_train;
else
   error_ratio = inf;);
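The summary scalars can be mirrored in a minimal Python sketch. Here error_train is summed from error(d) over the training days; the GAMS code uses obj.l instead, which coincides with this sum whenever no training day has both slacks positive. `error_summary` and the toy values are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def error_summary(
    error: Dict[str, float], td: Iterable[str], ntd: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (error_train, error_test, error_ratio) as in the reporting section."""
    error_train = sum(error[d] for d in td)
    error_test = sum(error[d] for d in ntd)
    # Same guard as the GAMS if/else: a perfect training fit gives an infinite ratio
    error_ratio = error_test / error_train if error_train > 0 else math.inf
    return error_train, error_test, error_ratio


print(error_summary({"t1": 0.0, "t2": 1.0, "t3": 2.0}, ["t1", "t2"], ["t3"]))  # (1.0, 2.0, 2.0)
print(error_summary({"t1": 0.0, "t2": 3.0}, ["t1"], ["t2"]))  # (0.0, 3.0, inf)
```

A ratio well above 1 suggests the selected portfolio fits the training days noticeably better than the unseen testing days.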

Hypercube analysis script:

The Hypercube analysis script allows you to analyze a large number of scenarios to answer high-level questions like "How many stocks should I pick?" or "How many training days should I choose?".

This analysis script requires Python to be installed on your machine. In addition, the GAMS Python API and the following Python packages are required:

notebook
pandas
matplotlib

The latter can be installed via pip install notebook pandas matplotlib.
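Before running the script, the listed packages can be checked with a small Python sketch based on importlib.util.find_spec. It only checks the three packages listed above, not the GAMS Python API, whose import name is not given here; `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# The packages listed above (the GAMS Python API is not checked here)
REQUIRED = ("notebook", "pandas", "matplotlib")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```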

