Stock Market Data Scenario Set Generation – S&P 100

I love building portfolio optimization models based on optimization theory. Such models require a well-defined return scenario set, which is nothing more than a matrix of jointly possible returns for all assets under consideration: each row is one scenario and each column is one asset. The easiest way to obtain such a set is to use historical data. While relying on historical data is dangerous in many price-based single-asset strategies, it makes sense for portfolio-based analysis because it captures the empirical dependency between the assets, which works surprisingly well.
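To make the idea of a scenario set concrete, here is a minimal sketch in base R. The asset names, the returns, and the weights are all made up for illustration; only the shape (scenarios in rows, assets in columns) matters.

```r
# Toy scenario set: 4 joint return scenarios (rows) for 3 assets (columns).
# All numbers are made up for illustration.
scenario_set <- matrix(
  c( 0.010, -0.020,  0.005,
    -0.015,  0.010,  0.000,
     0.020,  0.015, -0.010,
     0.000, -0.005,  0.010),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("A", "B", "C"))
)

# Hypothetical portfolio weights that sum to one.
weights <- c(A = 0.5, B = 0.3, C = 0.2)

# One portfolio return per scenario: matrix product of scenarios and weights.
portfolio_returns <- as.vector(scenario_set %*% weights)
names(portfolio_returns) <- rownames(scenario_set)

# Empirical dependency between the assets, as captured by the scenarios.
asset_correlation <- cor(scenario_set)
```

Any scenario-based optimization model then works on the vector of portfolio returns per scenario, which is why the joint structure of the rows is what we want to preserve.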

We focus on the S&P 100 here, but the code can be extended to any set of assets. Furthermore, we download the data from Yahoo! Finance, which ideally should be replaced with another source such as Alpha Vantage.

Speaking of the code: it is also available as a Gist on GitHub, which you can find under this link.

We require two libraries to replicate all the code below, i.e.

library(quantmod)
library(rvest)

First we need to get the list of all components of the S&P 100. Because we are lazy we use (and web-scrape) Wikipedia for this purpose. Please note that we hard-coded the fact that the list of assets and ticker symbols is in the third table of this Wikipedia page. If the structure of the page changes this number has to be adapted. Last check: August 5th, 2019.

table_id <- 3
url <- ""  # set this to the URL of the Wikipedia S&P 100 page
sp100 <- url %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[table_id]] %>%
  html_table()

Next we store the parsed info (i.e. ticker symbols and company names) into two vectors:

sp100_ticker <- sp100$Symbol
sp100_company <- sp100$Name
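The scrape-and-extract steps can be tried offline on a small inline page. This is only a sketch: the HTML snippet, its two tables, and the company rows are hypothetical stand-ins for the real Wikipedia page, and `table_id` is 2 here simply because the toy page has the ticker table in second position.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: two tables,
# the second one holds the ticker symbols and names.
page_source <- '<html><body>
<table><tr><th>Note</th></tr><tr><td>unrelated table</td></tr></table>
<table>
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>AAPL</td><td>Apple</td></tr>
  <tr><td>MSFT</td><td>Microsoft</td></tr>
</table>
</body></html>'

table_id <- 2  # position of the wanted table on this toy page

# Collect all <table> nodes, pick one by position, convert it to a data frame.
tables <- html_nodes(read_html(page_source), "table")
components <- html_table(tables[[table_id]])

ticker  <- components$Symbol
company <- components$Name
```

Checking the column names of the parsed table before using `$Symbol` and `$Name` is a cheap way to notice when the page layout has changed.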

Now we are ready to download all price data from Yahoo! Finance and store all separate data frames into one large list for easier post-processing:

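sp100_data_complete <- list()
for(current_ticker in sp100_ticker) {
  # getSymbols() assigns the downloaded series to a variable named after the ticker
  getSymbols(current_ticker, src = "yahoo")
  sp100_data_complete[[length(sp100_data_complete) + 1]] <- get(current_ticker)
}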
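A single failed download stops a loop like this one. The following sketch shows one way to keep going and record the failures. Note that `fetch_prices()` is a hypothetical stub that only simulates a download; it is not part of quantmod, and the tickers are made up.

```r
# Hypothetical stand-in for a download call: returns a small data frame,
# or fails for an unknown ticker. Not a real data source.
fetch_prices <- function(ticker) {
  if (ticker == "BAD") stop("no data for ", ticker)
  data.frame(date = as.Date("2019-01-01") + 0:2, close = c(100, 101, 102))
}

tickers <- c("AAA", "BAD", "CCC")
price_list <- list()
failed <- character(0)

for (current_ticker in tickers) {
  # Turn an error into NULL instead of aborting the whole loop.
  result <- tryCatch(fetch_prices(current_ticker), error = function(e) NULL)
  if (is.null(result)) {
    failed <- c(failed, current_ticker)     # remember which downloads failed
  } else {
    price_list[[current_ticker]] <- result  # store under the ticker's name
  }
}
```

If tickers are skipped this way, the ticker and company vectors have to be reduced accordingly so that all three stay aligned.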

Next we check whether enough data for all stocks is available and if not we drop the stocks. In this case we require at least 1500 days of data availability.

nrows <- sapply(sp100_data_complete, nrow)
nrow.cutoff <- 1500
# A logical mask also works when no stock falls below the cutoff
keep <- nrows >= nrow.cutoff
sp100_data_complete <- sp100_data_complete[keep]
sp100_company <- sp100_company[keep]
sp100_ticker <- sp100_ticker[keep]
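One pitfall when dropping elements by negative index: if no stock falls below the cutoff, `which()` returns an empty vector, and indexing with a negated empty vector selects nothing at all. A small sketch with toy data (the list and the cutoff of 2 are made up) shows why a logical mask is the safer way to filter:

```r
data_list <- list(a = 1:3, b = 1:5, c = 1:4)  # toy stand-ins for price tables
n_obs <- sapply(data_list, length)

pos <- which(n_obs < 2)           # nothing is too short, so this is integer(0)
dropped_all <- data_list[-pos]    # negating an empty index selects nothing

keep <- n_obs >= 2                # logical mask, TRUE for every element here
kept_all <- data_list[keep]       # all three elements survive
```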

Now we are ready to compute returns. To do so we start by extracting the Adjusted Closing Prices only from the OHLC/VA data we have.

current_pos <- 1
sp100_adjusted <- Ad(sp100_data_complete[[current_pos]])
for(current_pos in 2:length(sp100_ticker)) {
  sp100_adjusted <- merge(sp100_adjusted, 
                          Ad(sp100_data_complete[[current_pos]]))
}
names(sp100_adjusted) <- sp100_ticker
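It helps to see what merging does when the series do not cover the same dates. A minimal sketch with two made-up xts series (xts is loaded together with quantmod; the dates, prices, and names are hypothetical):

```r
library(xts)

dates <- as.Date("2019-01-01") + 0:3

# Toy price series; the second one starts a day later than the first.
a <- xts(c(10, 11, 12, 13), order.by = dates)
b <- xts(c(20, 21, 22), order.by = dates[2:4])

# merge() on xts objects aligns on the date index (outer join by default),
# so dates missing in one series are filled with NA.
prices <- merge(a, b)
colnames(prices) <- c("AAA", "BBB")
```

The merged object has one row per date that appears in any series, which is why stocks with a short history introduce leading NAs.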

Then we may select a certain time frame and compute daily returns from this set of Adjusted Closing Prices:

timeframe <- "2013/2019"
sp100_returns <- dailyReturn(sp100_adjusted[,1])[timeframe]
for(current_pos in 2:length(sp100_ticker)) {
  sp100_returns <- merge(sp100_returns,
                         dailyReturn(sp100_adjusted[,current_pos])[timeframe])
}
names(sp100_returns) <- sp100_ticker
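For reference, this is what a daily return amounts to in plain base R. As far as I know, `dailyReturn()` computes arithmetic (simple) returns unless told otherwise; the sketch below uses made-up prices and shows the log-return variant next to it.

```r
prices <- c(100, 102, 99.96)  # made-up closing prices

# Arithmetic (simple) return: p_t / p_{t-1} - 1
simple_returns <- diff(prices) / head(prices, -1)

# Log return: log(p_t / p_{t-1})
log_returns <- diff(log(prices))
```

For small daily moves the two are close, but a scenario set should use one definition consistently.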

Now we are done! The full code contains one more (almost unnecessary) cleanup, but either way we may now store our data for subsequent use.

scenario.set <- sp100_returns
save(scenario.set, sp100_ticker, sp100_company,
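A minimal sketch of a complete save/load round trip. The target is a temporary file used as a placeholder, not the file name from the original code, and the tiny matrix and `tickers` vector are made up.

```r
# Made-up stand-ins for the real scenario set and ticker vector.
scenario.set <- matrix(c(0.01, -0.02, 0.005, 0.01), nrow = 2,
                       dimnames = list(NULL, c("AAA", "BBB")))
tickers <- c("AAA", "BBB")

target <- tempfile(fileext = ".rda")  # placeholder location
save(scenario.set, tickers, file = target)

# Load into a separate environment so existing variables are not overwritten.
restored <- new.env()
loaded_names <- load(target, envir = restored)
```

`load()` returns the names of the restored objects, which is a quick check that everything needed for later backtests was actually stored.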

That’s it! Now you have a perfect scenario set to (back)test any portfolio optimization methods based on scenario input. Enjoy!

