
Plausible Performance: Have Smart Beta Return Claims Jumped the Shark?

December 2019
Key Points
  • Performance backtests are often used as evidence to “prove” a smart beta strategy is “better” than its competitors. In our view, careful attention must be given to these claims because backtested results are easily data mined.

  • The historical distribution of live fund performance is a useful guide to gauge what long-term smart beta strategy returns are truly plausible. We look at 40 years of mutual fund returns and find that consistent outperformance is elusive.

  • If performance claims seem too good to be true, they probably are. Over longer horizons, every strategy will encounter both favorable and unfavorable market conditions.


The number of strategic beta exchange-traded products (ETPs) has exploded in recent years. According to Morningstar, in 2018 nearly 1,500 strategic beta ETPs, representing approximately $800 billion in total assets, existed worldwide, with 132 new products coming to market in that year alone.1 More importantly, the total number of smart beta indices published by index providers and available for tracking has grown even faster, with institutional investors executing these strategies in separate accounts to the tune of billions of dollars.

Investors’ appetite for smart beta strategies is not abating. A recent FTSE Russell smart beta survey revealed that “78% of survey respondents have implemented, are currently evaluating, or plan to evaluate a smart beta strategy” (FTSE Russell, 2019, p. 4). With so much competition and such large sums of assets at stake, the pressure on smart beta providers to differentiate their strategies as being “better” is becoming increasingly intense.

So, what is the means of proving a strategy is better? A performance backtest, of course! Regardless of whether delivery is through an ETP wrapper or an institutional account, nearly all smart beta strategies are touted as having impressive (albeit backtested) excess returns. Many of these backtests have only 10 to 15 years of history and cover a very limited set of market environments. The pressure to generate ever-better backtests may also be a response to recently disappointing results of many smart beta strategies.2

The confluence of shorter time horizons, increased competition, and recent underperformance may well have led to smart beta backtests “jumping the shark,” that is, reporting utterly implausible return outcomes. Jumping the shark refers to the outlandish storyline of the 1970s American television sitcom Happy Days that followed one of the show’s main characters, Fonzie, on a California vacation as he accepted a dare to jump over a caged man-eating shark while on water skis. Now viewed as a pejorative term, jumping the shark refers to outrageous plot devices designed to generate attention.

It is our understanding that at least one factor strategy provider is claiming a 4% annualized excess return over the last 10 years, without incurring a single calendar year of underperformance versus the cap-weighted index. This and similar claims by other providers, like the Fonz’s biker-turned-stunt-skier routine, definitely get attention, but are they plausible? In this article, we examine live US mutual fund track records over the last 40 years to gain an intuition into how much outperformance is reasonable for the most superior smart beta strategy (or for that matter, active manager). We call this return “plausible outperformance.”3

Plausible Outperformance: The Mutual Fund Experience

At Research Affiliates, we believe that backtests can deliver valuable insights into the behavior of rules-based strategies, especially those targeting well-established and accepted factor exposures. That said, we also know that backtesting is a great way to maximize positive error through the combination of selection bias and data mining. Care and attention need to be applied to the construction of backtests and the interpretation of their results to establish realistic forward-looking return expectations (Harvey and Liu, 2015). So, assuming the backtest has not been data mined too aggressively, what are the levels of excess return we can plausibly expect to earn in the future with live assets over a realistic holding period?

To gain an understanding of realistic levels of outperformance achieved over various investor holding periods, we surveyed US mutual fund data over the past 40 years from January 1, 1979, through December 31, 2018. We used the Morningstar survivorship bias–free database of US equity open-end mutual funds, excluding index funds, which survived for at least one calendar year. For funds with multiple share classes, we chose the share class with the longest history. The total sample includes 4,463 funds. We simply measured the after-fee returns of each fund relative to the S&P 500 Index in order to gauge each fund’s ability to outperform the market.

We chose to measure each fund relative to the S&P 500 rather than against the fund’s stated benchmark or Morningstar-assigned benchmark. We did this for the following reasons:

  • First, we want to use the same benchmark used in the shark-jumping claim.
  • Second, the goal of our research is to understand the historical ability of funds to beat the market rather than to beat their benchmark.
  • Third, we make the assumption that investors in US equity mutual funds primarily care about beating the S&P 500 and they have different beliefs as to the best way to do this: choosing to invest in small-cap funds, mid-cap funds, or growth/value (or other factor)-tilted funds within any size category.
  • Fourth, fund managers have discretion as to what they list as their benchmark in their prospectus and they could be motivated to pick an easy benchmark to beat.
  • Finally, measuring each fund against the S&P 500 allows our results to have a clean and easy interpretation.


Because the newer methods of strategy construction are better than what fund managers had at their disposal in the past, some may very well counter that using historical mutual fund returns to understand the plausible outperformance of smart beta indices is an apples-to-oranges proxy. We concur that history is full of innovations, that often the new methods are better, and that hundreds of billions of dollars have poured into these new methods. No guarantee exists, however, that today’s new methods will perform better than the best strategies in the past and indeed many won’t add value at all.

In fact, many smart beta strategies replicate successful strategies of active managers. For example, Frazzini, Kabiller, and Pedersen (2018) found that nearly all of Warren Buffett’s public stock performance at Berkshire Hathaway can be explained by exposure to the quality, value, and low beta factors.4 As Towers Watson (2013) (now Willis Towers Watson, the consulting firm that coined the term smart beta) stated: “Smart beta is simply about trying to identify good investment ideas that can be structured better…. Smart beta strategies should be simple, low cost, transparent, and systematic.”

Smart beta proponents may claim the structured-better element will lead to better performance than what mutual funds have historically earned, at least partially through lower fees. Smart beta portfolio management is closer to indexed portfolio management and therefore can (and arguably should) be priced closer to index fund fees.5 In contrast, active mutual fund managers need to pay for analysts and portfolio managers and, of course, pass along this overhead cost through higher management fees.

Effective smart beta strategies likewise require much skilled analysis, but are able to bypass the high costs of forecasting and of ongoing fundamental monitoring. It is reasonable to expect lower management fees for these strategies because of their rules-based implementation. A well-designed smart beta strategy should have low transaction costs given that it typically has broad stock holdings and minimal turnover. Unfortunately, the expectation of lower transaction costs is not always realized. We assert the true skill in smart beta investing is in balancing the intended factor exposure while minimizing transaction costs and other forms of implementation shortfall (Israel, Jiang, and Ross, 2017, and Chow et al., 2011).

Nonetheless, analyzing the historical performance of actively managed mutual funds in order to understand the distribution of future plausible live performance offers two advantages versus simply looking at backtested results. First, historical fund returns are net of transaction costs and management fees, which most smart beta backtests blissfully ignore. Although smart beta strategies are likely to have lower management fees and, for well-designed strategies, lower transaction costs, these costs aren’t zero—yet this is exactly what is implied in backtests!

Second, mutual fund managers do not need to publish their methodologies. A mutual fund manager can find a profitable anomaly and not disclose it. In contrast, smart beta index strategies typically have a rulebook that explains the method used for stock selection, the rebalancing dates, and so forth. Savvy investors therefore can determine ahead of time which stocks these strategies will trade, when they will trade, and in what quantities.

McLean and Pontiff (2016) examined the out-of-sample performance of 97 equity factor strategies and found that post-publication premiums declined by an average of 32% versus the published figure. Our own work covering the most widely used smart beta strategies found that out-of-sample excess return following publication falls far short of the in-sample published results (Li and West, 2017, and West and Hsu, 2018).6

Considering these various points, we believe the pros easily outweigh the cons of extrapolating the mutual fund experience to smart beta backtests. Again, our intention is to look at plausible outperformance over realistic investor holding periods.

We analyzed the frequency of mutual fund outperformance over 1-, 2-, 3-, 5-, and 10-year periods at various levels of outperformance, ranging from just barely beating the market (> 0%) to beating it handily (> 5% annualized). We are able to make two major observations: 1) most mutual funds underperform the market regardless of time period, and 2) the win rates of star performers subsequently collapse.

Most Mutual Funds Underperform the Market
Our first observation is self-evident and well-trodden territory. Most mutual funds underperform the market regardless of time period. Only 43.28% of the 1-calendar-year fund periods beat the market (i.e., a 43.28% win rate) based on 53,127 observations. That figure drops slightly when we extend the analysis to 3-calendar-year periods; only 41.61% of the 44,568 observations are winners, and the average annualized excess return across those observations is −0.52% a year. When the performance period is extended to 10 years, the win rate actually improves slightly to 46.19% with an average excess return of only −0.09% a year. This is largely due, however, to survivorship bias creeping into the results.
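The win-rate and average-excess-return statistics above can be illustrated with a minimal Python sketch. It is not the procedure used for the study: the function names, the input layout, the use of overlapping calendar-year windows, and the definition of excess return as the difference between annualized fund and market returns are all assumptions made for illustration, and the sample data are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each fund maps to consecutive calendar years of
# (fund_return, market_return). The article's actual data set is not reproduced.
History = List[Tuple[float, float]]


def annualized(returns: List[float]) -> float:
    """Geometric annualized return from a list of calendar-year returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def window_stats(funds: Dict[str, History], years: int,
                 hurdle: float = 0.0) -> Tuple[int, float, float]:
    """Return (observations, win_rate, mean_excess) over all overlapping
    windows of `years` calendar years (overlap is an assumption).

    A window is a "win" when its annualized excess return exceeds `hurdle`.
    """
    excesses = []
    for history in funds.values():
        for start in range(len(history) - years + 1):
            window = history[start:start + years]
            fund_ann = annualized([f for f, _ in window])
            market_ann = annualized([m for _, m in window])
            excesses.append(fund_ann - market_ann)
    n = len(excesses)
    if n == 0:
        return 0, float("nan"), float("nan")
    wins = sum(1 for e in excesses if e > hurdle)
    return n, wins / n, sum(excesses) / n


# Invented sample: fund "B" closes after one year, so it contributes to the
# 1-year statistics but drops out of the 3-year statistics.
sample = {
    "A": [(0.10, 0.05), (0.00, 0.05), (0.20, 0.05)],
    "B": [(0.02, 0.05)],
}
print(window_stats(sample, years=1))
print(window_stats(sample, years=3, hurdle=0.04))
```

In the invented sample, the short-lived fund counts toward the 1-year statistics only, which is the same mechanism that lets survivorship bias creep into the longer-horizon averages.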

767-plausible-performance-table-1

This may seem counterintuitive given that we used the Morningstar survivorship bias–free database for our analysis. Let us explain. The Morningstar survivorship bias–free database contains all fund returns including those that closed over the 1979–2018 measurement period. Funds generally have to do well to stay alive for 10 years (let alone a full 40 years) in order to be counted in the 10-year averages. Survivorship bias comes into play, for example, when a fund closes after 3 years of poor performance; that fund will show up in the 3-year averages, but not the 10-year averages.

The 1-year numbers have little survivorship bias, but each added year introduces more. The mutual funds with poor 3-year returns often fail to make it to a 5- or a 10-year track record. Fund companies know performance sells, so mutual funds with poor performance get shuttered. On average, 5.9% of the funds in our data set closed in any given year, and only 2,194 survived for 10 years or more, meaning that 50.8% of funds did not make it to the 10-year mark! Presumably, these funds were not delivering stellar performance.

Mutual Fund Star Performers Subsequently Underperform
The second observation we make from our analysis is that win rates of the truly star performers—those with excess returns greater than 4% a year—drop off over time. As the measurement period extends from 3 to 10 years, the win rate falls from 17.18% to 9.19%, even with the aforementioned survivorship bias.7 Digging into the results, we find that the majority of the 9.19% occurred over 10-year periods ending between 2004 and 2013.
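The survivorship adjustment can be worked through with a short Python sketch. It uses the fund counts reported earlier (4,463 funds in the sample, 2,194 surviving 10 years or more) and the 9.19% win rate among 10-year track records. Like the back-of-the-envelope reasoning in endnote 7, it assumes that no closed fund would have cleared the 4% hurdle and that the likelihood of outperformance does not vary over time; both are simplifications, and the variable names are illustrative only.

```python
# Figures taken from the article; everything else is an illustrative assumption.
TOTAL_FUNDS = 4_463              # funds in the sample
SURVIVED_10_YEARS = 2_194        # funds with a track record of 10+ years
WIN_RATE_AMONG_SURVIVORS = 0.0919  # 10-year excess return above 4% a year


def survival_share(survivors: int, total: int) -> float:
    """Fraction of funds that reached the 10-year mark."""
    return survivors / total


def survivorship_adjusted_rate(win_rate: float, share_surviving: float) -> float:
    """Win rate re-expressed per fund available at the start of the period,
    assuming (strongly) that no closed fund would have cleared the hurdle."""
    return win_rate * share_surviving


share = survival_share(SURVIVED_10_YEARS, TOTAL_FUNDS)
adjusted = survivorship_adjusted_rate(WIN_RATE_AMONG_SURVIVORS, share)

print(f"Share not reaching 10 years: {1 - share:.1%}")
print(f"Adjusted 10-year win rate:   {adjusted:.1%}")
```

Under these assumptions the adjusted rate comes out near 4.5%, in line with the rounded figure in endnote 7.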

Heroic outperformance generally does not endure as market cycles progress. Funds typically have exposure mandates, and as a result, the funds optimally positioned to take advantage of today’s popular asset-class and factor exposures will not be exposed to tomorrow’s. In other words, when the value factor does well, most value funds do well, and when low volatility is the factor du jour, low-vol funds outperform, and so on. But factors and asset classes inevitably undergo periods of underperformance, and so do the funds exposed to them.

Although asset class and factor exposures are well known as the main drivers of portfolio returns, considerable variation exists in fund returns even among funds with similar exposures, such as large-cap value. Three key interrelated elements can contribute to a fund beating the market over a 3-year horizon, which typically encompasses only one market condition. But over a longer horizon such as 10 years, which likely encompasses multiple market conditions, these elements do not hold up:

  1. Luck or “noise.” With a starting universe of a few thousand equity securities, a strategy—especially a concentrated one—can get randomly lucky given the large standard deviation of returns of individual securities and industries.
  2. Capacity and trading issues. Successful strategies often get a flood of assets, leading to some combination of higher transaction costs and/or drifting away from the original successful style. Berk and Green (2004) demonstrated that fund size is inversely related to performance. Harvey and Liu (2016) estimated an alpha decrease of 20 basis points (bps) if a fund doubles in size over one year. Li et al. (2019) found that trading costs rise with assets under management, turnover, and portfolio illiquidity.
  3. Changes in relative valuations. Lastly, Arnott et al. (2016) showed that changes in relative valuation can have a dramatic impact on shorter-term 3–5 year results, creating an illusion the factor or strategy has terrific value-add, when all that really happened is the strategy became more expensive.8 Arnott, Beck, and Kalesnik (2017) found that a factor’s most recent 5-year performance is negatively correlated with its subsequent 5-year performance. Thus, it is unsurprising when 10-year returns are lower than 3- and 5-year returns as mean reversion tends to take down these shorter-term winners.
767-plausible-performance-figure-1

Importantly, these dynamics are not mutually exclusive. Randomness can cause a one-time valuation shock upward, which may be followed by increased assets under management and higher transaction costs, which in turn can reduce returns.

Consistency Proves Consistently Elusive!

The cyclicality of returns is a challenge for both asset managers and their clients. Clients want high excess returns with consistency. Smart beta providers are well aware of this concern and are increasingly emphasizing multi-factor strategies to ostensibly alleviate wide performance swings associated with a particular investing style. Recently, the main driver of flows into multi-factor strategies has been disaffection with value. Value has been trading unusually cheap, while other factors are for the most part trading rich relative to their own history.

Likewise, live mutual funds over the past 40 years have experienced the ebbs and flows of cyclical performance, which has led investors to seek strategies that have lower tracking error. So we ask, are the backtested multi-factor results—too often presented as a reasonable basis for forecasting future returns—supported by the live track record of the best performing mutual funds?

To answer this question, we broke down the total number of calendar years that each surviving mutual fund outperformed in each time period of 1, 2, 3, 5, and 10 years. We previously noted that only 17.18% of mutual fund track records produced 3-year annualized excess returns above 4%. That outperformance, however, may have been earned in a single year, which doesn’t achieve the consistency of returns that investors seek. So, what percentage of funds beat the market by over 4% a year over a 3-year span and also outperformed in each 1-year period? Achieving this level of consistency is much more difficult. Only 3.71% of the 44,568 observations in our sample (roughly a fifth of the 3-year winners) managed to accomplish this feat.
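The consistency screen described above can be sketched in Python as a two-part test: the window must clear the annualized excess-return hurdle, and the fund must also beat the market in a minimum number of individual calendar years. The function name, parameters, and sample returns are hypothetical, and measuring excess return as the difference of annualized returns is an assumption rather than the study's documented method.

```python
from typing import List, Optional, Tuple


def annualized(returns: List[float]) -> float:
    """Geometric annualized return from a list of calendar-year returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def is_consistent_winner(window: List[Tuple[float, float]],
                         hurdle: float = 0.04,
                         min_winning_years: Optional[int] = None) -> bool:
    """Check a window of (fund_return, market_return) calendar years.

    True only if the annualized excess return exceeds `hurdle` AND the fund
    beat the market in at least `min_winning_years` individual years
    (default: every year in the window).
    """
    if not window:
        raise ValueError("window must contain at least one year")
    if min_winning_years is None:
        min_winning_years = len(window)
    excess = (annualized([f for f, _ in window])
              - annualized([m for _, m in window]))
    winning_years = sum(1 for f, m in window if f > m)
    return excess > hurdle and winning_years >= min_winning_years


# Invented examples: one big year versus steady outperformance.
lumpy = [(0.30, 0.05), (0.04, 0.05), (0.04, 0.05)]
steady = [(0.10, 0.05), (0.10, 0.05), (0.10, 0.05)]

print(is_consistent_winner(lumpy))                       # every-year requirement
print(is_consistent_winner(lumpy, min_winning_years=1))  # return hurdle only
print(is_consistent_winner(steady))
```

The lumpy example clears the 4% hurdle on the strength of a single year but fails the every-year requirement, which is the distinction the paragraph draws between 3-year winners and consistent 3-year winners. Setting `min_winning_years=9` on a 10-year window with `hurdle=0.03` would express the relaxed test discussed later in the article.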

767-plausible-performance-table-2

Let’s return to the jump-the-shark claim, namely, that a smart beta backtest produced a 4% average annual excess return over the past 10 years while also outperforming each year. How often has this happened in 40 years of mutual fund data? In our data sample of 10-year periods from 1979 through 2018, the answer is never. Not once. In effect, any smart beta vendor who suggests that this is a reasonable expectation is laying claim to skill that no asset manager has ever exhibited before—including themselves if they have a live 10-year history! And what if we relax the assumption just a bit? How about a 3% annualized 10-year excess return with 9 years of outperformance? This has happened twice in 23,740 observations. Arnott, Cornell, and Shepherd (2018) defined a bubble in asset prices as requiring implausible future return assumptions. Might we have reached a bubble in smart beta performance claims?

What Is Plausible?

Clearly, earning a 3–4% annualized 10-year excess return with little to no annual shortfalls appears implausible. How many managers aspired to this outcome 10 or 20 years ago? Probably quite a few. How many achieved it? None, of course. What is a more plausible return assumption for long-term investors? Based on our analysis, it appears reasonable to assume that the best smart beta strategies can earn an annualized 10-year excess return of 1–2% net of transaction costs.

This estimate also lines up with the way Willis Towers Watson describes smart beta as “good investment ideas that can be structured better.” They aren’t claiming smart beta is a magic elixir that can deliver returns no active manager has ever produced! The estimate also lines up with the 1–2% a year expected excess return that our research indicates. Of the 28 replicated smart beta strategies included in the Research Affiliates Smart Beta Interactive (SBI) tool, 24 had an historical excess return net of trading costs between 1% and 2%. Of course, different strategies have different transaction costs and starting valuation levels, which impact the returns they produce. The differences in the strategies can be compared more fully and across geographic regions using the SBI tool.

Now, let’s look at the consistency of returns. Sadly, investors’ desire to avoid short-term underperformance is woefully unrealistic. Most long-term outperformers earn an excess return in only 5–6 years out of 10. A smart beta strategy, indeed any strategy whose performance deviates (even successfully) from the market’s performance, is virtually guaranteed to have multiple years of underperformance over a 10-year holding period.

767-plausible-performance-figure-2

Backtests, especially those optimized to maximize the backtest results and then presented in sample (spanning the very years that were used to develop the model), may create the illusion of massive excess returns with few, if any, bouts of underperformance. A long-term survey of live mutual fund returns reveals a very different picture.

Conclusion

Our purpose is not to bash backtesting nor to discourage sharing backtests with sophisticated clients and prospects. Empirical research depends on backtesting. Our concern is that the quant community uses backtests repeatedly to fine-tune backtests’ results. This practice is exacerbated with smart beta index strategies, because the cost of launching another index backfilled with the better track record is virtually nil. In our view, if a backtest is used, iteratively and repeatedly, to boost a strategy’s own backtested performance, the strategy probably should be discarded.

The Happy Days sitcom continued to generate consistent ratings after the 1977 jump-the-shark episode. In 1978, it was the fourth highest-rated television program in the United States and aired in prime time for another five seasons. After syndication, the show aired on networks in the United States, United Kingdom, and Australia under the name Happy Days Again—and you may still be able to watch it today, 35 years after its final season! It also sported successful spinoffs, such as Mork and Mindy and Laverne and Shirley, starring Hollywood notables Robin Williams and Penny Marshall, respectively. By any measure, the show was a substantial success both before and after its preposterous episode.

We feel the same about smart beta. Some investors may look at a long-term 1–2% excess return and not be impressed, particularly when accompanied by 4 to 5 years of underperformance over a 10-year investing horizon. Those investors may be lured instead by more grandiose backtested claims, but our study of live mutual fund returns indicates these inflated claims are implausible.

A 1.5% return premium can add upward of 20% more wealth after 10 years.9 With savers currently penalized by low interest rates and generally high equity valuations, carefully selected allocations to the better smart beta strategies are one of the more effective ways to narrow the return expectations gap. And with many smart beta strategies, especially those linked to the value factor, trading at abnormally cheap relative valuations, we see happy days again for smart beta investors with reasonable expectations.

Endnotes

1.  Adding to the ambiguous language around these strategies, Morningstar calls smart beta strategic beta (Johnson, 2019).

2.  Much of the underperformance falls within the range of returns implied by longer-term histories.

3.  In order to outperform the market with our investments, we must necessarily make several underlying assumptions. First, we must believe that the capital markets are inefficient. Second, we must believe that managers can identify these inefficiencies and that strategies exist to exploit them. Third, we must believe that the manager’s ability to add value exceeds all implementation costs and fees and that we have the requisite skill to identify these managers in advance before they become large enough to arbitrage away their own ability to add value or rich enough to no longer care. Lastly and most importantly, we must display the requisite patience to hold these strategies through inevitable periods of cyclical underperformance (West and Ko, 2014).

4.  As Frazzini, Kabiller, and Pedersen (2018) were careful to point out, these exposures don’t degrade the outstanding track record Buffett has achieved, specifically praising him for “recognizing early on that these investment themes work, applying leverage without ever having a fire sale, and sticking to his principles.”

5.  We would suggest that the obsession with fees has arguably gone too far. We have witnessed the case of a client thinking one strategy will beat another by 50–100 bps a year but rejected the strategy over a 1 bp fee difference, as well as another instance in which expected trading costs of 50 bps a year were dismissed to save 1 bp in fees. We question the wisdom of spending 20 bps more in assured cost in order to earn 50 bps more in “backtest expected” (but by no means assured) excess return. When the measurable costs—the bird in the hand—matter 10 or 100 times more than the uncertain expected benefits—the bird in the bush—that’s nonsensical.

6.  We additionally find that, on average, a new index will outperform the market by nearly 5% a year for three years prior to the launch of an ETF based on it. After the ETF’s launch, however, the index behaves similarly to an average investor portfolio (Brightman, Li, and Liu, 2015).

7.  Again, survivorship bias likely inflates the 10-year experience; in other words, the drop-off is likely even greater. If 50% of funds on average do not survive for 10 years, and 9% of the survivors achieve a 4.0% excess return for 10 years, it’s not unreasonable to assume that only 4.5% of the funds available at the start of a 10-year period will finish with a 4.0% value-add. Additionally, this assumes there is no time variation in the likelihood of outperformance, which we believe is a strong and generally incorrect assumption.

8.  Sometimes, the entirety of the outperformance can be attributed solely to changes in relative valuation. For example, the Generation 1 Value strategy included in the Research Affiliates Smart Beta Interactive tool outperformed the market by 7.5% over the five years ended December 2005, well above its longer-term excess return of roughly 1.0%. Over 100% of these cyclical excess returns came from the strategy becoming dramatically more expensive! Gen-1 Value started the period at a valuation discount of 44% to the broad market and finished at a discount of only 15%, a slight discount to the broader market. Indeed, in late 2005 and early 2006 value was trading at peak relative-valuation levels not seen since the 1970s and late 1980s. Put another way, value was trading at a smaller discount to the market than its historical norm, setting the stage for the headwinds value has faced over the last dozen years.

9.  Any estimate of real (after inflation) ending wealth needs to incorporate an estimate of the market’s return. In other words, what is the base real return upon which we add the expected excess return in order to compound over the investment horizon? For instance, we estimate global equity markets will provide a real return of 3.0% plus a 1.5% premium from factor exposures. A higher realized market return would allow for a greater differential on which the excess return could compound. A 1.5% premium over the past 10 years (global stocks compounded at 7.9% a year) would translate into a 32% real wealth advantage.
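The compounding arithmetic in this endnote can be checked with a few lines of Python. The sketch assumes the premium is added arithmetically to the base return and that the "wealth advantage" is the difference in ending wealth per unit of initial capital; that reading is an inference, chosen because it reproduces the roughly 20% and 32% figures quoted, and the function name is hypothetical.

```python
def wealth_advantage(base_return: float, premium: float, years: int = 10) -> float:
    """Extra ending wealth per 1.0 of starting capital from earning
    `base_return + premium` instead of `base_return`, compounded annually.

    Assumption: the advantage is the difference in terminal wealth expressed
    as a fraction of the initial investment, not a ratio of terminal wealths.
    """
    with_premium = (1.0 + base_return + premium) ** years
    without_premium = (1.0 + base_return) ** years
    return with_premium - without_premium


# 3.0% expected real market return plus a 1.5% premium (figures from the endnote).
forward_looking = wealth_advantage(0.03, 0.015)

# 7.9% realized market return over the past 10 years plus the same 1.5% premium.
backward_looking = wealth_advantage(0.079, 0.015)

print(f"Advantage at a 3.0% base return: {forward_looking:.1%}")
print(f"Advantage at a 7.9% base return: {backward_looking:.1%}")
```

Measured instead as a ratio of ending wealth with and without the premium, the forward-looking advantage would be smaller, so the choice of definition matters when comparing such claims.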

References

Arnott, Rob, Noah Beck, and Vitali Kalesnik. 2017. “Forecasting Factor and Smart Beta Returns (Hint: History Is Worse than Useless).” Research Affiliates Publications (February).

Arnott, Rob, Noah Beck, Vitali Kalesnik, and John West. 2016. “How Can ‘Smart Beta’ Go Horribly Wrong?” Research Affiliates Publications (February).

Arnott, Rob, Bradford Cornell, and Shane Shepherd. 2018. “Yes. It’s a Bubble. So What?” Research Affiliates Publications (April).

Berk, Jonathan, and Richard Green. 2004. “Mutual Fund Flows and Performance in Rational Markets.” Journal of Political Economy, vol. 112, no. 6 (December):1269–1295.

Brightman, Chris, Feifei Li, and Xi Liu. 2015. “Chasing Performance with ETFs.” Research Affiliates Fundamentals (November).

Chow, Tzee-Man, Jason Hsu, Vitali Kalesnik, and Bryce Little. 2011. “A Survey of Alternative Equity Index Strategies.” Financial Analysts Journal, vol. 67, no. 5 (September/October):37–57.

Frazzini, Andrea, David Kabiller, and Lasse Pedersen. 2018. “Buffett’s Alpha.” Financial Analysts Journal, vol. 74, no. 4 (Fourth Quarter):35–55.

FTSE Russell. 2019. “Smart Beta: 2019 Global Survey Findings from Asset Owners.” Available at FTSE Russell.com.

Harvey, Campbell, and Yan Liu. 2015. “Backtesting.” Journal of Portfolio Management, vol. 42, no. 1 (Fall Issue):13–28.

———. 2016. “Does Scale Impact Skill?” Duke I&E Research Paper No. 2016-46.

Israel, Ronen, Sarah Jiang, and Adrienne Ross. 2017. “Craftsmanship Alpha: An Application to Style Investing.” Journal of Portfolio Management, vol. 44, no. 2 (December Multi-Asset Special Issue):23–39.

Johnson, Ben. 2019. “Strategic-Beta Exchange-Traded Products Suggest Maturity: The Latest Insights from Our Annual Report on the Strategic-Beta ETP Landscape.” The Big Picture blog, Morningstar.com (March 27).

Li, Feifei, Tzee-Man Chow, Alex Pickard, and Yadwinder Garg. 2019. “Transaction Costs of Factor-Investing Strategies.” Financial Analysts Journal, vol. 75, no. 2 (Second Quarter):62–78.

Li, Feifei, and John West. 2017. “Live from Newport Beach. It’s Smart Beta!” Research Affiliates Publications (August).

McLean, R. David, and Jeffrey Pontiff. 2016. “Does Academic Research Destroy Stock Return Predictability?” Journal of Finance, vol. 71, no. 1 (February):5–32.

Towers Watson. 2013. “Understanding Smart Beta.” Insights (July 23).

West, John, and Jason Hsu. 2018. “The Biggest Failure in Investment Management: How Smart Beta Can Make It Better or Worse.” Research Affiliates Publications (October).

West, John, and Amie Ko. 2014. “Hiring Good Managers Is Hard? Ha! Try Keeping Them.” Research Affiliates Fundamentals (November).