P123 Strategy Design Topic 4D – Using Quality Factors in Your Strategies
Next up will be the final aspect of Quality, the one I think quite a few are waiting for: Earnings Quality. That will be the final Topic in fundamentals, since after it we’ll move into Sentiment and then Momentum (I haven’t yet decided on the sequence).
Although Topic 4D may not focus on what many see as the good stuff, please give it serious consideration. Something as simple as ROE can go a long way toward creating a strong tailwind that supports anything else you’re doing. And considering how much harder it is to perform out of sample than in sample, we owe it to ourselves to understand whatever tailwinds we can find.
This is one of those topics that provides incredible, but often unappreciated, opportunities for creativity. In a purely Ivory Tower sense, companies with the highest quality metrics would always be the ones whose shares you’d want to own. How, after all, could one justify owning shares of anything other than the best firms? In the real world, though, these factors can often produce varied results. We’re always looking to the future, and when it comes to developing reasonable expectations of what’s to come, other factors often do so with a greater sense of immediacy. The key to successful use of Quality factors is to remember why we are using them and how they can work to enhance our probabilities.
Getting Started
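Let’s start in our usual place, the DDM (Dividend Discount Model), which we re-jigger to define ideal PE as 1 / (R – G) (strictly, the payout ratio divided by R – G, but with payout held constant the simpler form tells the same story):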
- Lower interest rates drive R down and PE up. That’s pretty powerful. But aside from market timing (if you can do it), there’s no way for us to work with that, since interest rates impact the market as a whole and affect only a smaller number of stocks on an individual basis.
- As G (growth) rises, the denominator shrinks, which means higher PEs and stronger stocks. We know that’s good. But it’s future growth we care about.
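For reference, here is one way to write out the step from the DDM to ideal PE. It assumes the constant-growth (Gordon) form, with D1 and E1 as next year’s dividend and earnings and a fixed payout ratio; this notation is introduced here for illustration and is not the article’s own:

```latex
P_0 = \frac{D_1}{R - G}, \qquad D_1 = \text{payout} \times E_1
\quad\Longrightarrow\quad
\frac{P_0}{E_1} = \frac{\text{payout}}{R - G} \;\propto\; \frac{1}{R - G}, \qquad R > G
```

With payout held constant, anything that lowers R or raises G raises the ideal PE, and the expression only makes sense while R stays above G.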
However you feel about R and G influences, they both have the virtue of being pretty direct. Quality enters the equation through often-unnoticed back doors.
- First, it’s part of R through its role in the risk premium: the higher the quality, the lower the risk, so all else being equal, the lower the R. That’s an important relationship. But it does not lend itself to a 30-second CNBC sound bite, especially since many segment producers are likely clueless about the relationship between quality, R and ideal PE (assuming they even know what PE is).
- Second, higher Quality tends to make results more consistent. So whatever assumptions about future G you develop from the only thing available (the past) are more likely to serve as an effective basis for forward expectations. Favorable expectations of G are fine, but they are worth nothing to us if they don’t pan out. Again, however, this is not the sort of conversation you’re likely to see in the financial media.
- Finally, of course, Quality is indicative of a company’s ability to generate G in the first place.
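Here is a minimal sketch of the three channels in numbers. It assumes R splits into a risk-free rate plus a risk premium and that higher Quality shows up as a smaller premium. The function name and every input are hypothetical, picked only to show the direction of each effect, not to estimate anything real:

```python
def ideal_pe(risk_free, risk_premium, growth, payout=1.0):
    """Ideal PE from the constant-growth DDM: payout / (R - G).

    R is modeled (by assumption) as a risk-free rate plus a risk premium.
    """
    required_return = risk_free + risk_premium
    if required_return <= growth:
        # The DDM breaks down when G reaches or exceeds R (infinite or negative price).
        raise ValueError("required return R must exceed growth G")
    return payout / (required_return - growth)


# Hypothetical inputs, chosen only to show the direction of each effect.
scenarios = {
    "base case": ideal_pe(risk_free=0.04, risk_premium=0.05, growth=0.03),
    "lower rates": ideal_pe(risk_free=0.03, risk_premium=0.05, growth=0.03),
    "higher growth": ideal_pe(risk_free=0.04, risk_premium=0.05, growth=0.04),
    "higher quality": ideal_pe(risk_free=0.04, risk_premium=0.04, growth=0.03),
}
for label, pe in scenarios.items():
    print(f"{label:>14}: ideal PE {pe:.1f}")
```

Lower rates and higher growth are the familiar channels. The smaller risk premium is the Quality channel, and in this toy setup it moves ideal PE as far as a one-point drop in interest rates. The guard reflects the fact that the formula stops making sense once G reaches R.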
Managing Expectations
In all three cases, we see that the workings of Quality tend to be hidden from for-dummies-level vantage points. We could study data, study some more, and still more, to find ideas that actually work. Or we could take the path of least resistance: use Quality as the market hands it to us, as a less conspicuous item that can give our main strategy, so to speak, an extra bit of oomph. Sometimes, Quality helps us make a good strategy better. Other times, it helps us convert a potentially lackluster strategy into something better. And don’t underestimate the possibility that Quality may simply help us narrow a good 350-500 stock strategy down to a number that’s more investable, say 10-25.
This is why, speaking for myself and from my own experience, I’ve often found Quality factors to work best not necessarily when they are front-and-center in a strategy but when they take on a supporting role, as was demonstrated back when we covered Value. But for now, we’ll think of Quality as a primary goal.
Given the relationship between Quality and the risk component of R, don’t be at all surprised if you discover that adding quality-oriented factors to your models reduces simulated return. It won’t happen all the time, but given the natural relationship between lower risk and lesser return, you may bump into it often.
Test-Driving Return on Whatever
Let’s start with Table 1, which summarizes a collection of screen backtests. Each screen used the PRussell3000 universe, the MAX test period and 4-week rebalancing, and worked with all stocks that passed the screen (a setting of 0 for Max. No. Passing Stocks). Each screen contained one rule, as set forth in the table:
Table 1
Rule | Basic: Annl % Ret | Basic: Annl StDev | Basic: Beta | Rolling Avg. Excess Ret: All Mkts | Rolling Avg. Excess Ret: Up Mkts | Rolling Avg. Excess Ret: Dn Mkts
FRank("ROE%TTM")>80 | 10.40 | 18.24 | 1.08 | 0.46 | 0.62 | 0.19
FRank("ROE%TTM")<20 | -0.91 | 34.68 | 1.80 | -0.05 | 2.14 | -3.46
FRank("ROI%TTM")>80 | 10.82 | 18.70 | 1.10 | 0.49 | 0.73 | 0.12
FRank("ROI%TTM")<20 | -1.52 | 35.44 | 1.82 | -0.07 | 2.16 | -3.56
FRank("ROA%TTM")>80 | 10.49 | 18.88 | 1.10 | 0.47 | 0.71 | 0.08
FRank("ROA%TTM")<20 | -1.59 | 35.68 | 1.84 | -0.07 | 2.22 | -3.65
FRank("ROE%5YAvg")>80 | 10.07 | 17.66 | 1.04 | 0.42 | 0.44 | 0.38
FRank("ROE%5YAvg")<30* | 3.91 | 25.88 | 1.36 | 0.15 | 1.15 | -1.41
FRank("ROI%5YAvg")>80 | 10.49 | 18.72 | 1.10 | 0.45 | 0.64 | 0.16
FRank("ROI%5YAvg")<30* | 1.75 | 30.06 | 1.63 | 0.03 | 1.71 | -2.60
FRank("ROA%5YAvg")>80 | 10.59 | 18.77 | 1.10 | 0.45 | 0.63 | 0.18
FRank("ROA%5YAvg")<30* | 1.40 | 30.05 | 1.63 | 0.01 | 1.74 | -2.69
* Threshold set at 30 to produce a reasonable number of stocks passing the screen
The table shows us some interesting things.
- While ROE, ROI and ROA are computed differently and provide different information, we can assume, unless otherwise dictated by the unique needs of a particular strategy, that the investment implications of the three ratios are, for all practical purposes, the same. (And we can presume likewise regarding the countless variations that can be found on Investopedia, Wikipedia and who-knows-how-many other sources.) We should not be surprised. If we look at the formulas, we should expect a high correlation in rankings from one ratio to another.
- In backtest, there is little difference between TTM and 5-year results, which is to be expected given the big-picture persistence of these return items. But that doesn’t mean we can flip a coin when modeling forward, as we must do when we think about real money. One who owns only 10-20 stocks needs to be sensitive to the exceptions that get papered over in larger studies such as this. So regardless of what academic-type tests show, there should still be a good reason for picking TTM or a five-year average. (There is never an acceptable reason for even testing, much less using, a quarterly (Q) number in the context of Quality; if anything, it would be a Momentum factor, and it will be discussed further when we reach that topic.) Generally, a more return-oriented model can lean toward the more here-and-now TTM factor, while one more interested in risk control could work with longer-term averages.
- The gaps between best and worst raise the prospect that we can accomplish much even if we do nothing with these items other than screen out the worst. This is one of countless reasons why it’s vital that you not obsess over ranking systems. It’s amazing how much you can accomplish even with plain-vanilla systems if you run them against pre-qualified sub-universes that have already identified and weeded out potential trouble. One way to beat the market is to identify and overweight winners. Another equally effective way is to identify and underweight the dregs. That said . . . .
- Don’t be afraid of low quality if you understand what it means and want to work in that manner. Notice the differences between the rolling up- and down-market results for the junk groups. There is a lot of downside risk; that’s obvious. But there is also serious upside potential. This happens because low quality is associated with poor consistency, and if you are being aggressive, that’s what you want: poor consistency. Your job, in such a case, is to use screening rules to try to limit your sub-universe to situations more likely than not to capture the good part of inconsistency. Technical factors, momentum and sentiment can help a lot in this area.
- But through it all, if you just want reasonably positive results with no more than reasonable levels of risk, use of ROE, ROA or ROI can put a heavy tailwind at your back. This, by the way, is a big part of the explanation for how Warren Buffett got to be Warren Buffett. He might not be able to create a 90%-alpha p123 sim, but he still did pretty well for himself.
From the Big Picture to Investability
Like academic studies, the one above worked with large swaths of a large universe to identify aggregate characteristics. We still have to work our way down to manageable-size portfolios. So the next set of experiments will examine what we can expect these return ratios to accomplish for us if we limit our positions to 15. In all cases I’m going to work with ROE%TTM. If you want to repeat the experiments with 5-year averages and/or with ROI or ROA, go for it.
I’m going to start with a very simple screen against the PRussell3000 universe (It’s been a while since I said this so a refresher couldn’t hurt: Please design your strategies with a universe no broader than the PRussell3000. Your goal is to come up with a strategy that can work with real money, not to produce eye-catching sims. So you do not help yourself if you design using a marshmallow universe. If you get something that works with the PRussell3000, you can always go back later to swap in the All Fundamentals universe and pop in some liquidity rules.)
Screening Rule: FRank("ROE%TTM")>80
Quick Rank: ROE%TTM, higher is better, pick Top 15
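Outside p123, the logic of this screen plus quick rank can be sketched in a few lines. This sketch assumes FRank behaves like a 0-100 percentile rank in which the highest value scores 100; p123’s own handling of ties, NAs and scaling may differ. The `Stock` type, the function names and the random universe are all hypothetical:

```python
import random
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Stock:
    ticker: str
    roe_ttm: float


def frank(values):
    """Percentile rank 0-100 (highest value -> 100); tied values share a rank.

    An approximation of p123's FRank for illustration only.
    """
    if len(values) == 1:
        return [100.0]
    ordered = sorted(values)
    # Count of values strictly lower than v, divided by (n - 1), scaled to 0-100.
    return [100.0 * bisect_left(ordered, v) / (len(values) - 1) for v in values]


def screen_and_rank(stocks, min_frank=80.0, top_n=15):
    """Keep stocks with FRank(ROE) > min_frank, then take the top_n by ROE."""
    scores = frank([s.roe_ttm for s in stocks])
    passing = [s for s, score in zip(stocks, scores) if score > min_frank]
    passing.sort(key=lambda s: s.roe_ttm, reverse=True)
    return passing[:top_n]


# Hypothetical 100-stock universe with random ROE values (percent).
random.seed(0)
universe = [Stock(f"T{i:03d}", random.gauss(10.0, 15.0)) for i in range(100)]
picks = screen_and_rank(universe)
print(len(picks), [s.ticker for s in picks[:3]])
```

Because the screen and the quick rank use the same factor, the top 15 are simply the 15 highest ROE values; nothing else about the companies enters the decision.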
Here’s the result:
Figure 1
- Rolling Test Avg. Excess 4-week Return:
- 0.27% in all periods, 0.00% in up periods, and 0.69% in down periods
The result is positive, but a heck of a lot less appealing than what we saw in our academic-style testing. The basic numbers actually seem OK, but the picture alone suggests a lot less appeal. And the rolling numbers make it clear that we’re really dependent on bad markets, when this model might have some defensive appeal. But over the long term, we have more good periods than bad. So it seems that despite the aggregate virtues of ROE, when it comes to manageable-size portfolios, we need to do more than rank and count down from the top.
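Here is a rough sketch of how a rolling-period summary like this can be computed from weekly returns. It assumes a new 4-week test starts every week, that excess return is the compounded portfolio return minus the compounded benchmark return over the same window, and that a window counts as an up market when the benchmark’s return over it is positive. p123’s actual rolling-test definition may differ in its details, and the function names are hypothetical:

```python
from statistics import mean


def compound(weekly_returns):
    """Compound a list of weekly returns (0.01 = 1%) into one period return."""
    growth = 1.0
    for r in weekly_returns:
        growth *= 1.0 + r
    return growth - 1.0


def pct_mean(values):
    """Average in percent, or NaN when there are no observations."""
    return mean(values) * 100.0 if values else float("nan")


def rolling_excess(port_weekly, bench_weekly, window=4):
    """Average excess return (%) over all, up-market, and down-market windows."""
    all_windows, up, down = [], [], []
    for start in range(len(port_weekly) - window + 1):
        port = compound(port_weekly[start:start + window])
        bench = compound(bench_weekly[start:start + window])
        excess = port - bench
        all_windows.append(excess)
        # Classify the window by the benchmark's own return over it.
        (up if bench > 0 else down).append(excess)
    return pct_mean(all_windows), pct_mean(up), pct_mean(down)


# Hypothetical weekly returns for a model and its benchmark.
model = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 0.02, -0.03]
benchmark = [0.01, -0.02, 0.02, 0.01, -0.03, 0.00, 0.01, -0.01]
print(rolling_excess(model, benchmark))
```

Under these definitions the all-periods figure is a count-weighted blend of the other two, so when up windows outnumber down windows it sits closer to the up-market number.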
What follows is a set of iterations, but not iterations in the sense discussed by statisticians. We’re not going to just change things around until we find a result we like and then say “We’re done.” Each iteration will be motivated by a specifically stated goal based on financial theory: a search for the sort of datapoint likely to point us toward high ROEs that are real rather than fluky. If rationally justified, you can have as many iterations as you want, limited only by your imagination.
I’ll offer seven more iterations here in order to illustrate a thought-and-feedback process. Once you see what’s being done, you should be able to continue on your own. Let’s think of the above as Iteration #1 and move on.
Iteration #2
I know ROE%TTM works in the aggregate because financial theory tells me it works, and it helps that I saw it in the study whose results were shown in Table 1. But to make it work in the context of an investable portfolio, I need to do more than pick from the top. Knowing, as I do, that ROE works because of the way it impacts G and R in the DDM, I reason that ROE can’t accomplish what I hope it will if it’s in the process of trending downward. So I’ll experiment with this:
Screening Rules:
FRank("ROE%TTM")>80
ROE%TTM>ROE%5YAvg
Quick Rank:
ROE%TTM, higher is better, pick Top 15
Here’s the result:
Figure 2
- Rolling Test Avg. Excess 4-week Return:
- 0.62% in all periods, 0.99% in up periods, and 0.05% in down periods
The standard backtest is fine. I expected improvement (through the addition of a second rule that tried to weed out situations where ROE, although good, is trending lower) and got it. But it is a teeny bit imperfect – lackluster performance in the rolling down periods (most likely not different from zero to a significant degree). It’s not the end of the world; we do have more up than down periods.
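One rough way to check the “not different from zero” hunch is a one-sample t-statistic on the down-period excess returns. This sketch assumes you have exported those returns yourself, and the helper names are hypothetical. One caution: 4-week windows that start every week overlap by three weeks, so they are not independent, and a naive t-statistic will overstate significance; thinning to non-overlapping windows is a crude correction:

```python
from math import sqrt
from statistics import mean, stdev


def t_stat(samples):
    """One-sample t-statistic for the null hypothesis that the true mean is zero.

    Assumes independent observations; overlapping windows violate this.
    """
    if len(samples) < 2:
        raise ValueError("need at least two observations")
    sd = stdev(samples)
    if sd == 0:
        raise ValueError("observations have no dispersion")
    return mean(samples) / (sd / sqrt(len(samples)))


def non_overlapping(samples, step=4):
    """Keep every step-th window so consecutive kept windows share no weeks."""
    return samples[::step]
```

As a rule of thumb, an absolute t-statistic below roughly 2 means the average is hard to distinguish from zero.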
But although we can live with what we have, why stop so quickly? The goal of testing is to learn what we can and can’t do as we translate ideas into p123 lingo. We can always come back to this and settle in if that’s what we ultimately decide to do.
Iteration #3
Screening Rules:
FRank("ROE%TTM")>80
OpMgn%TTM > OpMgn%5YAvg
Quick Rank:
ROE%TTM, higher is better, pick Top 15
One way to address trends in ROE is to drill down to its component parts, one of which is margin. As we know, Net Margin is the version we’d consider if we wanted to strictly replicate the DuPont framework. But for our forward-looking purposes, we can swap in any other margin that we think will better illuminate the company’s potential future.
Let’s try Operating Margin. It’s high enough in the income statement to sidestep issues involving capitalization, special items, etc., yet unlike gross margin, it spares us the burden of sorting out differences in how companies allocate expenses between SG&A and COGS. As with Iteration #2, we seek a TTM figure in excess of the five-year average. Figure 3 shows the results.
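For reference, the three-factor DuPont identity multiplies margin, turnover and leverage back into ROE. Here is a minimal sketch with hypothetical figures (the function name is illustrative). Once Net Margin is swapped for Operating Margin, as in this iteration, the product no longer equals ROE exactly, which is fine for forward-looking screening but worth keeping in mind:

```python
def dupont(net_income, sales, total_assets, equity):
    """Three-factor DuPont breakdown of ROE (uses Net Margin, so the identity holds)."""
    margin = net_income / sales          # profitability
    turnover = sales / total_assets      # asset efficiency
    leverage = total_assets / equity     # equity multiplier
    return {
        "margin": margin,
        "turnover": turnover,
        "leverage": leverage,
        "roe": margin * turnover * leverage,
    }


# Hypothetical company: 8% margin, 1.25x turnover, 2x equity multiplier.
parts = dupont(net_income=80.0, sales=1000.0, total_assets=800.0, equity=400.0)
print({k: round(v, 4) for k, v in parts.items()})
```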
Figure 3
- Rolling Test Avg. Excess 4-week Return:
- 0.49% in all periods, 0.99% in up periods, and -0.30% in down periods
It’s a mild step backwards. If we really want to work with margin, we’d need to consider why this moved us in the wrong direction. Maybe operating margin wasn’t the best choice. Maybe TTM>5Y is too lazy a way to articulate a trend; perhaps we need to get more granular and work year by year. (We might think of this with respect to Iteration #2 as well.) Maybe we should add something relating to turnover; either we drill down into DuPont components for real, or we don’t.
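To make the “too lazy a way to articulate a trend” concern concrete, here is a sketch comparing a current-above-average test with a year-by-year test. It simplifies by treating the last entry of an annual history as the current figure (p123’s TTM and 5-year-average items are built on different periods), and the histories and function names are hypothetical:

```python
from statistics import mean


def above_average(history):
    """Current value (last entry) exceeds the average of the whole history."""
    return history[-1] > mean(history)


def rising_recently(history, years=2):
    """Each of the last `years` year-over-year changes is positive."""
    recent = history[-(years + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical five-year margin (or ROE) histories, oldest first, in percent.
peaked = [8, 10, 12, 22, 18]    # above its average, but already rolling over
steady = [8, 10, 12, 14, 18]    # above its average and still climbing

for name, h in [("peaked", peaked), ("steady", steady)]:
    print(name, above_average(h), rising_recently(h))
```

Both histories pass the above-average test, but only the steady one passes the year-by-year test, which is the kind of distinction a single TTM>5Y rule cannot make.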
I’ll leave it to you to work further along these lines if you wish. What’s important to note, here, is how you move from one iteration to the next. It isn’t a matter of plugging in one thing or another to cover all possible bases. You have to talk to yourself, ask why the last thing wasn’t as good as you expected and what it might take to address the shortcoming.
Iteration #4
Screening Rules:
FRank("ROE%TTM")>80
FRank("DbtTot2CapTTM",#industry)<50
Quick Rank:
ROE%TTM, higher is better, pick Top 15
We know, from the DuPont framework, that leverage/debt is part of the package. We also know that, all else being equal, less debt is better (be careful, though: “all else” is a very broad concept). We also know that even if we hold operating income constant, we can boost ROE by making debt a bigger percentage of the capital structure.
So perhaps we can mitigate the potential balance-sheet risk posed by a high-ROE model if we incorporate a leverage-related factor.
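The leverage point is easy to check with arithmetic. This sketch holds assets and operating income (EBIT) fixed, finances a growing share of assets with debt at an assumed 6% interest rate and 25% tax rate, and recomputes ROE. All numbers and the function name are hypothetical:

```python
def roe_with_leverage(ebit, assets, debt_share, rate=0.06, tax=0.25):
    """ROE when a fraction `debt_share` (below 1) of `assets` is financed with debt."""
    debt = assets * debt_share
    equity = assets - debt
    net_income = (ebit - debt * rate) * (1.0 - tax)
    return net_income / equity


# Same business (EBIT of 150 on 1,000 of assets), three capital structures.
for share in (0.0, 0.5, 0.8):
    print(f"debt {share:.0%}: ROE {roe_with_leverage(150.0, 1000.0, share):.2%}")

# If EBIT falls to 40 (4% on assets, below the 6% cost of debt), leverage works in reverse.
for share in (0.0, 0.5):
    print(f"weak year, debt {share:.0%}: ROE {roe_with_leverage(40.0, 1000.0, share):.2%}")
```

The second loop shows the other side: the same debt that lifts ROE when the business earns more than its cost of debt cuts ROE when it earns less, which is the balance-sheet risk a leverage factor is meant to flag.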
So how far should we go in limiting leverage? We have to start somewhere, so I’ll pick an FRank<50 approach. And this would be a terrific time to spotlight an important issue that we haven’t yet addressed but which is always on the table: should we do a complete sort, or should we do specialized sorts based on industries, sectors, etc.?
There is no inherently right or wrong answer. If we want truly “better” companies, we should use an industry-type sort. Being able to operate with less leverage compared to others in the same business with similar balance sheet needs tells us something. If having a sector/industry diversified portfolio is important to us, we should work with such sorts. If you’re a pro running large accounts, you probably do need to go this way.
If you’re an individual, however, you can afford to refrain from this sort of thing. Sector imbalance is fine, if your model tilts you toward better sectors. A basic sort against the entire group will work for you here. (Diversification can be accomplished through multiple positions and use of multiple factors. Sector diversification is based heavily on stereotype and can increase risk if it forces you to increase exposure to high-risk businesses.)
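To see how the #industry parameter can change which stocks pass, here is a sketch of a plain percentile rank versus one computed within each group. It uses a simple 0-100 percentile approximation (highest value scores 100) that may not match p123’s exact FRank math, and the data and function names are hypothetical. Because lower debt is better here, the rule keeps ranks below 50:

```python
from bisect import bisect_left
from collections import defaultdict


def frank(values):
    """Percentile rank 0-100 (highest value -> 100); an approximation of FRank."""
    if len(values) == 1:
        return [100.0]
    ordered = sorted(values)
    return [100.0 * bisect_left(ordered, v) / (len(values) - 1) for v in values]


def frank_by_group(values, groups):
    """Percentile rank computed separately within each group (e.g. industry)."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    result = [0.0] * len(values)
    for idx in members.values():
        for i, score in zip(idx, frank([values[i] for i in idx])):
            result[i] = score
    return result


# Hypothetical debt-to-capital ratios: three utilities, three software firms.
debt_to_cap = [0.55, 0.60, 0.65, 0.00, 0.05, 0.10]
industry = ["Utilities"] * 3 + ["Software"] * 3

whole = frank(debt_to_cap)
within = frank_by_group(debt_to_cap, industry)
print("whole-universe passes:", [i for i, s in enumerate(whole) if s < 50])
print("industry-relative passes:", [i for i, s in enumerate(within) if s < 50])
```

The whole-universe sort passes only the software names; the industry-relative sort also passes the least-levered utility, which is the sector-balance trade-off described above.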
Figure 4
- Rolling Test Avg. Excess 4-week Return:
- 0.49% in all periods, 0.86% in up periods, and -0.11% in down periods
That’s not so hot. It’s not horrible, but we’ve seen better.
I cheated and took out the #industry parameter, ran the test again, and got results closer to what we saw with Iteration #3, when we focused on operating margin. We definitely paid a price for having tried to be good citizens when it comes to industry exposure. But as with operating margin, we find that if we want to support a basic quest for high ROE with examination of individual DuPont components we’ll have to work harder at choosing and articulating them than I’ve so far done.
Iteration #5
Screening Rules:
FRank("ROE%TTM")>80
Rating("Basic: Quality")>90
Quick Rank:
ROE%TTM, higher is better, pick Top 15
Time for a change of pace: Let’s not try to support the basic high ROE rule with piecemeal DuPont considerations. Instead, let’s go whole hog with a lot of them at once. Hence use of the “Basic: Quality” ranking system as part of a screening rule. The factors are visible to you so if you check, you’ll see it covers a lot of territory.
Figure 5
- Rolling Test Avg. Excess 4-week Return:
- 0.42% in all periods, 0.56% in up periods, and 0.19% in down periods
We’ve made progress, very small progress but progress nonetheless. That’s seen in the rolling tests. This raises the prospect that we could do more along these lines, perhaps by tapping into more factors, if not through ranking systems then through more screening rules. Also, I could relax the 90 threshold. Remember, we can make progress simply by eliminating dogs.
Still, we are getting a bit Quality obsessed here . . . .
Iteration #6
Screening Rules:
FRank("ROE%TTM")>80
Rating("Basic: Value")>90
Quick Rank:
ROE%TTM, higher is better, pick Top 15
Here’s a change-up. We’ll leave the original ROE rule as it is, accept it for what it is, pro and con, and broaden our strategy to go for Quality at a reasonable price, the Greenblatt philosophy. Greenblatt did it one way. This iteration illustrates another.
Figure 6
- Rolling Test Avg. Excess 4-week Return:
- 0.96% in all periods, 1.94% in up periods, and -0.56% in down periods
Our interest is piqued. Return is up. We’re not yet home; volatility is troublesome. That is consistent with the DDM script (lower ideal valuation ratios are associated with greater risk) and with common sense (many low ratios are low because they deserve to be, i.e. because the companies are bad). But Value is one of the most well-established and revered approaches. So rather than just run away from the risk, let’s see if we can tame it.
Iteration #7
Screening Rules:
FRank("ROE%TTM")>80
Rating("Basic: Quality")>90
Rating("Basic: Value")>90
Quick Rank:
ROE%TTM, higher is better, pick Top 15
We double down on our use of Ratings-based Buy rules: we want stocks highly rated for Value, and we control risk by insisting that they be strong in Quality too, a broad range of Quality and not just ROE TTM.
Figure 7
- Rolling Test Avg. Excess 4-week Return:
- 1.98% in all periods, 3.19% in up periods, and 0.09% in down periods
Well that’s eye-catching. We certainly got a higher return. But oh that volatility! What the heck is happening?
The answer is pretty easy to see.
We’re over-screening. The most recent run of the screen produced only 7 passing stocks, and we were similarly low, or lower, on many other occasions. We’re getting too crazy and overexposing ourselves to too many things that are the functional equivalent of randomness.
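A back-of-the-envelope check shows why stacking strict thresholds starves a 15-position portfolio. This sketch assumes a universe of roughly 3,000 stocks and treats each rule as an independent filter that passes the share of stocks implied by its threshold (about 20% for FRank>80, 10% for a rating above 90, 25% for a rating above 75). In reality the rules are correlated, so actual counts can run higher, and they vary from one rebalance to the next. The function name is hypothetical:

```python
def expected_passing(universe_size, pass_fractions):
    """Expected number of survivors if each rule filters independently."""
    survivors = float(universe_size)
    for fraction in pass_fractions:
        survivors *= fraction
    return survivors


strict = expected_passing(3000, [0.20, 0.10, 0.10])   # ROE>80, Quality>90, Value>90
relaxed = expected_passing(3000, [0.20, 0.25, 0.25])  # ROE>80, Quality>75, Value>75
print(f"strict: ~{strict:.1f} stocks, relaxed: ~{relaxed:.1f} stocks")
```

With roughly 6 expected survivors against 15 slots, the strict version often cannot fill the portfolio, so a handful of names carries all the idiosyncratic risk.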
That’s easy to fix . . .
Iteration #8
Screening Rules:
FRank("ROE%TTM")>80
Rating("Basic: Quality")>75
Rating("Basic: Value")>75
Quick Rank:
ROE%TTM, higher is better, pick Top 15
We just lower the screening-rule Rating thresholds from 90 to 75 (the latter number isn’t sacrosanct; I pulled it out of the wind). We now have 85 stocks that pass the screen. Maybe that’s fine (we ultimately sort our way down to 15) or maybe we need to lower the threshold a bit more. We can work with it. But let’s see if we’re at least on the right track.
Figure 8
- Rolling Test Avg. Excess 4-week Return:
- 0.98% in all periods, 1.45% in up periods, and 0.25% in down periods
Bingo. The train is moving forward again. We do see prolonged periods of sideways movement and then some serious rallies. That’s OK. Patience isn’t the worst thing in the world if the idea makes sense. And the fact that we’re bringing low valuation ratios into the picture suggests it’s unavoidable (value is an information arbitrage strategy, the idea being to look for situations in which the market has misjudged R and G, and sometimes it takes a while for that to play out).
We still have Iteration #2 in our pockets, so we don’t have to force ourselves to settle for this. We can (the overall returns are fine) but we may not have as much patience as we wish we did. It’s still an open question. More can be done. But hopefully, you now see how you can pursue answers.
It’s About The Process
One obvious goal of this Topic was to get you thinking the kinds of thoughts you need to be thinking as you work with Quality. Beyond that, though, I wanted to illustrate more closely the process of developing and testing a strategy.
What separates a legitimate model from curve fitting, data mining, etc. (practices that can make for great sims but rotten real-money results) is not to be found in statistical concepts (robustness, etc.). Actually, a robust model means you did a better job of predicting the past, so instead of having a mess, you may have elevated it into a perfect mess. But a mess is still a mess, and that’s not what we want to be doing.
A legitimately designed model is one that springs from rational ideas. If you’re data mining, 20 iterations can be too many. If you’re doing it right, 2,000 can still leave you with room for many more. And you know you’re doing it right if you can explain to yourself why you are choosing to test something and understand why the test may have succeeded or fallen short.
And by the way, did you notice that none of these tests was disastrous? This is important. If your tests derive from sensible concepts, you should know, going in, that the results won’t be horrifying. Non-horrible outcomes should be presumed even before you click “Run.” What you learn is whether the result is good enough.
Also, notice the DDM mentions. I didn’t spend a lot of time on them. But it’s important to recall how ivory-tower the DDM really is: D / (R – G) in a world where dividends may not be paid, where a G at or above R would make for an infinite or negative P, etc. This Topic illustrates how, if we think of the DDM as a conceptual anchor rather than a usable formula, we can build sensible models that test decently right from the first pass and that stand a good chance of succeeding with real money or of being a springboard for something we can successfully use.
And hopefully, Table 1 motivates you to explore the full range of Quality and what you can do with it.
The next topic, the last in Quality, will be Earnings Quality.