by Dominic Corva, Social Science Research Director
Washington State’s approach to cannabis legalization provides many lessons for other states coming online in the future. One of the really state-specific lessons we have learned over the last four years is that we have created a situation in which measured THC levels on packaged flowers are much higher than those observed during the period of unregulated medical cannabis markets. There are actually two significant problems with this. Please note that this analysis refers specifically to flowers: processed cannabis products can be, and usually are, homogenized in the process, so test results are less of a concern.
- High THC levels, which approach and sometimes exceed what is botanically possible, are inaccurate and misleading.
- Labs that consistently deliver higher THC results than other labs “corner the market” since higher results make products more likely to sell, under Washington’s particular approach to legal cannabis packaging.
Both of these problems create suboptimal public policy results, with respect to scientific accuracy, on the one hand, and unfair market advantage for untrustworthy labs on the other. Let’s take them one at a time, before examining a simple, completely technical approach that could virtually eliminate these problems.
Suspiciously high THC results can be the consequence of intentional and unintentional behavior or decisions by producers, processors, labs, and rule makers. There’s no reason to point fingers when there are so many factors contributing to this outcome. Instead, let me list a few.
- Consumers who are not allowed to sample or smell the product they purchase, due to packaging and retail display rules, are deciding what to buy on very little information. Adult-use consumers clearly show a preference for high THC numbers, which are required numbers found on each package. And like consumers of other products, they are swayed by clever and attractive packaging, which has no inherent correlation to the quality of the product inside. Most packaging provides a fairly limited window to observe the “bag appeal” of the product inside. In fact, “bag appeal” in these conditions refers significantly to the appeal of the bag, not what’s in it, because that’s all they can see. This is an unintentional outcome of strict packaging and display rules, and there is no reason to suspect foul play. It’s structural, not the outcome of bad intent.
- Labs, by rule, have one broad direction with respect to how they go about testing product. They are instructed by rule to follow the guidelines laid out about cannabis in the American Herbal Pharmacopoeia. These guidelines were not designed to standardize lab methods in any particular direction. So, Washington’s labs have each independently arrived at how they follow those guidelines. This is also a structural problem, and to their credit many of Washington’s labs have begun collaborating to develop “best practices” to help standardize methodologies.
- Lab reference standards (the individual cannabinoid samples that labs compare industry products against in order to arrive at numbers) can vary for a number of reasons. Most labs source these from Restek, but there are other suppliers out there. This can introduce variability for obvious reasons. Some non-obvious reasons for reference standard variability include how those are stored, and how long they are used before they are replaced. The temperature and pressure at which reference standards are stored contribute greatly to their stability. And of course labs may vary in how long they go before replacing “used up” or unreliable standards, for intentional and unintentional reasons. This introduces a lot of potential variability in test results across labs. It is a structural problem faced in any laboratory industry that tests products using chemical reference standards. It could be improved, but not likely eliminated. Here is where the lesson lies: scientific methods are not perfect, but they can be more transparent.
- Cannabis plants are highly variable from top cola to bottom buds. This is a problem of sampling a product that is not uniform. It can be a little helpful to require independent, third-party samplers. But even when that happens, the plant is highly variable. It’s a plant, not an industrial product. This is a structural problem, but more of a structural problem for science in general rather than our particular approach to cannabis regulation.
There are other factors that influence the variability of THC lab results, but the lesson is: THC results can be precise, but how accurate they are is always an open question. Precise THC numbers are always a guide, not a guarantee. It is absolutely crucial that consumers and policymakers understand this limit. Now, on to the problem of market capture via THC inflation.
- How do producers and processors choose which lab to use? There are many business decisions to make when choosing labs. These include convenience, professionalism, and consistency. But as Dr. Jim MacRae’s work has shown with a high degree of methodological confidence, it’s pretty clear that producers and processors are choosing labs based on how high their THC results can be. One caveat: producers and processors are clearly behaving badly when they agree to pay labs more for higher numbers.
- Producers and processors are making market decisions when they do this. As mentioned above, higher THC numbers correspond with faster product velocity and therefore more sales. They may be perfectly aware that this is unfortunate, but they also know that the precision of lab results does not necessarily reflect accuracy in any case. So, given the choice of more sales rather than fewer, they choose more. I call this a structural problem, although there is certainly an ethically questionable element. Some producer/processors want to evolve consumers away from the importance of precise THC numbers, and choose the more challenging road because they have a long-term vision of the industry in which they operate.
- Labs are making market decisions when they choose to follow methods that result in higher THC numbers, AND labs are making unethical decisions when they receive payment for higher numbers, for example, or intentionally introduce methodological variations that favor higher results. We have also encountered the “drylabbing” phenomenon in Washington State, whereby labs literally just make up numbers to report. That’s way beyond the pale, and one lab has already been shut down for this. Other labs that may have done this in the past have cleaned up their act.
So, how do we tame THC inflation? Obviously, improved governance has some effect — and that governance includes the public reporting work that Dr. MacRae has done, which has clearly had a strong effect on lab motivation to not look bad. Every lab test result in this state is public information, and any public servant with database skills can request WSLCB data and run the numbers themselves.
However, there is a “silver bullet” technical fix that could absolutely clean up most of the mess. Here it is.
Normalize lab results by lab, and require only the percentile of each result to be listed on the package rather than a precise percentage.
Each lab has its own data population and range of results. For all of the reasons listed above, the numbers they produce are not comparable across labs; nor are they comparable across time (hello, life cycle of reference standards).
For a lab that consistently ranges up to 32% but no higher, for example, a sample that tests at 32% would be reported at the 100th percentile. The number on the package would be 100.
For a lab that consistently ranges up to 25%, a sample that tests at 25% would also be reported at the 100th percentile. The number on the package would also be 100.
Doing this changes the comparisons away from apples to oranges (how we are doing it now), and towards apples to apples.
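As a rough illustration of the idea, here is a minimal Python sketch of per-lab normalization. It assumes "percentile" means the share of a lab's own past flower results that are at or below the new result; the function name `percentile_rank` and the two lab histories are hypothetical, made up only to mirror the 32% and 25% examples above.

```python
from bisect import bisect_right


def percentile_rank(result, lab_history):
    """Share (0-100) of one lab's own past results that are <= result.

    Assumption: this is one reasonable definition of "percentile";
    the article does not specify an exact formula.
    """
    if not lab_history:
        raise ValueError("lab_history must contain at least one result")
    ordered = sorted(lab_history)
    # bisect_right counts how many past results are <= the new result.
    return round(100 * bisect_right(ordered, result) / len(ordered))


# Hypothetical THC results (percent by weight) from two labs.
lab_a = [18.0, 21.5, 24.0, 26.5, 28.0, 29.5, 30.0, 31.0, 31.5, 32.0]
lab_b = [14.0, 16.5, 18.0, 19.5, 21.0, 22.0, 23.0, 24.0, 24.5, 25.0]

label_a = percentile_rank(32.0, lab_a)    # top of lab A's range
label_b = percentile_rank(25.0, lab_b)    # top of lab B's range
label_mid = percentile_rank(25.0, lab_a)  # the same 25% judged against lab A

print(label_a, label_b, label_mid)
```

In this made-up data, a 25% result is at the top of one lab's range and well down the other's, which is the point of labeling relative to each lab's own population.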
Simply doing this would immediately eliminate shopping for the labs with the highest THC results. Producers and processors could then make decisions on what lab they used based on best business practices and convenience, rather than unscrupulous behavior.
It would take a simple line of code in the tracking software, and a small shift in packaging information. It would also, of course, need to be dynamically updated, perhaps by using only the last 100 results. New laboratories would use the industry average for 100 results, and then switch over once they had significant data from their own results.
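The rolling update could look something like the sketch below. It is only one interpretation: it assumes a new lab is ranked against a supplied set of industry-wide results until it has filled its own window, and that each result is ranked against the results that came before it. The class name `RollingLabPercentile` and all numbers are hypothetical, and the demo uses a window of 5 instead of 100 to stay short.

```python
from bisect import bisect_right
from collections import deque


class RollingLabPercentile:
    """Rank each new result against a lab's most recent `window` results."""

    def __init__(self, industry_baseline, window=100):
        if not industry_baseline:
            raise ValueError("industry_baseline must not be empty")
        self.window = window
        self.baseline = sorted(industry_baseline)  # used until the lab has history
        self.recent = deque(maxlen=window)         # oldest results drop off automatically

    def label_for(self, result):
        # Assumption: switch to the lab's own data only once the window is full.
        if len(self.recent) == self.window:
            reference = sorted(self.recent)
        else:
            reference = self.baseline
        label = round(100 * bisect_right(reference, result) / len(reference))
        self.recent.append(result)  # the new result joins the window afterwards
        return label


# Hypothetical industry-wide results and a tiny window for demonstration.
baseline = [10.0, 15.0, 20.0, 25.0, 30.0]
lab = RollingLabPercentile(baseline, window=5)

first = lab.label_for(20.0)           # ranked against the industry baseline
for value in (22.0, 24.0, 26.0, 28.0):
    lab.label_for(value)              # fills the lab's own window
later = lab.label_for(20.0)           # now ranked against the lab's own results

print(first, later)
```

The same 20% result gets a lower label once this lab's own, higher-running history takes over, which is the behavior the normalization is meant to produce.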
Listing a normalized percentile instead of a precise quantity would also help shift consumer and policymaker understanding of what lab results mean and how they are used. They should be used primarily to indicate that something is strong, qualitatively, rather than quantitatively, because again, cannabis flowers are variable and precise numbers do not mean accurate numbers.
That’s it. It’s an immodest proposal — an actual silver bullet based on understanding how statistical variability influences market outcomes, and therefore provides “loopholes” that reward unethical behavior.
I want to thank Dr. Jim MacRae especially for making this issue one that the industry and policymakers can’t ignore any more. He’s got his own style for doing so, but you can’t argue with his methods which are absolutely testable by any citizen-scientist that cares to. Fortunately, his work has been amplified by two cannabis journalists whose approach makes it a bit harder for people to ignore, Bob Young of the Seattle Times and Tobias Coughlin-Bogue of Leafly.
Hi Dominic,
What a great article! I think you have a very interesting suggestion for addressing the Accuracy problem – if I may paraphrase, simply normalize it within each lab. But I think there is a bigger problem here that I don’t see much discussion on, probably because the data is lacking to do any real analysis, but the problem is real. And that problem is Precision.
You mention a couple of times how the labs’ results can be Precise, but not necessarily accurate. For those that don’t know the difference, here is a link to a graphic that pretty clearly lays it out:
https://www.google.com/imgres?imgurl=http://cdn.antarcticglaciers.org/wp-content/uploads/2013/11/precision_accuracy.png&imgrefurl=http://www.antarcticglaciers.org/glacial-geology/dating-glacial-sediments-2/precision-and-accuracy-glacial-geology/&h=1363&w=2040&tbnid=d92uiG4-wAF2aM:&tbnh=140&tbnw=211&usg=__5_SC4MoAraOOU7FSsHT6e0OAS2k=&vet=10ahUKEwjSw9iUx-vTAhUXwWMKHXvGDmgQ9QEIKzAA..i&docid=DP3vEoGBlFj3NM&sa=X&ved=0ahUKEwjSw9iUx-vTAhUXwWMKHXvGDmgQ9QEIKzAA
Simplifying a bit, Precision is repeatability. It is a measure of how consistent the results are, as opposed to whether they are actually on the mark or not. Suppose we have a sample that we somehow know is exactly 20%, and we get 5 samples tested, and the results are 14%, 15%, 15%, 16%, and 15%. These results are not very Accurate (they’re off by roughly 25% of the known value), but they ARE fairly precise, repeatable, and consistent, differing by +/- 1% from each other.
The degree of Precision is usually expressed in terms of digits, or orders of magnitude. For example, most college statistics classes will teach that 15% (two digits) is less precise than 15.0% (three digits), which in turn is less precise than 15.00% (four digits). Why is that? The lower number of digits implies a greater variability, which means less measured repeatability. For example, contrast the numbers above with a different sample set that tested at 15.01%, 15.00%, 15.00%, 14.99%, and 15.01%. The first data set is 15% +/- 1%, while the second data set is 15% +/- 0.01%. They’re both 15% (not very Accurate), but the second data set is 100x more Precise.
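For anyone who wants to check the arithmetic, here is a small Python sketch using the two data sets above. The helper name `summarize` is made up for illustration; it reports relative error against the known 20% as the Accuracy measure, and both the half-range (the "+/-" figure) and the sample standard deviation as Precision measures.

```python
from statistics import mean, stdev


def summarize(results, known_value):
    """Accuracy and precision measures for repeated tests of one sample."""
    avg = mean(results)
    return {
        "mean": avg,
        # Accuracy: how far the average is from the known value, as a fraction.
        "relative_error": abs(avg - known_value) / known_value,
        # Precision, expressed as the "+/-" spread around the middle.
        "half_range": (max(results) - min(results)) / 2,
        # Precision, expressed as the sample standard deviation.
        "stdev": stdev(results),
    }


coarse = summarize([14, 15, 15, 16, 15], known_value=20)
fine = summarize([15.01, 15.00, 15.00, 14.99, 15.01], known_value=20)

print(coarse)
print(fine)
print(coarse["half_range"] / fine["half_range"])  # ratio of the "+/-" spreads
print(coarse["stdev"] / fine["stdev"])            # ratio of standard deviations
```

By the "+/-" spread the second set is about 100x tighter; by sample standard deviation the ratio comes out somewhat lower (roughly 85x), because the two sets are not spread around their means in quite the same way. Both sets are off from the known 20% by about a quarter.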
I would argue this is what our labs are missing (both across labs and within labs). Unfortunately, we may not be capturing the data that is necessary to examine Precision, and the fact that we see reporting happening with 4 digits of Precision by labs that don’t seem to be able to reliably reproduce results with more than 1 digit of Precision seems to me to be the bigger problem.
Please give some thought as to how we might be able to examine repeatability and Precision both within a lab, and across labs. I’m guessing that it would involve collecting a lot more data than we do today, so that we could somehow know that any given group of samples came from the same “thing”, and thus *should* be repeatable, as opposed to from different “things” that rightly ought to be different. Then we could group those samples, and take a standard deviation, or something like that.
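Something like the following Python sketch is what I have in mind for the grouping step. It assumes each test record carried an identifier for the "thing" it came from (here a hypothetical `lot_id`), which as far as I know is not data we reliably have today; the function name `replicate_spread` and every number in it are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def replicate_spread(records):
    """Group repeat tests of the same lot at the same lab and measure spread.

    records: iterable of (lab, lot_id, thc_percent) tuples.
    Groups with a single test are skipped, since spread is undefined for them.
    """
    groups = defaultdict(list)
    for lab, lot_id, thc in records:
        groups[(lab, lot_id)].append(thc)
    return {
        key: {"n": len(values), "mean": mean(values), "stdev": stdev(values)}
        for key, values in groups.items()
        if len(values) >= 2
    }


# Hypothetical repeat tests; the lot identifiers are an assumed extra field.
records = [
    ("Lab A", "lot-1", 19.8), ("Lab A", "lot-1", 20.1), ("Lab A", "lot-1", 20.4),
    ("Lab B", "lot-1", 13.0), ("Lab B", "lot-1", 27.0),
    ("Lab B", "lot-2", 22.0),
]

spread = replicate_spread(records)
for key, stats in spread.items():
    print(key, stats)
```

A large standard deviation within one (lab, lot) group would flag poor repeatability inside a lab, and comparing group means for the same lot across labs would show how far the labs disagree with each other.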
But until we have repeatability, and degrees of Precision that match our reporting, I’m not sure how much Accuracy really matters. If we have two samples from the exact same flower, same location, same timing, same lab, and they test at 13% and 27%, how reliable is either of those numbers? Even normalizing won’t fix this.
THC, as a metric, w-a-y overrated. I generally ignore those numbers as absolutes, simply view as “relative to”. Terpene profile what it’s all about for me. THC specificity just one more stupid thing WA does.
Great article. Well written once again by Dominic. I remember when you mentioned this idea to me a month or so ago. It’s an elegant solution. Very unfortunate that it is necessary at all, but it would work without any intervention with the labs. To keep parity with current results, the scale could be 0-30 instead of 0-100; that would leave most samples at most labs relatively unaffected by the change, which would be easier for consumers to grapple with. IMO, a rolling 100 to get your percentile bins is not enough. Some labs do 100+ samples in a day, so they could bias their results one day but not the next. And especially during harvest season, sometimes one producer can provide 100 samples to a lab, so in that case the producer would be compared only to him/herself, instead of to their peers, as intended. You need an n of at least 1000 in my opinion. Most of the labs already have a flower n that large.
In fact, this paradigm would encourage unscrupulous labs to withhold results that are abnormally high (quite the reverse of their current temptations), as high values would have the effect of skewing all other results to be lower. To cheat under this paradigm, the lab would have to test a whole bunch of low-THC trim and call it flower in order to skew the rest of their results up. Doesn’t make cheating impossible, but would make it much, much more difficult. And – of course – this correction method would only work well with flower, where the population of samples is roughly the same between labs. Extracts vary quite widely in concentration between processors and between extraction methods and even between runs, so extracts would be more difficult to pin down this way.
I’ll bring it up with the packaging and labeling committee at the LCB. I fear I already know the LCB’s response: “traceability system constraint.” Their software isn’t ready for such complexity as this. Keep up the good work. This issue is a huge burden to the industry and a solution is desperately needed. It’s a black eye for all of us.
Thank you Nick, this is amazing feedback the details of which really improve the suggestion considerably! I’m absolutely in concurrence and would add what you say here as canonical improvements to the “silver bullet”!