When Algorithms Assume
Artificial Intelligence is a major topic today, not just in communications and Contact Centers, but across business. Most of the current crop of business-focused “AI” is in fact Machine Learning (ML). In ML, a large data set is used to “train” the algorithm to predict an outcome based on a range of interacting variables. Using modern data sets that have millions and even billions of examples and thousands of variables, the ML program is able to train a neural network to predict an outcome. Once the parameters of the neural network are set, the ML algorithm can then be used to predict the probability of an outcome for new data.
The capability of ML is incredible, given a large and valid database. ML for speech recognition has improved dramatically. But AI/ML is in much of our everyday lives, too. Your previous watching and buying patterns are used to predict other movies or products you may like. The pop-ups on Amazon or the “for you” from your video provider are examples. Beyond shopping, AI/ML is at the core of services like Waze that use a combination of real-time data and historical data sets to make an amazingly accurate prediction of when you will arrive. This technology is being implemented in an array of applications, from predicting failures in parts to monitoring complex eco-systems. While there are many discussions about AI becoming sentient and threatening, a prediction made by an ML solution is only as good as the underlying data set. ML doesn’t create new data; it analyzes existing data to predict an outcome for a new data point.
When implementing your first AI/ML project, before beginning to “train” the algorithm, experts emphasize that the most important step is to “clean” the data. Any erroneous or mis-entered data needs to be corrected. In fact, in most AI/ML projects, getting a clean, valid data set is one of the most critical, and time-consuming, steps. As they say, “garbage in, garbage out.” An ML prediction is only as good as the underlying training data and the data for which the prediction is to be made.
But what happens if a well-cleaned and valid data set is invalidated by an event that changes the actual data relationships so significantly that the huge historical data set no longer applies? Recently I experienced two examples of this, and I believe planning for such events is an important part of assuring the validity of an AI/ML implementation.
The first was in late March, at the beginning of the shelter-in-place in California. I needed to make a trip to Moss Landing, south of San Francisco, about 80 miles southeast of where I live. As I was going to get some photographs, I felt I could stay socially distant. While there, I stopped at a local fishmonger (selling from a boat in the harbor with great social distancing) and bought a couple of Dungeness crabs, with the warning as I set off home that they should be cooked within 60-90 minutes. As it was a Friday, normal commute traffic would be abysmal, resulting in a journey of more than 2 hours. I was not really concerned, as the early shelter-in-place traffic was minimal. When I put my home address into Waze, I was surprised by the projected trip time: Waze said it would take 2 hours and 5 minutes to get home. But, as I drove, the arrival time continued to get earlier and earlier. In the end, it was just over an hour's drive with almost no traffic on the road. This was what I expected, but not what Waze predicted.
Clearly, Waze must use the historical arrival rate of thousands of commuters to predict future daily traffic levels. And the data set clearly showed that, with typical commute levels, it would be a terrible drive. But, with 95% of commuters staying home, the algorithm was fooled. Similarly, as cars returned to the roads, the most recent data would be less useful, though the rate of change would be more gradual.
The second example was in the weather forecast. During the August fires in the Bay Area we also had a predicted increase in temperatures. On one specific day, weather.com predicted over 90 degrees for the next day. We woke up to the red sky day of smoke from a plethora of local fires. In fact, the sun never broke through the smoke that day and the temperature never went above about 70 degrees. Again, the smoke was such a significant change that the predictions based on the weather data set were no longer valid. Smoky skies were not included in the weather ML algorithm’s training data as one of the neural network’s outcome predictors.
Both were the result of so-called “Black Swan” events: a change so dramatic it was beyond the prediction range of common knowledge. A Black Swan event is something that common knowledge or belief says cannot exist or will never happen. In the 1600s, Europeans could not imagine black swans, as all European swans are white. When Europeans arrived in Australia and saw black swans for the first time, the “all swans are white” algorithm was upended.
Similarly, a large-scale office shutdown due to a pandemic and fires that blackened the sky were both unimaginable before they happened. In both cases, the algorithms that were incredibly accurate using historical data became virtually useless, as the data set was no longer valid for current conditions. One clear lesson from this is that a major event that impacts sunlight (fires, volcanoes, asteroids) can have a major impact on temperature and the global eco-system. It was easy to imagine what life would be like if that red sky day were to become normal, or worse, for a year or more, as a major volcanic eruption might produce.
Beyond the life lesson that Black Swan events can really happen, this brings up an important point for enterprises implementing AI/ML to consider. When you define the AI/ML data set, have you considered the business circumstances where that data set may be invalid, due to either an external or internal Black Swan-type event? Understanding and defining potential Black Swan events will enable you to create guard rails that signal when the data set is no longer valid due to factors that were not in the data but which have arisen since the ML algorithm was trained.
One way may be to install real-time checks of verifiable predicted variables. For example, if Waze had generated estimated car counts or speeds for specific freeway areas during the commute, those could be compared to the actuals. If the real-time variance exceeds a specified percentage error, an alarm could be triggered that brings in either automated or human intervention. If Waze saw a large number of bad estimates on these test cases in the first day, a message could immediately pop up indicating that “estimates may be long due to unpredictable traffic decreases.” While the accuracy might not immediately improve, the message would warn the customer that the normal high level of accuracy was not currently being delivered.
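The kind of check described above could be sketched roughly as follows. This is only an illustration, not how Waze or any real service works: the SegmentCheck record, the drift_alarm function, the segment names, and both thresholds (25% error per segment, more than half of segments out of tolerance) are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SegmentCheck:
    """One verifiable prediction: a freeway segment's predicted vs. observed speed."""
    segment: str
    predicted_speed: float  # mph, from the trained model (hypothetical)
    actual_speed: float     # mph, measured in real time (hypothetical)


def percentage_error(predicted: float, actual: float) -> float:
    """Absolute error as a percentage of the observed value."""
    if actual == 0:
        # Avoid dividing by zero: any nonzero prediction is treated as fully wrong.
        return 0.0 if predicted == 0 else float("inf")
    return abs(predicted - actual) / abs(actual) * 100.0


def drift_alarm(checks, max_error_pct=25.0, max_bad_fraction=0.5):
    """Return True when too many predictions miss by more than max_error_pct.

    Both thresholds are illustrative and would need tuning per business case.
    """
    if not checks:
        return False
    bad = [
        c for c in checks
        if percentage_error(c.predicted_speed, c.actual_speed) > max_error_pct
    ]
    return len(bad) / len(checks) > max_bad_fraction


# A normal commute: predictions are close to what is observed.
normal_day = [
    SegmentCheck("segment-a", 22.0, 20.0),
    SegmentCheck("segment-b", 30.0, 33.0),
    SegmentCheck("segment-c", 18.0, 19.0),
]

# A shelter-in-place day: the model expects gridlock, but the roads are empty.
empty_roads = [
    SegmentCheck("segment-a", 20.0, 65.0),
    SegmentCheck("segment-b", 25.0, 68.0),
    SegmentCheck("segment-c", 60.0, 64.0),
]

if drift_alarm(empty_roads):
    print("Warning: estimates may be long due to unpredictable traffic decreases.")
```

The alarm does not fix the model. It only tells the customer, or an operator, that the historical data may no longer describe current conditions.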
Another approach is to closely monitor trends in outcomes. Once in-bounds ranges for outcome percentages are established, a major shift in outcome predictions can also serve as a warning sign. AI/ML can also be used to analyze the aberrations and determine whether they reflect a valid change or a change that is outside of the data set model. As the most critical time for algorithms and their predictive value may be in times of stress (like when Waze is being used by a first responder), understanding the factors in your business and business processes that might invalidate the AI/ML models is critical.