[Yesterday we discussed how to model monks going in and out of a cafeteria and how it is like a hydrological model, with rainfall going in and streamflow going out. This is based on my experience with a 10-day silent meditation program in Kathmandu, Nepal. Today we talk about how, even though simple models can tell most of the story, problems can still happen.]
So where does it all fall down? Why are there dozens upon dozens of different models out there? And why are they such a challenge to use for real-world forecasting?
Are we capturing all the processes?
Besides queuing and eating there are actually a fair number of other processes going on. When a server's station is empty (e.g., we're out of the brown bean slop), how long is the line stopped while the station gets refilled from the kitchen? What happens to the monks' plates when they are finished: do they have to wash dishes themselves and if so, what kind of delay does that cause? Maybe some monks show up, see too long a line and decide to come back some other time or skip the meal? Occasionally, a wild papadum-eating monkey wanders on the scene. What effect does that have?
Sometimes these things are not an issue (e.g., there is always enough brown bean slop in Kathmandu) and so they do not have to be part of the model. For example, in Australia there is so little snow that nearly all streamflow models in use there do not bother modeling snow. However, you would not be able to get away with that in northern Alaska where snow rules the water cycle.
Maybe a process cannot be ignored, but its effect could be lumped in with some other process. Say it takes on average 15 minutes to eat and 3 minutes to wash dishes. A complex model might include both processes separately. A simpler model might combine them into one "eating and washing dishes" task with an average time of 15+3=18 minutes. Most importantly, if all you can do is watch monks go in and out and never know what is inside, it can be hard to distinguish all the individual internal processes from each other. Similarly, I know how long it takes for food to pass through my body, but I would be stumped to tell you how long it spends in my stomach versus my lower gut.
In hydrology, for example, water leaves bare soil or lake surfaces to the sky through evaporation. Transpiration is the same thing, except it is done by plants. Often these are lumped together under a single notion of "evapo-transpiration" and this is good enough for most purposes.
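To make the lumping idea concrete, here is a minimal sketch, assuming every monk takes exactly the average time for each task (no queuing, no randomness). The function names and arrival times are made up for illustration. It shows that an observer who only sees monks enter and leave cannot tell the separate "eat, then wash" model from the lumped one, or from any other split that adds up to 18 minutes.

```python
def exit_times_two_stage(arrivals, eat_min, wash_min):
    """Detailed model: each monk eats, then washes dishes."""
    return [t + eat_min + wash_min for t in arrivals]


def exit_times_lumped(arrivals, total_min):
    """Lumped model: one combined 'eating and washing dishes' task."""
    return [t + total_min for t in arrivals]


# Hypothetical arrival times, in minutes after the doors open.
arrivals = [0, 2, 5, 9]

detailed = exit_times_two_stage(arrivals, eat_min=15, wash_min=3)
lumped = exit_times_lumped(arrivals, total_min=18)
# A different internal split (assumed, not observed) with the same total.
other_split = exit_times_two_stage(arrivals, eat_min=10, wash_min=8)

print(detailed == lumped)       # does the lumped model match the detailed one?
print(detailed == other_split)  # does a different split look the same from outside?
```

Seen from the door, the three versions give the same exit times, which is why the simpler one is often good enough and why the internal split is hard to pin down.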
Modelers might also argue about the configuration of what is going on inside the cafeteria, e.g. are there many different lines or just one? Are there lines to wait for a seat? There might be special models needed for banquets, family meals, and so on.
Go to war with the army you have
In the case of the Vipassana meditation center, I can say that breakfast was relatively short compared to lunch. One modeler might say that this was so because the monks wanted to finish quickly and race back to bed for some extra sleep (true story!). In that case, the rate of eating would depend on the time of day. Let's call the proponents of this theory "time-of-day-ists" (and its staunchest supporters are a professor from Miskatonic University and his clan of students).
Another school of thought might say that the mess hall was cold in the morning and that monks wanted to race back to bed where it was warm (also a true story!). These "temperature-ists" hail from Medfield College. Occasionally they meet at scientific conferences and get along for the most part, but they both hold strong convictions about how they believe the cafeteria behaves.
One test of who is correct would be to see if the length of lunchtime varied based on how cold it was, relative to other lunchtimes. An extreme case might be that strange day where it was colder during lunch than at breakfast. If the monks showed up to a cold lunch and decided to skip it altogether, the International Journal of Cafeteria Processes might be flooded with articles from the "temperature-ists" politely lambasting the "time-of-day-ists" as misguided empiricist nitwits.
This is nice and all, but even if meal length depends more on temperature than time of day, sometimes time of day is going to have to be good enough because you do not actually have temperature measurements at your cafeteria (but everyone has a clock). As Donald Rumsfeld said, "you go to war with the army you have, not the army you might want or wish to have at a later time." Temperature may be the "real" cause for how meal length varies, but if you do not have temperature data to give to the model, it is a moot point. Instead, you would need to find something that varies like temperature does, to approximate the effect (such as time of day).
In hydrology the parallel would be snow models. The rate at which snow melts partly depends on how warm it is, but really it is a combination of radiation, windspeed, humidity and a whole host of factors. Some modelers recoil in horror at the idea of using a temperature index to figure out how much snow has melted, but the problem is that almost nobody measures all those other variables in the mountains. However, there are lots of measurements of temperature in snowy areas going back dozens of years. So, operational forecasters have to resort to using temperature-based models. Their results might not be as good as they could be, but commonly they do not do too bad a job. It is good enough for government work, as they say.
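For the curious, a temperature-index model can be as simple as the sketch below. This is a generic "degree-day" formulation, not any particular agency's operational model: melt is taken to be proportional to how far the air temperature sits above a threshold. The melt factor, the threshold and the sample temperatures are all illustrative assumptions, not calibrated values.

```python
def degree_day_melt(temps_c, snowpack_mm, melt_factor=3.0, base_temp_c=0.0):
    """Daily snowmelt (mm of water) from a simple temperature-index model.

    temps_c      -- daily mean air temperatures in deg C
    snowpack_mm  -- water stored in the snowpack at the start (mm)
    melt_factor  -- mm of melt per deg C per day (assumed value)
    base_temp_c  -- temperature above which melt starts (assumed value)
    """
    melt = []
    for temp in temps_c:
        # Potential melt grows linearly with degrees above the threshold.
        potential = melt_factor * max(temp - base_temp_c, 0.0)
        # You cannot melt more snow than is actually on the ground.
        actual = min(potential, snowpack_mm)
        snowpack_mm -= actual
        melt.append(actual)
    return melt


# Hypothetical four-day warm spell over a 25 mm snowpack.
daily_melt = degree_day_melt([-2.0, 1.0, 4.0, 6.0], snowpack_mm=25.0)
print(daily_melt)  # [0.0, 3.0, 12.0, 10.0]
```

Radiation, wind and humidity are nowhere to be seen. Their effect is folded into the melt factor, which is exactly what makes some modelers wince and what makes the model usable where only a thermometer exists.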
Let's throw another spanner in the works. Dinner at the Vipassana center was the shortest of all the meals. This was because returning students are only allowed hot lemon water for dinner while new students are allowed a banana and a small bowl of something like toasted savory rice krispies. Similarly, less food is typically served at breakfast (e.g. 3 kinds of gloop) than lunch (e.g. 5 kinds of gloop).
Right, so now the "time-of-day-ists" can return the volley from the "temperature-ists" because they can point to how it is typically just as cold at dinner as it is at lunch, even though the time to eat is often radically different... ("respectfully, who's the nitwit now?")
The reality is a mix of all these factors, both temperature and the kind of food being served at various meals. In a way, all the modelers are a little bit right and a little bit wrong. Time of day is not a direct factor but it is a good proxy for some other factors. Therefore, even models that include "unphysical" processes can still be effective. In the meantime, however, models multiply, dissertations are published and the trench warfare among modelers continues.
Turning the general into the specific: Parameters and calibration
A cafeteria model might include a general description of how a serving queue works. What it does not know straight off is the speed of the queue at a specific cafeteria (e.g. the one for students at Kathmandu's meditation center, as opposed to the cafeteria for staff at the Children's Hospital in Boston). Modelers tend to leave things like that as a "parameter". For example, serving lines serve X customers per minute. For the Kathmandu meditation center X = 5 customers per minute. Sometimes parameters are observable but sometimes they have to be inferred from historical data.
To model the meditation center, a cafeteria-ologist might observe people going in and out for a while and then back-calculate what he thinks the speed of the queue is. Maybe this historical data is not available and the parameter has to be guessed at by some other means or directly measured. In hydrology, things like the size of a watershed are readily measurable from maps. However, some things are not, such as the rate at which water drains from the soils. That parameter would have to be found by trial and error and/or using human expertise.
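As a rough illustration of the trial-and-error approach, the sketch below tries a handful of candidate queue speeds and keeps the one whose simulated serving times best match the observed ones. Everything here is an assumption made for the example: the single-line queue model, the function names, and the "observations", which are synthetic and generated with X = 5 customers per minute so that we know what answer to expect.

```python
def simulate_served_times(arrivals, rate_per_min):
    """Single serving line: each monk is served once the line is free."""
    service_min = 1.0 / rate_per_min
    served = []
    free_at = 0.0  # time at which the server can take the next monk
    for t in arrivals:
        start = max(t, free_at)
        free_at = start + service_min
        served.append(free_at)
    return served


def calibrate_rate(arrivals, observed, candidates):
    """Pick the candidate rate with the smallest sum of squared errors."""
    def sum_squared_error(rate):
        simulated = simulate_served_times(arrivals, rate)
        return sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return min(candidates, key=sum_squared_error)


# Hypothetical arrival times in minutes; a burst, then a straggler.
arrivals = [0.0, 0.1, 0.2, 0.3, 2.0]
# Synthetic "historical data", generated with the true rate of 5 per minute.
observed = simulate_served_times(arrivals, 5.0)

candidates = [float(r) for r in range(1, 11)]  # try 1 to 10 monks per minute
best_rate = calibrate_rate(arrivals, observed, candidates)
print(best_rate)
```

With clean synthetic data the search recovers the rate that generated it. Real observations are noisy and the model is never quite right, so in practice several parameter values can fit about equally well, which is part of why calibration is so hotly debated.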
How you come up with these parameters is a major subject of debate in the hydrological research and operations communities. I won't go into the details now because the issue is technical and complex enough to deserve its own post. However, the heart of the debate goes to how much humans should or should not be involved with the process.
When things aren't what they used to be
This is another complex issue and so I won't go into detail, but in essence, what happens to your model when something about the cafeteria itself changes? What do you do if the server changed and the serving line is now noticeably slower or faster than it was before? (In the case of hydrology, this might be like after a major wildfire has swept through an area and many trees are destroyed).
What do you do if your cafeteria started with one serving line and a second line was added? This doesn't even necessarily have to happen in reality; the modeler might just be asking a "what if?" question. Or what would happen if the number of monks that visit the cafeteria doubled from what had ever been seen before? Would some unprecedented behavior arise, like how there could suddenly be a shortage of tables and monks would have to wait for a seat? (Again, in the case of hydrology, this could be the case of trying to guess how the river will react to climate change or some very heavy rainfall that had never occurred before).
If your model had an accurate description of how all the cafeteria processes behaved, it might give the correct answer when it is pushed outside the envelope of what has happened before. But if the model was just set up to fit to some historical data and all the bits inside were just guesswork, then trouble can happen.
A cafeteria is a fairly simple system and even then we can run into all these troubles when trying to model one. Apply this to a river running through an entire landscape that is complex and varied (and is varying in the future), and you can see how there is the major potential for things to go pear-shaped in a hurry. However, we still need models to give us guidance and they are still the primary tool for forecasting. As the famous saying from statistician George Box goes, "all models are wrong, but some are useful."