Friday, August 26, 2022

An Interview with Norm Crawford: Inventor of the First Computer Model of Rivers

The tops of the curvaceous hills above Stanford University look like someone caught in the middle of shaving. There are occasional patches of forest here and there, but the Open Space Preserves above the Portola Valley are mostly grasslands. Up close, one can stand in a single spot and see tall grass, lonely shrubs, lush ferns and twisted forest. Near the peaks, there are no flowing streams, but plenty of mud and moss in the shady areas. In places, the hiking trails have gouged away at the land, exposing rocks and soil.

[Photo: The Windy Hill Open Space Preserve in the hills above Stanford University]
[Photo: The rocks are exposed in some places]
[Photo: A path through the nearby woods]
[Photo: Close-up of lichen growing on the trees]
[Photo: Steps away from open land, mossy trees cover the pathway]

There are innumerable ways to measure out this landscape. The tallest peak is 1,905 feet high. A cup of soil beneath my feet weighs about two thirds of a pound. This spot is about a 40-minute drive from San Francisco (depending on traffic). With its infinite detail, reading aloud an inventory of the land could conceivably take forever. These exhaustive descriptions would not even include all the aspects that are changing. “Here a fallen tree is half decayed.” “After Wednesday's rain, there are 73 puddles left along the trail.”
These descriptions wouldn’t even be possible, anyway, since most of these details are unknown; some modern geologic maps of the San Francisco Bay Area are humble enough to include question marks in certain locations.
Science, however, is about simplification and summarizing. There are so many details in the landscape, but which details are important? “Importance” naturally depends on purpose. For someone wanting to drive through the area, the roads are what matter most. Descriptions of every inch of pavement are not necessary; knowing the lengths of the roads and how they connect is probably enough for finding one's way.
Similarly, someone wanting to know how a stream flows could use a hydrologic model. Around 50 years ago, in the valley below Windy Peak, Norm Crawford was the first person to use a computer to simulate a river. Much has changed since then. My small laptop in 2010 is about 100 million times as powerful as the 1960s computers; in that era, the printers alone were half the size of an automobile. Now, computer models are a nearly indispensable tool in river forecasting. What Norm developed (the Stanford Watershed Model, named after his alma mater) continues to evolve and is still used around the world. In particular, it forms the core of many water quality models.
Even though Norm was there at the birth of river simulation modeling, he still keeps a hand in the game. His consulting firm, Hydrocomp, makes river modeling software, sets up modeling systems and provides other services to government and private industry. He gives talks at international conferences, offering seasoned wisdom but also ambitious vision.
During a recent talk in Peru he ran his model in real-time “on the cloud” (i.e. through the Internet) rather than on his own computer. Decades ago, he would have had to schedule time with the computer operator to submit a job to run overnight. Norm has said “People underestimate what’s possible not only at the present time but in future time. My philosophy…is to build software systems that anticipate hardware that isn’t here yet.”
Norm and I met in the Netherlands last year to share his ideas on models and reflect on what he started. Earlier that week, we had participated in a workshop of hydrologists interested in testing ways to make river forecasts that are honest about their uncertainty. It was something like a “bake-off” where everyone brought their own technique to the table and applied it on a common dataset provided by the organizers. We were asked to pretend that we were making forecasts in a semi-realistic way (i.e. no peeking at the answers) so the results could be compared. In total, the team from CSIRO (Durga Lal Shrestha and I) churned out roughly ten billion forecasts by harnessing a network of thousands of desktop computers in Australia.
This week will include a series of posts with excerpts of our three-hour interview and some of the discussions that followed. Norm’s story has been chronicled a few times in the scientific literature, such as the History of the Stanford Watershed Model and a chapter from the book Watershed Models, although these focus more on the connections between different researchers and how the original “DNA” of the Stanford Watershed Model can be found in other models used widely in the community. My comments and questions are in blue and Norm is in black.
This interview has several parts, click on a link to jump ahead to that section:
What is a computer model?
What are these models used for?
What did people use before Norm Crawford invented the first river simulation model?
How did he become the inventor of the Stanford Watershed Model?
How does someone build a computer model?
Can models mislead people? Can people mislead models?
What is a computer model?
Norm Crawford: It’s the numerical equivalent of a physical simulator built to represent flying of a 747 [airplane]. If you were trying to be a pilot, you’d go and sit inside a reproduction of the [cockpit and its controls]. You can land and take off, and it recreates the entire environment of flying a plane... If you crash the simulator, nobody dies and you don’t lose a multi-million dollar airplane.
Digital simulation models represent the hydrologic processes that occur in a watershed. Number one is moisture in the soil – when rainfall occurs, this computer model will calculate whether it infiltrates into the soil or [becomes runoff] as the soil becomes more saturated.... Then water, if it does run off, moves into a small tributary and then maybe into a larger tributary and is measured downstream at some stream gauge. And the computer model calculates the hydraulics of how the flow moves through the channels along the river until it gets to the stream gauge.
These models are calibrated [tuned to improve the results] by changing parameters that represent infiltration rates into the soil, rates of evaporation loss from the land surface, and transpiration from vegetation. It’s keeping track of what is referred to as a “water balance”. The model is just tracking a raindrop from the time that it falls and hits the ground, and then it keeps track of what happens to it up to the point where it either evaporates or flows into a stream and out of the watershed that you’re interested in.
Tom Pagano: ...A lot of hydrologists would think of soil as being like a sponge...When it's dry and you put water in it, the water just soaks right into the sponge, it doesn’t leave. But once the sponge is full and saturated, any more water you put in will erupt over the top or drain out the bottom.

These models are a set of equations that describe how the water would flow through the sponge. When you talk about parameters- that’s how you relate those general equations to your specific catchment, like how big is your sponge, what's the texture of your sponge?

Norm Crawford: Right. Is it an open sponge with large holes in it or a dense sponge that doesn’t absorb much water?
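[For readers who like to see the sponge in code, below is a minimal sketch of a one-bucket water balance. It is not the Stanford Watershed Model; the parameter names and numbers (capacity_mm, evap_rate, drain_rate) are illustrative assumptions only, chosen to show how a continuous model tracks rain into storage, runoff, and evaporation.]

```python
def run_bucket_model(rainfall_mm, capacity_mm=100.0, evap_rate=0.05, drain_rate=0.10):
    """Track a simple water balance through time.

    rainfall_mm : rainfall total per time step (mm)
    Returns parallel lists of simulated runoff and soil-moisture storage.
    """
    storage = 0.0                                    # water currently held in the "sponge"
    runoff, moisture = [], []
    for rain in rainfall_mm:
        storage += rain                              # rain tries to soak in first
        surface = max(0.0, storage - capacity_mm)    # overflow spills over the top
        storage -= surface
        base = drain_rate * storage                  # slow drainage out the bottom (base flow)
        evap = evap_rate * storage                   # loss to evaporation and transpiration
        storage -= base + evap
        runoff.append(surface + base)
        moisture.append(storage)
    return runoff, moisture

# A dry spell, then a big storm, then another storm on an already-wet catchment.
flow, soil = run_bucket_model([0, 0, 5, 80, 0, 60, 10, 0])
print([round(q, 1) for q in flow])
```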
Tom Pagano: Who would use these models? What would they use them for? You mentioned a flight simulator [for training pilots]; was it a teaching tool?

Norm Crawford: Initially, we did use it quite heavily as a teaching tool. It was a very good way for students in hydrology to get a feel for the way a watershed would behave. And we actually built a classroom, probably the first in the country, maybe the first in the world, where we had computer terminals that would link to the Stanford University mainframe. Students could assemble some data for rainfall, evaporation on a watershed, and then change the character of the watershed surface, like change the infiltration rate, to get an immediate response back as to how the watershed was dealing with the rainfall.
[Nowadays, hydrology models are used very widely, inside and outside universities. As mentioned above, they are used for teaching, but they are also used for academic research. They are also used by consultants, engineers and a host of other professionals.]

What did people use before Norm Crawford invented the first river simulation model?

Tom Pagano: Were there river forecasts before computers?

Norm Crawford: Oh yeah, sure. The standard methodology was called “coaxial correlation”. It was a big piece of paper with a series of lines on it, and you would enter an amount of rainfall, an index called the API [Antecedent Precipitation Index], and a drainage area, and it would kick out a number [for the flow of the river].
[An API is like an index of how dry the catchment is – how much water is in the soils. For a discussion of something similar to coaxial correlation, read about Manila hydrologists using such charts to forecast runoff from a typhoon.]
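[A common form of the Antecedent Precipitation Index decays yesterday's value by a constant and adds today's rain, so older rain counts for less and less. The short sketch below illustrates that idea; the decay constant k = 0.9 is an illustrative assumption, not a value from any particular forecasting office.]

```python
def antecedent_precipitation_index(daily_rain_mm, k=0.9, initial=0.0):
    """Running index of recent catchment wetness: older rain counts for less each day."""
    api = initial
    series = []
    for rain in daily_rain_mm:
        api = k * api + rain
        series.append(api)
    return series

# After a wet spell the index is high; it decays through the dry days that follow.
print(antecedent_precipitation_index([10, 20, 0, 0, 0, 5, 0]))
```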
Norm Crawford: The professor I had [Ray Linsley] actually developed [the API] for the Weather Bureau and for the stream forecasting service. And he had spent some years doing flood forecasting. He was also the head of civil engineering at Stanford at the time, and published [the books “Applied Hydrology”, “Elements of Hydraulic Engineering”, “Water Resources Engineering”, and “Hydrology for Engineers”, the last two of which are still in print].

Tom Pagano: So [the forecasters had look-up tables as a way of converting rainfall to runoff]. Someone would phone up with the rainfall amount for certain areas... or was there real-time [automatically transmitted] data?

Norm Crawford: There was real-time data collection in a way. I remember Professor Linsley telling us about [river forecasting] in class one day, and he said this activity is done 24 hours a day, in the middle of the night, and on weekends. If heavy rainfall is occurring, flood forecasters are on duty trying to figure it out. And one guy in class said “why are you telling us about that?”, [as in:] “we’re not going to work in the middle of the night, you’re crazy!”. It was a typical student wisecrack.
But as for real-time data collection: they did have devices on some rivers based on the telephone system, and you could call up and get a beep response from a sensor. You could translate that [like Morse Code] into a stage [river depth] that was being measured at that location.
One story that Ray [Linsley] told was about one of those remote gauges… It was the only way that they could get that kind of information in [to the office] in time to be useful. Now, on a really large river, you could depend on somebody observing the stage [river depth] and calling you, but they did have these other devices where no human needed to do anything except call the right number. And they had a gauge that was giving frequent false alarms. The little beep system wasn’t working very well.
And so this one day they called this gauge and it reported that the stage was five feet higher than normal. And in the flood forecasting office, they thought “well, that darn gauge is misbehaving again” and of course about three hours later the flood peak arrives and they realized the gauge was telling the truth.

How did he become the inventor of the Stanford Watershed Model?

[I asked about Norm's early influences and what shaped his thinking at the time. How did he get interested in science and computers? What was the context for his invention?]
Norm Crawford: I was born in the mid-1930s, and I grew up in Western Canada on the family farm and then in a small town of 1500 people. My family was basically farmers, although one of my grandfathers was an early graduate from the University of Ontario, and he was a minister.
He moved to Alberta in 1908. Prior to radio or modern communication one of the recreational activities was for people to go to a community hall to hear someone lecture. My grandfather was one of the few educated men in the countryside, and (although he was a minister) he would lecture on astronomy and geology and politics. He was a college graduate and was an educated fellow. He had that kind of knowledge that other people did not have. And moreover, he was a very entertaining speaker and very outgoing.
There was a defence line across Northern Canada to protect against Russian bombers flying over the pole with atomic bombs. There were these air bases with radars built in the middle of nowhere in Northern Alberta and the Northwest Territories. They were supposed to detect planes coming in, and they had fighters that would go up and engage them. One of the air bases that I visited with a number of other engineering students [from Alberta] had a very early and relatively small digital computer. These guys sitting up in the winter in that climate wouldn’t necessarily have a whole lot to do so they programmed that computer as an Artificial Intelligence [AI] machine. You could type into a typewriter and ask it questions like “how are you today?”, and the computer would answer, “I’m fine, how are you?”
The early AI code could play checkers and talk to you in its fashion. One of my classmates was a very clever guy and he managed to defeat the computer at this one game, and the computer said “Wow, I can’t beat you!”
Tom Pagano: And since it hadn’t been programmed to say that, it was truly remarkable! [laughter]
Norm Crawford: So that opened my eyes to the possibilities that these machines had. I then went to Stanford University, a pioneer among universities in computers. This was an era [where some saw a limited long-range demand for computers]. Perhaps a few defense labs and maybe a bank or two, and that was all the computers that the US would ever need. Stanford had one of those machines and I took a course in [computer] programming.
[The professor remarked that] revolutionary events happened in the way humans operate because of changes in speed. The impact of steam locomotives was such that you could build a railroad that would run fifteen times faster than a horse and carriage. That caused a revolution in a way that goods and services could be delivered around the country. He had a couple of other examples of the same kind.
He said already the few computers [that existed in 1958] operated 100 to 1,000 times faster than one could operate the huge mechanical hand calculators (that could add, and subtract, and multiply with some difficulty). The concept of stored memory also meant you didn’t have to punch the keys to add two and two, but you could put in an instruction to do the addition.
His statement [about changes in speed causing revolutions meant] changes in the way engineering science was done. Methods in engineering and science were tailored to the calculation speed that existed. You could think of doing some calculation that would take ten years of typing, but you couldn’t actually do it.
His feeling was that new methods had to be developed to take advantage of this orders-of-magnitude change in the speed of computation, and that the old methods were simply obsolete. I thought, “wow, it's true that things have to be entirely different. You couldn’t just do the old things in your old way”. It was trivial to take an engineering method designed for pencil and paper and put it on a digital computer, because it would take a digital computer no time to do it. However, you wouldn’t get a better result than doing it on pencil and paper [just a quicker one], so why do it that way? I thought that was a very neat idea....
[Norm later recounted how he chose his graduate research topic. In his first year at Stanford University, he started working on an existing project that was trying to estimate the size of floods on small basins. The project was not very successful: the methods were not correct, the results were not very good, and the funding was ending.]

Norm Crawford: So [Advisor] Ray [Linsley] told me, “to heck with that stuff, just forget about it… why don’t you move to electrical engineering? They’ve got something over there called a digital computer.” Well, I jumped on that because I was interested in digital computers. I went over and found a small room in electrical engineering where they had an IBM 650 machine, one of the 50 that existed in the country.
So out of that I started developing a methodology for representing the rainfall and runoff in river basins. I chose to do this in a way that [took advantage of not having to do the calculations by hand]. Repetitive computations were unlimited, as if you had this assistant who would calculate anything for you, and it didn’t matter how long it would take him because you could hardly think of anything that would take him terribly long. Running programs would take 10 to 20 minutes, and we had to sign up to use this machine. You’d go in and you’d actually operate the digital computer yourself, which a few years later was not allowed. The big deal about it was that the computer at the time cost $200 an hour in 1950 dollars, so that was a lot of money [and it was paid for by the National Science Foundation].
So I developed, as a PhD thesis, the first continuous digital computer model called the Stanford Watershed Model. It was the first of its kind and nobody really knew what to make of it. When I published my dissertation, it did receive some attention (that was in 1962). In 1966, I published an update to the model that was very widely distributed around the world. Some 10,000 copies of this technical report were published and distributed by the University. That became the foundation for continuous digital computer models.
[As a side note, the famous technical report describing the model calls it the “Stanford Watershed Model IV.” Model I was a daily-timestep model that apparently did not work very well. Model II was Norm's dissertation (which had an hourly-timestep). Model III was an unpublished incremental change in the model.]
Initially, people didn’t really quite know what it was. One person at a technical meeting around that era told me that it would set the science of hydrology back by 50 years… and he was serious, too! I can understand why he felt that way.
Tom Pagano: So we’re probably just recovering now?
Norm Crawford: Well, maybe never have recovered.
How does someone build a computer model?

Tom Pagano: You could say these models were a way of learning about reality [i.e. for use in the classroom and in doing research]. But at the same time, you had to put someone's idea of reality into the model. How did you figure out what should go in the model?
Norm Crawford: Well, there were two elements to it. One is what might be called the model structure... the physical processes that you choose to include in the model [the other element to model building is algorithms, which is discussed elsewhere]. A feature of the Stanford model was that it was comprehensive, including both surface and sub-surface flows. Previously, the common way of doing things was, if you were dealing with floods, to just worry about the surface runoff or immediate runoff.
But the Stanford model took the approach of representing all of the processes. And also, the common way of doing calculation for a flood, for example, was you would just calculate during the flood. After the rain stopped and evaporation started to occur, you just forget about it, and wait until the next flood. But the Stanford model operated continuously, and that was unique.
Tom Pagano: So in the metaphor of the sponge, surface runoff would be water flowing over the top of the sponge that couldn’t get in. Base flow or subsurface flow is what’s draining out the bottom. During floods there’s lots of water coming out the top, and you don’t really care what’s coming out the bottom?
Norm Crawford: Yeah. Base flow is usually relatively small, at least in a major flood. Hydrology is full of exceptions, so what I just said is not always true.
Tom Pagano: Right. So how do you make a model if hydrology is full of exceptions?
Norm Crawford: Well, you represent the physical process as best you can. If you cover conditions that represent 98 or 99 percent of all the watersheds in the world, then that’s good enough for work, even if there are 1 or 2 percent that don’t work that way.
A major example is that for most watersheds (like 99 percent) there’s a strong relationship between moisture in the soil, and the infiltration rate (the rate at which water will move from the surface into the soil profile). Dry soil takes much more water than a saturated soil profile. And so the rate of movement of water from the land surface into the soil can change by two to three orders of magnitude from dry to wet....When the soil profile becomes saturated, you’ll get lateral flow that moves water... down a slope gradient [i.e. out of the soil and up on to the surface as it goes downhill].
[To put it another way, there are two effects going on here. Soil that is dry (“empty”) has more room to store new water. When the soil is wet, there is not as much room for additional water. Also, new water goes into dry soils more quickly than it goes into wet soils. Think of it like someone eating until they're full. Someone with an empty stomach can eat a bigger meal than someone with a half-full stomach. Also, as the person gets fuller, the rate at which they can eat gets slower and slower. If the person can’t eat fast enough, food spills all over.]
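[To make the orders-of-magnitude point concrete, here is a small sketch of an infiltration-capacity curve that falls off as the soil wets up. The log-linear form and the numbers (dry_rate_mm_hr, wet_rate_mm_hr) are illustrative assumptions only, not the Stanford model's actual infiltration function.]

```python
import math

def infiltration_capacity(relative_wetness, dry_rate_mm_hr=100.0, wet_rate_mm_hr=0.5):
    """Infiltration capacity as the soil goes from dry (0.0) to saturated (1.0).

    Interpolates log-linearly so the rate drops smoothly across orders of magnitude.
    """
    log_rate = (math.log(dry_rate_mm_hr)
                + relative_wetness * (math.log(wet_rate_mm_hr) - math.log(dry_rate_mm_hr)))
    return math.exp(log_rate)

for wetness in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"wetness {wetness:.2f}: {infiltration_capacity(wetness):6.1f} mm/hr")
```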
Tom Pagano: So what happens to that other 1 percent [where the standard model of soil moisture and runoff doesn't apply]?
Norm Crawford: You say “too bad”! There are two conditions where this happens.
When you get into permafrost (areas of continuously frozen ground), the relationship between moisture in the soil and runoff just isn’t there… It doesn’t work that way. The runoff will be more associated with temperatures. You might get a warm rain while the top two or three inches of the ground are still frozen, and you get a lot of runoff, like it’s flowing off of concrete.
The other [exception] is that there are some soils that become hydrophobic [water repellent] when they become extremely dry. That happens in desert areas. The water will bead up like water on the windshield of your car, and it will form dusty drops, and those drops just roll across the surface, and the soil profile doesn’t pick it up – it effectively blocks infiltration in the extreme; the soil is totally hydrophobic. That second phenomenon is usually somewhat intermittent. And then the whole thing will switch: the soil profile’s hydrophobicity goes away and all of a sudden it switches to acting in a more normal fashion....
Tom Pagano: You developed this model, based on some idea of how the world works. Did you ever find out that you were wrong, or that some part of the model doesn’t work? Do you ever get rid of parts of a model?
Norm Crawford: Yes. There’s almost a continuous process of trying to improve the model. In fact, we made significant changes in parts of the model a couple of months ago when we were doing some things with snow melt and snow heat exchange. [We did a process of improvement of the soil moisture model between 1960-1966] when I first started building that model, but then not many changes were made in the detail of how that’s done. After [1966] there had been considerable change [in other parts of the model], but that original core of the way the soil moisture and infiltration worked has not changed because I couldn’t find any way that it would work better.
Can models mislead people? Can people mislead models? 
[River forecasters often refer to model outputs as “guidance”. The phrase implies that the models can inform the forecast process, but shouldn’t have the final say on the product going to the customer. However, there are varying opinions about how much model outputs should change a forecaster’s mind, or how much a forecaster should use his mind to tinker with the model so it gives a better answer.]
Tom Pagano: What are your feelings about the relationship between people and models in that sense?
Norm Crawford: Well, you don’t change [model] parameters in the middle of a flood. Model design is such that the parameters are supposed to be constant, and they’re developed by continuous simulation over a long period of time including a number of [historical] floods of different characters and so on.
The major thing, at the onset of a major flood, is that people are pretty shocked by what’s going on. And people will often be sleeping on the floor of their offices, making decisions that are unlike anything they’ve done before.
On the Tuolumne River [in California] in 1997, right around Christmastime, there had been quite a lot of heavy rain. A series of storms, known as Pineapple Express storms, was coming in off the Pacific into California.
The man who was responsible for Don Pedro Dam (which is a 600 foot high earth dam on the fourth largest lake in California) was busy checking with the forecast office. He was told that there would be a break between the storms, and he figured he could go home and not worry about it for a couple of days. This turned out not to be true, and they started to get very heavy rain. And they were getting rain at high elevation as opposed to snow, which makes a great difference. And this man was busy running models and he was seeing risk [of failure] to the dam; the dam at that point had been built about 30 years prior and had a large spillway. The spillway had never been used.
Sometime in that three-day period, he’s talking with the head of the agency (Tuolumne Irrigation District owns the dam), he’s talking to his boss. His boss was asking “are you sure about these numbers?”, “are you sure this has happened?”. [The dam manager] was saying “that’s what’s going on, we’re getting these huge flows coming out of the upper river!” And so they opened the spillway for the first time ever, and it wiped out a [well used] road immediately below [the dam]. The amount of water coming out tore out a 150 foot wide, 20 foot deep channel immediately. It took all that soil, carried it away, and caused flooding downstream in the city.
Events like that become legendary [and stick in the minds of water managers]… the first time that spillway was used, and [the operator] sleeping on the floor of his office for three nights. For people who make that kind of decision, it becomes a combination of “I hope I never have to go through that again in my life” (and it’s pretty likely that they won’t have to) and “it was also a very exciting time to go through”.
Tom Pagano: Maybe even there’s an addiction to the drama –
Norm Crawford: Yeah, it might be addictive, but it’s not something that you can readily form an addiction to.
Tom Pagano: …We were talking about guidance and models and whether humans should trust them or try and change the model results…You’ve got an idea of what you think the model result should be, and maybe you adjust the model to make it match how you think it should behave. But then the rest of the result you get is [given to you by the model. For example], say you know what the peak flow should be, but you don’t know how quickly the river is going to drop after the peak… You let the model articulate all those other things that you don’t have time to figure out?
Norm Crawford: Well, I’ll tell you another instance. We have a model on a river in Southern California, with a fairly large dam upstream. A large flood developed – again, something bigger than anybody had seen in their lifetime. Based on model results, the downstream town’s Mayor and police department were told that they should expect the stream to go out onto its floodplain, that there are a number of people living there, and that they should expect the peak to arrive in about eight hours. The Mayor was a fisherman, and he knew from fishing in this river that if they released flow from this dam it would take 24 hours to get to the city.
So he told the police department “Relax, I know what the river’s doing, don’t worry about evacuating anybody”. What of course he didn’t know is that a high flow moves much faster than a low flow. The flood arrived in eight hours. It was pretty dicey. They had to go in and get people under emergency conditions just because the mayor didn’t understand channel hydraulics.
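[The mayor's mistake has a simple physical basis: a flood wave in an open channel travels faster when the river is deeper. The sketch below uses Manning's equation for velocity and the kinematic-wave rule of thumb that the wave moves at roughly 5/3 of the water velocity; the reach length, slope, roughness, and depths are invented for illustration and are not taken from the river in this story.]

```python
def travel_time_hours(depth_m, reach_km=100.0, slope=0.001, manning_n=0.035):
    """Rough flood-wave travel time down a wide channel (illustrative numbers only)."""
    velocity = (1.0 / manning_n) * depth_m ** (2.0 / 3.0) * slope ** 0.5  # m/s, Manning's equation
    celerity = (5.0 / 3.0) * velocity  # kinematic flood-wave speed
    return (reach_km * 1000.0) / celerity / 3600.0

print(f"low flow   (0.5 m deep): about {travel_time_hours(0.5):.0f} hours")
print(f"flood flow (3.0 m deep): about {travel_time_hours(3.0):.0f} hours")
```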
Should users be able to see the raw output of models? Or only know what the forecasters tell them? 
[In the age of powerful computers and the Internet, it is becoming much easier for users to access model outputs and make their own interpretations of the results. An analogy for this would be like patients doing their own research on the internet about their medical symptoms and what treatments should work. Independently of a doctor, a patient may self-diagnose that they have, for example, the flu… or maybe something more serious and exotic like pneumonia. They might be right, they might be wrong.]
Tom Pagano: What do you think about giving users model output? 
Norm Crawford: My inclination is to make the information available and just trust the public to read the caveats about it. That’s a better policy generally than trying to restrict the information. There have always been [people interested in weather] in the United States. NOAA [the parent of the National Weather Service] takes advantage of an individual’s interest in those activities and will provide a rain gauge to someone [so they can measure rainfall at their house]. If [NOAA wants a] gauge in a certain area, they’ll go and find someone who would be the observer. Some of these observers traditionally were not paid. They just filled in a form every month with rainfall amounts and temperature [at their home] and mailed it in.
There are citizen networks of automatic gauges [such as Weather Underground] that report in – there are thousands of these stations within the U.S. where people are measuring temperature and precipitation and sometimes wind and other things. It isn’t much of a stretch for these people to acquire some software that would represent hydrology and set up their own personal forecasting system for flow in a little creek that goes by their house. That latter part – I’m not sure how much that is done, but it’s perfectly technically feasible to do.