ML-driven products have a default workflow that nobody talks about because nobody decided on it. It just happens.
You have a dataset. You train a model. The model produces something: a bounding box, a confidence score, a class label. Someone builds a UI that shows that output. The product ships.
What got decided there? Nothing, really. By default, the interface became a mirror of the pipeline's internal structure. The model outputs a float between 0 and 1, so the UI shows a percentage. The model outputs bounding boxes, so the UI draws bounding boxes. No one sat down and asked whether a percentage is how a human actually thinks about risk, or whether a bounding box is the right abstraction for a decision that has to be made in 30 seconds.
I've started working in the reverse order. Not because interfaces are easy to change (they aren't, especially once users have built workflows around them), but because the interface is the only place where you can ask the right question before it's too late: what does the user actually need to do?
Here's a concrete version. Suppose you're building a geospatial alert system. The ML-first approach begins with the model: we have a detector that produces bounding boxes and confidence scores, so we'll show a ranked list of detections.
The interface-first approach begins with the task: a field analyst has to decide whether to dispatch a team, within 30 seconds. What does that decision require?
Those are not the same question. The first produces a UI that mirrors the model. The second produces a UI that represents the decision, and then asks what the model needs to produce to support it. Maybe not a raw confidence float, but a change-delta against the previous revisit. Maybe not a list of bounding boxes, but a risk tier with an explanation attached.
The output contract is different. The pipeline you'd build is different. And you would never discover that if you started with the model.
"Design should be user-centered, not technology-centered" is not a new idea. But it's not quite what I'm arguing.
This argument is specific to the medium. It rests on three properties of ML systems that traditional software doesn't have.
The output is probabilistic. Your interface has to work for the 94% case (true positives and true negatives) and the 6% case (false positives and false negatives) at the same time. Most interface design happens in the success case. Nobody mocks up what the UI looks like when the model is confidently wrong — and in production it will be, often, in ways you didn't anticipate while building it.
The output distribution shifts. The training distribution is not the production distribution. You designed the interface around clean, high-resolution imagery. Now you're in a region with persistent cloud cover, a different sensor, a season the training data never covered. The interface keeps displaying model outputs with the same visual weight and the same confidence framing. The model has degraded. The interface doesn't know.
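Giving the interface that awareness doesn't require much. As a crude sketch, assuming you log a training-time baseline of the model's confidence scores, a z-test on the recent mean can flag when production has drifted far enough that the UI should soften its confidence framing. (A real system would use a proper drift statistic such as PSI or a KS test, and would monitor input features, not just scores.)

```python
import statistics

def confidence_drifted(recent_scores: list[float],
                       baseline_mean: float,
                       baseline_stdev: float,
                       z_threshold: float = 2.0) -> bool:
    """Return True when recent confidence scores have drifted away
    from the training-time baseline. Crude z-test on the mean;
    the threshold of 2.0 standard errors is an arbitrary default."""
    if not recent_scores:
        return False
    mean = statistics.mean(recent_scores)
    stderr = baseline_stdev / (len(recent_scores) ** 0.5)
    return abs(mean - baseline_mean) / stderr > z_threshold
```

The interface can then switch framing — e.g. drop the percentage display and show an "output quality degraded" banner — instead of presenting degraded outputs with undiminished visual confidence.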
The error mode is invisible. When traditional software fails, the failure is usually visible: a crash, an error message, a blank screen. When an ML model is wrong, it produces output that is confident-looking, well-formatted, and completely incorrect. Designing for that failure mode means asking what it looks like before you build the pipeline, not after.
What makes this failure hard to prevent — and what makes it ML-specific — is that the people who own the model and the people who own the interface are usually different people, working on different timelines, with different vocabularies. The model team ships an output contract. The frontend team builds on it. Nobody sits in between asking whether it was the right contract.
Mock up two or three interface variants before committing to an output format. Don't build a pipeline to produce a particular output before you know which output a user can actually use. Either way, you're going to change your mind. It's cheaper to change the Figma file than the training run.
This argument is strongest before production. Once the interface is deployed and users have built real workflows around it, the interface becomes the expensive thing to change. By then, the pipeline is probably the more flexible component.
And the approach breaks down entirely if you design the interface with no idea what the model can actually do. The line to walk is being informed by the data without being dictated by it: bring domain knowledge into the interface design, know what's feasible, and design something you can actually build.
The point is that these decisions get made deliberately, before the pipeline makes them for you.
The model team ships a model that produces outputs. The frontend team builds to them. Somewhere downstream, a user makes a decision based on a float between 0 and 1 that nobody ever asked whether a human should act on.
It was what made sense for the model to produce. That's different.