The Art of Planning, DeepSeek and Politics

A while back I discussed in a post some of the nuances of data-driven decision making when we need to answer a straightforward question at a point in time, e.g.: “shall we do this now?”.

That was a case presented as if our decision “now” would have no relationship to the future, meaning that it would have no impact on our future decisions on the same problem.

Often, of course, we do know that our decisions have an impact on the future, but the issue here is slightly different: we are looking not only at the impact on the future, but at the impact on future decisions.

This is the problem of sequential decisioning. Instead of “shall we do this now?” it answers the question:

“Shall we adopt this strategy/policy over time?”

In other words it tries to solve a dynamic problem, so that the sequence of decisions made could not have been bettered no matter what happened. When you have an optimal solution to this problem, whatever decision is made, at any point in time, is a decision that one would never regret.

I will answer three questions here:

  • What is an example of such a problem?
  • Is there a way to solve such problems?
  • What is the relationship with DeepSeek and politics?

Sequential decisioning problem example – Customer Engagement Management

A typical example is that of working on marketing incentives for a customer base: the problem of deriving a policy for optimal marketing incentives depending on some measures of engagement across that customer base.

Why is it sequential? Because we have a feedback loop between what we plan to do, the marketing incentive, and the customer behaviour that triggers those incentives: engagement.

Whenever we have such loops we cannot solve the problem at a single point in time, regardless of the future.

The solution could look something like: “Invest $X in customer rewards whenever the following engagement KPIs – email open rate, web visits, product interaction, etc. – drop below certain levels.”
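Such a threshold policy is easy to write down once the KPIs and levels are chosen. Here is a minimal sketch in Python, where the KPI names, thresholds and dollar amounts are entirely made up for illustration:

```python
# A made-up threshold policy: spend on rewards when engagement KPIs dip.
THRESHOLDS = {"email_open_rate": 0.20, "weekly_web_visits": 2, "product_interactions": 5}

def incentive_budget(kpis: dict) -> float:
    """Return the $ to invest this period, given current engagement KPIs."""
    lagging = [name for name, level in THRESHOLDS.items() if kpis.get(name, 0) < level]
    return 50.0 * len(lagging)   # e.g. $50 of rewards per lagging KPI (illustrative)

print(incentive_budget({"email_open_rate": 0.15, "weekly_web_visits": 3, "product_interactions": 1}))
```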

It is important to note that we need something to aim for, e.g. maximise return on marketing incentives.

Does a solution to such problems exist?

The good news is that yes, solutions exist, but we have only been able to deal with cases that are amenable to some mathematical/statistical modelling.

We can 100% find a solution if we have a good model that answers the following:

  • How does a given marketing incentive influence customer engagement on average?
  • Given a certain level of customer engagement, what is the expected value of the relationship with the customer?
  • How does the engagement of our customers evolve over time, in the absence of any incentive?

So we do need some KPIs that give us a good sense of what level of customer engagement we have, and we need a sense of what the expected value of the relationship with the customer is, given a particular set of KPIs. To be clear, what we need is something true on average, and at least over a certain period of time.

For example, we should be able to say that, on average, a customer using our products daily for the past month will deliver X value over 2 years.

We also need to be able to say that a given marketing incentive, on average, increases customers’ daily engagement by X% and costs some defined amount of $.

We also need to be able to say something like: our customer engagement rate tends to decrease by X% every few months, again on average.
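To make these ingredients concrete, here is a toy sketch in Python of what such a model could look like. All the numbers (decay rate, incentive lift, value per unit of engagement) are invented for illustration; only the structure matters.

```python
# A toy model of customer engagement dynamics (illustrative numbers only).

DECAY = 0.05               # engagement drops ~5% per month with no incentive
LIFT_PER_DOLLAR = 0.002    # each $ of incentive lifts engagement by 0.2% on average
VALUE_PER_POINT = 40.0     # expected 2-year value per point of engagement

def next_engagement(engagement: float, incentive_dollars: float) -> float:
    """Average engagement next month, given this month's engagement and spend."""
    decayed = engagement * (1 - DECAY)
    lifted = decayed + LIFT_PER_DOLLAR * incentive_dollars
    return min(lifted, 1.0)   # engagement is a score between 0 and 1

def expected_value(engagement: float) -> float:
    """Expected value of the customer relationship at a given engagement level."""
    return VALUE_PER_POINT * engagement

# Example: a fairly engaged customer, with a $50 incentive this month
e = 0.6
print(next_engagement(e, 50.0), expected_value(e))
```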

The above modelling of engagement and customer value is not something that most businesses would find difficult today. Engagement rates, or attrition rates over time, are easy to get, as are the results of marketing campaigns on engagement rates. Harder to get is a reliable estimate of lifetime value given current engagement metrics, but we can solve for a shorter time horizon in such cases.

Richard Bellman is the mathematician credited with having solved such problems in the most general way.

His equation, the Bellman equation, is all you need after you have a mathematical model of your problem.

The equation, in the form I’d like to present it, is the one below, where V is the value of the optimal policy π*:
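In its standard form, writing s for the state (e.g. current customer engagement), a for the action (e.g. the $ spent on incentives), r for the instant reward and γ for a discount factor weighing future rewards, it reads:

```latex
V^{\pi^*}(s) \;=\; \max_{a} \Big\{ \, r(s, a) \;+\; \gamma \, \mathbb{E}\big[\, V^{\pi^*}(s') \mid s, a \,\big] \Big\}
```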

It says the following:

The optimal strategy, depending on parameters a (e.g. how much $ is spent on incentives), is the one that maximizes the instant rewards now, as well as the expected future rewards (future rewards conditional on the policy), no matter what the variables of the problem are at that point in time (e.g. customer engagement and marketing budget left).

It is a beautiful and intuitive mathematical statement which is familiar to most humans:

The best course of action is that which balances instant gratification with delayed gratification.

We all work this way, and this is arguably what makes us human.

To be clear, this problem can be, and is, solved in a large variety of sequential decisioning settings.

The Bellman equation is the basic tool for all planning optimizations in supply management, portfolio management, power grid management and more…

Politics?

Well, in some sense politics is the art of collective, or at least representative, policy making. And here a problem obviously arises: how to balance instant versus delayed gratification when the views of the collective might differ? What if the collective doesn’t even agree on the various aspects of the problem at hand (e.g. the KPIs, the expected rewards, etc.)?

A key aspect of this should be the following: LEARNING.

The illustration below should serve as guidance:

There has to be a feedback loop: the Bellman equation gives conditions for optimality, but often the solution can only be found iteratively. Furthermore, we also need to be open-minded about revising the overall model of the problem if we gather evidence that the conditions for optimality do not seem to come about.
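To make the “found iteratively” point concrete, below is a minimal value-iteration sketch in Python for a tiny, made-up version of the engagement problem (a handful of discrete engagement levels and incentive amounts, with illustrative numbers throughout). It repeatedly applies the Bellman update until the value estimates stop changing, and then reads off the implied policy.

```python
import numpy as np

# Tiny, made-up engagement problem: 5 engagement levels, 3 incentive levels.
states = np.linspace(0.2, 1.0, 5)        # engagement scores
actions = np.array([0.0, 50.0, 100.0])   # $ spent on incentives
gamma = 0.9                               # discount factor

def reward(s, a):
    # value generated at engagement s, minus incentive cost (illustrative)
    return 40.0 * s - 0.2 * a

def next_state_index(s_idx, a):
    # deterministic toy dynamics: engagement decays, incentives push it up
    s_next = states[s_idx] * 0.95 + 0.002 * a
    return int(np.abs(states - s_next).argmin())

V = np.zeros(len(states))
for _ in range(500):                      # repeated Bellman updates
    V_new = np.array([
        max(reward(s, a) + gamma * V[next_state_index(i, a)] for a in actions)
        for i, s in enumerate(states)
    ])
    if np.max(np.abs(V_new - V)) < 1e-6:  # stop when values have converged
        break
    V = V_new

# The implied policy: best incentive for each engagement level
policy = [actions[np.argmax([reward(s, a) + gamma * V[next_state_index(i, a)]
                             for a in actions])]
          for i, s in enumerate(states)]
print(policy)
```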

So we have multiple challenges, which are true for politics but also for business management. We need:

  • Consensus on how to frame the problem
  • A shared vision on the balance between instant versus delayed rewards (or my reward versus your reward)
  • A framework for adapting as we proceed

The above is what I would call being “data driven” – basically “reality driven” – but it is hard to get there. We all obviously use data when we have it, but the real challenge is to operate in such a way that the data will be there when needed.

How about DeepSeek?

If you have followed me thus far, you might have understood that solving a sequential problem basically involves algorithms that learn the dynamics between a policy maker’s action (e.g. marketing $ spent) and the long-term feedback of the policy. This is the principle behind “reinforcement learning” in the AI world, as one wants the learning to be reinforced by incentives for moving in the right direction. This was considered a very promising framework for AI in the early days, but it is an approach that requires time, a fair bit of imagination as well as data, and it was not widely pursued in recent years… until DeepSeek stormed the AI world.
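As a rough sketch of the general principle (not of what DeepSeek actually does), here is the classic tabular Q-learning update in Python: the estimate of an action’s long-term value is nudged toward “instant reward plus discounted future reward”, which is exactly the Bellman balance from earlier. The states, actions and numbers are all made up.

```python
import random

# Tabular Q-learning: a minimal illustration of "learning reinforced by rewards".
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate
Q = {}                                    # Q[(state, action)] -> estimated long-term reward

def choose_action(state, actions):
    if random.random() < epsilon:                                  # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))      # otherwise exploit

def update(state, action, reward, next_state, actions):
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    # Nudge the estimate toward "instant reward + discounted future reward"
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Example usage with made-up states and actions:
actions = ["no_incentive", "small_reward", "big_reward"]
a = choose_action("low_engagement", actions)
update("low_engagement", a, reward=1.0, next_state="medium_engagement", actions=actions)
print(Q)
```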

A key element of DeepSeek’s successful R1 generative AI model was that it leveraged reinforcement learning, which forced it to develop advanced reasoning at a lower cost.

This was achieved through a very clever redesign of the transformer architecture, but I won’t go through it (as I have not yet understood it well myself).

As usual let me point you to a great book on the above topics.

My favorite is “Economic Dynamics” by John Stachurski: a great resource on sequential problem solving under uncertainty. Pretty theoretical, but that’s inevitable here.

Basics of Mathematical Modelling – Shapes of Uncertainty

Most of us, especially in business, but also in our private lives, perform some basic mathematical modelling all the time. We can do it wrong or we can do it right, but we do it… constantly.

The most basic form of mathematical/statistical modelling is to give shapes to our uncertainty (ignoring the shape is also part of it).

Let me tell you a story that should clarify what I mean.

John and his team at SalesX, an online marketplace, are trying to outsource their call centre operations and are looking for potential suppliers. They sent out an RFP (request for proposals) and they received a few responses. (note: despite the name SalesX is not owned by Elon Musk).

John calls his team in his office:

John: “Hi, everyone, could you give me a clear high level summary of the various responses we received from the call centers?”

Jane: “Yes, we shortlisted 3 responses for you, although we are quite confident on a winner…”

John: “Ok, what are the average call handling times of the shortlisted suppliers?”

Jane: “They are all about 2 minutes, but one provider is considerably cheaper. It is quite clear we should go for them. SmartCalls is the name of the company and they are heavily relying on automation and AI to support their operations, which keeps their costs very low, a very clever team.”

John: “Ok, sounds pretty good, let’s meet them but still keep another response in the race as we dig deeper beyond performance and price.”

Laura, in Jane’s team, is not convinced by such a swift decision, and questions Jane…

Laura: “Jane, 2 minutes is great, but what about the % of calls that are not resolved in one session? That one is…”

Jane: “Yes, that number is higher for SmartCalls, but this is hard to compare across providers, right? Their clients have different rules for re-routing inquiries in house…”

Laura: “John, Jane, before moving forward with SmartCalls, let me reach out to the shortlisted suppliers to request some additional information. It will be very hard to switch to another supplier once we are done with this…”

John: “Ok but I give you 3 days, I hope you are after something substantial… SmartCalls is super cheap. Handling calls is not rocket science…”

Laura goes back to the suppliers and asks them for a full view of the distribution of call handling times. She finds out the following, summarized in the chart below, which she promptly shares with John:

John looks at it, but he is not too clear on what it means…

John: “Ok Laura, translate please…”

Laura: “Basically, SmartCalls’ business model is that of leveraging AI with an inexperienced and cheap workforce; they can deal quickly with a large number of calls relating to simple issues, but their operators lack the creativity or experience to deal with issues of medium complexity. The operators all work remotely, with no chance of sharing information with each other…”

John: “Wait, wait… the chart first, what does that mean?”

Laura: “Oh… isn’t that obvious? SmartCalls has a 2-minute call average, yes, but this is driven by a large number of very quick calls; when it comes to customer satisfaction, there’s a good number of calls that go beyond 3-4 minutes.”

John: “Ok I get it, their calls are either quick or rather long, whilst iHelp, for example, is able to be more consistent, with most calls handled in about 2 minutes, right?”

Laura: “Yes, they avoid shortcutting the initial problem identification phase and have a longer list of mandatory screening questions, but this pays off. They are able to share the calls with specialized teams and…”

John: “Ok I get it indeed. I also see several of SmartCalls’ calls going beyond 5 minutes, which is our threshold for bringing customer calls back in house… good work Laura. Jane, good work on hiring Laura.”

Jane is a bit deflated, but she ultimately smiles as she is proud of having trained Laura well.

The quick story above is a good example of how we routinely perform unsophisticated statistical modelling, especially when we implicitly anchor ourselves to one-dimensional metrics, like averages.

When John heard the average call times, he implicitly assumed that those averages were comparable and meaningful in their own right. This means he assumed (modelled) the range and frequency of call times to be similar in shape across suppliers, which is a fair bit of statistical modelling in practice (wrong modelling, in this case).

After all, what can you really tell from comparing averages unless you make the rather strong assumption that those averages are informative in summarizing the underlying data appropriately? If you do so, although you might not know it, you are doing statistical modelling.
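The SmartCalls-versus-iHelp situation is easy to reproduce numerically: two sets of call times can have (almost) the same average while having very different shapes. The numbers below are invented purely to illustrate the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# "SmartCalls": many very quick calls plus a tail of long ones (bimodal).
smartcalls = np.concatenate([
    rng.normal(1.0, 0.3, 7000),   # quick, automated calls (~1 min)
    rng.normal(4.5, 1.0, 3000),   # harder calls that drag on (~4-5 min)
])

# "iHelp": consistent handling around 2 minutes.
ihelp = rng.normal(2.1, 0.5, 10000)

for name, times in [("SmartCalls", smartcalls), ("iHelp", ihelp)]:
    times = np.clip(times, 0.2, None)
    print(name,
          "mean:", round(times.mean(), 2),
          "% over 5 min:", round(100 * (times > 5).mean(), 1))
# Similar means, very different tails: the average alone hides the risk.
```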

Laura, instead, decided to look at the actual shape of the data and to avoid any high level, uninformed assumption, yet she is still assuming quite a few things, among which:

  • The data she received is representative (not skewed toward some specific types of calls)
  • The distribution of the data will be consistent over time
  • The distribution of the data, which reflects how the call centres handle other clients’ calls, is relevant to SalesX’s customer base

That’s basically statistical inference, and whenever you make a decision or judge data, although you might not think about it, you, as well, are doing statistics and data science.

The question is: “Do you know when you are doing that well, or when you are being ineffective?”

Another key aspect of giving shapes to uncertainty is the question of which metrics to measure: average call times, abandon rate, both, or other metrics. Which ones to choose?

This is, in a way, the operational side of the problem I presented above: when it comes to data that tracks a business activity or process, which metrics truthfully summarize the shape of that data?

I can recommend two good books on this topic. One is a technical text, Optimal Control Theory by Donald E. Kirk; the other is a more accessible read focusing specifically on designing meaningful metrics that track Objectives and Key Results (OKRs), Measure What Matters by John Doerr.

Optimal Control Theory goes well beyond the subject of performance measures, but the first 50 pages are a good introduction to the overall framework.

Statistics, Machine Learning and Deep Learning…

Recently, I am sure, you must have often heard about AGI (Artificial General Intelligence) and that it has something to do with Deep Learning and all that. But what is Deep Learning? How does it differ from things you might be more familiar with, like Statistics or Machine Learning?

Let me give you a tour, and in the end you’ll find out that Deep Learning is not as mysterious as it sounds (in fact the backbone is linear algebra plus the calculus you studied at college).

Patterns, Patterns and more Patterns:

Overall we are talking about patterns, and Maths is fundamentally the discipline that “speaks” about patterns. In a sense this is the most fundamental definition of Mathematics (see Hofstadter’s foreword to ‘Gödel’s Proof’ by Nagel & Newman).

Let me put in front of you 3 apples, 3 pens, 3 books, 3 plates and 3 balls… now forget what ‘3’ means in my sentence (if you can), then answer this question: what is the pattern among the various objects that I have presented to you? The answer should be that they all seem to be a group of… three distinct objects of a given category. There you have it, a definition of the most basic pattern, and that’s what we call a number. It might seem I am digressing, but I want you to get a bit abstract here, so let me indulge.

There are then hundreds of specific disciplines that study patterns, relationships and properties of patterns. Calculus, for example, is the branch of mathematics that studies patterns between infinitely small quantities and how those patterns generalise; geometry can be looked at as the study of patterns in shapes (or distances); and so on.

From this perspective Statistics, Machine Learning and Deep Learning are of the same family: they sit in a branch of Mathematics that studies how to reconstruct or recognise patterns. For example, I could give you the below series of numbers:

2, 4, 8, 16, 32 etc.etc.

And we could try and understand whether the pattern is: the number at position n+1 is twice the number at position n.
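In code, checking that candidate pattern against the series is a one-liner:

```python
series = [2, 4, 8, 16, 32]
# Candidate pattern: each term is twice the previous one.
print(all(b == 2 * a for a, b in zip(series, series[1:])))   # True
```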

Where Statistics, Machine Learning and Deep Learning differ is in how they represent patterns, and in what type of questions about patterns they focus on answering.

By “representation” I mean something pretty basic: in essence, a map that allows you to work with the pattern. For example, for the square function you can use several representations:

X^2, or X*X, or you could draw a square with a side of length X, or go more abstract and draw a parabola that passes through the origin, or express it as a derivative, or an integral, or any other way you want really, so long as that representation is useful to you.

That representation will only be a useful map to you so long as it helps you move toward your objective: are you trying to draw a trajectory? Prove a geometrical theorem? Solve an equation?

Statistics:

Patterns are represented by parameters of special functions we call distributions, so statistics focuses on finding those parameters and telling us how confident we should be that the pattern we identified can be mapped by a given relationship determined by those parameters. We don’t always want to “predict” in Statistics; more often we want to make sure whether we are onto something substantial or not (e.g. does the proposed medicine actually drive benefits? Does the defendant’s DNA match the blood sample?).

Statistical inference, to use an example, tries to answer the question of whether the income of a customer has an impact on how much the customer is likely to spend on your products. It will give you a yes-or-no answer (yes for jewellery… not so much for bottled water, perhaps?).
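As a minimal sketch of what that yes-or-no answer looks like in practice, here is a simple regression test in Python on simulated data (the income-to-spend relationship is deliberately baked into the simulation, so the test should find it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data: income in $k, spend with a built-in dependence on income plus noise.
income = rng.uniform(20, 150, 200)
spend = 5 + 0.3 * income + rng.normal(0, 10, 200)

result = stats.linregress(income, spend)
print("slope:", round(result.slope, 2), "p-value:", result.pvalue)
# A small p-value says: a slope this large would be very unlikely
# if income truly had no impact on spend -> "yes, income matters".
```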

Statistics is a highly developed discipline that is able to define boundaries and provide indisputable proofs, as it is a branch of Mathematics. For example, it can answer the following general questions:

Under certain assumptions on my data, what is the “best” test to verify my hypothesis? What is an admissible test? What is the dynamic policy that maximises utility? What is the best strategy to win a game? And so on.

Machine Learning (or Statistical Learning):

With Machine Learning we are interested in teaching computers to identify patterns, for the purpose of making predictions. For example, if the income of my customer is X, how much is she likely to spend on our products? Our task will be that of giving the machine as much data as we can, plus a learning framework, so that it can establish the pattern between income and spend (if there is one).

In Machine Learning it is also important how patterns are represented. We can only use representations that can be implemented in computer programs efficiently.

For example, decision trees represent patterns as a series of if-else statements… if this, then this, else that. So, in the case of decision trees, the pattern needs to be amenable to mapping onto simple if-else statements (e.g. if income is between 100 and 200 then spend will be 10, if greater than 200 then 20, and so on), but overall all Machine Learning algorithms have their own specific ways of representing patterns.
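A decision tree really is just nested if-else statements learned from data. As a sketch, here are the thresholds from the example above hard-coded by hand (a library would learn them from the data instead); the value for lower incomes is made up:

```python
def predicted_spend(income: float) -> float:
    """A hand-written 'decision tree' mirroring the if-else example above."""
    if income > 200:
        return 20.0
    elif income >= 100:     # income between 100 and 200
        return 10.0
    else:
        return 5.0          # made-up value for lower incomes

print(predicted_spend(150))   # -> 10.0
# A library such as scikit-learn would learn the same kind of structure from data,
# e.g. DecisionTreeRegressor().fit(incomes.reshape(-1, 1), spends)
```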

To give another example, K Nearest Neighbours is an algorithm where patterns are identified “spatially” by mapping the input data to an abstract space where we can define distances. In our example, a KNN learning approach would put incomes and spend in a two-dimensional space and try to find, when you give it only an income, the spend that puts the point in a region of space that “fits” with the geometry of the shape the training data drew.
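And the same income-to-spend example in a K-Nearest-Neighbours style: predict by averaging the spend of the K training customers whose income is closest. A minimal one-dimensional sketch with made-up data:

```python
# Minimal K-Nearest-Neighbours regression in one dimension (made-up data).
incomes = [30, 45, 60, 80, 110, 150, 200, 250]     # training incomes ($k)
spends  = [4,  6,  7,  9,  11,  14,  19,  22]      # observed spends ($)

def knn_predict(income: float, k: int = 3) -> float:
    # Find the k training points "closest" to the query income...
    nearest = sorted(zip(incomes, spends), key=lambda p: abs(p[0] - income))[:k]
    # ...and predict the average of their spends.
    return sum(s for _, s in nearest) / k

print(knn_predict(95))   # prediction for a customer earning $95k
```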

Unlike Statistics (and, as a consequence, Decision Theory), Machine Learning does not give closed-form answers to well-defined questions. There is no theorem that goes something like: given some assumptions on your data, this specific learning approach is guaranteed to give you the best prediction.

Machine Learning is therefore closer to engineering than to Mathematics: a lot of the effort (and fun) in Machine Learning comes from having to play with the tools to get somewhere and find new opportunities.

Obviously the algorithms are not found by trial and error only; there is a lot of pure statistics behind Machine Learning, but it is mixed with research in computer science as well.

Deep Learning (or Deep Neural Networks):

Deep Learning is a specific branch of Machine Learning where patterns are represented by networks of fundamental computational units called “neurons”.

Such neurons are not that complicated, as they perform basic algebra plus a simple transformation (to be decided upfront), but the overall architecture of the networks, plus their size, can make them incredibly powerful. In fact neural networks have a nice theorem to them, the universal approximation theorem, which states that, under certain assumptions, neural nets can approximate any well-behaved function to arbitrary accuracy, guaranteed! (We do not have such theorems for decision trees or KNN.)
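To show how unmysterious a single neuron is, here is the forward pass of a tiny network written from scratch in Python (random weights, no training; the point is only the “basic algebra plus a simple transformation” structure):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)   # the "simple transformation" (activation)

# A tiny 2-layer network: 3 inputs -> 4 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h = relu(W1 @ x + b1)   # each hidden neuron: weighted sum + activation
    return W2 @ h + b2      # output neuron: another weighted sum

print(forward(np.array([0.5, -1.2, 3.0])))
# Deep Learning is this, stacked many layers deep, with the weights learned from data.
```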

ChatGPT, for example, has millions of neurons and hundreds of billions of parameters that are devoted solely to the task of predicting what sentence from a given language best fits the “pattern” of the conversation that a given prompt initiated (in a human-like way).

As with Machine Learning, and even more so, Deep Learning is closer to Computer Science than to Statistics, since we basically need to proceed experimentally within technical constraints. We do not have solid theoretical foundations for many of the results obtained in Deep Learning; we literally do not know in much detail what ChatGPT is doing. Through trial and error, a certain network architecture (called the transformer) has been found to be effective, but this is also driven by what architectures we can implement within the constraints of existing computers and programming languages, so Deep Learning is also about the hardware.

A lot of current research focuses on architectures and how to set up various layers of the network.

To be clear, this is not “trial and error” in the common-language sense; it works more like experimental Physics, with a close interplay with theoretical advances and areas of inquiry.

I hope you found the above useful, but for anything a bit deeper on Deep Learning I recommend the two books below:

Both are very readable introductions, and they can also give a sense of what might be needed to work on AI as opposed to consuming AI, or what might be needed to develop AI in house (lots of compute power, computer scientists and mathematicians, for sure).