Consider the following problem:
You have a set of noisy (stochastic) observations of a function over the domain R^n. Can you approximate the minimum of the underlying function from the observed data alone? There may be outliers in the data. The algorithm should be "robust" to spurious optima, i.e. the reported minimum should be "statistically evidenced" by neighboring points. Building a surrogate model is not allowed, because doing so presumes a model of the underlying function. The function may be multimodal and its noise may depend on the sampled value, but you can assume the noise is bounded (say, by some δ).
Finding such an algorithm would be of great value: many real-world stochastic problems could then be solved simply by observing lots of data and strategically picking the optimal solution with it.
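To make the setting concrete, here is a minimal sketch of the kind of data the problem refers to: scattered points in R^n with noisy, value-dependent but bounded observations. The test function, noise model, and sample sizes below are illustrative assumptions, not part of the problem statement.

```python
# A minimal sketch of the data setting (illustrative choices throughout).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A multimodal test function on R^2 (purely illustrative).
    return np.sin(3 * x[:, 0]) * np.cos(3 * x[:, 1]) + 0.1 * np.sum(x**2, axis=1)

n, dim, delta = 500, 2, 0.3              # delta = assumed noise bound
X = rng.uniform(-2, 2, size=(n, dim))    # scattered sample locations
true_vals = f(X)
# Heteroscedastic but bounded noise: scale depends on the sampled value,
# then the draw is clipped to stay within [-delta, delta].
noise = np.clip(rng.normal(0, 0.1 * (1 + np.abs(true_vals))), -delta, delta)
y = true_vals + noise                    # the observations the algorithm sees
```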
It turns out that such a problem cannot be solved:
Statement
In the absence of any surrogate-model assumption, any purely data-driven procedure that (1) identifies a global minimizer in R^n from scattered noisy samples and (2) rejects isolated outliers must introduce at least one scale or density parameter.
Necessity of a Scale Parameter
Continuum of Scales
- The domain R^n admits arbitrarily fine or coarse feature sizes.
- To decide whether a low observed value at x is “supported” by nearby samples (and hence not an outlier), one must compare distances against some reference scale.
Neighborhood vs. Isolation
- Distinguishing a true local cluster of low values from an isolated "rogue" sample requires defining "nearby".
- Without a distance threshold or a density requirement (e.g. “at least k neighbors within radius r”), no algorithm can tell whether two samples belong to the same local-minimum basin or are merely close by chance in high dimensions.
Impossibility in the Adversarial Case
- An adversary could place two samples:
- one at the true minimizer x∗,
- one at some distant point x_out whose observed value is equally low due to noise alone.
- With no notion of “closeness,” any algorithm must treat both points equally, and so cannot systematically reject the outlier.
Minimal Hyperparameter: The Sampling Radius or Density
Any practical, modelless minima-finder must choose at least one of the following (a code sketch of the first two rules follows the list):
- Radius r: require that a candidate x_i have at least one other sample within distance r.
- Neighbor count k: require that each candidate be supported by at least k nearby samples (its k nearest neighbors).
- Scale-adaptive rule: scan over all possible radii and select minima “persistent” over a range of scales (this itself depends on how persistence is measured).
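As an illustration only (not part of the original argument), here is a minimal sketch combining the first two rules: a candidate's low observation counts only if at least k other samples fall within radius r of it. The parameters r and k are exactly the scale/density hyperparameters the argument says cannot be avoided.

```python
# A minimal sketch of a "supported minimum" rule, assuming the user supplies a
# radius r and a neighbor count k (the unavoidable hyperparameters).
import numpy as np

def supported_minimum(X, y, r, k):
    """Return the index of the smallest observation that has at least k
    other samples within distance r; return None if no sample qualifies."""
    X, y = np.asarray(X), np.asarray(y)
    # Pairwise distances between all sample locations.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Count neighbors within r, excluding the point itself.
    support = (dists <= r).sum(axis=1) - 1
    candidates = np.flatnonzero(support >= k)
    if candidates.size == 0:
        return None
    return candidates[np.argmin(y[candidates])]

# Usage with the synthetic data above (r and k are illustrative values):
# i_star = supported_minimum(X, y, r=0.3, k=5)
```

Note how the tradeoff shows up directly: too small an r (or too large a k) rejects every candidate, while too large an r (or k = 0) accepts isolated outliers; there is no data-driven way to set these values without some further assumption.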
Conclusion
A truly parameter-free, modelless global-minimum criterion cannot exist under the problem’s generality (continuous domain, adversarial noise bounded only by δ, arbitrary sampling). Any method that both
- identifies minima and
- rejects isolated outliers
must introduce at least one notion of “locality” or “support,” i.e. a scale or density parameter.
The impossibility of identifying a global minimizer from mere noisy samples, without any assumption of scale, smoothness, distributional form or model structure, is an instance of the “no-free-lunch” principle in optimization and statistical inference. In formal terms, the No-Free-Lunch (NFL) Theorems [Wolpert & Macready 1997] state that, when all possible objective functions are treated as equally likely (i.e. no prior bias toward any class of functions), every optimization algorithm has identical average performance over that space of functions.
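In the notation of Wolpert & Macready (1997), where d_m^y denotes the sequence of m objective values an algorithm a has visited, the theorem can be sketched as follows (their notation, paraphrased here):

```latex
% NFL theorem (Wolpert & Macready 1997): summed over all objective functions f,
% the distribution of visited objective values is identical for any two algorithms a_1, a_2.
\[
  \sum_{f} P\!\left(d_m^{y} \mid f, m, a_1\right)
  \;=\;
  \sum_{f} P\!\left(d_m^{y} \mid f, m, a_2\right)
\]
```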
But we still need to be able to solve it!
Given mild assumptions, a relaxed conclusion can be obtained:
Under i.i.d. symmetric noise with unknown variance, uniform prior over function values, and 0–1 loss, the decision rule "choose the sample with the smallest observed value" is both necessary and sufficient for Bayes‐optimal betting on the minimizer. The intuition that "all samples have equal opportunity to produce a spuriously low draw, so the lowest realized draw is most likely genuine" is therefore correct and rests on the exchangeability and symmetry of the noise process.
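As a sanity check of this relaxed claim, here is a small Monte Carlo sketch under the stated assumptions (i.i.d. symmetric noise, exchangeable true values). The sample sizes and the Laplace noise model are illustrative choices, not part of the claim; the point is only that the lowest-observation rule recovers the true minimizer noticeably more often than a uniform random guess.

```python
# Monte Carlo sketch: under i.i.d. symmetric noise, "argmin of the observed
# values" beats random guessing at identifying the true minimizer.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_samples, noise_scale = 10_000, 50, 0.5
hits_rule, hits_random = 0, 0

for _ in range(n_trials):
    true_vals = rng.uniform(0, 1, n_samples)                  # exchangeable true values
    obs = true_vals + rng.laplace(0, noise_scale, n_samples)  # symmetric i.i.d. noise
    if np.argmin(obs) == np.argmin(true_vals):
        hits_rule += 1
    if rng.integers(n_samples) == np.argmin(true_vals):
        hits_random += 1

print(f"lowest-observation rule: {hits_rule / n_trials:.3f}")
print(f"uniform random guess:    {hits_random / n_trials:.3f}")
```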