Scholar iON
Academic Synthesis
This selection of scholarly papers addresses diverse aspects of optimization, modeling, and theoretical frameworks across distinct domains. Caio Gomes's work redefines Software as a Service (SaaS) products as akin to insurance models, suggesting a novel operational framework for pricing capped-usage services, distinct from traditional unit economics. Vy Bui and colleagues introduce a double-Bayesian learning framework to optimize neural network training, proposing a theoretically optimal learning rate to enhance performance across tasks. Yann Dauphin et al. tackle the challenges posed by saddle points in high-dimensional non-convex optimization, presenting the saddle-free Newton method as a superior alternative to traditional descent methods. Lastly, Thomas Luu and Ulf-G. Meißner respond to critiques about effective field theories, defending their stance on reductionism and weak emergence in particle physics. Collectively, these studies contribute to advancing methodologies in their respective fields, highlighting innovations in modeling, learning rate optimization, and theoretical clarifications.
Capped-usage SaaS products -- LLM subscriptions such as Claude Code and ChatGPT, cloud platforms such as Vercel and Cloudflare Workers, corporate benefit platforms, identity-verification services with liability transfer -- share a structural signature with insurance products: a fixed premium decoupled from realized consumption, stochastic per-user demand with heavy-tailed severity, a non-fungible cap that resets on a fixed schedule, and a portfolio-level exposure that requires reserve adequacy under tail risk. We argue that this is not an analogy. It is the same operational problem actuarial science has been tooled for decades to address, restated with new dependent variables (tokens, bandwidth bytes, function-invocations, gym check-ins) in place of medical claims. This paper proposes a modeling framework for capped-usage SaaS pricing built from frequency-severity decomposition, premium calculation principles, and Monte Carlo reserve adequacy. We map the framework to publicly observable subscription tiers in two domains (LLM services and cloud platforms), ground it in canonical health-insurance economics (Arrow 1963; Pauly 1968; Manning et al. 1987; Brot-Goldberg et al. 2017), and demonstrate divergence from traditional unit economics through a worked example. The contribution is operational rather than theoretical: not a new theorem, but vocabulary and tools currently absent from cs.LG/stat.ML practice.
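The three ingredients named in this abstract (frequency-severity decomposition, a premium calculation principle, and Monte Carlo reserve adequacy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: Poisson request counts, lognormal request sizes, the expected-value premium principle with a 10% loading, a 99.5% adequacy level, and all function names and parameter values.

```python
import numpy as np

def simulate_portfolio_cost(n_users, n_sims, freq_mean, sev_mu, sev_sigma,
                            cap, unit_cost, seed=0):
    """Simulate the provider's total cost per billing period for a capped plan.

    Assumed model (hypothetical): each user issues a Poisson number of requests
    (frequency), each request consumes a lognormal amount (severity), and
    per-user usage is truncated at the plan cap before being costed.
    """
    rng = np.random.default_rng(seed)
    totals = np.empty(n_sims)
    user_ids = np.arange(n_users)
    for s in range(n_sims):
        counts = rng.poisson(freq_mean, size=n_users)
        sizes = rng.lognormal(sev_mu, sev_sigma, size=int(counts.sum()))
        # Sum request sizes per user, then apply the cap.
        usage = np.bincount(np.repeat(user_ids, counts), weights=sizes,
                            minlength=n_users)
        capped = np.minimum(usage, cap)
        totals[s] = unit_cost * capped.sum()
    return totals

def expected_value_premium(totals, n_users, loading):
    """Expected-value principle: premium = (1 + loading) * E[cost per user]."""
    return (1.0 + loading) * totals.mean() / n_users

def required_reserve(totals, premium, n_users, level=0.995):
    """Capital needed on top of premium income to cover the tail quantile."""
    tail_cost = np.quantile(totals, level)
    return max(0.0, float(tail_cost - premium * n_users))

# Illustrative values only (hypothetical plan: cap of 60 units at 0.01 per unit).
N_USERS, N_SIMS, CAP, UNIT_COST = 500, 200, 60.0, 0.01
totals = simulate_portfolio_cost(N_USERS, N_SIMS, freq_mean=20, sev_mu=0.0,
                                 sev_sigma=1.5, cap=CAP, unit_cost=UNIT_COST)
premium = expected_value_premium(totals, N_USERS, loading=0.10)
reserve = required_reserve(totals, premium, N_USERS)
print(f"premium per user: {premium:.4f}, reserve: {reserve:.4f}")
```

The point of the sketch is the shape of the calculation rather than the numbers: the premium is set from the simulated cost distribution instead of from an average cost per unit, and the reserve is read off the tail of the same distribution.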
Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks corroborate the practical significance of the theoretically derived learning rate. The paper also discusses the ramifications of the proposed double-Bayesian framework for network training and model performance.
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
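A toy comparison can make the escape behaviour concrete. The abstract does not state the update rule; the sketch below assumes the form in which the saddle-free Newton idea is usually presented, namely rescaling the gradient by the inverse of the Hessian with its eigenvalues replaced by their absolute values. The test function f(x, y) = x^2 - y^2, the starting point, the damping constant, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def newton_step(grad, hess):
    """Plain Newton step: -H^{-1} g. Near a saddle it is attracted to the saddle."""
    return -np.linalg.solve(hess, grad)

def saddle_free_newton_step(grad, hess, damping=1e-8):
    """Assumed saddle-free step: -|H|^{-1} g, with |H| built from |eigenvalues|.

    The small damping term (an illustrative choice) guards against division
    by zero when an eigenvalue vanishes.
    """
    eigvals, eigvecs = np.linalg.eigh(hess)
    abs_vals = np.abs(eigvals) + damping
    return -eigvecs @ ((eigvecs.T @ grad) / abs_vals)

def grad_f(p):
    """Gradient of f(x, y) = x**2 - y**2, which has a saddle at the origin."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def hess_f(p):
    """Hessian of f: one positive and one negative curvature direction."""
    return np.array([[2.0, 0.0], [0.0, -2.0]])

p0 = np.array([1.0, 0.1])
p_newton = p0 + newton_step(grad_f(p0), hess_f(p0))
p_sfn = p0 + saddle_free_newton_step(grad_f(p0), hess_f(p0))
print("Newton lands at:", p_newton)
print("Saddle-free Newton lands at:", p_sfn)
```

On this function the plain Newton step jumps straight onto the saddle, while the rescaled step minimizes along the positive-curvature direction and moves away from the saddle along the negative-curvature direction. A full eigendecomposition is only practical for small problems, so this is a demonstration of the idea, not a recipe for training large networks.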
In an earlier paper~\cite{Luu:2019jmb} we discussed emergence in the context of effective field theories, particularly as related to the fields of particle and nuclear physics. We argued on the side of reductionism and weak emergence. George Ellis has critiqued our exposition in~\cite{Ellis:2020vij}, and here we provide our response to his critiques. Many of his critiques are based on incorrect assumptions related to the formalism of effective field theories, and we attempt to correct these issues here. We also comment on other statements made in his paper. Note that our response addresses the critiques made in arXiv versions arXiv:2004.13591v1-5 [physics.hist-ph], that is, versions 1-5 of that arXiv post. Version 6 has content similar to that of versions 1-5, but versions 7-9 are seemingly a different paper altogether (even with a different title).
Following Max Planck's hypothesis of quanta (quant-ph/0012069) and the matter wave idea of Louis de Broglie (quant-ph/9911107), Erwin Schroedinger proposed, at the beginning of 1926, the concept of the wavefunction and a wave equation for it. Though endowed with a realistic undular interpretation by its father, the wavefunction could not be considered as a real "matter wave" and has been given only an abstract, formally probabilistic interpretation. In this paper we show how the resulting "mysteries" of the usual theory are solved within the unreduced, dynamically multivalued description of the underlying, essentially nonlinear interaction process (quant-ph/9902015, quant-ph/9902016), without artificial modification of the Schroedinger equation. The latter is rigorously derived instead as a universal expression of unreduced interaction complexity. A causal, totally realistic wavefunction is obtained as a dynamically probabilistic intermediate state of a simple system with interaction performing dynamically discrete transitions between its localised, incompatible "realisations" ("corpuscular" states). The causal wavefunction and Schroedinger equation are then extended to an arbitrary level of world dynamics. We outline some applications of the obtained causal description, such as genuine quantum chaos (quant-ph/9511034-36) and realistic quantum devices (physics/0211071), and emphasize the basic difference of the proposed dynamically multivalued theory from dynamically single-valued imitations of causality and complexity. The causally complete wavefunction concept, representing the unified essence of unreduced (multivalued) complex dynamics, provides a clear distinctive feature of realistic science, absent from any of its unitary imitations.
Many macroscopic physical processes are known to occur in a time-directed way despite the apparent time-symmetry of the known fundamental laws. A popular explanation is to postulate an unimaginably atypical state for the early universe -- a "Past Hypothesis" (PH) -- that seeds the time-asymmetry from which all others follow. I will argue that such a PH faces serious new difficulties. First I strengthen the grounds for existing criticism by providing a systematic analytic framework for assessing the status of the PH. I outline three broad categories of criticism that put into question a list of essential requirements of the proposal. The resulting analysis paints a grim picture for the prospects of providing an adequate formulation for an explicit PH. I then provide a new argument that substantively extends this criticism by showing that any time-independent measure on the space of models of the universe must necessarily break one of its gauge symmetries. The PH then faces a new dilemma: reject a gauge symmetry of the universe and introduce a distinction without difference or reject the time-independence of the measure and lose explanatory power.
We examine a covariant quantization of electromagnetic fields using an operator, derived from a constant scalar, that can be called the extended Lorentz gauge. The quantization can avoid an inconsistency between the Lorentz gauge and a commutation relation, which can eliminate the need to introduce a physical state defined by a subsidiary condition and an auxiliary field in the Lagrangian density in the Lorentz gauge. By using this quantization and an indefinite metric straightforwardly, all quantum phenomena can be accounted for without the enigmatic and paradoxical "probability interpretation".
Herein, the basic design for a ground-based instrument for measuring the magnitude of the Earth's time-retarded transverse gravitational vector potential is described. The formula for the Earth's transverse vector potential is derived from the known formula for the neoclassical time-retarded transverse gravitational field (arXiv:0904.0383v2 [physics.gen-ph] 25 May 2010). The device senses the relativistic shift in the frequency of laser-diode oscillators set into circular motion at the tips of a two-arm rotor. The instrument employs fiber optics and a digital electronic interferometer/spectrometer to measure the effect of the relativistic time dilation on the frequency-modulated (FM) harmonic amplitudes in the beat signals between the tip-diodes and a stationary reference diode. The FM amplitudes depend on the orientation of the rotor. For the vertical-east-west orientation with a rotor frequency of 73.9 Hz, the predicted FM amplitudes for overtones at 148 Hz, 222 Hz, and 296 Hz are respectively 7x10^-10 Hz, 4x10^-11 Hz, and 9x10^-11 Hz. The overtones in the beat signals can be amplified and observed with a tunable FM digital audio amplifier. The measured values for the harmonics of the vector potential can be determined by back-calculating what the amplitudes must have been at the input to the amplifier. The instrument can be used to establish the speed of the Earth's gravitational field and to study the structure of the Earth's mantle and outer core.
We study several aspects of the behaviours produced by instruction sequences under execution in the setting of the algebraic theory of processes known as ACP. We use ACP to describe the behaviours produced by instruction sequences under execution and to describe two protocols implementing these behaviours in the case where the processing of instructions takes place remotely. We also show that all finite-state behaviours considered in ACP can be produced by instruction sequences under execution.
The Virtual Garbage Collector (VGC) proposes a zone-based memory management architecture aimed at improving execution predictability and memory behavior in Python runtimes. The design explores a dual-layer model consisting of an Active VGC, responsible for managing runtime object lifecycles, and a Passive VGC, intended as a compile-time optimization layer for static allocation planning. Rather than relying on traditional heap traversal or generational heuristics, VGC introduces memory zoning and checkpoint-based state evaluation to reduce allocation churn and constrain garbage collection scope. Execution partitioning is experimentally evaluated to isolate workloads and localize memory pressure, enabling more deterministic behavior under loop-intensive, recursive, and compute-heavy workloads. This work presents the architectural principles, execution model, and experimental observations of VGC within a partition-aware runtime context. While the full realization of the dual-layer design is an ongoing effort, the results indicate that zone-based allocation and partitioned execution provide a viable foundation for improving scalability and memory predictability in Python-oriented systems.