Scholar iON
Academic Synthesis
The collection of scholarly papers reflects a broad spectrum of advancements in statistical methods and their applications across diverse fields, including quantum computing, black hole physics, artificial intelligence, and statistical significance. Vallury and Hollenberg's work extends the Quantum Computed Moments (QCM) method to estimate arbitrary ground state observables, highlighting its advantage in handling noise and suboptimal trial states, thus emphasizing its potential for near-term quantum hardware applications. Bellucci and Tiwari explore the geometric and statistical properties of black holes within the framework of string theory, underlining the stability and attractive nature of these configurations through state-space manifold analysis. Marra et al. provide a comprehensive survey of the integration of learning and reasoning in AI, particularly through neurosymbolic and statistical relational approaches, identifying shared dimensions that offer insights into the convergence of symbolic reasoning and neural networks. Zhu's paper addresses the construction of statistical significance in experiments, proposing a correlation with normal distribution integral probabilities. Collectively, these studies enhance the foundational understanding of complex systems and contribute to the development of more accurate computational and analytical methodologies.
The determination of ground state properties of quantum systems is a fundamental problem in physics and chemistry, and is considered a key application of quantum computers. A common approach is to prepare a trial ground state on the quantum computer and measure observables such as energy, but this is often limited by hardware constraints that prevent an accurate description of the target ground state. The quantum computed moments (QCM) method has proven to be remarkably useful in estimating the ground state energy of a system by computing Hamiltonian moments with respect to a suboptimal or noisy trial state. In this paper, we extend the QCM method to estimate arbitrary ground state observables of quantum systems. We present preliminary results of using QCM to determine the ground state magnetisation and spin-spin correlations of the Heisenberg model in its various forms. Our findings validate the well-established advantage of QCM over existing methods in handling suboptimal trial states and noise, extend its applicability to the estimation of more general ground state properties, and demonstrate its practical potential for solving a wide range of problems on near-term quantum hardware.
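The moments-based energy estimate mentioned above can be illustrated with a small sketch. The abstract does not state which formula is used; the code below assumes the fourth-order Lanczos-expansion estimate commonly associated with QCM, built from the cumulants of the first four Hamiltonian moments. The function names and the two-level toy system are hypothetical and serve only as an illustration; the extension to general observables described in the abstract is not shown.

```python
import math


def cumulants_from_moments(m1, m2, m3, m4):
    """Convert raw moments <H^k> (k = 1..4) into the first four cumulants."""
    c1 = m1
    c2 = m2 - m1**2
    c3 = m3 - 3 * m2 * m1 + 2 * m1**3
    c4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4
    return c1, c2, c3, c4


def moments_energy_estimate(m1, m2, m3, m4):
    """Assumed fourth-order Lanczos-expansion estimate of the ground state energy."""
    c1, c2, c3, c4 = cumulants_from_moments(m1, m2, m3, m4)
    correction = (c2**2 / (c3**2 - c2 * c4)) * (math.sqrt(3 * c3**2 - 2 * c2 * c4) - c3)
    return c1 - correction


# Toy example (hypothetical): a two-level system with energies E0 < E1 and a
# trial state carrying weight p_excited on the excited level.
E0, E1, p_excited = -1.0, 2.0, 0.3
moments = [(1 - p_excited) * E0**k + p_excited * E1**k for k in range(1, 5)]

variational_energy = moments[0]              # plain <H> of the trial state
estimate = moments_energy_estimate(*moments)  # moments-corrected estimate
```

In this toy case the plain expectation value sits well above the true ground state energy because of the excited-state admixture, while the moments-corrected estimate recovers it; real systems with many levels will in general only be approximated.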
We study a class of fluctuating higher dimensional black hole configurations obtained in string theory/ $M$-theory compactifications. We explore the intrinsic Riemannian geometric nature of Gaussian fluctuations arising from the Hessian of the coarse graining entropy, defined over an ensemble of brane microstates. It has been shown that the state-space geometry spanned by the set of invariant parameters is non-degenerate, regular and has a negative scalar curvature for the rotating Myers-Perry black holes, Kaluza-Klein black holes, supersymmetric $AdS_5$ black holes, $D_1$-$D_5$ configurations and the associated BMPV black holes. Interestingly, these solutions demonstrate that the principal components of the state-space metric tensor admit a positive definite form, while the off diagonal components do not. Furthermore, the ratio of diagonal components weakens relatively faster than the off diagonal components, and thus they swiftly come into an equilibrium statistical configuration. Novel aspects of the scaling property suggest that the brane-brane statistical pair correlation functions divulge an asymmetric nature, in comparison with the others. This approach indicates that all above configurations are effectively attractive and stable, on an arbitrary hyper-surface of the state-space manifolds. It is nevertheless noticed that there exists an intriguing relationship between non-ideal inter-brane statistical interactions and phase transitions. The ramifications thus described are consistent with the existing picture of the microscopic CFTs. We conclude with an extended discussion of the implications of this work for the physics of black holes in string theory.
This survey explores the integration of learning and reasoning in two different fields of artificial intelligence: neurosymbolic and statistical relational artificial intelligence. Neurosymbolic artificial intelligence (NeSy) studies the integration of symbolic reasoning and neural networks, while statistical relational artificial intelligence (StarAI) focuses on integrating logic with probabilistic graphical models. This survey identifies seven shared dimensions between these two subfields of AI. These dimensions can be used to characterize different NeSy and StarAI systems. They are concerned with (1) the approach to logical inference, whether model or proof-based; (2) the syntax of the used logical theories; (3) the logical semantics of the systems and their extensions to facilitate learning; (4) the scope of learning, encompassing either parameter or structure learning; (5) the presence of symbolic and subsymbolic representations; (6) the degree to which systems capture the original logic, probabilistic, and neural paradigms; and (7) the classes of learning tasks the systems are applied to. By positioning various NeSy and StarAI systems along these dimensions and pointing out similarities and differences between them, this survey contributes fundamental concepts for understanding the integration of learning and reasoning.
A definition of statistical significance is proposed, constructed as a correlation between the normal distribution integral probability and the p-value observed in an experiment. The definition is suitable for both counting experiments and continuous test statistics.
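The link between a p-value and a normal-distribution integral can be sketched as follows. This is a minimal illustration assuming the common one-sided convention, in which the significance Z is the point where the upper tail of a standard normal distribution equals the p-value; the abstract does not specify the exact construction, so the convention and the function names are assumptions.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value_from_significance(z):
    """One-sided tail probability of a standard normal beyond z (assumed convention)."""
    return 1.0 - _STANDARD_NORMAL.cdf(z)


def significance_from_p_value(p):
    """Significance Z such that the one-sided normal tail beyond Z equals p."""
    return _STANDARD_NORMAL.inv_cdf(1.0 - p)


# Round trip: a 3-sigma significance maps to a p-value and back.
p_three_sigma = p_value_from_significance(3.0)
z_recovered = significance_from_p_value(p_three_sigma)
```

A two-sided convention would halve the tail on each side and give a different Z for the same p-value, so the convention should always be stated alongside the quoted significance.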
Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. With such data science practices in place, reporting becomes more timely, more insightful, and more flexible. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable and pose significant risks that must be addressed in the context of machine learning for official statistics.
This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in the context of machine learning for official statistics. We provide a checklist of the most prevalent origins and causes of changing data sources; not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, accuracy and completeness, but also the neutrality and potential discontinuation of the statistical offering. We offer a few important precautionary measures, such as enhancing robustness in both data sourcing and statistical techniques, and thorough monitoring. In doing so, machine learning-based official statistics can maintain integrity, reliability, consistency, and relevance in policy-making, decision-making, and public discourse.
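As one possible form of the monitoring mentioned above, a data source can be checked for distribution shift by comparing a new batch against a reference sample. The abstract does not name a specific technique; the sketch below assumes a two-sample Kolmogorov-Smirnov statistic as the drift signal, and the function name and example data are hypothetical.

```python
def ks_statistic(reference, current):
    """Maximum gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(reference), sorted(current)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Hypothetical example: last year's values versus a shifted new delivery.
reference_batch = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted_batch = [11.0, 12.0, 13.0, 14.0, 15.0]
drift_score = ks_statistic(reference_batch, shifted_batch)
```

A statistic of this kind only flags changes in a single numeric variable; the non-technical causes listed in the abstract, such as ownership or regulation, need separate checks.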
In this paper, we introduce an approach to the protein folding problem from the point of view of statistical physics. Protein folding is a stochastic process by which a polypeptide folds into its characteristic and functional 3D structure from random coil. The process involves an intricate interplay between global geometry and local structure, and each protein seems to present special problems. We introduce CSAW (conditioned self-avoiding walk), a model of protein folding that combines the features of self-avoiding walk (SAW) and the Monte Carlo method. In this model, the unfolded protein chain is treated as a random coil described by SAW. Folding is induced by hydrophobic forces and other interactions, such as hydrogen bonding, which can be taken into account by imposing conditions on SAW. Conceptually, the mathematical basis is a generalized Langevin equation. To illustrate the flexibility and capabilities of the model, we consider several examples, including helix formation, elastic properties, and the transition in the folding of myoglobin. From the CSAW simulation and physical arguments, we find a universal elastic energy for proteins, which depends only on the radius of gyration $R_{g}$ and the residue number $N$. The elastic energy gives rise to scaling laws $R_{g}\sim N^{\nu}$ in different regions with exponents $\nu=3/5,3/7,2/5$, consistent with the observed unfolded stage, pre-globule, and molten globule, respectively. These results indicate that CSAW can serve as a theoretical laboratory to study universal principles in protein folding.
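The random-coil starting point of the model, a self-avoiding walk whose size is measured by the radius of gyration, can be sketched as follows. This is not the CSAW model itself: the folding conditions (hydrophobic forces, hydrogen bonding) are omitted, and the simple cubic lattice, the rejection sampling scheme, and all function names are simplifying assumptions made for illustration.

```python
import math
import random

# Unit steps on a simple cubic lattice (the lattice is an assumption of this sketch).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sample_saw(n_steps, rng, max_tries=100_000):
    """Sample a self-avoiding walk by growing non-reversing walks and
    discarding any attempt that revisits a site."""
    for _attempt in range(max_tries):
        path = [(0, 0, 0)]
        visited = {(0, 0, 0)}
        last = None
        collided = False
        for _step in range(n_steps):
            # Forbid an immediate reversal; other collisions are checked below.
            choices = [s for s in STEPS
                       if last is None or s != (-last[0], -last[1], -last[2])]
            step = rng.choice(choices)
            site = tuple(a + b for a, b in zip(path[-1], step))
            if site in visited:
                collided = True
                break
            path.append(site)
            visited.add(site)
            last = step
        if not collided:
            return path
    raise RuntimeError("no self-avoiding walk found within max_tries")


def radius_of_gyration(path):
    """Root-mean-square distance of the sites from their centre of mass."""
    n = len(path)
    centre = [sum(site[d] for site in path) / n for d in range(3)]
    mean_sq = sum(sum((site[d] - centre[d]) ** 2 for d in range(3)) for site in path) / n
    return math.sqrt(mean_sq)


walk = sample_saw(20, random.Random(0))
rg = radius_of_gyration(walk)
```

Averaging the radius of gyration over many sampled walks at several chain lengths and fitting the slope on a log-log plot would give an estimate of the exponent for the unconditioned walk; rejection sampling becomes inefficient for long chains, where other sampling schemes are normally preferred.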
While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory-level book for the field of Structural Bioinformatics. This book aims to give an introduction to Structural Bioinformatics, which is where the previous topics meet to explore three-dimensional protein structures through computational analysis. We provide an overview of existing computational techniques to validate, simulate, predict and analyse protein structures. More importantly, the book aims to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics.
In this chapter we explore the basic physical and chemical concepts required to understand protein folding. We introduce major (de)stabilising factors of folded protein structures, such as the hydrophobic effect and backbone entropy. In addition, we consider different states along the folding pathway, as well as natively disordered proteins and aggregated protein states. The chapter provides an intuitive understanding of the protein folding process, in preparation for the next chapter on the thermodynamics of protein folding. In particular, it emphasizes that protein folding is a stochastic process and that proteins unfold and refold in a dynamic equilibrium. The effect of temperature on the stability of the folded and unfolded states is also explained.
The prediction of protein stability changes following single-point mutations plays a pivotal role in computational biology, particularly in areas like drug discovery, enzyme reengineering, and genetic disease analysis. Although deep-learning strategies have pushed the field forward, their use in standard workflows remains limited due to resource demands. Conversely, potential-like methods are fast, intuitive, and efficient. Yet, these typically estimate Gibbs free energy shifts without considering the free-energy variations in the unfolded protein state, an omission that may breach mass balance and diminish accuracy. This study shows that incorporating a mass-balance correction (MBC) to account for the unfolded state significantly enhances these methods. While many machine learning models partially model this balance, our analysis suggests that a refined representation of the unfolded state may improve the predictive performance.
Multi-Agent Pathfinding (MAPF) plays a critical role in various domains. Traditional MAPF methods typically assume unit edge costs and single-timestep actions, which limit their applicability to real-world scenarios. MAPFR extends MAPF to handle non-unit costs with real-valued edge costs and continuous-time actions, but its geometric collision model leads to an unbounded state space that compromises solver efficiency. In this paper, we propose MAPFZ, a novel MAPF variant on graphs with non-unit integer costs that preserves a finite state space while offering improved realism over classical MAPF. To solve MAPFZ efficiently, we develop CBS-NIC, an enhanced Conflict-Based Search framework incorporating time-interval-based conflict detection and an improved Safe Interval Path Planning (SIPP) algorithm. Additionally, we propose Bayesian Optimization for Graph Design (BOGD), a discretization method for non-unit edge costs that balances efficiency and accuracy with a sub-linear regret bound. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in runtime and success rate across diverse benchmark scenarios.
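The idea behind time-interval-based conflict detection and safe intervals with integer costs can be sketched as follows. This is not the CBS-NIC or improved SIPP implementation from the paper; it is a minimal illustration assuming half-open integer occupancy intervals on a single vertex, with hypothetical function names.

```python
def intervals_conflict(a, b):
    """Two half-open intervals [start, end) conflict if they share any timestep."""
    return a[0] < b[1] and b[0] < a[1]


def safe_intervals(blocked, horizon):
    """Return the maximal half-open intervals in [0, horizon) during which a
    vertex is free, given the intervals in which other agents occupy it."""
    safe = []
    t = 0
    for start, end in sorted(blocked):
        free_until = min(start, horizon)
        if free_until > t:
            safe.append((t, free_until))
        t = max(t, end)
    if t < horizon:
        safe.append((t, horizon))
    return safe


# Hypothetical example: a vertex occupied during [2, 5) and [7, 8), planned up to time 10.
free_windows = safe_intervals([(2, 5), (7, 8)], horizon=10)
```

Because edge costs are integers, every interval endpoint is an integer too, which keeps the set of distinct safe intervals per vertex finite; this is the property that a real-valued, geometric collision model does not guarantee.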
Higher-order spacing statistics in the $m$ superposed spectra of circular random matrices of the same class are studied numerically. We conjecture that for given $m$ (or order $k$) and $\beta$, the sequence of modified Dyson index $\beta'(k)$ (or $\beta'(m)$) obtained using the sum of absolute differences between the cumulative distribution functions method (denoted as $D(\beta')$) is unique. Also, for a given $k$, the distribution tends to the corresponding $k$-th order Poisson statistics in the limit $m\rightarrow \infty$. The quantum chaotic kicked top model for various Hilbert space dimensions is studied, and it is found to satisfy our conjecture. This involves the numerical verification of $m=2$ case of COE results. Our result can be used as a tool for the characterization of a system and to determine the symmetry structure of the system without desymmetrization of the spectra. Additionally, the comparative study of the higher-order spacing and ratio distributions in both $m=1$ and $m=2$ cases of COE as well as GOE is performed within and across these ensembles numerically using the $D(\beta')$ method. This study is carried out both by varying the dimension and keeping the number of realizations constant, and vice-versa. The same asymptotic higher-order statistics are observed across COE and GOE in terms of a given spectral fluctuation measure. But, within a given ensemble of COE or GOE, the results of higher-order spacing and ratio distributions agree with each other only up to some lower $k$, and beyond that, they start deviating from each other. Further, the spectral fluctuations of the intermediate map of various dimensions are studied. Various important observations and discussions from the analysis of our extensive numerical computations are presented.
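The Poisson limit referred to above can be illustrated with higher-order spacing ratios of uncorrelated levels. The sketch below assumes one common definition of the $k$-th order ratio, built from the spacings $E_{i+k}-E_i$ and $E_{i+2k}-E_{i+k}$, and folds each ratio into $[0,1]$; it does not generate circular or Gaussian ensembles and does not implement the $D(\beta')$ method. The function name and the sample size are hypothetical choices.

```python
import math
import random


def folded_spacing_ratios(levels, k=1):
    """Folded k-th order spacing ratios min(s1, s2) / max(s1, s2), with
    s1 = E[i+k] - E[i] and s2 = E[i+2k] - E[i+k] (assumed definition)."""
    e = sorted(levels)
    ratios = []
    for i in range(len(e) - 2 * k):
        s1 = e[i + k] - e[i]
        s2 = e[i + 2 * k] - e[i + k]
        if max(s1, s2) > 0:  # skip fully degenerate triples
            ratios.append(min(s1, s2) / max(s1, s2))
    return ratios


# Uncorrelated (Poisson) levels: independent uniform random numbers.
rng = random.Random(0)
poisson_levels = [rng.random() for _ in range(20000)]

ratios_k1 = folded_spacing_ratios(poisson_levels, k=1)
mean_ratio_k1 = sum(ratios_k1) / len(ratios_k1)

# Known analytic mean of the folded nearest-neighbour ratio for Poisson levels.
poisson_mean_k1 = 2 * math.log(2) - 1
```

Ratios have the practical advantage that they do not require unfolding the spectrum, since the local level density cancels between numerator and denominator.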