UNiON Scholar
188 scholarly results for stat.AP
Scholar iON Academic Synthesis
The selected scholarly papers collectively emphasize the integration of advanced computational techniques in statistical and biological research, highlighting the evolving role of data science and machine learning in diverse fields. "Changing Data Sources in the Age of Machine Learning for Official Statistics" underscores the challenges and precautions necessary for adapting machine learning to official statistics, focusing on the integrity and reliability of changing data sources. In the realm of protein folding, both "Protein Folding: A Perspective From Statistical Physics" and "Introduction to Protein Folding" explore stochastic processes and computational models, such as CSAW, to understand protein structures and dynamics, emphasizing the intricacies of folding pathways and stability under varying conditions. Meanwhile, "Mass Balance Approximation of Unfolding Improves Potential-Like Methods for Protein Stability Predictions" highlights enhancements in predictive models for protein stability through mass-balance corrections, underscoring the significance of accurate representations of unfolded states. Collectively, these studies illustrate the critical intersection of computational methods and domain-specific applications, advancing both the precision of statistical analysis and the understanding of complex biological processes.
arxiv.org · scholarly article
Changing Data Sources in the Age of Machine Learning for Official Statistics
Cedric De Boom; Michael Reusens
2023 arXiv Open Access
Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. With such data science practices in place, it enables more timely, more insightful and more flexible reporting. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable to occur and pose significant risks that are crucial to address in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in the context of machine learning for official statistics. We provide a checklist of the most prevalent origins and causes of changing data sources; not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, accuracy and completeness, but also the neutrality and potential discontinuation of the statistical offering. We offer a few important precautionary measures, such as enhancing robustness in both data sourcing and statistical techniques, and thorough monitoring. In doing so, machine learning-based official statistics can maintain integrity, reliability, consistency, and relevance in policy-making, decision-making, and public discourse.
arxiv.org · scholarly article
Protein Folding: A Perspective From Statistical Physics
Jinzhi Lei; Kerson Huang
2010 arXiv Open Access
In this paper, we introduce an approach to the protein folding problem from the point of view of statistical physics. Protein folding is a stochastic process by which a polypeptide folds into its characteristic and functional 3D structure from random coil. The process involves an intricate interplay between global geometry and local structure, and each protein seems to present special problems. We introduce CSAW (conditioned self-avoiding walk), a model of protein folding that combines the features of self-avoiding walk (SAW) and the Monte Carlo method. In this model, the unfolded protein chain is treated as a random coil described by SAW. Folding is induced by hydrophobic forces and other interactions, such as hydrogen bonding, which can be taken into account by imposing conditions on SAW. Conceptually, the mathematical basis is a generalized Langevin equation. To illustrate the flexibility and capabilities of the model, we consider several examples, including helix formation, elastic properties, and the transition in the folding of myoglobin. From the CSAW simulation and physical arguments, we find a universal elastic energy for proteins, which depends only on the radius of gyration $R_{g}$ and the residue number $N$. The elastic energy gives rise to scaling laws $R_{g}\sim N^{\nu}$ in different regions with exponents $\nu=3/5,3/7,2/5$, consistent with the observed unfolded stage, pre-globule, and molten globule, respectively. These results indicate that CSAW can serve as a theoretical laboratory to study universal principles in protein folding.
arxiv.org · scholarly article
Introduction to Protein Folding
Juami H. M. van Gils; Erik van Dijk; Ali May; Halima Mouhib; Jochem Bijlard; Annika Jacobsen; Isabel Houtkamp; K. Anton Feenstra; Sanne Abeln
2023 arXiv Open Access
While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction into Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. In this chapter we explore basic physical and chemical concepts required to understand protein folding. We introduce major (de)stabilising factors of folded protein structures such as the hydrophobic effect and backbone entropy. In addition, we consider different states along the folding pathway, as well as natively disordered proteins and aggregated protein states. In this chapter, an intuitive understanding is provided about the protein folding process, to prepare for the next chapter on the thermodynamics of protein folding. In particular, it is emphasized that protein folding is a stochastic process and that proteins unfold and refold in a dynamic equilibrium. The effect of temperature on the stability of the folded and unfolded states is also explained.
arxiv.org · scholarly article
Mass Balance Approximation of Unfolding Improves Potential-Like Methods for Protein Stability Predictions
Ivan Rossi; Guido Barducci; Tiziana Sanavia; Paola Turina; Emidio Capriotti; Piero Fariselli
2025 arXiv Open Access DOI: 10.1002/pro.70134
The prediction of protein stability changes following single-point mutations plays a pivotal role in computational biology, particularly in areas like drug discovery, enzyme reengineering, and genetic disease analysis. Although deep-learning strategies have pushed the field forward, their use in standard workflows remains limited due to resource demands. Conversely, potential-like methods are fast, intuitive, and efficient. Yet, these typically estimate Gibbs free energy shifts without considering the free-energy variations in the unfolded protein state, an omission that may breach mass balance and diminish accuracy. This study shows that incorporating a mass-balance correction (MBC) to account for the unfolded state significantly enhances these methods. While many machine learning models partially model this balance, our analysis suggests that a refined representation of the unfolded state may improve the predictive performance.
arxiv.org · scholarly article
Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization
Hongkai Fan; Qinjing Xie; Bo Ouyang; Yaonan Wang; Zhi Yan; Jiawen He; Zheng Fang
2026 arXiv Open Access
Multi-Agent Pathfinding (MAPF) plays a critical role in various domains. Traditional MAPF methods typically assume unit edge costs and single-timestep actions, which limit their applicability to real-world scenarios. MAPFR extends MAPF to handle non-unit costs with real-valued edge costs and continuous-time actions, but its geometric collision model leads to an unbounded state space that compromises solver efficiency. In this paper, we propose MAPFZ, a novel MAPF variant on graphs with non-unit integer costs that preserves a finite state space while offering improved realism over classical MAPF. To solve MAPFZ efficiently, we develop CBS-NIC, an enhanced Conflict-Based Search framework incorporating time-interval-based conflict detection and an improved Safe Interval Path Planning (SIPP) algorithm. Additionally, we propose Bayesian Optimization for Graph Design (BOGD), a discretization method for non-unit edge costs that balances efficiency and accuracy with a sub-linear regret bound. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in runtime and success rate across diverse benchmark scenarios.
arxiv.org · scholarly article
Higher-order spacings in the superposed spectra of random matrices with comparison to spacing ratios and application to complex systems
Sashmita Rout; Udaysinh T. Bhosale
2025 arXiv Open Access
Higher-order spacing statistics in the $m$ superposed spectra of circular random matrices of the same class are studied numerically. We conjecture that for given $m$ (or order $k$) and $\beta$, the sequence of modified Dyson index $\beta'(k)$ (or $\beta'(m)$) obtained using the sum of absolute differences between the cumulative distribution functions method (denoted as $D(\beta')$) is unique. Also, for a given $k$, the distribution tends to the corresponding $k$-th order Poisson statistics in the limit $m\rightarrow \infty$. The quantum chaotic kicked top model for various Hilbert space dimensions is studied, and it is found to satisfy our conjecture. This involves the numerical verification of $m=2$ case of COE results. Our result can be used as a tool for the characterization of a system and to determine the symmetry structure of the system without desymmetrization of the spectra. Additionally, the comparative study of the higher-order spacing and ratio distributions in both $m=1$ and $m=2$ cases of COE as well as GOE is performed within and across these ensembles numerically using the $D(\beta')$ method. This study is carried out both by varying the dimension and keeping the number of realizations constant, and vice-versa. The same asymptotic higher-order statistics are observed across COE and GOE in terms of a given spectral fluctuation measure. But, within a given ensemble of COE or GOE, the results of higher-order spacing and ratio distributions agree with each other only up to some lower $k$, and beyond that, they start deviating from each other. Further, the spectral fluctuations of the intermediate map of various dimensions are studied. Various important observations and discussions from the analysis of our extensive numerical computations are presented.
arxiv.org · scholarly article
History of Lattice Field Theory from a Statistical Perspective
Wolfgang Bietenholz
2024 arXiv Open Access
Researchers working in lattice field theory constitute an established community since the early 1990s, and around the same time the online open-access e-print repository arXiv was created. The fact that this field has a specific arXiv section, hep-lat, which is comprehensively used, provides a unique opportunity for a statistical study of its evolution over the last three decades. We present data for the number of entries, $E$, published papers, $P$, and citations, $C$, in total and separated by nations. We compare them to six other arXiv sections (hep-ph, hep-th, gr-qc, nucl-th, quant-ph, cond-mat) and to two socio-economic indices of the nations involved: the Gross Domestic Product (GDP) and the Education Index (EI). We present rankings, which are based either on the Hirsch Index H, or on the linear combination $\Sigma = E + P + 0.05 C$. We consider both extensive and intensive national statistics, i.e. absolute and relative to the population or to the GDP.
semanticscholar.org · scholarly article
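The two ranking measures named in this abstract can be illustrated with a minimal Python sketch. The function names and all sample numbers are hypothetical and are not taken from the paper; the Hirsch index follows its standard definition (the largest h such that h items have at least h citations each), and the score is the linear combination E + P + 0.05 C quoted in the abstract.

```python
from typing import Iterable


def hirsch_index(citations: Iterable[int]) -> int:
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have >= rank citations
        else:
            break
    return h


def sigma_score(entries: int, papers: int, total_citations: int) -> float:
    """Linear combination E + P + 0.05 * C quoted in the abstract."""
    return entries + papers + 0.05 * total_citations


if __name__ == "__main__":
    # Hypothetical per-paper citation counts for one nation (illustrative only).
    sample_citations = [10, 8, 5, 4, 3]
    print("H =", hirsch_index(sample_citations))
    # Hypothetical totals: E entries, P published papers, C citations.
    print("Sigma =", sigma_score(entries=100, papers=60,
                                 total_citations=sum(sample_citations)))
```

The 0.05 weight means that twenty citations count as much as one entry or one published paper in the combined score.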
Towards Optimal pH of the Skin and Topical Formulations: From the Current State of the Art to Tailored Products
M. Lukić; I. Pantelić; S. Savić
2021 Cosmetics 📖 Cited 404 times Open Access DOI: 10.3390/cosmetics8030069
Acidic pH of the skin surface has been recognized as a regulating factor for the maintenance of the stratum corneum homeostasis and barrier permeability. The most important functions of acidic pH seem to be related to the keratinocyte differentiation process, the formation and function of epidermal lipids and the corneocyte lipid envelope, the maintenance of the skin microbiome and, consequently, skin disturbances and diseases. As acknowledged extrinsic factors that affect skin pH, topically applied products could contribute to skin health maintenance via skin pH value control. The obtained knowledge on the skin's pH could be used in the formulation of more effective topical products, which would add to the development of the so-called products 'for skin health maintenance'. There is a high level of agreement that topical products should be acidified and possess pH in the range of 4 to 6. However, formulators, dermatologists and consumers would benefit from some more precise guidance concerning favorable product pH values and the selection of cosmetic ingredients which could be responsible for acidification, together with a more extensive understanding of the mechanisms underlying the process of skin acidification by topical products.
semanticscholar.org · scholarly article
Minimum Probabilistic Finite State Learning Problem on Finite Data Sets: Complexity, Solution and Approximations
E. Paulson; C. Griffin
2014 📖 Cited 2 times
arxiv.org · scholarly article
The space of local equivalence classes of mixed two-qubit states
Anthony Sudbery
2000 arXiv Open Access
This paper is withdrawn by the author. It is superseded by Makhlin's paper quant-ph/0002045.