Scholar iON
Academic Synthesis
The collected research underscores the diversity and depth of statistical methodologies applied across various domains, from large-scale computational frameworks to theoretical physics and systems biology. Mitra's "Spark-LLM-Eval" highlights innovations in scalable evaluation of large language models, emphasizing statistical rigor and efficiency through distributed computing, which is vital for practical applications in diverse fields. Majorana's historical exploration of statistical laws bridges physics and social sciences, reflecting ongoing debates about the philosophical implications of statistical mechanics. Meanwhile, Margolin et al.'s work on multivariate dependence in genetic networks and Minichini and Sciarrino's mutation models illustrate the application of advanced statistical techniques to uncover complex interactions in biological systems, demonstrating the capacity of statistical frameworks to resolve intricate biological phenomena. Collectively, these studies showcase the essential role of statistical coherence in advancing theoretical understanding and practical applications across disciplines.
Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets grow to hundreds of thousands or millions of samples. This scale is common when assessing model behavior across diverse domains or conducting comprehensive regression testing. We present Spark-LLM-Eval, a distributed evaluation framework built natively on Apache Spark. The system treats evaluation as a data-parallel problem, partitioning examples across executors and aggregating results with proper statistical accounting. Beyond raw throughput, we emphasize statistical rigor: every reported metric includes bootstrap confidence intervals, and model comparisons come with appropriate significance tests (paired t-tests, McNemar's test, or Wilcoxon signed-rank, depending on the metric type). The framework also addresses the cost problem inherent in LLM evaluation through content-addressable response caching backed by Delta Lake, which allows iterating on metric definitions without re-running inference. We describe the system architecture, the statistical methodology, and report benchmark results showing linear scaling with cluster size. The framework and all evaluation code are available as open source.
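The statistical pieces named in this abstract can be illustrated on a single machine. The sketch below is not the Spark-LLM-Eval API: the function names (bootstrap_ci, mcnemar_exact, cache_key) and their parameters are hypothetical, and the code only assumes per-example scores are available as a plain array. It shows a percentile bootstrap interval for a mean metric, an exact two-sided McNemar test on discordant pairs, and a content-addressed cache key of the kind that would let a cached response be reused when only the metric definition changes.

```python
import hashlib
import json
from math import comb

import numpy as np


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-example scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    # Each row of idx is one resample (with replacement) of the example indices.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(scores.mean()), float(lo), float(hi)


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b = examples only model A got right, c = examples only model B got right.
    Under the null hypothesis the discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cache_key(model_id, prompt, params):
    """Content-addressed key: identical model, prompt and parameters give the same hash."""
    payload = json.dumps(
        {"model": model_id, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

In a distributed setting the resampling would presumably be done per partition and merged, but that detail is not described in the abstract and is not attempted here.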
The article in question was written by Ettore Majorana, in a partly educational style, for a sociology journal; he gave up on publishing it, however, and threw it away. It appeared posthumously, thanks to Giovanni Gentile Jr. (the inventor of "parastatistics"), in "Scientia" 36 (1942) 58-66. It was not republished in Italian until the beginning of 2006, when we made some abridgements of it known through Italian newspapers and the journal "Fisica in Medicina". We do not know when it was written: perhaps in 1930. However, its central theme was still alive in Majorana's mind in 1934: on July 27, 1934, he wrote to G. Gentile Jr. that he expected that "soon it will be generally understood that science ceased to be a justification for the vulgar materialism". Here, in Part I, we present a suitable abridgement, edited by us, of Majorana's article, while in Part II we add a complete transcription of it. [Since the paper which appeared in "Scientia" contains some errors in the interpretation of Majorana's handwriting, the present versions have been very slightly "corrected" by us.] For the English translations of Majorana's paper, see Refs. [5,6] below. A more extended summary (in English as well as in Italian) can be found at the beginning of the present e-print. The interested reader can find all the known biographical documents, apart from the ones discovered during the last two years, in the book by E. Recami, "Il Caso Majorana: Epistolario, Testimonianze, Documenti" (Mondadori, Milan, 1987 and 1991; Di Renzo Editore, Rome, 2000 and 2002); and in the e-prints arXiv:physics/9810023v4 [physics.hist-ph]; arXiv:0708.2855v1 [physics.hist-ph]; and arXiv:0709.1183 [physics.hist-ph].
A critical task in systems biology is the identification of genes that interact to control cellular processes by transcriptional activation of a set of target genes. Many methods have been developed to use statistical correlations in high-throughput datasets to infer such interactions. However, cellular pathways are highly cooperative, often requiring the joint effect of many molecules, and few methods have been proposed to explicitly identify such higher-order interactions, partially due to the fact that the notion of multivariate statistical dependency itself remains imprecisely defined. We define the concept of dependence among multiple variables using maximum entropy techniques and introduce computational tests for their identification. Synthetic network results reveal that this procedure uncovers dependencies even in undersampled regimes, when the joint probability distribution cannot be reliably estimated. Analysis of microarray data from human B cells reveals that third-order statistics, but not second-order ones, uncover relationships between genes that interact in a pathway to cooperatively regulate a common set of targets.
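One common way to make "dependence beyond pairwise" concrete with maximum entropy is to compare the observed joint distribution with the maximum-entropy distribution that matches only its pairwise marginals. The sketch below illustrates that general idea for three discrete variables using iterative proportional fitting; it is not claimed to be the authors' exact definition or their statistical test, and the function names are hypothetical.

```python
import numpy as np


def pairwise_maxent(p, n_iter=200):
    """Max-entropy distribution over three discrete variables that matches
    every pairwise marginal of p, found by iterative proportional fitting."""
    p = np.asarray(p, dtype=float)
    q = np.full(p.shape, 1.0 / p.size)  # start from the uniform distribution
    for _ in range(n_iter):
        for axis in range(3):
            # Summing out one variable leaves the marginal of the other two.
            target = p.sum(axis=axis, keepdims=True)
            current = q.sum(axis=axis, keepdims=True)
            ratio = np.divide(
                target, current, out=np.zeros_like(target), where=current > 0
            )
            q = q * ratio
    return q


def third_order_dependence(p):
    """KL divergence (in nats) from the pairwise max-entropy model to p.

    Zero when pairwise statistics fully explain the joint distribution.
    """
    p = np.asarray(p, dtype=float)
    q = pairwise_maxent(p)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

For example, if z = x XOR y with x and y fair coins, every pair of variables looks independent, yet the measure equals ln 2, whereas for three independent variables it vanishes.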
A nucleotide sequence is identified, in the two (four) letter alphabet, by the labels of a vector state of an irreducible representation of U_q(sl(2)) (U_q(sl(2) + sl(2))), in the limit q -> 0. A master equation for the distribution function is written, where the intensity of the one-spin flip is assumed to depend on the variation of the labels of the state. In the two-letter approximation, the numerically computed equilibrium distribution for short sequences is nicely fitted by a Yule distribution, which is the observed distribution of the ranked frequencies of short oligonucleotides in DNA. The four-letter alphabet description, applied to the codons, is able to reproduce the form of the fitted rank-ordered usage frequency distribution.
We develop a theory of aggregation using statistical mechanical methods. An example of a complicated aggregation system with several levels of structures is peptide/protein self-assembly. The problem of protein aggregation is important for the understanding and treatment of neurodegenerative diseases and also for the development of bio-macromolecules as new materials. We write the effective Hamiltonian in terms of interaction energies between protein monomers, protein and solvent, as well as between protein filaments. The grand partition function can be expressed in terms of a Zimm-Bragg-like transfer matrix, which is calculated exactly and all thermodynamic properties can be obtained. We start with two-state and three-state descriptions of protein monomers using Potts models that can be generalized to include q-states, for which the exactly solvable feature of the model remains. We focus on n × N lattice systems, corresponding to the ordered structures observed in some real fibrils. We have obtained results on nucleation processes and phase diagrams, in which a protein property such as the sheet content of aggregates is expressed as a function of the number of proteins on the lattice and inter-protein or interfacial interaction energies. We have applied our methods to Aβ(1-40) and Curli fibrils and obtained results in good agreement with experiments.
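To see how a transfer matrix turns a chain partition function into a matrix power, the classic two-state Zimm-Bragg helix-coil model is a useful reference point. The sketch below implements only that textbook model, not the aggregation Hamiltonian of this abstract; the parameter names s (propagation weight) and sigma (nucleation weight) follow the usual convention, and the function names are hypothetical.

```python
import numpy as np


def zimm_bragg_partition(n, s, sigma):
    """Partition function of an n-residue chain in the two-state Zimm-Bragg model.

    Rows index the previous residue's state (coil, helix), columns the current one.
    A coil residue has weight 1, a helix residue following a helix has weight s,
    and a helix residue following a coil has weight sigma * s.
    """
    m = np.array([[1.0, sigma * s],
                  [1.0, s]])
    start = np.array([1.0, 0.0])  # the chain is treated as preceded by a coil
    return float(start @ np.linalg.matrix_power(m, n) @ np.ones(2))


def helix_fraction(n, s, sigma, eps=1e-6):
    """Mean fraction of helical residues, theta = (1/n) d ln Z / d ln s,
    estimated with a central finite difference in ln s."""
    up = np.log(zimm_bragg_partition(n, s * (1 + eps), sigma))
    down = np.log(zimm_bragg_partition(n, s * (1 - eps), sigma))
    return float((up - down) / (2 * eps * n))
```

For n = 2 the matrix product reproduces the direct enumeration 1 + 2*sigma*s + sigma*s**2, which is a convenient sanity check before moving to larger chains or more states.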
The evolution of the user's content remains an obstacle to accurate recommendation. For this reason, current research aims to design Recommender Systems (RS) that can continually adapt the information they provide to match the user's interests. This paper explains the problem by outlining the proposals that have been made in the research literature, together with their advantages and disadvantages.
We formulate option market making as a constrained, risk-sensitive control problem that unifies execution, hedging, and arbitrage-free implied-volatility surfaces inside a single learning loop. A fully differentiable eSSVI layer enforces static no-arbitrage conditions (butterfly and calendar) while the policy controls half-spreads, hedge intensity, and structured surface deformations (state-dependent rho-shift and psi-scale). Executions are intensity-driven and respond monotonically to spreads and relative mispricing; tail risk is shaped with a differentiable CVaR objective via the Rockafellar-Uryasev program. We provide theory for (i) grid-consistency and rates for butterfly/calendar surrogates, (ii) a primal-dual grounding of a learnable dual action acting as a state-dependent Lagrange multiplier, (iii) differentiable CVaR estimators with mixed pathwise and likelihood-ratio gradients and epi-convergence to the nonsmooth objective, (iv) an eSSVI wing-growth bound aligned with Lee's moment constraints, and (v) policy-gradient validity under smooth surrogates. In simulation (Heston fallback; ABIDES-ready), the agent attains positive adjusted P&L on most intraday segments while keeping calendar violations at numerical zero and butterfly violations at the numerical floor; ex-post tails remain realistic and can be tuned through the CVaR weight. The five control heads admit clear economic semantics and analytic sensitivities, yielding a white-box learner that unifies pricing consistency and execution control in a reproducible pipeline.
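The Rockafellar-Uryasev program expresses CVaR at level alpha as the minimum over a threshold t of t + E[(L - t)^+] / (1 - alpha), where L is the loss. The sketch below evaluates that program on an empirical loss sample. It is the plain nonsmooth version, not the differentiable estimator described in the abstract, and the function name is hypothetical.

```python
import numpy as np


def cvar_rockafellar_uryasev(losses, alpha=0.95):
    """Empirical CVaR via the Rockafellar-Uryasev minimization.

    Returns (cvar, var_threshold), where var_threshold is a minimizing t.
    """
    losses = np.asarray(losses, dtype=float)

    def objective(t):
        # t + E[(L - t)^+] / (1 - alpha), with the expectation taken over the sample
        return t + np.mean(np.maximum(losses - t, 0.0)) / (1.0 - alpha)

    # The objective is convex and piecewise linear in t with kinks at the sample
    # points, so a minimizer can always be found among the samples themselves.
    values = np.array([objective(t) for t in losses])
    best = int(np.argmin(values))
    return float(values[best]), float(losses[best])
```

With losses 1, 2, ..., 100 and alpha = 0.95 the result is 98, the average of the five largest losses, which matches the usual reading of CVaR as the mean of the worst (1 - alpha) tail. The scan over all samples is quadratic in the sample size and is only meant for small illustrations.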
This study investigated the dynamic connectivity patterns between EEG and fMRI modalities, contributing to our understanding of brain network interactions. By employing a comprehensive approach that integrated static and dynamic analyses of EEG-fMRI data, we were able to uncover distinct connectivity states and characterize their temporal fluctuations. The results revealed modular organization within the intrinsic connectivity networks (ICNs) of the brain, highlighting the significant roles of sensory systems and the default mode network. The use of a sliding window technique allowed us to assess how functional connectivity varies over time, further elucidating the transient nature of brain connectivity. Additionally, our findings align with previous literature, reinforcing the notion that cognitive states can be effectively identified through short-duration data, specifically within the 30-60 second timeframe. The established relationships between connectivity strength and cognitive processes, particularly during different visual states, underscore the relevance of our approach for future research into brain dynamics. Overall, this study not only enhances our understanding of the interplay between EEG and fMRI signals but also paves the way for further exploration into the neural correlates of cognitive functions and their implications in clinical settings. Future research should focus on refining these methodologies and exploring their applications in various cognitive and clinical contexts.
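The sliding window technique mentioned above amounts to computing a connectivity matrix on successive short segments of the time series. The sketch below does this with Pearson correlation; the array layout (time points by regions), the window and step sizes in samples, and the function name are assumptions made for illustration, and the later steps of the study (identifying ICNs and connectivity states) are not reproduced.

```python
import numpy as np


def sliding_window_connectivity(ts, window, step=1):
    """Windowed functional connectivity.

    ts: array of shape (n_time, n_regions).
    Returns an array of shape (n_windows, n_regions, n_regions) holding the
    Pearson correlation matrix of each window.
    """
    ts = np.asarray(ts, dtype=float)
    n_time = ts.shape[0]
    if window > n_time:
        raise ValueError("window is longer than the time series")
    starts = range(0, n_time - window + 1, step)
    # rowvar=False: columns are variables (regions), rows are observations (time).
    return np.stack(
        [np.corrcoef(ts[s:s + window], rowvar=False) for s in starts]
    )
```

The variation of each matrix entry across windows then gives a simple view of how the coupling between two regions fluctuates over time.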
We present a new explanation for the quantum eraser. The mathematical description of the traditional explanation requires quantum-superposition states. However, the phenomenon can be explained without quantum-superposition states by introducing unobservable potentials, which can be identified as an indefinite metric vector. In addition, a delayed-choice experiment can also be explained by the interference between the photons and the unobservable potentials, which appears as a seemingly unreal long-range correlation beyond causality.
The analysis of the USA 2001 income distribution shows that it can be described by at least two main components, which obey the generalized Tsallis statistics with different values of the q parameter. Theoretical calculations using the gas kinetics model with a distributed saving propensity factor and two ensembles reproduce the empirical data and provide further information on the structure of the distribution, which shows a clear stratification. This stratification is amenable to different interpretations, which are analyzed. The distribution function is invariant with the average individual income, which implies that the inequity of the distribution cannot be modified by increasing the total income.