Scholar iON
Academic Synthesis
This collection of scholarly papers highlights diverse advancements in computational modeling across various scientific domains, emphasizing the integration of deep learning, constraint solving, and educational tools. The paper on protein folding introduces the CSAW model, which leverages statistical physics principles to simulate protein dynamics, revealing universal scaling laws applicable to different folding stages. Meanwhile, the dual-branch transformer framework for precipitation forecasting employs sophisticated machine learning techniques to enhance prediction accuracy over traditional systems, showcasing significant improvements in moderate to heavy rainfall forecasting. Additionally, the GrayStar application offers a novel, accessible platform for simulating and visualizing stellar atmospheres, underscoring the importance of user-friendly educational tools in facilitating real-time scientific exploration. Collectively, these studies demonstrate a commitment to leveraging computational innovations for solving complex problems and enhancing educational methodologies.
In this paper, we introduce an approach to the protein folding problem from the point of view of statistical physics. Protein folding is a stochastic process by which a polypeptide folds into its characteristic and functional 3D structure from random coil. The process involves an intricate interplay between global geometry and local structure, and each protein seems to present special problems. We introduce CSAW (conditioned self-avoiding walk), a model of protein folding that combines the features of self-avoiding walk (SAW) and the Monte Carlo method. In this model, the unfolded protein chain is treated as a random coil described by SAW. Folding is induced by hydrophobic forces and other interactions, such as hydrogen bonding, which can be taken into account by imposing conditions on SAW. Conceptually, the mathematical basis is a generalized Langevin equation. To illustrate the flexibility and capabilities of the model, we consider several examples, including helix formation, elastic properties, and the transition in the folding of myoglobin. From the CSAW simulation and physical arguments, we find a universal elastic energy for proteins, which depends only on the radius of gyration $R_{g}$ and the residue number $N$. The elastic energy gives rise to scaling laws $R_{g}\sim N^{\nu}$ in different regions with exponents $\nu=3/5,3/7,2/5$, consistent with the observed unfolded stage, pre-globule, and molten globule, respectively. These results indicate that CSAW can serve as a theoretical laboratory to study universal principles in protein folding.
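The ingredients named in this abstract (a self-avoiding walk, a Monte Carlo acceptance rule driven by hydrophobic contacts, and the radius of gyration $R_{g}$) can be illustrated with a small lattice toy. This is only a sketch under stated assumptions, not the CSAW model itself: the cubic lattice, the rejection sampler, the contact-counting energy, and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Unit steps on the cubic lattice (an assumption; not taken from the paper).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def sample_saw(n, rng, max_tries=100000):
    """Sample an n-site self-avoiding walk by simple rejection:
    grow a random walk and restart whenever it revisits a site."""
    for _ in range(max_tries):
        pos = (0, 0, 0)
        chain, visited = [pos], {pos}
        for _ in range(n - 1):
            dx, dy, dz = rng.choice(STEPS)
            pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if pos in visited:
                break  # overlap: discard this attempt
            visited.add(pos)
            chain.append(pos)
        else:
            return chain
    raise RuntimeError("no self-avoiding walk found; increase max_tries")

def radius_of_gyration(chain):
    """Root-mean-square distance of the sites from their centroid."""
    n = len(chain)
    c = [sum(p[k] for p in chain) / n for k in range(3)]
    return math.sqrt(sum(sum((p[k] - c[k]) ** 2 for k in range(3)) for p in chain) / n)

def hydrophobic_contacts(chain, hydrophobic):
    """Count non-bonded nearest-neighbour pairs of hydrophobic residues."""
    index = {p: i for i, p in enumerate(chain)}
    count = 0
    for i, p in enumerate(chain):
        if not hydrophobic[i]:
            continue
        for dx, dy, dz in STEPS:
            j = index.get((p[0] + dx, p[1] + dy, p[2] + dz))
            # j > i + 1 counts each pair once and skips bonded neighbours.
            if j is not None and j > i + 1 and hydrophobic[j]:
                count += 1
    return count

def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

rng = random.Random(0)
chain = sample_saw(12, rng)
print(len(chain), radius_of_gyration(chain))
```

In a fuller simulation, one would propose a new conformation, compute the energy change from the contact count (each contact lowering the energy), and keep the move only if `metropolis_accept` returns true; averaging `radius_of_gyration` over many chain lengths is how an exponent such as $\nu$ would be estimated.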
This work is devoted to constraint solving, motivated by the debugging of constraint logic programs à la GNU-Prolog. The paper focuses only on the constraints. In this framework, constraint solving amounts to domain reduction. A computation is formalized by a chaotic iteration. The computed result is described as a closure. This model is well suited to the design of debugging notions and tools, for example failure explanations or error diagnosis. In this paper we detail an application of the model to the explanation of a value withdrawal from a domain. Other works have already shown that such a notion of explanation is of interest beyond failure analysis.
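Domain reduction by chaotic iteration can be sketched as follows: reduction operators are applied repeatedly, in some fair order, until no domain changes, and the resulting common fixpoint is the closure. The sketch below is a minimal illustration under assumptions: the round-robin schedule, the `less_than` operator, and the `removed_by` log are hypothetical, and the log is a much cruder record than the notion of explanation the paper formalizes.

```python
def less_than(x, y):
    """Reduction operator for the constraint x < y (bounds reasoning)."""
    def op(d):
        if not d[x] or not d[y]:
            return {x: set(), y: set()}
        return {
            x: {a for a in d[x] if a < max(d[y])},
            y: {b for b in d[y] if b > min(d[x])},
        }
    return op

def chaotic_iteration(domains, operators):
    """Apply named reduction operators round-robin until a fixpoint.

    Returns the reduced domains and a log mapping each withdrawn
    (variable, value) pair to the name of the operator that removed it.
    """
    domains = {v: set(d) for v, d in domains.items()}
    removed_by = {}
    changed = True
    while changed:
        changed = False
        for name, op in operators:
            for var, kept in op(domains).items():
                reduced = domains[var] & kept  # operators may only shrink domains
                for value in domains[var] - reduced:
                    removed_by[(var, value)] = name
                if reduced != domains[var]:
                    domains[var] = reduced
                    changed = True
    return domains, removed_by

start = {"x": {1, 2, 3}, "y": {1, 2, 3}, "z": {1, 2, 3}}
ops = [("x<y", less_than("x", "y")), ("y<z", less_than("y", "z"))]
closure, removed_by = chaotic_iteration(start, ops)
print(closure)              # {'x': {1}, 'y': {2}, 'z': {3}}
print(removed_by[("x", 3)])  # x<y
```

Because every operator only shrinks domains and the domains are finite, the loop terminates; any other fair ordering of the operators would reach the same closure.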
Accurate medium-range precipitation forecasting is crucial for hydrometeorological risk management and disaster mitigation, yet remains challenging for current numerical weather prediction (NWP) systems. Traditional ensemble systems such as the Global Ensemble Forecast System (GEFS) struggle to maintain high skill, especially for moderate and heavy rainfall at extended lead times. This study develops a deep learning-based ensemble framework for multi-step precipitation prediction through joint modeling of a comprehensive set of atmospheric variables. The model is trained on ERA5 reanalysis data at 0.25$^{\circ}$ spatial resolution, with precipitation labels from NASA's Integrated Multi-satellite Retrievals for Global Precipitation Measurement (GPM) constellation (IMERG), incorporating 57 input variables, including upper-air and surface predictors. The architecture employs a patch-based Swin Transformer backbone with periodic convolutions to handle longitudinal continuity and integrates time and noise embeddings through conditional layer normalization. A dual-branch decoder predicts total precipitation and other variables, with targeted freezing of encoder-decoder pathways for specialized training. Training minimizes a hybrid loss combining the Continuous Ranked Probability Score (CRPS) and weighted log1p mean squared error (log1pMSE), balancing probabilistic accuracy and magnitude fidelity. During inference, the model ingests real-time Global Forecast System (GFS) initial conditions to generate 15-day forecasts autoregressively. Evaluation against GEFS using IMERG data demonstrates higher Critical Success Index (CSI) scores at precipitation thresholds of 0.1 mm, 1 mm, 10 mm, and 20 mm, highlighting improved performance for moderate to heavy rainfall.
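The loss and the verification metric mentioned above can be made concrete with a small sketch. It uses the standard ensemble estimator of CRPS, a weighted mean squared error on `log1p`-transformed values, and the usual contingency-table definition of CSI. The mixing weight `lam`, the per-sample `weights`, and all function names are assumptions introduced here; the abstract does not specify how the two loss terms are weighted or combined.

```python
import math

def crps_ensemble(members, obs):
    """Ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - spread_within

def weighted_log1p_mse(pred, obs, weights):
    """Weighted mean squared error between log1p(pred) and log1p(obs)."""
    total = sum(w * (math.log1p(p) - math.log1p(o)) ** 2
                for p, o, w in zip(pred, obs, weights))
    return total / sum(weights)

def hybrid_loss(ensemble, obs, weights, lam=1.0):
    """CRPS averaged over points plus lam times the log1p MSE of the ensemble mean.

    ensemble: list of members, each a list of values per grid point.
    lam is a hypothetical mixing weight, not a value from the paper.
    """
    n = len(obs)
    crps = sum(crps_ensemble([mem[i] for mem in ensemble], obs[i]) for i in range(n)) / n
    mean = [sum(mem[i] for mem in ensemble) / len(ensemble) for i in range(n)]
    return crps + lam * weighted_log1p_mse(mean, obs, weights)

def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events >= threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        if f >= threshold and o >= threshold:
            hits += 1
        elif o >= threshold:
            misses += 1
        elif f >= threshold:
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")

print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
print(critical_success_index([0.0, 5.0, 12.0, 0.5], [0.0, 2.0, 15.0, 3.0], 1.0))
```

The `log1p` transform compresses large rainfall amounts, so errors on heavy precipitation do not dominate the squared-error term, while the CRPS term rewards ensembles whose spread matches the observation.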
GrayStar is a stellar atmospheric and spectral line modelling, post-processing, and visualisation code, suitable for classroom demonstrations and laboratory-style assignments, that has been developed in Java and deployed in JavaScript and HTML. The only software needed to compute models and post-processed observables, and to visualise the resulting atmospheric structure and observables, is a common Web browser. Therefore, the code will run on any common PC or related X86 (-64) computer of the type that typically serves classroom data projectors, is found in undergraduate computer laboratories, or that students themselves own, including those with highly portable form-factors such as net-books and tablets. The user requires no experience with compiling source code, reading data files, or using plotting packages. More advanced students can view the JavaScript source code using the developer tools provided by common Web browsers. The code is based on the approximate gray atmospheric solution and runs quickly enough on current common PCs to provide near-instantaneous results, allowing for real time exploration of parameter space. I describe the computational strategy and methodology as necessitated by Java and JavaScript. In an accompanying paper, I describe the user interface and its inputs and outputs and suggest specific pedagogical applications and projects. I have made the application itself, and the HTML, CSS, JavaScript, and Java source files available to the community. The Web application and source files may be found at www.ap.smu.ca/~ishort/GrayStar.
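For readers unfamiliar with the gray solution, the sketch below evaluates the textbook temperature structure in the Eddington approximation, $T^{4}(\tau) = \frac{3}{4} T_{\rm eff}^{4} (\tau + 2/3)$. This is an illustration only: GrayStar itself is written in Java and JavaScript, and the abstract does not say which form of the gray solution it adopts, so the Eddington form, the Python language, and the function names here are assumptions.

```python
import math

def gray_temperature(t_eff, tau):
    """Temperature at optical depth tau for a gray atmosphere
    in the Eddington approximation: T^4 = (3/4) T_eff^4 (tau + 2/3)."""
    return t_eff * (0.75 * (tau + 2.0 / 3.0)) ** 0.25

def log_tau_grid(log_min, log_max, n):
    """n optical depths spaced evenly in log10(tau) (n >= 2)."""
    step = (log_max - log_min) / (n - 1)
    return [10.0 ** (log_min + i * step) for i in range(n)]

t_eff = 5777.0  # roughly solar effective temperature, in kelvin
for tau in log_tau_grid(-4.0, 1.0, 6):
    print(f"tau = {tau:10.4e}   T = {gray_temperature(t_eff, tau):8.1f} K")
```

In this approximation the temperature equals the effective temperature at $\tau = 2/3$, and the surface temperature at $\tau = 0$ is $T_{\rm eff}/2^{1/4}$, about 84% of $T_{\rm eff}$. Because the whole structure follows from a closed-form expression, a model of this kind can be recomputed essentially instantly as parameters change.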
Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving detection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol outlines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.