Scholar iON
Academic Synthesis
The selected scholarly papers collectively explore diverse applications and implications of machine learning and related technologies across various domains. Haque and Ginsparg (2009) investigate the positional effects on citation and readership on arXiv, highlighting visibility and self-promotion's impact on academic influence, with machine learning aiding in correlating early readership with long-term citations. Olde Scheper (2023) introduces Criticality Analysis, a bio-inspired method for nonlinear data representation that facilitates optimal data encoding relevant for machine learning through scale-free representations. Meanwhile, Oh et al. (2025) leverage continuous sensing technologies in the ETRI Lifelog Dataset to enhance understanding of human experiences, indicating potential applications in predicting sleep quality and stress via machine learning models. Lastly, Ismail and Buyya (2023) discuss the evolving Metaverse landscape, outlining its potential to seamlessly integrate real and virtual experiences, with machine learning playing a crucial role in realizing scalable and real-time virtual environments. These studies collectively underscore the transformative potential of machine learning in enhancing data processing, predictive modeling, and interactive virtual experiences.
arXiv.org mediates contact with the literature for entire scholarly communities, both through provision of archival access and through daily email and web announcements of new materials, potentially many screenlengths long. We confirm and extend a surprising correlation between article position in these initial announcements, ordered by submission time, and later citation impact, due primarily to intentional "self-promotion" on the part of authors. A pure "visibility" effect was also present: the subset of articles accidentally in early positions fared measurably better in the long-term citation record than those lower down. Astrophysics articles announced in position 1, for example, overall received a median number of citations 83\% higher, while those there accidentally had a 44\% visibility boost. For two large subcommunities of theoretical high energy physics, hep-th and hep-ph articles announced in position 1 had median numbers of citations 50\% and 100\% larger than for positions 5--15, and the subsets there accidentally had visibility boosts of 38\% and 71\%.
We also consider the positional effects on early readership. The median numbers of early full text downloads for astro-ph, hep-th, and hep-ph articles announced in position 1 were 82\%, 61\%, and 58\% higher than for lower positions, respectively, and those there accidentally had medians visibility-boosted by 53\%, 44\%, and 46\%. Finally, we correlate a variety of readership features with long-term citations, using machine learning methods, thereby extending previous results on the predictive power of early readership in a broader context. We conclude with some observations on impact metrics and dangers of recommender mechanisms.
The representation of arbitrary data in a biological system is one of the most elusive elements of biological information processing. The often logarithmic nature of information in amplitude and frequency presented to biosystems prevents simple encapsulation of the information contained in the input. Criticality Analysis (CA) is a bio-inspired method of information representation within a controlled self-organised critical system that allows scale-free representation. This is based on the concept of a reservoir of dynamic behaviour in which self-similar data will create dynamic nonlinear representations. This unique projection of data preserves the similarity of data within a multidimensional neighbourhood. The input can be reduced dimensionally to a projection output that retains the features of the overall data, yet has much simpler dynamic response. The method depends only on the rate control of chaos applied to the underlying controlled models, that allows the encoding of arbitrary data, and promises optimal encoding of data given biological relevant networks of oscillators. The CA method allows for a biologically relevant encoding mechanism of arbitrary input to biosystems, creating a suitable model for information processing in varying complexity of organisms and scale-free data representation for machine learning.
Improving human health and well-being requires an accurate and effective understanding of an individual's physical and mental state throughout daily life. To support this goal, we utilized smartphones, smartwatches, and sleep sensors to collect data passively and continuously for 24 hours a day, with minimal interference to participants' usual behavior, enabling us to gather quantitative data on daily behaviors and sleep activities across multiple days. Additionally, we gathered subjective self-reports of participants' fatigue, stress, and sleep quality through surveys conducted immediately before and after sleep. This comprehensive lifelog dataset is expected to provide a foundational resource for exploring meaningful insights into human daily life and lifestyle patterns, and a portion of the data has been anonymized and made publicly available for further research. In this paper, we introduce the ETRI Lifelog Dataset 2024, detailing its structure and presenting potential applications, such as using machine learning models to predict sleep quality and stress.
With the emergence of Cloud computing, Internet of Things-enabled Human-Computer Interfaces, Generative Artificial Intelligence, and high-accurate Machine and Deep-learning recognition and predictive models, along with the Post Covid-19 proliferation of social networking, and remote communications, the Metaverse gained a lot of popularity. Metaverse has the prospective to extend the physical world using virtual and augmented reality so the users can interact seamlessly with the real and virtual worlds using avatars and holograms. It has the potential to impact people in the way they interact on social media, collaborate in their work, perform marketing and business, teach, learn, and even access personalized healthcare. Several works in the literature examine Metaverse in terms of hardware wearable devices, and virtual reality gaming applications. However, the requirements of realizing the Metaverse in realtime and at a large-scale need yet to be examined for the technology to be usable. To address this limitation, this paper presents the temporal evolution of Metaverse definitions and captures its evolving requirements. Consequently, we provide insights into Metaverse requirements. In addition to enabling technologies, we lay out architectural elements for scalable, reliable, and efficient Metaverse systems, and a classification of existing Metaverse applications along with proposing required future research directions.
The history of Artificial Intelligence (AI) is a narrative of waves -- rising optimism followed by crashing disappointments. AI winters, such as the early 2000s, are often remembered as barren periods of innovation. This paper argues that such a perspective overlooks a crucial wave of AI that seems to be forgotten: the rise of the Semantic Web, which is based on knowledge representation, logic, and reasoning, and its interplay with intelligent Software Agents. Fast forward to today, and ChatGPT has reignited AI enthusiasm, built on deep learning and advanced neural models. However, before Large Language Models (LLMs) dominated the conversation, another ambitious vision emerged -- one where AI-driven Software Agents autonomously served Web users based on a structured, machine-interpretable Web. The Semantic Web aimed to transform the World Wide Web into an ecosystem where AI could reason, understand, and act. Between 2000 and 2010, this vision sparked a significant research boom, only to fade into obscurity as AI's mainstream narrative shifted elsewhere. Today, as LLMs edge toward autonomous execution, we revisit this overlooked wave. By analyzing its academic impact through bibliometric data, we highlight the Semantic Web's role in AI history and its untapped potential for modern Software Agent development. Recognizing this forgotten chapter not only deepens our understanding of AI's cyclical evolution but also offers key insights for integrating emerging technologies.
This study investigates the adoption and effectiveness of AI-based anomaly detection in cross-provider electronic health record (EHR) environments. It aims to (1) identify the organisational and digital capabilities required for successful implementation and (2) evaluate the performance and interpretability of lightweight anomaly detection approaches using contextual audit data. A semi-systematic scoping synthesis is conducted to derive a four-pillar readiness framework covering governance, infrastructure/interoperability, workforce, and AI integration, operationalised as a 10-item checklist with measurable indicators. This is complemented by a simulation of cross-provider audit logs incorporating contextual features such as provider mismatch, time of access, days since discharge, session duration, and access frequency. A rule-based approach is benchmarked against Isolation Forest, with SHAP used to explain model behaviour. Results show that rule-based methods achieve high recall but generate higher alert volumes, while Isolation Forest reduces alert burden at the cost of lower sensitivity. SHAP analysis highlights provider mismatch and off-hours access as dominant anomaly drivers. The study proposes a staged deployment strategy combining rules for coverage and machine learning for prioritisation, supported by explainability and continuous monitoring. The findings contribute a practical readiness framework and empirical insights to guide the implementation of AI-based anomaly detection in multi-provider healthcare environments.
We apply techniques in natural language processing, computational linguistics, and machine-learning to investigate papers in hep-th and four related sections of the arXiv: hep-ph, hep-lat, gr-qc, and math-ph. All of the titles of papers in each of these sections, from the inception of the arXiv until the end of 2017, are extracted and treated as a corpus which we use to train the neural network Word2Vec. A comparative study of common n-grams, linear syntactical identities, word cloud and word similarities is carried out. We find notable scientific and sociological differences between the fields. In conjunction with support vector machines, we also show that the syntactic structure of the titles in different sub-fields of high energy and mathematical physics are sufficiently different that a neural network can perform a binary classification of formal versus phenomenological sections with 87.1% accuracy, and can perform a finer five-fold classification across all sections with 65.1% accuracy.
Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving de-tection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol out-lines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.
Water autoionization plays a critical role in determining pH and properties of various chemical and biological processes occurring in the water mediated environment. The strikingly unsymmetrical potential energy surface of the dissociation process poses a great challenge to the mechanistic study. Here, we demonstrate that reliable sampling of the ionization path is accessible through nanosecond timescale metadynamics simulation enhanced by machine learning of the neural network potentials with ab initio precision, which is proved by quantitatively reproduced water equilibrium constant (p$K_\mathrm{w}$=14.14) and ionization rate constant (1.566$\times10^{-3}$ s$^{-1}$). Statistical analysis unveils the asynchronous character of the concerted triple proton transfer process. Based on conditional ensemble average calculations, we propose a dual-presolvation mechanism, which suggests that a pair of hypercoordinated and undercoordinated waters bridged by one \ce{H2O} cooperatively constitutes the initiation environment for autoionization, and contributes majorly to the local electric field fluctuation to promote water dissociation.