UNiON Scholar
64 scholarly results for machine learning
Scholar iON Academic Synthesis
This collection of scholarly papers underscores the role of machine learning and artificial intelligence in advancing biotechnological and computational methods, particularly protein stability prediction and CRISPR guide RNA design. Rossi et al. show that a mass-balance correction for the unfolded state improves fast potential-like methods for protein stability prediction, while Abbaszadeh and Shahlai review how deep learning and explainable AI can improve the specificity and safety of CRISPR applications. Rostami et al. propose an ensemble model that improves the accuracy and generalizability of sgRNA design, which matters for clinical applications. Leiter et al.'s report places these advances in the broader AI landscape, noting that natural language processing remains the dominant category among top-cited papers while losing ground to computer vision and general machine learning. Together, these studies point to the value of AI for improving predictive accuracy and application safety, while also raising open questions about resource demands and model interpretability.
arxiv.org · scholarly article
Mass Balance Approximation of Unfolding Improves Potential-Like Methods for Protein Stability Predictions
Ivan Rossi; Guido Barducci; Tiziana Sanavia; Paola Turina; Emidio Capriotti; Piero Fariselli
2025 arXiv Open Access DOI: 10.1002/pro.70134
The prediction of protein stability changes following single-point mutations plays a pivotal role in computational biology, particularly in areas like drug discovery, enzyme reengineering, and genetic disease analysis. Although deep-learning strategies have pushed the field forward, their use in standard workflows remains limited due to resource demands. Conversely, potential-like methods are fast, intuitive, and efficient. Yet, these typically estimate Gibbs free energy shifts without considering the free-energy variations in the unfolded protein state, an omission that may breach mass balance and diminish accuracy. This study shows that incorporating a mass-balance correction (MBC) to account for the unfolded state significantly enhances these methods. While many machine learning models partially model this balance, our analysis suggests that a refined representation of the unfolded state may improve the predictive performance.
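The mass-balance argument in this abstract can be written out as a short identity. This is a sketch under an assumed sign convention (unfolding free energy, with $G_U$ and $G_F$ denoting the unfolded and folded states); the symbols are illustrative and are not taken from the paper.

```latex
\begin{align}
\Delta G &= G_U - G_F \\
\Delta\Delta G &= \Delta G^{\mathrm{mut}} - \Delta G^{\mathrm{wt}}
  = \left(G_U^{\mathrm{mut}} - G_U^{\mathrm{wt}}\right)
  - \left(G_F^{\mathrm{mut}} - G_F^{\mathrm{wt}}\right)
\end{align}
```

On this reading, a method that scores only the folded structure estimates the second bracket and implicitly sets the first to zero; the correction described in the abstract restores a term for the unfolded state.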
arxiv.org · scholarly article
CRISPR: Ensemble Model
Mohammad Rostami; Amin Ghariyazi; Hamed Dashti; Mohammad Hossein Rohban; Hamid R. Rabiee
2024 arXiv Open Access
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a gene editing technology that has revolutionized the fields of biology and medicine. However, one of the challenges of using CRISPR is predicting the on-target efficacy and off-target sensitivity of single-guide RNAs (sgRNAs). This is because most existing methods are trained on separate datasets with different genes and cells, which limits their generalizability. In this paper, we propose a novel ensemble learning method for sgRNA design that is accurate and generalizable. Our method combines the predictions of multiple machine learning models to produce a single, more robust prediction. This approach allows us to learn from a wider range of data, which improves the generalizability of our model. We evaluated our method on a benchmark dataset of sgRNA designs and found that it outperformed existing methods in terms of both accuracy and generalizability. Our results suggest that our method can be used to design sgRNAs with high sensitivity and specificity, even for new genes or cells. This could have important implications for the clinical use of CRISPR, as it would allow researchers to design more effective and safer treatments for a variety of diseases.
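The idea of combining several predictors into one score can be illustrated with a minimal sketch. The abstract does not say how the predictions are combined, so the weighted average below is an assumption, and the two toy "models" (`gc_content_model`, `no_poly_t_model`) are hypothetical stand-ins rather than trained sgRNA predictors.

```python
from typing import Callable, Sequence

# A "model" is any callable mapping a guide sequence to a score in [0, 1].
Model = Callable[[str], float]


def ensemble_score(models: Sequence[Model], weights: Sequence[float], guide: str) -> float:
    """Weighted average of per-model predictions for one guide sequence."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(guide) for m, w in zip(models, weights)) / total


# Hypothetical toy models, for illustration only.
def gc_content_model(guide: str) -> float:
    """Fraction of G/C bases in the guide."""
    return sum(base in "GC" for base in guide) / len(guide)


def no_poly_t_model(guide: str) -> float:
    """Returns 0.0 if the guide contains a run of four T bases, else 1.0."""
    return 0.0 if "TTTT" in guide else 1.0


if __name__ == "__main__":
    models = [gc_content_model, no_poly_t_model]
    print(ensemble_score(models, [1.0, 1.0], "GGCCAATT"))
```

In practice the weights would be fitted on held-out data, or replaced by a learned meta-model (stacking); uniform weights are used here only to keep the example short.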
arxiv.org · scholarly article
Artificial Intelligence for CRISPR Guide RNA Design: Explainable Models and Off-Target Safety
Alireza Abbaszadeh; Armita Shahlai
2025 arXiv Open Access
CRISPR-based genome editing has revolutionized biotechnology, yet optimizing guide RNA (gRNA) design for efficiency and safety remains a critical challenge. Recent advances (2020-2025) demonstrate that artificial intelligence (AI), especially deep learning, can markedly improve the prediction of gRNA on-target activity and identify off-target risks. In parallel, emerging explainable AI (XAI) techniques are beginning to illuminate the black-box nature of these models, offering insights into sequence features and genomic contexts that drive Cas enzyme performance. Here we review how state-of-the-art machine learning models are enhancing gRNA design for CRISPR systems, highlight strategies for interpreting model predictions, and discuss new developments in off-target prediction and safety assessment. We emphasize breakthroughs from top-tier journals that underscore an interdisciplinary convergence of AI and genome editing to enable more efficient, specific, and clinically viable CRISPR applications.
arxiv.org · scholarly article
NLLG Quarterly arXiv Report 09/24: What are the most influential current AI Papers?
Christoph Leiter; Jonas Belouadi; Yanran Chen; Ran Zhang; Daniil Larionov; Aida Kostikova; Steffen Eger
2024 arXiv Open Access
The NLLG (Natural Language Learning & Generation) arXiv reports assist in navigating the rapidly evolving landscape of NLP and AI research across cs.CL, cs.CV, cs.AI, and cs.LG categories. This fourth installment captures a transformative period in AI history - from January 1, 2023, following ChatGPT's debut, through September 30, 2024. Our analysis reveals substantial new developments in the field - with 45% of the top 40 most-cited papers being new entries since our last report eight months ago - and offers insights into emerging trends and major breakthroughs, such as novel multimodal architectures, including diffusion and state space models. Natural Language Processing (NLP; cs.CL) remains the dominant main category in the list of our top-40 papers, but its dominance is on the decline in favor of computer vision (cs.CV) and general machine learning (cs.LG). This report also presents novel findings on the integration of generative AI in academic writing, documenting its increasing adoption since 2022 while revealing an intriguing pattern: top-cited papers show notably fewer markers of AI-generated content compared to random samples. Furthermore, we track the evolution of AI-associated language, identifying declining trends in previously common indicators such as "delve".
arxiv.org · scholarly article
Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts
Md. Mehedi Hasan; Sk Tanzir Mehedi; Ziaur Rahman; Rafid Mostafiz; Md. Abir Hossain
2025 arXiv Open Access
This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture with FAISS-indexed SBERT embedding representations that capture the semantic meaning of prompts, combined with fine-tuned transformer classifiers, which are machine learning models specialized for distinguishing between benign and adversarial language inputs. It identifies adversarial prompts in both direct and obfuscated attack vectors. A core innovation is the classifier-retriever fusion module, which dynamically computes context-aware risk scores that estimate how likely a prompt is to be adversarial based on its content and context. The framework ensures multilingual resilience with a language-agnostic preprocessing layer. This component automatically translates non-English prompts into English for semantic evaluation, enabling consistent detection across over 100 languages. The system includes a HITL feedback loop, where decisions made by the automated system are reviewed by human experts for continual learning and rapid adaptation under adversarial pressure. Sentra-Guard maintains an evolving dual-labeled knowledge base of benign and malicious prompts, enhancing detection reliability and reducing false positives. Evaluation results show a 99.96% detection rate (AUC = 1.00, F1 = 1.00) and an attack success rate (ASR) of only 0.004%. This outperforms leading baselines such as LlamaGuard-2 (1.3%) and OpenAI Moderation (3.7%). Unlike black-box approaches, Sentra-Guard is transparent, fine-tunable, and compatible with diverse LLM backends. Its modular design supports scalable deployment in both commercial and open-source environments. The system establishes a new state-of-the-art in adversarial LLM defense.
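The classifier-retriever fusion described in this abstract can be sketched as follows. The abstract does not give the fusion rule, so the convex combination with weight `alpha` is an assumption; plain Python lists and brute-force cosine similarity stand in for the SBERT embeddings and FAISS index, and all function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_score(query: Sequence[float], malicious_bank: Sequence[Sequence[float]]) -> float:
    """Highest similarity to any known-malicious embedding, floored at 0."""
    if not malicious_bank:
        return 0.0
    return max(0.0, max(cosine(query, vec) for vec in malicious_bank))


def fused_risk(classifier_prob: float, query: Sequence[float],
               malicious_bank: Sequence[Sequence[float]], alpha: float = 0.5) -> float:
    """Assumed fusion rule: convex mix of classifier probability and retrieval score."""
    return alpha * classifier_prob + (1.0 - alpha) * retrieval_score(query, malicious_bank)


if __name__ == "__main__":
    bank = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of known-malicious prompts
    print(fused_risk(0.2, [1.0, 0.0], bank))
```

The point of the mix is that a prompt the classifier rates as low-risk can still be flagged when it sits very close to a known attack in embedding space.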
arxiv.org · scholarly article
Predicting Research Trends From Arxiv
Steffen Eger; Chao Li; Florian Netzer; Iryna Gurevych
2019 arXiv Open Access
We perform trend detection on two datasets of Arxiv papers, derived from its machine learning (cs.LG) and natural language processing (cs.CL) categories. Our approach is bottom-up: we first rank papers by their normalized citation counts, then group top-ranked papers into different categories based on the tasks that they pursue and the methods they use. We then analyze these resulting topics. We find that the dominating paradigm in cs.CL revolves around natural language generation problems and those in cs.LG revolve around reinforcement learning and adversarial principles. By extrapolation, we predict that these topics will remain lead problems/approaches in their fields in the short- and mid-term.
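The first step of this bottom-up approach, ranking papers by normalized citation counts, can be sketched as below. The abstract does not define the normalization, so a z-score within each publication period is assumed here, and the record layout `(paper_id, period, citations)` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Paper = Tuple[str, str, int]  # (paper_id, publication period, citation count)


def rank_by_normalized_citations(papers: Sequence[Paper]) -> List[Tuple[str, float]]:
    """Rank papers by the z-score of their citations within their own period."""
    by_period = defaultdict(list)
    for _, period, cites in papers:
        by_period[period].append(cites)

    # Per-period mean and population standard deviation.
    stats = {p: (mean(c), pstdev(c)) for p, c in by_period.items()}

    scored = []
    for paper_id, period, cites in papers:
        mu, sigma = stats[period]
        z = (cites - mu) / sigma if sigma > 0 else 0.0
        scored.append((paper_id, z))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [("a", "2017", 100), ("b", "2017", 0), ("c", "2018", 10), ("d", "2018", 0)]
    for paper_id, z in rank_by_normalized_citations(sample):
        print(paper_id, round(z, 2))
```

Normalizing within a period keeps older papers, which have had more time to accumulate citations, from crowding out recent ones: in the sample, "a" and "c" receive the same score despite a tenfold gap in raw counts.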
arxiv.org · scholarly article
Sig-SDEs model for quantitative finance
Imanol Perez Arribas; Cristopher Salvi; Lukasz Szpruch
2020 arXiv Open Access
Mathematical models, calibrated to data, have become ubiquitous in key decision processes in modern quantitative finance. In this work, we propose a novel framework for data-driven model selection by integrating a classical quantitative setup with a generative modelling approach. Leveraging the properties of the signature, a well-known path-transform from stochastic analysis that recently emerged as a leading machine learning technology for learning time-series data, we develop the Sig-SDE model. Sig-SDE provides a new perspective on neural SDEs and can be calibrated to exotic financial products that depend, in a non-linear way, on the whole trajectory of asset prices. Furthermore, our approach enables consistent calibration under the pricing measure $\mathbb Q$ and the real-world measure $\mathbb P$. Finally, we demonstrate the ability of Sig-SDE to simulate future possible market scenarios needed for computing risk profiles or hedging strategies. Importantly, this new model is underpinned by rigorous mathematical analysis that, under appropriate conditions, provides theoretical guarantees for convergence of the presented algorithms.
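For readers unfamiliar with the signature transform mentioned in this abstract, the sketch below computes its first two levels for a piecewise-linear path. It is not the Sig-SDE model itself; it only illustrates the path-transform, using Chen's identity to accumulate the iterated integrals segment by segment. The function name `signature_level2` is hypothetical.

```python
from typing import List, Sequence, Tuple


def signature_level2(path: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    Level 1 is the total increment. Level 2 holds the iterated integrals
    S[i][j] = integral of (X^i - X^i_0) dX^j, built up with Chen's identity:
    appending a linear segment with increment d adds s1[i]*d[j] + d[i]*d[j]/2.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(path, path[1:]):
        d = [c - p for p, c in zip(prev, curr)]
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(dim):
            s1[i] += d[i]
    return s1, s2


if __name__ == "__main__":
    # Move right by 1, then up by 1.
    level1, level2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
    print(level1)  # [1.0, 1.0]
    print(level2)  # [[0.5, 1.0], [0.0, 0.5]]
```

The off-diagonal entries differ (1.0 versus 0.0) because level 2 records the order in which the coordinates moved, which is the kind of whole-trajectory information that path-dependent products depend on.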
arxiv.org · scholarly article
High-performance automatic categorization and attribution of inventory catalogs
Anton Kolonin
2022 arXiv Open Access
Machine learning techniques for automatic text categorization are applied and adapted to the problem of inventory catalog data attribution. Different approaches are explored, and an optimal solution addressing the tradeoff between accuracy and performance is selected.
arxiv.org · scholarly article
ISLAND: In-Silico Prediction of Proteins Binding Affinity Using Sequence Descriptors
Wajid Arshad Abbasi; Fahad Ul Hassan; Adiba Yaseen; Fayyaz Ul Amir Afsar Minhas
2017 arXiv Open Access DOI: 10.1186/s13040-020-00231-w
Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures which limit their applicability to protein complexes with known structures. In this work, we explore sequence based protein binding affinity prediction using machine learning. Our paper highlights the fact that the generalization performance of even the state of the art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem. We also propose a novel sequence-only predictor of binding affinity called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its Python code are available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#island.
arxiv.org · scholarly article
WiCV 2019: The Sixth Women In Computer Vision Workshop
Irene Amerini; Elena Balashova; Sayna Ebrahimi; Kathryn Leonard; Arsha Nagrani; Amaia Salvador
2019 arXiv Open Access
In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant to increase the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in industry. WiCV is organized for the following reasons: to raise the visibility of female researchers, to increase collaborations between them, and to provide mentorship to female junior researchers in the field. In this paper, we present a report of trends over the past years, along with a summary of statistics regarding presenters, attendees, and sponsorship for the current workshop.