Accepted Papers
Provisional List of Accepted Papers
Main Track
Slimane Larabi Can Mental Imagery Improve the Thinking Capabilities of AI Systems?
Alexander Schneider; Ute Schmid Aleph Plays Codenames: Flexible Semantic Grouping by Humans, LLMs, and ILP
A challenging problem for the semantic grouping of nouns is that this task depends on context. In this paper, we explore different approaches to identifying semantic relationships between nouns by providing a single category hint in the context of the board game "Codenames". The Inductive Logic Programming (ILP) approach, implemented using the Aleph framework, leverages GermaNet to structure background knowledge and generate logical hypotheses. More specifically, we compare the ILP approach with hints generated by Large Language Models (LLMs) and by humans. The evaluation, based on precision, error distribution, and improvement over random guessing, indicates that human players outperform all computational methods in both precision and strategic hint generation. Among multiple ILP configurations, the merged background knowledge showed the best overall performance, while ChatGPT achieved competitive results without explicit semantic structures. Despite its slower runtime and occasional inability to provide solutions, ILP demonstrated strong interpretability and lower resource consumption compared to LLMs, emphasizing its potential as a robust and transparent alternative for specific semantic reasoning tasks. The findings underscore the long-lasting value of symbolic AI methods in scenarios that require explainable and verifiable outputs.
Tony Ribeiro; Maxime Folschette; Morgan Magnin; Katsumi Inoue; Tuan Nguyen; Kotaro Okazaki; Kuo-Yen Lo; Jérémie Poschmann; Antoine Roquilly Counterfactual Explanations Under Learning From Interpretation Transition
Counterfactual explanations are instrumental in helping humans gain insight into the decision-making processes of artificial intelligence systems by illustrating the effects of altering specific input variables. By presenting hypothetical scenarios, they foster transparency in artificial intelligence, enabling us to comprehend its operations more deeply and cultivate confidence in its dependability. This transparency is essential for the development of ethical artificial intelligence systems that are both equitable and accountable. In this paper, we expand upon the Learning From Interpretation Transitions framework by proposing a theoretical modeling of counterfactual explanations for dynamic multi-valued logic programs. Furthermore, we introduce an efficient algorithm called CELOS that leverages logic rules properties to compute all minimal counterfactual explanations. We show through theoretical results the correctness of our approaches. Practical evaluation is performed on benchmarks from biological literature and synthetic instances.
Rojina Panta; Vedant Khandelwal; Celeste Veronese; Amit Sheth; Daniele Meli; Forest Agostinelli Inductive logic programming for heuristic search
Pathfinding problems are found throughout computing, chemistry, mathematics, and robotics. Solving pathfinding problems is typically achieved through heuristic search, which is guided by a heuristic function that can be learned using deep neural networks. However, since deep neural networks are typically not explainable, the extraction of new knowledge from these learned heuristic functions is cumbersome. On the other hand, it has yet to be shown how heuristic functions represented as logic programs, which have been shown to be explainable, can be learned. In this work, we present an algorithm to learn heuristic functions represented as logic programs using dynamic programming and inductive logic programming. Furthermore, we build on dynamic programming concepts to improve the learned logic programs by reusing predicates learned for solving simpler pathfinding problem instances to solve more complex instances. We use the 8-puzzle to demonstrate the effectiveness of our algorithm.
Dany Varghese; Alireza Tamaddoni-Nezhad Symbolic Regression via Inductive Logic Programming: An Explainable Alternative to Black-Box Models
This paper introduces a symbolic regression framework based on Inductive Logic Programming (ILP) to address the growing demand for interpretable machine learning models in sensitive and regulation-intensive domains. Unlike black-box regressors such as ensemble methods or neural networks, our approach learns human-readable rules that explain how input features relate to output predictions using logic-based representations. We leverage PyGol, a novel ILP system, to perform multi-class symbolic regression through a one-vs-rest strategy, where continuous targets are either preserved or discretised into symbolic labels. Each label is represented by a distinct set of logic rules defined over feature intervals, facilitating transparent and modular reasoning. A Bayesian-inspired scoring mechanism extends inference to noisy or partially matching instances, enhancing robustness. Through empirical evaluations on benchmark regression datasets, we demonstrate that PyGol achieves competitive predictive performance compared to state-of-the-art regressors while offering superior transparency and traceability. We further present sample learned rules and interpret their behaviour, highlighting the system's explanatory potential. This work affirms the value of ILP-based symbolic models as viable alternatives to black-box approaches, particularly where accountability and decision interpretability are paramount.
Lun Ai Boolean Matrix Logic Programming on the GPU
Traditional logic programming relies on symbolic computation on the CPU, which can limit performance for large-scale inference tasks. Recent advances in GPU hardware enable high-throughput matrix operations, motivating a shift toward parallel logic inference. Boolean Matrix Logic Programming (BMLP) introduces a novel approach to Datalog query evaluation using Boolean matrix algebra, well-suited to GPU acceleration. Building on this paradigm, we present two GPU-accelerated BMLP algorithms for bottom-up inference over linear dyadic recursive Datalog programs. We further extend its theoretical framework to support general linear recursion with binary predicates. Empirical evaluations on reachability queries in large directed graphs and the Freebase 15K dataset show that our methods achieve speedups of 1–4 orders of magnitude over state-of-the-art systems. These results demonstrate that Boolean matrix-based reasoning can significantly advance the scalability and efficiency of logic programming on modern hardware.
Bhavan Vasu; Giuseppe Raffa; Prasad Tadepalli Local-to-Global Logical Explanations for Deep Vision Models
While deep neural networks are extremely effective at classifying images, they remain opaque and hard to interpret. We introduce local and global explanation methods for black-box models that generate explanations in terms of human-recognizable primitive concepts. Both the local explanations for a single image and the global explanations for a set of images are cast as logical formulas in monotone disjunctive normal form (MDNF), whose satisfaction guarantees that the model yields a high score on a given class. We also present an algorithm for explaining the classification of examples into multiple classes in the form of a monotone explanation list over primitive concepts. Despite their simplicity and interpretability, we show that the explanations maintain high fidelity and accuracy with respect to the black-box models they seek to explain on challenging vision datasets.
Elisabetta Gentili; Alice Bizzarri; Damiano Azzolini; Fabrizio Riguzzi The Gradient Semiring for Probabilistic Answer Set Programming and Its Application to Parameter Learning
Semirings have been widely used in recent years to describe in an abstract way various tasks in Statistical Relational Artificial Intelligence. Here, we focus on the Probabilistic Answer Set Programming (PASP) formalism and propose an algebraic characterization of parameter learning, where the goal is to tune the probabilities of a program to model a set of examples. We introduce the gradient semiring for PASP and implement it on top of a state-of-the-art solver. A preliminary experimental evaluation shows the validity of our approach.
Shraddha Surana; Ashwin Srinivasan; Michael Bain Structured Program Synthesis using LLMs: Results and Insights from the IPARC Challenge
The IPARC Challenge, inspired by ARC, provides controlled program synthesis tasks over synthetic images to evaluate automatic program construction, focusing on sequence, selection, and iteration. This set of 600 tasks has resisted automated solutions. This paper presents a structured inductive programming approach with LLMs that successfully solves tasks across all IPARC categories. The controlled nature of IPARC reveals insights into LLM-based code generation, including the importance of prior structuring, LLMs' ability to aid structuring (requiring human refinement), the need to freeze correct code, the efficiency of code reuse, and how LLM-generated code can spark human creativity. These findings suggest valuable mechanisms for human-LLM collaboration in tackling complex program synthesis.
Moitree Basu Specification of Declarative Privacy Constraints in Artificial Intelligence
The increasing complexity of systems has resulted in the need for interpretable systems that protect the users' privacy, without sacrificing utility. To address this concern, we propose a novel approach that can model complex systems and specify privacy requirements for them. Furthermore, we propose a synthesizer that automatically produces solutions meeting those requirements. This synthesizer automatically chooses among privacy-preserving techniques and optimizes the parameters associated with the chosen technique. We demonstrate the application of our specification language in the context of a medical problem where useful statistics are computed on sensitive information in a cost-effective, interpretable, and privacy-preserving way.
Yun-Ze Li; Wang-Zhou Dai; Hao Meng; Xia Nu; Yi-Fei Xiao; Zhe-Li Hu; Yu-Cong He Enhancing LLM-Based Knowledge Retrieval by Automatic Workflow Induction
Modern Large Language Models (LLMs) have achieved great success in complex real-world tasks, such as question-answering, code generation, information retrieval, planning, etc. Agentic workflows further enhance the capability of LLMs in domains that lack enough data or computing resources for performing full-fledged pre-training, and empower the LLM with the ability to use tools. However, manually designing an optimal workflow is difficult for end users and requires substantial human effort and domain knowledge. Therefore, we propose a novel approach to learn abstractions and induce agentic workflows from a handful of training examples, i.e., the interaction traces between the LLM and the task environment (e.g., human users), such as dialogue logs in conversational systems. Specifically, our approach leverages the successful traces to derive structured workflows without handcrafted knowledge bases. We formalize the agentic workflow learning task as a logic rule induction problem, which allows complex workflows to be learned. Instead of asking users for a set of pre-defined primitive predicates, our approach abstracts the primitive logic from the training execution traces. Experiments on conversational clarification demonstrate significant improvements over LLM-based autonomous agents and a Monte Carlo Tree Search workflow generation method, achieving higher success rates and robustness with limited dialogue interactions.
Stephen Muggleton ReDuce: Linear-time Inductive Compression using Greedy Folding
This paper describes a new Inductive Logic Programming system, ReDuce, which is a linear-time variant of a key element of the author’s Deeplog system. ReDuce differs from Deeplog by replacing the Meta-Interpretive Learning hypothesis generator with a novel inductive variant of an existing near-optimal grammar-based greedy text compression algorithm. Having identified minimal-length input/output certificate sequences for the examples, ReDuce iteratively applies star (repeat) and fold operations in order to generalise the examples and invent referenced sub-predicates. This compaction process is guaranteed by construction to converge, and results in a deeply structured recursive program which both compacts and generalises the examples. The approach is proved to run in time and space which are linear with respect to the certificate sequence lengths. The identified regular expressions are used to generate an H^2_2 logic program from the learned folded certificates. Experiments indicate hypothesis construction time to be, in some cases, more than ten-thousand-fold faster than corresponding examples in the author’s previously published Deeplog paper. As with Deeplog, an existing Bayesian sample complexity result shows low-generality hypotheses can be learned with high expected accuracy from small numbers of positive examples. In further work the author aims to explore identification of noisy examples using a modified text compression approach.
Zora Wurm; Kilian Rückschloß; Felix Weitkämper From probability to causality in probabilistic logic programming
Probabilistic logic programming is a formalism of statistical relational artificial intelligence that supports causal queries, which do not reduce to computations in an associated probability distribution. This work studies interventions on a system, such as policy actions or other changes not accounted for by the system itself. It turns out that the effect of interventions is uniquely determined by the distribution and the order of cause and effect. When the structure of a probabilistic logic program is inferred from data, only probabilistic information is taken into account. Generally, one probability distribution can be compatible with several different causal orders. This raises the question of when the causal order, and thus the correct interventional reasoning, is uniquely determined. In the setting of causal Bayesian networks, this question has motivated the field of causal structure discovery. In this contribution, we exploit the relationship between acyclic probabilistic logic programs and Bayesian networks to derive conditions under which the probabilistic information encoded in a probabilistic logic program suffices to determine the causal order. One of the key motivations for probabilistic logic programming is its support for compact relational specifications. In the context of learning, these specifications convey that some parts of the model must behave the same way. We incorporate such constraints by taking into account prescribed sets of causal symmetries derived from the underlying relational vocabulary. Overall, our approach verifies the interventional reasoning given by a program, whose structure has been determined from data.
Simon Flügel; Martin Glauer; Till Mossakowski; Fabian Neuhaus ChemLog: Making MSOL Viable for Ontological Classification and Learning
Despite its prevalence, in many domains OWL is not expressive enough to define ontology classes. In this paper, we present an approach that allows the use of monadic second-order formalisations for ontology classification. As a case study, we have applied our approach to 14 peptide-related classes from the chemistry ontology ChEBI. For these classes, a monadic second-order logic formalisation has been developed and applied both to ChEBI and to 119 million molecules from the chemistry database PubChem. While this logical approach alone is limited to classification for the specified classes (in our case, (sub)classes of peptides), transformer deep learning models scale classification to the whole of the ChEBI ontology. We show that when using the classifications obtained by the logical approach as training data, the performance of the deep learning models can be significantly enhanced.
Sopam Dasgupta; Sadaf MD Halim; Joaquín Arias; Elmer Salazar; Gopal Gupta P2C: Path to Counterfactuals
Machine-learning models are increasingly driving decisions in high-stakes settings such as finance, law, and hiring, highlighting the need for transparency. However, the key challenge is to balance transparency (clarifying 'why' a decision was made) with recourse: providing actionable steps on 'how' to achieve a favourable outcome from an unfavourable one. Counterfactual explanations reveal 'why' an undesired outcome occurred and 'how' to reverse it through targeted feature changes (interventions). Current counterfactual approaches have limitations: 1) they often ignore causal dependencies between features, and 2) they typically assume all interventions can happen simultaneously, an unrealistic assumption in practical scenarios where actions are typically taken in sequence. As a result, these counterfactuals are often not achievable in the real world. We present P2C (Path-to-Counterfactuals), a model-agnostic framework that produces a plan (an ordered sequence of actions) converting an unfavourable outcome into a causally consistent favourable outcome. P2C addresses both limitations by 1) explicitly modelling causal relationships between features and 2) ensuring that each intermediate state in the plan is feasible and causally valid. P2C uses the goal-directed Answer Set Programming system s(CASP) to generate the plan, accounting for feature changes that happen automatically due to causal dependencies. Furthermore, P2C refines cost (effort) computation by only counting changes actively made by the user, resulting in realistic cost estimates. Finally, P2C highlights how its causal planner outperforms standard planners, which lack causal knowledge and thus generate illegal actions.
Fadwa Idlahcen; Peter Jung; Giuseppe Marra; Ondrej Kuzelka Neural Markov Logic Networks with Tree Axiom
Neural Markov Logic Networks (NMLNs) integrate neural potentials with the logical structure of Markov Logic Networks (MLNs), enabling differentiable learning over relational domains. We extend NMLNs with a tree axiom: a hard constraint ensuring that a binary predicate defines a tree. Our approach combines neural potentials with combinatorial algorithms to enable exact sampling of tree structures, supporting lifted inference in NMLNs with structural constraints.
Felix Vossel; Till Mossakowski; Björn Gehrke Advancing Natural Language formalization to First Order Logic with Fine-tuned LLMs
Automating the translation of natural language to first-order logic (FOL) is crucial for knowledge representation and formal methods, yet remains challenging. We present a systematic evaluation of fine-tuned LLMs for this task, comparing architectures (encoder-decoder vs. decoder-only) and training strategies. Using the MALLS and Willow datasets, we explore techniques like vocabulary extension, predicate conditioning, and multilingual training, introducing metrics for exact match, logical equivalence, and predicate alignment. Our fine-tuned Flan-T5-XXL achieves 70% accuracy with predicate lists, outperforming GPT-4o and even the DeepSeek-R1-0528 model with CoT reasoning ability, as well as symbolic systems like ccg2lambda. Key findings show: (1) predicate availability boosts performance by 15–20%, (2) T5 models surpass larger decoder-only LLMs, and (3) models generalize to unseen logical arguments (FOLIO dataset) without specific training. While structural logic translation proves robust, predicate extraction emerges as the main bottleneck.
Nijesh Upreti Satisfiability Modulo Theory Meets Inductive Logic Programming
Inductive Logic Programming (ILP) is a powerful framework that bridges logic programming and machine learning. Despite its strengths, traditional ILP systems are generally limited to reasoning over discrete-valued predicates, often expressed through Horn clauses. This paper explores an enhanced approach to ILP by leveraging Satisfiability Modulo Theories (SMT), which offers richer representational capabilities, especially for continuous and hybrid domains. We propose a more expressive foundation for inductive declarative programming and outline how SMT solvers can extend ILP's reach.
Aswathy Wilson; J Anitha; Dany Varghese Explainable and Verifiable ASD Detection via Inductive Logic Programming: A Comparative Study with SHAP and LIME
Autism Spectrum Disorder (ASD) diagnosis relies on integrating heterogeneous behavioral and cognitive indicators, demanding AI systems that are not only accurate but also interpretable and verifiable. In this study, we present an explainable ASD detection framework based on Inductive Logic Programming (ILP), using phenotypic data from the ABIDE dataset. Unlike black-box models, ILP produces symbolic rules in first-order logic, supporting clinical transparency and auditability. We evaluate ILP against standard machine learning models (e.g., Random Forest, SVM, Gradient Boosting) using 10-fold cross-validation and report competitive accuracy, with ILP demonstrating superior specificity and high precision—critical metrics in clinical screening. We further compare the interpretability of ILP explanations with state-of-the-art post-hoc methods, SHAP and LIME, using a held-out test instance. While all methods identify consistent predictive features, ILP offers globally consistent, human-readable rules that are more accessible to non-expert users. Our findings affirm ILP as a viable and trustworthy alternative for ASD classification, providing both predictive utility and symbolic transparency. Future work will extend this approach to incorporate fMRI-derived features, enabling richer multimodal reasoning in neurodevelopmental diagnostics.
Recently Published Papers Track
Cainã F. Pereira; Daniel S. Menasché; Gerson Zaverucha; Aline Paes; Valmir C. Barbosa A Utility-Driven Approach to Instance-Based Transfer Learning for Relational Domains
Statistical relational learning involves exploring a complex search space of objects, their relationships, and probability parameters to find an optimal model. To reduce search complexity, previous work has explored taking advantage of a learned model in a source domain and transferring it to a target domain. However, these models are not always available, and imperfect learning in the source domain can hinder performance in the target domain. This paper proposes to leverage the instances of a source domain instead of its learned model. A simple solution, such as concatenating instances from both domains, is likely ineffective due to the potential negative impact of irrelevant or poor-quality instances. We address this by framing instance selection as a task of fair resource allocation, where utilities are parameterized to capture the relevance of each instance. We introduce a method called UTIL-BRDN, which applies this utility-driven approach to Boosted Relational Dependency Networks (RDN-Boost). Our experimental results show that UTIL-BRDN effectively transfers knowledge by reusing instances from other domains and is robust against negative transfer. Our contributions include introducing instance-based transfer learning to statistical relational learning, developing a utility-driven approach to instance selection, extending RDN-Boost to handle multiple domains and utilities, and conducting an extensive empirical evaluation of the proposed method.
Pat Langley Learning Hierarchical Task Knowledge for Planning
In this paper, I review approaches for acquiring hierarchical knowledge to improve the effectiveness of planning systems. First I note some benefits of such hierarchical content and the advantages of learning over manual construction. After this, I consider alternative paradigms for encoding and acquiring plan expertise before turning to hierarchical task networks. I specify the inputs to HTN learners and three subproblems they must address: identifying hierarchical structure, unifying method heads, and finding method conditions. Finally, I pose seven challenges the community should pursue so that techniques for learning HTNs can reach their full potential.
Victor Verreet; Lennert De Smet; Luc De Raedt; Emanuele Sansone EXPLAIN, AGREE, LEARN: Scaling Learning for Neural Probabilistic Logic
Neural probabilistic logic systems follow the neuro-symbolic (NeSy) paradigm by combining the perceptive and learning capabilities of neural networks with the robustness of probabilistic logic. Learning corresponds to likelihood optimization of the neural networks. However, to obtain the likelihood exactly, expensive probabilistic logic inference is required. To scale learning to more complex systems, we therefore propose to instead optimize a sampling-based objective. We prove that the objective has a bounded error with respect to the likelihood, which vanishes when increasing the sample count. Furthermore, the error vanishes faster by exploiting a new concept of sample diversity. We then develop the EXPLAIN, AGREE, LEARN (EXAL) method that uses this objective. EXPLAIN samples explanations for the data. AGREE reweighs each explanation in concordance with the neural component. LEARN uses the reweighed explanations as a signal for learning. In contrast to previous NeSy methods, EXAL can scale to larger problem sizes while retaining theoretical guarantees on the error. Experimentally, our theoretical claims are verified and EXAL outperforms recent NeSy methods when scaling up the MNIST addition and Warcraft pathfinding problems.
Akihiro Yamamoto Implementing Derivations of Definite Logic Programs with Self-Attention Networks: Revised and Extended Version
In this paper we propose that a restricted version of logical inference can be implemented with self-attention networks. We aim to show that LLMs (Large Language Models) constructed with transformer networks can perform logical inference. We reveal the potential of LLMs by analyzing self-attention networks, which are the main components of transformer networks. Our approach is based not on the semantics of sentences in natural languages but on the operations of logical inference. We show that hierarchical constructions of self-attention networks with feed-forward networks (FFNs) can implement top-down derivations for a class of logical formulae. We discuss how to extend the implementation to a larger class. We also show that bottom-up derivations can be implemented for the same class. We believe that our results show that LLMs implicitly have the power of logical inference.
Felix Weitkämper Scaling the weight parameters in Markov logic networks and relational logistic regression models
Extrapolation with domain size has received plenty of attention recently, both in its own right and as part of the broader issue of scaling inference and learning to large domains. We consider Markov logic networks and relational logistic regression as two fundamental representation formalisms in statistical relational artificial intelligence that use weighted formulas in their specification. However, Markov logic networks are based on undirected graphs, while relational logistic regression is based on directed acyclic graphs. We show that when scaling the weight parameters with the domain size, the asymptotic behaviour of a relational logistic regression model is transparently controlled by the parameters, and we supply an algorithm to compute asymptotic probabilities. We show using two examples that this is not true for Markov logic networks. We also discuss using several examples, mainly from the literature, how the application context can help the user to decide when such scaling is appropriate and when using the raw unscaled parameters might be preferable.
Felix Weitkämper The generalised distribution semantics and projective families of distributions
We generalise the distribution semantics underpinning probabilistic logic programming by distilling its essential concept, the separation of a free random component and a deterministic part. This abstracts the core ideas beyond logic programming as such to encompass frameworks from probabilistic databases, probabilistic finite model theory and discrete lifted Bayesian networks. To demonstrate the usefulness of such a general approach, we completely characterise the projective families of distributions representable in the generalised distribution semantics. We demonstrate both that large classes of interesting projective families cannot be represented in a generalised distribution semantics and that already a very limited fragment of logic programming (acyclic determinate logic programs) in the deterministic part suffices to represent all those projective families that are representable in the generalised distribution semantics at all.
Lucie Dvořáčková; Marcin Joachimiak; Michal Černý; Adriana Kubecová; Vilém Sklenák; Tomas Kliegr Explaining word embeddings with perfect fidelity: A case study in predicting research impact
The best-performing approaches for scholarly document quality prediction are based on embedding models. In addition to their performance when used in classifiers, embedding models can also provide predictions even for words that were not contained in the labelled training data for the classification model, which is important in the context of the ever-evolving research terminology. Although model-agnostic explanation methods, such as Local Interpretable Model-agnostic Explanations (LIME), can be applied to explain machine learning classifiers trained on embedding models, these produce results with questionable correspondence to the model. We introduce a new feature importance method, Self-model Rated Entities (SMER), for logistic regression-based classification models trained on word embeddings. We show that SMER has theoretically perfect fidelity with the explained model, as the average of logits of SMER scores for individual words (SMER explanation) exactly corresponds to the logit of the prediction of the explained model. Quantitative and qualitative evaluation is performed through five diverse experiments conducted on 50,000 research articles (papers) from the CORD-19 corpus. Through an AOPC curve analysis, we experimentally demonstrate that SMER produces better explanations than LIME, SHAP and global tree surrogates.
Vojtěch Balek; Lukáš Sýkora; Vilém Sklenák; Tomas Kliegr LLM-based feature generation from text for interpretable machine learning
Traditional text representations like embeddings and bag-of-words hinder rule learning and other interpretable machine learning methods due to high dimensionality and poor comprehensibility. This article investigates using Large Language Models (LLMs) to extract a small number of interpretable text features. We propose two workflows: one fully automated by the LLM (feature proposal and value calculation), and another where users define features and the LLM calculates values. This LLM-based feature extraction enables interpretable rule learning, overcoming issues like spurious interpretability seen with bag-of-words. We evaluated the proposed methods on five diverse datasets (including scientometrics, banking, hate speech). LLM-generated features yielded predictive performance similar to the SciBERT embedding model but used far fewer, interpretable features. Most generated features were considered relevant for the corresponding prediction tasks by human users. We illustrate practical utility on a case study focused on mining recommendation action rules for the improvement of research article quality and citation impact.
Daniel Cyrus; Dany Varghese; Alireza Tamaddoni-Nezhad Numerical-Symbolic Learning from Biomedical Data
Learning from small datasets is crucial in biomedical research due to the limited availability of large, annotated data in many domains. Inductive Logic Programming (ILP) offers a robust framework for integrating symbolic reasoning with machine learning, enabling the generation of interpretable models. In this work, we explore the application of numerical-symbolic learning approaches to biomedical data using ILP systems such as NumLog, PyGol, and NumSynth. These systems demonstrate superior efficiency in handling numerical features and extracting meaningful rules compared to traditional rule-learning and machine learning methods. We evaluate these approaches on two datasets: a neurodegenerative dataset for Alzheimer's disease detection from fundus images and the benchmark Breast Cancer dataset. The results underscore the potential of ILP-based numerical-symbolic learning in identifying complex relationships within biomedical data, providing actionable insights for advancing precision medicine and disease diagnosis.
Late-breaking Papers, Posters and Demo Track
Vedat Yasar; Kishore Srinivasan; Sheila Favaedi; Shiva Favaedi; Harsh Marthak; Aqib Hafiz; Ali Shahebrahimi; Graeme Gourlay; Alireza Tamaddoni-Nezhad Integrating Language Models into Inductive Logic Programming: Enhancing Knowledge Integration and Human-Centric Explainability
Inductive Logic Programming (ILP) enables knowledge-driven learning and interpretability by generating symbolic rules that make machine learning decisions transparent. However, ILP relies on expert-generated or pre-defined encodings of background knowledge. Moreover, its output, while formally explainable, can be difficult for humans to read and interpret, limiting its practical utility. We propose a hybrid framework in which large language models (LLMs) are integrated with ILP both to generate logic-compatible knowledge representations from raw data and to verbalise symbolic rules into natural language. These enhancements are treated independently, allowing us to evaluate their distinct contributions to explainability and learning performance. Through empirical studies grounded in real-world datasets, we explore the potential of language models not only as surface-level communicators, but as active participants in symbolic reasoning workflows. The rule translation task focuses on whether language models can express logic clauses in ways that are faithful to their original structure but significantly easier for humans to interpret. In parallel, we compare ILP performance using LLM-generated background knowledge against pre-defined baselines. Our findings suggest promising directions for combining the strengths of symbolic logic and neural generation to build AI systems that are both formally grounded and accessible to a broader range of users.
Stephen Roth; Lennart Baur; Derian Boer; Stefan Kramer Enhancing Symbolic Machine Learning by Subsymbolic Representations
The goal of neuro-symbolic AI is to integrate symbolic and subsymbolic AI approaches, to overcome the limitations of either. Prominent systems include Logic Tensor Networks (LTN) or DeepProbLog, which offer neural predicates and end-to-end learning. The versatility of systems like LTNs and DeepProbLog, however, makes them less efficient in simpler settings, for instance, for discriminative machine learning, in particular in domains with many constants. Therefore, we follow a different approach: we propose to enhance symbolic machine learning schemes by giving them access to neural embeddings. In the present paper, we show this for TILDE and embeddings of constants used by TILDE in similarity predicates. The approach can be fine-tuned by further refining the embeddings depending on the symbolic theory. In experiments in three real-world domains, we show that this simple, yet effective, approach outperforms all other baseline methods in terms of the F1 score. The approach could be useful beyond this setting: enhancing symbolic learners in this way could be extended to similarities between instances (effectively working like kernels within a logical language), for analogical reasoning, or for propositionalization.
Tony Ribeiro; Yin Jun Phua; Tuan Nguyen; Katsumi Inoue Transformers Can Admit Mistakes and Backtrack
Transformer models excel in language processing but lack intrinsic logical reasoning. Our proposed solution equips generative models with enhanced logical inference via integrated backtracking capabilities for multiple algorithms. Our proof-of-concept Transformer learns efficient tree-structure traversal, demonstrating the feasibility of backtracking. This foundational work aims to establish trustworthy AI systems that can think critically and revise assumptions, with potential implications for precise decision-making in various fields.
Dominique Bouthinon; Junkang Li; Véronique Ventos Kimind: a new test bed for learning and reasoning
Kimind is a game similar to Mastermind. It is designed to be a test bed for exploring and validating various symbolic approaches to learning and reasoning about user behaviour in similar environments, and for providing human-understandable explanations of the AI's decision-making.
Kateřina Hrudková; Tomas Kliegr ILP Meets RDF: Enabling Interoperability Between Popper and AMIE Graph Rule Learning
One of the most commonly cited bottlenecks of Inductive Logic Programming (ILP) is scalability and the need for negative examples, which are often scarce in real data. In contrast, RDF rule learning excels at effectively processing larger datasets and does not require negative examples, but lacks ILP's expressivity. Bridging these paradigms has been hindered by a critical gap: the lack of tools for data format interoperability. We present two novel approaches, implemented as Python libraries (popper2rdf, rdf2popper), for converting Prolog atoms to RDF triples and vice versa, specifically designed to handle challenges like n-ary predicates and representing positive/negative examples. Evaluation using the Popper (ILP) and RDFRules/AMIE systems on four benchmark datasets (Michalski’s trains, two zendo variations, imdb3) confirms high fidelity (precision/recall) and the feasibility of interoperability. Our work provides practical means, available on GitHub, to bridge these important relational learning paradigms.
Nikolai-Iraj Sanamrad; Carlos Monserrat; Maria José Ramírez-Quintana ASAp: Automated Supervision Application for Student Task Monitoring
Human-guided supervision in instructional tasks is a powerful form of interactive knowledge transfer, but its scalability is limited by the high demand on expert time and attention. As a scalable alternative, automated supervision uses models of expert behaviour to emulate human feedback efficiently. In this work, we present ASAp (Automated Supervision Application), a system that monitors students as they perform complex, multi-step tasks and provides real-time, targeted feedback. ASAp models expertise by mapping student task executions onto reference patterns, either defined by domain experts or inferred from expert traces. Each student action is evaluated on the fly, triggering fine-grained, pattern-based feedback that approximates human intervention. To ensure transparency and interpretability, both student behaviours and expert task models are visualised as interactive, human-readable graphs. The latter are induced and explained using a combination of process mining and a Maude implementation.
Zahra Chaghazardi Neurosymbolic Approaches for Robust and Explainable Traffic Sign Recognition in Autonomous Driving
Autonomous vehicles require perception systems that are not only accurate but also robust and explainable. Deep learning is highly effective for visual recognition but remains vulnerable to adversarial noise and ambiguity, whereas symbolic reasoning provides interpretability yet struggles to scale to raw perceptual data. This research introduces two neurosymbolic approaches that integrate these complementary strengths. The first, Deep Learning for Logic (DL4L), employs deep neural networks to extract perceptual features, which are then reasoned over by ILP for reliable traffic sign recognition. The second, Logic for Deep Learning (L4DL), embeds ILP-derived rules as logical constraints within neural network training, promoting consistency and robustness under adversarial conditions. Together, these methods enable safer and more interpretable traffic sign recognition, advancing trust in AI-driven mobility.
Matthew Woodruff; Alireza Tamaddoni-Nezhad An ILP Approach to Interpretable Educational Risk Assessment
We present a novel application of PyGol, an Inductive Logic Programming (ILP) system based on Meta Inverse Entailment, to student safeguarding in UK secondary education. PyGol’s inherent interpretability addresses the unique challenges of high-stakes educational decision-making where explainability is paramount. Our experiments on 2,031 students reveal PyGol’s distinctive capability to achieve recall rates exceeding 93% while generating human-readable logical rules. When combined with domain-specific features, PyGol achieves competitive performance (F1: 0.775), outperforming Explainable Boosting Machines. This work contributes to the growing body of ILP applications in sensitive domains and highlights the potential of logic-based approaches for regulatory-compliant AI systems.
James Trewern Prolog2: Meta-Interpretive Learning system
Inductive logic programming (ILP) is a branch of machine learning which aims to learn logic programs from background knowledge and examples. Meta-interpretive learning (MIL) is a form of ILP which uses second-order logic to define meta-rules (or meta-clauses) in which predicates may be variables. MIL has advantages over earlier ILP systems when learning recursive clauses or inventing new predicates. Prolog2 is a MIL system formulated as a superset of Prolog, which extends SLD-resolution to second-order SLD-resolution, with first-order negation as failure. The introduction of second-order resolution alone is not enough to learn a hypothesis, so meta-clauses are introduced. Meta-clauses allow for the derivation of new clauses during resolution through the application of meta-substitutions, a subset of the substitutions found during the unification step of resolution.
Dany Varghese; Alfie Anthony Treloar; Shubhi Verma; Alireza Tamaddoni-Nezhad; Alan Hunter Human–Machine Learning for Safe and Legal Autonomous Navigation using Inductive Logic Programming
This paper explores human–machine learning for autonomous maritime navigation through inductive logic programming (ILP), introducing a framework in which vessels query the International Regulations for Preventing Collisions at Sea (COLREGs) and refine their behaviour with human oversight. Leveraging the ILP system PyGol, we show how vessels can learn COLREG Rule 13 (Overtaking) from discretised bearing data and extend this to capture exceptions to Rule 15 (Crossing) based on maritime case law. The results highlight ILP’s ability to produce interpretable, auditable, and legally grounded rules, enabling transparent decision-making. This work lays the foundation for safe and legally compliant maritime autonomy through symbolic human–machine learning.
Shubhi Verma; Dany Varghese; Alfie Anthony Treloar; Alan Hunter; Alireza Tamaddoni-Nezhad From Rules to Learning and Reasoning: A Case Study in Explainable Legal Compliance for Autonomous Systems Using Inductive Logic Programming
Autonomous systems in safety-critical domains must make decisions that are not only technically reliable but also explainable, auditable, and legally defensible. We present an interdisciplinary framework that combines social and technical expertise to meet this challenge by integrating legal reasoning methods (IRAC, lex specialis) with Inductive Logic Programming (ILP). The framework translates statutory provisions, case law, accident and incident reports, best-practice guidelines, and technical expertise into symbolic background knowledge and labelled scenarios, structured to mirror human legal reasoning. ILP then induces context-sensitive, human-readable rules with proof traces, enabling transparent and trustworthy decision-making. We demonstrate the approach in maritime navigation, where Maritime Autonomous Surface Ships (MASS) must comply with the International Regulations for Preventing Collisions at Sea 1972 (COLREGs). A narrow-channel overtaking/crossing case study shows how ILP resolves conflicting duties by prioritising overtaking (Rule 13) over crossing (Rule 15), requiring the own-ship to abort and yield in line with precedent. While applied here to MASS, the framework generalises to other high-risk regulated domains where safety depends on combining legal reasoning with technical assurance.