Publications

Auditing for Spatial Fairness

Authors:
Dimitris Sacharidis, Giorgos Giannopoulos, George Papastefanatos, Kostas Stefanidis

Source: https://dx.doi.org/10.48786/edbt.2023.41

Abstract:
This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymandering and may introduce statistical bias. Prior work addresses these concerns but only for regularly spaced locations, while raising other issues, most notably its inability to discern regions that are likely to exhibit spatial unfairness. Similar to established notions of algorithmic fairness, we define spatial fairness as the statistical independence of outcomes from location. This translates into requiring that for each region of space, the distribution of outcomes is identical inside and outside the region. To allow for localized discrepancies in the distribution of outcomes, we compare how well two competing hypotheses explain the observed outcomes. The null hypothesis assumes spatial fairness, while the alternate allows different distributions inside and outside regions. Their goodness of fit is then assessed by a likelihood ratio test. If there is no significant difference in how well the two hypotheses explain the observed outcomes, we conclude that the algorithm is spatially fair.
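The likelihood-ratio test described in the abstract can be illustrated with a minimal sketch: under the null hypothesis a single outcome rate explains all observations, while the alternative fits separate rates inside and outside a candidate region. The example below (hypothetical names, binary outcomes only, not the authors' implementation) compares the two fits:

```python
import math

def bernoulli_loglik(k, n):
    """Log-likelihood of n Bernoulli trials at the MLE rate k/n."""
    if n == 0 or k == 0 or k == n:
        return 0.0  # degenerate cases contribute zero at the MLE
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)

def spatial_lr_statistic(outcomes, inside):
    """Likelihood-ratio statistic: one shared outcome rate (null, i.e.
    spatial fairness) vs. separate rates inside/outside a region."""
    n, k = len(outcomes), sum(outcomes)
    n_in = sum(inside)
    k_in = sum(o for o, m in zip(outcomes, inside) if m)
    ll_null = bernoulli_loglik(k, n)
    ll_alt = bernoulli_loglik(k_in, n_in) + bernoulli_loglik(k - k_in, n - n_in)
    return 2 * (ll_alt - ll_null)  # ~ chi-square with 1 df under the null

# Region with 80% positive outcomes vs. 20% outside: clearly unfair.
outcomes = [1] * 80 + [0] * 20 + [1] * 20 + [0] * 80
inside = [True] * 100 + [False] * 100
stat = spatial_lr_statistic(outcomes, inside)
print(stat > 3.841)  # exceeds the 5% chi-square(1) critical value: True
```

If the statistic stays below the critical value for every region tested, the null hypothesis of spatial fairness is not rejected.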

An Approach for Intelligent Behaviour-Based Threat Modelling with Explanations

Authors:
Sonu Preetam, Maxime Compastié, Vanesa Daza, Shuaib Siddiqui

Source: https://doi.org/10.1109/NFV-SDN59219.2023.10329587

Abstract:
To disrupt the emergence of novel threats, defenders must obtain insights into the attacker’s behaviours through Tactics, Techniques, and Procedures (TTP) to establish adequate countermeasures. However, although detecting the usage of a subset of techniques is well documented and investigated, understanding the chaining of these techniques into a complete set of attack scenarios remains a manual process, prone to errors in complex and dynamic environments, such as software networks. In this paper, we propose a hybrid model for threat behaviour profiling. Our model exploits multimodal threat data using diverse real-time logs from virtualised environments to generate a novel dataset that maximises the explainability of a technique. Once a set of techniques is qualified, we leverage attack graphs and AI model explanations to correlate technique usage into attack scenarios describing a complete behaviour from a threat actor. Our proposed approach is generalizable to distributed and heterogeneous environments, making it a promising method against ever-evolving threats.


METIS: An Open-Architecture for Building AI-Ready Cloud Platforms – Application to Foster Research on Hydrological Modeling

Authors:
Vincent GAUDISSART, Yasmine BOULFANI, Kevin LARNIER, Gwendoline STEPHAN, Jacques COVES and Christophe TRIQUET – CS GROUP France

Source: https://data.europa.eu/doi/10.2760/46796

Abstract:
In today’s data-driven world, organizations often face the challenge of implementing and maintaining complex platforms for processing and leveraging big data and artificial intelligence (AI) technologies. This article introduces METIS, a powerful software suite that simplifies the creation of such platforms by providing a range of reusable components. METIS, based on open-source components, follows an open architecture philosophy, enabling integration with existing systems and the flexibility to meet diverse project requirements. We specifically highlight BISAW, an instantiation of METIS components tailored for Business Intelligence (BI) and AI applications using earth observation data. BISAW offers comprehensive functionality to manage the entire data lifecycle and fosters collaboration among data scientists, engineers, decision-makers, and data providers. This article explores the challenges of implementing such platforms and demonstrates how BISAW facilitates the exploitation of data through its flexibility, integration capabilities, and streamlined development flow. Furthermore, we present the current METIS use case involving the ExtremeXP EU Horizon-funded project.

GEANT Security Days: Extending UEBA for emerging threat detection, characterisation and intelligence generation

Authors:
Carolina FERNÁNDEZ, Maxime COMPASTIÉ, Nil ORTIZ RABELLA, Sonu PREETAM and Xavier MARRUGAT – i2CAT

Source: https://data.europa.eu/doi/10.2760/46796

Abstract:
This presentation aims to overcome some of the challenges regarding emerging and mutable threats, which may go unnoticed for some time due to a constrained data foundation that does not extract enough knowledge from the network status. We present an AI- and knowledge-based technology and one of its applied use cases to detect and categorise threats based on user, device and tool behaviour across the network. The presented technology can also be used to foster collaboration across academic and research centres regarding threat intelligence sharing, since both the extracted knowledge and some particularities of the models can be exported for others to learn from, adapt and act upon.

Towards a Reference Component Model of Edge-Cloud Continuum

Authors:
Danylo Khalyeyev, Tomáš Bureš, and Petr Hnětynka

Source: https://doi.org/10.1109/ICSA-C57050.2023.00030

Abstract:
Edge-cloud continuum (ECC) is a novel paradigm that seeks to blend the worlds of cloud computing and IoT into a continuous ecosystem capable of providing access to a range of previously impossible applications with significantly improved quality of service. However, while using the term ECC becomes increasingly common, there is still no clear and commonly accepted consensus on what the term entails and which properties the ECC environment must possess. Consequently, there is a lack of tools and examples for reasoning about applications in ECC and their specific properties. In this paper, we present the results of our literature study aimed at identifying the most common properties ascribed to ECC. Based on this, we outline a reference component model that can serve as a tool for reasoning about ECC systems and their properties.

Controlling Automatic Experiment-Driven Systems Using Statistics and Machine Learning

Authors:
Milad Abdullah

Source: https://doi.org/10.1007/978-3-031-36889-9_9

Abstract:
Experiments are used in many modern systems to optimize their operation. Such experiment-driven systems are used in various fields, such as web-based systems, smart-* systems, and various self-adaptive systems. There is a class of these systems that derive their data from running simulations or another type of computation, such as in digital twins, online planning using probabilistic model-checking, or performance benchmarking. To obtain statistically significant results, these systems must repeat the experiments multiple times. As a result, they consume extensive computation resources. The GraalVM benchmarking project detects performance changes in the GraalVM compiler. However, the benchmarking project has an extensive usage of computational resources and time. The doctoral research project proposed in this paper focuses on controlling the experiments with the goal of reducing computation costs. The plan is to use statistical and machine learning approaches to predict the outcomes of experiments and select the experiments yielding more useful information. As an evaluation, we are applying these methods to the GraalVM benchmarking project; the initial results confirm that these methods have the potential to significantly reduce computation costs.

Amalur: Data Integration Meets Machine Learning

Authors:
Rihan Hai, Christos Koutras, Andra Ionescu, Ziyu Li, Wenbo Sun, Jessie Van Schijndel, Yan Kang and Asterios Katsifodimos

Source: https://doi.org/10.1109/ICDE55515.2023.00301

Abstract:
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the premises of data silos, hence model training should proceed in a decentralized manner. In this work, we present a vision of how to bridge the traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. Towards this direction, we analyze two common use cases over data silos, feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning and federated learning.

An Empirical Performance Comparison between Matrix Multiplication Join and Hash Join on GPUs

Authors:
Wenbo Sun, Asterios Katsifodimos and Rihan Hai

Source: https://doi.org/10.1109/ICDEW58674.2023.00034

Abstract:
Recent advances in Graphic Processing Units (GPUs) have facilitated a significant performance boost for database operators, in particular, joins. It has been intensively studied how conventional join implementations, such as hash joins, benefit from the massive parallelism of GPUs. With the proliferation of machine learning, more databases have started to provide native support for the basic building blocks of ML algorithms, i.e., linear algebra operators such as matrix multiplication (MM). Despite the recent increasing interest in processing relational joins using matrix multiplication (MM-join), two crucial questions still remain open: i) how efficient are current MM-join implementations compared to the GPU-based join algorithms; ii) how should practitioners choose among MM-join and conventional GPU-based joins given different data characteristics. In this paper, we compare the execution time and memory I/O of MM-join against multiple GPU hash joins. An empirical analysis of our experimental results reveals that the state-of-the-art hash join implementation shows substantial scalability for various data characteristics. In contrast, MM-join outperforms the SOTA hash join in low join selectivity and low table cardinality but shows unsatisfactory scalability due to synchronous data movement and computation.
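The MM-join idea being benchmarked can be illustrated in a few lines: encode each join column as a 0/1 matrix over the shared key domain, and read join pairs off the nonzero entries of the product of one matrix with the transpose of the other. This is a naive pure-Python sketch for intuition only; the paper's GPU implementations are far more involved:

```python
def one_hot(keys, domain):
    """Encode a join column as a 0/1 matrix: one row per tuple,
    one column per distinct key value."""
    index = {v: j for j, v in enumerate(domain)}
    matrix = [[0] * len(domain) for _ in keys]
    for i, k in enumerate(keys):
        matrix[i][index[k]] = 1
    return matrix

def mm_join(left_keys, right_keys):
    """Join two key columns via matrix multiplication:
    (A @ B^T)[i][j] == 1 iff left row i matches right row j."""
    domain = sorted(set(left_keys) | set(right_keys))
    A, B = one_hot(left_keys, domain), one_hot(right_keys, domain)
    pairs = []
    for i, arow in enumerate(A):
        for j, brow in enumerate(B):
            if sum(a * b for a, b in zip(arow, brow)):  # dot product
                pairs.append((i, j))
    return pairs

print(mm_join([1, 2, 3], [2, 3, 4]))  # [(1, 0), (2, 1)]
```

On a GPU, the two one-hot matrices are multiplied with a single highly parallel MM kernel, which is exactly what makes the comparison with hash joins interesting.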

Online ML Self-adaptation in Face of Traps

Authors:
Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka, Martin Kruliš and Danny Weyns

Source: https://doi.org/10.1109/ACSOS58161.2023.00023

Abstract:
Online machine learning (ML) is often used in self-adaptive systems to strengthen the adaptation mechanism and improve the system utility. Despite such benefits, applying online ML for self-adaptation can be challenging, and not many papers report its limitations. Recently, we experimented with applying online ML for self-adaptation of a smart farming scenario and we faced several unexpected difficulties – traps – that, to our knowledge, are not discussed enough in the community. In this paper, we report our experience with these traps. Specifically, we discuss several traps that relate to the specification and online training of the ML-based estimators, their impact on self-adaptation, and the approach used to evaluate the estimators. Our overview of these traps provides a list of lessons learned, which can serve as guidance for other researchers and practitioners when applying online ML for self-adaptation.

Early Stopping of Non-productive Performance Testing Experiments Using Measurement Mutations

Authors:
Milad Abdullah, Lubomír Bulej, Tomáš Bureš, Vojtěch Horký and Petr Tůma

Source: https://doi.org/10.1109/SEAA60479.2023.00022

Abstract:
Modern software projects often incorporate some form of performance testing into their development cycle, intending to detect changes in performance between commits or releases. Performance testing generally relies on experimental evaluation using various benchmark workloads. To detect performance changes reliably, benchmarks must be executed many times to account for variability in the measurement results. While considered best practice, this approach can become prohibitively expensive when the number of versions and benchmark workloads increases. To alleviate the cost of performance testing, we propose an approach for the early stopping of non-productive experiments that are unlikely to detect a performance bug in a particular benchmark. The stopping conditions are based on benchmark-specific thresholds determined from historical data modified to emulate the potential effects of software changes on benchmark performance. We evaluate the approach on the GraalVM benchmarking project and show that it can eliminate about 50% of the experiments if we can afford to ignore about 15% of the least significant performance changes.
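A toy illustration of threshold-based early stopping (hypothetical names and a deliberately simplified stopping condition; the paper derives benchmark-specific thresholds from historical data modified by measurement mutations):

```python
import statistics

def early_stop(measurements, baseline, threshold, min_runs=5):
    """Stop a benchmarking experiment early once the running mean of the
    measurements stays within a benchmark-specific relative threshold of
    the historical baseline. Returns (runs_used, stopped_early)."""
    for n in range(min_runs, len(measurements) + 1):
        mean = statistics.mean(measurements[:n])
        if abs(mean - baseline) / baseline <= threshold:
            return n, True  # no evidence of a change: stop early
    return len(measurements), False  # possible change: run the full experiment

stable = [101, 99, 100, 102, 98, 100, 101, 99, 100, 100]
regressed = [121, 119, 120, 122, 118, 120, 121, 119, 120, 120]
print(early_stop(stable, 100.0, 0.02))     # (5, True): stops after 5 runs
print(early_stop(regressed, 100.0, 0.02))  # (10, False): runs all 10
```

The reported savings come from the first case dominating in practice: most experiments never approach a real performance change and can be cut short.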

The ExtremeXP project is co-funded by the European Union Horizon Program HORIZON-CL4-2022-DATA-01-01, under Grant Agreement No. 101093164
© ExtremeXP 2023. All Rights Reserved