Challenge 1: Core functionalities of the ExtremeXP framework

In this challenge, you will decompose a machine learning workflow into tasks, define it using the DSL, configure an experiment with multiple model variants and hyperparameters, execute it on ProActive, and visualize the results. 

Estimated Time: 60–90 minutes

Difficulty Level: Intermediate (working DSL knowledge, some Python understanding)

Background

The ExtremeXP experimentation framework is made up of several components that work together to define, run, and analyze experiments. In this tutorial, you will interact only with the DSL (Domain-Specific Language) to specify an experiment, but it’s helpful to understand how the entire system fits together.
1. Defining an Experiment

Every experiment in ExtremeXP begins with a model. You can create this model in one of two ways: 

  • Using the DSL Framework (the method we will use in this tutorial). 
  • Using the Graphical Editors.

A third option, the Intent-based Editor, can also generate experiments, but it is not covered in this tutorial. 

2. Sending the Experiment to the Experimentation Engine

Once your experiment model is defined, it is passed to the Experimentation Engine, a Python module that serves as the core interpreter. The Experimentation Engine reads your experiment description and decides which workflows need to be executed. 

3. Executing Workflows

To run the workflows, the Experimentation Engine communicates with the Execution Engine. This component is built on top of the ProActive Scheduling and Abstraction Layer, although other workflow tools such as Kubeflow could also be used. After the workflows are complete, their results are sent back to the Experimentation Engine. 

4. Managing Data

The Experimentation Engine then updates two key components: 

  • Data Abstraction Layer (DAL): 
    Stores and retrieves executed experiments, workflows, and their associated metrics. You can think of DAL as the system’s experiment memory. 
  • Decentralized Data Management (DDM): 
    Handles dataset storage and access across the framework. DDM includes a graphical interface for uploading and tagging datasets, as well as an API used by other components to load or save data. 
5. Visualizing the Results

Finally, results become available through two visualization tools: 

  • Experiment Visualization: 
    Allows users to explore detailed results of individual experiments. 
  • Experiment Cards: 
    Provides a high-level overview of all past experiments, helping analysts identify patterns, extract best practices, and plan future studies. 

Prerequisites

Follow the user guide below to install the Experimentation Engine and run the example experiment successfully.

Exercise

You are given a customer churn prediction dataset from a telecom company. Your goal is to build and evaluate a machine learning pipeline that predicts whether a customer will churn (leave the company) within the next month based on their account and usage features.
  • Format: CSV with mixed numeric and categorical features
  • Size: 1,500 rows, 14 features
  • Target: binary column churn (Yes/No)
  • Features include:
    ⦁ Numeric: tenure (months), monthly_charges, total_charges, number_support_calls, pages_consulted_per_month
    ⦁ Categorical: internet_service_type, contract_type, online_security, tech_support, paperless_billing
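
The schema above can be sketched in pandas to see how the binary target would be encoded for modelling. Only the column names below come from the dataset description; the sample rows are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the dataset description.
df = pd.DataFrame({
    "tenure": [12, 3, 48],
    "monthly_charges": [70.5, 29.9, 99.0],
    "contract_type": ["month-to-month", "one-year", "two-year"],
    "churn": ["Yes", "Yes", "No"],
})

# Encode the binary target: Yes -> 1, No -> 0.
df["churn"] = df["churn"].map({"Yes": 1, "No": 0})

# Separate features from the target.
X = df.drop(columns=["churn"])
y = df["churn"]
print(y.tolist())  # -> [1, 1, 0]
```

With the real dataset you would load all 1,500 rows with pd.read_csv and one-hot encode the categorical columns before training.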

Part 1 - Understanding the ML problem

Before writing any DSL, understand the ML pipeline structure from the provided Python scripts. Download them from the shared folder <Link>.

Step 01

Read through each script

⦁ Task 1 - Load and Split Data
⦁ Task 2 - Preprocess Data
⦁ Task 3 - Train Model
⦁ Task 4 - Evaluate Model
⦁ Task 5 - Export Artifacts (Optional)
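
The five tasks map onto a standard scikit-learn workflow. The stand-in below is not the provided scripts: every function name, signature, and parameter here is illustrative, and Task 5 (persisting the trained model, e.g. with joblib) is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def load_and_split(test_size=0.2, seed=0):            # Task 1
    # Synthetic stand-in for the churn CSV: 14 features, binary target.
    X, y = make_classification(n_samples=200, n_features=14, random_state=seed)
    return train_test_split(X, y, test_size=test_size, random_state=seed)

def preprocess(X_train, X_test):                      # Task 2
    # Fit the scaler on the training split only, then transform both.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test)

def train(X_train, y_train, C=1.0):                   # Task 3
    return LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)

def evaluate(model, X_test, y_test):                  # Task 4
    return {"accuracy": accuracy_score(y_test, model.predict(X_test))}

X_tr, X_te, y_tr, y_te = load_and_split()
X_tr, X_te = preprocess(X_tr, X_te)
metrics = evaluate(train(X_tr, y_tr), X_te, y_te)
print(metrics)
```

The function parameters here (test_size, C) are exactly the kind of values you will later expose and vary through the DSL.
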
Step 02

Understand the input and output of each script

Step 03

Identify the parameters each script expects and where they come from
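
One common way a task script exposes its parameters is through command-line arguments. The snippet below is a hypothetical illustration; the provided scripts may use a different mechanism, so check each one.

```python
import argparse

# Hypothetical parameters for a "Train Model" task script.
parser = argparse.ArgumentParser(description="Train Model task")
parser.add_argument("--model_type", default="logistic_regression")
parser.add_argument("--max_iter", type=int, default=100)

# An explicit argument list is parsed here for demonstration; a real
# script would call parser.parse_args() to read sys.argv.
args = parser.parse_args(["--model_type", "random_forest", "--max_iter", "200"])
print(args.model_type, args.max_iter)  # -> random_forest 200
```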

Part 2 - Define the Workflow in ExtremeXP DSL

  • Create a file named churn_workflow.xxp that defines the workflow structure, task ordering, and data flow

  • Define all 5 tasks with correct implementation paths, inputs, outputs, and parameters

  • Establish the data dependencies and execution order

  • Ensure parameter names match those in the Python scripts
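
Before writing churn_workflow.xxp, it can help to sketch the dependency graph in plain Python and check that a valid execution order exists. The dict below only captures task ordering, not the DSL syntax itself; consult the DSL documentation for the actual grammar.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on (names mirror Part 1).
deps = {
    "load_and_split": set(),
    "preprocess": {"load_and_split"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "export_artifacts": {"evaluate"},
}

# static_order() raises CycleError if the dependencies are circular.
order = list(TopologicalSorter(deps).static_order())
print(order)
# -> ['load_and_split', 'preprocess', 'train', 'evaluate', 'export_artifacts']
```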

Part 3 - Define the Experiment

  • Create a file named churn_experiment.xxp that defines the experiment space to explore

  • Declare the experiment parameters and their value ranges

  • Bind experiment parameters to workflow task parameters

  • Define output metrics
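
Conceptually, the experiment space is the cross product of all declared parameter ranges, and the engine runs one workflow per combination. A minimal sketch, with illustrative parameter names and values (declare the real ones in churn_experiment.xxp):

```python
from itertools import product

# Two illustrative parameters and their value ranges.
space = {
    "model_type": ["logistic_regression", "random_forest"],
    "test_size": [0.2, 0.3],
}

# One configuration dict per point in the grid: 2 x 2 = 4 workflow runs.
configs = [dict(zip(space, values)) for values in product(*space.values())]
for cfg in configs:
    print(cfg)
print(len(configs))  # -> 4
```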

1. Install eexp_engine and run the example experiment.
2. Create your first experiment.

You will be working with the “” 

3. Download the script from <>.
4. Split it into 2 tasks.
  1. The second task should have a parameter TODO.
  2. Optionally, add a metric.
  3. Optionally, add local input/output datasets.
5. Specify the workflow that consists of these 2 tasks.
6. Specify the experiment that uses the parameter TODO and performs a grid search over the values 1, 2, 4, and 8.
7. Run the experiment using the eexp_engine library in Python.
8. Check the execution of the tasks in ProActive.
9. Check the results of the experiment in the Visualization Dashboard.

Success criteria

Workflow DSL Checklist

Attach the correct workflow DSL.

Experiment DSL Checklist

Attach the correct experiment DSL.

Provide screenshots that show:

  • The workflows being run in ProActive
  • The visualization results in the Dashboard

The ExtremeXP project is co-funded by the European Union Horizon Program HORIZON CL4-2022-DATA-01-01, under Grant Agreement No. 101093164

© ExtremeXP 2023. All Rights Reserved.