Challenge 1: Core functionalities of the ExtremeXP framework
In this challenge, you will decompose a machine learning workflow into tasks, define it using the DSL, configure an experiment with multiple model variants and hyperparameters, execute it on ProActive, and visualize the results.
Estimated time: 60–90 minutes
Difficulty level: Intermediate (assumes intermediate DSL knowledge and some Python understanding)
Background
Every experiment in ExtremeXP begins with a model. You can create this model in one of two ways:
- Using the DSL Framework (the method we will use in this tutorial).
- Using the Graphical Editors.
A third option, the Intent-based Editor, can also generate experiments, but it is not covered in this tutorial.
Once your experiment model is defined, it is passed to the Experimentation Engine, a Python module that serves as the core interpreter. The Experimentation Engine reads your experiment description and decides which workflows need to be executed.
To run the workflows, the Experimentation Engine communicates with the Execution Engine. This component is built on top of the ProActive Scheduling and Abstraction Layer, although other workflow tools such as Kubeflow could also be used. After the workflows are complete, their results are sent back to the Experimentation Engine.
The Experimentation Engine then updates two key components:
- Data Abstraction Layer (DAL):
Stores and retrieves executed experiments, workflows, and their associated metrics. You can think of DAL as the system’s experiment memory.
- Decentralized Data Management (DDM):
Handles dataset storage and access across the framework. DDM includes a graphical interface for uploading and tagging datasets, as well as an API used by other components to load or save data.
Finally, results become available through two visualization tools:
- Experiment Visualization:
Allows users to explore detailed results of individual experiments.
- Experiment Cards:
Provides a high-level overview of all past experiments, helping analysts identify patterns, extract best practices, and plan future studies.
Prerequisites
You will be working with a customer churn dataset:
- Format: CSV with mixed numeric and categorical features
- Size: 1,500 rows, 14 features
- Target: binary column churn (Yes/No)
- Features include:
  - Numeric: tenure (months), monthly_charges, total_charges, number_support_calls, pages_consulted_per_month
  - Categorical: internet_service_type, contract_type, online_security, tech_support, paperless_billing
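If you want to sanity-check the dataset before wiring up the workflow, a minimal pandas sketch such as the one below can help. The file name churn.csv is an assumption; use the actual path from the shared folder.

```python
import pandas as pd

# Load the churn dataset; "churn.csv" is a placeholder name.
df = pd.read_csv("churn.csv")

numeric_features = [
    "tenure", "monthly_charges", "total_charges",
    "number_support_calls", "pages_consulted_per_month",
]
categorical_features = [
    "internet_service_type", "contract_type", "online_security",
    "tech_support", "paperless_billing",
]

# Sanity checks against the description above: 1,500 rows,
# 14 features plus the binary churn target.
print(df.shape)
print(df["churn"].value_counts())  # expected labels: Yes / No
```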
Then prepare the task scripts:
- Download the provided Python scripts from the shared folder.
- Read through each script:
  - Task 2 - Preprocess Data
  - Task 3 - Train Model
  - Task 4 - Evaluate Model
  - Task 5 - Export Artifacts (Optional)
- Understand the input and output of each script.
- Identify the parameters each script expects and where they come from (the sketch below shows the kind of interface to look for).
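The exact interface of each script is something you must read off the code itself. As a rough illustration of what to look for, a task script might expose its inputs, outputs, and parameters through command-line arguments, along these lines; everything here, names included, is hypothetical, and the provided scripts may be organized differently.

```python
# Hypothetical skeleton of a task script's interface
# (Task 2 - Preprocess Data); the real scripts may differ.
import argparse

import pandas as pd


def main() -> None:
    parser = argparse.ArgumentParser(description="Task 2 - Preprocess Data")
    parser.add_argument("--input_data", required=True)    # artifact from the previous task
    parser.add_argument("--output_data", required=True)   # artifact consumed by Task 3
    parser.add_argument("--scaling", default="standard")  # a tunable parameter (hypothetical)
    args = parser.parse_args()

    df = pd.read_csv(args.input_data)
    # ... preprocessing logic, driven by args.scaling, would go here ...
    df.to_csv(args.output_data, index=False)


if __name__ == "__main__":
    main()
```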
Define the workflow
- Create a file named churn_workflow.xxp that defines the workflow structure, task ordering, and data flow.
- Define all five tasks with correct implementation paths, inputs, outputs, and parameters.
- Establish the data dependencies and execution order.
- Ensure parameter names match those in the Python scripts (a quick consistency check is sketched below).
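One low-tech way to verify the last point is to extract the parameter names from both files and compare them. The sketch below assumes the scripts use argparse with --name flags and that the DSL declares parameters with a param keyword; both file names are placeholders, so adapt the patterns to what you actually see in your files.

```python
import re
from pathlib import Path

# Parameter names the script expects (assumes argparse-style "--name" flags).
script_params = set(
    re.findall(r'add_argument\(\s*"--(\w+)"', Path("train_model.py").read_text())
)

# Parameter names declared in the workflow DSL (assumes a "param <name>" syntax).
dsl_params = set(
    re.findall(r"\bparam\s+(\w+)", Path("churn_workflow.xxp").read_text())
)

print("In the DSL but not in the script:", dsl_params - script_params)
print("In the script but not in the DSL:", script_params - dsl_params)
```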
Define the experiment
- Create a file named churn_experiment.xxp that defines the experiment space to explore.
- Declare the experiment parameters and their value ranges (the sketch after this list shows what the resulting space amounts to).
- Bind experiment parameters to workflow task parameters.
- Define the output metrics.
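Conceptually, the experiment space you declare here is a grid of parameter combinations, and the Experimentation Engine runs the workflow once per combination, collecting the declared metrics. In plain Python terms (the parameter names and values below are illustrative assumptions, not the ones you must use):

```python
from itertools import product

# Illustrative experiment space: two model variants, three depth settings.
model_types = ["logistic_regression", "random_forest"]
max_depths = [5, 10, 20]

# The engine effectively enumerates the cross product and runs the
# workflow once per combination.
for model_type, max_depth in product(model_types, max_depths):
    print(f"run workflow with model_type={model_type}, max_depth={max_depth}")
```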
Success criteria
Workflow DSL Checklist
- All five tasks are defined
- Each task specifies its implementation script path
- Input/output artifacts are correctly declared
- Task parameters match the script's expected parameters
- Data flow is explicit (outputs of one task feed into inputs of the next)
Experiment DSL Checklist
- Experiment references the correct workflow
- All experiment parameters are declared
- Bindings correctly connect experiment parameters to workflow task parameters
- Output metrics are specified
Provide screenshots that show the experiment executing on ProActive and the results in the visualization tools.