Example: General Usage

This example shows the general usage of PyExperimenter, from creating an experiment configuration file, through executing (dummy) experiments, to extracting the experimental results.

To execute this notebook you need to install:

pip install py_experimenter
pip install scikit-learn

Experiment Configuration File

This notebook shows an example execution of PyExperimenter based on an experiment configuration file. Further explanation about the usage of PyExperimenter can be found in the documentation.

[1]:
import os

content = """
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        dataset:
          type: VARCHAR(255)
          values: ['iris']
        cross_validation_splits:
          type: INT
          values: [5]
        seed:
          type: int
          values:
            start: 2
            stop: 7
            step: 2
        kernel:
          type: VARCHAR(255)
          values: ['linear', 'poly', 'rbf', 'sigmoid']
      result_timestamps: False
      resultfields:
        pipeline: LONGTEXT
        train_f1: DECIMAL
        train_accuracy: DECIMAL
        test_f1: DECIMAL
        test_accuracy: DECIMAL

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
"""
# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.yml')
with open(experiment_configuration_file_path, "w") as f:
  f.write(content)

Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this is a dummy example whose code is of limited practical use; it is meant to show the core functionality of PyExperimenter.

The function is called with the parameters, i.e. keyfields, of a database entry. Its results are passed to the result processor, which writes them into the database as resultfields.

[2]:
import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
  seed = parameters['seed']
  random.seed(seed)
  np.random.seed(seed)

  data = load_iris()
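  # In case you want to load a file from a path (requires `import os` and `import pandas as pd`);
  # 'datapath' is the key defined in the Custom section of the configuration file
  # path = os.path.join(custom_config['datapath'], parameters['dataset'])
  # data = pd.read_csv(path)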

  X = data.data
  y = data.target

  model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
  result_processor.process_results({
    'pipeline': str(model)
  })

  if parameters['dataset'] != 'iris':
    raise ValueError("Example error")

  scores = cross_validate(model, X, y,
    cv=parameters['cross_validation_splits'],
    scoring=('accuracy', 'f1_micro'),
    return_train_score=True
  )

  result_processor.process_results({
    'train_f1': np.mean(scores['train_f1_micro']),
    'train_accuracy': np.mean(scores['train_accuracy'])
  })

  result_processor.process_results({
    'test_f1': np.mean(scores['test_f1_micro']),
    'test_accuracy': np.mean(scores['test_accuracy'])
  })
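
The calling convention described above can be tried out without a database. The following is a minimal sketch, not part of PyExperimenter: `RecordingResultProcessor` and `run_dummy` are hypothetical names introduced here for illustration, and the stand-in only mimics the `process_results()` call used in `run_ml`. It shows that results may be handed over in several partial calls, which is why the failed experiment in the tables further below has a `pipeline` entry but no scores.

```python
from typing import Any, Dict


class RecordingResultProcessor:
    """Hypothetical stand-in for ResultProcessor; it only collects results in memory."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def process_results(self, results: Dict[str, Any]) -> None:
        # Each call adds (or overwrites) a subset of the resultfields.
        self.results.update(results)


def run_dummy(parameters: dict, result_processor, custom_config: dict) -> None:
    # Same three-argument signature as run_ml, but without scikit-learn.
    result_processor.process_results({'pipeline': f"dummy(kernel={parameters['kernel']})"})
    result_processor.process_results({'train_f1': 1.0, 'train_accuracy': 1.0})
    result_processor.process_results({'test_f1': 0.5, 'test_accuracy': 0.5})


recorder = RecordingResultProcessor()
run_dummy(
    {'dataset': 'iris', 'cross_validation_splits': 5, 'seed': 2, 'kernel': 'linear'},
    recorder,
    {'datapath': 'sample_data'},
)
print(sorted(recorder.results))
```

A stand-in like this can help to debug an experiment function before it is handed to `experimenter.execute()`.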

Executing PyExperimenter

The actual execution of the PyExperimenter is done in multiple steps.

Initialize PyExperimenter

The PyExperimenter is initialized with the previously created configuration file. Additionally, PyExperimenter is given a name, i.e. job id, which is especially useful for parallel executions of multiple experiments on HPC.

[3]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
2024-03-11 08:09:53,452  | py-experimenter - INFO     | Found 4 keyfields
2024-03-11 08:09:53,454  | py-experimenter - INFO     | Found 5 resultfields
2024-03-11 08:09:53,455  | py-experimenter - WARNING  | No logtables given
2024-03-11 08:09:53,456  | py-experimenter - INFO     | Found 1 custom values
2024-03-11 08:09:53,458  | py-experimenter - INFO     | Found 6 codecarbon values
2024-03-11 08:09:53,459  | py-experimenter - INFO     | Initialized and connected to database

Fill Table

The table is filled based on the configuration file created above with fill_table_from_config(): the cartesian product of all keyfield values makes up the content of the table. Additionally, a custom row, i.e. a custom keyfield tuple, is added with fill_table_with_rows().

Note that the table can easily be obtained as a pandas.DataFrame via experimenter.get_table().

[4]:
experimenter.fill_table_from_config()

experimenter.fill_table_with_rows(rows=[
      {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel':'linear'}])

# showing database table
experimenter.get_table()
2024-03-11 08:09:53,533  | py-experimenter - INFO     | 12 rows successfully added to database. 0 rows were skipped.
2024-03-11 08:09:53,559  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.
[4]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-03-11 08:09:53 created None None None None None None None None None None
1 2 iris 5 4 linear 2024-03-11 08:09:53 created None None None None None None None None None None
2 3 iris 5 6 linear 2024-03-11 08:09:53 created None None None None None None None None None None
3 4 iris 5 2 poly 2024-03-11 08:09:53 created None None None None None None None None None None
4 5 iris 5 4 poly 2024-03-11 08:09:53 created None None None None None None None None None None
5 6 iris 5 6 poly 2024-03-11 08:09:53 created None None None None None None None None None None
6 7 iris 5 2 rbf 2024-03-11 08:09:53 created None None None None None None None None None None
7 8 iris 5 4 rbf 2024-03-11 08:09:53 created None None None None None None None None None None
8 9 iris 5 6 rbf 2024-03-11 08:09:53 created None None None None None None None None None None
9 10 iris 5 2 sigmoid 2024-03-11 08:09:53 created None None None None None None None None None None
10 11 iris 5 4 sigmoid 2024-03-11 08:09:53 created None None None None None None None None None None
11 12 iris 5 6 sigmoid 2024-03-11 08:09:53 created None None None None None None None None None None
12 13 error_dataset 3 42 linear 2024-03-11 08:09:53 created None None None None None None None None None None
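
The twelve rows generated from the configuration can be reproduced by hand. The sketch below is an illustration using only the standard library, not PyExperimenter's own code. It assumes that the `start`/`stop`/`step` notation of the `seed` keyfield expands like Python's `range`; for the values used here an inclusive stop would yield the same seeds. The order of the generated rows may differ from the order in the table.

```python
from itertools import product

keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    # Assumption: start=2, stop=7, step=2 expands like range(2, 7, 2).
    'seed': list(range(2, 7, 2)),
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

# One dictionary per combination of keyfield values.
rows = [dict(zip(keyfields, combination)) for combination in product(*keyfields.values())]

print(len(rows))
print(rows[0])
```

The number of rows is the product of the number of values per keyfield, here 1 * 1 * 3 * 4 = 12, which matches the log message above. The thirteenth row is the custom one added with fill_table_with_rows().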

Execute PyExperimenter

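All experiments in the table are executed one after the other by setting max_experiments=-1. To execute only a limited number of randomly chosen experiments instead, set e.g. max_experiments=2 and random_order=True.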

[5]:
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()
/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-03-11 08:10:50,090  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 372, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_19831/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

2024-03-11 08:10:50,137  | py-experimenter - INFO     | All configured executions finished.
[5]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:53 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:09:57 None
1 2 iris 5 4 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:58 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:02 None
2 3 iris 5 6 linear 2024-03-11 08:09:53 done 2024-03-11 08:10:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:06 None
3 4 iris 5 2 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:06 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:11 None
4 5 iris 5 4 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:11 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:15 None
5 6 iris 5 6 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:15 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:19 None
6 7 iris 5 2 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:19 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:23 None
7 8 iris 5 4 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:24 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:28 None
8 9 iris 5 6 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:28 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:32 None
9 10 iris 5 2 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:32 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:37 None
10 11 iris 5 4 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:37 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:41 None
11 12 iris 5 6 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:41 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:45 None
12 13 error_dataset 3 42 linear 2024-03-11 08:09:53 error 2024-03-11 08:10:45 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... NaN NaN NaN NaN 2024-03-11 08:10:50 Traceback (most recent call last):\n File "/h...

Restart Failed Experiments

Experiments can fail. Failed experiments can be reset for another try with reset_experiments(). The given status determines which table rows are reset. In this example all failed experiments, i.e. those with status == 'error', are reset. Experiments can also be reset based on multiple statuses by passing several of them, e.g. experimenter.reset_experiments('error', 'done'). In that case, all experiments with status 'error' or 'done' are reset. As the output below shows, a reset experiment is added to the table again as a new row with a new ID and status 'created'.

When execute() is called with max_experiments=-1, all remaining experiments are executed. Note that the random_order parameter is set to False by default, meaning that experiments are executed in order of increasing ID. The first parameter, i.e. run_ml, is the function that is executed with the keyfields of each table row.

[6]:
experimenter.reset_experiments('error')

# showing database table
experimenter.get_table()
2024-03-11 08:10:50,191  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.
2024-03-11 08:10:50,192  | py-experimenter - INFO     | 1 experiments with status error were reset
[6]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:53 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:09:57 None
1 2 iris 5 4 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:58 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:02 None
2 3 iris 5 6 linear 2024-03-11 08:09:53 done 2024-03-11 08:10:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:06 None
3 4 iris 5 2 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:06 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:11 None
4 5 iris 5 4 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:11 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:15 None
5 6 iris 5 6 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:15 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:19 None
6 7 iris 5 2 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:19 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:23 None
7 8 iris 5 4 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:24 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:28 None
8 9 iris 5 6 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:28 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:32 None
9 10 iris 5 2 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:32 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:37 None
10 11 iris 5 4 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:37 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:41 None
11 12 iris 5 6 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:41 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:45 None
12 14 error_dataset 3 42 linear 2024-03-11 08:10:50 created None None None None NaN NaN NaN NaN None None

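After the reset of failed experiments, they can be executed again as described above. In this example the reset experiment fails again, because run_ml raises the example error for every dataset other than iris.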

[7]:
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()
2024-03-11 08:10:54,491  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 372, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_19831/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-03-11 08:10:54,533  | py-experimenter - INFO     | All configured executions finished.
[7]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:53 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:09:57 None
1 2 iris 5 4 linear 2024-03-11 08:09:53 done 2024-03-11 08:09:58 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:02 None
2 3 iris 5 6 linear 2024-03-11 08:09:53 done 2024-03-11 08:10:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-03-11 08:10:06 None
3 4 iris 5 2 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:06 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:11 None
4 5 iris 5 4 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:11 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:15 None
5 6 iris 5 6 poly 2024-03-11 08:09:53 done 2024-03-11 08:10:15 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-03-11 08:10:19 None
6 7 iris 5 2 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:19 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:23 None
7 8 iris 5 4 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:24 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:28 None
8 9 iris 5 6 rbf 2024-03-11 08:09:53 done 2024-03-11 08:10:28 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-03-11 08:10:32 None
9 10 iris 5 2 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:32 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:37 None
10 11 iris 5 4 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:37 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:41 None
11 12 iris 5 6 sigmoid 2024-03-11 08:09:53 done 2024-03-11 08:10:41 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-03-11 08:10:45 None
12 14 error_dataset 3 42 linear 2024-03-11 08:10:50 error 2024-03-11 08:10:50 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... NaN NaN NaN NaN 2024-03-11 08:10:54 Traceback (most recent call last):\n File "/h...

Generating Result Table

The table contains the results of single experiments. Those can be aggregated, e.g. to generate the mean over all seeds and kernels per dataset.

[8]:
result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only = True)
result_table_agg
[8]:
ID cross_validation_splits seed train_f1 train_accuracy test_f1 test_accuracy
dataset
error_dataset 14.0 3.0 42.0 NaN NaN NaN NaN
iris 6.5 5.0 4.0 0.945 0.945 0.94 0.94
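
The aggregated value for iris can be checked by hand. The following sketch uses only the standard library and the test_f1 values copied from the table above; it is an illustration of the arithmetic, not a replacement for the pandas call. It also computes the mean per kernel, i.e. the mean over the three seeds only.

```python
from collections import defaultdict
from statistics import mean

# (kernel, seed, test_f1) for the twelve iris rows, copied from the table above.
iris_rows = [
    ('linear', 2, 0.966667), ('linear', 4, 0.966667), ('linear', 6, 0.966667),
    ('poly', 2, 0.933333), ('poly', 4, 0.933333), ('poly', 6, 0.933333),
    ('rbf', 2, 0.966667), ('rbf', 4, 0.966667), ('rbf', 6, 0.966667),
    ('sigmoid', 2, 0.893333), ('sigmoid', 4, 0.893333), ('sigmoid', 6, 0.893333),
]

# Group the scores by kernel, so that each group contains one value per seed.
scores_per_kernel = defaultdict(list)
for kernel, _seed, test_f1 in iris_rows:
    scores_per_kernel[kernel].append(test_f1)

mean_per_kernel = {kernel: mean(scores) for kernel, scores in scores_per_kernel.items()}
overall_mean = mean([test_f1 for _kernel, _seed, test_f1 in iris_rows])

print(mean_per_kernel)
print(round(overall_mean, 6))
```

With pandas, the per-kernel mean should correspond to grouping by both columns, e.g. experimenter.get_table().groupby(['dataset', 'kernel']).mean(numeric_only=True).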

Printing LaTeX Table

As a pandas.DataFrame can easily be printed as a LaTeX table, here is example code for one of the above result columns.

[9]:
print(result_table_agg[['test_f1']].style.to_latex())
\begin{tabular}{lr}
 & test_f1 \\
dataset &  \\
error_dataset & nan \\
iris & 0.940000 \\
\end{tabular}

CodeCarbon

CodeCarbon is integrated into PyExperimenter to provide information about the carbon emissions of experiments. CodeCarbon will create a table with suffix _codecarbon in the database, each row containing information about the carbon emissions of a single experiment.

[10]:
experimenter.get_codecarbon_table()
[10]:
ID experiment_id codecarbon_timestamp project_name run_id duration_seconds emissions_kg emissions_rate_kg_sec cpu_power_watt gpu_power_watt ... cpu_model gpu_count gpu_model longitude latitude ram_total_size tracking_mode on_cloud power_usage_efficiency offline_mode
0 1 1 2024-03-11T08:09:57 codecarbon 1620091b-8b86-4b23-b54e-c4cf3020c04d 0.088837 3.598312e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
1 2 2 2024-03-11T08:10:02 codecarbon c60a2cf3-7e01-49b2-b40e-5fb2d6023f23 0.086848 3.585882e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
2 3 3 2024-03-11T08:10:06 codecarbon 65607dcf-143f-4721-9003-321205ce15eb 0.090465 3.729579e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
3 4 4 2024-03-11T08:10:11 codecarbon c3c63ce1-6a16-4060-a9e9-acb5fb73e3d6 0.080714 3.314905e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
4 5 5 2024-03-11T08:10:15 codecarbon 4c166d72-1a0d-477b-b70b-620914c09edb 0.089772 3.712606e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
5 6 6 2024-03-11T08:10:19 codecarbon 2d2efdca-c9ae-4eab-955a-2ec5f047bb5d 0.070777 2.878039e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
6 7 7 2024-03-11T08:10:23 codecarbon 42fc7f5d-2157-42dd-9c4d-ff47c9d60d71 0.076391 3.133284e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
7 8 8 2024-03-11T08:10:28 codecarbon 4158e707-e153-4faf-b171-7713067c018e 0.101170 4.188866e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
8 9 9 2024-03-11T08:10:32 codecarbon ec656b86-801c-4c3a-ae96-190249454c59 0.088437 3.639145e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
9 10 10 2024-03-11T08:10:37 codecarbon 4efdd1ef-946e-49f5-9d3b-8ddf46839d76 0.090036 3.723546e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
10 11 11 2024-03-11T08:10:41 codecarbon 97294712-72a2-49dd-8173-6ecf9c580b37 0.083235 3.419110e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
11 12 12 2024-03-11T08:10:45 codecarbon 7bbe7b09-301f-4064-8dc0-fd578221b525 0.079192 3.260114e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
12 13 13 2024-03-11T08:10:50 codecarbon e986f632-5ee3-437a-ac43-52c31578600a 0.048913 1.921918e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0
13 14 14 2024-03-11T08:10:54 codecarbon 65d3eead-d8f5-4cd7-9ebd-2bda51acab85 0.040884 1.582226e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.5312 52.4771 15.47501 process N 1.0 0

14 rows × 34 columns

Aggregating CodeCarbon Results

The carbon emission information of CodeCarbon can easily be aggregated via pandas.DataFrame.

[11]:
carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only = True)
carbon_emissions
[11]:
ID experiment_id duration_seconds emissions_kg emissions_rate_kg_sec cpu_power_watt gpu_power_watt ram_power_watt cpu_energy_kw gpu_energy_kw ram_energy_kw energy_consumed_kw cpu_count ram_total_size power_usage_efficiency offline_mode
project_name
codecarbon 105 105 1.11567 0.000005 0.000057 595.0 0.0 1.070479 0.000012 0.0 2.193276e-08 0.000013 224.0 216.650139 14.0 0
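
Note that sum(numeric_only=True) adds up every numeric column, so only additive quantities such as emissions_kg or duration_seconds yield meaningful totals; the sums of ID (105) or cpu_power_watt (595.0 = 14 * 42.5) do not. The sketch below re-computes two of the totals with the standard library from the per-experiment values shown in the CodeCarbon table above; it is meant as a plausibility check, not as part of the PyExperimenter API.

```python
from math import fsum

# emissions_kg per experiment, copied from the CodeCarbon table above.
emissions_kg = [
    3.598312e-07, 3.585882e-07, 3.729579e-07, 3.314905e-07, 3.712606e-07,
    2.878039e-07, 3.133284e-07, 4.188866e-07, 3.639145e-07, 3.723546e-07,
    3.419110e-07, 3.260114e-07, 1.921918e-07, 1.582226e-07,
]

# duration_seconds per experiment, copied from the same table.
duration_seconds = [
    0.088837, 0.086848, 0.090465, 0.080714, 0.089772, 0.070777, 0.076391,
    0.101170, 0.088437, 0.090036, 0.083235, 0.079192, 0.048913, 0.040884,
]

total_emissions_kg = fsum(emissions_kg)
total_duration_seconds = fsum(duration_seconds)

print(f"{total_emissions_kg:.6f} kg")
print(f"{total_duration_seconds:.5f} s")
```

Both totals agree with the aggregated row above after rounding.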

Printing CodeCarbon Results as LaTeX Table

Furthermore, the resulting pandas.DataFrame can easily be printed as a LaTeX table.

[12]:
print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())
\begin{tabular}{lrr}
 & energy_consumed_kw & emissions_kg \\
project_name &  &  \\
codecarbon & 0.000013 & 0.000005 \\
\end{tabular}