Example: General Usage

This example shows the general usage of PyExperimenter, from creating an experiment configuration file and executing (dummy) experiments to extracting the experimental results.

To execute this notebook you need to install:

pip install py_experimenter
pip install scikit-learn

Experiment Configuration File

This notebook shows an example execution of PyExperimenter based on an experiment configuration file. Further explanation about the usage of PyExperimenter can be found in the documentation.

[33]:
import os

content = """
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        dataset:
          type: VARCHAR(255)
          values: ['iris']
        cross_validation_splits:
          type: INT
          values: [5]
        seed:
          type: int
          values:
            start: 2
            stop: 7
            step: 2
        kernel:
          type: VARCHAR(255)
          values: ['linear', 'poly', 'rbf', 'sigmoid']
      result_timestamps: False
      resultfields:
        pipeline: LONGTEXT
        train_f1: DECIMAL
        train_accuracy: DECIMAL
        test_f1: DECIMAL
        test_accuracy: DECIMAL

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
"""
# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.yml')
with open(experiment_configuration_file_path, "w") as f:
  f.write(content)

Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this is a dummy example whose code does little meaningful work; it is meant to show the core functionality of PyExperimenter.

The function is called with the parameters, i.e. the keyfields, of a database entry. Its results are passed to the result processor, which writes them into the database as resultfields.

[34]:
import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
  seed = parameters['seed']
  random.seed(seed)
  np.random.seed(seed)

  data = load_iris()
  # In case you want to load a file from a path (requires `import pandas as pd`)
  # path = os.path.join(custom_config['datapath'], parameters['dataset'])
  # data = pd.read_csv(path)

  X = data.data
  y = data.target

  model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
  result_processor.process_results({
    'pipeline': str(model)
  })

  if parameters['dataset'] != 'iris':
    raise ValueError("Example error")

  scores = cross_validate(model, X, y,
    cv=parameters['cross_validation_splits'],
    scoring=('accuracy', 'f1_micro'),
    return_train_score=True
  )

  result_processor.process_results({
    'train_f1': np.mean(scores['train_f1_micro']),
    'train_accuracy': np.mean(scores['train_accuracy'])
  })

  result_processor.process_results({
    'test_f1': np.mean(scores['test_f1_micro']),
    'test_accuracy': np.mean(scores['test_accuracy'])
  })
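
The calling convention described above can be tried out without a database. The following is a minimal sketch, assuming a hypothetical stand-in class RecordingResultProcessor and a simplified function run_dummy; neither is part of PyExperimenter. They only mimic the process_results() calls and the signature of run_ml, to illustrate that partial results written before an error are kept.

```python
# Hypothetical stand-in for PyExperimenter's ResultProcessor. It only mimics
# the process_results(dict) call used in run_ml and is not part of the library.
class RecordingResultProcessor:
    def __init__(self):
        self.results = {}

    def process_results(self, results: dict) -> None:
        # Merge each partial result, similar to successive writes to one table row.
        self.results.update(results)


def run_dummy(parameters: dict, result_processor, custom_config: dict):
    # Illustrative experiment function with the same signature as run_ml.
    result_processor.process_results({'pipeline': f"dummy(kernel={parameters['kernel']})"})
    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")
    result_processor.process_results({'test_accuracy': 1.0})


# A successful run records both resultfields.
recorder = RecordingResultProcessor()
run_dummy({'dataset': 'iris', 'kernel': 'linear'}, recorder, {'datapath': 'sample_data'})
print(recorder.results)

# A failing run keeps what was written before the exception was raised.
failing = RecordingResultProcessor()
try:
    run_dummy({'dataset': 'error_dataset', 'kernel': 'linear'}, failing, {})
except ValueError as error:
    print(f"failed after writing {sorted(failing.results)}: {error}")
```

This mirrors why a failed experiment in this example has a pipeline entry but no scores: the pipeline is written before the error is raised.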

Executing PyExperimenter

The actual execution of the PyExperimenter is done in multiple steps.

Initialize PyExperimenter

The PyExperimenter is initialized with the previously created configuration file. Additionally, PyExperimenter is given a name, i.e. job id, which is especially useful for parallel executions of multiple experiments on HPC.

[35]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
2024-04-15 15:30:49,974  | py-experimenter - INFO     | Found 4 keyfields
2024-04-15 15:30:49,976  | py-experimenter - INFO     | Found 5 resultfields
2024-04-15 15:30:49,977  | py-experimenter - WARNING  | No logtables given
2024-04-15 15:30:49,977  | py-experimenter - INFO     | Found 1 custom values
2024-04-15 15:30:49,978  | py-experimenter - INFO     | Found 6 codecarbon values
2024-04-15 15:30:49,980  | py-experimenter - INFO     | Initialized and connected to database

Fill Table

The table is filled based on the configuration file created above with fill_table_from_config(): the cartesian product of all keyfield values makes up the content of the table. Additionally, a custom-defined row, i.e. a custom-defined keyfield tuple, is added with fill_table_with_rows().

Note that the table can easily be obtained as pandas.DataFrame via experimenter.get_table().

[36]:
experimenter.fill_table_from_config()

experimenter.fill_table_with_rows(rows=[
      {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel':'linear'}])

# showing database table
experimenter.get_table()
2024-04-15 15:30:50,069  | py-experimenter - INFO     | 12 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:30:50,083  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.
[36]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-04-15 15:30:50 created None None None None None None None None None None
1 2 iris 5 4 linear 2024-04-15 15:30:50 created None None None None None None None None None None
2 3 iris 5 6 linear 2024-04-15 15:30:50 created None None None None None None None None None None
3 4 iris 5 2 poly 2024-04-15 15:30:50 created None None None None None None None None None None
4 5 iris 5 4 poly 2024-04-15 15:30:50 created None None None None None None None None None None
5 6 iris 5 6 poly 2024-04-15 15:30:50 created None None None None None None None None None None
6 7 iris 5 2 rbf 2024-04-15 15:30:50 created None None None None None None None None None None
7 8 iris 5 4 rbf 2024-04-15 15:30:50 created None None None None None None None None None None
8 9 iris 5 6 rbf 2024-04-15 15:30:50 created None None None None None None None None None None
9 10 iris 5 2 sigmoid 2024-04-15 15:30:50 created None None None None None None None None None None
10 11 iris 5 4 sigmoid 2024-04-15 15:30:50 created None None None None None None None None None None
11 12 iris 5 6 sigmoid 2024-04-15 15:30:50 created None None None None None None None None None None
12 13 error_dataset 3 42 linear 2024-04-15 15:30:50 created None None None None None None None None None None
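
The 12 rows added from the configuration can be reproduced with a small sketch using only the standard library. It assumes that the start/stop/step definition of seed behaves like Python's range(2, 7, 2), which is consistent with the seeds 2, 4 and 6 shown in the table; the order of the generated rows may differ from the order in the database.

```python
from itertools import product

# Keyfield values copied from the configuration file above.
keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    # start/stop/step, assumed here to behave like range(start, stop, step)
    'seed': list(range(2, 7, 2)),
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

# One dictionary per combination of keyfield values (cartesian product).
rows = [dict(zip(keyfields, combination)) for combination in product(*keyfields.values())]

print(keyfields['seed'])
print(len(rows))
print(rows[0])
```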

Execute PyExperimenter

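All experiments in the table are executed by setting max_experiments=-1. Passing max_experiments=2 and random_order=True instead would execute only two randomly chosen experiments. The custom row with dataset error_dataset fails with the example error, and its traceback is stored in the error column.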

[37]:
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()
/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:08,810  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

2024-04-15 15:32:08,869  | py-experimenter - INFO     | All configured executions finished.
[37]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:50 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:30:56 None
1 2 iris 5 4 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:02 None
2 3 iris 5 6 linear 2024-04-15 15:30:50 done 2024-04-15 15:31:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:08 None
3 4 iris 5 2 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:08 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:14 None
4 5 iris 5 4 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:14 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:20 None
5 6 iris 5 6 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:21 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:27 None
6 7 iris 5 2 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:27 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:32 None
7 8 iris 5 4 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:33 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:38 None
8 9 iris 5 6 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:38 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:44 None
9 10 iris 5 2 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:44 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:50 None
10 11 iris 5 4 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:51 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:56 None
11 12 iris 5 6 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:32:02 None
12 13 error_dataset 3 42 linear 2024-04-15 15:30:50 error 2024-04-15 15:32:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... NaN NaN NaN NaN 2024-04-15 15:32:08 Traceback (most recent call last):\n File "/h...

Add Experiment and Execute

For various use cases it might be useful to add a single experiment and immediately start its execution. An example of this is given below.

[38]:
experimenter.add_experiment_and_execute({'dataset': 'iris', 'cross_validation_splits': 5, 'seed': 17, 'kernel':'linear'}, run_ml)
2024-04-15 15:32:08,916  | py-experimenter - INFO     | Experiment with id 14 successfully added to database for immidiate execution.
[codecarbon INFO @ 15:32:08] [setup] RAM Tracking...
[codecarbon INFO @ 15:32:08] [setup] GPU Tracking...
[codecarbon INFO @ 15:32:08] No GPU found.
[codecarbon INFO @ 15:32:08] [setup] CPU Tracking...
[codecarbon WARNING @ 15:32:08] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 15:32:11] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 15:32:11] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11] >>> Tracker's metadata:
[codecarbon INFO @ 15:32:11]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 15:32:11]   Python version: 3.9.19
[codecarbon INFO @ 15:32:11]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 15:32:11]   Available RAM : 15.475 GB
[codecarbon INFO @ 15:32:11]   CPU count: 16
[codecarbon INFO @ 15:32:11]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11]   GPU count: None
[codecarbon INFO @ 15:32:11]   GPU model: None
[codecarbon INFO @ 15:32:15] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803127288818359 W
[codecarbon INFO @ 15:32:15] Energy consumed for all CPUs : 0.000001 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 15:32:15] 0.000002 kWh of electricity used since the beginning.
2024-04-15 15:32:15,154  | py-experimenter - INFO     | Experiment with id 14 successfully executed.

Restart Failed Experiments

Since experiments can fail, failed experiments can be reset for another try with reset_experiments(). The given status determines which table rows are reset. In this example all failed experiments, i.e. those having status==error, are reset. Experiments can also be reset based on multiple statuses by passing several of them, e.g. experimenter.reset_experiments('error', 'done'). In that case, all experiments with status ‘error’ or ‘done’ will be reset.

Afterwards, all remaining experiments can be executed with max_experiments=-1. Note that the random_order parameter is set to False by default, meaning they are executed in order of increasing id. The first parameter, i.e. run_ml, is the function that is executed with the given keyfields of the table.

[39]:
experimenter.reset_experiments('error')

# showing database table
experimenter.get_table()
2024-04-15 15:32:15,212  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:32:15,213  | py-experimenter - INFO     | 1 experiments with status error were reset
[39]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:50 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:30:56 None
1 2 iris 5 4 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:02 None
2 3 iris 5 6 linear 2024-04-15 15:30:50 done 2024-04-15 15:31:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:08 None
3 4 iris 5 2 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:08 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:14 None
4 5 iris 5 4 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:14 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:20 None
5 6 iris 5 6 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:21 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:27 None
6 7 iris 5 2 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:27 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:32 None
7 8 iris 5 4 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:33 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:38 None
8 9 iris 5 6 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:38 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:44 None
9 10 iris 5 2 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:44 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:50 None
10 11 iris 5 4 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:51 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:56 None
11 12 iris 5 6 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:32:02 None
12 14 iris 5 17 linear 2024-04-15 15:32:08 done None example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:32:15 None
13 15 error_dataset 3 42 linear 2024-04-15 15:32:15 created None None None None NaN NaN NaN NaN None None
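
The table above suggests that a reset removes the failed row and adds its keyfields again under a new ID: the row with ID 13 is gone and a row with ID 15 and status created appears. The following is a rough sketch of that observed behavior using an in-memory SQLite database. The table layout and the helper reset() are hypothetical and are not PyExperimenter's actual implementation.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
# AUTOINCREMENT ensures that IDs of deleted rows are not reused.
connection.execute(
    "CREATE TABLE experiments ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, dataset TEXT, seed INTEGER, status TEXT)"
)
connection.executemany(
    "INSERT INTO experiments (dataset, seed, status) VALUES (?, ?, ?)",
    [('iris', 2, 'done'), ('error_dataset', 42, 'error')],
)


def reset(connection, *statuses):
    # Hypothetical helper: delete rows with a matching status and re-add
    # their keyfields as fresh rows with status 'created'.
    placeholders = ', '.join('?' for _ in statuses)
    matching = connection.execute(
        f"SELECT dataset, seed FROM experiments WHERE status IN ({placeholders})", statuses
    ).fetchall()
    connection.execute(f"DELETE FROM experiments WHERE status IN ({placeholders})", statuses)
    connection.executemany(
        "INSERT INTO experiments (dataset, seed, status) VALUES (?, ?, 'created')", matching
    )
    return len(matching)


reset_count = reset(connection, 'error')
table = connection.execute(
    "SELECT ID, dataset, seed, status FROM experiments ORDER BY ID"
).fetchall()
print(reset_count)
print(table)
```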

After the reset of failed experiments, they can be executed again as described above.

[40]:
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()
2024-04-15 15:32:21,347  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:21,419  | py-experimenter - INFO     | All configured executions finished.
[40]:
ID dataset cross_validation_splits seed kernel creation_date status start_date name machine pipeline train_f1 train_accuracy test_f1 test_accuracy end_date error
0 1 iris 5 2 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:50 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:30:56 None
1 2 iris 5 4 linear 2024-04-15 15:30:50 done 2024-04-15 15:30:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:02 None
2 3 iris 5 6 linear 2024-04-15 15:30:50 done 2024-04-15 15:31:02 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:31:08 None
3 4 iris 5 2 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:08 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:14 None
4 5 iris 5 4 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:14 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:20 None
5 6 iris 5 6 poly 2024-04-15 15:30:50 done 2024-04-15 15:31:21 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.936667 0.936667 0.933333 0.933333 2024-04-15 15:31:27 None
6 7 iris 5 2 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:27 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:32 None
7 8 iris 5 4 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:33 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:38 None
8 9 iris 5 6 rbf 2024-04-15 15:30:50 done 2024-04-15 15:31:38 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.975000 0.975000 0.966667 0.966667 2024-04-15 15:31:44 None
9 10 iris 5 2 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:44 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:50 None
10 11 iris 5 4 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:51 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:31:56 None
11 12 iris 5 6 sigmoid 2024-04-15 15:30:50 done 2024-04-15 15:31:56 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.896667 0.896667 0.893333 0.893333 2024-04-15 15:32:02 None
12 14 iris 5 17 linear 2024-04-15 15:32:08 done None example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... 0.971667 0.971667 0.966667 0.966667 2024-04-15 15:32:15 None
13 15 error_dataset 3 42 linear 2024-04-15 15:32:15 error 2024-04-15 15:32:15 example_notebook Worklaptop Pipeline(steps=[('standardscaler', StandardSca... NaN NaN NaN NaN 2024-04-15 15:32:21 Traceback (most recent call last):\n File "/h...

Generating Result Table

The table contains the results of single experiments. Those can be aggregated, e.g. to generate the mean over all seeds. Here, grouping by dataset averages over all seeds and kernels.

[41]:
result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only = True)
result_table_agg
[41]:
ID cross_validation_splits seed train_f1 train_accuracy test_f1 test_accuracy
dataset
error_dataset 15.000000 3.0 42.0 NaN NaN NaN NaN
iris 7.076923 5.0 5.0 0.947051 0.947051 0.942051 0.942051
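
The aggregated test_f1 value for iris can be checked by hand. The sketch below uses only the standard library and the test_f1 values copied from the result table above (13 finished iris experiments, including the one added with seed 17). It recomputes the overall mean and, as an illustration, the mean per kernel.

```python
from collections import defaultdict
from statistics import mean

# (kernel, seed, test_f1) for the 13 finished iris experiments, copied from the table.
iris_results = (
    [('linear', seed, 0.966667) for seed in (2, 4, 6, 17)]
    + [('poly', seed, 0.933333) for seed in (2, 4, 6)]
    + [('rbf', seed, 0.966667) for seed in (2, 4, 6)]
    + [('sigmoid', seed, 0.893333) for seed in (2, 4, 6)]
)

# Mean over all seeds and kernels, as produced by grouping on dataset only.
overall_mean = mean(score for _, _, score in iris_results)

# Mean over the seeds of each kernel.
scores_by_kernel = defaultdict(list)
for kernel, _, score in iris_results:
    scores_by_kernel[kernel].append(score)
mean_by_kernel = {kernel: mean(scores) for kernel, scores in scores_by_kernel.items()}

print(round(overall_mean, 6))
print(mean_by_kernel)
```

With pandas, a per-kernel mean over the seeds would typically be obtained by adding the kernel column to the groupby keys, e.g. groupby(['dataset', 'kernel']).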

Printing LaTeX Table

As a pandas.DataFrame can easily be printed as a LaTeX table, here is example code for one of the above result columns.

[42]:
print(result_table_agg[['test_f1']].style.to_latex())
\begin{tabular}{lr}
 & test_f1 \\
dataset &  \\
error_dataset & nan \\
iris & 0.942051 \\
\end{tabular}

CodeCarbon

CodeCarbon is integrated into PyExperimenter to provide information about the carbon emissions of experiments. CodeCarbon will create a table with suffix _codecarbon in the database, each row containing information about the carbon emissions of a single experiment.

[43]:
experimenter.get_codecarbon_table()
[43]:
ID experiment_id codecarbon_timestamp project_name run_id duration_seconds emissions_kg emissions_rate_kg_sec cpu_power_watt gpu_power_watt ... cpu_model gpu_count gpu_model longitude latitude ram_total_size tracking_mode on_cloud power_usage_efficiency offline_mode
0 1 1 2024-04-15T15:30:56 codecarbon 2a195f38-0473-43ef-a083-eb31df911ede 0.139903 5.762182e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
1 2 2 2024-04-15T15:31:02 codecarbon a2100083-2ed6-470c-ad61-219edc8cb79e 0.108931 4.423928e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
2 3 3 2024-04-15T15:31:08 codecarbon 38ec9b59-777a-464f-829b-748e1c361855 0.110439 4.340302e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
3 4 4 2024-04-15T15:31:14 codecarbon 9d20ff75-5f3c-4643-aed6-f1908f642c61 0.101483 4.133265e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
4 5 5 2024-04-15T15:31:21 codecarbon 04e05bbf-4d0a-4c34-9b23-3060b9a72533 0.132089 5.430256e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
5 6 6 2024-04-15T15:31:27 codecarbon d89299a6-b4b1-4deb-9d76-8f00a2ba4805 0.128005 5.274309e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
6 7 7 2024-04-15T15:31:33 codecarbon 06314180-7751-48bc-8421-12e7c0136f78 0.127770 5.259911e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
7 8 8 2024-04-15T15:31:38 codecarbon 0ff3ba7b-fb88-41ed-adb2-6897579918ca 0.131880 5.401629e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
8 9 9 2024-04-15T15:31:44 codecarbon 68223357-4f14-4fe7-a91e-a5cc2ef58c77 0.136260 5.611917e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
9 10 10 2024-04-15T15:31:50 codecarbon 02befad1-59ab-484f-93c1-532779f62848 0.116057 4.693639e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
10 11 11 2024-04-15T15:31:56 codecarbon aefafab2-ddbe-4175-b786-dc445593a082 0.127321 5.264124e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
11 12 12 2024-04-15T15:32:02 codecarbon f6c89d11-c994-4ae9-acc8-f26d01d3bff0 0.118765 4.905017e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
12 13 13 2024-04-15T15:32:08 codecarbon 745dae32-510b-4855-87e6-c6f7d5624a7c 0.061385 2.340841e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0
13 14 14 2024-04-15T15:32:15 codecarbon c373b752-5eb6-4032-9df2-3259af17e487 0.131715 5.500138e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 machine N 1.0 0
14 15 15 2024-04-15T15:32:21 codecarbon 0a130d02-6819-4d4c-bbec-2359dc4d631e 0.081272 3.183783e-07 0.000004 42.5 0.0 ... 12th Gen Intel(R) Core(TM) i7-1260P None None 9.7054 52.3872 15.475006 process N 1.0 0

15 rows × 34 columns

Aggregating CodeCarbon Results

The carbon emission information of CodeCarbon can easily be aggregated via pandas.DataFrame.

[44]:
carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only = True)
carbon_emissions
[44]:
ID experiment_id duration_seconds emissions_kg emissions_rate_kg_sec cpu_power_watt gpu_power_watt ram_power_watt cpu_energy_kw gpu_energy_kw ram_energy_kw energy_consumed_kw cpu_count ram_total_size power_usage_efficiency offline_mode
project_name
codecarbon 120 120 1.753273 0.000007 0.000061 637.5 0.0 7.061411 0.000019 0.0 2.161534e-07 0.00002 240.0 232.125092 15.0 0

Printing CodeCarbon Results as LaTeX Table

Furthermore, the resulting pandas.DataFrame can easily be printed as a LaTeX table.

[45]:
print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())
\begin{tabular}{lrr}
 & energy_consumed_kw & emissions_kg \\
project_name &  &  \\
codecarbon & 0.000020 & 0.000007 \\
\end{tabular}