Example: General Usage

This example shows the general usage of PyExperimenter, from creating an experiment configuration file, through the execution of (dummy) experiments, to the extraction of experimental results.

To execute this notebook you need to install:

pip install py_experimenter
pip install scikit-learn

Experiment Configuration File

This notebook shows an example execution of PyExperimenter based on an experiment configuration file. Further explanation about the usage of PyExperimenter can be found in the documentation.

[33]:

import os

content = """
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        dataset:
          type: VARCHAR(255)
          values: ['iris']
        cross_validation_splits:
          type: INT
          values: [5]
        seed:
          type: int
          values:
            start: 2
            stop: 7
            step: 2
        kernel:
          type: VARCHAR(255)
          values: ['linear', 'poly', 'rbf', 'sigmoid']
      result_timestamps: False
      resultfields:
        pipeline: LONGTEXT
        train_f1: DECIMAL
        train_accuracy: DECIMAL
        test_f1: DECIMAL
        test_accuracy: DECIMAL

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
"""
# Create config directory if it does not exist
os.makedirs('config', exist_ok=True)

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)

Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this is a dummy example that contains little meaningful code. It is meant to show the core functionality of PyExperimenter.

The function is called with the parameters, i.e. keyfields, of a database entry. The results are passed to the result processor, which writes them to the database as resultfields.

[34]:

import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
  seed = parameters['seed']
  random.seed(seed)
  np.random.seed(seed)

  data = load_iris()
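  # In case you want to load a file from a path (requires `import pandas as pd`);
  # the key 'datapath' matches the Custom section of the configuration file
  # path = os.path.join(custom_config['datapath'], parameters['dataset'])
  # data = pd.read_csv(path)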

  X = data.data
  y = data.target

  model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
  result_processor.process_results({
    'pipeline': str(model)
  })

  if parameters['dataset'] != 'iris':
    raise ValueError("Example error")

  scores = cross_validate(model, X, y,
    cv=parameters['cross_validation_splits'],
    scoring=('accuracy', 'f1_micro'),
    return_train_score=True
  )

  result_processor.process_results({
    'train_f1': np.mean(scores['train_f1_micro']),
    'train_accuracy': np.mean(scores['train_accuracy'])
  })

  result_processor.process_results({
    'test_f1': np.mean(scores['test_f1_micro']),
    'test_accuracy': np.mean(scores['test_accuracy'])
  })
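
The incremental calls to process_results() above matter when an experiment fails part-way. As a rough sketch, the function can be dry-run without a database by handing it a small stand-in for the result processor. Note that `RecordingResultProcessor` and `run_dummy` are hypothetical names introduced here for illustration and are not part of PyExperimenter; the stand-in only mimics the `process_results(dict)` call used above, and the metric value is a placeholder.

```python
class RecordingResultProcessor:
    """Hypothetical stand-in that collects results in memory instead of a database."""

    def __init__(self):
        self.results = {}

    def process_results(self, results: dict) -> None:
        # Mimics the incremental writes: later calls add further resultfields.
        self.results.update(results)


def run_dummy(parameters: dict, result_processor, custom_config: dict):
    # First write: recorded even if a later step fails.
    result_processor.process_results({'pipeline': f"SVC(kernel={parameters['kernel']!r})"})

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    # Second write: only reached for valid datasets (placeholder value).
    result_processor.process_results({'test_accuracy': 1.0})


recorder = RecordingResultProcessor()
try:
    run_dummy({'dataset': 'error_dataset', 'kernel': 'linear'}, recorder, {})
except ValueError as error:
    print(f"failed with: {error}")

# Only the fields written before the exception are present.
print(recorder.results)
```

Everything passed to the processor before the exception is kept, while fields that were never reached stay empty.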

Executing PyExperimenter

The actual execution of the PyExperimenter is done in multiple steps.

Initialize PyExperimenter

The PyExperimenter is initialized with the previously created configuration file. Additionally, PyExperimenter is given a name, i.e. job id, which is especially useful for parallel executions of multiple experiments on HPC.

[35]:

from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')

2024-04-15 15:30:49,974  | py-experimenter - INFO     | Found 4 keyfields
2024-04-15 15:30:49,976  | py-experimenter - INFO     | Found 5 resultfields
2024-04-15 15:30:49,977  | py-experimenter - WARNING  | No logtables given
2024-04-15 15:30:49,977  | py-experimenter - INFO     | Found 1 custom values
2024-04-15 15:30:49,978  | py-experimenter - INFO     | Found 6 codecarbon values
2024-04-15 15:30:49,980  | py-experimenter - INFO     | Initialized and connected to database

Fill Table

The table is filled based on the configuration file created above using fill_table_from_config(): the Cartesian product of all keyfield values makes up the content of the table. Additionally, a custom row, i.e. a custom keyfield tuple, is added with fill_table_with_rows().

Note that the table can easily be obtained as pandas.DataFrame via experimenter.get_table().

[36]:

experimenter.fill_table_from_config()

experimenter.fill_table_with_rows(rows=[
      {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel':'linear'}])

# showing database table
experimenter.get_table()

2024-04-15 15:30:50,069  | py-experimenter - INFO     | 12 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:30:50,083  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.

[36]:

	ID	dataset	cross_validation_splits	seed	kernel	creation_date	status	start_date	name	machine	pipeline	train_f1	train_accuracy	test_f1	test_accuracy	end_date	error
0	1	iris	5	2	linear	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
1	2	iris	5	4	linear	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
2	3	iris	5	6	linear	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
3	4	iris	5	2	poly	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
4	5	iris	5	4	poly	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
5	6	iris	5	6	poly	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
6	7	iris	5	2	rbf	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
7	8	iris	5	4	rbf	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
8	9	iris	5	6	rbf	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
9	10	iris	5	2	sigmoid	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
10	11	iris	5	4	sigmoid	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
11	12	iris	5	6	sigmoid	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
12	13	error_dataset	3	42	linear	2024-04-15 15:30:50	created	None	None	None	None	None	None	None	None	None	None
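
The 12 rows created from the configuration can be reproduced by hand. The following is a minimal sketch using only the standard library, not PyExperimenter's own code. It assumes that the seed definition (start 2, stop 7, step 2) behaves like Python's `range`, which reproduces the seeds 2, 4, 6 shown in the table; whether the stop value is inclusive cannot be told from this example.

```python
from itertools import product

# Keyfield values as given in the configuration above.
keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    # start=2, stop=7, step=2; assumed to expand like range()
    'seed': list(range(2, 7, 2)),
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

# One dictionary per combination of keyfield values (Cartesian product).
rows = [dict(zip(keyfields, combination)) for combination in product(*keyfields.values())]

print(len(rows))
print(rows[0])
```

The number of rows is the product of the number of values per keyfield, here 1 x 1 x 3 x 4. The order of the generated combinations may differ from the order in which PyExperimenter inserts them.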

Execute PyExperimenter

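All open experiments are executed by setting max_experiments=-1. To execute only a subset, e.g. two randomly chosen experiments, set max_experiments=2 and random_order=True instead. The custom row with dataset error_dataset raises the example error on purpose, so its status becomes error.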

[37]:

experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:08,810  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

2024-04-15 15:32:08,869  | py-experimenter - INFO     | All configured executions finished.

[37]:

	ID	dataset	cross_validation_splits	seed	kernel	creation_date	status	start_date	name	machine	pipeline	train_f1	train_accuracy	test_f1	test_accuracy	end_date	error
0	1	iris	5	2	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:50	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:30:56	None
1	2	iris	5	4	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:02	None
2	3	iris	5	6	linear	2024-04-15 15:30:50	done	2024-04-15 15:31:02	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:08	None
3	4	iris	5	2	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:08	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:14	None
4	5	iris	5	4	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:14	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:20	None
5	6	iris	5	6	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:21	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:27	None
6	7	iris	5	2	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:27	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:32	None
7	8	iris	5	4	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:33	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:38	None
8	9	iris	5	6	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:38	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:44	None
9	10	iris	5	2	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:44	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:50	None
10	11	iris	5	4	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:51	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:56	None
11	12	iris	5	6	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:32:02	None
12	13	error_dataset	3	42	linear	2024-04-15 15:30:50	error	2024-04-15 15:32:02	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	NaN	NaN	NaN	NaN	2024-04-15 15:32:08	Traceback (most recent call last):\n File "/h...
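
The effect of the max_experiments and random_order parameters can be pictured with a simplified model. The helper `pick_experiments` below is hypothetical and only illustrates the selection semantics described in this notebook; it is not how PyExperimenter pulls experiments from the database.

```python
import random


def pick_experiments(open_ids, max_experiments=-1, random_order=False, rng=None):
    """Simplified model of which open experiments get executed (hypothetical helper)."""
    ids = sorted(open_ids)
    if random_order:
        rng = rng or random.Random()
        rng.shuffle(ids)
    # -1 means "no limit": take every open experiment.
    return ids if max_experiments == -1 else ids[:max_experiments]


open_ids = range(1, 14)  # the 13 rows with status 'created' above

all_ids = pick_experiments(open_ids)                      # every open experiment, by increasing ID
first_two = pick_experiments(open_ids, max_experiments=2)  # the two lowest IDs
random_two = pick_experiments(open_ids, max_experiments=2, random_order=True)

print(all_ids)
print(first_two)
print(random_two)
```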

Add Experiment and Execute

For various use cases it might be useful to add a single experiment and immediately start its execution. An example of this is given below.

[38]:

experimenter.add_experiment_and_execute({'dataset': 'iris', 'cross_validation_splits': 5, 'seed': 17, 'kernel':'linear'}, run_ml)

2024-04-15 15:32:08,916  | py-experimenter - INFO     | Experiment with id 14 successfully added to database for immidiate execution.
[codecarbon INFO @ 15:32:08] [setup] RAM Tracking...
[codecarbon INFO @ 15:32:08] [setup] GPU Tracking...
[codecarbon INFO @ 15:32:08] No GPU found.
[codecarbon INFO @ 15:32:08] [setup] CPU Tracking...
[codecarbon WARNING @ 15:32:08] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 15:32:11] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 15:32:11] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11] >>> Tracker's metadata:
[codecarbon INFO @ 15:32:11]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 15:32:11]   Python version: 3.9.19
[codecarbon INFO @ 15:32:11]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 15:32:11]   Available RAM : 15.475 GB
[codecarbon INFO @ 15:32:11]   CPU count: 16
[codecarbon INFO @ 15:32:11]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11]   GPU count: None
[codecarbon INFO @ 15:32:11]   GPU model: None
[codecarbon INFO @ 15:32:15] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803127288818359 W
[codecarbon INFO @ 15:32:15] Energy consumed for all CPUs : 0.000001 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 15:32:15] 0.000002 kWh of electricity used since the beginning.
2024-04-15 15:32:15,154  | py-experimenter - INFO     | Experiment with id 14 successfully executed.

Restart Failed Experiments

Experiments may fail for various reasons. Such experiments can be reset for another try with reset_experiments(). The status passed determines which table rows are reset. In this example, all failed experiments, i.e. those with status error, are reset. Experiments can also be reset based on several statuses by passing all of them, e.g. experimenter.reset_experiments('error', 'done'). In that case, all experiments with status 'error' or 'done' will be reset.

Calling execute() with max_experiments=-1 then executes all remaining experiments. Note that the random_order parameter is False by default, meaning experiments are executed in order of increasing ID. The first parameter, i.e. run_ml, is the function that is executed with the keyfields of each table row.

[39]:

experimenter.reset_experiments('error')

# showing database table
experimenter.get_table()

2024-04-15 15:32:15,212  | py-experimenter - INFO     | 1 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:32:15,213  | py-experimenter - INFO     | 1 experiments with status error were reset

[39]:

	ID	dataset	cross_validation_splits	seed	kernel	creation_date	status	start_date	name	machine	pipeline	train_f1	train_accuracy	test_f1	test_accuracy	end_date	error
0	1	iris	5	2	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:50	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:30:56	None
1	2	iris	5	4	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:02	None
2	3	iris	5	6	linear	2024-04-15 15:30:50	done	2024-04-15 15:31:02	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:08	None
3	4	iris	5	2	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:08	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:14	None
4	5	iris	5	4	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:14	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:20	None
5	6	iris	5	6	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:21	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:27	None
6	7	iris	5	2	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:27	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:32	None
7	8	iris	5	4	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:33	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:38	None
8	9	iris	5	6	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:38	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:44	None
9	10	iris	5	2	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:44	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:50	None
10	11	iris	5	4	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:51	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:56	None
11	12	iris	5	6	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:32:02	None
12	14	iris	5	17	linear	2024-04-15 15:32:08	done	None	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:32:15	None
13	15	error_dataset	3	42	linear	2024-04-15 15:32:15	created	None	None	None	None	NaN	NaN	NaN	NaN	None	None
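
Judging by the log output and the table above, the reset experiment reappears as a fresh row: the failed row with ID 13 is gone, and a new row with ID 15 and status created carries the same keyfields. The following sketch mirrors that observable behaviour on an in-memory SQLite table. It is an inference from the output, not PyExperimenter's implementation; the table layout and the `reset` helper are assumptions made for illustration.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute(
    "CREATE TABLE experiments ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, dataset TEXT, seed INTEGER, status TEXT, error TEXT)"
)
connection.executemany(
    "INSERT INTO experiments (dataset, seed, status, error) VALUES (?, ?, ?, ?)",
    [('iris', 2, 'done', None), ('error_dataset', 42, 'error', 'Traceback ...')],
)


def reset(connection, *states):
    """Delete rows with the given states and re-add their keyfields as 'created'."""
    placeholders = ', '.join('?' for _ in states)
    selected = connection.execute(
        f"SELECT dataset, seed FROM experiments WHERE status IN ({placeholders})", states
    ).fetchall()
    connection.execute(f"DELETE FROM experiments WHERE status IN ({placeholders})", states)
    # Re-inserted rows get a new ID; result and error columns start out empty.
    connection.executemany(
        "INSERT INTO experiments (dataset, seed, status) VALUES (?, ?, 'created')", selected
    )
    return len(selected)


print(reset(connection, 'error'))
print(connection.execute("SELECT * FROM experiments ORDER BY ID").fetchall())
```

A practical consequence is that results and error messages of a reset experiment are lost, so they should be inspected or exported before resetting.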

After the reset of failed experiments, they can be executed again as described above.

[40]:

experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

2024-04-15 15:32:21,347  | py-experimenter - ERROR    | Traceback (most recent call last):
  File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
    final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
  File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
    raise ValueError("Example error")
ValueError: Example error

/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:21,419  | py-experimenter - INFO     | All configured executions finished.

[40]:

	ID	dataset	cross_validation_splits	seed	kernel	creation_date	status	start_date	name	machine	pipeline	train_f1	train_accuracy	test_f1	test_accuracy	end_date	error
0	1	iris	5	2	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:50	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:30:56	None
1	2	iris	5	4	linear	2024-04-15 15:30:50	done	2024-04-15 15:30:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:02	None
2	3	iris	5	6	linear	2024-04-15 15:30:50	done	2024-04-15 15:31:02	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:31:08	None
3	4	iris	5	2	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:08	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:14	None
4	5	iris	5	4	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:14	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:20	None
5	6	iris	5	6	poly	2024-04-15 15:30:50	done	2024-04-15 15:31:21	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.936667	0.936667	0.933333	0.933333	2024-04-15 15:31:27	None
6	7	iris	5	2	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:27	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:32	None
7	8	iris	5	4	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:33	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:38	None
8	9	iris	5	6	rbf	2024-04-15 15:30:50	done	2024-04-15 15:31:38	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.975000	0.975000	0.966667	0.966667	2024-04-15 15:31:44	None
9	10	iris	5	2	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:44	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:50	None
10	11	iris	5	4	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:51	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:31:56	None
11	12	iris	5	6	sigmoid	2024-04-15 15:30:50	done	2024-04-15 15:31:56	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.896667	0.896667	0.893333	0.893333	2024-04-15 15:32:02	None
12	14	iris	5	17	linear	2024-04-15 15:32:08	done	None	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	0.971667	0.971667	0.966667	0.966667	2024-04-15 15:32:15	None
13	15	error_dataset	3	42	linear	2024-04-15 15:32:15	error	2024-04-15 15:32:15	example_notebook	Worklaptop	Pipeline(steps=[('standardscaler', StandardSca...	NaN	NaN	NaN	NaN	2024-04-15 15:32:21	Traceback (most recent call last):\n File "/h...

Generating Result Table

The table contains the results of single experiments. Those can be aggregated, e.g. to generate the mean over all seeds.

[41]:

result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only = True)
result_table_agg

[41]:

	ID	cross_validation_splits	seed	train_f1	train_accuracy	test_f1	test_accuracy
dataset
error_dataset	15.000000	3.0	42.0	NaN	NaN	NaN	NaN
iris	7.076923	5.0	5.0	0.947051	0.947051	0.942051	0.942051
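
As a quick plausibility check, the aggregated iris row can be recomputed by hand from the values in the result table above. The sketch below uses only the standard library and copies the numbers from the table; it is meant as a worked example of what the group mean contains, not as a replacement for the pandas call.

```python
from statistics import mean

# test_f1 per kernel and number of finished iris runs, copied from the table above.
# 'linear' has four runs: seeds 2, 4, 6 plus the experiment added with seed 17.
test_f1 = {'linear': 0.966667, 'poly': 0.933333, 'rbf': 0.966667, 'sigmoid': 0.893333}
runs = {'linear': 4, 'poly': 3, 'rbf': 3, 'sigmoid': 3}

values = [test_f1[kernel] for kernel in test_f1 for _ in range(runs[kernel])]
print(round(mean(values), 6))

# The aggregated ID and seed columns are plain means as well and carry no meaning.
ids = list(range(1, 13)) + [14]
seeds = [2, 4, 6] * 4 + [17]
print(round(mean(ids), 6), mean(seeds))
```

Since grouping by dataset alone averages over all kernels, grouping by both dataset and kernel, e.g. groupby(['dataset', 'kernel']), is likely the more informative aggregation for comparing kernels.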

Printing LaTeX Table

As a pandas.DataFrame can easily be printed as a LaTeX table, here is example code for one of the above result columns.

[42]:

print(result_table_agg[['test_f1']].style.to_latex())

\begin{tabular}{lr}
 & test_f1 \\
dataset &  \\
error_dataset & nan \\
iris & 0.942051 \\
\end{tabular}

CodeCarbon

CodeCarbon is integrated into PyExperimenter to provide information about the carbon emissions of experiments. CodeCarbon will create a table with suffix _codecarbon in the database, each row containing information about the carbon emissions of a single experiment.

[43]:

experimenter.get_codecarbon_table()

[43]:

	ID	experiment_id	codecarbon_timestamp	project_name	run_id	duration_seconds	emissions_kg	emissions_rate_kg_sec	cpu_power_watt	...	cpu_model	gpu_count	gpu_model	longitude	latitude	ram_total_size	tracking_mode	on_cloud	power_usage_efficiency
0	1	1	2024-04-15T15:30:56	codecarbon	2a195f38-0473-43ef-a083-eb31df911ede	0.139903	5.762182e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
1	2	2	2024-04-15T15:31:02	codecarbon	a2100083-2ed6-470c-ad61-219edc8cb79e	0.108931	4.423928e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
2	3	3	2024-04-15T15:31:08	codecarbon	38ec9b59-777a-464f-829b-748e1c361855	0.110439	4.340302e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
3	4	4	2024-04-15T15:31:14	codecarbon	9d20ff75-5f3c-4643-aed6-f1908f642c61	0.101483	4.133265e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
4	5	5	2024-04-15T15:31:21	codecarbon	04e05bbf-4d0a-4c34-9b23-3060b9a72533	0.132089	5.430256e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
5	6	6	2024-04-15T15:31:27	codecarbon	d89299a6-b4b1-4deb-9d76-8f00a2ba4805	0.128005	5.274309e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
6	7	7	2024-04-15T15:31:33	codecarbon	06314180-7751-48bc-8421-12e7c0136f78	0.127770	5.259911e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
7	8	8	2024-04-15T15:31:38	codecarbon	0ff3ba7b-fb88-41ed-adb2-6897579918ca	0.131880	5.401629e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
8	9	9	2024-04-15T15:31:44	codecarbon	68223357-4f14-4fe7-a91e-a5cc2ef58c77	0.136260	5.611917e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
9	10	10	2024-04-15T15:31:50	codecarbon	02befad1-59ab-484f-93c1-532779f62848	0.116057	4.693639e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
10	11	11	2024-04-15T15:31:56	codecarbon	aefafab2-ddbe-4175-b786-dc445593a082	0.127321	5.264124e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
11	12	12	2024-04-15T15:32:02	codecarbon	f6c89d11-c994-4ae9-acc8-f26d01d3bff0	0.118765	4.905017e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
12	13	13	2024-04-15T15:32:08	codecarbon	745dae32-510b-4855-87e6-c6f7d5624a7c	0.061385	2.340841e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0
13	14	14	2024-04-15T15:32:15	codecarbon	c373b752-5eb6-4032-9df2-3259af17e487	0.131715	5.500138e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	machine	N	1.0
14	15	15	2024-04-15T15:32:21	codecarbon	0a130d02-6819-4d4c-bbec-2359dc4d631e	0.081272	3.183783e-07	0.000004	42.5	...	12th Gen Intel(R) Core(TM) i7-1260P	None	None	9.7054	52.3872	15.475006	process	N	1.0

15 rows × 34 columns

Aggregating CodeCarbon Results

The carbon emission information of CodeCarbon can easily be aggregated via pandas.DataFrame.

[44]:

carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only = True)
carbon_emissions

[44]:

	ID	experiment_id	duration_seconds	emissions_kg	emissions_rate_kg_sec	cpu_power_watt	gpu_power_watt	ram_power_watt	cpu_energy_kw	gpu_energy_kw	ram_energy_kw	energy_consumed_kw	cpu_count	ram_total_size	power_usage_efficiency	offline_mode
project_name
codecarbon	120	120	1.753273	0.000007	0.000061	637.5	0.0	7.061411	0.000019	0.0	2.161534e-07	0.00002	240.0	232.125092	15.0	0
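
When summing over all numeric columns, only additive quantities such as durations, energy and emissions yield meaningful totals. The worked example below recomputes two kinds of values from the tables and log output above: the total duration, and the sums of per-row constants, which merely scale with the number of rows. The numbers are copied from this notebook's output; the CPU count of 16 is taken from the CodeCarbon log.

```python
# duration_seconds per experiment, copied from the CodeCarbon table above.
durations = [
    0.139903, 0.108931, 0.110439, 0.101483, 0.132089,
    0.128005, 0.127770, 0.131880, 0.136260, 0.116057,
    0.127321, 0.118765, 0.061385, 0.131715, 0.081272,
]

# Additive quantity: the sum is the total tracked time in seconds.
total_duration = sum(durations)
print(f"{total_duration:.4f} s over {len(durations)} experiments")

# Per-row constants: summing just multiplies them by the number of rows.
print(len(durations) * 42.5)  # matches cpu_power_watt in the aggregated table
print(len(durations) * 16)    # matches cpu_count in the aggregated table
```

Columns such as cpu_power_watt, cpu_count or ram_total_size are therefore better aggregated with a mean or left out of the sum.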

Printing CodeCarbon Results as LaTeX Table

Furthermore, the resulting pandas.DataFrame can easily be printed as a LaTeX table.

[45]:

print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())

\begin{tabular}{lrr}
 & energy_consumed_kw & emissions_kg \\
project_name &  &  \\
codecarbon & 0.000020 & 0.000007 \\
\end{tabular}