Example: General Usage
This example shows the general usage of PyExperimenter, from creating an experiment configuration file, through the actual execution of (dummy) experiments, to the extraction of experimental results.
To execute this notebook you need to install:
pip install py_experimenter
pip install scikit-learn
Experiment Configuration File
This notebook shows an example execution of PyExperimenter based on an experiment configuration file. Further explanation about the usage of PyExperimenter can be found in the documentation.
[33]:
import os
content = """
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        dataset:
          type: VARCHAR(255)
          values: ['iris']
        cross_validation_splits:
          type: INT
          values: [5]
        seed:
          type: int
          values:
            start: 2
            stop: 7
            step: 2
        kernel:
          type: VARCHAR(255)
          values: ['linear', 'poly', 'rbf', 'sigmoid']
      result_timestamps: False
      resultfields:
        pipeline: LONGTEXT
        train_f1: DECIMAL
        train_accuracy: DECIMAL
        test_f1: DECIMAL
        test_accuracy: DECIMAL

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
"""

# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)
Defining the execution function
Next, the execution of a single experiment has to be defined. Note that this is a dummy example whose code is only of limited practical use. It is meant to show the core functionality of PyExperimenter.
The method is called with the parameters, i.e. keyfields, of a database entry. The results are passed to the result processor, which writes them into the database, i.e. as resultfields.
[34]:
import random
import numpy as np
from py_experimenter.result_processor import ResultProcessor
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate
def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    # Seed all random number generators for reproducibility
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path
    # path = os.path.join(custom_config['datapath'], parameters['dataset'])
    # data = pd.read_csv(path)
    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
    # Intermediate results can be written as soon as they are available
    result_processor.process_results({
        'pipeline': str(model)
    })

    # Deliberately fail for unknown datasets to demonstrate error handling
    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
        cv=parameters['cross_validation_splits'],
        scoring=('accuracy', 'f1_micro'),
        return_train_score=True
    )

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })
    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })
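Because run_ml only talks to the database through result_processor.process_results(...), the same calling pattern can be tried out without any database. The following is a minimal sketch and not part of PyExperimenter: RecordingProcessor and toy_experiment are hypothetical names, and the stub mimics only the single method used above.

```python
class RecordingProcessor:
    """Hypothetical stand-in that mimics only `process_results`."""

    def __init__(self):
        self.results = {}

    def process_results(self, results: dict) -> None:
        # Collect results in memory instead of writing them to a database row.
        self.results.update(results)


def toy_experiment(parameters: dict, result_processor, custom_config: dict):
    # Same three-argument signature as run_ml, with a trivial body.
    result_processor.process_results({'pipeline': f"toy(kernel={parameters['kernel']})"})
    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")
    result_processor.process_results({'test_accuracy': 1.0})


# Successful run: both calls to process_results are recorded.
processor = RecordingProcessor()
toy_experiment({'dataset': 'iris', 'kernel': 'linear'}, processor, {'datapath': 'sample_data'})
print(processor.results)

# Failing run: only the results written before the exception are recorded.
failed = RecordingProcessor()
try:
    toy_experiment({'dataset': 'error_dataset', 'kernel': 'linear'}, failed, {})
except ValueError:
    pass
print(failed.results)
```

In the failing run only the pipeline entry is recorded, which illustrates why results are best written as soon as they are available rather than in one call at the end.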
Executing PyExperimenter
The actual execution of the PyExperimenter is done in multiple steps.
Initialize PyExperimenter
PyExperimenter is initialized with the previously created configuration file. Additionally, PyExperimenter is given a name, i.e. job id, which is especially useful for parallel executions of multiple experiments on HPC.
[35]:
from py_experimenter.experimenter import PyExperimenter
experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
2024-04-15 15:30:49,974 | py-experimenter - INFO | Found 4 keyfields
2024-04-15 15:30:49,976 | py-experimenter - INFO | Found 5 resultfields
2024-04-15 15:30:49,977 | py-experimenter - WARNING | No logtables given
2024-04-15 15:30:49,977 | py-experimenter - INFO | Found 1 custom values
2024-04-15 15:30:49,978 | py-experimenter - INFO | Found 6 codecarbon values
2024-04-15 15:30:49,980 | py-experimenter - INFO | Initialized and connected to database
Fill Table
The table is filled based on the configuration file created above with fill_table_from_config(). The cartesian product of all keyfield values makes up the content of the table. Additionally, a custom defined row, i.e. a custom defined keyfield tuple, is added with fill_table_with_rows().
Note that the table can easily be obtained as pandas.DataFrame via experimenter.get_table().
[36]:
experimenter.fill_table_from_config()
experimenter.fill_table_with_rows(rows=[
    {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel': 'linear'}])
# showing database table
experimenter.get_table()
2024-04-15 15:30:50,069 | py-experimenter - INFO | 12 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:30:50,083 | py-experimenter - INFO | 1 rows successfully added to database. 0 rows were skipped.
[36]:
| | ID | dataset | cross_validation_splits | seed | kernel | creation_date | status | start_date | name | machine | pipeline | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 2 | linear | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
1 | 2 | iris | 5 | 4 | linear | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
2 | 3 | iris | 5 | 6 | linear | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
3 | 4 | iris | 5 | 2 | poly | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
4 | 5 | iris | 5 | 4 | poly | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
5 | 6 | iris | 5 | 6 | poly | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
6 | 7 | iris | 5 | 2 | rbf | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
7 | 8 | iris | 5 | 4 | rbf | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
8 | 9 | iris | 5 | 6 | rbf | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
9 | 10 | iris | 5 | 2 | sigmoid | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
10 | 11 | iris | 5 | 4 | sigmoid | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
11 | 12 | iris | 5 | 6 | sigmoid | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
12 | 13 | error_dataset | 3 | 42 | linear | 2024-04-15 15:30:50 | created | None | None | None | None | None | None | None | None | None | None |
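The 12 rows generated from the configuration can be reproduced with a few lines of plain Python. This is only an illustrative sketch that does not use any PyExperimenter code. It assumes that the start/stop/step notation behaves like Python's range, which is consistent with the seeds 2, 4 and 6 in the table.

```python
from itertools import product

# Keyfield values as given in the configuration file.
keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    'seed': list(range(2, 7, 2)),  # assumed equivalent of start=2, stop=7, step=2
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

# One row per combination of keyfield values (1 * 1 * 3 * 4 combinations).
rows = [dict(zip(keyfields, combination)) for combination in product(*keyfields.values())]

print(len(rows))
print(sorted({row['seed'] for row in rows}))
```

The row with dataset error_dataset is not part of this product. It only exists because it was added explicitly with fill_table_with_rows().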
Execute PyExperimenter
All experiments are executed one after another by setting max_experiments=-1. To run only a limited number of randomly chosen experiments instead, e.g. two, set max_experiments=2 and random_order=True.
[37]:
experimenter.execute(run_ml, max_experiments=-1)
# showing database table
experimenter.get_table()
/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:08,810 | py-experimenter - ERROR | Traceback (most recent call last):
File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
raise ValueError("Example error")
ValueError: Example error
2024-04-15 15:32:08,869 | py-experimenter - INFO | All configured executions finished.
[37]:
| | ID | dataset | cross_validation_splits | seed | kernel | creation_date | status | start_date | name | machine | pipeline | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 2 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:50 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:30:56 | None |
1 | 2 | iris | 5 | 4 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:02 | None |
2 | 3 | iris | 5 | 6 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:02 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:08 | None |
3 | 4 | iris | 5 | 2 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:08 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:14 | None |
4 | 5 | iris | 5 | 4 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:14 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:20 | None |
5 | 6 | iris | 5 | 6 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:21 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:27 | None |
6 | 7 | iris | 5 | 2 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:27 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:32 | None |
7 | 8 | iris | 5 | 4 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:33 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:38 | None |
8 | 9 | iris | 5 | 6 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:38 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:44 | None |
9 | 10 | iris | 5 | 2 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:44 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:50 | None |
10 | 11 | iris | 5 | 4 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:51 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:56 | None |
11 | 12 | iris | 5 | 6 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:32:02 | None |
12 | 13 | error_dataset | 3 | 42 | linear | 2024-04-15 15:30:50 | error | 2024-04-15 15:32:02 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | NaN | NaN | NaN | NaN | 2024-04-15 15:32:08 | Traceback (most recent call last):\n File "/h... |
Add Experiment and Execute
For various use cases it might be useful to add a single experiment and immediately start its execution. An example of this is given below.
[38]:
experimenter.add_experiment_and_execute({'dataset': 'iris', 'cross_validation_splits': 5, 'seed': 17, 'kernel':'linear'}, run_ml)
2024-04-15 15:32:08,916 | py-experimenter - INFO | Experiment with id 14 successfully added to database for immidiate execution.
[codecarbon INFO @ 15:32:08] [setup] RAM Tracking...
[codecarbon INFO @ 15:32:08] [setup] GPU Tracking...
[codecarbon INFO @ 15:32:08] No GPU found.
[codecarbon INFO @ 15:32:08] [setup] CPU Tracking...
[codecarbon WARNING @ 15:32:08] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 15:32:11] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 15:32:11] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11] >>> Tracker's metadata:
[codecarbon INFO @ 15:32:11] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 15:32:11] Python version: 3.9.19
[codecarbon INFO @ 15:32:11] CodeCarbon version: 2.3.4
[codecarbon INFO @ 15:32:11] Available RAM : 15.475 GB
[codecarbon INFO @ 15:32:11] CPU count: 16
[codecarbon INFO @ 15:32:11] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 15:32:11] GPU count: None
[codecarbon INFO @ 15:32:11] GPU model: None
[codecarbon INFO @ 15:32:15] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803127288818359 W
[codecarbon INFO @ 15:32:15] Energy consumed for all CPUs : 0.000001 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 15:32:15] 0.000002 kWh of electricity used since the beginning.
2024-04-15 15:32:15,154 | py-experimenter - INFO | Experiment with id 14 successfully executed.
Restart Failed Experiments
Experiments may fail for various reasons. Such experiments can be reset for another try with reset_experiments(). The status argument determines which table rows are reset. In this example, all failed experiments, i.e. those having status==error, are reset. Experiments can also be reset based on multiple status values by passing several of them, e.g. experimenter.reset_experiments('error', 'done'). In that case, all experiments with status 'error' or 'done' will be reset.
Afterwards, all remaining experiments can be executed with max_experiments=-1. Note that the random_order parameter is set to False by default, meaning that experiments are executed in order of increasing id. The first parameter, i.e. run_ml, refers to the actual method that should be executed with the given keyfields of the table.
[39]:
experimenter.reset_experiments('error')
# showing database table
experimenter.get_table()
2024-04-15 15:32:15,212 | py-experimenter - INFO | 1 rows successfully added to database. 0 rows were skipped.
2024-04-15 15:32:15,213 | py-experimenter - INFO | 1 experiments with status error were reset
[39]:
| | ID | dataset | cross_validation_splits | seed | kernel | creation_date | status | start_date | name | machine | pipeline | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 2 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:50 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:30:56 | None |
1 | 2 | iris | 5 | 4 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:02 | None |
2 | 3 | iris | 5 | 6 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:02 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:08 | None |
3 | 4 | iris | 5 | 2 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:08 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:14 | None |
4 | 5 | iris | 5 | 4 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:14 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:20 | None |
5 | 6 | iris | 5 | 6 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:21 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:27 | None |
6 | 7 | iris | 5 | 2 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:27 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:32 | None |
7 | 8 | iris | 5 | 4 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:33 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:38 | None |
8 | 9 | iris | 5 | 6 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:38 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:44 | None |
9 | 10 | iris | 5 | 2 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:44 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:50 | None |
10 | 11 | iris | 5 | 4 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:51 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:56 | None |
11 | 12 | iris | 5 | 6 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:32:02 | None |
12 | 14 | iris | 5 | 17 | linear | 2024-04-15 15:32:08 | done | None | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:32:15 | None |
13 | 15 | error_dataset | 3 | 42 | linear | 2024-04-15 15:32:15 | created | None | None | None | None | NaN | NaN | NaN | NaN | None | None |
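The table above suggests that resetting removes the failed row and re-adds its keyfields as a fresh row: ID 13 disappears and ID 15 appears with status created. The following in-memory sketch mimics that observed behaviour. reset_rows is a hypothetical helper written for illustration, not PyExperimenter's implementation.

```python
def reset_rows(table: list, *states: str) -> list:
    """Drop rows whose status is in `states` and re-add them as new 'created' rows."""
    kept = [row for row in table if row['status'] not in states]
    to_reset = [row for row in table if row['status'] in states]
    # New rows continue the ID sequence of the existing table.
    next_id = max((row['ID'] for row in table), default=0) + 1
    for row in to_reset:
        kept.append({'ID': next_id, 'keyfields': row['keyfields'], 'status': 'created'})
        next_id += 1
    return kept


table = [
    {'ID': 12, 'keyfields': ('iris', 5, 6, 'sigmoid'), 'status': 'done'},
    {'ID': 13, 'keyfields': ('error_dataset', 3, 42, 'linear'), 'status': 'error'},
    {'ID': 14, 'keyfields': ('iris', 5, 17, 'linear'), 'status': 'done'},
]
table = reset_rows(table, 'error')
print([(row['ID'], row['status']) for row in table])
```

A practical consequence is that IDs are not stable across resets, so downstream analyses are safer when they identify experiments by their keyfields rather than by ID.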
After the reset of failed experiments, they can be executed again as described above.
[40]:
experimenter.execute(run_ml, max_experiments=-1)
# showing database table
experimenter.get_table()
2024-04-15 15:32:21,347 | py-experimenter - ERROR | Traceback (most recent call last):
File "/home/lukas/py_experimenter/py_experimenter/experimenter.py", line 403, in _execute_experiment
final_status = experiment_function(keyfield_values, result_processor, self.config.custom_configuration.custom_values)
File "/tmp/ipykernel_152317/1244630566.py", line 31, in run_ml
raise ValueError("Example error")
ValueError: Example error
/home/lukas/anaconda3/envs/py-experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
2024-04-15 15:32:21,419 | py-experimenter - INFO | All configured executions finished.
[40]:
| | ID | dataset | cross_validation_splits | seed | kernel | creation_date | status | start_date | name | machine | pipeline | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 2 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:50 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:30:56 | None |
1 | 2 | iris | 5 | 4 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:30:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:02 | None |
2 | 3 | iris | 5 | 6 | linear | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:02 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:31:08 | None |
3 | 4 | iris | 5 | 2 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:08 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:14 | None |
4 | 5 | iris | 5 | 4 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:14 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:20 | None |
5 | 6 | iris | 5 | 6 | poly | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:21 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.936667 | 0.936667 | 0.933333 | 0.933333 | 2024-04-15 15:31:27 | None |
6 | 7 | iris | 5 | 2 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:27 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:32 | None |
7 | 8 | iris | 5 | 4 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:33 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:38 | None |
8 | 9 | iris | 5 | 6 | rbf | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:38 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.975000 | 0.975000 | 0.966667 | 0.966667 | 2024-04-15 15:31:44 | None |
9 | 10 | iris | 5 | 2 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:44 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:50 | None |
10 | 11 | iris | 5 | 4 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:51 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:31:56 | None |
11 | 12 | iris | 5 | 6 | sigmoid | 2024-04-15 15:30:50 | done | 2024-04-15 15:31:56 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.896667 | 0.896667 | 0.893333 | 0.893333 | 2024-04-15 15:32:02 | None |
12 | 14 | iris | 5 | 17 | linear | 2024-04-15 15:32:08 | done | None | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | 0.971667 | 0.971667 | 0.966667 | 0.966667 | 2024-04-15 15:32:15 | None |
13 | 15 | error_dataset | 3 | 42 | linear | 2024-04-15 15:32:15 | error | 2024-04-15 15:32:15 | example_notebook | Worklaptop | Pipeline(steps=[('standardscaler', StandardSca... | NaN | NaN | NaN | NaN | 2024-04-15 15:32:21 | Traceback (most recent call last):\n File "/h... |
Generating Result Table
The table contains the results of single experiments. Those can be aggregated, e.g. to generate the mean over all seeds.
[41]:
result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only = True)
result_table_agg
[41]:
dataset | ID | cross_validation_splits | seed | train_f1 | train_accuracy | test_f1 | test_accuracy |
---|---|---|---|---|---|---|---|
error_dataset | 15.000000 | 3.0 | 42.0 | NaN | NaN | NaN | NaN |
iris | 7.076923 | 5.0 | 5.0 | 0.947051 | 0.947051 | 0.942051 | 0.942051 |
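The aggregated values for iris can be checked by hand. The sketch below recomputes three of them in plain Python from the numbers displayed in the experiment table above. The inputs are the rounded values as shown there, so only the first six decimals are meaningful.

```python
# IDs and seeds of the 13 successful iris experiments shown above.
ids = list(range(1, 13)) + [14]
seeds = [2, 4, 6] * 4 + [17]

# test_f1 as displayed in the table: three seeds each for linear, poly, rbf
# and sigmoid, plus the extra linear experiment with seed 17.
test_f1 = [0.966667] * 3 + [0.933333] * 3 + [0.966667] * 3 + [0.893333] * 3 + [0.966667]

mean_id = sum(ids) / len(ids)
mean_seed = sum(seeds) / len(seeds)
mean_test_f1 = sum(test_f1) / len(test_f1)

print(round(mean_id, 6))       # 7.076923, as in the ID column
print(round(mean_seed, 6))     # 5.0, as in the seed column
print(round(mean_test_f1, 6))  # 0.942051, as in the test_f1 column
```

Note that numeric_only=True keeps every numeric column, so ID, cross_validation_splits and seed are averaged as well, although their means carry no meaning. Selecting only the result columns before aggregating avoids this.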
Printing LaTeX Table
As a pandas.DataFrame can easily be printed as a LaTeX table, here is example code for one of the above result columns.
[42]:
print(result_table_agg[['test_f1']].style.to_latex())
\begin{tabular}{lr}
& test_f1 \\
dataset & \\
error_dataset & nan \\
iris & 0.942051 \\
\end{tabular}
CodeCarbon
CodeCarbon is integrated into PyExperimenter to provide information about the carbon emissions of experiments. CodeCarbon will create a table with suffix _codecarbon in the database, each row containing information about the carbon emissions of a single experiment.
[43]:
experimenter.get_codecarbon_table()
[43]:
| | ID | experiment_id | codecarbon_timestamp | project_name | run_id | duration_seconds | emissions_kg | emissions_rate_kg_sec | cpu_power_watt | gpu_power_watt | ... | cpu_model | gpu_count | gpu_model | longitude | latitude | ram_total_size | tracking_mode | on_cloud | power_usage_efficiency | offline_mode |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2024-04-15T15:30:56 | codecarbon | 2a195f38-0473-43ef-a083-eb31df911ede | 0.139903 | 5.762182e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
1 | 2 | 2 | 2024-04-15T15:31:02 | codecarbon | a2100083-2ed6-470c-ad61-219edc8cb79e | 0.108931 | 4.423928e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
2 | 3 | 3 | 2024-04-15T15:31:08 | codecarbon | 38ec9b59-777a-464f-829b-748e1c361855 | 0.110439 | 4.340302e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
3 | 4 | 4 | 2024-04-15T15:31:14 | codecarbon | 9d20ff75-5f3c-4643-aed6-f1908f642c61 | 0.101483 | 4.133265e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
4 | 5 | 5 | 2024-04-15T15:31:21 | codecarbon | 04e05bbf-4d0a-4c34-9b23-3060b9a72533 | 0.132089 | 5.430256e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
5 | 6 | 6 | 2024-04-15T15:31:27 | codecarbon | d89299a6-b4b1-4deb-9d76-8f00a2ba4805 | 0.128005 | 5.274309e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
6 | 7 | 7 | 2024-04-15T15:31:33 | codecarbon | 06314180-7751-48bc-8421-12e7c0136f78 | 0.127770 | 5.259911e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
7 | 8 | 8 | 2024-04-15T15:31:38 | codecarbon | 0ff3ba7b-fb88-41ed-adb2-6897579918ca | 0.131880 | 5.401629e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
8 | 9 | 9 | 2024-04-15T15:31:44 | codecarbon | 68223357-4f14-4fe7-a91e-a5cc2ef58c77 | 0.136260 | 5.611917e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
9 | 10 | 10 | 2024-04-15T15:31:50 | codecarbon | 02befad1-59ab-484f-93c1-532779f62848 | 0.116057 | 4.693639e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
10 | 11 | 11 | 2024-04-15T15:31:56 | codecarbon | aefafab2-ddbe-4175-b786-dc445593a082 | 0.127321 | 5.264124e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
11 | 12 | 12 | 2024-04-15T15:32:02 | codecarbon | f6c89d11-c994-4ae9-acc8-f26d01d3bff0 | 0.118765 | 4.905017e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
12 | 13 | 13 | 2024-04-15T15:32:08 | codecarbon | 745dae32-510b-4855-87e6-c6f7d5624a7c | 0.061385 | 2.340841e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
13 | 14 | 14 | 2024-04-15T15:32:15 | codecarbon | c373b752-5eb6-4032-9df2-3259af17e487 | 0.131715 | 5.500138e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | machine | N | 1.0 | 0 |
14 | 15 | 15 | 2024-04-15T15:32:21 | codecarbon | 0a130d02-6819-4d4c-bbec-2359dc4d631e | 0.081272 | 3.183783e-07 | 0.000004 | 42.5 | 0.0 | ... | 12th Gen Intel(R) Core(TM) i7-1260P | None | None | 9.7054 | 52.3872 | 15.475006 | process | N | 1.0 | 0 |
15 rows × 34 columns
Aggregating CodeCarbon Results
The carbon emission information of CodeCarbon can easily be aggregated via pandas.DataFrame.
[44]:
carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only = True)
carbon_emissions
[44]:
project_name | ID | experiment_id | duration_seconds | emissions_kg | emissions_rate_kg_sec | cpu_power_watt | gpu_power_watt | ram_power_watt | cpu_energy_kw | gpu_energy_kw | ram_energy_kw | energy_consumed_kw | cpu_count | ram_total_size | power_usage_efficiency | offline_mode |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
codecarbon | 120 | 120 | 1.753273 | 0.000007 | 0.000061 | 637.5 | 0.0 | 7.061411 | 0.000019 | 0.0 | 2.161534e-07 | 0.00002 | 240.0 | 232.125092 | 15.0 | 0 |
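Not every summed column is meaningful. The following plain-Python check uses the per-experiment values copied from the CodeCarbon table above to show the difference between an additive quantity (emissions_kg) and a per-run reading (cpu_power_watt). It is a sketch for illustration and does not query the database.

```python
# emissions_kg per experiment, copied from the CodeCarbon table above.
emissions_kg = [
    5.762182e-07, 4.423928e-07, 4.340302e-07, 4.133265e-07, 5.430256e-07,
    5.274309e-07, 5.259911e-07, 5.401629e-07, 5.611917e-07, 4.693639e-07,
    5.264124e-07, 4.905017e-07, 2.340841e-07, 5.500138e-07, 3.183783e-07,
]

# Emissions are additive, so their sum is the total over all 15 runs.
total_emissions = sum(emissions_kg)
print(f"{total_emissions:.6f}")  # formatted to six decimals like the aggregated table

# cpu_power_watt is a constant reading of 42.5 W per run; summing it only
# multiplies that reading by the number of runs.
summed_cpu_power = len(emissions_kg) * 42.5
print(summed_cpu_power)
```

The same reasoning applies to columns such as ID, experiment_id, cpu_count and ram_total_size, whose sums in the aggregated table are by-products of sum(numeric_only=True) rather than useful totals.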
Printing CodeCarbon Results as LaTeX Table
Furthermore, the resulting pandas.DataFrame can easily be printed as a LaTeX table.
[45]:
print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())
\begin{tabular}{lrr}
& energy_consumed_kw & emissions_kg \\
project_name & & \\
codecarbon & 0.000020 & 0.000007 \\
\end{tabular}