Example: Usage of Logtables
This example shows how to define and fill logtables. It assumes that you already understand the basic functionality of PyExperimenter. Note that the purpose of this notebook is to demonstrate the functionality of logtables, not to provide reasonable experiments.
To execute this notebook you need to install:
pip install py_experimenter
pip install scikit-learn
Experiment Configuration File
This notebook shows an example execution of PyExperimenter based on the configuration file that is used in the general usage notebook. However, this file is slightly adapted to show the usage of logtables. The goal in this small example is to find the best kernel for an SVM on some dataset using grid search and to log the performance of SVMs initialized with different kernels. Further explanation of logtables can be found in the documentation.
[1]:
import os

content = """
PY_EXPERIMENTER:
  n_jobs : 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_logtables
      keyfields:
        dataset:
          type: VARCHAR(50)
          values: ['iris']
        cross_validation_splits:
          type: int
          values: [5]
        seed:
          type: int
          values: [1, 2, 3, 4, 5]
      result_timestamps: false
      resultfields:
        best_kernel_accuracy: VARCHAR(50)
        best_kernel_f1: VARCHAR(50)
    logtables:
      train_scores:
        f1: DOUBLE
        accuracy: DOUBLE
        kernel: VARCHAR(50)
      test_f1:
        test_f1: DOUBLE
      test_accuracy:
        test_accuracy: DOUBLE

  CUSTOM:
    path: sample_data
"""

# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_logtables.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)
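Each entry below `logtables` becomes its own database table. The following sketch mirrors the logtable definitions from the configuration as a plain Python dictionary and derives the table names that appear in the log output of this notebook (for example `example_logtables__train_scores`). The helper `full_logtable_name` is hypothetical and only illustrates the naming pattern visible in that output; it is not part of the PyExperimenter API.

```python
# Python mirror of the table name and logtable definitions from the configuration above.
table_name = "example_logtables"
logtables = {
    "train_scores": {"f1": "DOUBLE", "accuracy": "DOUBLE", "kernel": "VARCHAR(50)"},
    "test_f1": {"test_f1": "DOUBLE"},
    "test_accuracy": {"test_accuracy": "DOUBLE"},
}


def full_logtable_name(table: str, logtable: str) -> str:
    # Hypothetical helper: joins both names with a double underscore,
    # matching names such as example_logtables__train_scores in the log output.
    return f"{table}__{logtable}"


logtable_names = [full_logtable_name(table_name, name) for name in logtables]
for name, columns in zip(logtable_names, logtables.values()):
    print(name, "->", ", ".join(columns))
```

When reading a logtable through `get_logtable`, only the short name (e.g. `train_scores`) is passed, as the cells further down show.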
Defining the execution function
Next, the execution of a single experiment has to be defined. Note that this dummy example is a slightly modified version of the general usage notebook. Instead of executing with one kernel we iterate over kernels to find the best one. Additionally, the results get logged.
[2]:
import random

import numpy as np
from py_experimenter.result_processor import ResultProcessor
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']

    # Initialize variables
    performance_f1 = 0
    best_kernel_f1 = ''
    performance_accuracy = 0
    best_kernel_accuracy = ''

    for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
        # Set seed for reproducibility
        random.seed(seed)
        np.random.seed(seed)

        data = load_iris()
        X = data.data
        y = data.target

        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='auto'))
        scores = cross_validate(model, X, y,
                                cv=parameters['cross_validation_splits'],
                                scoring=('accuracy', 'f1_micro'),
                                return_train_score=True
                                )

        # Log scores to logtables (one row per logtable and kernel)
        result_processor.process_logs(
            {
                'train_scores': {
                    'f1': np.mean(scores['train_f1_micro']),
                    'accuracy': np.mean(scores['train_accuracy']),
                    'kernel': "'" + kernel + "'"
                },
                'test_f1': {
                    'test_f1': np.mean(scores['test_f1_micro'])},
                'test_accuracy': {
                    'test_accuracy': np.mean(scores['test_accuracy'])},
            }
        )

        # Keep track of the best kernel for each metric
        if np.mean(scores['test_f1_micro']) > performance_f1:
            performance_f1 = np.mean(scores['test_f1_micro'])
            best_kernel_f1 = kernel
        if np.mean(scores['test_accuracy']) > performance_accuracy:
            performance_accuracy = np.mean(scores['test_accuracy'])
            best_kernel_accuracy = kernel

    # Write the final results to the main table
    result_processor.process_results({
        'best_kernel_f1': best_kernel_f1,
        'best_kernel_accuracy': best_kernel_accuracy
    })
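The selection logic inside `run_ml` can be looked at in isolation. The sketch below replays it on the rounded mean test F1 values from the `test_f1` logtable shown later in this notebook, assuming the rows were logged in the kernel order of the loop. The function `select_best` is a hypothetical stand-in for the two `if` blocks above. Because the comparison is a strict `>`, a later kernel with an equal score does not replace the current best, which is consistent with `linear` being reported although `rbf` shows the same rounded value; the unrounded values may differ slightly.

```python
# Rounded mean test F1 per kernel, taken from the test_f1 logtable of this notebook run.
mean_test_f1 = {
    "linear": 0.966667,
    "poly": 0.933333,
    "rbf": 0.966667,
    "sigmoid": 0.893333,
}


def select_best(scores: dict) -> str:
    """Replay the selection loop of run_ml: keep the first kernel with the highest score."""
    best_score = 0
    best_kernel = ""
    for kernel, score in scores.items():
        # Strict '>' means a later kernel with an equal score does not replace the current best.
        if score > best_score:
            best_score = score
            best_kernel = kernel
    return best_kernel


best = select_best(mean_test_f1)
print(best)
```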
Executing PyExperimenter
Now we create a PyExperimenter object with the experiment configuration above. We also fill the database with values from that experiment configuration file.
[3]:
from py_experimenter.experimenter import PyExperimenter
experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()
experimenter.get_table()
2024-03-11 08:18:13,492 | py-experimenter - INFO | Found 3 keyfields
2024-03-11 08:18:13,493 | py-experimenter - INFO | Found 2 resultfields
2024-03-11 08:18:13,494 | py-experimenter - INFO | Found 3 logtables
2024-03-11 08:18:13,495 | py-experimenter - INFO | Found logtable example_logtables__train_scores
2024-03-11 08:18:13,496 | py-experimenter - INFO | Found logtable example_logtables__test_f1
2024-03-11 08:18:13,496 | py-experimenter - INFO | Found logtable example_logtables__test_accuracy
2024-03-11 08:18:13,497 | py-experimenter - WARNING | No custom section defined in config
2024-03-11 08:18:13,498 | py-experimenter - WARNING | No codecarbon section defined in config
2024-03-11 08:18:13,499 | py-experimenter - INFO | Initialized and connected to database
2024-03-11 08:18:13,595 | py-experimenter - INFO | 5 rows successfully added to database. 0 rows were skipped.
[3]:
| | ID | dataset | cross_validation_splits | seed | creation_date | status | start_date | name | machine | best_kernel_accuracy | best_kernel_f1 | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 1 | 2024-03-11 08:18:13 | created | None | None | None | None | None | None | None |
1 | 2 | iris | 5 | 2 | 2024-03-11 08:18:13 | created | None | None | None | None | None | None | None |
2 | 3 | iris | 5 | 3 | 2024-03-11 08:18:13 | created | None | None | None | None | None | None | None |
3 | 4 | iris | 5 | 4 | 2024-03-11 08:18:13 | created | None | None | None | None | None | None | None |
4 | 5 | iris | 5 | 5 | 2024-03-11 08:18:13 | created | None | None | None | None | None | None | None |
[4]:
# Read one of the logtables
experimenter.get_logtable('train_scores')
[4]:
| | ID | experiment_id | timestamp | f1 | accuracy | kernel |
---|---|---|---|---|---|---|
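Every logtable row carries an `experiment_id` that points back to the `ID` of the experiment in the main table. The following sketch illustrates that relation with an in-memory SQLite database. The table layouts are simplified stand-ins based on the columns shown in this notebook; the column types and the foreign key are assumptions for illustration, not the statements PyExperimenter issues. The scores are taken from the results further down, and the kernel names are written without the extra quotes.

```python
import sqlite3

# In-memory database with simplified stand-ins for the main table and one logtable.
# Column types and the foreign key are assumptions for illustration only.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE example_logtables (
        ID INTEGER PRIMARY KEY,
        dataset VARCHAR(50),
        seed INTEGER
    );
    CREATE TABLE example_logtables__train_scores (
        ID INTEGER PRIMARY KEY,
        experiment_id INTEGER REFERENCES example_logtables(ID),
        f1 DOUBLE,
        accuracy DOUBLE,
        kernel VARCHAR(50)
    );
    """
)
connection.execute("INSERT INTO example_logtables (dataset, seed) VALUES ('iris', 1)")
connection.executemany(
    "INSERT INTO example_logtables__train_scores (experiment_id, f1, accuracy, kernel) "
    "VALUES (?, ?, ?, ?)",
    [(1, 0.971667, 0.971667, "linear"), (1, 0.936667, 0.936667, "poly")],
)

# Join every log entry back to the experiment that produced it.
rows = connection.execute(
    """
    SELECT e.seed, l.kernel, l.accuracy
    FROM example_logtables__train_scores AS l
    JOIN example_logtables AS e ON e.ID = l.experiment_id
    ORDER BY l.ID
    """
).fetchall()
print(rows)
connection.close()
```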
Run Experiments
All experiments are executed sequentially by the same PyExperimenter due to max_experiments=-1 and n_jobs=1, as set in the configuration file. If only a single experiment or a predefined number of experiments should be executed, the -1 has to be replaced by the corresponding number.
The first parameter, i.e. run_ml, relates to the actual method that should be executed with the given keyfields of the table.
[5]:
experimenter.execute(run_ml, max_experiments=-1)
[codecarbon INFO @ 08:18:13] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:13] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:13] No GPU found.
[codecarbon INFO @ 08:18:13] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:13] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:14] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:14] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:14] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:14] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:14] Python version: 3.9.0
[codecarbon INFO @ 08:18:14] CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:14] Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:14] CPU count: 16
[codecarbon INFO @ 08:18:14] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:14] GPU count: None
[codecarbon INFO @ 08:18:14] GPU model: None
[codecarbon INFO @ 08:18:18] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:18] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:18] 0.000003 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:18] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:18] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:18] No GPU found.
[codecarbon INFO @ 08:18:18] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:18] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:19] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:19] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:19] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:19] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:19] Python version: 3.9.0
[codecarbon INFO @ 08:18:19] CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:19] Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:19] CPU count: 16
[codecarbon INFO @ 08:18:19] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:19] GPU count: None
[codecarbon INFO @ 08:18:19] GPU model: None
[codecarbon INFO @ 08:18:22] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:22] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:22] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:22] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:22] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:22] No GPU found.
[codecarbon INFO @ 08:18:22] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:22] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:23] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:23] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:23] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:23] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:23] Python version: 3.9.0
[codecarbon INFO @ 08:18:23] CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:23] Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:23] CPU count: 16
[codecarbon INFO @ 08:18:23] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:23] GPU count: None
[codecarbon INFO @ 08:18:23] GPU model: None
[codecarbon INFO @ 08:18:27] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:27] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:27] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:27] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:27] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:27] No GPU found.
[codecarbon INFO @ 08:18:27] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:27] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:28] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:28] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:28] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:28] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:28] Python version: 3.9.0
[codecarbon INFO @ 08:18:28] CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:28] Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:28] CPU count: 16
[codecarbon INFO @ 08:18:28] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:28] GPU count: None
[codecarbon INFO @ 08:18:28] GPU model: None
[codecarbon INFO @ 08:18:31] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:31] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:31] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:31] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:31] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:31] No GPU found.
[codecarbon INFO @ 08:18:31] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:31] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:32] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:32] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:32] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:32] Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:32] Python version: 3.9.0
[codecarbon INFO @ 08:18:32] CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:32] Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:32] CPU count: 16
[codecarbon INFO @ 08:18:32] CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:32] GPU count: None
[codecarbon INFO @ 08:18:32] GPU model: None
[codecarbon INFO @ 08:18:35] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:35] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:35] 0.000002 kWh of electricity used since the beginning.
2024-03-11 08:18:36,076 | py-experimenter - INFO | All configured executions finished.
Check Results
The content of the main table, which holds the keyfields and resultfields, as well as the content of every logtable can be obtained with get_table() and get_logtable(), respectively.
[6]:
experimenter.get_table()
[6]:
| | ID | dataset | cross_validation_splits | seed | creation_date | status | start_date | name | machine | best_kernel_accuracy | best_kernel_f1 | end_date | error |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | iris | 5 | 1 | 2024-03-11 08:18:13 | done | 2024-03-11 08:18:13 | example_notebook | Worklaptop | linear | linear | 2024-03-11 08:18:18 | None |
1 | 2 | iris | 5 | 2 | 2024-03-11 08:18:13 | done | 2024-03-11 08:18:18 | example_notebook | Worklaptop | linear | linear | 2024-03-11 08:18:22 | None |
2 | 3 | iris | 5 | 3 | 2024-03-11 08:18:13 | done | 2024-03-11 08:18:22 | example_notebook | Worklaptop | linear | linear | 2024-03-11 08:18:27 | None |
3 | 4 | iris | 5 | 4 | 2024-03-11 08:18:13 | done | 2024-03-11 08:18:27 | example_notebook | Worklaptop | linear | linear | 2024-03-11 08:18:31 | None |
4 | 5 | iris | 5 | 5 | 2024-03-11 08:18:13 | done | 2024-03-11 08:18:31 | example_notebook | Worklaptop | linear | linear | 2024-03-11 08:18:35 | None |
[7]:
experimenter.get_logtable('train_scores')
[7]:
| | ID | experiment_id | timestamp | f1 | accuracy | kernel |
---|---|---|---|---|---|---|
0 | 1 | 1 | 2024-03-11 08:18:17 | 0.971667 | 0.971667 | 'linear' |
1 | 2 | 1 | 2024-03-11 08:18:18 | 0.936667 | 0.936667 | 'poly' |
2 | 3 | 1 | 2024-03-11 08:18:18 | 0.975000 | 0.975000 | 'rbf' |
3 | 4 | 1 | 2024-03-11 08:18:18 | 0.896667 | 0.896667 | 'sigmoid' |
4 | 5 | 2 | 2024-03-11 08:18:22 | 0.971667 | 0.971667 | 'linear' |
5 | 6 | 2 | 2024-03-11 08:18:22 | 0.936667 | 0.936667 | 'poly' |
6 | 7 | 2 | 2024-03-11 08:18:22 | 0.975000 | 0.975000 | 'rbf' |
7 | 8 | 2 | 2024-03-11 08:18:22 | 0.896667 | 0.896667 | 'sigmoid' |
8 | 9 | 3 | 2024-03-11 08:18:26 | 0.971667 | 0.971667 | 'linear' |
9 | 10 | 3 | 2024-03-11 08:18:26 | 0.936667 | 0.936667 | 'poly' |
10 | 11 | 3 | 2024-03-11 08:18:26 | 0.975000 | 0.975000 | 'rbf' |
11 | 12 | 3 | 2024-03-11 08:18:27 | 0.896667 | 0.896667 | 'sigmoid' |
12 | 13 | 4 | 2024-03-11 08:18:31 | 0.971667 | 0.971667 | 'linear' |
13 | 14 | 4 | 2024-03-11 08:18:31 | 0.936667 | 0.936667 | 'poly' |
14 | 15 | 4 | 2024-03-11 08:18:31 | 0.975000 | 0.975000 | 'rbf' |
15 | 16 | 4 | 2024-03-11 08:18:31 | 0.896667 | 0.896667 | 'sigmoid' |
16 | 17 | 5 | 2024-03-11 08:18:35 | 0.971667 | 0.971667 | 'linear' |
17 | 18 | 5 | 2024-03-11 08:18:35 | 0.936667 | 0.936667 | 'poly' |
18 | 19 | 5 | 2024-03-11 08:18:35 | 0.975000 | 0.975000 | 'rbf' |
19 | 20 | 5 | 2024-03-11 08:18:35 | 0.896667 | 0.896667 | 'sigmoid' |
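Since each experiment logs one row per kernel, the logtable can be aggregated per kernel afterwards. The sketch below does this with the standard library on the rows of experiments 1 and 2 copied from the train_scores table above. Note that the kernel values contain literal single quotes, because `run_ml` wraps the kernel name in quotes before logging, so they are stripped first. The variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (experiment_id, accuracy, kernel) for experiments 1 and 2, copied from the train_scores table.
rows = [
    (1, 0.971667, "'linear'"),
    (1, 0.936667, "'poly'"),
    (1, 0.975000, "'rbf'"),
    (1, 0.896667, "'sigmoid'"),
    (2, 0.971667, "'linear'"),
    (2, 0.936667, "'poly'"),
    (2, 0.975000, "'rbf'"),
    (2, 0.896667, "'sigmoid'"),
]

accuracy_by_kernel = defaultdict(list)
for experiment_id, accuracy, kernel in rows:
    # The stored value contains literal single quotes, so strip them first.
    accuracy_by_kernel[kernel.strip("'")].append(accuracy)

mean_accuracy = {kernel: mean(values) for kernel, values in accuracy_by_kernel.items()}
best_train_kernel = max(mean_accuracy, key=mean_accuracy.get)
print(best_train_kernel, mean_accuracy[best_train_kernel])
```

On the training scores `rbf` comes out on top, whereas the main table reports `linear` as the best kernel, because the selection in `run_ml` is based on the test scores. In this run the logged scores are identical for every seed.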
[8]:
experimenter.get_logtable('test_f1')
[8]:
| | ID | experiment_id | timestamp | test_f1 |
---|---|---|---|---|
0 | 1 | 1 | 2024-03-11 08:18:17 | 0.966667 |
1 | 2 | 1 | 2024-03-11 08:18:18 | 0.933333 |
2 | 3 | 1 | 2024-03-11 08:18:18 | 0.966667 |
3 | 4 | 1 | 2024-03-11 08:18:18 | 0.893333 |
4 | 5 | 2 | 2024-03-11 08:18:22 | 0.966667 |
5 | 6 | 2 | 2024-03-11 08:18:22 | 0.933333 |
6 | 7 | 2 | 2024-03-11 08:18:22 | 0.966667 |
7 | 8 | 2 | 2024-03-11 08:18:22 | 0.893333 |
8 | 9 | 3 | 2024-03-11 08:18:26 | 0.966667 |
9 | 10 | 3 | 2024-03-11 08:18:26 | 0.933333 |
10 | 11 | 3 | 2024-03-11 08:18:26 | 0.966667 |
11 | 12 | 3 | 2024-03-11 08:18:27 | 0.893333 |
12 | 13 | 4 | 2024-03-11 08:18:31 | 0.966667 |
13 | 14 | 4 | 2024-03-11 08:18:31 | 0.933333 |
14 | 15 | 4 | 2024-03-11 08:18:31 | 0.966667 |
15 | 16 | 4 | 2024-03-11 08:18:31 | 0.893333 |
16 | 17 | 5 | 2024-03-11 08:18:35 | 0.966667 |
17 | 18 | 5 | 2024-03-11 08:18:35 | 0.933333 |
18 | 19 | 5 | 2024-03-11 08:18:35 | 0.966667 |
19 | 20 | 5 | 2024-03-11 08:18:35 | 0.893333 |
[9]:
experimenter.get_logtable('test_accuracy')
[9]:
| | ID | experiment_id | timestamp | test_accuracy |
---|---|---|---|---|
0 | 1 | 1 | 2024-03-11 08:18:17 | 0.966667 |
1 | 2 | 1 | 2024-03-11 08:18:18 | 0.933333 |
2 | 3 | 1 | 2024-03-11 08:18:18 | 0.966667 |
3 | 4 | 1 | 2024-03-11 08:18:18 | 0.893333 |
4 | 5 | 2 | 2024-03-11 08:18:22 | 0.966667 |
5 | 6 | 2 | 2024-03-11 08:18:22 | 0.933333 |
6 | 7 | 2 | 2024-03-11 08:18:22 | 0.966667 |
7 | 8 | 2 | 2024-03-11 08:18:22 | 0.893333 |
8 | 9 | 3 | 2024-03-11 08:18:26 | 0.966667 |
9 | 10 | 3 | 2024-03-11 08:18:26 | 0.933333 |
10 | 11 | 3 | 2024-03-11 08:18:26 | 0.966667 |
11 | 12 | 3 | 2024-03-11 08:18:27 | 0.893333 |
12 | 13 | 4 | 2024-03-11 08:18:31 | 0.966667 |
13 | 14 | 4 | 2024-03-11 08:18:31 | 0.933333 |
14 | 15 | 4 | 2024-03-11 08:18:31 | 0.966667 |
15 | 16 | 4 | 2024-03-11 08:18:31 | 0.893333 |
16 | 17 | 5 | 2024-03-11 08:18:35 | 0.966667 |
17 | 18 | 5 | 2024-03-11 08:18:35 | 0.933333 |
18 | 19 | 5 | 2024-03-11 08:18:35 | 0.966667 |
19 | 20 | 5 | 2024-03-11 08:18:35 | 0.893333 |
CodeCarbon
Note that CodeCarbon is activated by default, collecting information about the carbon emissions of each experiment. Have a look at our general usage example and the corresponding documentation of CodeCarbon fields for more information.