Example: Usage of Logtables

This example shows how to use logtables: how to define them and how to fill them. It assumes that you already understand the basic functionality of PyExperimenter. Note that the purpose of this notebook is to demonstrate logtables, not to provide reasonable experiments.

To execute this notebook you need to install:

pip install py_experimenter
pip install scikit-learn

Experiment Configuration File

This notebook shows an example execution of PyExperimenter based on the configuration file used in the general usage notebook, slightly adapted to show the usage of logtables. The goal in this small example is to find the best kernel for an SVM on some dataset using grid search and to log the performance of SVMs initialized with different kernels. Further explanation of logtables can be found in the documentation.

[1]:

import os

content = """
PY_EXPERIMENTER:
  n_jobs : 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_logtables
      keyfields:
        dataset:
          type: VARCHAR(50)
          values: ['iris']
        cross_validation_splits:
          type: int
          values: [5]
        seed:
          type: int
          values: [1, 2, 3, 4, 5]
      result_timestamps: false
      resultfields:
        best_kernel_accuracy: VARCHAR(50)
        best_kernel_f1: VARCHAR(50)
    logtables:
      train_scores:
        f1: DOUBLE
        accuracy: DOUBLE
        kernel: VARCHAR(50)
      test_f1:
        test_f1: DOUBLE
      test_accuracy:
        test_accuracy: DOUBLE

  CUSTOM:
    path: sample_data
"""

# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_logtables.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)

Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this dummy example is a slightly modified version of the one in the general usage notebook. Instead of running with a single kernel, we iterate over several kernels to find the best one. Additionally, the scores of every kernel are written to the logtables.

[2]:

import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']

    # Initialize variables
    performance_f1 = 0
    best_kernel_f1 = ''
    performance_accuracy = 0
    best_kernel_accuracy = ''

    for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
        # Set seed for reproducibility
        random.seed(seed)
        np.random.seed(seed)

        data = load_iris()
        X = data.data
        y = data.target

        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='auto'))
        scores = cross_validate(model, X, y,
                                cv=parameters['cross_validation_splits'],
                                scoring=('accuracy', 'f1_micro'),
                                return_train_score=True
                                )

        # Log scores to logtables
        result_processor.process_logs(
            {
                'train_scores': {
                    'f1': np.mean(scores['train_f1_micro']),
                    'accuracy': np.mean(scores['train_accuracy']),
                    'kernel': "'" + kernel + "'"
                },
                'test_f1': {
                    'test_f1': np.mean(scores['test_f1_micro'])},
                'test_accuracy': {
                    'test_accuracy': np.mean(scores['test_accuracy'])},
            }
        )

        if np.mean(scores['test_f1_micro']) > performance_f1:
            performance_f1 = np.mean(scores['test_f1_micro'])
            best_kernel_f1 = kernel
        if np.mean(scores['test_accuracy']) > performance_accuracy:
            performance_accuracy = np.mean(scores['test_accuracy'])
            best_kernel_accuracy = kernel

    result_processor.process_results({
        'best_kernel_f1': best_kernel_f1,
        'best_kernel_accuracy': best_kernel_accuracy
    })

Executing PyExperimenter

Now we create a PyExperimenter object with the experiment configuration above. We also fill the database with values from that experiment configuration file.

[3]:

from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()

experimenter.get_table()

2025-02-11 17:06:01,436  | py-experimenter - INFO     | Found 3 keyfields
2025-02-11 17:06:01,437  | py-experimenter - INFO     | Found 2 resultfields
2025-02-11 17:06:01,438  | py-experimenter - INFO     | Found 3 logtables
2025-02-11 17:06:01,438  | py-experimenter - INFO     | Found logtable example_logtables__train_scores
2025-02-11 17:06:01,439  | py-experimenter - INFO     | Found logtable example_logtables__test_f1
2025-02-11 17:06:01,439  | py-experimenter - INFO     | Found logtable example_logtables__test_accuracy
2025-02-11 17:06:01,439  | py-experimenter - WARNING  | No custom section defined in config
2025-02-11 17:06:01,440  | py-experimenter - WARNING  | No codecarbon section defined in config
2025-02-11 17:06:01,440  | py-experimenter - INFO     | Initialized and connected to database
2025-02-11 17:06:01,445  | py-experimenter - INFO     | 5 rows successfully added to database. 0 rows were skipped.

[3]:

	ID	dataset	cross_validation_splits	seed	creation_date	status	start_date	name	machine	best_kernel_accuracy	best_kernel_f1	end_date	error
0	1	iris	5	1	2025-02-11 17:06:01	created	None	None	None	None	None	None	None
1	2	iris	5	2	2025-02-11 17:06:01	created	None	None	None	None	None	None	None
2	3	iris	5	3	2025-02-11 17:06:01	created	None	None	None	None	None	None	None
3	4	iris	5	4	2025-02-11 17:06:01	created	None	None	None	None	None	None	None
4	5	iris	5	5	2025-02-11 17:06:01	created	None	None	None	None	None	None	None
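
The five rows above can be reproduced by hand. The following is a minimal sketch, assuming that the table is filled with every combination of the keyfield values from the configuration file, which is consistent with the "5 rows successfully added" log message. It uses only the standard library and does not call PyExperimenter; the names keyfields and experiments are chosen for illustration.

```python
from itertools import product

# Keyfield values copied from the configuration file above
keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    'seed': [1, 2, 3, 4, 5],
}

# One experiment per combination of keyfield values (assumed fill strategy)
experiments = [
    dict(zip(keyfields, combination))
    for combination in product(*keyfields.values())
]

for experiment in experiments:
    print(experiment)
```

Since dataset and cross_validation_splits each have a single value, only the seed varies, which gives one row per seed.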

[4]:

# Read one of the logtables
experimenter.get_logtable('train_scores')

[4]:

	ID	experiment_id	timestamp	f1	accuracy	kernel

Run Experiments

All experiments are executed sequentially by the same PyExperimenter due to max_experiments=-1 and n_jobs: 1, as set in the configuration file. If only a single experiment or a predefined number of experiments should be executed, replace -1 with the corresponding number.

The first parameter, run_ml, is the function that is executed for each experiment with the keyfield values of the corresponding table row.

[5]:

experimenter.execute(run_ml, max_experiments=-1)

[codecarbon INFO @ 17:06:01] [setup] RAM Tracking...
[codecarbon INFO @ 17:06:01] [setup] GPU Tracking...
[codecarbon INFO @ 17:06:01] No GPU found.
[codecarbon INFO @ 17:06:01] [setup] CPU Tracking...
[codecarbon WARNING @ 17:06:01] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 17:06:01] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 17:06:01] >>> Tracker's metadata:
[codecarbon INFO @ 17:06:01]   Platform system: macOS-15.3.1-arm64-arm-64bit
[codecarbon INFO @ 17:06:01]   Python version: 3.9.19
[codecarbon INFO @ 17:06:01]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:06:01]   Available RAM : 32.000 GB
[codecarbon INFO @ 17:06:01]   CPU count: 10
[codecarbon INFO @ 17:06:01]   CPU model: Apple M1 Pro
[codecarbon INFO @ 17:06:01]   GPU count: None
[codecarbon INFO @ 17:06:01]   GPU model: None
[codecarbon INFO @ 17:06:04] Energy consumed for RAM : 0.000000 kWh. RAM Power : 12.0 W
[codecarbon INFO @ 17:06:04] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 5.0 W
[codecarbon INFO @ 17:06:04] 0.000000 kWh of electricity used since the beginning.
/opt/anaconda3/envs/py_experimenter/lib/python3.9/site-packages/codecarbon/output.py:168: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(data.values)])])
[codecarbon INFO @ 17:06:04] [setup] RAM Tracking...
[codecarbon INFO @ 17:06:04] [setup] GPU Tracking...
[codecarbon INFO @ 17:06:04] No GPU found.
[codecarbon INFO @ 17:06:04] [setup] CPU Tracking...
[codecarbon WARNING @ 17:06:05] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 17:06:05] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 17:06:05] >>> Tracker's metadata:
[codecarbon INFO @ 17:06:05]   Platform system: macOS-15.3.1-arm64-arm-64bit
[codecarbon INFO @ 17:06:05]   Python version: 3.9.19
[codecarbon INFO @ 17:06:05]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:06:05]   Available RAM : 32.000 GB
[codecarbon INFO @ 17:06:05]   CPU count: 10
[codecarbon INFO @ 17:06:05]   CPU model: Apple M1 Pro
[codecarbon INFO @ 17:06:05]   GPU count: None
[codecarbon INFO @ 17:06:05]   GPU model: None
[codecarbon INFO @ 17:06:07] Energy consumed for RAM : 0.000000 kWh. RAM Power : 12.0 W
[codecarbon INFO @ 17:06:07] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 5.0 W
[codecarbon INFO @ 17:06:07] 0.000000 kWh of electricity used since the beginning.
[codecarbon INFO @ 17:06:07] [setup] RAM Tracking...
[codecarbon INFO @ 17:06:07] [setup] GPU Tracking...
[codecarbon INFO @ 17:06:07] No GPU found.
[codecarbon INFO @ 17:06:07] [setup] CPU Tracking...
[codecarbon WARNING @ 17:06:07] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 17:06:07] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 17:06:07] >>> Tracker's metadata:
[codecarbon INFO @ 17:06:07]   Platform system: macOS-15.3.1-arm64-arm-64bit
[codecarbon INFO @ 17:06:07]   Python version: 3.9.19
[codecarbon INFO @ 17:06:07]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:06:07]   Available RAM : 32.000 GB
[codecarbon INFO @ 17:06:07]   CPU count: 10
[codecarbon INFO @ 17:06:07]   CPU model: Apple M1 Pro
[codecarbon INFO @ 17:06:07]   GPU count: None
[codecarbon INFO @ 17:06:07]   GPU model: None
[codecarbon INFO @ 17:06:07] Energy consumed for RAM : 0.000000 kWh. RAM Power : 12.0 W
[codecarbon INFO @ 17:06:07] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 5.0 W
[codecarbon INFO @ 17:06:07] 0.000000 kWh of electricity used since the beginning.
[codecarbon INFO @ 17:06:07] [setup] RAM Tracking...
[codecarbon INFO @ 17:06:07] [setup] GPU Tracking...
[codecarbon INFO @ 17:06:07] No GPU found.
[codecarbon INFO @ 17:06:07] [setup] CPU Tracking...
[codecarbon WARNING @ 17:06:07] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 17:06:07] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 17:06:07] >>> Tracker's metadata:
[codecarbon INFO @ 17:06:07]   Platform system: macOS-15.3.1-arm64-arm-64bit
[codecarbon INFO @ 17:06:07]   Python version: 3.9.19
[codecarbon INFO @ 17:06:07]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:06:07]   Available RAM : 32.000 GB
[codecarbon INFO @ 17:06:07]   CPU count: 10
[codecarbon INFO @ 17:06:07]   CPU model: Apple M1 Pro
[codecarbon INFO @ 17:06:07]   GPU count: None
[codecarbon INFO @ 17:06:07]   GPU model: None
[codecarbon INFO @ 17:06:08] Energy consumed for RAM : 0.000000 kWh. RAM Power : 12.0 W
[codecarbon INFO @ 17:06:08] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 5.0 W
[codecarbon INFO @ 17:06:08] 0.000000 kWh of electricity used since the beginning.
[codecarbon INFO @ 17:06:08] [setup] RAM Tracking...
[codecarbon INFO @ 17:06:08] [setup] GPU Tracking...
[codecarbon INFO @ 17:06:08] No GPU found.
[codecarbon INFO @ 17:06:08] [setup] CPU Tracking...
[codecarbon WARNING @ 17:06:08] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 17:06:08] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 17:06:08] >>> Tracker's metadata:
[codecarbon INFO @ 17:06:08]   Platform system: macOS-15.3.1-arm64-arm-64bit
[codecarbon INFO @ 17:06:08]   Python version: 3.9.19
[codecarbon INFO @ 17:06:08]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:06:08]   Available RAM : 32.000 GB
[codecarbon INFO @ 17:06:08]   CPU count: 10
[codecarbon INFO @ 17:06:08]   CPU model: Apple M1 Pro
[codecarbon INFO @ 17:06:08]   GPU count: None
[codecarbon INFO @ 17:06:08]   GPU model: None
[codecarbon INFO @ 17:06:08] Energy consumed for RAM : 0.000000 kWh. RAM Power : 12.0 W
[codecarbon INFO @ 17:06:08] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 5.0 W
[codecarbon INFO @ 17:06:08] 0.000000 kWh of electricity used since the beginning.
2025-02-11 17:06:08,619  | py-experimenter - INFO     | All configured executions finished.

Check Results

The content of the database table holding the keyfields and resultfields, as well as the content of every logtable, can be obtained easily.

[6]:

experimenter.get_table()

[6]:

	ID	dataset	cross_validation_splits	seed	creation_date	status	start_date	name	machine	best_kernel_accuracy	best_kernel_f1	end_date	error
0	1	iris	5	1	2025-02-11 17:06:01	done	2025-02-11 17:06:01	example_notebook	lukass-MacBook-Pro.local	linear	linear	2025-02-11 17:06:04	None
1	2	iris	5	2	2025-02-11 17:06:01	done	2025-02-11 17:06:04	example_notebook	lukass-MacBook-Pro.local	linear	linear	2025-02-11 17:06:07	None
2	3	iris	5	3	2025-02-11 17:06:01	done	2025-02-11 17:06:07	example_notebook	lukass-MacBook-Pro.local	linear	linear	2025-02-11 17:06:07	None
3	4	iris	5	4	2025-02-11 17:06:01	done	2025-02-11 17:06:07	example_notebook	lukass-MacBook-Pro.local	linear	linear	2025-02-11 17:06:08	None
4	5	iris	5	5	2025-02-11 17:06:01	done	2025-02-11 17:06:08	example_notebook	lukass-MacBook-Pro.local	linear	linear	2025-02-11 17:06:08	None

[7]:

experimenter.get_logtable('train_scores')

[7]:

	ID	experiment_id	timestamp	f1	accuracy	kernel
0	1	1	2025-02-11 17:06:04	0.971667	0.971667	'linear'
1	2	1	2025-02-11 17:06:04	0.936667	0.936667	'poly'
2	3	1	2025-02-11 17:06:04	0.975000	0.975000	'rbf'
3	4	1	2025-02-11 17:06:04	0.896667	0.896667	'sigmoid'
4	5	2	2025-02-11 17:06:07	0.971667	0.971667	'linear'
5	6	2	2025-02-11 17:06:07	0.936667	0.936667	'poly'
6	7	2	2025-02-11 17:06:07	0.975000	0.975000	'rbf'
7	8	2	2025-02-11 17:06:07	0.896667	0.896667	'sigmoid'
8	9	3	2025-02-11 17:06:07	0.971667	0.971667	'linear'
9	10	3	2025-02-11 17:06:07	0.936667	0.936667	'poly'
10	11	3	2025-02-11 17:06:07	0.975000	0.975000	'rbf'
11	12	3	2025-02-11 17:06:07	0.896667	0.896667	'sigmoid'
12	13	4	2025-02-11 17:06:08	0.971667	0.971667	'linear'
13	14	4	2025-02-11 17:06:08	0.936667	0.936667	'poly'
14	15	4	2025-02-11 17:06:08	0.975000	0.975000	'rbf'
15	16	4	2025-02-11 17:06:08	0.896667	0.896667	'sigmoid'
16	17	5	2025-02-11 17:06:08	0.971667	0.971667	'linear'
17	18	5	2025-02-11 17:06:08	0.936667	0.936667	'poly'
18	19	5	2025-02-11 17:06:08	0.975000	0.975000	'rbf'
19	20	5	2025-02-11 17:06:08	0.896667	0.896667	'sigmoid'
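
Each logtable row refers to its experiment through experiment_id, so log entries can be related back to the keyfields of the main table. The following is a rough sketch of such a join, assuming that experiment_id matches the ID column of the main table, as the outputs above suggest. It builds simplified stand-ins for the two tables in an in-memory SQLite database instead of opening the database created by this notebook; the table names follow the ones reported in the log messages, and the kernel names are stored without the extra quotes for brevity.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()

# Simplified stand-ins for the main table and the train_scores logtable
cursor.execute("CREATE TABLE example_logtables (ID INTEGER PRIMARY KEY, seed INTEGER)")
cursor.execute(
    "CREATE TABLE example_logtables__train_scores "
    "(ID INTEGER PRIMARY KEY, experiment_id INTEGER, accuracy DOUBLE, kernel VARCHAR(50))"
)
cursor.executemany(
    "INSERT INTO example_logtables (ID, seed) VALUES (?, ?)",
    [(1, 1), (2, 2)],
)
cursor.executemany(
    "INSERT INTO example_logtables__train_scores (experiment_id, accuracy, kernel) "
    "VALUES (?, ?, ?)",
    [(1, 0.971667, 'linear'), (1, 0.975000, 'rbf'),
     (2, 0.971667, 'linear'), (2, 0.975000, 'rbf')],
)

# Attach the seed keyfield to every log entry via experiment_id
rows = cursor.execute(
    "SELECT t.seed, l.kernel, l.accuracy "
    "FROM example_logtables__train_scores AS l "
    "JOIN example_logtables AS t ON l.experiment_id = t.ID "
    "ORDER BY l.ID"
).fetchall()
connection.close()

for row in rows:
    print(row)
```

The same relation could be expressed on the DataFrames returned by get_table() and get_logtable() if you prefer to stay in Python.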

[8]:

experimenter.get_logtable('test_f1')

[8]:

	ID	experiment_id	timestamp	test_f1
0	1	1	2025-02-11 17:06:04	0.966667
1	2	1	2025-02-11 17:06:04	0.933333
2	3	1	2025-02-11 17:06:04	0.966667
3	4	1	2025-02-11 17:06:04	0.893333
4	5	2	2025-02-11 17:06:07	0.966667
5	6	2	2025-02-11 17:06:07	0.933333
6	7	2	2025-02-11 17:06:07	0.966667
7	8	2	2025-02-11 17:06:07	0.893333
8	9	3	2025-02-11 17:06:07	0.966667
9	10	3	2025-02-11 17:06:07	0.933333
10	11	3	2025-02-11 17:06:07	0.966667
11	12	3	2025-02-11 17:06:07	0.893333
12	13	4	2025-02-11 17:06:08	0.966667
13	14	4	2025-02-11 17:06:08	0.933333
14	15	4	2025-02-11 17:06:08	0.966667
15	16	4	2025-02-11 17:06:08	0.893333
16	17	5	2025-02-11 17:06:08	0.966667
17	18	5	2025-02-11 17:06:08	0.933333
18	19	5	2025-02-11 17:06:08	0.966667
19	20	5	2025-02-11 17:06:08	0.893333

[9]:

experimenter.get_logtable('test_accuracy')

[9]:

	ID	experiment_id	timestamp	test_accuracy
0	1	1	2025-02-11 17:06:04	0.966667
1	2	1	2025-02-11 17:06:04	0.933333
2	3	1	2025-02-11 17:06:04	0.966667
3	4	1	2025-02-11 17:06:04	0.893333
4	5	2	2025-02-11 17:06:07	0.966667
5	6	2	2025-02-11 17:06:07	0.933333
6	7	2	2025-02-11 17:06:07	0.966667
7	8	2	2025-02-11 17:06:07	0.893333
8	9	3	2025-02-11 17:06:07	0.966667
9	10	3	2025-02-11 17:06:07	0.933333
10	11	3	2025-02-11 17:06:07	0.966667
11	12	3	2025-02-11 17:06:07	0.893333
12	13	4	2025-02-11 17:06:08	0.966667
13	14	4	2025-02-11 17:06:08	0.933333
14	15	4	2025-02-11 17:06:08	0.966667
15	16	4	2025-02-11 17:06:08	0.893333
16	17	5	2025-02-11 17:06:08	0.966667
17	18	5	2025-02-11 17:06:08	0.933333
18	19	5	2025-02-11 17:06:08	0.966667
19	20	5	2025-02-11 17:06:08	0.893333
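
The test logtables show the same mean score for the linear and rbf kernels, yet the result table reports linear as the best kernel. The sketch below replays the selection logic of run_ml on the values displayed above, assuming that the displayed (rounded) values reflect an exact tie. Because the comparison is strict, a later kernel has to beat the current best, so the first kernel in the iteration order wins a tie.

```python
# Mean test accuracy per kernel, copied from the test_accuracy logtable above
test_accuracy = {
    'linear': 0.966667,
    'poly': 0.933333,
    'rbf': 0.966667,
    'sigmoid': 0.893333,
}

best_score = 0
best_kernel = ''
for kernel, score in test_accuracy.items():
    # Strict comparison as in run_ml: equal scores do not replace the current best
    if score > best_score:
        best_score = score
        best_kernel = kernel

print(best_kernel)
```

If ties should be resolved differently, for example in favor of the later kernel, the comparison in run_ml would have to use >= instead.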

CodeCarbon

Note that CodeCarbon is activated by default and collects information about the carbon emissions of each experiment. Have a look at our general usage example and the corresponding documentation of CodeCarbon fields for more information.