Example: Usage of Logtables

This example shows how to define and fill logtables. It assumes that you already understand the basic functionality of PyExperimenter. Note that the purpose of this notebook is to demonstrate logtables, not to provide reasonable experiments.

To execute this notebook you need to install:

pip install py_experimenter
pip install scikit-learn

Experiment Configuration File

This notebook shows an example execution of PyExperimenter based on the configuration file that is used in the general usage notebook. However, this file is slightly adapted to show the usage of logtables. The goal in this small example is to find the best kernel for an SVM on some dataset using grid search and to log the performance of SVMs initialized with different kernels. Further explanation of logtables can be found in the documentation.

[1]:
import os

content = """
PY_EXPERIMENTER:
  n_jobs : 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_logtables
      keyfields:
        dataset:
          type: VARCHAR(50)
          values: ['iris']
        cross_validation_splits:
          type: int
          values: [5]
        seed:
          type: int
          values: [1, 2, 3, 4, 5]
      result_timestamps: false
      resultfields:
        best_kernel_accuracy: VARCHAR(50)
        best_kernel_f1: VARCHAR(50)
    logtables:
      train_scores:
        f1: DOUBLE
        accuracy: DOUBLE
        kernel: VARCHAR(50)
      test_f1:
        test_f1: DOUBLE
      test_accuracy:
        test_accuracy: DOUBLE

  CUSTOM:
    path: sample_data
"""

# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_logtables.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)
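
The logtables section above defines three logtables. Judging from the "Found logtable ..." messages that PyExperimenter prints when it loads this configuration, each logtable appears to be stored under the name of the main table, two underscores, and the logtable name. The following is a minimal sketch of that naming pattern only; full_logtable_names is a hypothetical helper for illustration and not part of the py_experimenter API.

```python
# Hypothetical helper, not part of py_experimenter: it only mirrors the
# "<table name>__<logtable name>" pattern visible in the log output.
def full_logtable_names(table_name: str, logtables: dict) -> list:
    return [f"{table_name}__{logtable}" for logtable in logtables]


# The logtable definitions from the configuration above, written as a dict.
logtables = {
    "train_scores": {"f1": "DOUBLE", "accuracy": "DOUBLE", "kernel": "VARCHAR(50)"},
    "test_f1": {"test_f1": "DOUBLE"},
    "test_accuracy": {"test_accuracy": "DOUBLE"},
}

names = full_logtable_names("example_logtables", logtables)
print(names)
```

When reading a logtable through the experimenter, only the short name (for example train_scores) is used, as in the get_logtable calls further down.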

Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this dummy example is a slightly modified version of the one in the general usage notebook. Instead of training with a single kernel, we iterate over several kernels to find the best one. Additionally, the scores of every kernel are written to the logtables.

[2]:
import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']

    # Initialize variables
    performance_f1 = 0
    best_kernel_f1 = ''
    performance_accuracy = 0
    best_kernel_accuracy = ''

    for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
        # Set seed for reproducibility
        random.seed(seed)
        np.random.seed(seed)

        data = load_iris()
        X = data.data
        y = data.target

        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='auto'))
        scores = cross_validate(model, X, y,
                                cv=parameters['cross_validation_splits'],
                                scoring=('accuracy', 'f1_micro'),
                                return_train_score=True
                                )

        # Log scores to logtables
        result_processor.process_logs(
            {
                'train_scores': {
                    'f1': np.mean(scores['train_f1_micro']),
                    'accuracy': np.mean(scores['train_accuracy']),
                    'kernel': "'" + kernel + "'"
                },
                'test_f1': {
                    'test_f1': np.mean(scores['test_f1_micro'])},
                'test_accuracy': {
                    'test_accuracy': np.mean(scores['test_accuracy'])},
            }
        )

        if np.mean(scores['test_f1_micro']) > performance_f1:
            performance_f1 = np.mean(scores['test_f1_micro'])
            best_kernel_f1 = kernel
        if np.mean(scores['test_accuracy']) > performance_accuracy:
            performance_accuracy = np.mean(scores['test_accuracy'])
            best_kernel_accuracy = kernel

    result_processor.process_results({
        'best_kernel_f1': best_kernel_f1,
        'best_kernel_accuracy': best_kernel_accuracy
    })
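
The dictionary passed to process_logs in run_ml is nested: the outer keys are the logtable names from the configuration, and each inner dictionary maps the columns of that logtable to the values of one new row. The sketch below isolates this structure. build_log_payload is a hypothetical helper, the fold scores are made up, and statistics.mean stands in for np.mean so that the block runs without extra packages.

```python
from statistics import mean


def build_log_payload(scores: dict, kernel: str) -> dict:
    """Hypothetical helper: build one row per logtable from fold scores."""
    return {
        "train_scores": {
            "f1": mean(scores["train_f1_micro"]),
            "accuracy": mean(scores["train_accuracy"]),
            # The notebook stores the kernel name wrapped in single quotes.
            "kernel": f"'{kernel}'",
        },
        "test_f1": {"test_f1": mean(scores["test_f1_micro"])},
        "test_accuracy": {"test_accuracy": mean(scores["test_accuracy"])},
    }


# Made-up fold scores standing in for the output of cross_validate.
example_scores = {
    "train_f1_micro": [0.5, 1.0],
    "train_accuracy": [0.5, 1.0],
    "test_f1_micro": [0.5, 1.0],
    "test_accuracy": [0.25, 0.75],
}

payload = build_log_payload(example_scores, "linear")
print(payload)
```

Inside run_ml, such a dictionary would be handed to result_processor.process_logs once per kernel, which is why each experiment ends up with four rows in every logtable.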

Executing PyExperimenter

Now we create a PyExperimenter object with the experiment configuration above. We also fill the database with values from that experiment configuration file.

[3]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()

experimenter.get_table()
2024-03-11 08:18:13,492  | py-experimenter - INFO     | Found 3 keyfields
2024-03-11 08:18:13,493  | py-experimenter - INFO     | Found 2 resultfields
2024-03-11 08:18:13,494  | py-experimenter - INFO     | Found 3 logtables
2024-03-11 08:18:13,495  | py-experimenter - INFO     | Found logtable example_logtables__train_scores
2024-03-11 08:18:13,496  | py-experimenter - INFO     | Found logtable example_logtables__test_f1
2024-03-11 08:18:13,496  | py-experimenter - INFO     | Found logtable example_logtables__test_accuracy
2024-03-11 08:18:13,497  | py-experimenter - WARNING  | No custom section defined in config
2024-03-11 08:18:13,498  | py-experimenter - WARNING  | No codecarbon section defined in config
2024-03-11 08:18:13,499  | py-experimenter - INFO     | Initialized and connected to database
2024-03-11 08:18:13,595  | py-experimenter - INFO     | 5 rows successfully added to database. 0 rows were skipped.
[3]:
ID dataset cross_validation_splits seed creation_date status start_date name machine best_kernel_accuracy best_kernel_f1 end_date error
0 1 iris 5 1 2024-03-11 08:18:13 created None None None None None None None
1 2 iris 5 2 2024-03-11 08:18:13 created None None None None None None None
2 3 iris 5 3 2024-03-11 08:18:13 created None None None None None None None
3 4 iris 5 4 2024-03-11 08:18:13 created None None None None None None None
4 5 iris 5 5 2024-03-11 08:18:13 created None None None None None None None
[4]:
# Read one of the logtables
experimenter.get_logtable('train_scores')
[4]:
ID experiment_id timestamp f1 accuracy kernel

Run Experiments

All experiments are executed sequentially by the same PyExperimenter due to max_experiments=-1 and n_jobs: 1 in the configuration file. If only a single experiment or a predefined number of experiments should be executed, the -1 has to be replaced by the corresponding number.

The first parameter, i.e. run_ml, relates to the actual method that should be executed with the given keyfields of the table.

[5]:
experimenter.execute(run_ml, max_experiments=-1)
[codecarbon INFO @ 08:18:13] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:13] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:13] No GPU found.
[codecarbon INFO @ 08:18:13] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:13] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:14] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:14] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:14] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:14]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:14]   Python version: 3.9.0
[codecarbon INFO @ 08:18:14]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:14]   Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:14]   CPU count: 16
[codecarbon INFO @ 08:18:14]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:14]   GPU count: None
[codecarbon INFO @ 08:18:14]   GPU model: None
[codecarbon INFO @ 08:18:18] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:18] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:18] 0.000003 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:18] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:18] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:18] No GPU found.
[codecarbon INFO @ 08:18:18] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:18] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:19] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:19] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:19] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:19]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:19]   Python version: 3.9.0
[codecarbon INFO @ 08:18:19]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:19]   Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:19]   CPU count: 16
[codecarbon INFO @ 08:18:19]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:19]   GPU count: None
[codecarbon INFO @ 08:18:19]   GPU model: None
[codecarbon INFO @ 08:18:22] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:22] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:22] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:22] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:22] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:22] No GPU found.
[codecarbon INFO @ 08:18:22] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:22] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:23] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:23] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:23] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:23]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:23]   Python version: 3.9.0
[codecarbon INFO @ 08:18:23]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:23]   Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:23]   CPU count: 16
[codecarbon INFO @ 08:18:23]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:23]   GPU count: None
[codecarbon INFO @ 08:18:23]   GPU model: None
[codecarbon INFO @ 08:18:27] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:27] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:27] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:27] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:27] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:27] No GPU found.
[codecarbon INFO @ 08:18:27] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:27] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:28] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:28] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:28] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:28]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:28]   Python version: 3.9.0
[codecarbon INFO @ 08:18:28]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:28]   Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:28]   CPU count: 16
[codecarbon INFO @ 08:18:28]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:28]   GPU count: None
[codecarbon INFO @ 08:18:28]   GPU model: None
[codecarbon INFO @ 08:18:31] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:31] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:31] 0.000002 kWh of electricity used since the beginning.
[codecarbon INFO @ 08:18:31] [setup] RAM Tracking...
[codecarbon INFO @ 08:18:31] [setup] GPU Tracking...
[codecarbon INFO @ 08:18:31] No GPU found.
[codecarbon INFO @ 08:18:31] [setup] CPU Tracking...
[codecarbon WARNING @ 08:18:31] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon WARNING @ 08:18:32] We saw that you have a 12th Gen Intel(R) Core(TM) i7-1260P but we don't know it. Please contact us.
[codecarbon INFO @ 08:18:32] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:32] >>> Tracker's metadata:
[codecarbon INFO @ 08:18:32]   Platform system: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 08:18:32]   Python version: 3.9.0
[codecarbon INFO @ 08:18:32]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 08:18:32]   Available RAM : 15.475 GB
[codecarbon INFO @ 08:18:32]   CPU count: 16
[codecarbon INFO @ 08:18:32]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 08:18:32]   GPU count: None
[codecarbon INFO @ 08:18:32]   GPU model: None
[codecarbon INFO @ 08:18:35] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.803128719329834 W
[codecarbon INFO @ 08:18:35] Energy consumed for all CPUs : 0.000002 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 08:18:35] 0.000002 kWh of electricity used since the beginning.
2024-03-11 08:18:36,076  | py-experimenter - INFO     | All configured executions finished.

Check Results

The content of the main table, which holds the keyfields and resultfields, as well as the content of every logtable can be easily obtained.

[6]:
experimenter.get_table()
[6]:
ID dataset cross_validation_splits seed creation_date status start_date name machine best_kernel_accuracy best_kernel_f1 end_date error
0 1 iris 5 1 2024-03-11 08:18:13 done 2024-03-11 08:18:13 example_notebook Worklaptop linear linear 2024-03-11 08:18:18 None
1 2 iris 5 2 2024-03-11 08:18:13 done 2024-03-11 08:18:18 example_notebook Worklaptop linear linear 2024-03-11 08:18:22 None
2 3 iris 5 3 2024-03-11 08:18:13 done 2024-03-11 08:18:22 example_notebook Worklaptop linear linear 2024-03-11 08:18:27 None
3 4 iris 5 4 2024-03-11 08:18:13 done 2024-03-11 08:18:27 example_notebook Worklaptop linear linear 2024-03-11 08:18:31 None
4 5 iris 5 5 2024-03-11 08:18:13 done 2024-03-11 08:18:31 example_notebook Worklaptop linear linear 2024-03-11 08:18:35 None
[7]:
experimenter.get_logtable('train_scores')
[7]:
ID experiment_id timestamp f1 accuracy kernel
0 1 1 2024-03-11 08:18:17 0.971667 0.971667 'linear'
1 2 1 2024-03-11 08:18:18 0.936667 0.936667 'poly'
2 3 1 2024-03-11 08:18:18 0.975000 0.975000 'rbf'
3 4 1 2024-03-11 08:18:18 0.896667 0.896667 'sigmoid'
4 5 2 2024-03-11 08:18:22 0.971667 0.971667 'linear'
5 6 2 2024-03-11 08:18:22 0.936667 0.936667 'poly'
6 7 2 2024-03-11 08:18:22 0.975000 0.975000 'rbf'
7 8 2 2024-03-11 08:18:22 0.896667 0.896667 'sigmoid'
8 9 3 2024-03-11 08:18:26 0.971667 0.971667 'linear'
9 10 3 2024-03-11 08:18:26 0.936667 0.936667 'poly'
10 11 3 2024-03-11 08:18:26 0.975000 0.975000 'rbf'
11 12 3 2024-03-11 08:18:27 0.896667 0.896667 'sigmoid'
12 13 4 2024-03-11 08:18:31 0.971667 0.971667 'linear'
13 14 4 2024-03-11 08:18:31 0.936667 0.936667 'poly'
14 15 4 2024-03-11 08:18:31 0.975000 0.975000 'rbf'
15 16 4 2024-03-11 08:18:31 0.896667 0.896667 'sigmoid'
16 17 5 2024-03-11 08:18:35 0.971667 0.971667 'linear'
17 18 5 2024-03-11 08:18:35 0.936667 0.936667 'poly'
18 19 5 2024-03-11 08:18:35 0.975000 0.975000 'rbf'
19 20 5 2024-03-11 08:18:35 0.896667 0.896667 'sigmoid'
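
Each logtable row carries an experiment_id, which appears to refer to the ID of the corresponding row in the main table (rows 1 to 4 above belong to the experiment with ID 1). The following sketch illustrates that relation with an in-memory SQLite database. It does not touch the database created by PyExperimenter; the table names follow the log output, only a subset of columns is modeled, the column types are assumptions, a few rows are copied from the output above, and the kernel names are stored without the extra quotes.

```python
import sqlite3

# In-memory stand-in for the two tables; column types are assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example_logtables (ID INTEGER PRIMARY KEY, seed INTEGER)")
con.execute(
    "CREATE TABLE example_logtables__train_scores ("
    "ID INTEGER PRIMARY KEY, experiment_id INTEGER, accuracy REAL, kernel TEXT)"
)

# A few rows copied from the tables shown above.
con.executemany("INSERT INTO example_logtables VALUES (?, ?)", [(1, 1), (2, 2)])
con.executemany(
    "INSERT INTO example_logtables__train_scores VALUES (?, ?, ?, ?)",
    [
        (1, 1, 0.971667, "linear"),
        (2, 1, 0.936667, "poly"),
        (3, 1, 0.975000, "rbf"),
        (4, 1, 0.896667, "sigmoid"),
        (5, 2, 0.971667, "linear"),
    ],
)

# Join the log rows to their experiment and keep only the run with seed 1.
rows = con.execute(
    "SELECT e.seed, l.kernel, l.accuracy "
    "FROM example_logtables__train_scores AS l "
    "JOIN example_logtables AS e ON e.ID = l.experiment_id "
    "WHERE e.seed = 1 ORDER BY l.accuracy DESC"
).fetchall()
print(rows)
con.close()
```

The same kind of join can be done on the DataFrames returned by get_table and get_logtable, matching experiment_id against ID.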
[8]:
experimenter.get_logtable('test_f1')
[8]:
ID experiment_id timestamp test_f1
0 1 1 2024-03-11 08:18:17 0.966667
1 2 1 2024-03-11 08:18:18 0.933333
2 3 1 2024-03-11 08:18:18 0.966667
3 4 1 2024-03-11 08:18:18 0.893333
4 5 2 2024-03-11 08:18:22 0.966667
5 6 2 2024-03-11 08:18:22 0.933333
6 7 2 2024-03-11 08:18:22 0.966667
7 8 2 2024-03-11 08:18:22 0.893333
8 9 3 2024-03-11 08:18:26 0.966667
9 10 3 2024-03-11 08:18:26 0.933333
10 11 3 2024-03-11 08:18:26 0.966667
11 12 3 2024-03-11 08:18:27 0.893333
12 13 4 2024-03-11 08:18:31 0.966667
13 14 4 2024-03-11 08:18:31 0.933333
14 15 4 2024-03-11 08:18:31 0.966667
15 16 4 2024-03-11 08:18:31 0.893333
16 17 5 2024-03-11 08:18:35 0.966667
17 18 5 2024-03-11 08:18:35 0.933333
18 19 5 2024-03-11 08:18:35 0.966667
19 20 5 2024-03-11 08:18:35 0.893333
[9]:
experimenter.get_logtable('test_accuracy')
[9]:
ID experiment_id timestamp test_accuracy
0 1 1 2024-03-11 08:18:17 0.966667
1 2 1 2024-03-11 08:18:18 0.933333
2 3 1 2024-03-11 08:18:18 0.966667
3 4 1 2024-03-11 08:18:18 0.893333
4 5 2 2024-03-11 08:18:22 0.966667
5 6 2 2024-03-11 08:18:22 0.933333
6 7 2 2024-03-11 08:18:22 0.966667
7 8 2 2024-03-11 08:18:22 0.893333
8 9 3 2024-03-11 08:18:26 0.966667
9 10 3 2024-03-11 08:18:26 0.933333
10 11 3 2024-03-11 08:18:26 0.966667
11 12 3 2024-03-11 08:18:27 0.893333
12 13 4 2024-03-11 08:18:31 0.966667
13 14 4 2024-03-11 08:18:31 0.933333
14 15 4 2024-03-11 08:18:31 0.966667
15 16 4 2024-03-11 08:18:31 0.893333
16 17 5 2024-03-11 08:18:35 0.966667
17 18 5 2024-03-11 08:18:35 0.933333
18 19 5 2024-03-11 08:18:35 0.966667
19 20 5 2024-03-11 08:18:35 0.893333
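
In the test tables above, the linear and rbf kernels show the same rounded mean score (0.966667), yet the resultfields report linear for every experiment. This is consistent with the strict > comparison in run_ml, which keeps the kernel that was evaluated first when a later one merely ties. The sketch below reproduces that selection rule on the rounded values displayed above; best_by_strict_greater is a hypothetical helper, and the unrounded scores in the database may differ slightly.

```python
def best_by_strict_greater(scores: dict) -> str:
    """Hypothetical helper: pick the best entry the way run_ml does."""
    best_name, best_score = "", 0
    for name, score in scores.items():
        # A later entry replaces the current best only if it is strictly better.
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Rounded test_f1 values of experiment 1, in the order the kernels are evaluated.
displayed_test_f1 = {
    "linear": 0.966667,
    "poly": 0.933333,
    "rbf": 0.966667,
    "sigmoid": 0.893333,
}

winner = best_by_strict_greater(displayed_test_f1)
print(winner)
```

With >= instead of >, the last of the tied kernels would be kept instead, so the order of the kernel list matters whenever scores tie.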

CodeCarbon

Note that CodeCarbon is activated by default and collects information about the carbon emissions of each experiment. Have a look at our general usage example and the corresponding documentation of the CodeCarbon fields for more information.