External Model

You can get the code for running this guide from the Getting started guide.

First, import all the required modules:

import previsionio as pio
import yaml
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.neighbors import KNeighborsClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import numpy as np
import logging

Set up your account token (see Using The API) and some parameters for your project, such as its name and the names of the datasets.

Note that you must always create a Project to host datasets and experiments.

import os
from os.path import join
from dotenv import load_dotenv

load_dotenv()

PROJECT_NAME = "Sklearn models Comparison"
TRAINSET_NAME = "fraud_train"
HOLDOUT_NAME = "fraud_holdout"
INPUT_PATH = join("data", "assets")
TARGET = 'fraude'


pio.client.init_client(
    token=os.environ['PIO_MASTER_TOKEN'],
    prevision_url=os.environ['DOMAIN'])
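Here, `load_dotenv()` reads the credentials from a local `.env` file next to the script. A hypothetical example (both values are placeholders to replace with your own):

```
PIO_MASTER_TOKEN=your_master_token_here
DOMAIN=https://your-prevision-instance.example.com
```

Keeping the token in a `.env` file (and out of version control) avoids hard-coding credentials in the script.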

Create a new project, or reuse an existing one:

projects_list = pio.Project.list()
# Create a new Project or reuse the existing one

if PROJECT_NAME not in [p.name for p in projects_list]:
    project = pio.Project.new(name=PROJECT_NAME, description="An experiment using ")
else:
    project = [p for p in projects_list if p.name == PROJECT_NAME][0]

Add the datasets to the project, or get the existing ones if already uploaded (the datasets are automatically uploaded to your account when you create them).

datasets_list = project.list_datasets()

if TRAINSET_NAME in [d.name for d in datasets_list]:
    train = [d for d in datasets_list if d.name == TRAINSET_NAME][0]
else:
    train = project.create_dataset(file_name=join(INPUT_PATH, "trainset_fraud.csv"), name=TRAINSET_NAME)

if HOLDOUT_NAME in [d.name for d in datasets_list]:
    test = [d for d in datasets_list if d.name == HOLDOUT_NAME][0]
else:
    test = project.create_dataset(file_name=join(INPUT_PATH, "holdout_fraud.csv"), name=HOLDOUT_NAME)

Be sure to convert the data to the right type before building your datasets.

train_data = train.data.astype(np.float32)
test_data = test.data.astype(np.float32)

X_train = train_data.drop(TARGET, axis=1)
y_train = train_data[TARGET]
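The cast to `float32` matters because the ONNX `FloatTensorType` declared below expects 32-bit floats, while pandas and NumPy default to 64-bit. A minimal sketch of the conversion with plain NumPy (the values are illustrative):

```python
import numpy as np

# NumPy (like pandas) stores decimal data as 64-bit floats by default
raw = np.array([[250.0, 1.0], [99.9, 0.0]])
print(raw.dtype)  # float64

# Cast once, up front, so the data matches the ONNX FloatTensorType input
converted = raw.astype(np.float32)
print(converted.dtype)  # float32
```

Doing the cast on the whole frame before splitting features and target keeps `X_train` and `y_train` consistent.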

Then train some classifiers (you may upload several models at once) and create a YAML file to hold the models' configuration.

classifiers = [
    {
        "name": "lrsklearn",
        "algo": LogisticRegression(max_iter=3000)
    },
    {
        "name": "knnsk",
        "algo": KNeighborsClassifier(3)
    }
]

initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]


config = {}
config["class_names"] = [str(c) for c in set(y_train)]
config["input"] = [str(feature) for feature in X_train.columns]
with open(join(INPUT_PATH, 'logreg_fraude.yaml'), 'w') as f:
    yaml.dump(config, f)
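For a binary target and two input features, the generated config file would look something like this (the feature names `amount` and `age` are hypothetical; the class names come out as `'0.0'`/`'1.0'` here because the target was cast to float32):

```yaml
class_names:
- '0.0'
- '1.0'
input:
- amount
- age
```

The same file can be reused for every model in the experiment, since they all share the same inputs and classes.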

Scikit-learn Pipelines are supported, so you may build any pipeline you want as long as you provide the right config file. Convert each of your models to an ONNX file once fitted:

for clf in classifiers:
    logging.info(f'fitting {clf["name"]}')
    clr = make_pipeline(OrdinalEncoder(), clf["algo"])
    clr.fit(X_train, y_train)

    onx = convert_sklearn(clr, initial_types=initial_type)
    with open(join(INPUT_PATH, f'{clf["name"]}_logreg_fraude.onnx'), 'wb') as f:
        f.write(onx.SerializeToString())

Last, use the Project's create_external_classification method to upload all your models at once into the same experiment.

Note

You can upload several ONNX files in the same experiment in order to benchmark them. To do that you must provide a list of tuples, one for each ONNX file, with:

  • a name

  • the path to your onnx file

  • the path to your config file (often the same for each model)

external_models = [(clf["name"],
                    join(INPUT_PATH, f'{clf["name"]}_logreg_fraude.onnx'),
                    join(INPUT_PATH, 'logreg_fraude.yaml'))
                   for clf in classifiers]
exp = project.create_external_classification(experiment_name='churn_sklearn',
                                             dataset=train,
                                             holdout_dataset=test,
                                             target_column=TARGET,
                                             external_models=external_models)