PDMS Scoring Pipeline#

The PDMS Scoring Pipeline takes in several data sources and produces a dataframe of predicted-medically-safe results. The pipeline applies MLOps best practices by factoring shared logic into a separate pdms-core-modules library, so that scoring and training of PDMS use practically the same code. This allows for:

  • easier development and deployment of new functionality while the model is in production,

  • less time to productionize code after training.

PDMS pipeline#

The pdms core scoring pipeline uses three repositories: nuh-models, catalyst-data-pipeline and pdms-core-modules.

  1. The scaler and model artifacts of pdms-core-modules can be found in the nuh-models repository here,

  2. the scoring pipeline of pdms-core-modules can be found in the catalyst-data-pipeline repository here, and

  3. lastly, the shared core module containing all PDMS shared logic can be found here.

Manual Deployment#

Until the 1st of August, the scoring pipeline is run using the pipelines/Makefile. To run the manual deployment you have to:

  1. update your .env file with the right port, password and user settings,

  2. update your config-dev.yml to have the right environment (only when running locally) and run make config-dev,

  3. ensure you have the nuh-models repository installed locally and checked out on the main branch for the model artifacts and lookup files.
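For step 1, a minimal .env sketch might look like the following. The variable names and values here are assumptions based on the database settings mentioned on this page (port, password, user); check your own environment for the exact names:

```
# Hypothetical .env for local manual deployment; adjust names and values
# to match your actual setup.
PORT=5432
PASSWORD=changeme
USER=pdms_dev
```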

Afterwards, to update the pdms_model_versions and pdms_predictions tables, you update the dev database with the following commands:

make clean 
make all-pdms-transformers

This will generate the in-spell predictions, the out-of-range predictions and the admissions predictions with three different models: respectively the in-spell model, no model (with uuid 000000-0000-0000-0000-000000000001) and the admission model.
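As a rough sketch of what the generated rows might look like, the snippet below tags each prediction type with the model version that produced it. The column names and the non-zero uuids are hypothetical; only the all-zero "no model" uuid comes from this page:

```python
import pandas as pd

# Hypothetical model-version uuids; only the "no model" uuid is taken from
# this page, the other two are placeholders.
IN_SPELL_MODEL = "11111111-1111-1111-1111-111111111111"
NO_MODEL = "000000-0000-0000-0000-000000000001"  # as given on this page
ADMISSION_MODEL = "22222222-2222-2222-2222-222222222222"

# Sketch of a pdms_predictions-style dataframe: one row per prediction type,
# each tagged with the model version that produced it.
predictions = pd.DataFrame(
    {
        "prediction_type": ["in_spell", "out_of_range", "admission"],
        "model_version": [IN_SPELL_MODEL, NO_MODEL, ADMISSION_MODEL],
    }
)
print(predictions["model_version"].nunique())  # three distinct models
```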

To run with the latest data, update the config-dev/acc/prod.yml with the following parameters:

  • PAST_BOOKED_CASES_IN_FILE,

  • WARD_STAYS_IN_FILE,

  • NERVECENTRE_OBS_IN_FILE,

  • PATIENT_ARRIVAL_HISTORY_IN_FILE,

  • NERVECENTRE_EWS_IN_FILE,

  • and the database configuration: PORT and PASSWORD.
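A config fragment covering these parameters might look as follows; the file paths and values are placeholders, only the parameter names come from the list above:

```
# Hypothetical config-dev.yml fragment; paths and values are placeholders.
PAST_BOOKED_CASES_IN_FILE: data/past_booked_cases.csv
WARD_STAYS_IN_FILE: data/ward_stays.csv
NERVECENTRE_OBS_IN_FILE: data/nervecentre_obs.csv
PATIENT_ARRIVAL_HISTORY_IN_FILE: data/patient_arrival_history.csv
NERVECENTRE_EWS_IN_FILE: data/nervecentre_ews.csv
PORT: 5432
PASSWORD: changeme
```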

Scheduled Deployment#

NOT YET IMPLEMENTED: the scheduled run will be triggered with Luigi starters. An example of a Luigi starter can be found here, and the user story to create the Luigi starters can be found here.

Bugs#

There is a shared catalyst-data-pipeline bug, recorded as an issue here:

GenericDatabase defines shared parameters like port, user, etc. Another shared parameter is table. If several sequential tasks use the same table parameter, the id string for each Luigi task will be the same. This is a problem when you are trying to write to two different tables with two sequential Luigi tasks.
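Luigi derives a task's id from its task family plus its significant parameter values. The sketch below, in plain Python rather than the luigi library itself, illustrates the collision: if the shared table parameter never differentiates the significant parameters, two sequential write tasks collapse to the same id string. The class and parameter names are hypothetical:

```python
# Minimal sketch of Luigi-style task ids (plain Python, not the luigi library).
def task_id(family: str, significant_params: dict) -> str:
    # Luigi hashes the significant parameters; joining them as a string
    # is enough for this illustration.
    parts = ",".join(f"{k}={v}" for k, v in sorted(significant_params.items()))
    return f"{family}({parts})"

# Hypothetical sequential write tasks inheriting shared parameters from a
# GenericDatabase-style base, where `table` does not end up in the
# significant parameters.
shared = {"port": 5432, "user": "pdms"}

first = task_id("WriteToDatabase", shared)   # intends table=pdms_model_versions
second = task_id("WriteToDatabase", shared)  # intends table=pdms_predictions

# Same id string: the scheduler treats both as the same task, so the second
# write never runs against its own table.
print(first == second)  # True
```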