Starters

Configs

Config comes from:

  • ./starters/resources/config.yml - Overall starters config

    • General Luigi params

  • ./starters/src/*/config.yml - Per starter configs

    • Need to update the port and password

  • ./pipelines/config.ini - Pipelines config. Needs to be generated using make config-dev

    • Need to update ENVIRONMENT, INITIAL_LOAD_DIRECTORY, INCREMENTAL_LOAD_DIRECTORY, DAILY_LOAD_DIRECTORY, and INTERMEDIATE_PATH

  • ./pipelines/.env - Makefile vars. Only needed for pre-seeding
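A quick pre-flight check can catch a freshly generated config.ini that still has blank values for the keys listed above. This is a minimal sketch that assumes those keys can live in any section of config.ini (including DEFAULT); missing_pipeline_keys is a hypothetical helper, not part of the pipelines code.

```python
import configparser

# Keys that must be filled in ./pipelines/config.ini (from the Configs list above).
REQUIRED_KEYS = (
    "ENVIRONMENT",
    "INITIAL_LOAD_DIRECTORY",
    "INCREMENTAL_LOAD_DIRECTORY",
    "DAILY_LOAD_DIRECTORY",
    "INTERMEDIATE_PATH",
)


def missing_pipeline_keys(config_path: str) -> list:
    """Return the required keys that are absent or empty in config.ini.

    Assumption: the section name is unknown, so every section (and DEFAULT)
    is searched. configparser lowercases option names by default, so the
    comparison is case-insensitive.
    """
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as fh:
        parser.read_file(fh)

    present = set()
    for section in [parser.default_section, *parser.sections()]:
        for key, value in parser[section].items():
            if value.strip():  # "KEY =" with nothing after it counts as missing
                present.add(key)

    return [key for key in REQUIRED_KEYS if key.lower() not in present]
```

Run it against ./pipelines/config.ini after make config-dev; an empty list means every listed key has a value (it does not check that the paths exist).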

Steps

First time

  1. Make sure the DB is deployed and up to date with these scripts:

    • https://gitlab.stopstaringatme.org/patient-catalyst-team/catalyst-data-pipeline/-/blob/main/database_manager/migrations/patient_catalyst_web/template_00_init.sql

    • https://gitlab.stopstaringatme.org/patient-catalyst-team/patient-catalyst-web/-/blob/develop/NUH.PatientCatalyst.Infrastructure/Scripts/Deployment/DeployDatabase.sql

  2. Update the configs to match the required paths

    • The .env file is also needed for the first run

  3. Run make pre-seeding-pt-transformers from the pipelines folder.

    • This is needed because the seed script expects some tables to already be populated

    • Alternatively, run the PT starter until the KPI ingestor fails

  4. Run the seed script: https://gitlab.stopstaringatme.org/patient-catalyst-team/patient-catalyst-web/-/raw/develop/NUH.PatientCatalyst.Infrastructure/Scripts/Deployment/SeedDatabase.sql
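Because step 3 depends on the Makefile variables in ./pipelines/.env, a small wrapper can refuse to start when that file is missing. This is a minimal sketch; pre_seed and its dry_run flag are hypothetical, and it only covers step 3 (the DB deploy and seed scripts are still run by hand).

```python
import subprocess
from pathlib import Path


def pre_seed(project_root: str, dry_run: bool = True) -> list:
    """Run `make pre-seeding-pt-transformers` from the pipelines folder.

    Raises FileNotFoundError if pipelines/.env is missing, since the .env
    Makefile vars are needed for pre-seeding. With dry_run=True the command
    is only returned, not executed.
    """
    pipelines_dir = Path(project_root) / "pipelines"
    env_file = pipelines_dir / ".env"
    if not env_file.is_file():
        raise FileNotFoundError(f"{env_file} is required for pre-seeding")

    cmd = ["make", "pre-seeding-pt-transformers"]
    if not dry_run:
        # check=True raises CalledProcessError if make exits non-zero,
        # so a failed pre-seed stops you before running the seed script.
        subprocess.run(cmd, cwd=pipelines_dir, check=True)
    return cmd
```

Call pre_seed(".", dry_run=False) from the project root, then run the seed script only if it returns without raising.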

Running the pipeline

  1. Delete the intermediate files, with something like rm /var/local/data/tmp/*PROD* (adjust the path and environment name to match your setup)

    1. This needs to be done every time before running the starters; otherwise they will use old data

    2. The starters can deliberately be run on old data, though, for example to rebuild the database quickly

  2. Run python pipeliner/main.py --pipelines_path ./pipelines/src/ --starters_path ./starters/src/ --config ./starters/resources/config.yml from the project root

    1. Alternatively, use the Docker container
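The two steps above can be sketched as one Python helper that clears the intermediate files (unless you deliberately want to reuse old data) and then launches the pipeliner from the project root. This is a sketch under assumptions: tmp_dir and environment stand in for the rm /var/local/data/tmp/*PROD* example, and clear_intermediates and run_pipeline are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Same arguments as step 2; relative paths assume cwd is the project root.
PIPELINER_CMD = [
    sys.executable, "pipeliner/main.py",
    "--pipelines_path", "./pipelines/src/",
    "--starters_path", "./starters/src/",
    "--config", "./starters/resources/config.yml",
]


def clear_intermediates(tmp_dir: str, environment: str = "PROD") -> list:
    """Delete regular files in tmp_dir whose names contain `environment`.

    Roughly mirrors `rm <tmp_dir>/*PROD*`: like plain rm, directories are
    skipped. Matches are listed before deleting anything.
    """
    removed = []
    for path in sorted(Path(tmp_dir).glob(f"*{environment}*")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


def run_pipeline(project_root: str, tmp_dir: str,
                 environment: str = "PROD", reuse_old_data: bool = False) -> None:
    """Clear stale intermediates (unless reusing them), then run the pipeliner."""
    if not reuse_old_data:
        clear_intermediates(tmp_dir, environment)
    subprocess.run(PIPELINER_CMD, cwd=project_root, check=True)
```

Using sys.executable instead of a bare python keeps the pipeliner on the same interpreter (and virtualenv) as the caller; swap it for "python" if that is how your environment is set up.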

Other things

  • Take note of the project name in the .env file next to the Docker Compose file: if it isn't set, the different environments risk interfering with each other

  • For PDMS, at is logging to my mail spool, so the spool will contain secrets. Please clean /var/spool/mail/mbudden
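To catch the project-name problem before starting containers, you can check each environment's .env for a set and distinct project name. This sketch assumes the name is set via Docker Compose's standard COMPOSE_PROJECT_NAME variable; the source does not name the variable. The .env parser is deliberately minimal (KEY=VALUE lines, comments, optional quotes) and is not Compose's full syntax.

```python
from pathlib import Path
from typing import Optional


def compose_project_name(env_file: str) -> Optional[str]:
    """Return COMPOSE_PROJECT_NAME from a .env file, or None if unset or empty."""
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        if key.strip() == "COMPOSE_PROJECT_NAME":
            value = value.strip().strip('"').strip("'")
            return value or None
    return None


def project_name_problems(env_files: list) -> list:
    """List .env files with no project name, or a name another env already uses."""
    problems = []
    seen = {}
    for env_file in env_files:
        name = compose_project_name(env_file)
        if name is None:
            problems.append(f"{env_file}: COMPOSE_PROJECT_NAME is not set")
        elif name in seen:
            problems.append(f"{env_file}: project name {name!r} also used by {seen[name]}")
        else:
            seen[name] = env_file
    return problems
```

Pass it the .env paths for every deployment on the host; an empty result means each one has its own project name.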