PolarSPARC

Hands-on with Pydantic


Bhaskar S 02/19/2022


Overview

Pydantic is an elegant and popular Python library that performs data parsing and validation at runtime using Python's type hints, and reports user-friendly errors when the data is invalid.

Pydantic can be used in the preprocessing step of a data pipeline to ensure clean and valid data flows down the pipeline.
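As a rough sketch of that preprocessing idea, the following snippet separates raw records into validated objects and rejected inputs. The Reading model, the clean function, and the sample records are hypothetical and exist only for this illustration:

```python
from typing import List, Tuple

from pydantic import BaseModel, ValidationError


class Reading(BaseModel):
    # Hypothetical record type, used only for this illustration
    sensor: str
    value: float


def clean(records: List[dict]) -> Tuple[List[Reading], List[dict]]:
    """Split raw records into validated models and rejected raw records."""
    valid, rejected = [], []
    for record in records:
        try:
            valid.append(Reading(**record))
        except ValidationError:
            rejected.append(record)
    return valid, rejected


raw = [{'sensor': 'a1', 'value': 21.5},
       {'sensor': 'a2', 'value': 'not-a-number'},
       {'sensor': 'a3'}]

good, bad = clean(raw)
print(len(good), len(bad))  # prints: 1 2
```

Only the first record passes; the second has a value that is not a number and the third is missing the value field altogether.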


Installation

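The installation assumes a Linux desktop running Ubuntu 20.04 LTS. The examples in this article use the pydantic 1.x API (validator, root_validator, and the inner Config class), so the command below pins the version accordingly. To install the pydantic Python module and its type extensions, open a terminal window and execute the following command:

$ pip3 install 'pydantic<2' typing-extensions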

On successful installation, one can start using pydantic.


Hands-on pydantic

The following is a simple Python script that demonstrates a hypothetical online item listing data class using pydantic:


sample-1.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from pydantic import BaseModel
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Listing(BaseModel):
    category: str
    title: str
    description: Optional[str] = None
    condition: str
    price: float


def main():
    ipad_json = {'category': 'Electronics',
                 'title': 'iPad Air 2',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': 'Excellent',
                 'price': 55.00}

    ipad = Listing(**ipad_json)

    logging.info(ipad)
    logging.info('{} ({}) - {}'.format(ipad.title, ipad.condition, ipad.price))

    table_json = {'category': 'Furniture',
                  'title': 'Oak Dressing Table',
                  'condition': 'Excellent',
                  'price': 75.00}

    table = Listing(**table_json)

    logging.info(table)


if __name__ == '__main__':
    main()

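A few aspects of sample-1.py need a little explanation. The class Listing extends the pydantic BaseModel, which reads the type hints on the class attributes and validates the input values against them when an instance is created. The field description is declared as Optional[str] with a default of None, so it may be omitted, as the table listing shows. The expression Listing(**ipad_json) unpacks the dictionary into keyword arguments, one per field.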

To run the Python script sample-1.py, execute the following command:

$ python3 sample-1.py

The following would be a typical output:

Output.1

2022-02-19 19:29:31,707 - category='Electronics' title='iPad Air 2' description='Sparingly used iPad Air 2 in Excellent working condition' condition='Excellent' price=55.0
2022-02-19 19:29:31,707 - iPad Air 2 (Excellent) - 55.0
2022-02-19 19:29:31,707 - category='Furniture' title='Oak Dressing Table' description=None condition='Excellent' price=75.0
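Besides checking types, pydantic also parses compatible input into the declared type. The following minimal sketch reuses the Listing class from sample-1.py and assumes the price arrives as a string, as it might from a CSV file or a web form:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Listing(BaseModel):
    category: str
    title: str
    description: Optional[str] = None
    condition: str
    price: float


# The price is a numeric string; pydantic converts it to a float
ipad = Listing(category='Electronics', title='iPad Air 2',
               condition='Excellent', price='55.00')
print(type(ipad.price).__name__, ipad.price)  # prints: float 55.0

# A string that cannot be parsed as a number is rejected
try:
    Listing(category='Electronics', title='iPad Air 2',
            condition='Excellent', price='fifty-five')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # prints: True
```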

The following is the same Python script as in sample-1.py, except that it has been enhanced to use custom enum classes:


sample-2.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: str
    description: Optional[str] = None
    condition: Condition
    price: float


def main():
    ipad_json = {'category': Category.ELECTRONICS,
                 'title': 'iPad Air 2',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': Condition.USED,
                 'price': 55.00}

    ipad = Listing(**ipad_json)

    logging.info(ipad)
    logging.info('{} ({}) - {}'.format(ipad.title, ipad.condition.value, ipad.price))

    table_json = {'category': Category.FURNITURE,
                  'title': 'Oak Dressing Table',
                  'condition': Condition.USED,
                  'price': 75.00}

    table = Listing(**table_json)

    logging.info(table)


if __name__ == '__main__':
    main()

In sample-2.py we have created two enum classes - one for the category and the other for the condition.

To run the Python script sample-2.py, execute the following command:

$ python3 sample-2.py

The following would be a typical output:

Output.2

2022-02-19 19:45:06,787 - category=<Category.ELECTRONICS: 'Electronics'> title='iPad Air 2' description='Sparingly used iPad Air 2 in Excellent working condition' condition=<Condition.USED: 'Used'> price=55.0
2022-02-19 19:45:06,787 - iPad Air 2 (Used) - 55.0
2022-02-19 19:45:06,787 - category=<Category.FURNITURE: 'Furniture'> title='Oak Dressing Table' description=None condition=<Condition.USED: 'Used'> price=75.0
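In sample-2.py the enum members are passed in directly. Data that comes from a real JSON document would carry plain strings instead, and pydantic maps a string to the enum member with the matching value. The following is a small sketch of that behavior; the Item class is a hypothetical, trimmed-down stand-in for Listing:

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Item(BaseModel):
    # Hypothetical, trimmed-down stand-in for Listing
    title: str
    condition: Condition


# The string 'Used' is converted to the enum member Condition.USED
item = Item(title='iPad Air 2', condition='Used')
print(item.condition is Condition.USED)  # prints: True

# A string that matches no enum value fails validation
try:
    Item(title='iPad Air 2', condition='Excellent')
    accepted = True
except ValidationError:
    accepted = False
print(accepted)  # prints: False
```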

The following is the modified version of the Python script sample-2.py to demonstrate how pydantic behaves on missing data field(s):


sample-3.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: str
    description: Optional[str] = None
    condition: Condition
    price: float


def main():
    ipad_json = {'title': 'iPad Air 2',
                 'condition': Condition.USED,
                 'price': 55.00}

    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()

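In sample-3.py we initialize an instance of Listing using an input JSON that is missing the mandatory category field.

One aspect of sample-3.py needs a little explanation: when the input fails validation, pydantic raises a ValidationError, and its json() method returns the list of errors as a JSON string, with the location (loc), the message (msg), and the error type (type) for each offending field.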

To run the Python script sample-3.py, execute the following command:

$ python3 sample-3.py

The following would be a typical output:

Output.3

2022-02-19 20:05:40,157 - [
  {
    "loc": [
      "category"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  }
]
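When the errors need to be processed in code rather than logged, the errors() method of ValidationError returns the same information as a list of Python dictionaries. The following is a minimal sketch; the Item class is a hypothetical, simplified stand-in for Listing, and the exact wording of the messages may differ between pydantic versions:

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    # Hypothetical, simplified stand-in for Listing
    category: str
    title: str
    price: float


try:
    Item(title='iPad Air 2', price=55.00)
    problems = []
except ValidationError as ve:
    # One dictionary per failed field, with the keys 'loc', 'msg' and 'type'
    problems = ve.errors()

for problem in problems:
    # 'loc' is a sequence, since an error can point into a nested structure
    field = '.'.join(str(part) for part in problem['loc'])
    print('{}: {}'.format(field, problem['msg']))
```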

The following is the modified version of the Python script sample-2.py to demonstrate the use of extended type hints from pydantic that allow one to define field level constraints:


sample-4.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float


def main():
    ipad_json = {'title': 'iPad Air 2 (Generation Two)',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': Condition.USED,
                 'price': 100.00}

    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()

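A few aspects of sample-4.py need a little explanation. The function constr creates a constrained string type, and constr(max_length=25) limits the title to at most 25 characters. The input is also missing the category field, and pydantic reports both problems together rather than stopping at the first one. Note that description is now declared as Optional[str] without an explicit default; in pydantic 1.x such a field is still not required and defaults to None.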

To run the Python script sample-4.py, execute the following command:

$ python3 sample-4.py

The following would be a typical output:

Output.4

2022-02-19 20:22:18,687 - [
  {
    "loc": [
      "category"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "title"
    ],
    "msg": "ensure this value has at most 25 characters",
    "type": "value_error.any_str.max_length",
    "ctx": {
      "limit_value": 25
    }
  }
]

The following is the modified version of the Python script sample-4.py to demonstrate the support for custom field validators in pydantic:


sample-5.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

    @validator('price')
    def valid_price_check(cls, val, values) -> float:
        logging.info(values)
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return val


def main():
    ipad_json = {'category': Category.ELECTRONICS,
                 'title': 'iPad Air 2',
                 'condition': Condition.USED,
                 'price': 100.00}

    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()

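One aspect of sample-5.py needs a little explanation. The validator decorator registers a class-level method as a custom check for the named field (price). The method receives the field value (val) along with a dictionary (values) of the fields that have already been validated, and must either return the value or raise a ValueError. Fields are validated in the order in which they are defined and price is the last one, so values holds every other field, as the first line of the output shows.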

To run the Python script sample-5.py, execute the following command:

$ python3 sample-5.py

The following would be a typical output:

Output.5

2022-02-19 20:49:03,591 - {'category': <Category.ELECTRONICS: 'Electronics'>, 'title': 'iPad Air 2', 'description': None, 'condition': <Condition.USED: 'Used'>}
2022-02-19 20:49:03,592 - [
  {
    "loc": [
      "price"
    ],
    "msg": "price cannot be > 99.99",
    "type": "value_error"
  }
]
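For simple numeric bounds such as the ones in sample-5.py, the same rule could arguably be declared on the field itself using the pydantic Field function, leaving custom validators for logic that cannot be expressed as a constraint. The following is a sketch of that alternative and not a replacement for the approach above; the PricedItem class and the is_valid helper are hypothetical:

```python
from pydantic import BaseModel, Field, ValidationError


class PricedItem(BaseModel):
    # Hypothetical, trimmed-down stand-in for Listing
    title: str
    # gt: strictly greater than; le: less than or equal to
    price: float = Field(..., gt=0.0, le=99.99)


def is_valid(price: float) -> bool:
    """Return True when PricedItem accepts the given price."""
    try:
        PricedItem(title='iPad Air 2', price=price)
        return True
    except ValidationError:
        return False


print(is_valid(55.00), is_valid(100.00), is_valid(0.0))  # prints: True False False
```

The trade-off is that the error messages are the ones generated by pydantic rather than the custom text used in sample-5.py.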

The following is the modified version of the Python script sample-4.py to demonstrate the support for a custom root level validator, that is applicable for the entire data class in pydantic:


sample-6.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, root_validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

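    # skip_on_failure=True skips this check when a field has already failed
    # validation, in which case 'price' would be missing from values
    @root_validator(skip_on_failure=True)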
    def valid_price_check(cls, values) -> dict:
        val = values.get('price')
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return values


def main():
    table_json = {'category': Category.FURNITURE,
                  'title': 'Coffee Table',
                  'condition': Condition.NEW,
                  'price': 35.99}

    table = Listing(**table_json)
    table.description = 'Beautiful glass top coffee table'

    logging.info(table.dict())


if __name__ == '__main__':
    main()

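A few aspects of sample-6.py need a little explanation. The root_validator decorator registers a check that runs against the whole set of field values (values) rather than a single field, and it must return the dictionary of values. By default, an instance of a pydantic data class is mutable, which is why the assignment to table.description succeeds. The dict() method returns the instance as a Python dictionary.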

To run the Python script sample-6.py, execute the following command:

$ python3 sample-6.py

The following would be a typical output:

Output.6

2022-02-19 21:18:03,472 - {'category': <Category.FURNITURE: 'Furniture'>, 'title': 'Coffee Table', 'description': 'Beautiful glass top coffee table', 'condition': <Condition.NEW: 'New'>, 'price': 35.99}
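A root level validator is most useful when a rule involves more than one field. The following sketch uses the same pydantic 1.x decorator to enforce a hypothetical business rule (not part of the original listing example) that a used item must carry a description. The Item class is a trimmed-down stand-in for Listing, and skip_on_failure=True is an assumption made here so that the check only runs once every field has passed its own validation:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError, root_validator


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Item(BaseModel):
    # Hypothetical, trimmed-down stand-in for Listing
    title: str
    description: Optional[str] = None
    condition: Condition

    @root_validator(skip_on_failure=True)
    def used_items_need_description(cls, values) -> dict:
        # Hypothetical rule that spans two fields: condition and description
        if values.get('condition') is Condition.USED and not values.get('description'):
            raise ValueError('used items must include a description')
        return values


# A new item without a description is accepted
item_ok = Item(title='Coffee Table', condition=Condition.NEW)

# A used item without a description is rejected
try:
    Item(title='iPad Air 2', condition=Condition.USED)
    rejected = False
except ValidationError:
    rejected = True

print(item_ok.description, rejected)  # prints: None True
```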

In Output.6, notice how the enums have been encoded and displayed. What if we want the values from the enum to be used instead?

Also, what if we want to prevent mutation of field values once a data class instance is created?

The following is the modified version of the Python script sample-4.py to demonstrate the support for customization of the pydantic behavior:


sample-7.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, root_validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

    class Config:
        allow_mutation = False
        use_enum_values = True

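    # skip_on_failure=True skips this check when a field has already failed
    # validation, in which case 'price' would be missing from values
    @root_validator(skip_on_failure=True)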
    def valid_price_check(cls, values) -> dict:
        val = values.get('price')
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return values


def main():
    toy_json = {'category': Category.TOYS,
                'title': 'Chutes and Ladders',
                'condition': Condition.USED,
                'price': 4.99,
                'extra': 'Extra information'}

    toy = Listing(**toy_json)

    logging.info(toy.dict())

    try:
        toy.description = 'Changing the description'
    except TypeError as te:
        logging.error('***ERROR*** {}'.format(te))


if __name__ == '__main__':
    main()

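A few aspects of sample-7.py need a little explanation. The inner class Config customizes the behavior of the data class. Setting allow_mutation to False makes instances immutable, so assigning to a field raises a TypeError. Setting use_enum_values to True stores the value of an enum member rather than the member itself. Also notice that the input contains a field named extra that is not defined in Listing; by default, pydantic ignores such fields, which is why it does not appear in the output.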

To run the Python script sample-7.py, execute the following command:

$ python3 sample-7.py

The following would be a typical output:

Output.7

2022-02-19 21:22:07,194 - {'category': 'Toys', 'title': 'Chutes and Ladders', 'description': None, 'condition': 'Used', 'price': 4.99}
2022-02-19 21:22:07,194 - ***ERROR*** "Listing" is immutable and does not support item assignment
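In sample-7.py the unknown field extra is dropped without any error. If unknown fields should be rejected instead, the Config class also accepts an extra setting. The following is a minimal sketch of that option; the StrictItem class is hypothetical and the error type reported may differ between pydantic versions:

```python
from pydantic import BaseModel, ValidationError


class StrictItem(BaseModel):
    # Hypothetical, trimmed-down stand-in for Listing
    title: str
    price: float

    class Config:
        # Reject input fields that are not declared on the data class
        extra = 'forbid'


try:
    StrictItem(title='Chutes and Ladders', price=4.99, extra='Extra information')
    problems = []
except ValidationError as ve:
    problems = ve.errors()

# Print the names of the offending input fields
print([problem['loc'][0] for problem in problems])  # prints: ['extra']
```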

We have barely scratched the surface of pydantic. There are many more features and capabilities in pydantic.

References

Pydantic


© PolarSPARC