PolarSPARC
Hands-on with Pydantic
Bhaskar S | 02/19/2022
Overview
Pydantic is an elegant and popular Python library that performs data parsing and validation at runtime using Python's type hints, and reports user-friendly errors when the data is invalid.
Pydantic can be used in the preprocessing step of a data pipeline to ensure clean and valid data flows down the pipeline.
Installation
The installation is assumed to be on a Linux desktop running Ubuntu 20.04 LTS. To install the pydantic Python module and its type extensions, open a terminal window and execute the following command:
$ pip3 install pydantic typing-extensions
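On successful installation, one can start using pydantic. Note that the scripts in this article use the pydantic 1.x API, which was the current release line as of the article date.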
Hands-on pydantic
The following is a simple Python script (sample-1.py) that demonstrates a hypothetical online item listing data class using pydantic:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from pydantic import BaseModel
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Listing(BaseModel):
    category: str
    title: str
    description: Optional[str] = None
    condition: str
    price: float


def main():
    ipad_json = {'category': 'Electronics',
                 'title': 'iPad Air 2',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': 'Excellent',
                 'price': 55.00}
    ipad = Listing(**ipad_json)
    logging.info(ipad)
    logging.info('{} ({}) - {}'.format(ipad.title, ipad.condition, ipad.price))
    table_json = {'category': 'Furniture',
                  'title': 'Oak Dressing Table',
                  'condition': 'Excellent',
                  'price': 75.00}
    table = Listing(**table_json)
    logging.info(table)


if __name__ == '__main__':
    main()
Some aspects of sample-1.py above need a little explanation.
BaseModel :: is the base class for defining pydantic data classes. It ensures that the fields conform to the specified field types (via type hints), that the mandatory fields have associated values, etc. By default, the fields of a pydantic data class are MUTABLE, meaning they can be changed after an instance of the data class is created
FIELD_NAME: TYPE :: defines a field that has the name FIELD_NAME and is of the specified TYPE. For example, the field title is of type str (a string), the field price is of type float, etc
Optional[TYPE] :: the type hint (from the Python typing module) that indicates the field is optional; the default value for such fields is None
Listing(**kwargs) :: notice how an item listing data instance is created from a JSON-like Python dictionary by unpacking it into keyword arguments. The fields in the data class are initialized using the corresponding values from the dictionary
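The points above can be sketched with a small standalone example. The Item class and its input values are hypothetical and are not part of sample-1.py. The sketch shows that a string value is parsed into the declared float type, that the optional field defaults to None, and that fields can be changed after creation:

```python
from typing import Optional

from pydantic import BaseModel


class Item(BaseModel):  # hypothetical model, not part of sample-1.py
    title: str
    description: Optional[str] = None
    price: float


# The price arrives as a string, as it often does in raw input data
raw = {'title': 'Desk Lamp', 'price': '12.50'}
item = Item(**raw)

# The string has been parsed into the declared float type
print(type(item.price).__name__, item.price)

# The optional field was not supplied, so it takes its default value
print(item.description)

# Fields are mutable by default
item.title = 'LED Desk Lamp'
print(item.title)
```

The parsing step is what distinguishes pydantic from a plain type check: the input is converted to the declared type where a sensible conversion exists.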
To run the Python script sample-1.py, execute the following command:
$ python3 sample-1.py
The following would be a typical output:
2022-02-19 19:29:31,707 - category='Electronics' title='iPad Air 2' description='Sparingly used iPad Air 2 in Excellent working condition' condition='Excellent' price=55.0
2022-02-19 19:29:31,707 - iPad Air 2 (Excellent) - 55.0
2022-02-19 19:29:31,707 - category='Furniture' title='Oak Dressing Table' description=None condition='Excellent' price=75.0
The following is the same Python script as in sample-1.py, except that it has been enhanced to use custom enum classes:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: str
    description: Optional[str] = None
    condition: Condition
    price: float


def main():
    ipad_json = {'category': Category.ELECTRONICS,
                 'title': 'iPad Air 2',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': Condition.USED,
                 'price': 55.00}
    ipad = Listing(**ipad_json)
    logging.info(ipad)
    logging.info('{} ({}) - {}'.format(ipad.title, ipad.condition.value, ipad.price))
    table_json = {'category': Category.FURNITURE,
                  'title': 'Oak Dressing Table',
                  'condition': Condition.USED,
                  'price': 75.00}
    table = Listing(**table_json)
    logging.info(table)


if __name__ == '__main__':
    main()
In sample-2.py we have created two enum classes - one for the category and the other for the condition.
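In sample-2.py the enum members are passed in directly. When the input comes from raw data, the values are more likely to be plain strings. The following minimal sketch (the Product class is hypothetical) suggests how pydantic handles that case: a string that matches an enum value is converted to the enum member, and any other string is rejected:

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Product(BaseModel):  # hypothetical model for illustration
    category: Category
    title: str


# A plain string that matches an enum value is converted to the enum member
lamp = Product(category='Furniture', title='Floor Lamp')
print(lamp.category)

# A string that matches no enum value is rejected
rejected = False
try:
    Product(category='Books', title='Python Cookbook')
except ValidationError as ve:
    rejected = True
    print(ve)
```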
To run the Python script sample-2.py, execute the following command:
$ python3 sample-2.py
The following would be a typical output:
2022-02-19 19:45:06,787 - category=<Category.ELECTRONICS: 'Electronics'> title='iPad Air 2' description='Sparingly used iPad Air 2 in Excellent working condition' condition=<Condition.USED: 'Used'> price=55.0
2022-02-19 19:45:06,787 - iPad Air 2 (Used) - 55.0
2022-02-19 19:45:06,787 - category=<Category.FURNITURE: 'Furniture'> title='Oak Dressing Table' description=None condition=<Condition.USED: 'Used'> price=75.0
The following is the modified version of the Python script sample-2.py to demonstrate how pydantic behaves on missing data field(s):
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: str
    description: Optional[str] = None
    condition: Condition
    price: float


def main():
    ipad_json = {'title': 'iPad Air 2',
                 'condition': Condition.USED,
                 'price': 55.00}
    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()
In sample-3.py we initialize an instance of Listing using an input dictionary that is missing the mandatory category field.
One aspect of the sample-3.py from the above needs a little explanation.
ValidationError :: is the exception raised by pydantic when an error is encountered during data validation, such as a missing mandatory field or an invalid value
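Besides json(), a ValidationError also exposes an errors() method that returns the same information as a list of Python dictionaries, which is convenient when the failures need to be processed in code. The following is a minimal sketch with a hypothetical Contact class, where one field is missing and another has an invalid value:

```python
from pydantic import BaseModel, ValidationError


class Contact(BaseModel):  # hypothetical model for illustration
    name: str
    email: str
    age: int


problems = []
try:
    # email is missing and age cannot be parsed as an int
    Contact(name='Alice', age='not-a-number')
except ValidationError as ve:
    # errors() returns a list of dictionaries, one per failed field
    for err in ve.errors():
        field = err['loc'][0]
        problems.append(field)
        print('{}: {}'.format(field, err['msg']))
```

Note that all the failures are collected and reported together in a single exception, rather than stopping at the first one.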
To run the Python script sample-3.py, execute the following command:
$ python3 sample-3.py
The following would be a typical output:
2022-02-19 20:05:40,157 - [
  {
    "loc": [
      "category"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  }
]
The following is the modified version of the Python script sample-2.py to demonstrate the use of extended type hints from pydantic that allow one to define field level constraints:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float


def main():
    ipad_json = {'title': 'iPad Air 2 (Generation Two)',
                 'description': 'Sparingly used iPad Air 2 in Excellent working condition',
                 'condition': Condition.USED,
                 'price': 100.00}
    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()
Some aspects of sample-4.py above need a little explanation, along with two related constraint types that are not used in the script.
constr(max_length=LEN) :: allows one to ensure that the length of the string value in the field does NOT exceed the specified LEN
conint(ge=N1, le=N2) :: allows one to ensure that the int value of the field is within the inclusive range [N1, N2]
confloat(gt=N1, lt=N2) :: allows one to ensure that the float value of the field is within the exclusive range (N1, N2)
To run the Python script sample-4.py, execute the following command:
$ python3 sample-4.py
The following would be a typical output:
2022-02-19 20:22:18,687 - [
  {
    "loc": [
      "category"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "title"
    ],
    "msg": "ensure this value has at most 25 characters",
    "type": "value_error.any_str.max_length",
    "ctx": {
      "limit_value": 25
    }
  }
]
The following is the modified version of the Python script sample-4.py to demonstrate the support for custom field validators in pydantic:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

    @validator('price')
    def valid_price_check(cls, val, values) -> float:
        logging.info(values)
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return val


def main():
    ipad_json = {'category': Category.ELECTRONICS,
                 'title': 'iPad Air 2',
                 'condition': Condition.USED,
                 'price': 100.00}
    try:
        Listing(**ipad_json)
    except ValidationError as ve:
        logging.error(ve.json())


if __name__ == '__main__':
    main()
One aspect of the sample-5.py from the above needs a little explanation.
@validator(FIELD_NAME) :: class method decorator that allows one to perform custom validation on the specified FIELD_NAME. Note that the method is a CLASS method, so the first argument is the Listing class and not an instance. The second argument is the value of FIELD_NAME. The third argument is the dictionary of the fields that have already been validated (those declared before FIELD_NAME) along with their respective values, which is why the dictionary logged by the script does not include price
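The third argument becomes useful when the check on one field depends on another field. The following is a minimal sketch, assuming the same pydantic 1.x validator API used in sample-5.py. The Offer class and its fields are hypothetical:

```python
from pydantic import BaseModel, ValidationError, validator


class Offer(BaseModel):  # hypothetical model for illustration
    list_price: float
    sale_price: float

    @validator('sale_price')
    def sale_not_above_list(cls, val, values) -> float:
        # values holds only the fields declared (and validated) before sale_price
        list_price = values.get('list_price')
        if list_price is not None and val > list_price:
            raise ValueError('sale_price cannot exceed list_price')
        return val


good = Offer(list_price=50.0, sale_price=39.99)
print(good)

message = None
try:
    Offer(list_price=50.0, sale_price=60.0)
except ValidationError as ve:
    message = ve.errors()[0]['msg']
    print(message)
```

The None guard is there because list_price is absent from values when it has itself failed validation.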
To run the Python script sample-5.py, execute the following command:
$ python3 sample-5.py
The following would be a typical output:
2022-02-19 20:49:03,591 - {'category': <Category.ELECTRONICS: 'Electronics'>, 'title': 'iPad Air 2', 'description': None, 'condition': <Condition.USED: 'Used'>}
2022-02-19 20:49:03,592 - [
  {
    "loc": [
      "price"
    ],
    "msg": "price cannot be > 99.99",
    "type": "value_error"
  }
]
The following is the modified version of the Python script sample-4.py to demonstrate pydantic's support for a custom root-level validator, which applies to the entire data class:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, ValidationError, root_validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

    @root_validator
    def valid_price_check(cls, values) -> dict:
        val = values.get('price')
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return values


def main():
    table_json = {'category': Category.FURNITURE,
                  'title': 'Coffee Table',
                  'condition': Condition.NEW,
                  'price': 35.99}
    table = Listing(**table_json)
    table.description = 'Beautiful glass top coffee table'
    logging.info(table.dict())


if __name__ == '__main__':
    main()
Some aspects of sample-6.py above need a little explanation.
@root_validator :: class method decorator that allows one to perform validation on the entire data class. Note that the method is a CLASS method, so the first argument is the Listing class and not an instance. The second argument is the dictionary of the valid fields along with their respective values
dict() :: is a method defined in the pydantic model that encodes the fields of the data class as a Python dictionary
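A root validator is a natural fit for checks that involve more than one field. The following is a minimal sketch, assuming the pydantic 1.x root_validator API used in sample-6.py. The PriceRange class is hypothetical. The sketch also passes the skip_on_failure=True option, so that the check only runs once every field has passed its own validation:

```python
from pydantic import BaseModel, ValidationError, root_validator


class PriceRange(BaseModel):  # hypothetical model for illustration
    low: float
    high: float

    # skip_on_failure=True skips this check when a field has already failed
    # validation, so both values are guaranteed to be present here
    @root_validator(skip_on_failure=True)
    def low_not_above_high(cls, values) -> dict:
        if values['low'] > values['high']:
            raise ValueError('low cannot be greater than high')
        return values


valid_range = PriceRange(low=5.0, high=20.0)
print(valid_range.dict())

rejected = False
try:
    PriceRange(low=30.0, high=20.0)
except ValidationError as ve:
    rejected = True
    print(ve)
```

Without skip_on_failure, a field that failed its own validation would be missing from values, and the comparison would have to guard against that.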
To run the Python script sample-6.py, execute the following command:
$ python3 sample-6.py
The following would be a typical output:
2022-02-19 21:18:03,472 - {'category': <Category.FURNITURE: 'Furniture'>, 'title': 'Coffee Table', 'description': 'Beautiful glass top coffee table', 'condition': <Condition.NEW: 'New'>, 'price': 35.99}
In the output above, notice how the enums have been encoded and displayed as enum members. What if we want the values from the enums to be used instead?
Also, what if we want to prevent mutation of field values once a data class instance is created?
The following is the modified version of the Python script sample-4.py to demonstrate the support for customization of the pydantic behavior:
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   19 Feb 2022
#

from enum import Enum
from pydantic import BaseModel, root_validator, constr
from typing import Optional
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


class Category(Enum):
    ELECTRONICS = 'Electronics'
    FURNITURE = 'Furniture'
    TOYS = 'Toys'


class Condition(Enum):
    NEW = 'New'
    USED = 'Used'


class Listing(BaseModel):
    category: Category
    title: constr(max_length=25)
    description: Optional[str]
    condition: Condition
    price: float

    class Config:
        allow_mutation = False
        use_enum_values = True

    @root_validator
    def valid_price_check(cls, values) -> dict:
        val = values.get('price')
        if val <= 0.0:
            raise ValueError('price cannot be <= 0.0')
        if val > 99.99:
            raise ValueError('price cannot be > 99.99')
        return values


def main():
    toy_json = {'category': Category.TOYS,
                'title': 'Chutes and Ladders',
                'condition': Condition.USED,
                'price': 4.99,
                'extra': 'Extra information'}
    toy = Listing(**toy_json)
    logging.info(toy.dict())
    try:
        toy.description = 'Changing the description'
    except TypeError as te:
        logging.error('***ERROR*** {}'.format(te))


if __name__ == '__main__':
    main()
Some aspects of sample-7.py above need a little explanation.
class Config :: is an inner class (within the data class) that controls the behavior of pydantic
use_enum_values :: option that allows one to control how pydantic deals with enums. By setting it to True, pydantic will store the enum values instead of the enum members
allow_mutation :: option that allows one to control if mutation of field values is allowed after initialization. By setting it to False, pydantic prevents mutation
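The input dictionary in sample-7.py also carries an 'extra' key that is not a field of Listing. By default pydantic ignores such keys. The following minimal sketch contrasts the default with the extra option of the Config class, which the script above does not use. The LooseToy and StrictToy classes are hypothetical:

```python
from pydantic import BaseModel, ValidationError


class LooseToy(BaseModel):  # hypothetical model using the default behavior
    title: str
    price: float


class StrictToy(BaseModel):  # hypothetical model that rejects unknown keys
    title: str
    price: float

    class Config:
        extra = 'forbid'


toy_json = {'title': 'Chutes and Ladders', 'price': 4.99, 'extra': 'Extra information'}

# By default the unknown 'extra' key is silently dropped
loose = LooseToy(**toy_json)
print(loose)

# With extra = 'forbid' the same input fails validation
rejected = False
try:
    StrictToy(**toy_json)
except ValidationError as ve:
    rejected = True
    print(ve)
```

Forbidding unknown keys can help catch misspelled field names in the input early, at the cost of rejecting data that carries harmless additional keys.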
To run the Python script sample-7.py, execute the following command:
$ python3 sample-7.py
The following would be a typical output:
2022-02-19 21:22:07,194 - {'category': 'Toys', 'title': 'Chutes and Ladders', 'description': None, 'condition': 'Used', 'price': 4.99}
2022-02-19 21:22:07,194 - ***ERROR*** "Listing" is immutable and does not support item assignment
We have barely scratched the surface of pydantic. There are many more features and capabilities in pydantic.