AI Data Tests (Beta)
Unstructured Data Validations
```yaml
version: 2

models:
  - name: < model name >
    columns:
      - name: < column name >
        data_tests:
          - elementary.unstructured_data_validation:
              arguments:
                expectation_prompt: "Description of what the text should contain or represent"
                llm_model_name: "model_name"
```
Beta Feature: Unstructured data validation tests are currently in beta. The functionality and interface may change in future releases.
Version Requirement: This feature requires Elementary dbt package version 0.18.0 or above.
Elementary’s elementary.unstructured_data_validation test lets you validate unstructured data using large language models (LLMs). Instead of writing complex code, you describe what you expect from your data in plain English, and Elementary checks whether the data meets those expectations.
For example, you can verify that customer feedback comments are in English, that product descriptions contain required information, or that support tickets follow a specific format or express a particular sentiment.
expectation_prompt: Describe what you expect from the text in plain English
llm_model_name: Specify which AI model to use (see recommendations above for each warehouse)
This test works with any column containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to structured columns that can be converted to strings, enabling natural language data validations.
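As a minimal sketch of the use cases mentioned above, the configuration below checks that feedback comments are written in English and applies a natural-language rule to a structured column. The model name `customer_feedback`, the columns `comment_text` and `order_total`, and both prompts are hypothetical. The `llm_model_name` value is the one used in the prescription example on this page and may not be available in your warehouse.

```yaml
version: 2

models:
  - name: customer_feedback          # hypothetical model name
    columns:
      - name: comment_text           # hypothetical free-form text column
        data_tests:
          - elementary.unstructured_data_validation:
              arguments:
                expectation_prompt: "The comment is written in English"
                llm_model_name: "claude-3-5-sonnet"   # assumption: replace with a model your warehouse supports
      - name: order_total            # hypothetical structured column that can be cast to a string
        data_tests:
          - elementary.unstructured_data_validation:
              arguments:
                expectation_prompt: "The value is a positive amount with at most two decimal places"
                llm_model_name: "claude-3-5-sonnet"   # assumption: replace with a model your warehouse supports
```

Short, specific prompts with a single checkable condition are likely to be easier to interpret when a test fails than prompts that bundle several expectations together.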
```yaml
models:
  - name: medicine_prescriptions
    description: "A table containing medicine prescriptions."
    columns:
      - name: doctor_notes
        description: "A column containing the doctor notes on the prescription"
        data_tests:
          - elementary.unstructured_data_validation:
              arguments:
                expectation_prompt: "The prescription has to include a limited time period and recommendations to the patient"
                llm_model_name: "claude-3-5-sonnet"
```
Test fails if: A doctor’s note does not specify a time period or lacks recommendations for the patient.
```yaml
models:
  - name: summarized_pdfs
    description: "A table containing a summary of our ingested PDFs."
    columns:
      - name: pdf_summary
        description: "A column containing the main PDF's content summary."
        data_tests:
          - elementary.validate_similarity:
              arguments:
                to: ref('pdf_source_table')
                column: pdf_content
                match_by: pdf_name
```
Test fails if: A PDF summary does not accurately represent the original PDF’s content. The validation uses pdf_name as the key to match each summary in the pdf_summary column to the corresponding pdf_content in pdf_source_table.