Evaluation Parameters
Often, the specific parameters associated with an Expectation will be derived from upstream steps in a processing pipeline. For example, we may want to expect_table_row_count_to_equal a value stored in a previous step.
Great Expectations makes it possible to use "Evaluation Parameters" to accomplish that goal. We declare Expectations using parameters that need to be provided at validation time. During interactive development, we can even provide a temporary value that should be used during the initial evaluation of the Expectation.
my_df.expect_table_row_count_to_equal(
    value={"$PARAMETER": "upstream_row_count", "$PARAMETER.upstream_row_count": 10},
    result_format={'result_format': 'BOOLEAN_ONLY'},
)
This will return {'success': True}.
More typically, when validating Expectations, you can provide Evaluation Parameters that are only available at runtime:
my_df.validate(
    expectation_suite=my_dag_step_config,
    evaluation_parameters={"upstream_row_count": upstream_row_count},
)
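To make the substitution step concrete, here is a minimal sketch of how a `$PARAMETER` reference could be swapped for a value at validation time. The helper `resolve_parameters` is hypothetical and is not part of the Great Expectations API. The lookup order (runtime value first, then the temporary value given at declaration) is an assumption of this sketch, and it handles plain name lookups only.

```python
def resolve_parameters(kwargs, evaluation_parameters):
    """Return a copy of kwargs with {"$PARAMETER": name} dicts replaced by values.

    Hypothetical helper for illustration only, not the library's implementation.
    """
    resolved = {}
    for key, value in kwargs.items():
        if isinstance(value, dict) and "$PARAMETER" in value:
            name = value["$PARAMETER"]
            temporary_key = "$PARAMETER." + name
            if name in evaluation_parameters:
                # Assumption of this sketch: a runtime value is preferred.
                resolved[key] = evaluation_parameters[name]
            elif temporary_key in value:
                # Otherwise use the temporary value given at declaration time.
                resolved[key] = value[temporary_key]
            else:
                raise KeyError("No value provided for parameter %r" % name)
        else:
            # Ordinary keyword arguments pass through unchanged.
            resolved[key] = value
    return resolved


declared = {
    "value": {"$PARAMETER": "upstream_row_count", "$PARAMETER.upstream_row_count": 10}
}
print(resolve_parameters(declared, {"upstream_row_count": 250}))  # {'value': 250}
print(resolve_parameters(declared, {}))  # {'value': 10}
```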
Evaluation Parameter expressions
In many cases, Evaluation Parameters are most useful when they allow a range of values. For example, we might want to specify that a new table's row count should be between 90% and 110% of an upstream table's row count (or a count from a previous run). Evaluation Parameters support basic arithmetic expressions to accomplish that goal:
my_df.expect_table_row_count_to_be_between(
    min_value={"$PARAMETER": "trunc(upstream_row_count * 0.9)"},
    max_value={"$PARAMETER": "trunc(upstream_row_count * 1.1)"},
    result_format={'result_format': 'BOOLEAN_ONLY'},
)
This will return {'success': True}.
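As a rough illustration of what the two expressions above compute, the following plain-Python sketch evaluates the same arithmetic outside of Great Expectations. The function names `row_count_bounds` and `row_count_in_range` are made up for this example, and `math.trunc` stands in for the `trunc` used in the expression strings.

```python
import math


def row_count_bounds(upstream_row_count):
    """Return the truncated 90% and 110% bounds for an upstream row count."""
    lower = math.trunc(upstream_row_count * 0.9)
    upper = math.trunc(upstream_row_count * 1.1)
    return lower, upper


def row_count_in_range(row_count, upstream_row_count):
    """Check whether row_count falls inside the bounds, inclusive."""
    lower, upper = row_count_bounds(upstream_row_count)
    return lower <= row_count <= upper


print(row_count_bounds(1000))         # (900, 1100)
print(row_count_in_range(950, 1000))  # True
print(row_count_in_range(850, 1000))  # False
```

Truncation keeps both bounds as integers, which matches the fact that a row count is always a whole number.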
Evaluation Parameters are not limited to simple values. For example, you could pass a list as a parameter value:
my_df.expect_column_values_to_be_in_set(
    "my_column",
    value_set={"$PARAMETER": "runtime_values"},
)
my_df.validate(
    evaluation_parameters={"runtime_values": [1, 2, 3]},
)
However, it is not possible to mix complex values with arithmetic expressions.
Storing Evaluation Parameters
Data Context Evaluation Parameter Store
A Data Context can automatically identify and store Evaluation Parameters that are referenced in other Expectation Suites. The Evaluation Parameter Store uses a URN schema to identify dependencies between Expectation Suites.
To be recognized by the Great Expectations Data Context, a URN must begin with the string urn:great_expectations and have one of the following structures:
urn:great_expectations:validations:<expectation_suite_name>:<metric_name>
urn:great_expectations:validations:<expectation_suite_name>:<metric_name>:<metric_kwargs_id>
Replace the names in <> with the desired names. For example:
urn:great_expectations:validations:dickens_data:expect_column_proportion_of_unique_values_to_be_between.result.observed_value:column=Title
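The two URN structures can be read as a fixed prefix followed by colon-separated fields. The sketch below splits a URN into those fields. `parse_validation_urn` is a hypothetical helper written for this example, not a Great Expectations function, and it assumes that no individual field contains a colon.

```python
def parse_validation_urn(urn):
    """Split a validations URN into its named fields.

    Hypothetical helper for illustration; assumes fields contain no colons.
    """
    parts = urn.split(":")
    if parts[:3] != ["urn", "great_expectations", "validations"]:
        raise ValueError("URN must begin with urn:great_expectations:validations")
    if len(parts) not in (5, 6):
        raise ValueError("URN must have two or three fields after the prefix")
    return {
        "expectation_suite_name": parts[3],
        "metric_name": parts[4],
        # The third field is optional, so it is None for the shorter structure.
        "metric_kwargs_id": parts[5] if len(parts) == 6 else None,
    }


example = (
    "urn:great_expectations:validations:dickens_data:"
    "expect_column_proportion_of_unique_values_to_be_between.result.observed_value:"
    "column=Title"
)
fields = parse_validation_urn(example)
print(fields["expectation_suite_name"])  # dickens_data
print(fields["metric_kwargs_id"])        # column=Title
```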
Storing Parameters in an Expectation Suite
You can also store parameter values in a special dictionary called evaluation_parameters, which is stored in the expectation_suite, so that they are available to multiple Expectations or while declaring additional Expectations:
my_df.set_evaluation_parameter("upstream_row_count", 10)
my_df.get_evaluation_parameter("upstream_row_count")
If a parameter has been stored, it does not need to be provided again when a new Expectation is declared:
my_df.set_evaluation_parameter("upstream_row_count", 10)
my_df.expect_table_row_count_to_be_between(
    max_value={"$PARAMETER": "upstream_row_count"},
)