How to install Great Expectations on a Spark EMR Cluster
This guide will help you install Great Expectations on a Spark EMR cluster.
Steps
1. Install Great Expectations
This guide demonstrates the recommended path for instantiating a Data Context without a full configuration directory and without using the Great Expectations Command Line Interface (CLI). To install the package, run the following in a cell of your EMR Spark notebook:
sc.install_pypi_package("great_expectations")
2. Configure a Data Context in code
Follow the steps for creating an in-code Data Context in "How to instantiate a Data Context without a yml file".
Here is Python code that instantiates and configures a Data Context in code for an EMR Spark cluster. Copy this snippet into a cell in your EMR Spark notebook or use the other examples to customize your configuration. Execute the snippet to instantiate a Data Context in memory.
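As a minimal sketch of such a configuration, the block below assumes a legacy (0.13-era) Great Expectations release in which `BaseDataContext`, `DataContextConfig`, and `S3StoreBackendDefaults` are available, and assumes that expectations and validation results should be stored in a single S3 bucket. The datasource name `my_spark_datasource`, the helper names, and the bucket name are placeholders, not values from this guide. Newer releases changed this API, so check the documentation for your installed version.

```python
def spark_datasource_config(name="my_spark_datasource"):
    """Return a plain-dict config for one legacy SparkDFDatasource.

    The datasource name is a placeholder chosen for illustration.
    """
    return {
        name: {
            "class_name": "SparkDFDatasource",
            "module_name": "great_expectations.datasource",
            "data_asset_type": {
                "class_name": "SparkDFDataset",
                "module_name": "great_expectations.dataset",
            },
            "batch_kwargs_generators": {},
        }
    }


def build_context(bucket_name):
    """Build an in-memory Data Context whose stores live in one S3 bucket.

    Assumes a legacy (0.13-era) great_expectations release.
    """
    # Imported lazily so the helper above works without the library installed.
    from great_expectations.data_context import BaseDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        S3StoreBackendDefaults,
    )

    project_config = DataContextConfig(
        datasources=spark_datasource_config(),
        store_backend_defaults=S3StoreBackendDefaults(
            default_bucket_name=bucket_name
        ),
    )
    return BaseDataContext(project_config=project_config)
```

In a notebook cell, a call such as `context = build_context("my-ge-bucket")` (bucket name hypothetical) would leave `context` in memory for the verification step.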
Then copy the following code snippet into a cell in your EMR Spark notebook, run it, and verify that no error is displayed:
context.list_datasources()
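Beyond checking that no error appears, the returned listing can be compared against the datasources you expect. The helper below is a small sketch that assumes each entry returned by `context.list_datasources()` is a dictionary with a `"name"` key; the function name, the sample listing, and the datasource name are illustrative only.

```python
def missing_datasources(listed, expected_names):
    """Return the expected datasource names absent from a datasource listing.

    Assumes each entry in `listed` is a dict with a "name" key.
    """
    found = {entry.get("name") for entry in listed}
    return sorted(set(expected_names) - found)


# Illustrative stand-in for what context.list_datasources() might return.
sample_listing = [
    {"name": "my_spark_datasource", "class_name": "SparkDFDatasource"}
]
print(missing_datasources(sample_listing, ["my_spark_datasource"]))  # prints []
```

In the notebook, passing `context.list_datasources()` in place of `sample_listing` would return an empty list when every expected datasource is configured.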
Congratulations! You successfully installed Great Expectations.