Great Expectations
1. Overview
Great Expectations is a useful tool to profile, validate, and document data. It helps to maintain the quality of data throughout a data workflow and pipeline.
https://greatexpectations.io/
To start, the link below walks through a very simple installation of Great Expectations (ge) and how to use it to run checks on data.
https://cloudywithachanceofbigdata.com/great-expectations-for-your-data/
2. Uploading pandas data to BigQuery
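from google.cloud import bigquery
from google.oauth2 import service_account
import pandas_gbq
import great_expectations as ge
import numpy as np
import pandas as pd


def exam():
    # Build a 100x4 frame of random integers in [0, 100) with columns A-D.
    df_raw = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    # Wrap it as a Great Expectations dataset (still a pandas DataFrame subclass).
    df = ge.from_pandas(df_raw)

    # Load the service-account key and share it with pandas_gbq as well.
    credentials = service_account.Credentials.from_service_account_file(
        r'greatexpectations-315602-a47ff0cdec8b.json'
    )
    pandas_gbq.context.credentials = credentials

    # Upload the frame to <project>.<dataset>.<table> and wait for the load job.
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)
    job = client.load_table_from_dataframe(df, 'greatexpectations-315602.test.my_table')
    job.result()


if __name__ == '__main__':
    exam()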
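3. Connecting Great Expectations to BigQuery
export GOOGLE_APPLICATION_CREDENTIALS=<path to key file>
It's a hassle to do this every time, so I recommend registering it as an environment variable permanently.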
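Before pointing Great Expectations at BigQuery, it can help to confirm that the credentials variable is actually set and to see what the connection string looks like. The sketch below is a rough illustration using only the standard library. The helper names credentials_file and bigquery_url are hypothetical (they are not part of any library), the project and dataset are the ones used in section 2, and the bigquery://<project>/<dataset> form follows the SQLAlchemy BigQuery dialect listed in the references.

```python
import os
from typing import Mapping, Optional


def credentials_file(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the key-file path from
    GOOGLE_APPLICATION_CREDENTIALS if it points to an existing file, else None."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and os.path.isfile(path):
        return path
    return None


def bigquery_url(project: str, dataset: str) -> str:
    """Hypothetical helper: build a SQLAlchemy-style BigQuery URL
    of the form bigquery://<project>/<dataset>."""
    return f"bigquery://{project}/{dataset}"


if __name__ == "__main__":
    if credentials_file() is None:
        print("GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file")
    # Project and dataset taken from the upload example in section 2.
    print(bigquery_url("greatexpectations-315602", "test"))
```

The resulting URL is the kind of value the datasource setup asks for when BigQuery is chosen as the backend; the check on the environment variable only verifies that the file exists, not that the key is valid.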
4. suite scaffold
suite: Expectation Suites combine multiple expectations into an overall description of a dataset.
Run the command below and answer a few setup prompts, and a Jupyter notebook opens automatically.
great_expectations suite scaffold mysuite
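To make the idea of a suite concrete, here is a library-free sketch of "several expectations combined into one description of a dataset". It deliberately does not use the Great Expectations API: the functions expect_column_exists, expect_values_between, and run_suite are hypothetical stand-ins written for this illustration, and the sample rows are made up.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Expectation = Callable[[List[Row]], bool]


def expect_column_exists(column: str) -> Expectation:
    """Hypothetical expectation: every row has the given column."""
    def check(rows: List[Row]) -> bool:
        return all(column in row for row in rows)
    return check


def expect_values_between(column: str, low: int, high: int) -> Expectation:
    """Hypothetical expectation: every value in the column lies in [low, high]."""
    def check(rows: List[Row]) -> bool:
        return all(low <= row[column] <= high for row in rows)
    return check


def run_suite(suite: Dict[str, Expectation], rows: List[Row]) -> Dict[str, bool]:
    """Apply every named expectation in the suite to the same rows."""
    return {name: check(rows) for name, check in suite.items()}


# Made-up sample data; the second row violates the range check on B.
rows = [{"A": 3, "B": 97}, {"A": 42, "B": 150}]

# A "suite" here is just a named collection of expectations.
suite = {
    "A exists": expect_column_exists("A"),
    "A in [0, 99]": expect_values_between("A", 0, 99),
    "B in [0, 99]": expect_values_between("B", 0, 99),
}

results = run_suite(suite, rows)
print(results)
```

The notebook opened by the scaffold command plays a similar role in the real tool: it proposes a set of expectations for the chosen data, which you then review and save as the suite.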
REFERENCES
https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4
https://cloud.google.com/docs/authentication/getting-started
https://github.com/googleapis/python-bigquery-sqlalchemy#connection-string-parameters