
Great Expectations

위지원 2021. 6. 2. 21:52

1. Overview

Great Expectations is a useful tool to profile, validate, and document data. It helps to maintain the quality of data throughout a data workflow and pipeline.

 

https://greatexpectations.io/


To start, the link below walks through a very simple Great Expectations (ge) installation and shows how to use it to run checks on your data.

https://cloudywithachanceofbigdata.com/great-expectations-for-your-data/

 


Something like this:
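To make the idea of an expectation concrete, here is a minimal plain-Python sketch of what a range check such as expect_column_values_to_be_between conceptually does. It deliberately does not use the Great Expectations library: the function name check_values_between and the shape of the returned dictionary are assumptions for illustration only, loosely modeled on the success flag and unexpected-value counts that the library reports.

```python
import random


def check_values_between(values, min_value, max_value):
    """Hypothetical helper: report whether every value lies in [min_value, max_value]."""
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,
        "element_count": len(values),
        "unexpected_count": len(unexpected),
        "unexpected_list": unexpected,
    }


# Mimic one column of the random data used later in this post (integers in 0..99).
random.seed(0)
column_a = [random.randrange(0, 100) for _ in range(100)]

# All generated values fall inside the expected range.
in_range = check_values_between(column_a, 0, 99)
print(in_range["success"])

# Injecting one out-of-range value makes the check fail and reports that value.
with_outlier = check_values_between(column_a + [250], 0, 99)
print(with_outlier["success"], with_outlier["unexpected_list"])
```

The point of the sketch is that an expectation is a declarative assertion about the data that returns a structured result rather than raising an error, which is what lets a tool collect many results into a report.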

 

2. Uploading pandas data to BigQuery

from google.cloud import bigquery
from google.oauth2 import service_account
import pandas_gbq
import great_expectations as ge
import numpy as np
import pandas as pd


def exam():
    # Build a 100x4 frame of random integers in [0, 100) with columns A-D.
    df_raw = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    # Wrap it as a Great Expectations dataset (a pandas DataFrame subclass).
    df = ge.from_pandas(df_raw)
    # Authenticate with a service account key file.
    credentials = service_account.Credentials.from_service_account_file(
        r'greatexpectations-315602-a47ff0cdec8b.json'
    )
    pandas_gbq.context.credentials = credentials
    client = bigquery.Client(credentials=credentials)
    # Load the frame into BigQuery and wait for the load job to finish.
    job = client.load_table_from_dataframe(df, 'greatexpectations-315602.test.my_table')
    job.result()


if __name__ == '__main__':
    exam()

 

3. Connecting Great Expectations to BigQuery

 export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
 
Running this every time is tedious, so I recommend registering it as a permanent environment variable.
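A missing or mistyped key path usually only surfaces later as an authentication error, so it can help to check the variable before creating a client. The following is a minimal sketch using only the standard library. The helper name resolve_credentials_path is hypothetical, and the sketch only verifies that the file exists, not that it is a valid service account key.

```python
import os
from pathlib import Path

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def resolve_credentials_path(environ=os.environ):
    """Hypothetical helper: return the key file path from the environment, or raise."""
    raw = environ.get(ENV_VAR)
    if not raw:
        raise RuntimeError(f"{ENV_VAR} is not set; export it or add it to your shell profile")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{ENV_VAR} points to a missing file: {path}")
    return path
```

Calling resolve_credentials_path() at the top of a script fails fast with a readable message instead of a later authentication error. The environ parameter exists only so the check can be exercised with a plain dictionary.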

 

4. suite scaffold

suite: Expectation Suites combine multiple expectations into an overall description of a dataset. 

Run the command below and answer a few setup prompts, and a Jupyter notebook opens automatically.

great_expectations suite scaffold mysuite
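To illustrate what "combining multiple expectations" means, here is a plain-Python sketch of a suite as a named list of expectation entries, each with an expectation type and keyword arguments, similar in spirit to the JSON that Great Expectations stores for a suite. The suite contents, the CHECKS table, and run_suite are assumptions for illustration and not the library's implementation. Only the two expectation names are taken from the library.

```python
# A suite as plain data: a name plus a list of expectations (type + kwargs).
suite = {
    "expectation_suite_name": "mysuite",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "A"},
        },
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": "B", "min_value": 0, "max_value": 99},
        },
    ],
}

# Hypothetical per-type checks; each takes a list of column values plus kwargs.
CHECKS = {
    "expect_column_values_to_not_be_null": lambda values: all(
        v is not None for v in values
    ),
    # Null values are skipped here; only non-null values are range-checked.
    "expect_column_values_to_be_between": lambda values, min_value, max_value: all(
        min_value <= v <= max_value for v in values if v is not None
    ),
}


def run_suite(suite, table):
    """Evaluate every expectation in the suite against a dict of column lists."""
    results = []
    for expectation in suite["expectations"]:
        kwargs = dict(expectation["kwargs"])
        column = kwargs.pop("column")
        check = CHECKS[expectation["expectation_type"]]
        results.append({
            "expectation_type": expectation["expectation_type"],
            "column": column,
            "success": check(table[column], **kwargs),
        })
    # The suite as a whole passes only if every expectation passes.
    return {"success": all(r["success"] for r in results), "results": results}


table = {"A": [1, 2, None], "B": [10, 20, 30]}
report = run_suite(suite, table)
print(report["success"])
for r in report["results"]:
    print(r["expectation_type"], r["column"], r["success"])
```

In this sample table, column A contains a null, so the first expectation fails and the suite as a whole fails even though column B is within range.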

 

REFERENCES

https://docs.greatexpectations.io/en/0.11.6/reference/core_concepts/expectations/expectations.html#expectation-suites

https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4

https://cloud.google.com/docs/authentication/getting-started

https://github.com/googleapis/python-bigquery-sqlalchemy#connection-string-parameters