1. Overview
Great Expectations (GE) is a tool for profiling, validating, and documenting data. It helps maintain data quality throughout a data workflow or pipeline.
To start, the link below walks through a very simple GE installation and shows how to use it to run checks on your data.
https://cloudywithachanceofbigdata.com/great-expectations-for-your-data/
2. Uploading pandas data to BigQuery
from google.cloud import bigquery
from google.oauth2 import service_account
import pandas_gbq
import great_expectations as ge
import numpy as np
import pandas as pd


def exam():
    # Build a 100x4 frame of random integers in [0, 100) with columns A-D.
    df_raw = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    # Wrap it as a Great Expectations dataset (still a pandas DataFrame subclass).
    df = ge.from_pandas(df_raw)

    # Load the service-account key and reuse it for pandas_gbq and the BigQuery client.
    credentials = service_account.Credentials.from_service_account_file(
        r'greatexpectations-315602-a47ff0cdec8b.json'
    )
    pandas_gbq.context.credentials = credentials
    client = bigquery.Client(credentials=credentials)

    # Upload the frame to project.dataset.table and wait for the load job to finish.
    job = client.load_table_from_dataframe(df, 'greatexpectations-315602.test.my_table')
    job.result()


if __name__ == '__main__':
    exam()
3. Connecting GE to BigQuery
export GOOGLE_APPLICATION_CREDENTIALS=<path to key file>
Typing this every time gets tedious, so I recommend registering it as a permanent environment variable ^^
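If exporting the variable in every shell is inconvenient, the same setting can be checked or applied from Python before any Google client is created. The following is only a minimal sketch under assumptions: `ensure_credentials` and `bigquery_url` are hypothetical helper names, not part of GE or the Google libraries, and the URL shape `bigquery://project/dataset` follows the connection-string format described in the python-bigquery-sqlalchemy README listed in the references.

```python
import os
from typing import Optional


def ensure_credentials(key_path: Optional[str] = None) -> str:
    """Return the service-account key path, optionally setting the env var first.

    Hypothetical helper: it only touches GOOGLE_APPLICATION_CREDENTIALS for the
    current process (and its children), not for the whole system.
    """
    if key_path is not None:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"key file not found: {path}")
    return path


def bigquery_url(project: str, dataset: str) -> str:
    """Build a SQLAlchemy-style BigQuery URL (assumed form: bigquery://project/dataset)."""
    return f"bigquery://{project}/{dataset}"
```

For the table used above, `bigquery_url('greatexpectations-315602', 'test')` gives the string you would paste when GE asks for a BigQuery connection. Failing early on a missing key file tends to be easier to debug than an authentication error raised later inside the client.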
4. suite scaffold
suite: Expectation Suites combine multiple expectations into an overall description of a dataset.
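To make the idea of "combining multiple expectations" concrete, here is a plain-Python illustration. It deliberately does not use the Great Expectations API: `expect_column_exists`, `expect_values_between`, and `validate` are hypothetical names for this sketch, and a real suite stores much richer metadata than a name and a pass/fail flag.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Expectation = Callable[[List[Row]], bool]


def expect_column_exists(column: str) -> Expectation:
    """Expectation: every row has the given column."""
    def check(rows: List[Row]) -> bool:
        return all(column in row for row in rows)
    return check


def expect_values_between(column: str, low: int, high: int) -> Expectation:
    """Expectation: every value in the column lies in [low, high]."""
    def check(rows: List[Row]) -> bool:
        return all(low <= row[column] <= high for row in rows)
    return check


def validate(rows: List[Row], suite: Dict[str, Expectation]) -> Dict[str, bool]:
    """Run every expectation in the suite and report pass/fail per expectation."""
    return {name: check(rows) for name, check in suite.items()}


# A tiny dataset and a "suite" that describes what we expect of it as a whole.
rows = [{"A": 5, "B": 40}, {"A": 99, "B": 120}]
suite = {
    "A exists": expect_column_exists("A"),
    "A in [0, 99]": expect_values_between("A", 0, 99),
    "B in [0, 99]": expect_values_between("B", 0, 99),
}
results = validate(rows, suite)
```

Here `results` reports the first two expectations as passing and the third as failing, because one value of `B` is 120. The point is that the suite, not any single check, is the description of the dataset.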
Run the command below and answer a few setup prompts, and a Jupyter notebook opens automatically.
great_expectations suite scaffold mysuite
REFERENCES
https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4
https://cloud.google.com/docs/authentication/getting-started
https://github.com/googleapis/python-bigquery-sqlalchemy#connection-string-parameters