1. Overview
Great Expectations is a useful tool to profile, validate, and document data. It helps to maintain the quality of data throughout a data workflow and pipeline.
https://greatexpectations.io/
To get started, the link below walks through a very simple installation of Great Expectations (ge) and shows how to use it to run checks on your data.
https://cloudywithachanceofbigdata.com/great-expectations-for-your-data/
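To make the idea of a "check" concrete, here is a minimal plain-Python sketch of what a single expectation does. It is not the Great Expectations API: `expect_values_between` is a hypothetical helper, and the shape of the returned dictionary is a simplified assumption loosely modeled on the kind of result the library reports.

```python
from typing import Any, Dict, List


def expect_values_between(rows: List[Dict[str, Any]], column: str,
                          min_value: float, max_value: float) -> Dict[str, Any]:
    """Hypothetical helper: check that every value in `column` lies in [min_value, max_value]."""
    # Collect the values that violate the expectation.
    unexpected = [row[column] for row in rows
                  if not (min_value <= row[column] <= max_value)]
    return {
        "success": not unexpected,          # True only when nothing violated the range
        "unexpected_count": len(unexpected),
        "unexpected_list": unexpected,
    }


rows = [{"A": 10}, {"A": 55}, {"A": 120}]
result = expect_values_between(rows, "A", 0, 99)
print(result)
# {'success': False, 'unexpected_count': 1, 'unexpected_list': [120]}
```

The point is that an expectation is a declarative statement about the data ("column A stays between 0 and 99") that returns a structured pass/fail result rather than raising an exception.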
2. Uploading pandas data to BigQuery
```python
from google.cloud import bigquery
from google.oauth2 import service_account
import pandas_gbq
import great_expectations as ge
import numpy as np
import pandas as pd


def exam():
    # Build a 100x4 frame of random integers in [0, 100) with columns A-D.
    df_raw = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    # Wrap it as a Great Expectations dataset.
    df = ge.from_pandas(df_raw)

    # Authenticate with a service account key file.
    credentials = service_account.Credentials.from_service_account_file(
        r'greatexpectations-315602-a47ff0cdec8b.json')
    pandas_gbq.context.credentials = credentials

    client = bigquery.Client(credentials=credentials)
    # Load the frame into <project>.<dataset>.<table> and wait for the job to finish.
    job = client.load_table_from_dataframe(df, 'greatexpectations-315602.test.my_table')
    job.result()


if __name__ == '__main__':
    exam()
```

3. Connecting Great Expectations to BigQuery
export GOOGLE_APPLICATION_CREDENTIALS=<path to key file>
Typing this every time gets tedious, so I recommend registering it as a persistent environment variable.

4. suite scaffold
suite: Expectation Suites combine multiple expectations into an overall description of a dataset.
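As a rough illustration of that definition, the sketch below builds a suite as a named list of expectation configurations and serializes it to JSON. The field names (`expectation_suite_name`, `expectations`, `expectation_type`, `kwargs`) follow my understanding of how the 0.x releases stored suites on disk; treat the exact layout as an assumption and compare it with a suite file generated by your own installation. The column name and value range are taken from the random-integer example data.

```python
import json

# Assumed layout: a suite is a name plus a list of expectation configurations.
suite = {
    "expectation_suite_name": "mysuite",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "A"},
        },
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": "A", "min_value": 0, "max_value": 99},
        },
    ],
}

suite_json = json.dumps(suite, indent=2)
print(suite_json)
```

Each entry describes one check, and the suite as a whole describes what a healthy version of the dataset looks like.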
Run the command below and answer a few setup prompts; a Jupyter notebook then opens automatically.
great_expectations suite scaffold mysuite
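Because the scaffold step needs to reach BigQuery, it can help to confirm that the credentials variable from step 3 is actually set before running it. The following is a minimal pre-flight sketch using only the standard library; `find_credentials_file` is a hypothetical helper name, not part of Great Expectations or the Google client libraries.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_credentials_file(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: return the key file path or raise a clear error."""
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Key file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        print("Using key file:", find_credentials_file())
    except RuntimeError as err:
        print("Credentials problem:", err)
```

Running this in the same shell where you plan to run the scaffold command shows whether the export took effect in that session.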
REFERENCES
https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4
https://cloud.google.com/docs/authentication/getting-started
https://github.com/googleapis/python-bigquery-sqlalchemy#connection-string-parameters