Chapter 3 Client API: The Basics - Get Method

2018. 8. 17. 16:51

by. 위지원

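Notes from following along with O'Reilly's HBase: The Definitive Guide

2018/07/17 - [First half of 2018/DataBase] - A Rough HBASE Installation Story

I finished installing HBASE in the post above, so now I can finally study from the book ^-^

Source: http://weejw.tistory.com/category/2018년 하반기/HBASE [위지원의 블로그]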

2018/08/10 - [Second half of 2018/HBASE] - Chapter 3 Client API: The Basics - Put Method

Continuing from the post above..

CRUD (Create, Read, Update, Delete) operations


2. Get method

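The Get method reads back data that has been stored. Like Put, it comes in two forms: one reads a single row, and the other returns several rows in a single call.

The method that fetches a single row has the following signature.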

Result get(Get get) throws IOException

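A Get instance is constructed from the row key it should read, as in the new Get(row1) calls later in this post.

The class has far too many methods to list, so I picked a few based on the book. First, addColumn or addFamily narrows the request to a specific column or a specific column family.

Similarly, setTimeRange or setTimeStamp restricts the read to a time range or to one specific timestamp.

You can also specify how many versions to return. The default is 1 (only the most recent value is returned). Calling setMaxVersions() without a parameter sets the limit to Integer.MAX_VALUE, so the API returns every version that matches the request.

However,,, the API docs mark setMaxVersions as Deprecated! Reading the API is easy, right? Because the name makes the version handling easy to misunderstand,, the docs say to use readAllVersions instead. readAllVersions takes no arguments and returns the Get itself, so it can be chained: new Get(row).readAllVersions(). To cap the number of versions there is readVersions(int).

The remaining methods can be found in the API docs. As mentioned in the Put post, HBase has a helper class, Bytes, whose toBytes(..) methods convert values to byte arrays. Naturally, the reverse conversions exist too.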

static String toString(byte[] b)
static boolean toBoolean(byte[] b)
static float toFloat(byte[] bytes)
static int toInt(byte[] bytes)
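
To make the round trip concrete without needing an HBase dependency, here is a minimal sketch in plain JDK code. The class name BytesRoundTripSketch is made up for illustration, and the sketch assumes the same conventions the Bytes helper is generally understood to follow (UTF-8 for strings, 4 big-endian bytes for an int); it is a stand-in, not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Plain-JDK stand-in for the kind of round trip HBase's Bytes helper performs.
class BytesRoundTripSketch {

    // String -> byte[] using UTF-8 (assumption: same encoding as Bytes.toBytes(String)).
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String, the reverse of the method above.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // int -> 4 bytes in big-endian order (ByteBuffer's default byte order).
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // 4 big-endian bytes -> int, the reverse of the method above.
    static int toInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = toBytes("val1");
        System.out.println(raw.length + " bytes -> " + toString(raw));
        System.out.println(toInt(toBytes(42)));
    }
}
```

The point to take away is that HBase only ever sees byte arrays, so the reader has to decode a value with the same type it was written with.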

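Now that we've looked at the API, let's use it. There is the table created in the Put post. Let's fetch its data with Get.

Result class

The get() method returns a Result class instance that holds every Cell matching the request. The Result class provides ways to access information about the returned data, such as column family, qualifier, and timestamp.

Of course the Result class has plenty of methods as well. The book introduces the ones below. (For listCells() and rawCells(), the book describes Cell as KeyValue, and the methods were originally list() and raw(); "Cells" was appended to the names. See the API docs for details.)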

        int size() : returns the number of KeyValue instances the server returned.

        byte[] getRow() : returns the row key that was specified when the Get was created.

        byte[] value() : returns the latest Cell data of the first column found (columns are sorted lexicographically).

        byte[] getValue(byte[] family, byte[] qualifier) : returns the data stored in a specific Cell.

        boolean isEmpty() : useful, along with size(), for checking whether anything was actually returned.

        List<Cell> listCells() : returns the contents of the Result as a List<Cell>.

        Cell[] rawCells() : returns the contents of the Result as an array.

        * KeyValue instances are sorted by column family first, then qualifier, then timestamp, then KeyValue type.
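
The sort order in the note above can be sketched with a plain comparator. This is only an illustration under stated assumptions: SimpleCell and CellOrderSketch are hypothetical names, strings stand in for byte arrays, the KeyValue type is left out, and timestamps are ordered newest first, which is how HBase is generally described as ordering versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CellOrderSketch {

    // Hypothetical, simplified cell; real cells hold byte[] fields, not Strings.
    static final class SimpleCell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        SimpleCell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Family first, then qualifier, then timestamp with the newest first.
    static final Comparator<SimpleCell> ORDER = (a, b) -> {
        int c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Sorts a few made-up cells and returns their values in the resulting order.
    static List<String> demoOrder() {
        List<SimpleCell> cells = new ArrayList<>();
        cells.add(new SimpleCell("colfam1", "qual2", 1L, "old-q2"));
        cells.add(new SimpleCell("colfam1", "qual1", 1L, "q1"));
        cells.add(new SimpleCell("colfam1", "qual2", 2L, "new-q2"));
        cells.sort(ORDER);

        List<String> values = new ArrayList<>();
        for (SimpleCell cell : cells) {
            values.add(cell.value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(demoOrder());
    }
}
```

This ordering is why value() can be described as "the latest Cell of the first column": with cells sorted this way, it is simply the first element.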

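The methods below give column-oriented access to the information in a Result. They take more parameters than the ones above.

        boolean containsColumn(byte[] family, byte[] qualifier) : checks whether a cell was actually returned for the given column.

        Cell getColumnLatestCell(byte[] family, byte[] qualifier)

        List<Cell> getColumnCells(byte[] family, byte[] qualifier) : the only difference from the second method is the return type, a list of the column's returned cells instead of just the latest one.

    * The qualifier can also be null. It is rare, but an empty qualifier can be used to mean that the column family contains only a single column.

    Now let's look at the Map-oriented methods.

            NavigableMap<byte[],byte[]> getFamilyMap(byte[] family) : returns the data of one column family only. The book describes it as including all versions, but as the signature shows, the map goes from qualifier straight to a single value.
            NavigableMap<byte[],NavigableMap<byte[],NavigableMap<Long,byte[]>>> getMap() : the most comprehensive method. Everything is returned in nested Map instances, so iterating over the Map gives access to every value, including every returned version.
            NavigableMap<byte[],NavigableMap<byte[],byte[]>> getNoVersionMap() : returns only the latest Cell of each column.

    The data has already been sent to the client anyway,,, so none of these methods costs anything extra in performance or resources, whichever one you use. Pick the method whose shape suits you.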
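
The nesting of these maps is easier to see in code. The following is a hedged, in-memory sketch of the shapes returned by getMap() and getNoVersionMap(): ResultMapSketch and its helper methods are hypothetical names, strings stand in for byte arrays, the timestamps and values are made up, and the innermost map is assumed to be ordered newest first.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ResultMapSketch {

    // Adds one made-up cell to a family -> qualifier -> timestamp -> value structure.
    static void put(NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map,
                    String family, String qualifier, long timestamp, String value) {
        map.computeIfAbsent(family, f -> new TreeMap<>())
           // Assumption: versions are kept newest first, hence the reversed ordering.
           .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.reverseOrder()))
           .put(timestamp, value);
    }

    // Same shape as getMap(): every version of every column.
    static NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> sampleMap() {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map = new TreeMap<>();
        put(map, "colfam1", "qual1", 1L, "val1");
        put(map, "colfam1", "qual2", 1L, "val2");
        put(map, "colfam1", "qual2", 2L, "val3");
        return map;
    }

    // Same shape as getNoVersionMap(): only the newest value of each column.
    static NavigableMap<String, NavigableMap<String, String>> latestOnly(
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> full) {
        NavigableMap<String, NavigableMap<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<String, NavigableMap<Long, String>>> fam : full.entrySet()) {
            NavigableMap<String, String> quals = new TreeMap<>();
            for (Map.Entry<String, NavigableMap<Long, String>> q : fam.getValue().entrySet()) {
                quals.put(q.getKey(), q.getValue().firstEntry().getValue());
            }
            out.put(fam.getKey(), quals);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("all versions: " + sampleMap());
        System.out.println("latest only:  " + latestOnly(sampleMap()));
    }
}
```

With the real API the keys are byte arrays, so each level would be decoded with Bytes.toString() (or the matching typed method) while iterating.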

List of Gets

A single request can fetch more than one row.

Result[] get(List<Get> gets) throws IOException

Writing code once is more useful than reading a hundred API docs over and over..., so let's code


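    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetListExample {
        public static void main(String[] args) throws IOException {
            // HBaseConfiguration.create() replaces the deprecated HBaseConfiguration constructor.
            Configuration conf = HBaseConfiguration.create();

            // try-with-resources closes the table and the connection even if a get fails.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("testtable"))) {

                byte[] cf1 = Bytes.toBytes("colfam1"); // prepare the parameters up front
                byte[] qf1 = Bytes.toBytes("qual1");
                byte[] qf2 = Bytes.toBytes("qual2");
                byte[] row1 = Bytes.toBytes("row1");
                byte[] row2 = Bytes.toBytes("row2");

                List<Get> gets = new ArrayList<>(); // list that holds the Get instances

                Get get1 = new Get(row1); // add Get instances to the list
                get1.addColumn(cf1, qf1);
                gets.add(get1);

                Get get2 = new Get(row2);
                get2.addColumn(cf1, qf1);
                gets.add(get2);

                Get get3 = new Get(row2);
                get3.addColumn(cf1, qf2);
                gets.add(get3);

                Result[] results = table.get(gets); // GET the rows from HBASE!

                // Iterate over the results to see which values came back.
                System.out.println("First iteration...");
                for (Result result : results) {
                    String row = Bytes.toString(result.getRow());
                    System.out.print("Row: " + row + " ");
                    byte[] val = null;
                    if (result.containsColumn(cf1, qf1)) {
                        val = result.getValue(cf1, qf1);
                        System.out.println("Value: " + Bytes.toString(val));
                    }
                    if (result.containsColumn(cf1, qf2)) {
                        val = result.getValue(cf1, qf2);
                        System.out.println("Value: " + Bytes.toString(val));
                    }
                }

                // The first and second loops print the same results; they are separate
                // only to show that the data can be accessed in different ways.
                System.out.println("Second iteration...");
                for (Result result : results) {
                    for (Cell cell : result.listCells()) {
                        System.out.println(
                                "Row: " + Bytes.toString(
                                        cell.getRowArray(), cell.getRowOffset(), cell.getRowLength()) +
                                        " Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
                    }
                }

                System.out.println("Third iteration...");
                for (Result result : results) {
                    System.out.println(result);
                }
            }
        }
    }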

First iteration...
Row: row1 Value: val1
Row: row2 Value: val1
Row: row2 Value: val3
Second iteration...
Row: row1 Value: val1
Row: row2 Value: val1
Row: row2 Value: val3
Third iteration...
keyvalues={row1/colfam1:qual1/1533893759170/Put/vlen=4/seqid=0} // this is what each Result holds!
keyvalues={row2/colfam1:qual1/1534487638824/Put/vlen=4/seqid=0}
keyvalues={row2/colfam1:qual2/1534487638824/Put/vlen=4/seqid=0}

** My table was originally missing some values, so a few results came back as null,, and I added the code below to put the values in. (Using the Put from the last post)

Put put = new Put(Bytes.toBytes("row2"));

put.addColumn(Bytes.toBytes("colfam1"),Bytes.toBytes("qual1"),Bytes.toBytes("val1"));
put.addColumn(Bytes.toBytes("colfam1"),Bytes.toBytes("qual2"),Bytes.toBytes("val2"));
put.addColumn(Bytes.toBytes("colfam1"),Bytes.toBytes("qual2"),Bytes.toBytes("val3"));

table.put(put);

Put put2 = new Put(Bytes.toBytes("row1"));
put2.addColumn(Bytes.toBytes("colfam1"),Bytes.toBytes("qual2"),Bytes.toBytes("val3"));

table.put(put2);

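Back when learning Put, even if one request failed, the data from the other requests was still written properly.

But Get is a cold-blooded method. It either returns everything properly or throws an error, with nothing in between. I added the code below. It just spits out an error right away.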

Get get4 = new Get(row2);
get4.addColumn(Bytes.toBytes("BOGUS"), qf2);
gets.add(get4);

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action:

org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family BOGUS does not exist in region testtable,,1533893757600.3b24bd35395c0f0a78503601b015a474. in table 'testtable', {NAME => 'colfam1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
        at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:7982)
        at org.apache.hadoop.hbase.regionserver.HRegion.prepareGet(HRegion.java:7269)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2512)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:834)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2673)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
: 1 time, servers with issues: netdb.slave2.com,16020,1534483987538
        at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
        at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1225)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:455)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:405)
        at PutExample.main(PutExample.java:78)

Failures like this can be handled with batch(), which we will look at later.

Other Get-related methods

The Result class has the method below. What the book calls exists has become getExists. ( ㅡㅡ why won't anyone tell me which class it lives in )

getExists
```
public Boolean getExists()
```

Write the code as follows.

Table table = connection.getTable(TableName.valueOf("testtable"));

Get get1 = new Get(Bytes.toBytes("row2"));
get1.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
get1.setCheckExistenceOnly(true);
Result result1 = table.get(get1);

System.out.println("Get 1 Exists: "+result1.getExists());

Get 1 Exists: true

The code above uses the method below. It has to be set for getExists to return a value. ( When I removed that line and printed the result, I got null instead of true )

setCheckExistenceOnly

public Get setCheckExistenceOnly(boolean checkExistenceOnly)

HBASE-13954 | Major | Remove HTableInterface#getRowOrBefore related server side code

Removed Table#getRowOrBefore, Region#getClosestRowBefore, Store#getRowKeyAtOrBefore, RemoteHTable#getRowOrBefore apis and Thrift support for getRowOrBefore. Also removed two coprocessor hooks preGetClosestRowBefore and postGetClosestRowBefore. User using this api can instead use reverse scan something like below,

    Scan scan = new Scan(row);
    scan.setSmall(true);
    scan.setCaching(1);
    scan.setReversed(true);
    scan.addFamily(family);

pass this scan object to the scanner and retrieve the first Result from scanner output.

Yes! It's gone!! A Scanner can be used instead. Below is an example.

First, put some data into the table..

List<Put> puts = new ArrayList<Put>();
Put put1 = new Put(Bytes.toBytes("row1"));
put1.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
        Bytes.toBytes("val1"));
puts.add(put1);
Put put2 = new Put(Bytes.toBytes("row2"));
put2.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
        Bytes.toBytes("val2"));
puts.add(put2);
Put put3 = new Put(Bytes.toBytes("row2"));
put3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
        Bytes.toBytes("val3"));
puts.add(put3);
table.put(puts);

hbase(main):007:0> scan 'testtable'
ROW                           COLUMN+CELL
row1                         column=colfam1:qual1, timestamp=1534490202898, value=val1
row1                         column=colfam1:qual2, timestamp=1534489206240, value=val3
row2                         column=colfam1:qual1, timestamp=1534490202898, value=val2
row2                         column=colfam1:qual2, timestamp=1534490202898, value=val3
2 row(s)
Took 0.2184 seconds

Then write the code below and check the result!

Table table = connection.getTable(TableName.valueOf("testtable"));

Get get1 = new Get(Bytes.toBytes("row3"));
get1.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result1 = table.get(get1);

System.out.println("Get 1 isEmpty: " + result1.isEmpty());
CellScanner scanner1 = result1.cellScanner();
while (scanner1.advance()) {
    System.out.println("Get 1 Cell: " + scanner1.current());
}

// row3 is not in the table, so isEmpty is true and scanning the result prints nothing.
/////////////////////////////////////////////////////////////////////////////////////////////////
Get get2 = new Get(Bytes.toBytes("row3"));
get2.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
get2.setClosestRowBefore(true);
Result result2 = table.get(get2);


System.out.println("Get 2 isEmpty: " + result2.isEmpty());
CellScanner scanner2 = result2.cellScanner();
while (scanner2.advance()) {
    System.out.println("Get 2 Cell: " + scanner2.current());
}

// get2 also asks for row3, so isEmpty is true again; setClosestRowBefore(true) did not change that in the version used here.
/////////////////////////////////////////////////////////////////////////////////////////////////
Get get3 = new Get(Bytes.toBytes("row2"));
get3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
get3.setClosestRowBefore(true);
Result result3 = table.get(get3);

System.out.println("Get 3 isEmpty: " + result3.isEmpty());
CellScanner scanner3 = result3.cellScanner();
while (scanner3.advance()) {
    System.out.println("Get 3 Cell: " + scanner3.current());
}
// row2 exists, so isEmpty is false and the scanner prints the data that was requested.

Get 1 isEmpty: true
Get 2 isEmpty: true
Get 3 isEmpty: false
Get 3 Cell: row2/colfam1:qual1/1534490202898/Put/vlen=4/seqid=0
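
For intuition about what "the row at or before a key" means, and why a reverse scan that keeps only its first result can stand in for it, here is a small in-memory analogy. It is not HBase code: ClosestRowSketch and its methods are hypothetical names, and a sorted TreeMap plays the role of the table with the two rows used in this post.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class ClosestRowSketch {

    // A sorted map standing in for the table: row key -> some value.
    static NavigableMap<String, String> sampleRows() {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("row1", "val1");
        rows.put("row2", "val2");
        return rows;
    }

    // Walks backwards from `key` (inclusive) and returns the first row found,
    // which mirrors "reverse scan from the key and take the first result".
    static String rowAtOrBefore(NavigableMap<String, String> rows, String key) {
        for (String row : rows.headMap(key, true).descendingKeySet()) {
            return row;
        }
        return null; // nothing exists at or before `key`
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = sampleRows();
        System.out.println(rowAtOrBefore(rows, "row3"));
        System.out.println(rowAtOrBefore(rows, "row2"));
        System.out.println(rowAtOrBefore(rows, "row0"));
    }
}
```

In this analogy, asking for row3 lands on row2, which is the behavior the removed getRowOrBefore family provided and which a plain Get for row3 does not.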
