Know Before You Use It! HBase | Devon 2012
TRANSCRIPT
![Page 1: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/1.jpg)
Know Before You Use It! HBase
Data Technology Team, 유응섭
![Page 2: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/2.jpg)
Features of HBase
Compensating for missing features
Caveats
![Page 3: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/3.jpg)
HBase?
![Page 4: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/4.jpg)
HBase?
Hadoop Database
for Real-time Services
![Page 5: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/5.jpg)
What HBase does well
![Page 6: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/6.jpg)
Scalable
Read and write performance grows in proportion to the number of nodes
Hundreds of nodes
Petabyte-scale data
![Page 7: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/7.jpg)
Strong Consistency
Not Eventual Consistency
Clients always access the latest data
Well suited to messaging services and similar workloads
![Page 8: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/8.jpg)
Ordered Key-value
Row Key, Column Key
Range scans are supported
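To make the ordered key-value idea concrete, here is a minimal sketch of why sorted row keys make range scans cheap. It is a toy in-memory model, not the HBase client API: the class `OrderedStore`, its methods, and the `user#msg` key layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of an ordered key-value store (hypothetical, not the HBase API).
class OrderedStore {
    // Rows are kept sorted by row key, so neighbouring keys sit next to each other.
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Range scan: returns row keys in [startRow, stopRow), in sorted order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }
}
```

With keys such as `user1#msg1`, a scan from `user1#` to `user2#` returns only that user's rows without touching the rest of the table, which is the property the slide refers to.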
![Page 9: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/9.jpg)
What HBase lacks
![Page 10: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/10.jpg)
What HBase lacks
Complex operations
Secondary Index
…
![Page 11: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/11.jpg)
How can we compensate
for the missing features?
![Page 12: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/12.jpg)
Coprocessor
![Page 13: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/13.jpg)
Coprocessor
Endpoint = RDBMS Procedure
Observer = RDBMS Trigger
Runs inside the server process
![Page 14: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/14.jpg)
Endpoint
Parallel Execution
Works in a way similar to MapReduce
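The MapReduce-like pattern can be sketched as follows: each region computes a partial result next to its own data, and the caller merges the partials. This is a plain-Java illustration under assumed names (`ScatterGatherCount`, `countRegion`, `countAll`); it does not use the HBase Endpoint API, and each "region" is simply a list of group keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical scatter-gather sketch; not the HBase coprocessor API.
class ScatterGatherCount {
    // "Server side": count rows per group key within a single region.
    static Map<String, Long> countRegion(List<String> regionRows) {
        Map<String, Long> partial = new HashMap<>();
        for (String groupKey : regionRows) {
            partial.merge(groupKey, 1L, Long::sum);
        }
        return partial;
    }

    // "Client side": compute per-region partials (possibly in parallel),
    // then merge them sequentially into one result.
    static Map<String, Long> countAll(List<List<String>> regions) {
        List<Map<String, Long>> partials = regions.parallelStream()
                .map(ScatterGatherCount::countRegion)
                .collect(Collectors.toList());
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }
}
```

Only the small per-region maps travel back to the caller, rather than every row, which is the main reason this style of execution can be much faster than a client-side scan.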
![Page 15: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/15.jpg)
Observer
Reacts to events that occur on the server
pre/post hooks are available
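The pre/post hook idea can be illustrated with a small trigger-style sketch. The interface `PutObserver` and the class `ObservedTable` are hypothetical names invented for this example; the method names only loosely echo HBase's observer hooks and the signatures are not those of the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical hook interface; not the HBase RegionObserver API.
interface PutObserver {
    // Called before the write; returning false vetoes the put.
    default boolean prePut(String row, String value) { return true; }

    // Called after the write has been applied.
    default void postPut(String row, String value) { }
}

// Toy table that invokes registered observers around every put.
class ObservedTable {
    private final TreeMap<String, String> rows = new TreeMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) {
        observers.add(observer);
    }

    boolean put(String row, String value) {
        for (PutObserver o : observers) {
            if (!o.prePut(row, value)) {
                return false; // a pre hook rejected the write
            }
        }
        rows.put(row, value);
        for (PutObserver o : observers) {
            o.postPut(row, value);
        }
        return true;
    }

    String get(String row) {
        return rows.get(row);
    }
}
```

Because the hooks run on the server side of the write path, callers of `put` need no changes, much like a trigger in an RDBMS.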
![Page 16: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/16.jpg)
Compensating for missing features
Endpoint: Group-by Operator
Observer: Secondary Index
![Page 17: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/17.jpg)
Group-by Operator
select c1, c2, count(v1) from tab group by c1, c2;
Configuration conf = HBaseConfiguration.create();
GroupByClient client = new GroupByClient(conf);
long queryID = client.aggregate("tab", "d:c1,d:c2", "count(d:v1)");
GroupByClient.printResult(queryID);
![Page 18: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/18.jpg)
Group-by Operator
![Page 19: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/19.jpg)
Group-by Operator
Performance (9 Region Servers; 24 GB RAM, 4 cores with HT, 4 HDDs)
Data: 100 million rows, 8.6 GB (Snappy, compression ratio about 28%)
[Bar chart] Duration (unit: seconds): GroupBy 38, Count (shell) 640
![Page 20: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/20.jpg)
Secondary Index
![Page 21: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/21.jpg)
Secondary Index
Dual Write
![Page 22: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/22.jpg)
Secondary Index
Dual Write
Implementation using an Observer
Clients need no code changes
Put on hold because problems occurred under heavy load
![Page 23: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/23.jpg)
Secondary Index
Dual Write
HIndexedTable extends HTable
Override put/scan method
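A minimal sketch of the client-side dual-write approach follows: a subclass overrides `put` so that every write to the data table is mirrored into an index keyed by value. `ToyTable` and `ToyIndexedTable` are hypothetical in-memory stand-ins for `HTable` and `HIndexedTable`; the index key layout (value, a separator, then row key) is an assumption, and values are assumed not to contain the separator character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a plain table.
class ToyTable {
    protected final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    String get(String rowKey) {
        return rows.get(rowKey);
    }
}

// Hypothetical indexed table: each put writes the data row and an index entry.
class ToyIndexedTable extends ToyTable {
    private static final char SEP = '\u0000';
    // Index "table": key = value + SEP + rowKey, sorted so equal values are adjacent.
    private final TreeMap<String, String> index = new TreeMap<>();

    @Override
    void put(String rowKey, String value) {
        String old = rows.get(rowKey);
        if (old != null) {
            index.remove(old + SEP + rowKey); // drop the stale index entry
        }
        super.put(rowKey, value);                 // write 1: data table
        index.put(value + SEP + rowKey, rowKey);  // write 2: index table
    }

    // Look up row keys by value with a range scan over the index.
    List<String> findRowsByValue(String value) {
        String start = value + SEP;
        String stop = value + (char) (SEP + 1);
        return new ArrayList<>(index.subMap(start, true, stop, false).values());
    }
}
```

In a real cluster the two writes are separate operations and are not atomic, so an implementation along these lines also has to decide what happens when one of them fails.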
![Page 24: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/24.jpg)
Secondary Index
IndexManager
![Page 25: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/25.jpg)
Secondary Index
INDEX_META table
![Page 26: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/26.jpg)
Secondary Index
Scan Performance (9 Region Servers; 24 GB RAM, 4 cores with HT, 4 HDDs)
Data: 100 million rows, 8.6 GB (Snappy, compression ratio about 28%)
[Bar chart] Duration (unit: seconds): Filtered Scan (w/o index) 308, 1% 172, 0.10% 7, 0.01% 1
![Page 27: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/27.jpg)
What should you watch out for
when using HBase?
![Page 28: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/28.jpg)
Durability
![Page 29: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/29.jpg)
Durability
Data that has been successfully stored in HBase
must never be lost
![Page 30: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/30.jpg)
Durability
HBase stores its data files in HDFS
HDFS only flushes data to the kernel buffer
The kernel buffer is lost on power failure
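The gap between "written" and "durable" can be shown with a local-file analogue. This sketch uses only the standard Java NIO API and is not HDFS code; the class `DurableWrite` and its `sync` parameter are assumptions for illustration. Without the `force` call, the bytes may still sit in the operating system's buffer when `append` returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-file illustration of buffered vs. synced writes (not the HDFS API).
class DurableWrite {
    static void append(Path file, String record, boolean sync) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer); // hands the bytes to the operating system
            }
            if (sync) {
                // Ask the OS to push the data and file metadata to the storage device.
                channel.force(true);
            }
        }
    }
}
```

Calling `append(path, record, false)` returns as soon as the operating system has accepted the bytes, which mirrors the behaviour the slide warns about; passing `true` trades latency for a stronger guarantee.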
![Page 31: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/31.jpg)
Durability
If many nodes fail at once,
the data may be unrecoverable
UPS, queue & snapshot
![Page 32: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/32.jpg)
Durability
Expected to be resolved in the next HDFS version
![Page 33: 알고 쓰자! HBase | Devon 2012](https://reader034.vdocuments.mx/reader034/viewer/2022050922/558d24dad8b42a2c478b464c/html5/thumbnails/33.jpg)
Thank you.