ååã®èšäºã§ã¯ãããŒãã«ã®äœæãããŒã¿ã®ããŒããåçŽãªSELECTã¯ãšãªã®å®è¡ãªã©ãåºæ¬çãªãã€ãæ©èœã«ã€ããŠæ€èšããŸããã 次ã«ãHiveãæ倧éã«æŽ»çšããé«åºŠãªæ©èœã«ã€ããŠèª¬æããŸãã

ãŠãŒã¶ãŒå®çŸ©é¢æ°
Hiveã䜿çšããéã®äž»ãªé害ã®1ã€ã¯ãæšæºSQLãã¬ãŒã ã¯ãŒã¯ã®å¶çŽã§ãã ãã®åé¡ã¯ãèšèªæ¡åŒµæ©èœãããããããŠãŒã¶ãŒå®çŸ©é¢æ°ãã䜿çšããããšã§è§£æ±ºã§ããŸãã ããªã䟿å©ãªæ©èœãHiveèšèªã«çµã¿èŸŒãŸããŠããŸãã ç§ã®æèŠã§æãèå³æ·±ããã®ã®ããã€ãã以äžã«ç€ºããŸãïŒ å ¬åŒææžããåŸãæ å ±ïŒã
ãžã§ã³ãœã³
倧ããªããŒã¿ãæ±ããšãã®ããªãäžè¬çãªã¿ã¹ã¯ã¯ãjson圢åŒã§ä¿åãããéæ§é åããŒã¿ã®åŠçã§ãã jsonãã€ãã䜿çšããã«ã¯ãjsonããã¥ã¡ã³ãããå€ãæœåºã§ããç¹å¥ãªget_json_objectã¡ãœããããµããŒãããŸãã JSONPathè¡šèšã®éå®ããŒãžã§ã³ã¯ããªããžã§ã¯ãããå€ãæœåºããããã«äœ¿çšãããŸãã 次ã®æäœããµããŒããããŠããŸãã
- $ïŒã«ãŒããªããžã§ã¯ããè¿ããŸã
- ãïŒåãªããžã§ã¯ããè¿ããŸã
- []ïŒé åã®ã€ã³ããã¯ã¹ã«ãã¯ã¢ãã
- *ïŒã®ã¯ã€ã«ãã«ãŒã
å ¬åŒããã¥ã¡ã³ãããJsonãæäœããäŸïŒ
1ã€ã®åïŒjsonïŒãš1ã€ã®è¡ã§æ§æãããããŒãã«src_jsonããããšããŸãã
{"store": {"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }
ããŒãã«ã¯ãšãªã®äŸïŒ
hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json; amy hive> SELECT get_json_object(src_json.json, '$.store.fruit\[0]') FROM src_json; {"weight":8,"type":"apple"} hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json; NULL
Xpath
åæ§ã«ããã€ãã䜿çšããŠåŠçããå¿ èŠãããããŒã¿ãjsonã§ã¯ãªãXMLã«æ ŒçŽãããŠããå Žåã xpathé¢æ°ã䜿çšããŠåŠçã§ããŸããããã«ããã é©åãªèšèªã䜿çšããŠXMLã解æã§ããŸãã xpathã䜿çšããŠxmlããŒã¿ã解æããäŸïŒ
hive> select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from sample_table limit 1 ; ["b1","b2"]
ãã®ä»ã®äŸ¿å©ãªçµã¿èŸŒã¿é¢æ°ïŒ
çµã¿èŸŒã¿ã©ã€ãã©ãªã«ã¯ãããªãè±å¯ãªçµã¿èŸŒã¿é¢æ°ã®ã»ãããå«ãŸããŠããŸãã ããã€ãã®ã°ã«ãŒããåºå¥ã§ããŸãã
- æ°åŠé¢æ°ïŒsinãcosãlogã...ïŒ
- æå»ãæäœããããã®é¢æ°ïŒfrom_unix_timestampãto_dateãçŸåšã®æ¥ä»ãæéïŒæååæ¥ä»ïŒãtimediffãªã©ïŒ-æ¥ä»ãšæå»ãå€æããããã®éåžžã«è±å¯ãªé¢æ°ã®éžæ
- æååãæäœããããã®é¢æ°ã lenghãreverseãregexpãªã©ã®äžè¬çã«é©çšå¯èœãªé¢æ°ãšããã§ã«èæ ®ãããŠããparse_urlãget_json_objectãªã©ã®ç¹å®ã®é¢æ°ã®äž¡æ¹ããµããŒããããŠããŸãïŒ
- å€ãã®ç°ãªãã·ã¹ãã é¢æ°-current_userãcurrent_databaseã...
- æå·åé¢æ°-shaãmd5ãaes_encryptãaes_decrypt ...
ãã€ãã«çµã¿èŸŒãŸããé¢æ°ã®å®å šãªãªã¹ãã¯ã ããã«ãããŸã ã
ç¬èªã®UDFã®äœæ
åé¡ã解決ããããã«ããã€ãã«çµã¿èŸŒãŸããæ©èœãåžžã«ååãšã¯éããŸããã çµã¿èŸŒã¿é¢æ°ããªãå Žåã¯ãç¬èªã®UDFãäœæã§ããŸãã ããã¯ãJavaèšèªã§è¡ãããŸãã
æååãå°æåã«å€æããç°¡åãªé¢æ°ã®äŸã䜿çšããŠãç¬èªã®UDFã®äœæã調ã¹ãŠã¿ãŸãããã
1. com / example / hive / udfããã±ãŒãžãäœæãããã®äžã«Lower.javaã¯ã©ã¹ãäœæããŸãã
mkdir -p com/example/hive/udf edit com/example/hive/udf/Lower.java
2. Lowerã¯ã©ã¹èªäœãå®è£ ããŸãã
package com.example.hive.udf; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; public final class Lower extends UDF { public Text evaluate(final Text s) { if (s == null) { return null; } return new Text(s.toString().toLowerCase()); } }
3.å¿ èŠãªã©ã€ãã©ãªãCLASSPATHã«è¿œå ããŸãïŒhadoopãã£ã¹ããªãã¥ãŒã·ã§ã³ã§ã¯ãjarãã¡ã€ã«ãžã®ãªã³ã¯ãè¥å¹²ç°ãªãå ŽåããããŸãïŒã
export CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar:/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar
4. UDF-kuãã³ã³ãã€ã«ããjarã¢ãŒã«ã€ããåéããŸãã
javac com/example/hive/udf/Lower.java jar cvf my_udf.jar *
5.é¢æ°ããã€ãã§äœ¿çšããã«ã¯ãæ瀺çã«å®£èšããå¿ èŠããããŸãã
hive> ADD JAR my_udf.jar; hive> create temporary function my_lower as 'com.example.hive.udf.Lower'; hive> select my_lower('HELLO') from sample_table limit 1; hello
ã¹ã¯ãªããã䜿çšããŠããŒãã«ãå€æãã
æšæºã®HIVEæ©èœãæ¡åŒµãããã1ã€ã®æ¹æ³ã¯ãTRANSFORMã¡ãœããã䜿çšããããšã§ããããã«ãããä»»æã®ããã°ã©ãã³ã°èšèªã®ã«ã¹ã¿ã ã¹ã¯ãªããã䜿çšããŠããŒã¿ãå€æã§ããŸãïŒããã¯ãJavaãæ°ã«å ¥ãããudfsãæžããããªãå Žåã«ç¹ã«é©ããŠããŸãïŒã
ã³ãã³ãã䜿çšããããã®æ§æã¯æ¬¡ã®ãšããã§ãã
SELECT TRANSFORM(<columns>) USING <script> as <new_columns>
<script>-ãã®å Žåãstdinã§ããŒã¿ãåä¿¡ããããããå€æããå€æãããããŒã¿ãstdoutã«æäŸããããã°ã©ã ã§ãã å®éãããã¯ããã°ããŒã¿ã®èšäºAããZã§èª¬æãã map-reduceã¿ã¹ã¯ãèµ·åããããã®ã¹ããªãŒãã³ã°ã€ã³ã¿ãŒãã§ã€ã¹ã«éåžžã«äŒŒãŠããŸãã ããŒã2ïŒHadoop
äŸïŒ
ç°ãªãé貚ã§çµŠäžãåãåããŠãŒã¶ãŒã®çµŠäžãå«ãããŒãã«ããããšããŸãïŒ
+-------------------+---------------------+-----------------------+ | user_salary.name | user_salary.salary | user_salary.currency | +-------------------+---------------------+-----------------------+ | alexander | 100000 | RUB | | evgeniy | 4000 | EUR | | alla | 50000 | RUB | | elena | 1500 | EUR | +-------------------+---------------------+-----------------------+
ç§ãã¡ã¯ããã¹ãŠã®ãŠãŒã¶ãŒã«ã«ãŒãã«çµŠäžãæ¯æããããããªãã¬ãŒããæã«å ¥ããããšæã£ãŠããŸãã ãããè¡ãã«ã¯ãããŒã¿å€æãå®è¡ããPythonã¹ã¯ãªãããäœæããŸãã
import sys EXCHANGE_RATE = 75 for line in sys.stdin: name, salary, currency = line.rstrip("\n").split('\t') if currency == 'EUR': print name + "\t" + str(int(salary) * EXCHANGE_RATE) else: print name + "\t" + salary
ã¹ã¯ãªããã¯ãå ¥åããŒã¿ãtsv圢åŒã§ããããšãæ瀺ããŠããŸãïŒåã¯ã¿ãã§åºåãããŠããŸãïŒã ããŒãã«ã§NULLå€ãæ€åºããããšãå€ã\ Nããã¹ã¯ãªããå ¥åã«ãªããŸã
次ã«ããã®ã¹ã¯ãªããã䜿çšããŠããŒãã«ãå€æããŸãã
0: jdbc:hive2://localhost:10000/default> select transform(name, salary, currency) using 'python transform_to_rub.py' as (name, rub_salary) from user_salary; +------------+-------------+ | name | rub_salary | +------------+-------------+ | alexander | 100000 | | evgeniy | 300000 | | alla | 50000 | | elena | 112500 | +------------+-------------+
å®éãTRANSFORMæäœã䜿çšãããšãåŸæ¥ã®MapReduceããã€ãã«å®å šã«çœ®ãæããããšãã§ããŸãã
Mapjoin
MapReduceãæäœããããã®ãã¯ããã¯ãšæŠç¥ã«ã€ããŠã®èšäºã§æžããããã«ã2ã€ã®ããŒãã«JOINãå®è£ ããã«ã¯ãäžè¬ã«ãããã€ãã®MapReduceã¿ã¹ã¯ãå¿ èŠã§ãã ãã€ãã¯MapReduceã§ã®ã¿æ©èœãããããJOINã¯ãã€ãã«ãšã£ãŠãé«äŸ¡ãªæäœã§ãã
ãã ããçµåããå¿ èŠããã2ã€ã®ããŒãã«ã®ãããããåããŒãã®RAMã«å®å šã«åãŸãå Žåããã¬ãŒããã¡ã¢ãªã«ããŒãããããšã§1ã€ã®MapReduceã§å¯Ÿå¿ã§ããŸãã ãã®ãã¿ãŒã³ã¯MapJoinãšåŒã°ããŸãã Hiveã§MapJoinãå ·äœçã«äœ¿çšããã«ã¯ããã³ãïŒHiveã®çšèªã§ã¯ããã³ããïŒãæå®ããå¿ èŠããããŸãã
äŸïŒ
SELECT /*+ MAPJOIN(time_dim) */ COUNT(*) from store_sales JOIN time_dim on (ss_sold_time_sk = t_time_sk)
ãã®äŸã§ã¯ãstore_salesããŒãã«ã倧ãããtime_dimããŒãã«ãå°ãããã¡ã¢ãªã«åãŸããšæ³å®ããŠããŸãã / * + MAPJOINïŒtime_dimïŒ* /-ããã¯ãMAPJOINã¿ã¹ã¯ã®éå§ã«é¢ããHIVEã®ãã³ãã§ãã
ãã©ã³ã¶ã¯ã·ã§ã³ã¢ãã«
ãã©ã³ã¶ã¯ã·ã§ã³ACIDã¢ãã«ã¯ã4ã€ã®äž»èŠãªããããã£ã®ãµããŒããæå³ããŸãã
- ã¢ãããã¯æ§ -æäœãå®å
šã«å®è¡ãããŠããŒã¿ãå®å
šã«å€æŽããããããŸãã¯èœã¡ãŠçè·¡ãæ®ããŸããã
- äžè²«æ§ -ã¢ããªã±ãŒã·ã§ã³ãæäœãå®è¡ããåŸããã®çµæã¯åŸç¶ã®ãã¹ãŠã®æäœã§äœ¿çšå¯èœã«ãªããŸãã
- åé¢ -äžéšã®ãŠãŒã¶ãŒã®æäœã«ã¯ãä»ã®ãŠãŒã¶ãŒã«å¯äœçšã¯ãããŸããã
- èä¹
æ§ -æåãããªãã·ã§ã³ã®çµæãšããŠè¡ãããå€æŽã¯ãã·ã¹ãã é害ãçºçããå Žåã§ãçµæãä¿æããŸãã
äžè¬çã«ãHiveã¯ããŒã¿ã®å€æŽãæ±ãã«ã¯ããŸãé©ããŠããŸããããããŒã¿ã®å€æŽã®ãµããŒããå¿ èŠãªå Žåãããã€ããããŸãã ãŸã第äžã«ããããã¯æ¬¡ã®ãšããã§ãã
- ã¹ããªãŒãã³ã°ã¢ãŒãã§è¿œå ãããããŒã¿ïŒ flume ã kafkaãªã©ã®ã·ã¹ãã ããïŒã ããŒã¿ãå°çãããããã«ãã€ãã§åæã§ããããã«ããã
- ã¹ããŒãã®æŽæ°-ããšãã°ããã€ãããŒãã«ã«æ°ããåãè¿œå ããŸãã åãåã¬ã³ãŒãã«æ£åžžã«è¿œå ããããããããã®åã«è¿œå ããŠããããã«ãè¿œå ããªãã§ãã ããã
- åã
ã®ã¬ã³ãŒããæŽæ°ããå¿
èŠãããå ŽåããããŸãã
ãããã®ç®çã®ããã«ãããŒãžã§ã³0.14ããããã€ãã¯ãã©ã³ã¶ã¯ã·ã§ã³ã¢ãã«ã®ãµããŒããå®è£ ããINSERTãUPDATEãDELETEã®3ã€ã®æäœã§å®è£ ãããŸããã
ãããã®æäœã®ãµããŒãã¯éââåžžã«å¶éãããŠããŸãã
- çŸåšãµããŒããããŠããã®ã¯ORCãã¡ã€ã«ã®ã¿ã§ãã
- ããã©ã«ãã§ã¯ããã©ã³ã¶ã¯ã·ã§ã³ã®ãµããŒãã¯ç¡å¹ã«ãªã£ãŠããŸãã æå¹ã«ããã«ã¯ããã€ãæ§æãã¡ã€ã«ã«é©åãªå€æŽãå ããå¿
èŠããããŸãã
- ãªã¬ãŒã·ã§ãã«ããŒã¿ããŒã¹ã§ããªãã¿ã®BEGINãCOMMITãROLLBACKã³ãã³ãã¯ãµããŒããããŠããŸããã
ãã©ã³ã¶ã¯ã·ã§ã³ãµããŒãã¯ããã«ã¿ãã¡ã€ã«ã䜿çšããŠå®è£ ãããŸãã ã€ãŸããããŒã¿æŽæ°æäœãå®è¡ããããšããœãŒã¹ãã¡ã€ã«å ã®ããŒã¿ã¯æŽæ°ãããŸããããå€æŽãããè¡ã瀺ãæ°ãããã¡ã€ã«ãäœæãããŸãã åŸã§ããã€ãã¯å§çž®æäœã䜿çšããŠããããããŒãžããŸãïŒhbaseã§åæ§ã®æäœã䜿çšãããŸãïŒã
äžè¬ã«ããã©ã³ã¶ã¯ã·ã§ã³ãµããŒãã¯éââåžžã«éãããŠãããããHiveã§ãã®æ©èœã䜿çšããåã«éåžžã«çå£ã«æ€èšãã䟡å€ããããŸãã HBaseãŸãã¯åŸæ¥ã®ãªã¬ãŒã·ã§ãã«ããŒã¿ããŒã¹ã®æ¹åã調ã¹ã䟡å€ããããããããŸããã
ãããã«
ãã®ã·ãªãŒãºããã³ååã®èšäºã§ã¯ãMapReduceã¿ã¹ã¯ã®æäœã容æã«ãã匷åãªããŒã«ã§ããHiveã®äž»ãªæ©èœã«ã€ããŠæ€èšããŸããã Hiveã¯ãSQLã®æäœã«æ £ããŠããã¢ããªã¹ãã«æé©ã§ãããJDBCãã©ã€ããŒã®ãµããŒãã䜿çšããŠæ¢åã®ã€ã³ãã©ã¹ãã©ã¯ãã£ã«ç°¡åã«çµ±åã§ãããŠãŒã¶ãŒå®çŸ©é¢æ°ãšã«ã¹ã¿ã å€æã®ãµããŒããèæ ®ããŠãåŸæ¥ã®MapReduceããèªåã«ããŒã¿åŠçãå®å šã«è»¢éã§ããŸãã ãã ãããã€ãã¯ãéã®äžžè¬ãã§ã¯ãããŸãããé »ç¹ã«æŽæ°ãããããŒã¿ã«ã€ããŠã¯ãHbaseãåŸæ¥ã®ãªã¬ãŒã·ã§ãã«ããŒã¿ããŒã¹ãªã©ã®ããŒã«ã®æ¹åãèŠãããšãã§ããŸãã
ã·ãªãŒãºã®æ¬¡ã®èšäºã§ã¯ãããã°ããŒã¿ãæäœããããã®ããŒã«ãšãã®åŠçæ¹æ³ã«ã€ããŠåŒãç¶ãæ€èšããŸãã
ããŒã¿åæã«é¢ããèè ã®YouTubeãã£ã³ãã«