
beeswax


Monadic wrapper for the Hive API.

Scaladoc

Usage

See https://commbank.github.io/beeswax/index.html

Creating a table

Table schemas are derived from a Thrift struct. The underlying storage format can be either text or Parquet. Here is an example:

Hive.createParquetTable[Pair]("database", "table", List(("year", "int"), ("name", "string")))
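
For context, Pair is assumed to be a Thrift-generated struct whose fields give the table's regular columns; this sketch also assumes the list of (name, type) pairs declares partition columns. The text-backed variant looks the same:

// A minimal sketch: same schema as above, stored as delimited text instead of Parquet.
// The list of (name, type) pairs is assumed to declare partition columns, kept separate from Pair's own fields.
Hive.createTextTable[Pair]("database", "table_text", List(("year", "int"), ("name", "string")))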

Querying

Create actions that perform Hive queries.

Hive.query("SELECT COUNT(*) FROM datable.table")

Multiple operations

The following creates a table and inserts data into it from another table.

for {
  _ <- Hive.createTextTable[Pair]("test", "pairs", List.empty)
  _ <- Hive.query("INSERT INTO TABLE test.pairs SELECT * FROM test2.pairs")
} yield ()
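
A sketch extending the example above: the sequence can also read a result back, so create, copy and count stay a single unrun action. This assumes the text-table call shown earlier:

// Create the target table, copy rows into it, then read back the row count.
// Nothing talks to Hive until the action is run (see the next section).
val copyAndCount: Hive[Long] = for {
  _    <- Hive.createTextTable[Pair]("test", "pairs", List.empty)
  _    <- Hive.query("INSERT INTO TABLE test.pairs SELECT * FROM test2.pairs")
  rows <- Hive.query("SELECT COUNT(*) FROM test.pairs")
} yield rows.head.toLong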

Running the Hive monad

import org.apache.hadoop.hive.conf.HiveConf

val hc: HiveConf                 = new HiveConf
val q: Hive[List[String]]        = Hive.query("SELECT COUNT(*) FROM database.table")
val result: Result[List[String]] = q.run(hc)
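
The same applies to composed actions: the copyAndCount value sketched above is only a description, and nothing touches Hive until run is given a configuration. The Result then carries either the yielded value or the error that interrupted the sequence:

// Running the composed action from the previous section with the same HiveConf.
val countResult: Result[Long] = copyAndCount.run(hc)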

Known Issues

  • The Hive metastore must be specified as a Thrift endpoint rather than a direct database connection (these properties can also be set programmatically on the HiveConf; see the sketch after this list).

      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore:9083</value>
      </property>
    
  • In order to run queries, hive-site.xml needs to include the yarn.resourcemanager.address property, even if the value is bogus.

      <property>
        <name>yarn.resourcemanager.address</name>
        <value>bogus</value>
      </property>
    
  • In order to run queries with dynamic partitioning, the partition mode needs to be set to nonstrict.

      <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
      </property>
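
A minimal sketch of the programmatic alternative: since run takes a HiveConf, the same properties can be set directly on it instead of (or in addition to) hive-site.xml. This assumes beeswax picks the values up from the HiveConf it is given:

import org.apache.hadoop.hive.conf.HiveConf

// The same three properties as above, set on the HiveConf passed to run.
val hc = new HiveConf
hc.set("hive.metastore.uris", "thrift://metastore:9083")
hc.set("yarn.resourcemanager.address", "bogus")
hc.set("hive.exec.dynamic.partition.mode", "nonstrict")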