Making your HBase Client Work in a Kerberized Environment

I am currently working in a cluster with security enabled, in which authentication is based on the presence of a Kerberos token (see more details here).

I have recently been working on adapting the HBase connector for Scalding, SpyGlass, to this kind of environment. While the code itself is quite straightforward, the errors you might encounter trying to make it work are quite nasty, so I think you might find it useful to read this dump of my experience of making it work on a CDH 5.3 cluster with Kerberos security.

The code

Below is a very simple HBase client that writes a new entry into an HBase table, then scans the full content of the table and prints it to the console.


package io.scalding.examples.hbase

import scala.collection.JavaConversions._

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.util.Bytes._

object HBaseSample extends App {

  val username = System.getenv("USER")
  val namespace = username.replace(".", "_")

  val hbaseConf = HBaseConfiguration.create()

  val hbaseQuorum = hbaseConf.get("hbase.zookeeper.quorum")
  val hbaseQuorumPort = hbaseConf.get("hbase.zookeeper.property.clientPort")

  println(s"Connecting to HBase with quorum '$hbaseQuorum' and port $hbaseQuorumPort")
  val connection = HConnectionManager.createConnection(hbaseConf)

  val tableName = s"$namespace:leaders"

  println(s"Connected, creating table $tableName")
  val table = connection.getTable(tableName)

  println(s"Done")

  try {

    writeIntoTable(table, s"ROW-ID-${System.currentTimeMillis}")
    scanTable(table)

  } finally {
    table.close()
    connection.close()
  }

  def scanTable(table: HTableInterface): Unit = {

    println(s"Scanning table")
    // A Scan with no start/stop row reads the whole table
    val scan = new Scan()
    val tableScanner = table.getScanner(scan)
    tableScanner.foreach { result =>
      result.raw.foreach { kv =>
        println(s"Row key       : ${Bytes.toString(kv.getRow)}")
        println(s"Column family : ${Bytes.toString(kv.getFamily)}")
        println(s"Qualifier     : ${Bytes.toString(kv.getQualifier)}")
        println(s"Timestamp     : ${kv.getTimestamp}")
        println(s"Value         : ${Bytes.toString(kv.getValue)}")
      }
    }
    println(s"Done")
  }


  def writeIntoTable(table: HTableInterface, rowId: String): Unit = {

    println(s"Writing into table")
    val put = new Put(toBytes(rowId))

    // Add cells to the 'name' and 'address' column families
    put.add(toBytes("name"), toBytes("first-name"), toBytes("your first name"))
    put.add(toBytes("name"), toBytes("surname"), toBytes("your surname"))

    put.add(toBytes("address"), toBytes("postcode"), toBytes("NE1 C22"))

    table.put(put)

    println(s"Done")
  }
}
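
Note that the code assumes the namespace:leaders table already exists with the name and address column families. If it doesn't, a minimal sketch for creating it with HBaseAdmin (CreateLeadersTable is a hypothetical helper; the names mirror the sample above) could look like this:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, NamespaceDescriptor, TableName}
import org.apache.hadoop.hbase.client.HBaseAdmin

object CreateLeadersTable extends App {
  val conf = HBaseConfiguration.create()
  val admin = new HBaseAdmin(conf)
  try {
    // Create the per-user namespace if it is missing
    val namespace = System.getenv("USER").replace(".", "_")
    if (!admin.listNamespaceDescriptors.exists(_.getName == namespace))
      admin.createNamespace(NamespaceDescriptor.create(namespace).build())

    // Create the table with the two column families used by the sample
    val descriptor = new HTableDescriptor(TableName.valueOf(s"$namespace:leaders"))
    descriptor.addFamily(new HColumnDescriptor("name"))
    descriptor.addFamily(new HColumnDescriptor("address"))
    admin.createTable(descriptor)
  } finally {
    admin.close()
  }
}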


The code is very straightforward and doesn't contain any reference to security. It can be built with SBT using this very basic build.sbt file:


lazy val root = (project in file(".")).
  settings(
    name := "hbase",
    version := "1.0",
    scalaVersion := "2.11.4",
    libraryDependencies ++= libraries,
    resolvers ++= commonResolvers,
    assemblyExcludedJars in assembly := {
      val cp = (fullClasspath in assembly).value
      val excludesJar = Set("commons-beanutils-1.7.0.jar", "commons-beanutils-core-1.8.0.jar")
      cp filter { jar => excludesJar.contains(jar.data.getName)}
    }
  )
  .settings(net.virtualvoid.sbt.graph.Plugin.graphSettings: _*)

val cdhVersion = "cdh5.3.0"
val hbaseVersion = s"0.98.6-$cdhVersion"
val hadoopVersion = s"2.5.0-$cdhVersion"

lazy val libraries = Seq(
	"org.apache.hbase" % "hbase-client" % hbaseVersion,
	"org.apache.hbase" % "hbase-common" % hbaseVersion,
	"org.apache.hbase" % "hbase-hadoop2-compat" % hbaseVersion,
	"org.apache.hbase" % "hbase-hadoop-compat" % hbaseVersion,
	"org.apache.hadoop" % "hadoop-common" % hadoopVersion
)

lazy val commonResolvers = Seq(
	"Cloudera repo" at "//repository.cloudera.com/artifactory/cloudera-repos/"
)

Please remember to add the following plugins.sbt file under your project directory:


addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.4")


Running the code

To run the code, you can just run the following commands on a box configured as a Hadoop client node for your cluster:


sbt assembly

kinit

echo "Running test program"
java -cp target/scala-2.11/hbase-assembly-1.0.jar io.scalding.examples.hbase.HBaseSample

Remember to create your Kerberos token with the kinit command before running the program.

What is going to happen is that your program will hang while trying to write the first data into HBase: the connection attempt hangs trying to reach HBase at localhost. You might decide to specify your HBase connection explicitly by adding the following lines to your code:


  val hbaseQuorum = System.getProperty("hbase.quorum")
  val hbaseQuorumPort = System.getProperty("hbase.port", "2181")

  val hbaseConf = HBaseConfiguration.create()
  hbaseConf.set("hbase.zookeeper.quorum", hbaseQuorum)
  hbaseConf.set("hbase.zookeeper.property.clientPort", hbaseQuorumPort)

You will see then that, even though you are now connecting to the right server, your program still hangs.
Below I list some of the online sources I tried to follow before finding the correct solution, mostly attempts to build the right HBaseConfiguration object by hand with the right settings; but first I'll give the right approach, so you can stop reading after that if you are not curious.

Making the code work

To make the code work you don't have to change a single line of the code at the top of this post; you just have to make your client able to read the full HBase configuration deployed on the node. This simply means changing your runtime classpath to:


/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/conf:target/scala-2.11/hbase-assembly-1.0.jar
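
The run command from earlier then becomes:

java -cp /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/conf:target/scala-2.11/hbase-assembly-1.0.jar io.scalding.examples.hbase.HBaseSample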

This will make everything run smoothly, because HBaseConfiguration.create() now picks up the cluster's hbase-site.xml from the classpath, including all of its Kerberos settings. The path is specific to CDH 5.3, but you can adapt it to your cluster's configuration.
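
A quick way to verify that the cluster configuration is actually being picked up is a sketch along these lines (ConfCheck is a hypothetical helper, not part of the sample):

import org.apache.hadoop.hbase.HBaseConfiguration

object ConfCheck extends App {
  val conf = HBaseConfiguration.create()
  // Prints "kerberos" when hbase-site.xml was found on the classpath;
  // the fallback "simple" means the cluster configuration was NOT found
  println(s"hbase.security.authentication = ${conf.get("hbase.security.authentication", "simple")}")
  println(s"hbase.zookeeper.quorum = ${conf.get("hbase.zookeeper.quorum")}")
}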

Online sources that made me lose time

While fighting my battle against Kerberos I was sidetracked by a series of online sources that all looked very useful but in the end only made me lose time. I list them here so you can avoid the same mistakes, or comment on this post to explain why I wasn't able to make the most of them.

  • This IBM article explains how to load your Kerberos ticket into your program context (roughly the first sketch after this list). It ended up being unnecessary, since the Hadoop client libraries already take care of it;
  • This part of the HBase documentation explains which configuration parameters to add to your HBaseConfiguration to make the client work (the second sketch after this list). I tried adding them all manually, but this didn't solve my problem; they are probably not enough on their own;
  • This HBase-User post suggesting how to populate your HBaseConfiguration didn't help either; it mostly pushed me in the wrong direction of trying to build the right HBaseConfiguration myself.
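
For reference, the first approach is roughly an explicit programmatic Kerberos login; the principal and keytab path below are placeholders. On a correctly configured client node this is redundant, because the Hadoop libraries pick up the ticket cache created by kinit on their own:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Explicit Kerberos login from code; principal and keytab are placeholders
val loginConf = new Configuration()
loginConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(loginConf)
UserGroupInformation.loginUserFromKeytab("your-user@YOUR.REALM", "/path/to/your.keytab")

And the second approach is roughly this: setting the client-side security parameters from the HBase documentation by hand, again with placeholder principals. On my cluster this was not sufficient to get the client past the hang:

import org.apache.hadoop.hbase.HBaseConfiguration

// Manual client-side security settings; the principals are placeholders
val secureConf = HBaseConfiguration.create()
secureConf.set("hbase.security.authentication", "kerberos")
secureConf.set("hbase.rpc.protection", "authentication")
secureConf.set("hbase.master.kerberos.principal", "hbase/_HOST@YOUR.REALM")
secureConf.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@YOUR.REALM")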