Installing Apache Pulsar for Production on Kubernetes: A Step-by-Step Guide

Apache Pulsar is a distributed pub-sub messaging system that can scale to millions of topics and handle millions of messages per second. In this tutorial, we will walk through the steps of installing and configuring Apache Pulsar for production on Kubernetes.

Step 1: Install Kubernetes

First, you need to have a Kubernetes cluster up and running. You can either use a cloud provider to create a managed Kubernetes cluster or install Kubernetes on your own infrastructure. Once you have a Kubernetes cluster, make sure you have kubectl installed and configured to connect to your cluster.
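
For example, you can quickly confirm that kubectl can reach the cluster and that the nodes are ready:

# Verify connectivity to the Kubernetes control plane
kubectl cluster-info

# Check that all nodes report a Ready status
kubectl get nodes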

Step 2: Install Pulsar Operator

The Pulsar Operator from StreamNative is a Kubernetes operator that simplifies the deployment and management of Pulsar clusters on Kubernetes. You can install it by running the following command:

kubectl apply -f https://github.com/streamnative/pulsar-operator/releases/latest/download/pulsar-operator.yaml

This will create a Pulsar Operator deployment in your Kubernetes cluster.
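
Before moving on, you can check that the operator pod started successfully. The namespace and pod name depend on the operator release, so adjust the filter to match your installation:

# List pods in all namespaces and look for the operator
kubectl get pods --all-namespaces | grep pulsar-operator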

Step 3: Create a Pulsar Cluster

Next, you need to create a Pulsar cluster by creating a custom resource of kind “PulsarCluster”. Here is an example PulsarCluster configuration file:

apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarCluster
metadata:
  name: my-pulsar-cluster
spec:
  pulsarVersion: 2.8.1
  zookeeperServers: "zookeeper-0.zookeeper:2181,zookeeper-1.zookeeper:2181,zookeeper-2.zookeeper:2181"
  bookkeeperServers: "bookkeeper-0.bookkeeper:3181,bookkeeper-1.bookkeeper:3181,bookkeeper-2.bookkeeper:3181"
  numBrokers: 3
  numBookiesPerBroker: 2
  numFunctionsWorkers: 2
  storageClassName: pulsar-data

This configuration file creates a Pulsar cluster with three brokers, two bookies per broker, and two function workers. The ZooKeeper and BookKeeper ensembles are referenced through the hostnames of their Kubernetes services: “zookeeper-0.zookeeper” through “zookeeper-2.zookeeper” for ZooKeeper, and “bookkeeper-0.bookkeeper” through “bookkeeper-2.bookkeeper” for BookKeeper. You will need to create these Kubernetes services for the ZooKeeper and BookKeeper instances separately, as sketched below.
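
Here is a minimal sketch of such a headless Service for ZooKeeper, assuming the ZooKeeper pods run in a StatefulSet named “zookeeper”, carry the label app: zookeeper, and expose the standard client port 2181; a BookKeeper service follows the same pattern on port 3181. Adapt the names, labels, and ports to your deployment.

apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  labels:
    app: zookeeper
spec:
  clusterIP: None        # headless: gives each pod a stable DNS name such as zookeeper-0.zookeeper
  selector:
    app: zookeeper       # assumed label on the ZooKeeper StatefulSet pods
  ports:
    - name: client
      port: 2181         # standard ZooKeeper client port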

To create the Pulsar cluster, run the following command:

kubectl apply -f pulsar-cluster.yaml

This will create a Pulsar cluster with the configuration specified in the pulsar-cluster.yaml file.
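
You can then watch the operator bring the cluster up. The exact resource name and pod labels depend on the operator version, but something along these lines should show the progress:

# Check the status of the PulsarCluster custom resource
kubectl get pulsarcluster my-pulsar-cluster

# Watch the broker, bookie, and function worker pods start
kubectl get pods -w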

Step 4: Scale the Bookkeeper and Zookeeper Instances

Apache Pulsar uses Apache ZooKeeper and Apache BookKeeper to manage metadata and storage, respectively. As the load on the Pulsar cluster increases, you may need to scale up the number of BookKeeper and ZooKeeper instances. You can do this by modifying the corresponding Kubernetes Deployments or StatefulSets. Scale ZooKeeper with care: the ensemble must keep a quorum and is normally run with an odd number of members.

For example, to scale up the number of BookKeeper instances, you can modify the bookkeeper-statefulset.yaml file to increase the number of replicas and then apply the changes with kubectl apply:

kubectl apply -f bookkeeper-statefulset.yaml
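
Alternatively, if the bookies run as a StatefulSet named “bookkeeper” (adjust the name to match your deployment), you can scale it directly without editing the manifest:

# Scale the BookKeeper StatefulSet to five replicas
kubectl scale statefulset bookkeeper --replicas=5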

Step 5: Create a Sink to Postgres

Apache Pulsar supports sinks, which allow you to consume messages from a Pulsar topic and write them to an external system, such as a database.

To create a sink to Postgres, you first need a Postgres database and table to which the messages will be written. Using psql, you can create them with the following commands (note that you have to connect to the new database before creating the table):

CREATE DATABASE mydatabase;
\c mydatabase
CREATE TABLE mytable (id SERIAL PRIMARY KEY, message TEXT);

Next, you need to create a sink connector that will read messages from a Pulsar topic and write them to the Postgres database. Here is an example sink connector configuration file:

name: "my-sink"
archive: "builtin://jdbc-postgres"
inputs:
  - "persistent://public/default/my-topic"
configs:
  jdbcUrl: "jdbc:postgresql://postgres:5432/mydatabase"
  tableName: "mytable"
  userName: "myusername"
  password: "mypassword"
  batchSize: 100
  timeoutMs: 1000

This configuration file defines a sink connector named “my-sink” that uses the built-in JDBC Postgres connector to read messages from the “my-topic” topic and write them to the “mytable” table of the “mydatabase” database on a Postgres server reachable at host “postgres” and port 5432. The database credentials, batch size, and flush timeout are also specified in the configuration file.

Sink connectors are managed with the pulsar-admin CLI rather than with kubectl. To create the sink connector, run the following command (for example from inside one of the broker pods, where pulsar-admin is available):

pulsar-admin sinks create --sink-config-file my-sink.yaml

This will create a sink connector with the configuration specified in the my-sink.yaml file.
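
Once the sink is created, you can confirm that it is running:

# Show the running status of the sink (tenant and namespace default to public/default)
pulsar-admin sinks status --name my-sink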

Step 6: Test the Sink

To test the sink, you can publish some messages to the “my-topic” topic and verify that they are written to the “mytable” table in the Postgres database. Keep in mind that the JDBC sink maps message fields to table columns based on the topic’s schema, so the messages you publish should carry a schema with a field matching the “message” column. You can publish a test message using the Pulsar CLI:

pulsar-client produce persistent://public/default/my-topic --messages "hello, world"
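
You can also attach a temporary consumer with its own subscription to confirm that messages are flowing through the topic without taking them away from the sink:

# Print incoming messages; -n 0 keeps consuming until you stop it
pulsar-client consume persistent://public/default/my-topic -s test-subscription -n 0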

You can then connect to the Postgres database using a Postgres client and run a query to verify that the messages were written to the “mytable” table:

SELECT * FROM mytable;

This should return a row with the message “hello, world”.
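
For example, using psql with the credentials from the sink configuration (adjust the host if you are connecting from outside the cluster):

# Query the target table directly from the command line
psql -h postgres -U myusername -d mydatabase -c "SELECT * FROM mytable;"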

Conclusion

In this tutorial, we went through the steps of installing and configuring Apache Pulsar for production on Kubernetes, scaling the BookKeeper and ZooKeeper instances, and creating a sink to Postgres. With this setup, you can build a high-performance pub-sub messaging system that scales to millions of messages per second and integrates with external systems.