Version: 0.13.0

Sink CRD configurations

This document lists CRD configurations available for Pulsar sink connectors. The sink CRD configurations consist of sink connector configurations and the common CRD configurations.

Sink configurations

This table lists sink configurations.

Field	Description
`name`	The connector name is a string of up to `43` characters.
`classname`	The class name of a sink connector.
`tenant`	The tenant of a sink connector.
`namespace`	The Pulsar namespace of a sink connector.
`clusterName`	The Pulsar cluster of a sink connector.
`replicas`	The number of instances that you want to run for a sink connector. If it is set to `0`, it means to stop the sink connector. When HPA is enabled, you cannot set the `replicas` parameter to `0` or a negative number.
`ShowPreciseParallelism`	Configure whether to show the precise parallelism. If it is set to `true`, the `Parallelism` is equal to value of the `replicas` parameter. In this situation, when you update the value of the `replicas` parameter, it will cause all Pods to be recreated. By default, it is set to `false`.
`minReplicas`	The minimum number of instances that you want to run for a sink connector. If it is set to `0`, it means to stop the sink connector. By default, it is set to `1`. When HPA auto-scaling is enabled, the HPA controller scales the Pods up / down based on the values of the `minReplicas` and `maxReplicas` options. The number of the Pods should be greater than the value of the `minReplicas` and be smaller than the value of the `maxReplicas`.
`downloaderImage`	The image for installing the init container that is used to download packages or functions from Pulsar if the download path is specified.
`maxReplicas`	The maximum number of instances that you want to run for this sink connector. When the value of the `maxReplicas` parameter is greater than the value of `replicas`, it indicates that the sink controller automatically scales the sink connector based on the CPU usage. By default, `maxReplicas` is set to 0, which indicates that auto-scaling is disabled.
`sinkConfig`	The sink connector configurations in YAML format.
`timeout`	The message timeout in milliseconds.
`negativeAckRedeliveryDelayMs`	The number of redelivered messages due to negative acknowledgement.
`autoAck`	Whether or not the framework acknowledges messages automatically. This field is required. You can set it to `true` or `false`.
`maxMessageRetry`	How many times to process a message before giving up.
`processingGuarantee`	The processing guarantees (delivery semantics) applied to the sink connector. Available values: `atleast_once`, `atmost_once`, `effectively_once`. When you set `ProcessingGuarantees` to `effectively_once`, the runtime will set the subscription type to `FAILOVER`. By default, the subscription type is set to `SHARED`.
`retainOrdering`	The sink connector consumes and processes messages in order. When you set `retainOrdering`, the runtime will set the subscription type to `FAILOVER`. By default, the subscription type is set to `SHARED`.
`retainKeyOrdering`	Configure whether to retain the key order of messages. When you set `retainKeyOrdering`, the runtime will set the subscription type to `KEY_SHARED`. By default, the subscription type is set to `SHARED`.
`deadLetterTopic`	The topic where all messages that were not processed successfully are sent.
`subscriptionName`	The subscription name of the sink connector if you want a specific subscription name for the input-topic consumer.
`cleanupSubscription`	Configure whether to clean up subscriptions.
`subscriptionPosition`	The subscription position.
`pulsar`	The configurations of the Pulsar cluster. For details, see messaging.

Annotations

In Kubernetes, an annotation defines an unstructured Key Value Map (KVM) that can be set by external tools to store and retrieve metadata. annotations must be a map of string keys and string values. Annotation values must pass Kubernetes annotations validation. For details, see Kubernetes documentation on Annotations.

This example shows how to use an annotation to make an object unmanaged. Therefore, the Controller will skip reconciling unmanaged objects in reconciliation loop.

apiVersion: compute.functionmesh.io/v1alpha1
kind: Sink
metadata:
  annotations:
    compute.functionmesh.io/managed: "false"

Images

This section describes image options available for Pulsar sink CRDs.

Base runner

The base runner is an image base for other runners. The base runner is located at ./pulsar-functions-base-runner. The base runner image contains basic tool-chains like /pulsar/bin, /pulsar/conf and /pulsar/lib to ensure that the pulsar-admin CLI tool works properly to support Apache Pulsar Packages.

Runner images

Function Mesh uses runner images as images of Pulsar connectors. Each runner image only contains necessary tool-chains and libraries for specified runtime.

Pulsar connectors support using the Java runner images as their images. The Java runner is based on the base runner and contains the Java function instance to run Java functions or connectors. The streamnative/pulsar-functions-java-runner Java runner is stored at the Docker Hub and is automatically updated to align with Apache Pulsar release.

Image pull policies

When the Function Mesh Operator creates a container, it uses the imagePullPolicy option to determine whether the image should be pulled prior to starting the container. There are three possible values for the imagePullPolicy option:

Field	Description
`Always`	Always pull the image.
`Never`	Never pull the image.
`IfNotPresent`	Only pull the image if the image does not already exist locally.

Messaging

Function Mesh provides Pulsar cluster configurations in the Function, Source, and Sink CRDs. You can configure TLS encryption, TLS authentication, and OAuth2 authentication using the following configurations.

Note
The tlsConfig and tlsSecret are exclusive. If you configure TLS configurations, the TLS Secret will not take effect.

Field	Description
`authConfig`	The authentication configurations of the Pulsar cluster. Currently, you can only configure generic authentication and OAuth2 authentication through this field. For other authentication methods, you can configure them using the `authSecret` field. Generic authentication `clientAuthenticationParameters`: specify the client authentication parameters. `clientAuthenticationPlugin`: specify the client authentication plugin. OAuth2 authentication `audience`: specify the OAuth2 resource server identifier for the Pulsar cluster. `issuerUrl`: specify the URL of the OAuth2 identity provider that allows a Pulsar client to obtain an access token. `scope`: specify the scope of an access request. For more information, see access token scope. `keySecretName`: specify the name of the Kubernetes Secret. `keySecretKey`: specify the key of the Kubernetes Secret that contains the content of the OAuth2 private key.
`authSecret`	The name of the authentication ConfigMap that stores authentication configurations of the Pulsar cluster. `clientAuthenticationPlugin`: specify the client authentication plugin. `clientAuthenticationParameters`: specify the client authentication parameters.
`pulsarConfig`	The name of the ConfigMap that stores Pulsar cluster configurations. `webServiceURL`: specify the web service URL for managing the Pulsar cluster. This URL should be a standard DNS name. `brokerServiceURL`: specify the Pulsar protocol URL for interaction with the brokers in the Pulsar cluster. This URL should not use the same DNS name as the web service URL but should use the `pulsar` scheme.
`tlsConfig`	The TLS configurations of the Pulsar cluster. `allowInsecure`: allow insecure TLS connection. `certSecretKey`: specify the TLS Secret key. `certSecretName`: specify the TLS Secret name. `enabled`: enable TLS configurations. `hostnameVerification`: enable hostname verification.
`tlsSecret`	The name of the TLS ConfigMap that stores TLS configurations of the Pulsar cluster. `tlsAllowInsecureConnection`: allow insecure TLS connection. By default, it is set to `false`. `tlsHostnameVerificationEnable`: enable hostname verification. By default, it is set to `true`. `tlsTrustCertsFilePath`: specify the path of the TLS trust certificate file.

State storage

Function Mesh provides the following fields for stateful configurations in the CRD definition.

Field	Description
`statefulConfig`	The state storage configuration for the sink connector.
`statefulConfig.pulsar.serviceUrl`	The service URL that points to the state storage service. By default, the state storage service is the BookKeeper table service.
`statefulConfig.pulsar.javaProvider`	(Optional) If you want to overwrite the default configuration, you can use the state storage configuration for the Java runtime. For example, you can change it to other backend services other than the BookKeeper table service.
`statefulConfig.pulsar.javaProvider.className`	The Java class name of the state storage provider implementation. The class must implement the `org.apache.pulsar.functions.instance.state.StateStoreProvider` interface. If not, `org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl` will be used.
`statefulConfig.pulsar.javaProvider.config`	The configurations that are passed to the state storage provider.

Input

The input topics of a Pulsar Function. The following table lists options available for the Input.

Field	Description
`topics`	The configuration of the topic from which messages are fetched.
`customSerdeSources`	The map of input topics to SerDe class names (as a JSON string).
`customSchemaSources`	The map of input topics to Schema class names (as a JSON string).
`sourceSpecs`	The map of source specifications to consumer specifications. Consumer specifications include these options: - `SchemaType`: the built-in schema type or custom schema class name to be used for messages fetched by the connector. - `SerdeClassName`: the SerDe class to be used for messages fetched by the connector. - `IsRegexPattern`: configure whether the input topic adopts a Regex pattern. - `SchemaProperties`: the schema properties for messages fetched by the connector. - `ConsumerProperties`: the consumer properties for messages fetched by the connector. - `ReceiverQueueSize`: the size of the consumer receive queue. - `cryptoConfig`: cryptography configurations of the consumer.

Resources

When you specify a function or connector, you can optionally specify how much of each resource they need. The resources available to specify are CPU and memory (RAM).

If the node where a Pod is running has enough of a resource available, it is possible (and allowed) for a Pod to use more resources than its request for that resource. However, a Pod is not allowed to use more than its resource limit.

Secrets

Function Mesh provides the secretsMap field for Function, Source, and Sink in the CRD definition. You can refer to the created secrets under the same namespace and the controller can include those referred secrets. The secrets are provide by EnvironmentBasedSecretsProvider, which can be used by context.getSecret() in Pulsar functions and connectors.

The secretsMap field is defined as a Map struct with String keys and SecretReference values. The key indicates the environment value in the container, and the SecretReference is defined as below.

Field	Description
`path`	The name of the secret in the Pod's namespace to select from.
`key`	The key of the secret to select from. It must be a valid secret key.

Suppose that there is a Kubernetes Secret named credential-secret defined as below:

apiVersion: v1
data:
  username: foo
  password: bar
kind: Secret
metadata:
  name: credential-secret
type: Opaque

To use it in Pulsar Functions in a secure way, you can define the secretsMap in the Custom Resource:

secretsMap:
  username:
    path: credential-secret
    key: username
  password:
    path: credential-secret
    key: password

Then, in the Pulsar Functions and Connectors, you can call context.getSecret("username") to get the secret value (foo).

Packages

Function Mesh supports running Pulsar connectors in Java.

Field	Description
`jarLocation`	The path to the JAR file for the connector.
`javaOpts`	It specifies JVM options to better configure JVM behaviors, including `exitOnOOMError`, Garbage Collection logs, Garbage Collection tuning, and so on.
`extraDependenciesDir`	It specifies the dependent directory for the JAR package.

Cluster location

In Function Mesh, the Pulsar cluster is defined through a ConfigMap. Pods can consume ConfigMaps as environment variables in a volume. The Pulsar cluster ConfigMap defines the Pulsar cluster URLs.

Field	Description
`webServiceURL`	The Web service URL of the Pulsar cluster.
`brokerServiceURL`	The broker service URL of the Pulsar cluster.

Health checks

Note
To enable health checks, you need to create a PVC and a PV, and bind the PVC to the PV.

With the Kubernetes liveness probe, Function Mesh supports monitoring and acting on the state of Pods (Containers) to ensure that only healthy Pods serve traffic. Implementing health checks using probes provides Function Mesh a solid foundation, better reliability, and higher uptime.

apiVersion: compute.functionmesh.io/v1alpha1
kind: Function
metadata:
  name: health-check-sample
  namespace: default
spec:
  image: streamnative/pulsar-functions-java-sample:2.9.2.23
  className: org.apache.pulsar.functions.api.examples.ExclamationFunction
  forwardSourceMessageProperty: true
  maxPendingAsyncRequests: 1000
  replicas: 1
  maxReplicas: 5
  logTopic: persistent://public/default/logging-function-logs
  pod:
    liveness:
      failureThreshold:              # --- [1]
      initialDelaySeconds: 10        # --- [2]
      periodSeconds: 10              # --- [3]
      successThreshold: 1            # --- [4]
... 
# Other configs

[1] failureThreshold: specify the times to restart a failed probe before giving up the probe. By default, it is set to 3.
[2] initialDelaySeconds: specify the time that should wait before performing the first liveness probe.
[3] periodSeconds: specify the frequency to perform a liveness probe.
[4] successThreshold: specify the minimum consecutive successes for the probe to be considered successful after having failed. By default, it is set to 1.

For more information about probe types, probe check mechanisms, and probe parameters, see Kubernetes documentation on Pod lifecycle and configure probes.

Security context

A security context defines privilege and access control settings for a Pod. By default, Function Mesh uses the following PodSecurityContext as the default value and applies to every function's Pod.

podSecurityContext:
     fsGroup: 10001
     runAsGroup: 10001
     runAsNonRoot: false
     runAsUser: 10000
     seccompProfile:
          type: "RuntimeDefault"

Apart from the PodSecurityContext, Function Mesh also applies the following SecurityContext to the Function's container to ensure the Pod Security Standard follows the restricted specification.

SecurityContext:
     capabilities:
          drop: 
               - ALL
      allowPrivilegeEscalation: false

Field	Description
`fsGroup`	A special supplemental group that applies to all containers in a Pod.
`fsGroupChangePolicy`	Define the behavior of changing ownership and permission of the volume before being exposed inside a Pod. This field only applies to volume types that support `fsGroup`-based ownership and permissions.
`runAsGroup`	The Group ID (GID) that is used to run the entry point of the container process. If it is unset, the runtime is used.
`runAsNonRoot`	Indicate that the container must run as a non-root user. If it is set to `true`, the system will validate the image at runtime to ensure that it does not run as a root user (User ID 0) and fail to start the container if it does. If it is unset or is set to `false`, no such validation will be performed.
`runAsUser`	The User ID (UID) that is used to run the entry point of the container process.
`seLinuxOptions`	The SELinux context that is applied to a container.
`seccompProfile`	The seccomp options that is used by a container.
`supplementalGroups`	A list of groups that is applied to the first process running in each container, in addition to the container's primary GID, the `fsGroup` (if specified), and group memberships defined in the container image for the UID of the container process.
`sysctls`	Sysctls hold a list of namespaced sysctls used for the Pod.
`windowsOptions`	The windows-specific settings that are applied to all containers.
`allowPrivilegeEscalation`	Control whether a process can gain more privileges than its parent process.
`capabilities`	The capabilities to add/drop when running a container.
`privileged`	Run the container in privileged or unprivileged mode.
`procMount`	The type of proc mount that is used by a container.
`readOnlyRootFilesystem`	Mount the container's root filesystem as read-only.

Pod specifications

Function Mesh supports customizing the Pod running Pulsar connectors. This table lists sub-fields available for the pod field.

Field	Description
`labels`	Specify labels attached to a Pod.
`liveness`	Specify the liveness probe properties for a Pod. `initialDelaySecond`: specify the time that should wait before performing the first liveness probe. `periodSeconds`: specify the frequency to perform a liveness probe. For details, see health checks.
`nodeSelector`	Specify a map of key-value pairs. For a Pod running on a node, the node must have each of the indicated key-value pairs as labels.
`affinity`	Specify the scheduling constraints of a Pod.
`tolerations`	Specify the tolerations of a Pod.
`annotations`	Specify the annotations attached to a Pod.
`securityContext`	Specify the security context for a Pod. For details, see [security context](#security-context).
`terminationGracePeriodSeconds`	The amount of time that Kubernetes gives for a Pod before terminating it.
`volumes`	A list of volumes that can be mounted by containers belonging to a Pod.
`imagePullSecrets`	An optional list of references to secrets in the same namespace for pulling any of the images used by a Pod.
`serviceAccountName`	Specify the name of the service account that is used to run Pulsar Functions or connectors.
`initContainers`	The initialization containers belonging to a Pod. A typical use case could be using an initialization container to download a remote JAR to a local path.
`sidecars`	Sidecar containers run together with the main function container in a Pod.
`builtinAutoscaler`	Specify the built-in autoscaling rules. CPU-based autoscaling: auto-scale the number of Pods based on the CPU usage (80%, 50%, or 20%). Memory-based autoscaling: auto-scale the number of Pods based on the memory usage (80%, 50%, or 20%). If you configure the `builtinAutoscaler` field, you do not need to configure the `autoScalingMetrics` and `autoScalingBehavior` options and vice versa.
`autoScalingMetrics`	Specify how to scale based on customized metrics defined in connectors. For details, see MetricSpec v2 autoscaling.
`autoScalingBehavior`	Configure the scaling behavior of the target in both up and down directions (`scaleUp` and `scaleDown` fields respectively). If not specified, the default Kubernetes scaling behaviors are adopted. For details, see HorizontalPodAutoscalerBehavior v2 autoscaling.
`env`	Specify the environment variables to expose on the containers. It is a key/value map. You can either use the `value` option to specify a particular value for the environment variable or use the `valueFrom` option to specify the source for the environment variable's value, as shown below. ```yaml env: - name: example1 value: simpleValue - name: example2 valueFrom: secretKeyRef: name: secret-name key: akey ```

Sink configurations​

Annotations​

Images​

Base runner​

Runner images​

Image pull policies​

Messaging​

State storage​

Input​

Resources​

Secrets​

Packages​

Cluster location​

Health checks​

Security context​

Pod specifications​