Cloud Pub/Sub | Mario Codes — Senior Software Developer

Cloud Pub/Sub is a fully managed, real-time messaging service that lets you build loosely coupled microservices that communicate asynchronously. You can use it to integrate the components of your app, let the app perform operations asynchronously, and build it on open multi-cloud or hybrid architectures.

It delivers each message to every subscription at least once, so a subscriber can occasionally receive duplicate messages.
You can send and receive Cloud Pub/Sub messages programmatically through the client libraries, the open REST/HTTP and gRPC service APIs, or an Apache Kafka connector.

It scales automatically depending on the volume of messages and enables you to securely integrate distributed systems.

Concepts

Publisher: app that creates and publishes messages to a topic.
Subscriber: app that receives messages. It creates a subscription to a topic.
Subscription: delivers messages to the subscriber using either the push or the pull method.
Topic: buffer that holds the data publishers send to it.

Subscriptions

A subscriber only receives messages that are published after its subscription was created. After receiving and processing a message, the subscriber sends an acknowledgement back to the service. If it doesn't acknowledge the message before the acknowledgement deadline, Pub/Sub sends the message to the subscriber again.
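The acknowledgement-deadline behaviour can be illustrated with a small in-memory model. This is a toy simulation under stated assumptions, not the Pub/Sub API: `SimulatedSubscription`, its methods, and the explicit `now` clock argument are all hypothetical names chosen to make redelivery easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedSubscription:
    """In-memory toy model of ack deadlines; NOT the real Pub/Sub API."""

    ack_deadline: float = 10.0                   # seconds a delivered message stays leased
    pending: dict = field(default_factory=dict)  # msg_id -> data, not yet acknowledged
    leased: dict = field(default_factory=dict)   # msg_id -> time its lease expires

    def publish(self, msg_id, data):
        self.pending[msg_id] = data

    def deliver(self, now):
        """Hand out every unacknowledged message that is not currently leased."""
        # A lease that expired without an ack makes the message deliverable again.
        for msg_id, expires in list(self.leased.items()):
            if now >= expires:
                del self.leased[msg_id]
        batch = []
        for msg_id, data in self.pending.items():
            if msg_id not in self.leased:
                self.leased[msg_id] = now + self.ack_deadline
                batch.append((msg_id, data))
        return batch

    def ack(self, msg_id):
        """Acknowledged messages are removed for good."""
        self.pending.pop(msg_id, None)
        self.leased.pop(msg_id, None)
```

With `ack_deadline=10`, a subscriber that acknowledges `m1` but not `m2` after the delivery at time 0 gets nothing at time 5 (the lease on `m2` is still running) and receives `m2` again at time 10.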

You can deploy your subscriber code as Cloud Functions, which are triggered whenever a new message is received. This lets you take a serverless approach and build highly scalable systems.
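A minimal sketch, assuming a 1st-gen Python background function triggered by a Pub/Sub topic: the runtime passes an `event` dict whose `data` field holds the base64-encoded payload, plus a `context` object. The function name `process_pubsub_event` is hypothetical, and 2nd-gen functions receive a CloudEvent instead, so their signature differs.

```python
import base64


def process_pubsub_event(event, context):
    """Entry point for a Pub/Sub-triggered background function (1st gen).

    event["data"] holds the base64-encoded message payload; "attributes"
    may be missing, so fall back to an empty dict.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    attributes = event.get("attributes") or {}
    # Replace this print with your own processing logic.
    print(f"Received {payload!r} with attributes {attributes}")
    return payload
```

Such a function would typically be deployed with `gcloud functions deploy` and its `--trigger-topic` flag, so that each message published to the topic invokes it once (at least once, given Pub/Sub's delivery guarantee).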

Alternatively, you can deploy your subscriber app on Compute Engine, GKE, or the App Engine flexible environment. Multiple instances of your app can spin up and split the workload by processing the messages in the topic. These instances can be shut down automatically when there are very few messages to process.

Pull subscription (default model)

The subscriber explicitly calls the pull method to request messages, and Pub/Sub returns each message together with an acknowledgement ID. To acknowledge receipt, the subscriber calls the acknowledge method with that ID.

The subscriber can be Cloud Dataflow or any app that uses the Google Cloud Client Libraries to retrieve messages. It controls the rate of delivery and can modify the acknowledgement deadline to allow more time to process messages.
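The pull, acknowledge, and deadline-extension steps can be sketched as follows. The sketch assumes `subscriber` behaves like `google.cloud.pubsub_v1.SubscriberClient` (synchronous `pull`, `acknowledge`, and `modify_ack_deadline` calls taking a `request` dict); the helper name `pull_and_process` and the `handle` callback are hypothetical. The client is passed in rather than created, so the logic can be exercised with a stand-in object.

```python
def pull_and_process(subscriber, subscription_path, handle,
                     max_messages=10, extend_deadline_seconds=None):
    """Pull one batch, process each message, and acknowledge the successes.

    Assumption: `subscriber` is a SubscriberClient-like object, e.g.
        from google.cloud import pubsub_v1
        subscriber = pubsub_v1.SubscriberClient()
        path = subscriber.subscription_path("my-project", "my-sub")
    `handle` is a hypothetical callback receiving each message's bytes.
    """
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )
    received = list(response.received_messages)
    if not received:
        return 0

    if extend_deadline_seconds is not None:
        # Ask for more processing time before the current deadline expires.
        subscriber.modify_ack_deadline(
            request={
                "subscription": subscription_path,
                "ack_ids": [item.ack_id for item in received],
                "ack_deadline_seconds": extend_deadline_seconds,
            }
        )

    ack_ids = []
    for item in received:
        try:
            handle(item.message.data)
        except Exception:
            # Leave it unacknowledged so Pub/Sub redelivers it after the deadline.
            continue
        ack_ids.append(item.ack_id)

    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return len(ack_ids)
```

Acknowledging only after `handle` succeeds is the key design choice: a message whose processing fails is simply not acknowledged, and the deadline mechanism takes care of the retry.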

To process messages rapidly, multiple subscribers can pull from the same subscription. This enables batch delivery as well as massively parallel consumption, so use this model when you need to process a very large volume of messages.
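The way several subscribers on one subscription split the work, rather than each receiving every message, can be mimicked in-process. This is an analogy under an explicit assumption: a `queue.Queue` stands in for the subscription backlog and threads stand in for subscriber instances. Real Pub/Sub delivery is at least once, so unlike this simulation a message can occasionally reach two workers.

```python
import queue
import threading


def consume_in_parallel(messages, num_workers=4):
    """Simulate several workers pulling from ONE shared subscription backlog.

    Each message is taken by exactly one worker, so the workload is split
    across workers instead of being duplicated to all of them.
    """
    backlog = queue.Queue()
    for msg in messages:
        backlog.put(msg)

    processed = []             # (worker_id, message) pairs
    lock = threading.Lock()    # protects `processed` across threads

    def worker(worker_id):
        while True:
            try:
                msg = backlog.get_nowait()
            except queue.Empty:
                return  # backlog drained; a real subscriber would keep pulling
            with lock:
                processed.append((worker_id, msg))
            backlog.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Adding workers increases throughput without changing the publisher, which is the property that lets subscriber instances scale up and down with the backlog.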

Push subscription (HTTP)

A push subscriber doesn’t need to implement Google Cloud Client Library methods to retrieve and process messages. In this model, Pub/Sub sends each message as an HTTP POST request to the subscriber at a preconfigured endpoint.

The push endpoint can be, for example, a load balancer or an App Engine standard app. The endpoint acknowledges a message by returning an HTTP success status code (such as 200 or 204); any other response tells Pub/Sub to send the message again. Pub/Sub automatically adjusts the rate of push requests based on the rate at which it receives success responses.
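A push endpoint boils down to "decode the request body, process it, and pick a status code". The sketch below assumes the push request body is a JSON envelope of the form `{"message": {"data": <base64>, "messageId": ..., "attributes": {...}}, "subscription": ...}`; the names `handle_push_body`, `process`, and `PushHandler` are hypothetical, and the standard-library server only illustrates the wiring (in practice the endpoint must be reachable by Pub/Sub, typically over HTTPS behind a load balancer or a managed platform).

```python
import base64
import json
from http.server import BaseHTTPRequestHandler


def handle_push_body(body, process):
    """Return the HTTP status an endpoint should send for one push request.

    A success code (204 here) acknowledges the message; any other code makes
    Pub/Sub redeliver it later.
    """
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", "")).decode("utf-8")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed envelope. Note: a permanently bad body will be retried too.
        return 400
    try:
        process(data)  # hypothetical application callback
    except Exception:
        return 500     # non-success: ask Pub/Sub to send the message again
    return 204         # success: the message counts as acknowledged


class PushHandler(BaseHTTPRequestHandler):
    """Minimal wiring of handle_push_body into a stdlib HTTP server."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_push_body(self.rfile.read(length), print)
        self.send_response(status)
        self.end_headers()

# To serve locally (blocks forever):
#   from http.server import HTTPServer
#   HTTPServer(("", 8080), PushHandler).serve_forever()
```

Returning 400 for malformed bodies is a judgement call: because any non-success response triggers redelivery, some teams acknowledge (or dead-letter) unparseable messages instead of retrying them indefinitely.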

Use this model in environments where Google Cloud dependencies can’t be configured or multiple topics must be processed by the same webhook. This model is also ideal when an HTTP endpoint will be invoked by Pub/Sub and other apps.

Topics

A topic acts as a buffer that holds data coming in at large volumes. If you have many data producers and one consumer, the consumer can become overwhelmed. To solve this, the producer instances act as publishers and publish the data to a topic, while the consumer acts as a subscriber and consumes the data at a reasonable pace. You can also scale the number of consumer instances automatically to handle increases in data.

Topics also spare you from establishing direct point-to-point connections: publishers push their messages to a central topic, and any interested service simply subscribes to it.
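Both ideas, buffering and decoupled fan-out, show up in a toy in-memory topic. `InMemoryTopic` and its methods are hypothetical and only model the behaviour described here: each subscription keeps its own copy of every message published after it was created, and each subscriber drains that copy at its own pace. The publisher never addresses a consumer directly.

```python
from collections import deque


class InMemoryTopic:
    """Toy model of a topic with subscriptions; NOT the Pub/Sub API."""

    def __init__(self):
        self.subscriptions = {}  # subscription name -> deque of buffered messages

    def subscribe(self, name):
        # Only messages published from now on will reach this subscription.
        self.subscriptions.setdefault(name, deque())

    def publish(self, message):
        # Fan-out: every existing subscription gets its own copy.
        for backlog in self.subscriptions.values():
            backlog.append(message)

    def pull(self, name, max_messages=1):
        # Each subscriber consumes from its own buffer at its own pace.
        backlog = self.subscriptions[name]
        count = min(max_messages, len(backlog))
        return [backlog.popleft() for _ in range(count)]
```

For example, after `subscribe("billing")` and `subscribe("analytics")`, three published orders can be drained by the analytics service in one pull while the billing service works through them two at a time; a message published before either subscription existed reaches neither.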

Message Ordering

The scalability that Pub/Sub offers comes with a trade-off: by default, message ordering is not guaranteed. Where possible, design your app to reduce or even eliminate dependencies on message ordering.

Pub/Sub works naturally for use cases where the order of message processing is not important, but you can implement ordered processing yourself as follows:

Unique IDs that encode order

The subscriber knows the order in which messages must be processed. The publisher publishes each message with a unique ID that encodes its position in that order, and the subscriber stores information about the messages it has processed so far.
When a new message arrives, the subscriber checks whether it can be processed immediately or whether its unique ID indicates that earlier messages are still pending and must be processed first.
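The reordering logic can be sketched as a small resequencer. This assumes a hypothetical ID scheme in which the publisher assigns consecutive integer sequence numbers starting at 0; `InOrderProcessor` and `handle` are illustrative names. The state lives in memory here, whereas a real subscriber would need to persist it so that it survives restarts.

```python
class InOrderProcessor:
    """Process messages in sequence order even if they arrive out of order.

    Assumption: each message carries a consecutive integer sequence number
    (0, 1, 2, ...) as its unique ID.
    """

    def __init__(self, handle):
        self.handle = handle   # hypothetical callback doing the real work
        self.next_seq = 0      # lowest sequence number not yet processed
        self.pending = {}      # seq -> payload waiting for earlier messages

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return  # already processed or already buffered: a duplicate delivery
        self.pending[seq] = payload
        # Process every buffered message that is now next in line.
        while self.next_seq in self.pending:
            self.handle(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

If messages arrive as 2, 0, 0, 1, 3, the processor holds 2 until 0 and 1 have been handled, ignores the repeated 0, and hands them to `handle` strictly in the order 0, 1, 2, 3.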

Avoid processing duplicate messages

The publisher can publish each message with a unique ID. Your subscriber stores the IDs of the messages it has already processed and checks each incoming message against them: if the ID is new, the message is processed; if it has already been processed, the message is simply discarded.
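Because delivery is at least once, this check is what keeps a redelivered message from being processed twice. A minimal sketch, assuming the publisher sets a unique ID on every message (for example in a message attribute); `DeduplicatingSubscriber` and `handle` are hypothetical names, and the in-memory set stands in for a durable store such as a database table.

```python
class DeduplicatingSubscriber:
    """Skip messages whose unique ID has already been processed."""

    def __init__(self, handle):
        self.handle = handle        # hypothetical callback doing the real work
        self.processed_ids = set()  # stand-in for a durable store of seen IDs

    def receive(self, message_id, payload):
        if message_id in self.processed_ids:
            return False  # duplicate delivery: discard
        self.handle(payload)
        # Record the ID only after processing succeeds, so a failure can be retried.
        self.processed_ids.add(message_id)
        return True
```

Recording the ID after `handle` returns means a crash mid-processing leads to a retry rather than a lost message; the trade-off is that `handle` itself should tolerate being run again for the same message.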

Use cases

Pub/Sub can reliably receive and store large volumes of rapidly incoming data and fan out messages from one publisher to multiple subscribers. Some examples are:

real-time gaming apps
clickstream data ingestion and processing
device or sensor data processing for healthcare and manufacturing
integrating various data sources in financial applications
build loosely-coupled apps
fan out messages to multiple subscribers
rapidly ingest large volumes of data

Big data

For big data use cases, you can use Cloud Dataflow's Pub/Sub I/O (PubsubIO) to achieve exactly-once processing of message streams. PubsubIO removes duplicate messages based on a custom message ID.