What is AWS Kafka?
AWS Kafka is the common name for Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK takes care of the operational tasks associated with Apache Kafka clusters, such as server provisioning, configuration, patching, upgrades, and monitoring. It also stores and secures your data and scales to support load changes. Because the service is fully managed, you can spend your time building and running streaming event applications.
With Amazon MSK, you get open-source, highly secure Apache Kafka clusters distributed across multiple Availability Zones (AZs), providing highly available streaming storage. Amazon MSK is configurable, observable, and scalable, which gives you the flexibility and control needed for a variety of use cases.
Integration with other AWS services makes application development simpler with Amazon MSK. It integrates with AWS Identity and Access Management (IAM) and AWS Certificate Manager for security, AWS Glue Schema Registry for schema governance, Amazon Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink) and AWS Lambda for stream processing, and more. Amazon MSK provides the integration backbone for modern messaging and event-driven applications at the center of data ingest and processing services, as well as microservice application architectures.
What is AWS Kafka used for?
Kafka is frequently used as a tool for creating real-time streaming applications and pipelines that manage streaming data. In the following paragraphs, we'll delve into some of its common use cases.
Send and receive streams of logs
Ingesting and processing log and event streams is an essential part of many modern data-driven applications. It allows SRE/DevOps teams to monitor logs in real time and set up alerts based on log events. A typical setup looks like this:
Ship logs to a Kafka topic using Fluentd or Filebeat.
Use Logstash to consume the logs from the Kafka topic and parse them.
Have Logstash write the parsed logs to an Elasticsearch index.
Finally, use Kibana to explore the logs in real time and set up alerts based on log patterns.
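The parsing step in the middle of this pipeline can be sketched in a few lines of Python. This is only an illustration of the kind of work Logstash does when it consumes a message from the topic, not its actual implementation. The log line format, the field names (timestamp, level, service, message), and the alert rule are all assumptions made for this example.

```python
import json
import re
from typing import Optional

# Assumed line format: "<ISO timestamp> <LEVEL> [<service>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<service>[^\]]+)\]\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured document, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def should_alert(doc: dict) -> bool:
    """Hypothetical alert rule: flag ERROR and FATAL events."""
    return doc["level"] in {"ERROR", "FATAL"}


if __name__ == "__main__":
    raw = "2023-04-01T12:00:00Z ERROR [checkout] payment gateway timeout"
    doc = parse_log_line(raw)
    # The structured document is what would be indexed for searching and alerting.
    print(json.dumps(doc))
    print(should_alert(doc))
```

Lines that do not match the assumed format return None, so a real pipeline would need to decide whether to drop them or route them to a separate topic for inspection.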
Stream database changes for building real-time data analytics dashboards
By streaming database changes with Kafka, developers can construct real-time data analytics dashboards that provide up-to-date insights into an organization's operations. To achieve this, developers can subscribe to the appropriate topics that track database changes and process the corresponding data stream. With Kafka's ability to handle large volumes of data and its support for distributed processing, developers can build highly scalable and responsive analytics applications that keep pace with changing data.
A typical setup would be:
Set up Debezium to read database changes from MySQL or another supported database.
Write the database changes to a Kafka topic.
Use an ETL tool to read the data from the Kafka topic and transform it.
Write the transformed data to a supported database.
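To make the transform step more concrete, here is a minimal Python sketch that flattens a single Debezium change event into a row a dashboard could query. It assumes the event was serialized as JSON and relies on the standard Debezium envelope fields (before, after, source, op, ts_ms). The function name and the shape of the output row are hypothetical and chosen only for illustration.

```python
import json
from typing import Optional

# Debezium operation codes mapped to readable names.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def flatten_change_event(raw_value: Optional[bytes]) -> Optional[dict]:
    """Flatten one Debezium change event into a row for an analytics store."""
    if raw_value is None:
        # Tombstone records have no value and carry nothing to transform.
        return None
    event = json.loads(raw_value)
    if not isinstance(event, dict):
        return None
    # With schemas enabled in the JSON converter, the change data sits under "payload".
    payload = event.get("payload", event)
    op = payload.get("op")
    if op not in OPERATIONS:
        return None
    # Deletes carry the last known row state in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {
        "table": payload["source"]["table"],
        "operation": OPERATIONS[op],
        "changed_at_ms": payload["ts_ms"],
        "row": row,
    }
```

In a real pipeline this function would be called for every message consumed from the topic, and the resulting rows would be written to the target database in batches.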
How to set up Kafka in AWS?
Sign in to the AWS Management Console: https://console.aws.amazon.com/msk/home?region=us-east-1#/home/
Navigate to the Amazon MSK console.
Choose Create cluster to create a new Kafka cluster.
Choose Quick create. This creation method creates the cluster with default settings.
In General cluster properties, select Provisioned as the Cluster type.
Choose Create cluster.
Check the cluster Status on the Cluster summary page. The status changes from Creating to Active as Amazon MSK provisions the cluster. When the status is Active, you can connect to the cluster. For more information about cluster status, see Cluster states.
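If you would rather wait for the cluster from a script than refresh the console, the status check can be sketched with boto3, the AWS SDK for Python. The sketch below assumes boto3 is installed and AWS credentials are configured. The helper name wait_until_active, the polling interval, and the attempt limit are choices made for this example; the cluster ARN is something you copy from the Cluster summary page.

```python
import time


def wait_until_active(client, cluster_arn, poll_seconds=30, max_attempts=60, sleep=time.sleep):
    """Poll the cluster state until it is ACTIVE; raise if it fails or never gets there."""
    for _ in range(max_attempts):
        response = client.describe_cluster(ClusterArn=cluster_arn)
        state = response["ClusterInfo"]["State"]
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError(f"Cluster {cluster_arn} entered the FAILED state")
        sleep(poll_seconds)
    raise TimeoutError(f"Cluster {cluster_arn} was not ACTIVE after {max_attempts} checks")


def main(cluster_arn: str, region: str = "us-east-1") -> None:
    """Example entry point; not called automatically."""
    import boto3  # third-party: pip install boto3

    kafka = boto3.client("kafka", region_name=region)
    wait_until_active(kafka, cluster_arn)
    # Once the cluster is active, fetch the broker addresses used by Kafka clients.
    print(kafka.get_bootstrap_brokers(ClusterArn=cluster_arn))
```

Calling main with your own cluster ARN blocks until the cluster is ready and then prints the bootstrap broker addresses you need for the next step.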
You have now created an MSK cluster. Let's connect to it with the Kafka CLI and create a topic.
To connect to Kafka, you need to install the Kafka CLI. On a Mac, you can install it by running the following command:
brew install kafka
Once Kafka is installed, connect to MSK and create a topic. Run the command from a machine that can reach the brokers over the network, for example one inside the cluster's VPC.
kafka-topics --bootstrap-server b-1.testkafka.xxxx.c6.kafka.eu-west-1.amazonaws.com:9092,b-2.testkafka.xxxxx.c6.kafka.eu-west-1.amazonaws.com:9092 --create --replication-factor 2 --topic test_topic
You have now successfully created a topic and can start producing data to it.
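As a rough sketch of that last step, the Python snippet below publishes a few JSON events to test_topic. It assumes the third-party kafka-python package and its KafkaProducer class. The helper names encode_event and publish, and the sample event, are hypothetical. The bootstrap server string is the same comma-separated broker list used in the kafka-topics command.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event as compact UTF-8 JSON, a common choice for Kafka payloads."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def publish(producer, topic: str, events) -> int:
    """Send each event to the topic, then flush so nothing is left in the client buffer."""
    count = 0
    for event in events:
        producer.send(topic, value=encode_event(event))
        count += 1
    producer.flush()
    return count


def main(bootstrap_servers: str) -> None:
    """Example entry point; not called automatically."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers.split(","))
    sent = publish(producer, "test_topic", [{"id": 1, "msg": "hello"}])
    print(f"sent {sent} event(s)")
    producer.close()
```

Keeping publish independent of the client library makes it easy to test with a stand-in producer before pointing it at a real cluster.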