What exactly is Debezium, and other stuff related to it

Bhumika Satpathy
Oct 23, 2021

3 min read

Background

A few days ago I had to work with database connectors. At first I wondered what exactly connectors do and why they matter. So I am going to start this blog with a little background on connectors, their purpose, and how helpful they are. Feel free to skip to the other sections at your convenience.

Change Data Capture

Change data capture (CDC), as implemented in SQL Server, records insert, update, and delete activity that applies to a table. This makes the details of the changes available in an easily consumed relational format. The source of change data is the SQL Server transaction log. As inserts, updates, and deletes are applied to tracked source tables, entries that describe those changes are added to the log. The log serves as input to the capture process, which reads the log and adds information about the changes to the tracked table's associated change table. Connectors then read these captured changes and pass them on to other systems.
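To make the capture process more concrete, here is a minimal sketch in Python. It is a toy model, not how SQL Server works internally: the `LogEntry` class, the `capture_changes` function, and the table names are all made up for illustration. It only shows the idea of scanning a log, skipping untracked tables, and copying the rest into a per-table change table.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """Hypothetical transaction log entry (not a real SQL Server structure)."""
    lsn: int         # position of the entry in the log
    table: str       # table the change applies to
    operation: str   # "insert", "update", or "delete"
    row: dict        # row data after the change


def capture_changes(log, tracked_tables, change_tables, last_lsn=0):
    """Copy log entries newer than last_lsn into per-table change tables.

    Returns the position of the last entry read, so the next run can
    resume from there instead of re-reading the whole log.
    """
    for entry in log:
        if entry.lsn <= last_lsn:
            continue  # already captured in an earlier run
        if entry.table in tracked_tables:
            change_tables.setdefault(entry.table, []).append(
                {"lsn": entry.lsn, "operation": entry.operation, **entry.row}
            )
        last_lsn = entry.lsn
    return last_lsn


log = [
    LogEntry(1, "orders", "insert", {"id": 1, "status": "new"}),
    LogEntry(2, "audit", "insert", {"id": 7}),  # not tracked, so ignored
    LogEntry(3, "orders", "update", {"id": 1, "status": "paid"}),
]
change_tables = {}
last_lsn = capture_changes(log, {"orders"}, change_tables)
```

After this runs, `change_tables["orders"]` holds the two changes made to the tracked table, in log order, and the untracked `audit` table has no change table at all.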

Event stream processing

Event stream processing (ESP) is the practice of taking action on a series of data points that originate from a system that continuously creates data. The term “event” refers to each data point in the system, and “stream” refers to the ongoing delivery of those events. A series of events can also be referred to as “streaming data” or “data streams.” Actions that are taken on those events include aggregations (e.g., calculations such as sum, mean, standard deviation), analytics (e.g., predicting a future event based on patterns in the data), transformations (e.g., changing a number into a date format), enrichment (e.g., combining the data point with other data sources to create more context and meaning), and ingestion (e.g., inserting the data into a database).
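Three of the actions listed above (transformation, enrichment, and aggregation) can be sketched with a plain Python generator. This is only an illustration under assumed data: the sensor events, the `sensor_locations` lookup, and the function names are hypothetical, and a real system would read from a streaming platform rather than an in-memory generator.

```python
from datetime import datetime, timezone
from statistics import mean


def event_source():
    """Hypothetical stream of events; each dict is one data point."""
    yield {"sensor": "a", "ts": 0, "value": 10.0}
    yield {"sensor": "b", "ts": 60, "value": 20.0}
    yield {"sensor": "a", "ts": 120, "value": 30.0}


def process(events, sensor_locations):
    """Handle events one at a time, as they arrive."""
    values_seen = []
    for event in events:
        # Transformation: turn the numeric timestamp into a date format.
        when = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        # Enrichment: add context from another data source.
        location = sensor_locations.get(event["sensor"], "unknown")
        # Aggregation: keep a running mean over everything seen so far.
        values_seen.append(event["value"])
        yield {
            **event,
            "time": when.isoformat(),
            "location": location,
            "running_mean": mean(values_seen),
        }


results = list(process(event_source(), {"a": "lab"}))
```

Each output event carries the original fields plus the readable time, the looked-up location, and the running mean at that point in the stream.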

Finally, Debezium

Debezium is a set of distributed services that capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a change event stream, and applications simply read these streams to see the change events in the same order in which they occurred. Applications can then use this data to perform various operations or respond to the changes accordingly. Built on Apache Kafka, Debezium provides a low-latency CDC platform for multiple databases, such as MySQL and PostgreSQL. It doesn't end there. Debezium also helps transform the data into a particular format, which can then be used to access the data whenever necessary. For instance, in my project I used Debezium to ingest data into a Kafka topic (one topic per database), and then Debezium Helper took that data and stored it in another topic (one topic per table).
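The idea of moving events from a per-database topic into per-table topics can be sketched without a running Kafka cluster. The code below is not the Debezium Helper from my project and does not use any Kafka client; `route_by_table` and the topic naming are assumptions for illustration. The events are simplified versions of a Debezium change event, keeping only the `op`, `before`, `after`, and `source` fields.

```python
from collections import defaultdict


def route_by_table(db_topic_events):
    """Split events from one per-database topic into per-table topics.

    Topics are modelled as plain lists; the "<db>.<table>" topic name
    is an assumed convention for this sketch.
    """
    table_topics = defaultdict(list)
    for event in db_topic_events:
        source = event["source"]
        topic = f'{source["db"]}.{source["table"]}'
        # Appending in arrival order keeps each table's changes ordered.
        table_topics[topic].append(event)
    return dict(table_topics)


# Simplified change events: "c" = create, "u" = update, "d" = delete.
inventory_topic = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "pen"},
     "source": {"db": "inventory", "table": "products"}},
    {"op": "c", "before": None, "after": {"id": 9, "qty": 2},
     "source": {"db": "inventory", "table": "orders"}},
    {"op": "u", "before": {"id": 1, "name": "pen"},
     "after": {"id": 1, "name": "blue pen"},
     "source": {"db": "inventory", "table": "products"}},
]

per_table = route_by_table(inventory_topic)
```

Here `per_table` ends up with two topics, `inventory.products` and `inventory.orders`, and a consumer that only cares about products can read the first one without filtering out order events.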

Conclusion

This is how Debezium helped me. It's just one of the many available options for database ingestion. There are many others waiting for us to discover and use.