Debezium & MySQL v8 : Public Key Retrieval Is Not Allowed
I started hitting problems when trying Debezium against MySQL v8: when creating the connector, it failed with a "Public key retrieval is not allowed" error.
This is based on using Confluent Cloud to provide your managed Kafka and Schema Registry. All that you run yourself is the Kafka Connect worker.
Optionally, you can use this Docker Compose to run the worker and a sample MySQL database.
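If you're hitting the same thing, one common cause is MySQL 8's default caching_sha2_password authentication plugin, which JDBC clients won't negotiate unless public key retrieval is explicitly allowed. A sketch of one way round it (the user, password, and container name below are placeholders for whatever your environment uses) is to switch the connector's database user to mysql_native_password:

# Placeholder names throughout: a MySQL container called 'mysql', root password
# 'debezium', and a connector user 'debezium'/'dbz' -- adjust for your setup.
# MySQL 8 defaults new users to caching_sha2_password; switching the connector's
# user to mysql_native_password avoids the public key retrieval requirement.
docker exec -i mysql mysql -uroot -pdebezium -e \
  "ALTER USER 'debezium'@'%' IDENTIFIED WITH mysql_native_password BY 'dbz';"

Alternatively, if your connector configuration lets you pass driver options through to the JDBC connection, allowing public key retrieval on the client side achieves the same thing.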
The Kafka Connect framework provides generic error handling and dead-letter queue capabilities, which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that. For example, the Elasticsearch sink connector provides configuration (behavior.on.malformed.documents) that can be set so that a single bad record won’t halt the pipeline. Others, such as the JDBC Sink connector, don’t provide this yet. That means that if you hit this problem, you need to unblock it yourself. One way is to manually move the offset of the consumer past the bad message.
TL;DR: You can use kafka-consumer-groups --reset-offsets --to-offset <x> to manually move the connector past a bad message.
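Spelled out in full it looks something like the following. The connector name, topic, partition, and offsets are all placeholders, and the sink connector should be paused or deleted first so that its consumer group has no active members:

# A sink connector's consumer group is named connect-<connector name>.
# If the bad message is at offset 41 of partition 0, move the group to offset 42
# so that the connector skips over it when it resumes.
kafka-consumer-groups --bootstrap-server localhost:9092 \
    --group connect-sink_jdbc_01 \
    --topic my_topic:0 \
    --reset-offsets --to-offset 42 \
    --execute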
I use the Elastic stack for a lot of my talks and demos because it complements Kafka brilliantly. A few things have changed in recent releases and this blog is a quick note on some of the errors that you might hit and how to resolve them. It was inspired by a lot of the comments and discussion here and here.
I’ve written before about kafkacat and what a great tool it is for doing lots of useful things as a developer with Kafka. I used it too in a recent demo that I built, in which data needed manipulating in a way that I couldn’t easily do elsewhere. Today I want to share a very simple but powerful use for kafkacat as both a consumer and producer: copying data from one Kafka cluster to another. In this instance it’s getting data from Confluent Cloud down to a local cluster.
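In essence it’s just a consumer piped into a producer. A minimal sketch, assuming the topic is called my_topic and that the broker address and Confluent Cloud API key/secret below are placeholders for your own:

# Consume everything from the topic in Confluent Cloud (-e exits at the end of
# the topic) and pipe it straight into a producer writing to the local cluster.
# -K: keeps the message keys intact by printing/parsing them with a ':' delimiter.
kafkacat -b $CCLOUD_BROKER:9092 \
         -X security.protocol=SASL_SSL \
         -X sasl.mechanisms=PLAIN \
         -X sasl.username="$CCLOUD_API_KEY" \
         -X sasl.password="$CCLOUD_API_SECRET" \
         -t my_topic -C -K: -e | \
kafkacat -b localhost:9092 -t my_topic -P -K:

If your keys or values can themselves contain the delimiter character, pick a different one (or drop -K: if you don’t care about preserving keys).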
Coming to Kafka Summit in San Francisco next week? Inspired by similar events at Oracle OpenWorld in past years, I’m proposing an unofficial run (or walk) across the Golden Gate Bridge on the morning of Tuesday 1st October. We should be up and out and back in plenty of time to still attend the morning keynotes. Some people will run, some may prefer to walk, it’s open to everyone :)
When you create a sink connector in Kafka Connect, by default it will start reading from the beginning of the topic and stream all of the existing—and new—data to the target. The setting that controls this behaviour is auto.offset.reset, and you can see its value in the worker log when the connector runs:
[2019-08-05 23:31:35,405] INFO ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
…
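If you want a particular sink connector to start from the end of the topic instead, one option from Apache Kafka 2.3 onwards (KIP-458) is to override the consumer property in the connector’s own configuration. This sketch assumes the worker permits it via connector.client.config.override.policy=All; the connector name and class are just examples, and other required connector settings are omitted:

# Set the consumer override for this one connector only.
curl -s -X PUT http://localhost:8083/connectors/sink_jdbc_01/config \
     -H "Content-Type: application/json" \
     -d '{
           "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
           "topics": "my_topic",
           "consumer.override.auto.offset.reset": "latest"
         }'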
Kafka Connect has a REST API through which all config should be done, including removing connectors that have been created. Sometimes though, you might have reason to want to do this manually—and since Kafka Connect running in distributed mode uses Kafka as its persistent data store, you can achieve this by manually writing to the topic yourself.
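To see how Kafka Connect lays that state out, you can consume the config topic and look at the message keys—connector configurations are keyed connector-<name>. The topic name here is an example; use whatever config.storage.topic is set to on your worker:

# Dump the config topic, showing each message's key and payload.
kafkacat -b localhost:9092 -t docker-connect-configs -C -e \
         -f 'Key: %k\nValue: %s\n--\n'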
Here’s a hacky way to automatically restart Kafka Connect connectors if they fail. Restarting automatically only makes sense if it’s a transient failure; if there’s a problem with your pipeline (e.g. bad records or a mis-configured server) then you don’t gain anything from this. You might want to check out Kafka Connect’s error handling and dead letter queues too.
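A sketch of what that hack can look like, assuming a worker on localhost:8083 and curl and jq installed—it polls the status endpoint and restarts any task in the FAILED state, and you’d run it from cron or wrap it in a sleep loop:

#!/usr/bin/env bash
# Sketch only: for each connector registered on the worker, find tasks in
# FAILED state and ask the REST API to restart them.
for connector in $(curl -s http://localhost:8083/connectors | jq -r '.[]'); do
  curl -s "http://localhost:8083/connectors/${connector}/status" | \
    jq -r '.tasks[] | select(.state=="FAILED") | .id' | \
    while read -r task; do
      echo "Restarting failed task ${task} of connector ${connector}"
      curl -s -X POST "http://localhost:8083/connectors/${connector}/tasks/${task}/restart"
    done
done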
Kafka Connect configuration is easy - you just write some JSON! But what if you’ve got credentials that you need to pass? Embedding those in a config file is not always such a smart idea. Fortunately, with KIP-297, which was released in Apache Kafka 2.0, there is support for external secrets. It’s extensible, so you can plug in your own ConfigProvider, and it ships with one for simply putting credentials in a file - which I’ll show here. You can read more here.
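A minimal sketch of the file-based provider (the paths, file names, and the MYSQL_PASSWORD key are placeholders): enable the provider in the worker config, put the secret in a properties file, and then reference it from the connector config with the ${file:…} placeholder syntax.

# Enable the FileConfigProvider that ships with Apache Kafka 2.0+ in the worker config:
cat >> connect-distributed.properties <<'EOF'
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
EOF

# Put the secret in a properties file on the worker host:
cat > /data/credentials.properties <<'EOF'
MYSQL_PASSWORD=Secret123
EOF

# In the connector config, reference the secret instead of embedding it:
#   "connection.password": "${file:/data/credentials.properties:MYSQL_PASSWORD}"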
Kafka Connect exposes a REST interface through which all config and monitoring operations can be done. You can create connectors, delete them, restart them, check their status, and so on. But, I found a situation recently in which I needed to delete a connector and couldn’t do so with the REST API. Here’s another way to do it, by amending the configuration Kafka topic that Kafka Connect in distributed mode uses to persist configuration information for connectors. Note that this is not a recommended way of working with Kafka Connect—the REST API is there for a good reason :)
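The gist of it, sketched with kafkacat: the connector’s configuration lives in the config topic under the key connector-<name>, so producing a tombstone (NULL value) with that key removes it. The topic and connector names below are examples—check config.storage.topic on your worker, and tread carefully, since a malformed write to this topic can upset the whole worker:

# The trailing ':' with no value, combined with -Z, produces a NULL payload
# (a tombstone) for the key connector-source-debezium-mysql-01.
echo 'connector-source-debezium-mysql-01:' | \
  kafkacat -b localhost:9092 -t docker-connect-configs -P -Z -K: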
There is an open issue for support of EXPLODE/UNNEST functionality in KSQL, and if you need it then do up-vote the issue. Here I detail a hacky, but effective, workaround for exploding arrays into multiple messages—so long as you know the upper bound on your array.
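As a sketch of the idea (the stream and column names are made up, and whether array indexing starts at 0 or 1 depends on your KSQL version): create the target stream from the first array element, then INSERT INTO it once per additional index up to your known maximum, filtering out rows where that element doesn’t exist. Submitted here via the KSQL REST API:

# SRC has an array column ARR with at most three elements in this example.
curl -s -X POST http://localhost:8088/ksql \
     -H "Content-Type: application/vnd.ksql.v1+json" \
     -d '{
  "ksql": "CREATE STREAM EXPLODED AS SELECT ID, ARR[0] AS ITEM FROM SRC; INSERT INTO EXPLODED SELECT ID, ARR[1] AS ITEM FROM SRC WHERE ARR[1] IS NOT NULL; INSERT INTO EXPLODED SELECT ID, ARR[2] AS ITEM FROM SRC WHERE ARR[2] IS NOT NULL;",
  "streamsProperties": {}
}'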
Kafka Connect is an API within Apache Kafka, and its modular nature makes it powerful and flexible. Converters are part of the API but not always fully understood. I’ve written previously about Kafka Connect converters, and this post is just a hands-on example to show even further what they are—and are not—about.
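For example, here’s how you’d tell a sink connector that the data in the topic was serialised as Avro, overriding whatever converter the worker defaults to. The connector name, class, and Schema Registry address are placeholders, and other required connector settings are omitted:

curl -s -X PUT http://localhost:8083/connectors/sink_example_01/config \
     -H "Content-Type: application/json" \
     -d '{
           "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
           "topics": "my_topic",
           "value.converter": "io.confluent.connect.avro.AvroConverter",
           "value.converter.schema.registry.url": "http://schema-registry:8081"
         }'

If the data were JSON instead, you’d use org.apache.kafka.connect.json.JsonConverter and set value.converter.schemas.enable to match whether the JSON carries an embedded schema.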
Note: To understand more about Kafka Connect in general, check out my talk from Kafka Summit London, From Zero to Hero with Kafka Connect.