
Table of Contents
Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Part I. Kafka
1. A Rapid Introduction to Kafka. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Communication Model 2
How Are Streams Stored? 6
Topics and Partitions 9
Events 11
Kafka Cluster and Brokers 12
Consumer Groups 13
Installing Kafka 15
Hello, Kafka 16
Summary 19
The Kafka Ecosystem 23
Before Kafka Streams 24
Enter Kafka Streams 25
Features at a Glance 27
Part II. Kafka Streams
2. Getting Started with Kafka Streams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Operational Characteristics 28
Scalability 28
Reliability 29
Maintainability 30
Comparison to Other Systems 30
Deployment Model 30
Processing Model 31
Kappa Architecture 32
Use Cases 33
Processor Topologies 35
Sub-Topologies 37
Depth-First Processing 39
Benefits of Dataflow Programming 41
Tasks and Stream Threads 41
High-Level DSL Versus Low-Level Processor API 44
Introducing Our Tutorial: Hello, Streams 45
Project Setup 46
Creating a New Project 46
Adding the Kafka Streams Dependency 47
DSL 48
Processor API 51
Streams and Tables 53
Stream/Table Duality 57
KStream, KTable, GlobalKTable 57
Summary 58
3. Stateless Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Stateless Versus Stateful Processing 62
Introducing Our Tutorial: Processing a Twitter Stream 63
Project Setup 65
Adding a KStream Source Processor 65
Serialization/Deserialization 69
Building a Custom Serdes 70
Defining Data Classes 71
Implementing a Custom Deserializer 72
Implementing a Custom Serializer 73
Building the Tweet Serdes 74
Filtering Data 75
Branching Data 77
Translating Tweets 79
Merging Streams 81
Enriching Tweets 82
Avro Data Class 83
Sentiment Analysis 85
Serializing Avro Data 87
Registryless Avro Serdes 88
Schema Registry–Aware Avro Serdes 88
Adding a Sink Processor 90
Running the Code 91
Empirical Verification 91
Summary 94
4. Stateful Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Benefits of Stateful Processing 96
Preview of Stateful Operators 97
State Stores 98
Common Characteristics 99
Persistent Versus In-Memory Stores 101
Introducing Our Tutorial: Video Game Leaderboard 102
Project Setup 104
Data Models 104
Adding the Source Processors 106
KStream 106
KTable 107
GlobalKTable 109
Registering Streams and Tables 110
Joins 111
Join Operators 112
Join Types 113
Co-Partitioning 114
Value Joiners 117
KStream to KTable Join (players Join) 119
KStream to GlobalKTable Join (products Join) 120
Grouping Records 121
Grouping Streams 121
Grouping Tables 122
Aggregations 123
Aggregating Streams 123
Aggregating Tables 126
Putting It All Together 127
Interactive Queries 129
Materialized Stores 129
Accessing Read-Only State Stores 131
Table of Contents | vii
Querying Nonwindowed Key-Value Stores 131
Local Queries 134
Remote Queries 134
Summary 142
5. Windows and Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Introducing Our Tutorial: Patient Monitoring Application 144
Project Setup 146
Data Models 147
Time Semantics 147
Timestamp Extractors 150
Included Timestamp Extractors 150
Custom Timestamp Extractors 152
Registering Streams with a Timestamp Extractor 153
Windowing Streams 154
Window Types 154
Selecting a Window 158
Windowed Aggregation 159
Emitting Window Results 161
Grace Period 163
Suppression 163
Filtering and Rekeying Windowed KTables 166
Windowed Joins 167
Time-Driven Dataflow 168
Alerts Sink 170
Querying Windowed Key-Value Stores 170
Summary 173
6. Advanced State Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Persistent Store Disk Layout 176
Fault Tolerance 177
Changelog Topics 178
Standby Replicas 180
Rebalancing: Enemy of the State (Store) 180
Preventing State Migration 181
Sticky Assignment 182
Static Membership 185
Reducing the Impact of Rebalances 186
Incremental Cooperative Rebalancing 187
Controlling State Size 189
Deduplicating Writes with Record Caches 195
viii | Table of Contents
State Store Monitoring 196
Adding State Listeners 196
Adding State Restore Listeners 198
Built-in Metrics 199
Interactive Queries 200
Custom State Stores 201
Summary 202
7. Processor API. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
When to Use the Processor API 204
Introducing Our Tutorial: IoT Digital Twin Service 205
Project Setup 208
Data Models 209
Adding Source Processors 211
Adding Stateless Stream Processors 213
Creating Stateless Processors 214
Creating Stateful Processors 217
Periodic Functions with Punctuate 221
Accessing Record Metadata 223
Adding Sink Processors 225
Interactive Queries 225
Putting It All Together 226
Combining the Processor API with the DSL 230
Processors and Transformers 231
Putting It All Together: Refactor 235
Summary 236
Part III. ksqlDB
8. Getting Started with ksqlDB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
What Is ksqlDB? 240
When to Use ksqlDB 241
Evolution of a New Kind of Database 243
Kafka Streams Integration 243
Connect Integration 246
How Does ksqlDB Compare to a Traditional SQL Database? 247
Similarities 248
Differences 249
Architecture 251
ksqlDB Server 251
Table of Contents | ix
ksqlDB Clients 253
Deployment Modes 255
Interactive Mode 255
Headless Mode 256
Tutorial 257
Installing ksqlDB 257
Running a ksqlDB Server 258
Precreating Topics 259
Using the ksqlDB CLI 259
Summary 262
9. Data Integration with ksqlDB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Kafka Connect Overview 264
External Versus Embedded Connect 265
External Mode 266
Embedded Mode 267
Configuring Connect Workers 268
Converters and Serialization Formats 270
Tutorial 272
Installing Connectors 272
Creating Connectors with ksqlDB 273
Showing Connectors 275
Describing Connectors 276
Dropping Connectors 277
Verifying the Source Connector 277
Interacting with the Kafka Connect Cluster Directly 278
Introspecting Managed Schemas 279
Summary 279
10. Stream Processing Basics with ksqlDB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Tutorial: Monitoring Changes at Netflix 281
Project Setup 284
Source Topics 284
Data Types 285
Custom Types 287
Collections 288
Creating Source Collections 289
With Clause 291
Working with Streams and Tables 292
Showing Streams and Tables 292
Describing Streams and Tables 294
x | Table of Contents
Altering Streams and Tables 295
Dropping Streams and Tables 295
Basic Queries 296
Insert Values 296
Simple Selects (Transient Push Queries) 298
Projection 299
Filtering 300
Flattening/Unnesting Complex Structures 302
Conditional Expressions 302
Coalesce 303
IFNULL 303
Case Statements 303
Writing Results Back to Kafka (Persistent Queries) 304
Creating Derived Collections 304
Putting It All Together 308
Summary 309
11. Intermediate and Advanced Stream Processing with ksqlDB. . . . . . . . . . . . . . . . . . . . 311
Project Setup 312
Bootstrapping an Environment from a SQL File 312
Data Enrichment 314
Joins 314
Windowed Joins 319
Aggregations 322
Aggregation Basics 323
Windowed Aggregations 325
Materialized Views 331
Clients 332
Pull Queries 333
Curl 335
Push Queries 336
Push Queries via Curl 336
Functions and Operators 337
Operators 337
Showing Functions 338
Describing Functions 339
Creating Custom Functions 340
Additional Resources for Custom ksqlDB Functions 345
Summary 346
Part IV. The Road to Production
12. Testing, Monitoring, and Deployment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Testing 350
Testing ksqlDB Queries 350
Testing Kafka Streams 352
Behavioral Tests 359
Benchmarking 362
Kafka Cluster Benchmarking 364
Final Thoughts on Testing 366monitopring
Monitoring 366
Monitoring Checklist 367
Extracting JMX Metrics 367
Deployment 370
ksqlDB Containers 370
Kafka Streams Containers 372
Container Orchestration 373
Operations 374
Resetting a Kafka Streams Application 374
Rate-Limiting the Output of Your Application 376
Upgrading Kafka Streams 377
Upgrading ksqlDB 378
Summary 379
A. Kafka Streams Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
B. ksqlDB Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Download
Comments
Post a Comment