GitHub - norrishuang/aws-sample-norris

Flink CDC & Import Data to Iceberg/Hudi With KDA

[TOC]

Code overview:

The code in this project is designed to run on Amazon Kinesis Data Analytics (KDA).

1.FlinkCDCPostgres

Uses Flink CDC to capture PostgreSQL data in real time and write it to Kafka.
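To illustrate the PostgreSQL to Kafka flow described above, the sketch below renders Flink SQL statements for a `postgres-cdc` source and a `kafka` sink. It assumes the SQL route, with each string passed to Flink's `TableEnvironment.executeSql`; the project's own FlinkCDCPostgres code may use the DataStream API instead. The class name, host, credentials, slot name, tables and columns are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that renders Flink SQL for a PostgreSQL CDC -> Kafka pipeline.
// Table and column names are placeholders.
final class PostgresCdcToKafkaSql {

    // Render an ordered map of connector options as a Flink SQL WITH (...) clause.
    static String withClause(Map<String, String> options) {
        return options.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n", "WITH (\n", "\n)"));
    }

    // Source table backed by the Flink CDC postgres-cdc connector.
    static String sourceDdl(String host, String database, String schema, String table) {
        Map<String, String> o = new LinkedHashMap<>();
        o.put("connector", "postgres-cdc");
        o.put("hostname", host);
        o.put("port", "5432");
        o.put("username", "postgres");          // placeholder credentials
        o.put("password", "change-me");
        o.put("database-name", database);
        o.put("schema-name", schema);
        o.put("table-name", table);
        o.put("slot.name", "flink_cdc_slot");   // hypothetical replication slot name
        o.put("decoding.plugin.name", "pgoutput");
        return "CREATE TABLE pg_orders (\n"
                + "  id INT,\n"
                + "  amount DECIMAL(10, 2),\n"
                + "  PRIMARY KEY (id) NOT ENFORCED\n"
                + ") " + withClause(o);
    }

    // Kafka sink that encodes inserts, updates and deletes as Debezium JSON.
    // No PRIMARY KEY: the plain kafka connector does not accept one with changelog formats.
    static String sinkDdl(String brokers, String topic) {
        Map<String, String> o = new LinkedHashMap<>();
        o.put("connector", "kafka");
        o.put("topic", topic);
        o.put("properties.bootstrap.servers", brokers);
        o.put("format", "debezium-json");
        return "CREATE TABLE kafka_orders (\n"
                + "  id INT,\n"
                + "  amount DECIMAL(10, 2)\n"
                + ") " + withClause(o);
    }

    // Continuous job that copies the changelog from PostgreSQL into Kafka.
    static String insertSql() {
        return "INSERT INTO kafka_orders SELECT id, amount FROM pg_orders";
    }
}
```

Debezium JSON is chosen for the sink so that updates and deletes survive the hop through Kafka; a plain `json` format would only carry append-only rows.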

2.IcebergApplication

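Uses Flink SQL to ingest MySQL data into Iceberg in real time. Because of an integration problem between KDA and Iceberg, you may hit exceptions such as java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration; see #3044. This project provides a workaround for the problem.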
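As a rough picture of the MySQL to Iceberg ingestion described above, the sketch below lists Flink SQL statements in execution order. It assumes an AWS Glue catalog with S3FileIO; the repository does not say which Iceberg catalog it uses. The class name, catalog, database, table, column and credential values are hypothetical, and the statements would be run one by one through `TableEnvironment.executeSql`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical Flink SQL for MySQL CDC -> Iceberg on S3, assuming an AWS Glue catalog.
// All names are placeholders.
final class MySqlToIcebergSql {

    // Iceberg catalog backed by AWS Glue, with table data written to S3.
    static String catalogDdl(String warehouse) {
        return "CREATE CATALOG glue_catalog WITH (\n"
                + "  'type' = 'iceberg',\n"
                + "  'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',\n"
                + "  'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',\n"
                + "  'warehouse' = '" + warehouse + "'\n"
                + ")";
    }

    // Source table backed by the Flink CDC mysql-cdc connector.
    static String mysqlSourceDdl(String host, String database, String table) {
        return "CREATE TABLE mysql_orders (\n"
                + "  id INT,\n"
                + "  amount DECIMAL(10, 2),\n"
                + "  PRIMARY KEY (id) NOT ENFORCED\n"
                + ") WITH (\n"
                + "  'connector' = 'mysql-cdc',\n"
                + "  'hostname' = '" + host + "',\n"
                + "  'port' = '3306',\n"
                + "  'username' = 'flink',\n"        // placeholder credentials
                + "  'password' = 'change-me',\n"
                + "  'database-name' = '" + database + "',\n"
                + "  'table-name' = '" + table + "'\n"
                + ")";
    }

    // Format v2 plus upsert mode lets Iceberg apply updates and deletes keyed on id.
    static String icebergTableDdl() {
        return "CREATE TABLE IF NOT EXISTS glue_catalog.cdc_db.orders (\n"
                + "  id INT,\n"
                + "  amount DECIMAL(10, 2),\n"
                + "  PRIMARY KEY (id) NOT ENFORCED\n"
                + ") WITH (\n"
                + "  'format-version' = '2',\n"
                + "  'write.upsert.enabled' = 'true'\n"
                + ")";
    }

    // Statements in the order they would be executed.
    static List<String> pipeline(String warehouse, String host) {
        return Arrays.asList(
                catalogDdl(warehouse),
                "CREATE DATABASE IF NOT EXISTS glue_catalog.cdc_db",
                icebergTableDdl(),
                mysqlSourceDdl(host, "shop", "orders"),
                "INSERT INTO glue_catalog.cdc_db.orders SELECT id, amount FROM mysql_orders");
    }
}
```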

Solution

Relocate the conflicting classes through relocation rules; see the pom.xml file.
Rewrite the HadoopUtils class.
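When checking whether the relocation took effect, it can help to see which jar, if any, supplies a class at runtime. The diagnostic sketch below is not part of the project's workaround; the class name `ClassOrigin` is hypothetical. It loads a class by name without initializing it and reports its code-source location, or "bootstrap/JDK" for classes that come from the JDK itself.

```java
import java.security.CodeSource;

// Diagnostic sketch: reports where a class is loaded from, to spot missing
// or duplicated Hadoop classes on the application classpath.
final class ClassOrigin {

    static String describe(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/JDK"; // JDK classes have no code source
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.hadoop.conf.Configuration"};
        for (String n : names) {
            System.out.println(n + " -> " + describe(n));
        }
    }
}
```

Logging `describe("org.apache.hadoop.conf.Configuration")` at job start-up shows whether the class resolves, and from which jar, in the environment where the NoClassDefFoundError appears.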

References

3.HuidApplication

Consumes data from Kafka and writes it to S3 in Hudi format. For now, MSK is used in unauthenticated mode.
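The Kafka to Hudi flow described above can be sketched in Flink SQL as follows. This is an illustration under assumptions, not the project's code: it assumes the topic carries flat JSON records, an unauthenticated (plaintext) MSK listener, a copy-on-write table, and Hive sync through the metastore. The class name, consumer group, database, table and column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical Flink SQL for Kafka (MSK, no auth) -> Hudi on S3 with Hive metastore sync.
final class HudiSinkSql {

    // Render ordered connector options as a Flink SQL WITH (...) clause.
    private static String withClause(Map<String, String> options) {
        StringJoiner with = new StringJoiner(",\n", "WITH (\n", "\n)");
        options.forEach((k, v) -> with.add("  '" + k + "' = '" + v + "'"));
        return with.toString();
    }

    // Plaintext Kafka source; assumes flat JSON records on the topic.
    static String kafkaSourceDdl(String brokers, String topic) {
        Map<String, String> o = new LinkedHashMap<>();
        o.put("connector", "kafka");
        o.put("topic", topic);
        o.put("properties.bootstrap.servers", brokers);
        o.put("properties.group.id", "hudi-writer");   // hypothetical consumer group
        o.put("scan.startup.mode", "earliest-offset");
        o.put("format", "json");
        return "CREATE TABLE kafka_orders (\n  id INT,\n  amount DECIMAL(10, 2),\n  ts TIMESTAMP(3)\n) "
                + withClause(o);
    }

    // Hudi sink; the "ts" column matches Hudi's default precombine field name.
    static String hudiSinkDdl(String s3Path, String metastoreUri) {
        if (!s3Path.startsWith("s3a://")) {
            throw new IllegalArgumentException("s3Path must use the s3a:// scheme: " + s3Path);
        }
        Map<String, String> o = new LinkedHashMap<>();
        o.put("connector", "hudi");
        o.put("path", s3Path);
        o.put("table.type", "COPY_ON_WRITE");
        o.put("hive_sync.enable", "true");
        o.put("hive_sync.mode", "hms");
        o.put("hive_sync.metastore.uris", metastoreUri);
        o.put("hive_sync.db", "default");
        o.put("hive_sync.table", "hudi_orders");
        return "CREATE TABLE hudi_orders (\n  id INT PRIMARY KEY NOT ENFORCED,\n"
                + "  amount DECIMAL(10, 2),\n  ts TIMESTAMP(3)\n) " + withClause(o);
    }

    static String insertSql() {
        return "INSERT INTO hudi_orders SELECT id, amount, ts FROM kafka_orders";
    }
}
```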

Running:

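After building, upload the jar file from the target directory to an S3 location.
Create a Kinesis Data Analytics application and select Flink version 1.5.
For the VPC, select the VPC in which the MSK cluster runs.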

Configure the Runtime properties as follows:

Group	Key	Value
FlinkApplicationProperties	brokers	MSK bootstrap servers
FlinkApplicationProperties	kafka-topic	Name of the topic to consume
FlinkApplicationProperties	s3Path	S3 directory that Hudi writes to (use the s3a scheme, e.g. s3a://[your bucket name]/data)
FlinkApplicationProperties	hivemetastore	Thrift server used to sync Hive metadata
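The table above defines a single property group. A minimal sketch of reading and validating it is shown below, assuming the application fails fast when a key is missing. In KDA the group map would come from `KinesisAnalyticsRuntime.getApplicationProperties()`, which returns `Map<String, Properties>`. Here the map is passed in so the sketch runs without the KDA runtime library. The `AppConfig` class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical typed view of the "FlinkApplicationProperties" group.
// In KDA, pass KinesisAnalyticsRuntime.getApplicationProperties() to from(...).
final class AppConfig {
    static final String GROUP = "FlinkApplicationProperties";
    private static final String[] REQUIRED = {"brokers", "kafka-topic", "s3Path", "hivemetastore"};

    final String brokers;
    final String topic;
    final String s3Path;
    final String hiveMetastore;

    private AppConfig(Properties p) {
        this.brokers = p.getProperty("brokers").trim();
        this.topic = p.getProperty("kafka-topic").trim();
        this.s3Path = p.getProperty("s3Path").trim();
        this.hiveMetastore = p.getProperty("hivemetastore").trim();
    }

    // Validate the group and fail fast with a message listing every missing key.
    static AppConfig from(Map<String, Properties> groups) {
        Properties p = groups.get(GROUP);
        if (p == null) {
            throw new IllegalArgumentException("missing property group " + GROUP);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String v = p.getProperty(key);
            if (v == null || v.trim().isEmpty()) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("missing runtime properties: " + missing);
        }
        // The README asks for the s3a scheme for the Hudi path.
        if (!p.getProperty("s3Path").trim().startsWith("s3a://")) {
            throw new IllegalArgumentException("s3Path must use the s3a:// scheme");
        }
        return new AppConfig(p);
    }
}
```

Validating up front turns a mistyped key in the console into a clear start-up error instead of a failure deep inside the Kafka or Hudi connector.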

After saving the configuration, click [Run].
