A Multi-Stream Join Solution Based on a Data Lake: Hudi Code Demo

Contents

I. Background
II. Code Demo
  (1) The multi-writer problem
  (2) What if two streams have to write the same table?
  (3) Test results
III. Postscript

I. Background

The goal is to join two real-time streams on a data lake, for example a client-side tracking stream with a server-side tracking stream, or a log stream with an order stream. For the underlying concepts, see the previous article: 基于数据湖的多流拼接方案-HUDI概念篇_Leonardo_KY的博客-CSDN博客.

II. Code Demo

All of the demos below use the datagen connector to generate mock data for testing; for production, simply switch the source to Kafka or another connector.

First job: stream A writes to the Hudi table.

```java
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(minPauseBetweenCP);      // 1 s
env.getCheckpointConfig().setCheckpointTimeout(checkpointTimeout);               // 2 min
env.getCheckpointConfig().setMaxConcurrentCheckpoints(maxConcurrentCheckpoints);
// env.getCheckpointConfig().setCheckpointStorage("file:///D:/Users/yakang.lu/tmp/checkpoints/");
TableEnvironment tableEnv = StreamTableEnvironment.create(env);

// datagen source
tableEnv.executeSql("CREATE TABLE sourceA (\n"
    + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
    + "  name VARCHAR(3),\n"
    + "  _ts1 TIMESTAMP(3)\n"
    + ") WITH (\n"
    + "  'connector' = 'datagen',\n"
    + "  'fields.uuid.kind' = 'sequence',\n"
    + "  'fields.uuid.start' = '0',\n"
    + "  'fields.uuid.end' = '1000000',\n"
    + "  'rows-per-second' = '1'\n"
    + ")");

// hudi sink
tableEnv.executeSql("create table hudi_tableA(\n"
    + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
    + "  age int,\n"
    + "  name VARCHAR(3),\n"
    + "  _ts1 TIMESTAMP(3),\n"
    + "  _ts2 TIMESTAMP(3),\n"
    + "  d VARCHAR(10)\n"
    + ")\n"
    + "PARTITIONED BY (d)\n"
    + "with (\n"
    + "  'connector' = 'hudi',\n"
    + "  'path' = 'hdfs://ns/user/hive/warehouse/ctripdi_prodb.db/hudi_mor_mutil_source_test',\n" // hdfs path
    + "  'table.type' = 'MERGE_ON_READ',\n"
    + "  'write.bucket_assign.tasks' = '10',\n"
    + "  'write.tasks' = '10',\n"
    + "  'write.partition.format' = 'yyyyMMddHH',\n"
    + "  'write.partition.timestamp.type' = 'EPOCHMILLISECONDS',\n"
    + "  'changelog.enabled' = 'true',\n"
    + "  'index.type' = 'BUCKET',\n"
    + "  'hoodie.bucket.index.num.buckets' = '2',\n"
    + String.format("  '%s' = '%s',\n", FlinkOptions.PRECOMBINE_FIELD.key(), "_ts1")
    + "  'write.payload.class' = '" + PartialUpdateAvroPayload.class.getName() + "',\n"
    + "  'hoodie.write.log.suffix' = 'job1',\n"
    + "  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',\n"
    + "  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',\n"
    + "  'hoodie.cleaner.policy.failed.writes' = 'LAZY',\n"
    + "  'hoodie.cleaner.policy' = 'KEEP_LATEST_BY_HOURS',\n"
    + "  'hoodie.consistency.check.enabled' = 'false',\n"
    // + "  'hoodie.write.lock.early.conflict.detection.enable' = 'true',\n" // todo
    // + "  'hoodie.write.lock.early.conflict.detection.strategy' = '"
    //     + SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy.class.getName() + "',\n"
    // + "  'hoodie.keep.min.commits' = '1440',\n"
    + "  'hoodie.keep.max.commits' = '2880',\n"
    + "  'compaction.schedule.enabled' = 'false',\n"
    + "  'compaction.async.enabled' = 'false',\n"
    + "  'compaction.trigger.strategy' = 'num_or_time',\n"
    + "  'compaction.delta_commits' = '3',\n"
    + "  'compaction.delta_seconds' = '60',\n"
    + "  'compaction.max_memory' = '3096',\n"
    + "  'clean.async.enabled' = 'false',\n"
    + "  'hive_sync.enable' = 'false'\n"
    // + "  'hive_sync.mode' = 'hms',\n"
    // + "  'hive_sync.db' = '%s',\n"
    // + "  'hive_sync.table' = '%s',\n"
    // + "  'hive_sync.metastore.uris' = '%s'\n"
    + ")");

// insert
StatementSet statementSet = tableEnv.createStatementSet();
String sqlString = "insert into hudi_tableA(uuid, name, _ts1, d) select * from "
    + "(select *, date_format(CURRENT_TIMESTAMP, 'yyyyMMdd') AS d from sourceA) view1";
statementSet.addInsertSql(sqlString);
statementSet.execute();
```

Second job: stream B writes to the same Hudi table (same path, but log suffix 'job2' and async compaction enabled).

```java
StreamExecutionEnvironment env = manager.getEnv();
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(minPauseBetweenCP);      // 1 s
env.getCheckpointConfig().setCheckpointTimeout(checkpointTimeout);               // 2 min
env.getCheckpointConfig().setMaxConcurrentCheckpoints(maxConcurrentCheckpoints);
// env.getCheckpointConfig().setCheckpointStorage("file:///D:/Users/yakang.lu/tmp/checkpoints/");
TableEnvironment tableEnv = StreamTableEnvironment.create(env);

// datagen source
tableEnv.executeSql("CREATE TABLE sourceB (\n"
    + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
    + "  age int,\n"
    + "  _ts2 TIMESTAMP(3)\n"
    + ") WITH (\n"
    + "  'connector' = 'datagen',\n"
    + "  'fields.uuid.kind' = 'sequence',\n"
    + "  'fields.uuid.start' = '0',\n"
    + "  'fields.uuid.end' = '1000000',\n"
    + "  'rows-per-second' = '1'\n"
    + ")");

// hudi sink
tableEnv.executeSql("create table hudi_tableB(\n"
    + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
    + "  age int,\n"
    + "  name VARCHAR(3),\n"
    + "  _ts1 TIMESTAMP(3),\n"
    + "  _ts2 TIMESTAMP(3),\n"
    + "  d VARCHAR(10)\n"
    + ")\n"
    + "PARTITIONED BY (d)\n"
    + "with (\n"
    + "  'connector' = 'hudi',\n"
    + "  'path' = 'hdfs://ns/user/hive/warehouse/ctripdi_prodb.db/hudi_mor_mutil_source_test',\n" // hdfs path
    + "  'table.type' = 'MERGE_ON_READ',\n"
    + "  'write.bucket_assign.tasks' = '10',\n"
    + "  'write.tasks' = '10',\n"
    + "  'write.partition.format' = 'yyyyMMddHH',\n"
    + "  'changelog.enabled' = 'true',\n"
    + "  'index.type' = 'BUCKET',\n"
    + "  'hoodie.bucket.index.num.buckets' = '2',\n"
    + String.format("  '%s' = '%s',\n", FlinkOptions.PRECOMBINE_FIELD.key(), "_ts1")
    + "  'write.payload.class' = '" + PartialUpdateAvroPayload.class.getName() + "',\n"
    + "  'hoodie.write.log.suffix' = 'job2',\n"
    + "  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',\n"
    + "  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',\n"
    + "  'hoodie.cleaner.policy.failed.writes' = 'LAZY',\n"
    + "  'hoodie.cleaner.policy' = 'KEEP_LATEST_BY_HOURS',\n"
    + "  'hoodie.consistency.check.enabled' = 'false',\n"
    // + "  'hoodie.write.lock.early.conflict.detection.enable' = 'true',\n" // todo
    // + "  'hoodie.write.lock.early.conflict.detection.strategy' = '"
    //     + SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy.class.getName() + "',\n"
    + "  'hoodie.keep.min.commits' = '1440',\n"
    + "  'hoodie.keep.max.commits' = '2880',\n"
    + "  'compaction.schedule.enabled' = 'true',\n"
    + "  'compaction.async.enabled' = 'true',\n"
    + "  'compaction.trigger.strategy' = 'num_or_time',\n"
    + "  'compaction.delta_commits' = '2',\n"
    + "  'compaction.delta_seconds' = '90',\n"
    + "  'compaction.max_memory' = '3096',\n"
    + "  'clean.async.enabled' = 'false'\n"
    // + "  'hive_sync.mode' = 'hms',\n"
    // + "  'hive_sync.db' = '%s',\n"
    // + "  'hive_sync.table' = '%s',\n"
    // + "  'hive_sync.metastore.uris' = '%s'\n"
    + ")");

// insert
StatementSet statementSet = tableEnv.createStatementSet();
String sqlString = "insert into hudi_tableB(uuid, age, _ts1, _ts2, d) select * from "
    + "(select *, _ts2 as ts1, date_format(CURRENT_TIMESTAMP, 'yyyyMMdd') AS d from sourceB) view2";
// statementSet.addInsertSql("insert into hudi_tableB(uuid, age, _ts2) select * from sourceB");
statementSet.addInsertSql(sqlString);
statementSet.execute();
```

Alternatively, both writers can run in a single application using a StatementSet:

```java
import java.time.ZoneOffset;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.TableConfig;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.api.config.TableConfigOptions;
import org.apache.hudi.common.model.PartialUpdateAvroPayload;
import org.apache.hudi.configuration.FlinkOptions;
// import org.apache.hudi.table.marker.SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy;

public class Test00 {
    public static void main(String[] args) {
        Configuration configuration = TableConfig.getDefault().getConfiguration();
        // run in UTC+8
        configuration.setString(TableConfigOptions.LOCAL_TIME_ZONE, ZoneOffset.ofHours(8).toString());
        // configuration.setInteger("rest.port", 8086);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(configuration);
        env.setParallelism(1);
        env.enableCheckpointing(12000L);
        // env.getCheckpointConfig().setCheckpointStorage("file:///Users/laifei/tmp/checkpoints/");
        TableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // datagen sources
        tableEnv.executeSql("CREATE TABLE sourceA (\n"
            + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
            + "  name VARCHAR(3),\n"
            + "  _ts1 TIMESTAMP(3)\n"
            + ") WITH (\n"
            + "  'connector' = 'datagen',\n"
            + "  'fields.uuid.kind' = 'sequence',\n"
            + "  'fields.uuid.start' = '0',\n"
            + "  'fields.uuid.end' = '1000000',\n"
            + "  'rows-per-second' = '1'\n"
            + ")");
        tableEnv.executeSql("CREATE TABLE sourceB (\n"
            + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
            + "  age int,\n"
            + "  _ts2 TIMESTAMP(3)\n"
            + ") WITH (\n"
            + "  'connector' = 'datagen',\n"
            + "  'fields.uuid.kind' = 'sequence',\n"
            + "  'fields.uuid.start' = '0',\n"
            + "  'fields.uuid.end' = '1000000',\n"
            + "  'rows-per-second' = '1'\n"
            + ")");

        // hudi sinks
        tableEnv.executeSql("create table hudi_tableA(\n"
            + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
            + "  name VARCHAR(3),\n"
            + "  age int,\n"
            + "  _ts1 TIMESTAMP(3),\n"
            + "  _ts2 TIMESTAMP(3)\n"
            + ")\n"
            + "PARTITIONED BY (_ts1)\n"
            + "with (\n"
            + "  'connector' = 'hudi',\n"
            + "  'path' = 'file:///D:/Ctrip/dataWork/tmp',\n"
            + "  'table.type' = 'MERGE_ON_READ',\n"
            + "  'write.bucket_assign.tasks' = '2',\n"
            + "  'write.tasks' = '2',\n"
            + "  'write.partition.format' = 'yyyyMMddHH',\n"
            + "  'write.partition.timestamp.type' = 'EPOCHMILLISECONDS',\n"
            + "  'changelog.enabled' = 'true',\n"
            + "  'index.type' = 'BUCKET',\n"
            + "  'hoodie.bucket.index.num.buckets' = '2',\n"
            // + String.format("  '%s' = '%s',\n", FlinkOptions.PRECOMBINE_FIELD.key(), "_ts1:name|_ts2:age")
            // + "  'write.payload.class' = '" + PartialUpdateAvroPayload.class.getName() + "',\n"
            + "  'hoodie.write.log.suffix' = 'job1',\n"
            + "  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',\n"
            + "  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',\n"
            + "  'hoodie.cleaner.policy.failed.writes' = 'LAZY',\n"
            + "  'hoodie.cleaner.policy' = 'KEEP_LATEST_BY_HOURS',\n"
            + "  'hoodie.consistency.check.enabled' = 'false',\n"
            + "  'hoodie.write.lock.early.conflict.detection.enable' = 'true',\n"
            // + "  'hoodie.write.lock.early.conflict.detection.strategy' = '"
            //     + SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy.class.getName() + "',\n"
            // + "  'hoodie.keep.min.commits' = '1440',\n"
            + "  'hoodie.keep.max.commits' = '2880',\n"
            + "  'compaction.schedule.enabled' = 'false',\n"
            + "  'compaction.async.enabled' = 'false',\n"
            + "  'compaction.trigger.strategy' = 'num_or_time',\n"
            + "  'compaction.delta_commits' = '3',\n"
            + "  'compaction.delta_seconds' = '60',\n"
            + "  'compaction.max_memory' = '3096',\n"
            + "  'clean.async.enabled' = 'false',\n"
            + "  'hive_sync.enable' = 'false'\n"
            + ")");
        tableEnv.executeSql("create table hudi_tableB(\n"
            + "  uuid bigint PRIMARY KEY NOT ENFORCED,\n"
            + "  name VARCHAR(3),\n"
            + "  age int,\n"
            + "  _ts1 TIMESTAMP(3),\n"
            + "  _ts2 TIMESTAMP(3)\n"
            + ")\n"
            + "PARTITIONED BY (_ts2)\n"
            + "with (\n"
            + "  'connector' = 'hudi',\n"
            + "  'path' = '/Users/laifei/tmp/hudi/local.db/mutiwrite1',\n"
            + "  'table.type' = 'MERGE_ON_READ',\n"
            + "  'write.bucket_assign.tasks' = '2',\n"
            + "  'write.tasks' = '2',\n"
            + "  'write.partition.format' = 'yyyyMMddHH',\n"
            + "  'changelog.enabled' = 'true',\n"
            + "  'index.type' = 'BUCKET',\n"
            + "  'hoodie.bucket.index.num.buckets' = '2',\n"
            // + String.format("  '%s' = '%s',\n", FlinkOptions.PRECOMBINE_FIELD.key(), "_ts1:name|_ts2:age")
            // + "  'write.payload.class' = '" + PartialUpdateAvroPayload.class.getName() + "',\n"
            + "  'hoodie.write.log.suffix' = 'job2',\n"
            + "  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',\n"
            + "  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',\n"
            + "  'hoodie.cleaner.policy.failed.writes' = 'LAZY',\n"
            + "  'hoodie.cleaner.policy' = 'KEEP_LATEST_BY_HOURS',\n"
            + "  'hoodie.consistency.check.enabled' = 'false',\n"
            + "  'hoodie.write.lock.early.conflict.detection.enable' = 'true',\n"
            // + "  'hoodie.write.lock.early.conflict.detection.strategy' = '"
            //     + SimpleTransactionDirectMarkerBasedEarlyConflictDetectionStrategy.class.getName() + "',\n"
            + "  'hoodie.keep.min.commits' = '1440',\n"
            + "  'hoodie.keep.max.commits' = '2880',\n"
            + "  'compaction.schedule.enabled' = 'true',\n"
            + "  'compaction.async.enabled' = 'true',\n"
            + "  'compaction.trigger.strategy' = 'num_or_time',\n"
            + "  'compaction.delta_commits' = '2',\n"
            + "  'compaction.delta_seconds' = '90',\n"
            + "  'compaction.max_memory' = '3096',\n"
            + "  'clean.async.enabled' = 'false'\n"
            + ")");

        // inserts
        StatementSet statementSet = tableEnv.createStatementSet();
        statementSet.addInsertSql("insert into hudi_tableA(uuid, name, _ts1) select * from sourceA");
        statementSet.addInsertSql("insert into hudi_tableB(uuid, age, _ts2) select * from sourceB");
        statementSet.execute();
    }
}
```

(1) The multi-writer problem

The jar built from the official Hudi code does not support multiple writers to one table, so these tests use a build of Tencent's modified code. With the official jar, two writers committing to the same Hudi table fail with a write-conflict exception.

In addition, Hudi has a preCombine field, and a table can declare only one column as its preCombineField. With the official version, two streams writing the same Hudi table run into one of two problems:

1. One stream writes the preCombine field and the other does not: the latter fails with an "ordering value cannot be null" error.
2. Both streams write the field: a field-conflict exception is thrown.

(2) What if two streams have to write the same table?

Local testing shows:

- hudi0.12-multiWrite (Tencent's modified build) supports multiple preCombine fields. With this build, multi-writer works as long as the columns other than the primary key and the partition field do not overlap across the streams.
- Hudi 0.13 does not support this and exhibits the problems described above.

(3) Test results

Tencent's article: https://cloud.tencent.com/developer/article/2189914
GitHub link: GitHub - XuQianJin-Stars/hudi at multiwrite-master-7
Packaging Hudi is rather troublesome; if needed, I will upload the pre-built jar later.

III. Postscript

With the code above, there seems to be some data loss once traffic gets heavy: whenever one of the streams triggers a compaction, the other stream loses some data. Two things worth trying:

1. UNION the two streams into a single stream first, then sink that one stream into the Hudi table; this also avoids write conflicts entirely.
2. Use a different data-lake technology such as Apache Paimon; see 新一代数据湖存储技术Apache Paimon入门Demo_Leonardo_KY的博客-CSDN博客.
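The UNION workaround suggested above can be sketched in Flink SQL. This is an untested illustration that reuses the sourceA/sourceB/hudi_tableA names from the demo; the NULL padding is an assumption about how to align the two schemas:

```sql
-- One writer instead of two: pad each stream's missing columns with NULLs,
-- UNION ALL the streams, and let the partial-update payload merge rows that
-- share the same uuid. A single job means no multi-writer locks or conflicts.
INSERT INTO hudi_tableA (uuid, name, age, _ts1, _ts2, d)
SELECT uuid, name, CAST(NULL AS INT), _ts1, CAST(NULL AS TIMESTAMP(3)),
       date_format(CURRENT_TIMESTAMP, 'yyyyMMdd')
FROM sourceA
UNION ALL
SELECT uuid, CAST(NULL AS VARCHAR(3)), age, CAST(NULL AS TIMESTAMP(3)), _ts2,
       date_format(CURRENT_TIMESTAMP, 'yyyyMMdd')
FROM sourceB;
```

Whether NULLs from one branch may overwrite real values from the other depends on the configured payload class, so this should be verified against the chosen Hudi build before use.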
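To make the multi-preCombine-field idea concrete, here is a small self-contained sketch with no Hudi dependency. The class, the column-group layout, and the merge rule are invented for illustration only; they mimic the "each stream owns a disjoint set of columns, each guarded by its own ordering timestamp (like _ts1/_ts2), newest version wins per group" behavior that the multiWrite build relies on:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-column-group partial update: a record is a map from
// column-group name to {orderingTs, value}; merging keeps, per group,
// the version with the larger ordering timestamp.
public class PartialUpdateSketch {
    static Map<String, Object[]> merge(Map<String, Object[]> oldRec,
                                       Map<String, Object[]> newRec) {
        Map<String, Object[]> out = new HashMap<>(oldRec);
        for (Map.Entry<String, Object[]> e : newRec.entrySet()) {
            Object[] prev = out.get(e.getKey());
            long newTs = (Long) e.getValue()[0];
            // keep the newer version of each column group; groups untouched
            // by the incoming record are preserved from the old record
            if (prev == null || newTs >= (Long) prev[0]) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // stream A wrote (name, _ts1); stream B wrote (age, _ts2) for the same uuid
        Map<String, Object[]> fromA = new HashMap<>();
        fromA.put("name", new Object[]{100L, "bob"});
        Map<String, Object[]> fromB = new HashMap<>();
        fromB.put("age", new Object[]{200L, 42});

        Map<String, Object[]> merged = merge(fromA, fromB);
        System.out.println(merged.get("name")[1] + "," + merged.get("age")[1]);
    }
}
```

Because the two streams touch disjoint column groups, the merge never has to compare their ordering fields against each other, which is why non-overlapping columns are the stated precondition for the multi-writer setup.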
