Apache Kylin Advanced: Configuration

In day-to-day operation of Apache Kylin, configuration parameters are usually tuned based on the logs produced during normal runs. The goal is better performance and more stable operation. The official Kylin website does not document or explain these settings, so this post introduces Kylin's configuration. There are four configuration files under ${KYLIN_HOME}/conf:

kylin_hive_conf.xml
kylin_job_conf_inmem.xml
kylin_job_conf.xml
kylin.properties

kylin_hive_conf.xml is the configuration Kylin applies when submitting jobs to Hive. kylin_job_conf_inmem.xml and kylin_job_conf.xml are the configurations Kylin applies when submitting jobs to YARN. Adjust them to suit your own environment. The important settings in kylin.properties are described below.

kylin.server.mode=all
The run mode of the Kylin server: all, job, or query. For what each mode means, see https://zhuanlan.zhihu.com/p/22219602?refer=dataeye

kylin.rest.servers=hostname1:7070,hostname2:7070,hostname3:7070
The list of Kylin server instances. Note: do not include instances running in job mode!

kylin.metadata.url=kylin_metadata@hbase
Where Kylin stores its metadata. For details, see https://zhuanlan.zhihu.com/p/22223631?refer=dataeye

kylin.job.retry=0
The number of times a Kylin job is retried. Note: "job" here means the job generated by a cube build or refresh. It does not mean the MapReduce job of each individual step.

kylin.job.mapreduce.default.reduce.input.mb=500
The target input size, in MB, for each reducer when Kylin submits jobs to Hadoop. Kylin uses this value to decide how many reducers a MapReduce job gets. The relevant code is:

```java
public double getDefaultHadoopJobReducerInputMB() {
    return Double.parseDouble(getOptional("kylin.job.mapreduce.default.reduce.input.mb", "500"));
}

protected void setReduceTaskNum(Job job, KylinConfig config, String cubeName, int level)
        throws ClassNotFoundException, IOException, InterruptedException, JobException {
    Configuration jobConf = job.getConfiguration();
    KylinConfig kylinConfig = KylinConfig.getInstanceFromEnv();

    CubeDesc cubeDesc = CubeManager.getInstance(config).getCube(cubeName).getDescriptor();
    kylinConfig = cubeDesc.getConfig();

    double perReduceInputMB = kylinConfig.getDefaultHadoopJobReducerInputMB();
    double reduceCountRatio = kylinConfig.getDefaultHadoopJobReducerCountRatio();

    // total map input MB
    double totalMapInputMB = this.getTotalMapInputMB();

    // output / input ratio
    int preLevelCuboids, thisLevelCuboids;
    if (level == 0) { // base cuboid
        preLevelCuboids = thisLevelCuboids = 1;
    } else { // n-cuboid
        int[] allLevelCount = CuboidCLI.calculateAllLevelCount(cubeDesc);
        preLevelCuboids = allLevelCount[level - 1];
        thisLevelCuboids = allLevelCount[level];
    }

    // total reduce input MB
    double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;

    // number of reduce tasks
    int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);

    // adjust reducer number for cube which has DISTINCT_COUNT measures for better performance
    if (cubeDesc.hasMemoryHungryMeasures()) {
        numReduceTasks = numReduceTasks * 4;
    }

    // at least 1 reducer
    numReduceTasks = Math.max(1, numReduceTasks);
    // no more than 5000 reducer by default
    numReduceTasks = Math.min(kylinConfig.getHadoopJobMaxReducerNumber(), numReduceTasks);

    jobConf.setInt(MAPRED_REDUCE_TASKS, numReduceTasks);

    logger.info("Having total map input MB " + Math.round(totalMapInputMB));
    logger.info("Having level " + level + ", pre-level cuboids " + preLevelCuboids + ", this level cuboids " + thisLevelCuboids);
    logger.info("Having per reduce MB " + perReduceInputMB + ", reduce count ratio " + reduceCountRatio);
    logger.info("Setting " + MAPRED_REDUCE_TASKS + "=" + numReduceTasks);
}
```

Adjust this value according to three things: your data volume, your performance requirements, and the mapred-site.xml settings of your Hadoop cluster.

kylin.job.run.as.remote.cmd=false
Whether Kylin sends CLI commands to Hadoop, HBase, Hive, etc. over SSH. Kylin is usually deployed on a client node of the Hadoop cluster, so this value is normally false. If the Kylin server is not deployed on a Hadoop client node, set it to true. Kylin then accesses the Hadoop cluster remotely, and the following settings must be filled in:

```properties
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=
```

----------------------------------------

The maximum number of Kylin jobs that may run concurrently:
kylin.job.concurrent.max.limit=10

The interval, in seconds, at which Kylin checks the status of the MapReduce jobs it has submitted to YARN:
kylin.job.yarn.app.rest.check.interval.seconds=10

The code is as follows:

```java
while (!isDiscarded()) {
    JobStepStatusEnum newStatus = statusChecker.checkStatus();
    if (status == JobStepStatusEnum.KILLED) {
        executableManager.updateJobOutput(getId(), ExecutableState.ERROR, Collections.<String, String> emptyMap(), "killed by admin");
        return new ExecuteResult(ExecuteResult.State.FAILED, "killed by admin");
    }
    if (status == JobStepStatusEnum.WAITING && (newStatus == JobStepStatusEnum.FINISHED
            || newStatus == JobStepStatusEnum.ERROR || newStatus == JobStepStatusEnum.RUNNING)) {
        final long waitTime = System.currentTimeMillis() - getStartTime();
        setMapReduceWaitTime(waitTime);
    }
    status = newStatus;
    executableManager.addJobInfo(getId(), hadoopCmdOutput.getInfo());
    if (status.isComplete()) {
        final Map<String, String> info = hadoopCmdOutput.getInfo();
        readCounters(hadoopCmdOutput, info);
        executableManager.addJobInfo(getId(), info);

        if (status == JobStepStatusEnum.FINISHED) {
            return new ExecuteResult(ExecuteResult.State.SUCCEED, output.toString());
        } else {
            return new ExecuteResult(ExecuteResult.State.FAILED, output.toString());
        }
    }
    // wait kylin.job.yarn.app.rest.check.interval.seconds before polling YARN again
    Thread.sleep(context.getConfig().getYarnStatusCheckIntervalSeconds() * 1000);
}
```

The Hive database in which Kylin creates the intermediate table during the first step of a cube build:
kylin.job.hive.database.for.intermediatetable=default

The compression codec for data stored in the HBase tables that Kylin creates during a cube build:
kylin.hbase.default.compression.codec=snappy

Note: before setting this value, check whether the Hadoop installation that HBase uses supports the codec. Use this command:

```shell
hadoop checknative -a
```

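In the cluster we checked, snappy was not supported. The default value therefore had to be changed to a codec that the check reports as supported.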
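The check above can also be scripted. The following is a minimal sketch that assumes `hadoop checknative -a` prints one `name:  true|false [path]` line per native library, which is the layout recent Hadoop 2.x releases use. The class `NativeCodecCheck` and its method are hypothetical and are not part of Hadoop or Kylin. The sample text in `main` is made up for illustration. The code only parses captured output; it does not call Hadoop itself.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hadoop or Kylin): parses captured
// output of `hadoop checknative -a`, assuming "name:  true|false ..." lines.
public class NativeCodecCheck {

    /** Returns true if the output reports the given codec as available. */
    public static boolean isCodecSupported(String checknativeOutput, String codec) {
        String prefix = codec.toLowerCase(Locale.ROOT) + ":";
        for (String line : checknativeOutput.split("\\R")) {
            String trimmed = line.trim().toLowerCase(Locale.ROOT);
            if (trimmed.startsWith(prefix)) {
                // The first token after "name:" is assumed to be "true" or "false"
                String rest = trimmed.substring(prefix.length()).trim();
                return rest.startsWith("true");
            }
        }
        // Codec not listed at all: treat it as unsupported
        return false;
    }

    public static void main(String[] args) {
        // Illustrative sample only, not real output from the cluster discussed above
        String sample = String.join("\n",
                "Native library checking:",
                "hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0",
                "zlib:    true /lib64/libz.so.1",
                "snappy:  false",
                "lz4:     true revision:99");
        System.out.println("snappy supported: " + isCodecSupported(sample, "snappy"));
        System.out.println("lz4 supported: " + isCodecSupported(sample, "lz4"));
    }
}
```

With the sample text, `main` prints `snappy supported: false` and `lz4 supported: true`. For a real check, capture the command's standard output on a node that shares HBase's Hadoop installation and pass it in. `java.lang.ProcessBuilder` is one way to do this. If the result for snappy is false, pick a different value for kylin.hbase.default.compression.codec.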

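To see how kylin.job.mapreduce.default.reduce.input.mb affects the reducer count, the arithmetic inside Kylin's setReduceTaskNum (quoted earlier) can be pulled out into a standalone sketch. The class `ReducerCountSketch`, the method `estimateReducers`, and the sample numbers are hypothetical. The sketch makes two further assumptions. First, `reduceCountRatio` is set to 1.0. Second, the reducer cap is 5000, which matches the "no more than 5000 reducer by default" comment in the Kylin code.

```java
// Hypothetical sketch: reproduces the arithmetic of Kylin's setReduceTaskNum
// with all inputs passed in explicitly (this is not a Kylin API).
public class ReducerCountSketch {

    public static int estimateReducers(double totalMapInputMB, int preLevelCuboids, int thisLevelCuboids,
                                       double perReduceInputMB, double reduceCountRatio,
                                       boolean hasMemoryHungryMeasures, int maxReducers) {
        // Estimated reduce input scales with the cuboid fan-out of this level
        double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;
        int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);
        // Memory-hungry measures (e.g. DISTINCT_COUNT) get four times as many reducers
        if (hasMemoryHungryMeasures) {
            numReduceTasks *= 4;
        }
        // Clamp to the range [1, maxReducers]
        numReduceTasks = Math.max(1, numReduceTasks);
        return Math.min(maxReducers, numReduceTasks);
    }

    public static void main(String[] args) {
        // Hypothetical level: 10,000 MB of map input, 4 parent cuboids -> 6 cuboids at this level
        int withDefault = estimateReducers(10_000, 4, 6, 500, 1.0, false, 5000);
        int withLarger = estimateReducers(10_000, 4, 6, 1000, 1.0, false, 5000);
        System.out.println("reduce.input.mb=500  -> " + withDefault + " reducers");
        System.out.println("reduce.input.mb=1000 -> " + withLarger + " reducers");
    }
}
```

With these numbers, the estimated reduce input is 10,000 × 6 / 4 = 15,000 MB. The first call therefore yields 30 reducers and the second yields 15. Doubling the per-reducer input halves the reducer count, until the clamp to at least 1 or at most the cap takes over.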