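In day-to-day operation of Apache Kylin, configuration parameters are usually tuned based on the logs produced at runtime, in order to improve performance and stability. The Kylin website does not explain these settings, so this article walks through Kylin's configuration.

There are four configuration files under ${KYLIN_HOME}/conf:

kylin_hive_conf.xml
kylin_job_conf_inmem.xml
kylin_job_conf.xml
kylin.properties

kylin_hive_conf.xml holds the settings Kylin uses when submitting tasks to Hive; kylin_job_conf_inmem.xml and kylin_job_conf.xml hold the settings used when submitting tasks to YARN. Adjust them as your environment requires. The important items in kylin.properties are described below.

kylin.server.mode=all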
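The run mode of the Kylin server: all, job, or query. For what each mode means, see: https://zhuanlan.zhihu.com/p/22219602?refer=dataeye

kylin.rest.servers=hostname1:7070,hostname2:7070,hostname3:7070
The list of Kylin server instances. Note: it does not include instances running in job mode!

kylin.metadata.url=kylin_metadata@hbase
The Kylin metadata configuration. For details, see: https://zhuanlan.zhihu.com/p/22223631?refer=dataeye

kylin.job.retry=0
The number of retries for a Kylin job. Note: "job" here means the job generated by a cube build or refresh, not the MapReduce job of each individual step.

kylin.job.mapreduce.default.reduce.input.mb=500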
The maximum input per reducer, in MB, when Kylin submits a job to Hadoop. This parameter is used to determine the number of reducers of the MapReduce job; see the following code:

public double getDefaultHadoopJobReducerInputMB() {
    return Double.parseDouble(getOptional("kylin.job.mapreduce.default.reduce.input.mb", "500"));
}

protected void setReduceTaskNum(Job job, KylinConfig config, String cubeName, int level) throws ClassNotFoundException, IOException, InterruptedException, JobException {
    Configuration jobConf = job.getConfiguration();
    KylinConfig kylinConfig = KylinConfig.getInstanceFromEnv();
    CubeDesc cubeDesc = CubeManager.getInstance(config).getCube(cubeName).getDescriptor();
    kylinConfig = cubeDesc.getConfig();
    double perReduceInputMB = kylinConfig.getDefaultHadoopJobReducerInputMB();
    double reduceCountRatio = kylinConfig.getDefaultHadoopJobReducerCountRatio();
    // total map input MB
    double totalMapInputMB = this.getTotalMapInputMB();
    // output / input ratio
    int preLevelCuboids, thisLevelCuboids;
    if (level == 0) { // base cuboid
        preLevelCuboids = thisLevelCuboids = 1;
    } else { // n-cuboid
        int[] allLevelCount = CuboidCLI.calculateAllLevelCount(cubeDesc);
        preLevelCuboids = allLevelCount[level - 1];
        thisLevelCuboids = allLevelCount[level];
    }
    // total reduce input MB
    double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;
    // number of reduce tasks
    int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);
    // adjust reducer number for cube which has DISTINCT_COUNT measures for better performance
    if (cubeDesc.hasMemoryHungryMeasures()) {
        numReduceTasks = numReduceTasks * 4;
    }
    // at least 1 reducer
    numReduceTasks = Math.max(1, numReduceTasks);
    // no more than 5000 reducer by default
    numReduceTasks = Math.min(kylinConfig.getHadoopJobMaxReducerNumber(), numReduceTasks);
    jobConf.setInt(MAPRED_REDUCE_TASKS, numReduceTasks);
    logger.info("Having total map input MB " + Math.round(totalMapInputMB));
    logger.info("Having level " + level + ", pre-level cuboids " + preLevelCuboids + ", this level cuboids " + thisLevelCuboids);
    logger.info("Having per reduce MB " + perReduceInputMB + ", reduce count ratio " + reduceCountRatio);
    logger.info("Setting " + MAPRED_REDUCE_TASKS + "=" + numReduceTasks);
}

Adjust this setting according to your data volume, your performance requirements, and the mapred-site.xml configuration of your Hadoop cluster.

kylin.job.run.as.remote.cmd=false
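This setting indicates whether CLI commands are issued to Hadoop, HBase, Hive, and so on over SSH. Kylin is usually deployed on a client machine of the Hadoop cluster, so the value is false. If the Kylin service is not deployed on a Hadoop client machine, set the value to true; Kylin then needs the following settings in order to reach the Hadoop cluster:

# Only necessary when kylin.job.run.as.remote.cmd=true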
kylin.job.remote.cli.hostname=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=
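
To make the reducer-count calculation in setReduceTaskNum above concrete, here is a minimal standalone sketch of the same arithmetic. The class name ReducerEstimate, the method estimateReducers, and all input numbers are hypothetical; the reduce count ratio of 1.0 and the cap of 5000 are assumed values passed in as parameters rather than read from a Kylin installation.

```java
// Standalone sketch of the reducer-count arithmetic shown in setReduceTaskNum.
// Every input below is a made-up illustration; in Kylin these values come from
// the job, the cube descriptor, and kylin.properties.
final class ReducerEstimate {

    static int estimateReducers(double totalMapInputMB,
                                int preLevelCuboids,
                                int thisLevelCuboids,
                                double perReduceInputMB,   // kylin.job.mapreduce.default.reduce.input.mb
                                double reduceCountRatio,   // assumed 1.0 in the examples below
                                boolean memoryHungryMeasures,
                                int maxReducers) {         // assumed cap of 5000 in the examples below
        // Scale the map input by the growth in cuboid count between levels.
        double totalReduceInputMB = totalMapInputMB * thisLevelCuboids / preLevelCuboids;
        int numReduceTasks = (int) Math.round(totalReduceInputMB / perReduceInputMB * reduceCountRatio);
        // Cubes with memory-hungry measures get four times as many reducers.
        if (memoryHungryMeasures) {
            numReduceTasks = numReduceTasks * 4;
        }
        // Clamp to the range [1, maxReducers].
        numReduceTasks = Math.max(1, numReduceTasks);
        return Math.min(maxReducers, numReduceTasks);
    }

    public static void main(String[] args) {
        // 10,000 MB map input, 4 cuboids on the previous level, 6 on this one.
        System.out.println(estimateReducers(10000, 4, 6, 500, 1.0, false, 5000)); // 30
        // Same job with the per-reducer input halved to 250 MB.
        System.out.println(estimateReducers(10000, 4, 6, 250, 1.0, false, 5000)); // 60
        // Tiny base-cuboid job: rounds to 0, then is raised to the minimum of 1.
        System.out.println(estimateReducers(100, 1, 1, 500, 1.0, false, 5000));   // 1
        // Very large job: limited by the maximum reducer number.
        System.out.println(estimateReducers(10000000, 1, 1, 500, 1.0, false, 5000)); // 5000
    }
}
```

With the other inputs fixed, halving kylin.job.mapreduce.default.reduce.input.mb from 500 to 250 doubles the estimated reducer count from 30 to 60, which is the lever this setting provides.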
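---------------------------------------------Divider---------------------------------------------------------------------
The following setting is the maximum number of jobs Kylin runs concurrently:

kylin.job.concurrent.max.limit=10

The interval, in seconds, at which Kylin checks the status of MapReduce tasks submitted to YARN:

kylin.job.yarn.app.rest.check.interval.seconds=10

The code is as follows:

while (!isDiscarded()) {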
    JobStepStatusEnum newStatus = statusChecker.checkStatus();
    if (status == JobStepStatusEnum.KILLED) {
        executableManager.updateJobOutput(getId(), ExecutableState.ERROR, Collections.<String, String> emptyMap(), "killed by admin");
        return new ExecuteResult(ExecuteResult.State.FAILED, "killed by admin");
    }
    if (status == JobStepStatusEnum.WAITING && (newStatus == JobStepStatusEnum.FINISHED || newStatus == JobStepStatusEnum.ERROR || newStatus == JobStepStatusEnum.RUNNING)) {
        final long waitTime = System.currentTimeMillis() - getStartTime();
        setMapReduceWaitTime(waitTime);
    }
    status = newStatus;
    executableManager.addJobInfo(getId(), hadoopCmdOutput.getInfo());
    if (status.isComplete()) {
        final Map<String, String> info = hadoopCmdOutput.getInfo();
        readCounters(hadoopCmdOutput, info);
        executableManager.addJobInfo(getId(), info);
        if (status == JobStepStatusEnum.FINISHED) {
            return new ExecuteResult(ExecuteResult.State.SUCCEED, output.toString());
        } else {
            return new ExecuteResult(ExecuteResult.State.FAILED, output.toString());
        }
    }
    Thread.sleep(context.getConfig().getYarnStatusCheckIntervalSeconds() * 1000);
}
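
The effect of the check interval can be seen in a stripped-down version of the loop above. The following is only a sketch: StatusPoller, its Status enum, and pollUntilComplete are hypothetical names, and the status source is a plain Supplier rather than Kylin's status checker.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Simplified polling loop: check the status, stop when it is complete,
// otherwise sleep for the configured interval and check again.
final class StatusPoller {

    enum Status {
        WAITING, RUNNING, FINISHED, ERROR;

        boolean isComplete() {
            return this == FINISHED || this == ERROR;
        }
    }

    /** Polls until the status is complete and returns how many checks were made. */
    static int pollUntilComplete(Supplier<Status> checker, long intervalSeconds) {
        int checks = 0;
        while (true) {
            Status status = checker.get();
            checks++;
            if (status.isComplete()) {
                return checks;
            }
            try {
                Thread.sleep(intervalSeconds * 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical status sequence; an interval of 0 keeps the demo instant.
        Iterator<Status> statuses =
                Arrays.asList(Status.WAITING, Status.RUNNING, Status.FINISHED).iterator();
        System.out.println(pollUntilComplete(statuses::next, 0)); // 3
    }
}
```

Because the loop sleeps between checks, a task that completes just after a check is only noticed at the next one, so each step can appear to finish up to one interval late. A shorter interval reduces that delay at the cost of more frequent status requests.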
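The following setting is the Hive database that holds the intermediate table created in the first step of a cube build:

kylin.job.hive.database.for.intermediatetable=default

The following is the compression algorithm used for the data stored in the HBase tables created during a cube build:

kylin.hbase.default.compression.codec=snappy

Note: before setting this value, verify that the Hadoop cluster behind HBase supports the compression algorithm. The check command is:

hadoop checknative -a

In the author's environment, the check showed that the Hadoop cluster does not support the snappy compression algorithm, so the default value has to be changed.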
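
If the check needs to be scripted, the command's output can be scanned for the codec's line. The sketch below assumes that hadoop checknative -a prints one line per library in the form "name: true|false [path]"; that format, the class name NativeCodecCheck, and the sample text (hand-written for illustration, not captured from a real cluster) are all assumptions to verify against your own Hadoop version.

```java
// Scans text in the assumed "name: true|false [path]" format for one codec.
final class NativeCodecCheck {

    /** Returns true only if a line for the codec exists and reports "true". */
    static boolean isCodecAvailable(String checknativeOutput, String codec) {
        for (String line : checknativeOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(codec + ":")) {
                String rest = trimmed.substring(codec.length() + 1).trim();
                return rest.startsWith("true");
            }
        }
        // No line for this codec: treat it as unavailable.
        return false;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed format (hypothetical paths).
        String sample = String.join("\n",
                "Native library checking:",
                "hadoop:  true /opt/hadoop/lib/native/libhadoop.so",
                "zlib:    true /lib64/libz.so.1",
                "snappy:  false",
                "lz4:     true revision:99");
        System.out.println(isCodecAvailable(sample, "snappy")); // false
        System.out.println(isCodecAvailable(sample, "zlib"));   // true
    }
}
```

When the codec is reported as unavailable, as in the case described above, kylin.hbase.default.compression.codec should be set to a codec the cluster does support.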