Spark SQL Performance Optimization

The main Spark SQL performance-tuning parameters are as follows:

spark.sql.inMemoryColumnarStorage.batchSize: When caching SchemaRDDs, Spark SQL groups the records in the RDD into batches of the size given by this option (default: 1000) and compresses each batch. Very small batch sizes lead to low compression, while very large sizes can also be problematic, because each batch might be too large to build up in memory.

spark.sql.codegen: When this option is enabled, Spark SQL compiles each SQL query to Java bytecode before executing it. For long-running or frequently repeated queries this can speed up execution, because it generates specialized bytecode for the query. For very short (1 to 2 second) ad hoc queries, however, it may add overhead, because every query has to be compiled first.
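The compile-once trade-off behind spark.sql.codegen can be illustrated with a small analogy in plain Python. This is only a sketch and not Spark's code generator: the tuple-based expression tree and the helper names (interpret, to_source, compile_expr) are hypothetical, and Python's eval stands in for Java bytecode generation. The interpreter walks the tree for every row, while the compiled version pays a one-time cost and then runs a specialized function.

```python
# Hypothetical expression tree for: a + b * 2
EXPR = ("add", ("col", "a"), ("mul", ("col", "b"), ("lit", 2)))


def interpret(expr, row):
    """Evaluate the tree by walking it again for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    left = interpret(expr[1], row)
    right = interpret(expr[2], row)
    return left + right if kind == "add" else left * right


def to_source(expr):
    """Turn the tree into Python source text specialized for this expression."""
    kind = expr[0]
    if kind == "col":
        return "row[%r]" % expr[1]
    if kind == "lit":
        return repr(expr[1])
    op = "+" if kind == "add" else "*"
    return "(%s %s %s)" % (to_source(expr[1]), op, to_source(expr[2]))


def compile_expr(expr):
    """One-time cost: generate source and compile it into a function."""
    return eval("lambda row: " + to_source(expr))


rows = [{"a": i, "b": i + 1} for i in range(5)]

compiled = compile_expr(EXPR)  # paid once per query
interpreted_results = [interpret(EXPR, r) for r in rows]
compiled_results = [compiled(r) for r in rows]

print(to_source(EXPR))
print(compiled_results)
```

With only a handful of rows the compile step dominates, which mirrors the overhead described for short ad hoc queries; over many rows or repeated runs the specialized function is the cheaper path. In the Spark 1.x releases that document this option, it can be switched on with setConf on the SQLContext or with a SET spark.sql.codegen=true statement.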
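The batch-size trade-off described for spark.sql.inMemoryColumnarStorage.batchSize can be seen in a rough, Spark-free sketch. It assumes zlib as a stand-in for Spark's column-specific compression schemes, and the sample rows and the batch_stats helper are made up for illustration. It reports both sides of the trade-off: total compressed size, and the largest uncompressed batch that has to be held in memory at once.

```python
import zlib


def batch_stats(records, batch_size):
    """Compress records in batches of batch_size.

    Returns (total compressed bytes, largest uncompressed batch in bytes).
    """
    total_compressed = 0
    largest_batch = 0
    for start in range(0, len(records), batch_size):
        batch = "\n".join(records[start:start + batch_size]).encode("utf-8")
        largest_batch = max(largest_batch, len(batch))
        total_compressed += len(zlib.compress(batch))
    return total_compressed, largest_batch


# Hypothetical data: 10,000 short, repetitive rows
records = ["user_%d,click,2014-01-01" % (i % 50) for i in range(10000)]

small_total, small_peak = batch_stats(records, 10)
large_total, large_peak = batch_stats(records, 1000)

print("batch=10:   compressed=%d bytes, peak batch=%d bytes" % (small_total, small_peak))
print("batch=1000: compressed=%d bytes, peak batch=%d bytes" % (large_total, large_peak))
```

Tiny batches give the compressor little repetition to exploit, so the total compressed size is larger; large batches compress better but each one occupies more memory while it is being built.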