An executor is the Spark application's JVM process launched on a worker node. It runs tasks in threads and is responsible for keeping the relevant partitions of data. Every Spark application has the same fixed heap size and fixed number of cores for each of its executors, and every Spark application will have one executor on each worker node. Each process (executor or driver) has an allocated heap with available memory. Before analysing each case, let us consider the executor.

Executor memory overview. The heap size is what is referred to as the Spark executor memory, which is controlled with the spark.executor.memory property or the --executor-memory flag. From the Spark documentation, executor memory (default 1g; since version 0.7.0) is the amount of memory to use per executor process, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t"), e.g. 512m or 2g. It sets the overall amount of heap memory to use for the executor. Besides the parameters that I noted in my previous update, spark.executor.memory is very relevant; in my Spark UI "Environment" tab it was set to 22776m on a "30 GB" worker in a cluster set up via Databricks. Now I would like to set executor memory or driver memory for performance tuning.

By default, Spark uses 60% of the configured executor memory (--executor-memory) to cache RDDs; the remaining 40% of memory is available for any objects created during task execution. This information helps provide insight into how executor and driver JVM memory is used across the different memory regions, and it can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.

spark.executor.pyspark.memory (not set by default) is the amount of memory to be allocated to PySpark in each executor, in MiB unless otherwise specified. PySpark should probably use spark.executor.pyspark.memory to limit or default the setting of spark.python.worker.memory, because the latter property controls spilling and should be lower than the total memory limit. I think that means the spill setting should have a better name and should be limited by the total memory. The JVM has executor memory and Spark memory (controlled by spark.memory.fraction), so these settings create something similar: a total Python memory and a threshold above which PySpark will spill to disk.

However, a small overhead memory is also needed to determine the full memory request to YARN for each executor. Overhead memory is the off-heap memory used for JVM overheads, interned strings, and other metadata in the JVM. The formula for that overhead is max(384, .07 * spark.executor.memory). When the Spark executor's physical memory exceeds the memory allocated by YARN, the total of Spark executor instance memory plus memory overhead is not enough to handle memory-intensive operations; memory-intensive operations include caching, shuffling, and aggregating (using reduceByKey, groupBy, and so on). In this case, you need to configure spark.yarn.executor.memoryOverhead to …

Memory for each executor: from the step above, we have 3 executors per node, and the available RAM on each node is 63 GB, so memory for each executor on each node is 63/3 = 21 GB. There are tradeoffs between --num-executors and --executor-memory: large executor memory does not imply better performance, due to JVM garbage collection, and sometimes it is better to configure a larger number of small JVMs than a small number of large JVMs.
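To make the sizing arithmetic above concrete, here is a minimal sketch in Python. It assumes the figures from the example (63 GB of usable RAM per node, 3 executors per node) and applies the quoted overhead rule, max(384, 0.07 * spark.executor.memory), to the per-executor budget; the resulting split is an approximation for illustration, not a value taken from the text.

```python
# Minimal sizing sketch, assuming 63 GB usable RAM per node and 3 executors per node
# (the figures from the example above). The overhead follows the quoted rule
# max(384 MB, 0.07 * executor memory), applied here to the per-executor budget.

def memory_overhead_mb(memory_mb: int) -> int:
    """Off-heap overhead that YARN requests on top of the executor heap."""
    return max(384, int(0.07 * memory_mb))

node_ram_mb = 63 * 1024            # usable RAM per worker node
executors_per_node = 3

# Total budget each executor may consume on the node (heap + overhead).
budget_mb = node_ram_mb // executors_per_node     # 21504 MiB, ~21 GB

# Reserve the overhead and give the rest to the JVM heap (--executor-memory).
overhead_mb = memory_overhead_mb(budget_mb)        # ~1.5 GB
heap_mb = budget_mb - overhead_mb                  # ~19.5 GB

print(f"per-executor budget : {budget_mb} MiB")
print(f"memory overhead     : {overhead_mb} MiB")
print(f"--executor-memory   : {heap_mb} MiB")
```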
The same reasoning applies on the driver side. The memory that YARN allocates for the driver JVM is spark.driver.memory + spark.yarn.driver.memoryOverhead, where the overhead is driverMemory * 0.07 with a minimum of 384m. With an 11g driver that works out to 11g + 1.154g = 12.154g. So, from the formula, I can see that my job requires a MEMORY_TOTAL of around 12.154g to run successfully, which explains why I need more than 10g for the driver memory setting.
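For illustration, here is a minimal PySpark sketch of how these properties can be passed to an application. The specific sizes (19g of executor heap, 2g of overhead, six executor instances) are assumptions chosen to roughly match the 63 GB / 3 executors example, not values prescribed by the text, and spark.yarn.executor.memoryOverhead is the property name used above (newer Spark releases call it spark.executor.memoryOverhead).

```python
from pyspark.sql import SparkSession

# Illustrative values only: roughly a 21 GB per-executor budget split into heap + overhead.
spark = (
    SparkSession.builder
    .appName("memory-tuning-sketch")
    # Heap for each executor JVM (equivalent to --executor-memory).
    .config("spark.executor.memory", "19g")
    # Off-heap overhead requested from YARN on top of the heap.
    .config("spark.yarn.executor.memoryOverhead", "2g")
    # Total number of executors (equivalent to --num-executors).
    .config("spark.executor.instances", "6")
    .getOrCreate()
)

# spark.driver.memory cannot be raised here in client mode, because the driver JVM is
# already running by this point; pass it when the application is launched instead, e.g.:
#   spark-submit --driver-memory 11g --executor-memory 19g app.py
```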
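Finally, the effective values of these settings can be read back from a running session, much like the Spark UI "Environment" tab reports them (the 22776m figure mentioned earlier). A minimal sketch, assuming the documented defaults apply wherever nothing has been set explicitly:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-settings-check").getOrCreate()
conf = spark.sparkContext.getConf()

# Heap given to each executor JVM (what the Environment tab reports).
print("spark.executor.memory        :", conf.get("spark.executor.memory", "1g (default)"))
# Fraction of the usable heap shared by execution and cached data.
print("spark.memory.fraction        :", conf.get("spark.memory.fraction", "0.6 (default)"))
# Portion of that fraction protected for storage before execution can evict it.
print("spark.memory.storageFraction :", conf.get("spark.memory.storageFraction", "0.5 (default)"))
# Optional memory cap for the PySpark workers in each executor.
print("spark.executor.pyspark.memory:", conf.get("spark.executor.pyspark.memory", "not set"))

spark.stop()
```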