INFO SparkEnv: Registering MapOutputTracker
MapOutputTrackerMaster
A MapOutputTrackerMaster is the MapOutputTracker for a driver.
A MapOutputTrackerMaster is the source of truth for the collection of MapStatus
objects (map output locations) per shuffle id (as recorded from ShuffleMapTasks).
MapOutputTrackerMaster uses Spark’s org.apache.spark.util.TimeStampedHashMap for mapStatuses.
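As a rough illustration of the bookkeeping described above, the following sketch keeps one slot per map task for each shuffle id. It is a simplified model, not Spark's code: SimpleMapStatus, ShuffleRegistry, registerMapOutput, and locations are hypothetical names, and the bodies are assumptions about how such a registry could behave.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a MapStatus: where a map output lives and how big it is.
final case class SimpleMapStatus(host: String, sizeInBytes: Long)

// Hypothetical registry: one slot per map task, keyed by shuffle id.
class ShuffleRegistry {
  private val mapStatuses =
    mutable.HashMap.empty[Int, Array[Option[SimpleMapStatus]]]

  // Reserve numMaps empty slots for a new shuffle; registering twice is an error.
  def registerShuffle(shuffleId: Int, numMaps: Int): Unit = {
    require(!mapStatuses.contains(shuffleId), s"Shuffle $shuffleId registered twice")
    mapStatuses(shuffleId) = Array.fill[Option[SimpleMapStatus]](numMaps)(None)
  }

  // Record the output location reported for one map task.
  def registerMapOutput(shuffleId: Int, mapId: Int, status: SimpleMapStatus): Unit =
    mapStatuses(shuffleId)(mapId) = Some(status)

  // Locations known so far for a shuffle (empty if the shuffle is unknown).
  def locations(shuffleId: Int): Seq[SimpleMapStatus] =
    mapStatuses.get(shuffleId).map(_.toSeq.flatten).getOrElse(Seq.empty)
}
```

Keeping the whole table in one place on the driver is what lets it act as the single source of truth that reduce tasks can query.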
Note: There is currently a hardcoded limit on the number of map and reduce tasks (1000 for each) above which Spark does not assign preferred locations (locality preferences) based on map output sizes.
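The limit in the note can be pictured as a simple guard. The sketch below is an assumption-laden illustration: LocalityThresholds and its member names are hypothetical, and treating the limit as a strict "less than" comparison is a guess, since the note does not say what happens at exactly 1000 tasks.

```scala
// Hypothetical constants mirroring the limit stated in the note (1000 each).
object LocalityThresholds {
  val MapThreshold = 1000
  val ReduceThreshold = 1000

  // Assumption: preferences are computed only when both counts are below the limits.
  def shouldComputePreferredLocations(numMaps: Int, numReducers: Int): Boolean =
    numMaps < MapThreshold && numReducers < ReduceThreshold
}
```

For example, under these assumptions a shuffle with 200 map tasks and 50 reduce tasks would qualify, while one with 5000 map tasks would not.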
It uses MetadataCleaner with MetadataCleanerType.MAP_OUTPUT_TRACKER as cleanerType and a cleanup function to drop entries in mapStatuses.
You should see the following INFO message when the MapOutputTrackerMaster is created (FIXME it uses MapOutputTrackerMasterEndpoint):
registerShuffle Method
Caution: FIXME
getStatistics Method
Caution: FIXME
unregisterMapOutput Method
Caution: FIXME
registerMapOutputs Method
Caution: FIXME
incrementEpoch Method
Caution: FIXME
cleanup Function for MetadataCleaner
The cleanup(cleanupTime: Long) method removes old entries in mapStatuses and cachedSerializedStatuses that have a timestamp earlier than cleanupTime.
It uses the org.apache.spark.util.TimeStampedHashMap.clearOldValues method.
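The timestamp-based cleanup can be sketched as follows. This is a minimal model and not Spark's TimeStampedHashMap: TimestampedMap and TrackerSketch are hypothetical classes, timestamps are passed in explicitly to keep the example deterministic, and the println only imitates the removal log message.

```scala
import scala.collection.mutable

// Hypothetical, simplified timestamped map; not Spark's TimeStampedHashMap.
class TimestampedMap[K, V] {
  // Each value is stored together with the time it was inserted.
  private val entries = mutable.HashMap.empty[K, (V, Long)]

  def put(key: K, value: V, timestamp: Long): Unit =
    entries(key) = (value, timestamp)

  def get(key: K): Option[V] = entries.get(key).map(_._1)

  // Drop every entry whose timestamp is strictly earlier than threshTime.
  def clearOldValues(threshTime: Long): Unit = {
    // Collect the expired keys first so the map is not mutated while iterating.
    val expired = entries.iterator.collect {
      case (k, (_, ts)) if ts < threshTime => k
    }.toList
    expired.foreach { k =>
      println(s"Removing key $k") // imitation of the removal log line
      entries.remove(k)
    }
  }
}

// Hypothetical tracker that cleans both maps with the same cutoff.
class TrackerSketch {
  val mapStatuses = new TimestampedMap[Int, String]
  val cachedSerializedStatuses = new TimestampedMap[Int, Array[Byte]]

  def cleanup(cleanupTime: Long): Unit = {
    mapStatuses.clearOldValues(cleanupTime)
    cachedSerializedStatuses.clearOldValues(cleanupTime)
  }
}
```

Calling cleanup(200L) on a tracker holding entries stamped 100 and 300 would, in this sketch, remove only the first one.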
Tip: Enable the DEBUG logging level to see what happens during cleanup.
You should see the following DEBUG message in the logs for entries being removed:
DEBUG Removing key [entry.getKey]
getEpoch Method
Caution: FIXME
Settings
Spark Property | Default Value | Description |
---|---|---|
|
Controls whether to compute locality preferences for reduce tasks. When enabled (i.e. |