INFO SparkEnv: Registering MapOutputTracker
MapOutputTrackerMaster
A MapOutputTrackerMaster is the MapOutputTracker for a driver.
A MapOutputTrackerMaster is the source of truth for the collection of MapStatus objects (map output locations) per shuffle id (as recorded from ShuffleMapTasks).
MapOutputTrackerMaster uses Spark’s org.apache.spark.util.TimeStampedHashMap for mapStatuses.
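The idea of keeping map output locations per shuffle id can be pictured with a small model. This is only a sketch under stated assumptions: `SimpleMapStatus`, `SimpleMapOutputRegistry`, `addShuffle`, `recordMapOutput` and `locations` are hypothetical names made up for illustration and are not Spark's API, and a plain mutable map stands in for TimeStampedHashMap.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a MapStatus: the host that holds a
// map task's output and the size of the block destined for each reducer.
final case class SimpleMapStatus(host: String, sizesByReducer: Vector[Long])

// Hypothetical registry keyed by shuffle id, with one slot per map partition.
// A slot stays None until the corresponding map task reports its output.
final class SimpleMapOutputRegistry {
  private val statuses =
    mutable.Map.empty[Int, Array[Option[SimpleMapStatus]]]

  // Reserve one empty slot per map partition of the shuffle.
  def addShuffle(shuffleId: Int, numMaps: Int): Unit = {
    require(!statuses.contains(shuffleId), s"Shuffle $shuffleId already added")
    statuses.update(shuffleId, Array.fill[Option[SimpleMapStatus]](numMaps)(None))
  }

  // Record where the output of a single map partition lives.
  def recordMapOutput(shuffleId: Int, mapId: Int, status: SimpleMapStatus): Unit =
    statuses(shuffleId).update(mapId, Some(status))

  // Hosts of the recorded map outputs, in map partition order.
  def locations(shuffleId: Int): Seq[Option[String]] =
    statuses(shuffleId).toSeq.map(_.map(_.host))
}
```

In this model a shuffle with two map partitions, of which only the second has reported, yields one empty slot followed by one known host.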
Note: There is currently a hardcoded limit of map and reduce tasks above which Spark does not assign preferred locations (aka locality preferences) based on map output sizes: 1000 for map and reduce each.
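The limit described in the note can be expressed as a simple guard. The following is a minimal sketch, not Spark's code: `LocalityThresholds`, its constants and `shouldComputeLocality` are hypothetical names, and treating exactly 1000 tasks as already over the limit (a strict less-than comparison) is an assumption.

```scala
// Hypothetical helper that mirrors the rule in the note; names are
// illustrative and the strict "<" comparison is an assumption.
object LocalityThresholds {
  val MapThreshold = 1000
  val ReduceThreshold = 1000

  // Locality preferences are computed only when both task counts stay
  // under their thresholds.
  def shouldComputeLocality(numMaps: Int, numReducers: Int): Boolean =
    numMaps < MapThreshold && numReducers < ReduceThreshold
}
```

Under these assumptions a shuffle with 999 map and 999 reduce tasks still gets locality preferences, while one with 1000 or more tasks on either side does not.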
MapOutputTrackerMaster uses a MetadataCleaner with MetadataCleanerType.MAP_OUTPUT_TRACKER as the cleanerType and the cleanup function to drop old entries from mapStatuses.
You should see the following INFO message when the MapOutputTrackerMaster is created (FIXME it uses MapOutputTrackerMasterEndpoint):
INFO SparkEnv: Registering MapOutputTracker
registerShuffle Method
Caution: FIXME
getStatistics Method
Caution: FIXME
unregisterMapOutput Method
Caution: FIXME
registerMapOutputs Method
Caution: FIXME
incrementEpoch Method
Caution: FIXME
cleanup Function for MetadataCleaner
The cleanup(cleanupTime: Long) method removes entries in mapStatuses and cachedSerializedStatuses whose timestamp is earlier than cleanupTime.
It uses the org.apache.spark.util.TimeStampedHashMap.clearOldValues method.
Tip: Enable the DEBUG logging level for the org.apache.spark.util.TimeStampedHashMap logger to see the entries being removed.
You should see the following DEBUG message in the logs for entries being removed:
DEBUG Removing key [entry.getKey]
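The timestamp-based cleanup described in this section can be sketched with a small map that stores a timestamp next to each value. This is an illustration under stated assumptions rather than Spark's implementation: `SimpleTimestampedMap` and its `put` and `get` methods are hypothetical, `clearOldValues` only borrows the name of the real method, and `println` stands in for DEBUG logging.

```scala
import scala.collection.mutable

// Hypothetical, simplified timestamped map; not Spark's TimeStampedHashMap.
final class SimpleTimestampedMap[K, V] {
  private val entries = mutable.Map.empty[K, (V, Long)]

  // Store a value together with the time it was written.
  def put(key: K, value: V, timestamp: Long): Unit =
    entries.update(key, (value, timestamp))

  def get(key: K): Option[V] = entries.get(key).map(_._1)

  // Remove every entry whose timestamp is strictly earlier than threshold
  // and return the removed keys.
  def clearOldValues(threshold: Long): Seq[K] = {
    // Collect the stale keys first so the map is not modified while scanned.
    val stale = entries.collect { case (k, (_, ts)) if ts < threshold => k }.toSeq
    stale.foreach { k =>
      println(s"DEBUG Removing key $k") // stand-in for debug logging
      entries.remove(k)
    }
    stale
  }
}
```

A cleanup function in this style would call `clearOldValues(cleanupTime)` on each map it maintains, so that only entries written at or after `cleanupTime` survive.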
getEpoch Method
Caution: FIXME
Settings
| Spark Property | Default Value | Description |
|---|---|---|
| | | Controls whether to compute locality preferences for reduce tasks. When enabled (i.e. |