TaskSet [stageId].[stageAttemptId]
TaskSets
Introduction
A TaskSet is a collection of tasks that belong to a single stage and a stage attempt. It also has priority and properties attributes. The priority is used in FIFO scheduling mode (see Priority Field and FIFO Scheduling), while properties are the properties of the first job in the stage.
Caution: FIXME Where are properties of a TaskSet used?
A TaskSet represents the missing partitions of a stage.
The pair of a stage and a stage attempt uniquely identifies a TaskSet, and that is what you see in the logs when a TaskSet is used, in the form TaskSet [stageId].[stageAttemptId].
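As a rough illustration of how the TaskSet [stageId].[stageAttemptId] identifier is put together, here is a minimal sketch. TaskSetSketch and its taskIds field are hypothetical stand-ins introduced for this example (a real TaskSet holds the task objects themselves); only the stageId, stageAttemptId, priority and properties names come from the description above.

```scala
import java.util.Properties

// Hypothetical, simplified stand-in for a TaskSet (not Spark's actual class).
final case class TaskSetSketch(
    taskIds: Seq[Int],            // stand-in for the tasks of the missing partitions
    stageId: Int,
    stageAttemptId: Int,
    priority: Int,
    properties: Properties = new Properties()) {

  // The (stage, stage attempt) pair identifies the TaskSet.
  val id: String = s"$stageId.$stageAttemptId"

  // Mirrors the "TaskSet [stageId].[stageAttemptId]" form seen in the logs.
  override def toString: String = s"TaskSet $id"
}
```

Under these assumptions, TaskSetSketch(Seq(0, 1, 2), stageId = 3, stageAttemptId = 0, priority = 1).toString yields "TaskSet 3.0", and a retry of the same stage would show up as "TaskSet 3.1".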
A TaskSet contains a fully independent sequence of tasks that can run right away based on the data that is already on the cluster, e.g. map output files from previous stages, though the TaskSet may fail if this data becomes unavailable.
A TaskSet can be submitted to a TaskScheduler for execution (consult TaskScheduler Contract).
removeRunningTask
Caution: FIXME Review TaskSet.removeRunningTask(tid)
Where TaskSets are used
- DAGScheduler.submitMissingTasks
- TaskSchedulerImpl.submitTasks
- TaskSchedulerImpl.createTaskSetManager
Priority Field and FIFO Scheduling
A TaskSet has a priority field whose value becomes the priority of its TaskSetManager (which is a Schedulable).
The priority field is used in FIFOSchedulingAlgorithm: the Schedulable with the lower priority value is scheduled first, and when priorities are equal the stage id breaks the tie (the lower stage id goes first).
Note: FIFOSchedulingAlgorithm is only used for FIFO scheduling mode in a Pool (i.e. a schedulable collection of Schedulable objects).
Effectively, the priority field is the id of the first job this stage was part of (for FIFO scheduling).
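The FIFO ordering described in this section can be sketched as follows. This is a simplified illustration, not Spark's source: SchedulableSketch and FifoSketch are hypothetical names, and the sketch keeps only the two fields the comparison needs, assuming priority holds a job id.

```scala
// Hypothetical minimal view of a Schedulable: just the fields
// that the FIFO comparison looks at.
final case class SchedulableSketch(name: String, priority: Int, stageId: Int)

object FifoSketch {
  // Returns true when s1 should be scheduled before s2:
  // lower priority (i.e. earlier job id) first, then lower stage id on a tie.
  def comparator(s1: SchedulableSketch, s2: SchedulableSketch): Boolean =
    if (s1.priority != s2.priority) s1.priority < s2.priority
    else s1.stageId < s2.stageId

  // Orders a queue of schedulables the way a FIFO pool would pick them.
  def order(queue: Seq[SchedulableSketch]): Seq[SchedulableSketch] =
    queue.sortWith(comparator)
}
```

For example, given entries for stage 2 of job 1, stage 0 of job 0, and stage 1 of job 1 (in that submission order), FifoSketch.order places the job 0 entry first, followed by stage 1 and then stage 2 of job 1. Since priority is a job id, work from earlier jobs is served before work from later ones.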