FileSourceStrategy

FileSourceStrategy is a Strategy that uses the PhysicalOperation extractor to destructure a LogicalPlan and then optimize it.

Tip

Enable INFO logging level for the org.apache.spark.sql.execution.datasources.FileSourceStrategy logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.FileSourceStrategy=INFO

Refer to Logging.


PhysicalOperation

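PhysicalOperation (in org.apache.spark.sql.catalyst.planning) is an extractor object that destructures a LogicalPlan into a tuple of projections, filter predicates and the child plan beneath them, i.e. (Seq[NamedExpression], Seq[Expression], LogicalPlan).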

The following idiom is often used in Strategy implementations (e.g. HiveTableScans, InMemoryScans, DataSourceStrategy, FileSourceStrategy):

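import org.apache.spark.sql.catalyst.planning.PhysicalOperation

def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
  case PhysicalOperation(projectList, filters, child) =>
    // plan a physical operator over child that evaluates filters and projectList
    Nil // placeholder: return the planned SparkPlan(s) here
  case _ => Nil
}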

Whenever PhysicalOperation is used in a pattern match against a LogicalPlan, its unapply method is called.
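A minimal, Spark-free sketch of that mechanism follows. Halves is a hypothetical extractor object (not part of Spark) that counts its own invocations, which makes it visible that a case Halves(a, b) pattern compiles into a call to Halves.unapply, and that the next case is tried when unapply returns None.

```scala
// Hypothetical extractor, used only to illustrate how Scala desugars
// `case Halves(a, b)` into a call to `Halves.unapply(n)`.
object Halves {
  var calls = 0 // incremented every time the pattern matcher calls unapply

  def unapply(n: Int): Option[(Int, Int)] = {
    calls += 1
    if (n % 2 == 0) Some((n / 2, n / 2)) else None
  }
}

def describe(n: Int): String = n match {
  case Halves(a, b) => s"$n = $a + $b" // unapply returned Some((a, b))
  case _            => s"$n is odd"    // unapply returned None
}
```

Calling describe(10) returns "10 = 5 + 5", describe(7) returns "7 is odd", and after those two calls Halves.calls is 2: the extractor ran once per match.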

unapply(plan: LogicalPlan): Option[ReturnType]

ReturnType is a type alias for (Seq[NamedExpression], Seq[Expression], LogicalPlan).

unapply uses the collectProjectsAndFilters method, which recursively destructures the input LogicalPlan.

Note
unapply is little more than collectProjectsAndFilters itself: it drops the alias map from the result and, when no Project was found, uses the output of the child plan as the projection list.

collectProjectsAndFilters Method

collectProjectsAndFilters(plan: LogicalPlan):
  (Option[Seq[NamedExpression]], Seq[Expression], LogicalPlan, Map[Attribute, Expression])

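collectProjectsAndFilters recursively destructures a LogicalPlan built from deterministic Project and Filter operators and BroadcastHint. The outermost Project supplies the projection list (with aliases defined in any Project below it substituted in), the condition of every Filter is split into its conjuncts and collected, and a BroadcastHint is simply skipped over. Any other LogicalPlan stops the recursion: it is returned as the child plan together with no projections, no filters and an empty alias map.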
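The recursion can be sketched in plain Scala on a toy plan tree. Plan, Scan, Join, Project, Filter, BroadcastHint and Operation below are hypothetical stand-ins, not Catalyst classes; expressions are modelled as plain strings, and the determinism checks, conjunct splitting and alias substitution of the real method are deliberately left out.

```scala
// Toy logical plan: hypothetical stand-ins for Catalyst operators,
// with expressions modelled as plain strings.
sealed trait Plan { def output: Seq[String] }
case class Scan(table: String, output: Seq[String]) extends Plan
case class Join(left: Plan, right: Plan) extends Plan {
  def output: Seq[String] = left.output ++ right.output
}
case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}
case class Filter(condition: String, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
case class BroadcastHint(child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

object Operation {
  // Same shape as collectProjectsAndFilters, minus the alias map.
  def collect(plan: Plan): (Option[Seq[String]], Seq[String], Plan) = plan match {
    case Project(columns, child) =>
      // The outermost Project wins; projections found deeper are dropped.
      val (_, filters, other) = collect(child)
      (Some(columns), filters, other)
    case Filter(condition, child) =>
      // Conditions accumulate, innermost Filter first.
      val (fields, filters, other) = collect(child)
      (fields, filters :+ condition, other)
    case BroadcastHint(child) =>
      // The hint is transparent: keep destructuring its child.
      collect(child)
    case other =>
      // Any other operator ends the recursion and becomes the child.
      (None, Nil, other)
  }

  // Like PhysicalOperation.unapply: fall back to the child's output
  // when no Project was found.
  def unapply(plan: Plan): Option[(Seq[String], Seq[String], Plan)] = {
    val (fields, filters, child) = collect(plan)
    Some((fields.getOrElse(child.output), filters, child))
  }
}

val plan = Project(Seq("name"),
  Filter("age > 21",
    Filter("age < 65",
      BroadcastHint(Scan("people", Seq("name", "age"))))))

plan match {
  case Operation(projections, predicates, child) =>
    println(s"projections=$projections predicates=$predicates child=$child")
  case other =>
    println(s"not destructured: $other")
}
```

For this plan, Operation.unapply yields the projection Seq("name"), the predicates Seq("age < 65", "age > 21") (innermost Filter first) and the Scan as the child. Keeping the recursion in its own method and wrapping it in unapply mirrors the split between collectProjectsAndFilters and unapply.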
