Enum SubtaskStateMapper

  • All Implemented Interfaces:
    Serializable, Comparable<SubtaskStateMapper>

    @Internal
    public enum SubtaskStateMapper
    extends Enum<SubtaskStateMapper>
    The SubtaskStateMapper narrows down the subtasks that need to be read during rescaling to recover from a particular subtask when in-flight data has been stored in the checkpoint.

    Mappings of old subtasks to new subtasks may be unique or non-unique. A unique assignment means that a particular old subtask is assigned to exactly one new subtask. Non-unique assignments require filtering downstream: the receiver side has to verify, for each deserialized record, whether it truly belongs to the new subtask. Most SubtaskStateMappers produce only unique assignments and are thus optimal. Some mappers, such as RANGE, create a mixture of unique and non-unique mappings, where downstream tasks need to filter on some mapped subtasks.
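
    The downstream filtering for non-unique assignments can be sketched as follows. This is only an illustration, not Flink's implementation: the class RecoveredRecordFilter and its methods are hypothetical, and it assumes that ownership of a record is decided by an integer (such as a key group) falling into a half-open range owned by the new subtask.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of receiver-side filtering; not part of Flink's API. */
final class RecoveredRecordFilter {
    private final int startInclusive; // first key group owned by the new subtask
    private final int endExclusive;   // one past the last key group owned

    RecoveredRecordFilter(int startInclusive, int endExclusive) {
        this.startInclusive = startInclusive;
        this.endExclusive = endExclusive;
    }

    /** Returns true if a recovered record with this key group belongs to the new subtask. */
    boolean belongsToSubtask(int recordKeyGroup) {
        return recordKeyGroup >= startInclusive && recordKeyGroup < endExclusive;
    }

    /** Keeps only the key groups that the new subtask owns and drops replicated ones. */
    List<Integer> filter(List<Integer> recoveredKeyGroups) {
        List<Integer> kept = new ArrayList<>();
        for (int keyGroup : recoveredKeyGroups) {
            if (belongsToSubtask(keyGroup)) {
                kept.add(keyGroup);
            }
        }
        return kept;
    }
}
```

    With a unique assignment every recovered record passes such a check, so the filter is only needed for old subtasks that are mapped to more than one new subtask.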

    • Enum Constant Detail

      • ARBITRARY

        public static final SubtaskStateMapper ARBITRARY
        Extra state is redistributed to other subtasks without any specific guarantee (only that up- and downstream are matched).
      • FIRST

        public static final SubtaskStateMapper FIRST
        Restores extra subtasks to the first subtask.
      • FULL

        public static final SubtaskStateMapper FULL
        Replicates the state to all subtasks. This rescaling causes a huge overhead and completely relies on filtering the data downstream.

        This strategy should only be used as a fallback.
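
        The overhead of full replication can be made concrete with a small sketch. The class FullReplicationSketch and its methods are hypothetical and only mirror the description above: every new subtask reads every old subtask.

```java
import java.util.stream.IntStream;

/** Hypothetical sketch of full replication; not Flink's implementation. */
final class FullReplicationSketch {

    /** Every new subtask reads all old subtasks, regardless of its own index. */
    static int[] oldSubtasksFor(int newSubtaskIndex, int oldNumberOfSubtasks) {
        return IntStream.range(0, oldNumberOfSubtasks).toArray();
    }

    /** Total number of (new, old) reads, which grows with the product of both parallelisms. */
    static int totalReads(int oldNumberOfSubtasks, int newNumberOfSubtasks) {
        return oldNumberOfSubtasks * newNumberOfSubtasks;
    }
}
```

        For example, rescaling from 3 to 4 subtasks under this sketch results in 12 reads, and each record must then be filtered on the receiver side.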

      • RANGE

        public static final SubtaskStateMapper RANGE
        Remaps old ranges to new ranges. For minor rescaling that means that new subtasks are mostly assigned 2 old subtasks.

        Example:
        old assignment: 0 -> [0;43); 1 -> [43;87); 2 -> [87;128)
        new assignment: 0 -> [0;64); 1 -> [64;128)
        subtask 0 recovers data from old subtask 0 + 1 and subtask 1 recovers data from old subtask 1 + 2

        For all downscales from n to [n-1 .. n/2], each new subtask gets exactly two old subtasks assigned.

        For all upscales from n to [n+1 .. 2*n-1], most new subtasks get two old subtasks assigned, except the two outermost.

        Larger scale factors (<n/2, >2*n) will increase the number of old subtasks accordingly. However, they will also create more unique assignments, where an old subtask is exclusively assigned to a new subtask. Thus, the number of non-unique mappings is bounded above by 2*n.
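
        The range example above can be reproduced with a simple overlap test. This is a sketch under stated assumptions: the class RangeOverlapSketch is hypothetical, the ranges are taken verbatim from the example as half-open intervals, and how Flink actually derives the ranges from the parallelism is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of range remapping by interval overlap; not Flink's implementation. */
final class RangeOverlapSketch {

    /**
     * Returns the indexes of all old ranges that overlap the half-open new range
     * [newStart, newEnd). Each old range is given as {startInclusive, endExclusive}.
     */
    static List<Integer> overlappingOldSubtasks(int[][] oldRanges, int newStart, int newEnd) {
        List<Integer> result = new ArrayList<>();
        for (int old = 0; old < oldRanges.length; old++) {
            int oldStart = oldRanges[old][0];
            int oldEnd = oldRanges[old][1];
            // Two half-open intervals overlap iff each starts before the other ends.
            if (oldStart < newEnd && newStart < oldEnd) {
                result.add(old);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Old assignment from the example: 0 -> [0;43); 1 -> [43;87); 2 -> [87;128)
        int[][] oldRanges = {{0, 43}, {43, 87}, {87, 128}};
        System.out.println(overlappingOldSubtasks(oldRanges, 0, 64));   // [0, 1]
        System.out.println(overlappingOldSubtasks(oldRanges, 64, 128)); // [1, 2]
    }
}
```

        Old subtask 1 appears in both results, so it is a non-unique mapping and its data has to be filtered downstream, while old subtasks 0 and 2 are unique.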

      • ROUND_ROBIN

        public static final SubtaskStateMapper ROUND_ROBIN
        Redistributes subtask state in a round-robin fashion. Returns a mapping of newIndex -> oldIndexes. The mapping is accessed by using Bitset oldIndexes = mapping.get(newIndex).

        For oldParallelism < newParallelism, that mapping is trivial, for example if oldParallelism = 6 and newParallelism = 10:

        New index   Old indexes
        0           0
        1           1
        ...         ...
        5           5
        6           (none)
        ...         ...
        9           (none)

        For oldParallelism > newParallelism, new indexes get multiple assignments by wrapping around assignments in a round-robin fashion, for example if oldParallelism = 10 and newParallelism = 4:

        New index   Old indexes
        0           0, 4, 8
        1           1, 5, 9
        2           2, 6
        3           3, 7
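
        Both tables are consistent with assigning old index i to new index i % newParallelism. The following sketch reproduces them under that assumption; the class RoundRobinSketch is hypothetical and uses a plain array of java.util.BitSet rather than Flink's mapping type.

```java
import java.util.BitSet;

/** Hypothetical sketch of a round-robin newIndex -> oldIndexes mapping; not Flink's implementation. */
final class RoundRobinSketch {

    /** Assigns old index i to new index i % newParallelism. */
    static BitSet[] newToOld(int oldParallelism, int newParallelism) {
        BitSet[] mapping = new BitSet[newParallelism];
        for (int newIndex = 0; newIndex < newParallelism; newIndex++) {
            mapping[newIndex] = new BitSet();
        }
        for (int oldIndex = 0; oldIndex < oldParallelism; oldIndex++) {
            mapping[oldIndex % newParallelism].set(oldIndex);
        }
        return mapping;
    }

    public static void main(String[] args) {
        BitSet[] downscale = newToOld(10, 4);
        System.out.println(downscale[0]); // {0, 4, 8}
        System.out.println(downscale[3]); // {3, 7}

        BitSet[] upscale = newToOld(6, 10);
        System.out.println(upscale[5]);   // {5}
        System.out.println(upscale[9]);   // {}
    }
}
```

        Each old index is set in exactly one BitSet, so a mapping built this way contains only unique assignments.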
    • Method Detail

      • values

        public static SubtaskStateMapper[] values()
        Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
        for (SubtaskStateMapper c : SubtaskStateMapper.values())
            System.out.println(c);
        
        Returns:
        an array containing the constants of this enum type, in the order they are declared
      • valueOf

        public static SubtaskStateMapper valueOf(String name)
        Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
        Parameters:
        name - the name of the enum constant to be returned.
        Returns:
        the enum constant with the specified name
        Throws:
        IllegalArgumentException - if this enum type has no constant with the specified name
        NullPointerException - if the argument is null
      • getOldSubtasks

        public abstract int[] getOldSubtasks(int newSubtaskIndex,
                                             int oldNumberOfSubtasks,
                                             int newNumberOfSubtasks)
        Returns all old subtask indexes that need to be read to restore all buffers for the given new subtask index on rescale.
      • getNewToOldSubtasksMapping

        public RescaleMappings getNewToOldSubtasksMapping(int oldParallelism,
                                                          int newParallelism)
        Returns a mapping from each new subtask index to all old subtask indexes.
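
        One way to picture this method is as a loop that collects the per-index result for every new subtask. The sketch below assumes exactly that relationship, which this page does not confirm. The class NewToOldMappingSketch and the interface OldSubtaskLookup are hypothetical stand-ins, and a List<int[]> replaces the RescaleMappings return type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of building a new-to-old mapping; not Flink's implementation. */
final class NewToOldMappingSketch {

    /** Stand-in with the same parameter list as getOldSubtasks. */
    interface OldSubtaskLookup {
        int[] getOldSubtasks(int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks);
    }

    /** Collects the old subtask indexes for every new subtask index, in order. */
    static List<int[]> buildMapping(OldSubtaskLookup lookup, int oldParallelism, int newParallelism) {
        List<int[]> mapping = new ArrayList<>(newParallelism);
        for (int newIndex = 0; newIndex < newParallelism; newIndex++) {
            mapping.add(lookup.getOldSubtasks(newIndex, oldParallelism, newParallelism));
        }
        return mapping;
    }
}
```

        The element at position newIndex of the returned list then holds all old subtask indexes to read for that new subtask.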
      • isAmbiguous

        public boolean isAmbiguous()
        Returns true iff this mapper can potentially produce ambiguous mappings, where different new subtasks map to the same old subtask. The assumption is that such replicated data needs to be filtered.
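
        Ambiguity in a concrete mapping can be checked by looking for an old subtask that is listed under more than one new subtask. Note that isAmbiguous describes what a mapper can potentially produce, whereas this sketch inspects one specific mapping. The class AmbiguityCheckSketch is hypothetical, and it assumes that no old index is repeated within a single row.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch for detecting non-unique assignments; not Flink's implementation. */
final class AmbiguityCheckSketch {

    /**
     * Returns true if any old subtask index appears under more than one new subtask.
     * newToOld[newIndex] lists the old subtask indexes read by that new subtask.
     */
    static boolean hasAmbiguousMapping(int[][] newToOld) {
        Set<Integer> seen = new HashSet<>();
        for (int[] oldIndexes : newToOld) {
            for (int oldIndex : oldIndexes) {
                // Set.add returns false if the old index was already claimed elsewhere.
                if (!seen.add(oldIndex)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

        Applied to the examples on this page, the RANGE mapping {{0, 1}, {1, 2}} is ambiguous, while the ROUND_ROBIN mapping {{0, 4, 8}, {1, 5, 9}, {2, 6}, {3, 7}} is not.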