Schematic of the overall framework. Given a task (e.g., an analogy to solve), inputs (denoted as {A, B, C, D}) are represented by the grid cell code, consisting of units (“grid cells”) representing different combinations of frequencies and phases.
Grid cell embeddings (xA, xB, xC, xD) are multiplied elementwise (represented as a Hadamard product ⊙) by a set of learned attention gates g, then passed to the inference module R. The attention gates g are optimized using 𝓛DPP, which encourages attention to grid cell embeddings that maximize the volume of the representational space. The inference module outputs a score for each candidate analogy (consisting of A, B, C, and a candidate answer choice D). The scores for all answer choices are passed through a softmax to generate an answer ŷ, which is compared against the target y to compute the task loss 𝓛task.
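The gating and the two losses described in the caption can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, tensor shapes, and the use of a log-determinant of the Gram matrix as the volume term in 𝓛DPP are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpp_volume_loss(X, g, eps=1e-6):
    """Sketch of L_DPP: negative log-volume spanned by the gated embeddings.

    X : (n, d) grid cell embeddings; g : (d,) learned attention gates.
    Minimizing this loss increases det(K), i.e., the volume of the
    representational space covered by the attended embeddings.
    (Assumed form for illustration.)
    """
    Xg = X * g                                  # Hadamard product with gates
    K = Xg @ Xg.T                               # Gram matrix of gated embeddings
    _, logdet = np.linalg.slogdet(K + eps * np.eye(len(K)))
    return -logdet

def task_loss(scores, y):
    """Sketch of L_task: softmax over candidate scores, cross-entropy vs. target y."""
    z = scores - scores.max()                   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()             # distribution over answer choices
    return -np.log(p[y])

X = rng.standard_normal((4, 8))                 # embeddings xA, xB, xC, xD (d = 8 assumed)
g = rng.uniform(size=8)                         # attention gates (random stand-in)
scores = rng.standard_normal(4)                 # inference module R output, one per choice
loss = dpp_volume_loss(X, g) + task_loss(scores, y=2)
```

In practice both terms would be differentiable (e.g., in PyTorch) so that the gates g and the inference module R can be optimized jointly by gradient descent.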