Figure 1: Schematic of the overall framework. Given a task (e.g., an analogy to solve), inputs (denoted as {A, B, C, D}) are represented by grid codes, consisting of units ("grid cells") representing different combinations of frequencies and phases. Grid embeddings (xA, xB, xC, xD) are multiplied elementwise by a set of learned attention weights w, then passed to an inference module R. The attention weights w are optimized using LDPP, which encourages attention to grid embeddings that maximize the volume of the representational space. The inference module outputs a score for each candidate analogy (consisting of A, B, C, and a candidate answer choice D). The scores for all answer choices are passed through a softmax to generate an answer ŷ, which is compared against the target y to generate the task loss Ltask.

Figure 2: Generation of test analogies from training analogies (region marked in blue) by: a) translating both dimension values of A, B, C, D by the same amount; and b) scaling both dimension values of A, B, C, D by the same amount. Since both dimension values are transformed by the same amount, each input is transformed along the diagonal.

Algorithm 1: Training with DPP-A (defined for a positive semidefinite matrix V and w ∈ [0, 1]^N).

Figure 3: Results on analogy on each region for translation and scaling using an LSTM in the inference module.

Figure 4: Results on analogy on each region for translation and scaling using a transformer in the inference module.

Figure 5: Results on arithmetic on each region using an LSTM in the inference module.

Figure 6: Results on arithmetic on each region using a transformer in the inference module.

Figure 7: Results on analogy on each region using DPP-A, an LSTM in the inference module, and different embeddings (grid codes, one-hots, and smoothed one-hots passed through a learned encoder) for translation (left) and scaling (right).
Each point is mean accuracy over three networks, and bars show the standard error of the mean.

Figure 8: Results on analogy on each region using different embeddings (grid codes, and one-hots or smoothed one-hots with and without an encoder) and an LSTM in the inference module, but without DPP-A, TCN, L1 regularization, or dropout, for translation (left) and scaling (right).

Figure 9: Results on analogy on each region using an LSTM in the inference module when choosing the top K frequencies in Algorithm 1. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 10: Results on analogy on each region for translation and scaling using a transformer in the inference module.

Figure 11: Results on arithmetic with different embeddings (with DPP-A) using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 12: Results on arithmetic with different embeddings (without DPP-A, TCN, L1 regularization, or dropout) using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 13: Results on arithmetic for an increasing number of grid cell frequencies Nf on each region using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 14: Results for regression on analogy using an LSTM in the inference module. Results show mean squared error on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 15: Results for regression on arithmetic on each region using an LSTM in the inference module.
Results show mean squared error on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 16: Results on analogy for L1 regularization with various values of λ, for translation and scaling, using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 17: Results on arithmetic for L1 regularization with various values of λ using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 18: Results on analogy for one-step DPP-A over the complete grid codes with various values of λ, for translation and scaling, using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.

Figure 19: Results on analogy for one-step DPP-A within frequencies with various values of λ, for translation and scaling, using an LSTM in the inference module. Results show mean accuracy on each region averaged over 3 trained networks, with error bars showing the standard error of the mean.