Misha Denil - Programmable Agents (2017)
Created: August 26, 2017 / Updated: November 2, 2024 / Status: finished / 2 min read (~296 words)
- Define an RL-based learner where the reward function depends on the target program to generate
- Their toy environment has the constraint that every object's set of properties is unique (e.g., no two red cubes)
- Programmable Networks impose a sophisticated bottleneck on the agent's representations whose structure ensures that their representation will generalize
- Our general framework is as follows: A goal is specified as a state of the world that satisfies a relation between two objects. Objects are associated with sets of properties (e.g., their color and shape). The vocabulary of properties gives rise to a system of base sets which are the sets of objects that share each named property (e.g., RED is the set of red objects, etc.). The full universe of discourse is then the Boolean algebra generated by these base sets
- We require two things for each program: a verifier and a search procedure
- The verifier has access to the true state of the environment, and can inspect this state to determine if it satisfies the program (reward function)
- We also need a search procedure which inspects the program as well as some summary of the environment state and decides how to modify the environment to bring the program closer to satisfaction (agent)
- We create one detector for each property in our vocabulary. Each detector is a small neural network that maps columns $\omega_j$ of $\Omega$ to a value in [0, 1]
- Detectors are applied independently to each column of the matrix $\Omega$, and each detector populates a single row of $\Phi$
- Groups of detectors corresponding to sets of mutually exclusive properties (e.g., blocks can only have one color) have their outputs coupled by a softmax
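
The detector notes above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the vocabulary in `GROUPS`, the feature size, and the single linear layer standing in for each "small neural network" are all hypothetical. It shows each detector being applied independently to every column $\omega_j$ of $\Omega$, each detector filling one row of $\Phi$, and the detectors within a group of mutually exclusive properties being coupled by a softmax.

```python
import math
import random

# Hypothetical vocabulary: each group holds mutually exclusive properties.
GROUPS = {
    "color": ["RED", "BLUE"],
    "shape": ["CUBE", "SPHERE"],
}


def make_detector(n_features, rng):
    """Random weights and bias: a one-layer stand-in for a small network."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias


def logit(detector, column):
    """Unnormalized score of one detector on one column of Omega."""
    weights, bias = detector
    return sum(w * x for w, x in zip(weights, column)) + bias


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def detect(omega_columns, detectors):
    """Build Phi: one row per property, one entry per object (column)."""
    phi = {name: [] for names in GROUPS.values() for name in names}
    for column in omega_columns:  # detectors see each column independently
        for names in GROUPS.values():
            # Couple the detectors of one exclusive group with a softmax.
            probs = softmax([logit(detectors[n], column) for n in names])
            for name, p in zip(names, probs):
                phi[name].append(p)
    return phi


rng = random.Random(0)
n_features = 3  # assumed size of each object's feature column
omega_columns = [
    [rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(4)
]
detectors = {
    name: make_detector(n_features, rng)
    for names in GROUPS.values()
    for name in names
}
phi = detect(omega_columns, detectors)
```

Because of the softmax coupling, the entries of the `RED` and `BLUE` rows sum to 1 for every object, which is one way to encode that a block can only have one color. A property that belongs to no exclusive group would need a different squashing function (a sigmoid, for instance) to land in [0, 1]; that case is left out of this sketch.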