Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under
review.
Autonomous Vehicles
Atakishiyev, Shahin, et al. "Explainable artificial intelligence for autonomous driving: A
comprehensive overview and field guide for future research directions." arXiv preprint
arXiv:2112.11561 (2021).
Mao, Hongzi, et al. "Resource management with deep reinforcement learning." Proceedings of the
15th ACM workshop on hot topics in networks. 2016.
Explaining Neural Networks
Kim, Jinkyu, and John Canny. "Interpretable learning for self-driving cars by visualizing causal
attention." Proceedings of the IEEE international conference on computer vision. 2017.
The problem with explaining
Attention maps and feature importance analysis explain what information is being used, not
how.
Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and
use interpretable models instead." Nature machine intelligence 1.5 (2019): 206-215.
The many forms of explanation
Evolution can optimize control policies which are interpretable by design.
Zhou, Ryan, and Ting Hu. "Evolutionary approaches to explainable machine learning." arXiv
preprint arXiv:2306.14786 (2023).
How to get interpretable policies
Start with a good neural network policy and imitate it.
Predict expert action as a classification problem.
Delgado, Juan, et al. "Robotics in construction: A critical review
of the reinforcement learning and imitation learning paradigms." Advanced Engineering
Informatics 54 (2022): 101787.
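The "predict expert action as classification" idea can be sketched in a few lines. This is a minimal illustration, not any paper's method: the expert, the episodes, and the state discretization are all hypothetical, and the "classifier" is just a majority-vote lookup table fit on (state, expert action) pairs.

```python
from collections import Counter, defaultdict

def collect_dataset(expert_policy, episodes):
    """Roll out the expert and record (state, action) pairs."""
    data = []
    for states in episodes:              # each episode is a sequence of states
        for s in states:
            data.append((s, expert_policy(s)))
    return data

def fit_tabular_classifier(data, discretize):
    """Supervised learning: predict the expert's action from the state
    via a majority vote per discretized state."""
    votes = defaultdict(Counter)
    for s, a in data:
        votes[discretize(s)][a] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

# Hypothetical 1-D example: the expert pushes toward the origin.
expert = lambda s: 0 if s < 0.0 else 1
episodes = [[-0.9, -0.3, 0.2, 0.8], [0.5, -0.5]]
table = fit_tabular_classifier(collect_dataset(expert, episodes),
                               discretize=lambda s: round(s, 1))
```

Any supervised learner can replace the lookup table; the point is that imitation reduces policy learning to ordinary classification.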
Imitation with Decision Trees
Pong inputs extracted from raw images:
ball position $(x, y)$ and velocity $(v_x, v_y)$
player paddle position $y_p$, velocity $v_p$, acceleration $a_p$, and jerk $j_p$.
Bastani, Osbert, Yewen Pu, and Armando Solar-Lezama. "Verifiable reinforcement learning via
policy extraction." Advances in neural information processing systems 31 (2018).
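A decision tree fit to expert data yields a readable policy. As a self-contained sketch (not Bastani et al.'s VIPER algorithm), the snippet below fits a depth-1 tree (a stump) to hypothetical Pong-style features; the feature construction `ball_y - paddle_y` and the labels are invented for illustration.

```python
def best_stump(X, y):
    """Fit a depth-1 decision tree (stump) to (state, expert-action) pairs.
    X: list of feature tuples, y: list of expert actions."""
    best = None
    maj = lambda ys: max(set(ys), key=ys.count)   # majority label
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left  = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] >  t]
            if not left or not right:
                continue
            acc = (sum(yi == maj(left) for yi in left) +
                   sum(yi == maj(right) for yi in right)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, f, t, maj(left), maj(right))
    _, f, t, a_left, a_right = best
    return lambda x: a_left if x[f] <= t else a_right

# Hypothetical Pong data: move the paddle toward the ball.
# raw features: (ball_y, paddle_y); action 0 = up, 1 = down
raw = [(0.2, 0.8), (0.9, 0.1), (0.5, 0.9), (0.7, 0.2)]
y = [0, 1, 0, 1]
X = [(by - py,) for by, py in raw]     # derived feature: ball_y - paddle_y
policy = best_stump(X, y)
```

The learned rule reads directly as "move down iff the ball is below the paddle", which is the kind of explanation a deep policy cannot give.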
Imitation with Decision Trees
Application to 10x10 visual area around Mario.
Coppens, Youri, et al. "Distilling deep reinforcement learning policies in soft decision trees."
Proceedings of the IJCAI 2019 workshop on explainable artificial intelligence. 2019.
Imitation with Functional Trees
Model-based and model-free symbolic regression (genetic programming)
Hein, Daniel, et al. "Interpretable policies for reinforcement
learning by genetic programming." Engineering Applications of Artificial Intelligence 76 (2018):
158-169.
Imitation with Programs
Example policy:
Similar to grammatical evolution with mutation function "neighborhood_pool"
Verma, Abhinav, et al. "Programmatically interpretable reinforcement learning." International
Conference on Machine Learning. PMLR, 2018.
Generating Policy Programs
Generate programs with a recurrent network guided by an anchor policy.
Landajuela, Mikel, et al. "Discovering symbolic policies with deep reinforcement learning."
International Conference on Machine Learning. PMLR, 2021.
Generation > Imitation
Imitation policies may not align with expert policies over full trajectories.
Generated policies, boosted by an expert policy, outperform imitation.
Landajuela, Mikel, et al. "Discovering symbolic policies with deep reinforcement learning."
International Conference on Machine Learning. PMLR, 2021.
Direct search for interpretable policies
Get reward by looking for reward.
You don't have to use neural networks.
K.I.S.S.
Functional trees as policies
Tree-based genetic programming to maximize cumulative reward on Cart Pole.
Miranda, Ícaro, et al. "A Comparison Study Between Deep Learning and Genetic Programming
Application in Cart Pole Balancing Problem." 2018 IEEE Congress on Evolutionary Computation
(CEC). IEEE, 2018.
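A functional-tree policy for Cart Pole can be tiny. The sketch below simulates the classic cart-pole dynamics (gym's constants) with Euler integration and runs a hand-written tree of the kind GP tends to discover; the policy `theta + theta_dot > 0` is illustrative, not taken from the cited paper.

```python
import math

def step(state, action):
    """One Euler step of the classic cart-pole dynamics (gym constants)."""
    x, v, th, w = state
    g, mc, mp, l, fmag, dt = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
    force = fmag if action == 1 else -fmag
    cos, sin = math.cos(th), math.sin(th)
    temp = (force + mp * l * w * w * sin) / (mc + mp)
    th_acc = (g * sin - cos * temp) / (l * (4.0 / 3.0 - mp * cos * cos / (mc + mp)))
    x_acc = temp - mp * l * th_acc * cos / (mc + mp)
    return (x + dt * v, v + dt * x_acc, th + dt * w, w + dt * th_acc)

# A hand-written functional tree of the kind GP discovers:
# push right iff (theta + theta_dot) > 0.
policy = lambda s: 1 if s[2] + s[3] > 0.0 else 0

state, steps = (0.0, 0.0, 0.05, 0.0), 0
while steps < 300 and abs(state[2]) < 0.21:
    state = step(state, policy(state))
    steps += 1
```

A two-node expression tree balances the pole, and unlike a deep policy it can be read and verified at a glance.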
Interpretable Policies competitive with Deep RL
Simple policies with cumulative reward similar to Deep RL on control tasks in gym and pybullet.
Videau, Mathurin, et al. "Multi-objective genetic programming for explainable reinforcement
learning." European Conference on Genetic Programming. 2022.
Linear GP
Register-based genetic programming resulting in functional graphs.
Very simple to implement and uses a standard GA.
Kantschik, Wolfgang, and Wolfgang Banzhaf. "Linear-graph GP-a new GP structure." European
Conference on Genetic Programming. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002.
Brameier, Markus, Wolfgang Banzhaf, and Wolfgang Banzhaf. Linear genetic programming. Vol. 1.
New York: Springer, 2007.
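A register-based linear GP program is just a list of instructions over a register file, which makes the interpreter a few lines long. This is a generic sketch of the representation, not code from the cited books; the instruction encoding and example program are assumptions.

```python
import operator

# Function set; division is "protected" as is usual in GP.
OPS = {0: operator.add, 1: operator.sub, 2: operator.mul,
       3: lambda a, b: a / b if abs(b) > 1e-9 else 1.0}

def run_linear_gp(program, inputs, n_regs=4):
    """Execute a linear GP program.
    Each instruction is (op, dst, src1, src2); registers are initialized
    with the inputs and r[0] is read out as the action value."""
    r = [0.0] * n_regs
    r[:len(inputs)] = inputs
    for op, dst, s1, s2 in program:
        r[dst] = OPS[op](r[s1], r[s2])
    return r[0]

# Hypothetical evolved program computing r0 = (in0 + in1) * in0
prog = [(0, 2, 0, 1),   # r2 = r0 + r1
        (2, 0, 2, 0)]   # r0 = r2 * r0
```

Because instructions write to registers that later instructions read, the effective program is a functional graph, and standard GA crossover/mutation on the instruction list is all the search machinery needed.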
Programs as Policies
Program synthesis using continuous optimization over a latent program space.
Trivedi, Dweep, et al. "Learning to synthesize programs as interpretable and generalizable
policies." Advances in neural information processing systems 34 (2021): 25146-25163.
Weight Agnostic Neural Networks
NEAT topology optimization with a single weight parameter shared across the full network: a parameter-free functional graph.
Gaier, Adam, and David Ha. "Weight agnostic neural networks." Advances in neural information
processing systems 32 (2019).
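The single-shared-weight idea is easy to see in code. The forward pass below is a generic sketch (the topology encoding is an assumption, not Gaier and Ha's implementation): every connection uses the same weight, so only the wiring and activations carry information.

```python
import math

def wann_forward(topology, inputs, shared_w):
    """Forward pass of a weight-agnostic network: every connection uses
    the same shared weight; only the topology is evolved.
    topology: list of (node_id, activation_name, [source node ids]),
    in evaluation order, with the last node as the output."""
    act = {'tanh': math.tanh, 'relu': lambda x: max(0.0, x), 'id': lambda x: x}
    values = dict(enumerate(inputs))
    for node, a, sources in topology:
        values[node] = act[a](shared_w * sum(values[s] for s in sources))
    return values[topology[-1][0]]

# Hypothetical topology: hidden node 2 <- inputs 0,1 ; output 3 <- node 2
net = [(2, 'tanh', [0, 1]), (3, 'id', [2])]
```

A good WANN topology produces the right behavior across a whole range of shared weights, which is what makes the graph itself, not the parameters, the policy.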
Cartesian Genetic Programming
A graph-based GP method based on functional node indexing with Cartesian coordinates.
Miller, Julian Francis. "Cartesian genetic programming: its status and future." Genetic
Programming and Evolvable Machines 21 (2020): 129-168.
Floating Point Cartesian Genetic Programming
Modern CGP uses a linear representation and a single index per node.
Wilson, Dennis G., et al. "Positional cartesian genetic programming." arXiv preprint
arXiv:1810.04119 (2018).
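The linear, single-index-per-node representation makes CGP evaluation a single pass over the genome. This is a minimal sketch of that decoding (function set and example genome are invented for illustration):

```python
import math

# Small function set; binary signature, second argument ignored where unused.
FUNCS = [lambda a, b: a + b,          # 0: add
         lambda a, b: a * b,          # 1: mul
         lambda a, b: math.sin(a),    # 2: sin (unary)
         lambda a, b: max(a, b)]      # 3: max

def eval_cgp(genome, inputs, output_idx):
    """Evaluate a CGP individual with a linear genome.
    genome: list of (func_idx, conn1, conn2) nodes; connections index
    either the inputs (indices < len(inputs)) or earlier nodes."""
    values = list(inputs)
    for f, c1, c2 in genome:
        values.append(FUNCS[f](values[c1], values[c2]))
    return values[output_idx]

# Hypothetical genome: node 2 = in0 + in1, node 3 = sin(node 2)
genome = [(0, 0, 1), (2, 2, 0)]
```

Nodes not on the path to the output are simply never read, which gives CGP its characteristic neutral "junk" genes while the active subgraph stays small and readable.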
Graph-based policies
Reward of 7000, similar to PPO and SAC.
Graph-based policies
Reward of 20000-40000 (but no termination criterion)
Critical applications: Airplane taxi
Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under
review.
Critical applications: Airplane taxi
Policy:
Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under
review.
Arcade Learning Environment
Bellemare, Marc G., et al. "The arcade learning environment: An evaluation platform for general
agents." Journal of Artificial Intelligence Research 47 (2013): 253-279.
Standard benchmark for visual control
Used in:
HyperNEAT [Hausknecht et al., 2012]
DQN [Mnih et al., 2015]
A3C [Mnih et al., 2016]
Tangled Problem Graphs [Kelly and Heywood, 2017]
CGP [Wilson et al., 2018]
Go-explore [Ecoffet et al., 2019]
Flare [Shang et al., 2021]
Control with convolutional neural network policy
Policy network maps pixel space representation to small action distribution
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature
518.7540 (2015): 529-533.
Explaining Atari agents
Attention maps can show what information a network is using.
Observation and expertise can, in some cases, demonstrate how that information informs the
decision.
Greydanus, Samuel, et al. "Visualizing and understanding atari agents." International conference
on machine learning. PMLR, 2018.
CGP for Atari playing
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the
genetic and evolutionary computation conference. 2018.
Centipede
Human        11963
Double       3853.5
DQN          4881
Prioritized  3421.9
A3C:FF       3755.8
A3C:LSTM     1997
TPG          34731.7
HyperNEAT    25275.2
CGP          24708
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the
genetic and evolutionary computation conference. 2018.
Kung Fu Master
Human        22736
Double       30207
DQN          24288
Prioritized  31244
A3C:FF       28819
A3C:LSTM     40835
TPG          n/a
HyperNEAT    7720
CGP          57400
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the
genetic and evolutionary computation conference. 2018.
Boxing
Human        4.3
Double       73.5
DQN          77.3
Prioritized  68.6
A3C:FF       59.8
A3C:LSTM     37.3
TPG          n/a
HyperNEAT    16.4
CGP          38.4
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the
genetic and evolutionary computation conference. 2018.
Interpretability in high-dimensional data
Break the data down into a lower dimension
Use explainability methods like attention maps
Construct an interpretable analysis pipeline using adapted functions
Split the problem into different interpretable parts
Atakishiyev, Shahin, et al. "Explainable artificial intelligence for autonomous driving: A
comprehensive overview and field guide for future research directions." arXiv preprint
arXiv:2112.11561 (2021).
TPG: Decimal Feature Grid
Break the data down into a lower dimension: a small grid with a discretized color palette
Kelly, Stephen, and Malcolm I. Heywood. "Emergent tangled graph representations for Atari game
playing agents." Genetic Programming: 20th European Conference, EuroGP 2017, Amsterdam,
The Netherlands, April 19-21, 2017, Proceedings 20. Springer International Publishing, 2017.
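The preprocessing can be sketched directly: downscale the frame to a coarse grid and quantize each cell's mean intensity. The grid size and number of levels below are illustrative, not TPG's exact settings.

```python
def decimal_feature_grid(frame, grid=(8, 8), levels=8):
    """Downscale a 2-D pixel frame (values in 0..255) to a coarse grid and
    quantize each cell's mean intensity into `levels` buckets."""
    h, w = len(frame), len(frame[0])
    gh, gw = grid
    out = []
    for i in range(gh):
        row = []
        for j in range(gw):
            # all pixels falling into grid cell (i, j)
            cell = [frame[y][x]
                    for y in range(i * h // gh, (i + 1) * h // gh)
                    for x in range(j * w // gw, (j + 1) * w // gw)]
            mean = sum(cell) / len(cell)
            row.append(min(levels - 1, int(mean * levels / 256)))
        out.append(row)
    return out
```

The resulting small integer grid is what the evolved programs index into, which keeps both the inputs and the programs human-readable.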
TPG: Multi-Task Learning
Interpretable graphs inherently allow for understanding mutual information between control
policies.
Kelly, Stephen, and Malcolm I. Heywood. "Multi-task learning in atari video games with emergent
tangled program graphs." Proceedings of the Genetic and Evolutionary Computation Conference.
2017.
Interpretable CGP Encoder Controller
Construct an interpretable analysis pipeline using adapted functions.
Split the problem into different interpretable parts.
Lecarpentier, Erwan, et al. "Cartesian Genetic Programming for Learning Interpretable Agents
Playing Atari Games." Under review.
Interpretable CGP Encoder Controller
Bowling:
hit the action button when the character is in cell (5,2)
hit the down direction when the ball is launched
hit the up direction when the ball is in cell (4,3) or (4,5).
Lecarpentier, Erwan, et al. "Cartesian Genetic Programming for Learning Interpretable Agents
Playing Atari Games." Under review.
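The Bowling rules above translate almost verbatim into a dispatch policy. This sketch is a reading of the slide, not the authors' code; the cell coordinates, the `ball_launched` flag, the rule ordering, and the action names are assumptions.

```python
def bowling_policy(character_cell, ball_cell, ball_launched):
    """Rule-based Bowling policy read off the slide (ordering assumed)."""
    if character_cell == (5, 2):
        return "FIRE"                     # hit the action button
    if ball_launched:
        return "DOWN"                     # steer the ball down
    if ball_cell in ((4, 3), (4, 5)):
        return "UP"                       # steer the ball up
    return "NOOP"
```

A policy in this form can be audited rule by rule, which is exactly the property the encoder-controller decomposition is after.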
Takeaways
Explainable policies are necessary for critical applications
Interpretable policy representations lend themselves naturally to explanations
Evolutionary policy search is well-suited to interpretable representations
Many representations exist: trees, graphs, programs
Not a very active domain; more comparison between methods is needed
High-dimensional data like visual representations still difficult