Evolutionary Methods

for Interpretable Control


Dennis G. Wilson

EvoRL Workshop

GECCO 2023


Policy search in critical applications

Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under review.

Autonomous Vehicles


Atakishiyev, Shahin, et al. "Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions." arXiv preprint arXiv:2112.11561 (2021).

How to train your policy


Representation:
Neural network

Optimization:
  • Value function approximation
  • Policy gradient
  • Direct policy search


Objective:
\[ f(\pi_\theta) = \sum_{t=0}^{T} \gamma^t R(s_t, \pi_\theta(s_t)) \]

Mao, Hongzi, et al. "Resource management with deep reinforcement learning." Proceedings of the 15th ACM workshop on hot topics in networks. 2016.
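
A minimal sketch of this objective, assuming a Gymnasium-style environment (the environment name and the policy callable here are illustrative):

    import gymnasium as gym

    def discounted_return(policy, env_name="CartPole-v1", gamma=0.99, T=500):
        # Roll out the policy and accumulate gamma^t * R(s_t, a_t)
        env = gym.make(env_name)
        s, _ = env.reset(seed=0)
        total, discount = 0.0, 1.0
        for _ in range(T):
            s, r, terminated, truncated, _ = env.step(policy(s))
            total += discount * r
            discount *= gamma
            if terminated or truncated:
                break
        return total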

Explaining Neural Networks

Kim, Jinkyu, and John Canny. "Interpretable learning for self-driving cars by visualizing causal attention." Proceedings of the IEEE international conference on computer vision. 2017.

The problem with explaining


Attention maps and feature importance analysis explain what information is being used, not how it is used.

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature machine intelligence 1.5 (2019): 206-215.

The many forms of explanation


Evolution can optimize control policies which are interpretable by design.

Zhou, Ryan, and Ting Hu. "Evolutionary approaches to explainable machine learning." arXiv preprint arXiv:2306.14786 (2023).

How to get interpretable policies



Start with a good neural network policy and imitate it.
Predict expert action as a classification problem.
Delgado, Juan, et al. "Robotics in construction: A critical review of the reinforcement learning and imitation learning paradigms." Advanced Engineering Informatics 54 (2022): 101787.
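
A minimal behavioral-cloning sketch of this idea (not the cited method; expert and the Gymnasium-style env are assumed): roll out the expert, then fit a small decision tree to its (state, action) pairs.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def imitate(expert, env, episodes=100, max_depth=6):
        # Collect (state, expert action) pairs by rolling out the expert
        states, actions = [], []
        for _ in range(episodes):
            s, _ = env.reset()
            done = False
            while not done:
                a = expert(s)                       # expert action = class label
                states.append(s)
                actions.append(a)
                s, _, terminated, truncated, _ = env.step(a)
                done = terminated or truncated
        # A shallow tree is the interpretable surrogate policy
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(np.array(states), np.array(actions))
        return tree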

Imitation with Decision Trees


Pong inputs extracted from raw images:
ball position $(x, y)$ and velocity $(v_x, v_y)$;
player paddle position $y_p$, velocity $v_p$, acceleration $a_p$, and jerk $j_p$.
Bastani, Osbert, Yewen Pu, and Armando Solar-Lezama. "Verifiable reinforcement learning via policy extraction." Advances in neural information processing systems 31 (2018).

Imitation with Decision Trees


Application to 10x10 visual area around Mario.
Coppens, Youri, et al. "Distilling deep reinforcement learning policies in soft decision trees." Proceedings of the IJCAI 2019 workshop on explainable artificial intelligence. 2019.

Imitation with Functional Trees


Model-based and model-free symbolic regression (genetic programming)
Hein, Daniel, et al. "Interpretable policies for reinforcement learning by genetic programming." Engineering Applications of Artificial Intelligence 76 (2018): 158-169.

Imitation with Programs

Example policy: (program shown on slide)

The program search is similar to grammatical evolution, using a "neighborhood_pool" mutation function.
Verma, Abhinav, et al. "Programmatically interpretable reinforcement learning." International Conference on Machine Learning. PMLR, 2018.

Generating Policy Programs

Generate programs with a recurrent network guided by an anchor policy.
Landajuela, Mikel, et al. "Discovering symbolic policies with deep reinforcement learning." International Conference on Machine Learning. PMLR, 2021.

Generation > Imitation

Imitation policies may not align with expert policies over full trajectories.
Generated policies, boosted by an expert policy, outperform imitation.
Landajuela, Mikel, et al. "Discovering symbolic policies with deep reinforcement learning." International Conference on Machine Learning. PMLR, 2021.

Direct search
for interpretable policies


Get reward by looking for reward.

You don't have to use neural networks.

K.I.S.S.

Functional trees as policies

Tree-based genetic programming to maximize cumulative reward on Cart Pole.
Miranda, Ícaro, et al. "A Comparison Study Between Deep Learning and Genetic Programming Application in Cart Pole Balancing Problem." 2018 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2018.
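
A toy sketch of the approach (illustrative function set and sizes; random search over expression trees stands in for a full GP loop with selection, crossover, and mutation):

    import math
    import random
    import gymnasium as gym

    OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b, "sin": lambda a: math.sin(a)}

    def random_tree(depth=3):
        # Leaf: one of the 4 Cart Pole observations; node: a random operator
        if depth == 0 or random.random() < 0.3:
            return ("x", random.randrange(4))
        op = random.choice(list(OPS))
        arity = 1 if op == "sin" else 2
        return (op, *[random_tree(depth - 1) for _ in range(arity)])

    def evaluate(tree, obs):
        if tree[0] == "x":
            return obs[tree[1]]
        return OPS[tree[0]](*(evaluate(c, obs) for c in tree[1:]))

    def episode_return(tree, env):
        # Fitness: cumulative reward of one episode under the tree policy
        obs, _ = env.reset()
        total, done = 0.0, False
        while not done:
            action = int(evaluate(tree, obs) > 0)   # threshold output to {0, 1}
            obs, r, terminated, truncated, _ = env.step(action)
            total += r
            done = terminated or truncated
        return total

    env = gym.make("CartPole-v1")
    best = max((random_tree() for _ in range(200)),
               key=lambda t: episode_return(t, env))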

Interpretable Policies competitive with Deep RL

Simple policies achieve cumulative reward similar to deep RL on control tasks in Gym and PyBullet.
Videau, Mathurin, et al. "Multi-objective genetic programming for explainable reinforcement learning." European Conference on Genetic Programming. 2022.

GP for eXplainable RL


    import numpy as np

    def if_then_else(cond, a, b):  # GP primitive used by the evolved policies
        return a if cond else b

    def swingup(s):  # evolved cart-pole swing-up policy; s = observation vector
        return [s[4] + s[3] * 6.614633680991087 - s[2]
                + np.exp(if_then_else(s[3] > -0.7571072906634332, s[1],
                                      15.013603569678889 * s[1]))]

    def hopper(s):  # evolved hopper policy returning three joint torques
        return [np.sin(np.exp(s[8])),
                -6.257060739725605 * (s[7] + np.sin(s[3] + s[7])),
                np.sin(np.sin(s[7]) - np.sin(s[8])
                       - s[10] * (s[1] - np.log(abs(s[8] * s[3]) + 0.0001)
                                  - 5.860219777510614))]

Videau, Mathurin, et al. "Multi-objective genetic programming for explainable reinforcement learning." European Conference on Genetic Programming. 2022.

Linear GP

Register-based genetic programming whose programs decode to functional graphs.
Very simple to implement, and standard GA operators apply directly (see the sketch below).
Kantschik, Wolfgang, and Wolfgang Banzhaf. "Linear-graph GP-a new GP structure." European Conference on Genetic Programming. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002.
Brameier, Markus, and Wolfgang Banzhaf. Linear Genetic Programming. Vol. 1. New York: Springer, 2007.
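
A minimal register-machine sketch (the instruction format and register layout are illustrative assumptions):

    OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b if abs(b) > 1e-6 else 1.0}  # protected division

    def run(program, inputs, n_registers=8):
        # A program is a flat list of instructions, so standard GA
        # operators (point mutation, crossover) apply to it directly.
        r = [0.0] * n_registers
        r[:len(inputs)] = inputs                # load observations into registers
        for dst, op, src1, src2 in program:     # r[dst] = r[src1] <op> r[src2]
            r[dst] = OPS[op](r[src1], r[src2])
        return r[-1]                            # read the output register

    # Example: r3 = x0 + x1, then r7 = r3 * x2
    program = [(3, "+", 0, 1), (7, "*", 3, 2)]
    print(run(program, [1.0, 2.0, 0.5]))        # -> 1.5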

Programs as Policies


Program synthesis using continuous optimization over a latent program space.
Trivedi, Dweep, et al. "Learning to synthesize programs as interpretable and generalizable policies." Advances in neural information processing systems 34 (2021): 25146-25163.

Weight Agnostic Neural Networks


NEAT topology optimization with a single weight parameter shared across the full network.
The result is a parameter-free functional graph.
Gaier, Adam, and David Ha. "Weight agnostic neural networks." Advances in neural information processing systems 32 (2019).
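
A sketch of the weight-agnostic evaluation idea (topology, env, and episode_return are assumed helpers; the weight values are illustrative):

    # Evaluate a topology across a few fixed shared-weight values
    # instead of training individual weights.
    SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

    def wann_fitness(topology, env, episode_return):
        # Every connection in the network uses the same scalar weight w;
        # fitness is the mean return over the sampled weights.
        returns = [episode_return(topology, w, env) for w in SHARED_WEIGHTS]
        return sum(returns) / len(returns)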

Cartesian Genetic Programming


A graph-based GP method in which functional nodes are indexed by Cartesian coordinates.
Miller, Julian Francis. "Cartesian genetic programming: its status and future." Genetic Programming and Evolvable Machines 21 (2020): 129-168.

Floating Point Cartesian Genetic Programming


Modern CGP uses a linear representation and a single index per node.
Wilson, Dennis G., et al. "Positional cartesian genetic programming." arXiv preprint arXiv:1810.04119 (2018).
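
A minimal floating-point CGP sketch (node count and function set are illustrative, not the cited implementation):

    import math
    import numpy as np

    FUNCS = [lambda a, b: a + b, lambda a, b: a * b,
             lambda a, b: math.sin(a), lambda a, b: max(a, b)]

    def evaluate(genome, inputs, n_nodes=10):
        # Each node has 3 genes in [0, 1): a function choice and two
        # connections, scaled to the indices of earlier nodes.
        values = list(inputs)                    # program inputs come first
        for f_gene, x_gene, y_gene in np.reshape(genome, (n_nodes, 3)):
            f = FUNCS[int(f_gene * len(FUNCS)) % len(FUNCS)]
            x = values[int(x_gene * len(values))]
            y = values[int(y_gene * len(values))]
            values.append(f(x, y))
        return values[-1]                        # last node is the output

    genome = np.random.rand(10 * 3)              # flat, fixed-length genome
    print(evaluate(genome, [0.5, -1.2, 2.0]))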

Graph-based policies


Reward of 7000, similar to PPO and SAC.

Graph-based policies


Reward of 20,000 to 40,000 (but with no termination criterion)

Critical applications: Airplane taxi

Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under review.

Critical applications: Airplane taxi


Policy: (control law shown on slide)
Wilson D, et al. "Learning Robust and Readable Control Laws for Aircraft Taxi Control." Under review.

Arcade Learning Environment

Bellemare, Marc G., et al. "The arcade learning environment: An evaluation platform for general agents."
Journal of Artificial Intelligence Research 47 (2013): 253-279.

Standard benchmark for visual control

Used in:
  • HyperNEAT [Hausknecht et al., 2012]
  • DQN [Mnih et al., 2015]
  • A3C [Mnih et al., 2016]
  • Tangled Problem Graphs [Kelly and Heywood, 2017]
  • CGP [Wilson et al., 2018]
  • Go-explore [Ecoffet et al., 2019]
  • Flare [Shang et al., 2021]

Control with convolutional neural network policy


The policy network maps the pixel-space representation to a small action distribution.
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
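
A sketch of this network (layer sizes follow the Mnih et al. 2015 architecture; treat the code as illustrative):

    import torch.nn as nn

    def atari_policy(n_actions):
        # Maps 4 stacked 84x84 grayscale frames to one output per action
        return nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )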

Explaining Atari agents


Attention maps can show what information a network is using.
Human observation and expertise can, in some cases, reveal how that information informs the decision.
Greydanus, Samuel, et al. "Visualizing and understanding atari agents." International conference on machine learning. PMLR, 2018.

CGP for Atari playing

Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the genetic and evolutionary computation conference. 2018.
Centipede


Method:  Human  Double  Dueling  Prioritized  A3C:FF  A3C:LSTM  TPG      HyperNEAT  CGP
Score:   11963  3853.5  4881     3421.9       3755.8  1997      34731.7  25275.2    24708
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the genetic and evolutionary computation conference. 2018.

Kung Fu Master



Method:  Human  Double  Dueling  Prioritized  A3C:FF  A3C:LSTM  TPG  HyperNEAT  CGP
Score:   22736  30207   24288    31244        28819   40835     n/a  7720       57400
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the genetic and evolutionary computation conference. 2018.

Boxing



Method:  Human  Double  Dueling  Prioritized  A3C:FF  A3C:LSTM  TPG  HyperNEAT  CGP
Score:   4.3    73.5    77.3     68.6         59.8    37.3      n/a  16.4       38.4
Wilson, Dennis G., et al. "Evolving simple programs for playing Atari games." Proceedings of the genetic and evolutionary computation conference. 2018.

Interpretability in high-dimensional data

  • Break the data down into a lower dimension
  • Use explainability methods like attention maps
  • Construct an interpretable analysis pipeline using adapted functions
  • Split the problem into different interpretable parts
Atakishiyev, Shahin, et al. "Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions." arXiv preprint arXiv:2112.11561 (2021).

TPG: Decimal Feature Grid


Break the data down into a lower dimension: a small grid with a discretized color palette (see the sketch below).
Kelly, Stephen, and Malcolm I. Heywood. "Emergent tangled graph representations for Atari game playing agents."
Genetic Programming: 20th European Conference, EuroGP 2017, Amsterdam, The Netherlands, April 19-21, 2017, Proceedings 20. Springer International Publishing, 2017.
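
A rough sketch of the idea (the grid size and palette are illustrative assumptions, not the exact TPG preprocessing):

    import numpy as np

    def decimal_feature_grid(frame, grid=(10, 10), n_colors=8):
        # frame: 2D array of quantized color ids; one decimal feature per
        # cell encodes which colors appear there (bitwise presence).
        h, w = frame.shape
        gh, gw = h // grid[0], w // grid[1]
        features = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                cell = frame[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                bits = 0
                for c in range(n_colors):
                    if np.any(cell == c):
                        bits |= 1 << c           # set bit c if color c present
                features.append(bits)
        return np.array(features)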

TPG: Multi-Task Learning


Interpretable graphs inherently expose the information shared between control policies.
Kelly, Stephen, and Malcolm I. Heywood. "Multi-task learning in atari video games with emergent tangled program graphs." Proceedings of the Genetic and Evolutionary Computation Conference. 2017.

Interpretable CGP Encoder Controller


Construct an interpretable analysis pipeline using adapted functions.
Split the problem into different interpretable parts.
Lecarpentier, Erwan, et al. "Cartesian Genetic Programming for Learning Interpretable Agents Playing Atari Games." Under review.

Interpretable CGP Encoder Controller


Bowling:
  • press the action button when the character is in cell (5,2)
  • press down when the ball is launched
  • press up when the ball is in cell (4,3) or (4,5)
Lecarpentier, Erwan, et al. "Cartesian Genetic Programming for Learning Interpretable Agents Playing Atari Games." Under review.
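
Written out as hypothetical pseudocode (the rule order and action names are assumptions; this paraphrases the description above, not the evolved CGP program):

    def bowling_policy(character_cell, ball_cell, ball_launched):
        if character_cell == (5, 2):
            return "FIRE"
        if ball_launched:
            return "DOWN"
        if ball_cell in [(4, 3), (4, 5)]:
            return "UP"
        return "NOOP"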

Takeaways

  • Explainable policies are necessary for critical applications
  • Interpretable policy representations lend themselves naturally to explanations
  • Evolutionary policy search is well-suited to interpretable representations
  • Many representations exist: trees, graphs, programs
  • The domain is not very active and needs more comparison between methods
  • High-dimensional data like visual representations remains difficult


Thank you!