Abstract—Some data mining problems require predictive
models to be not only accurate but also comprehensible.
Comprehensibility enables human inspection and
understanding of the model, making it possible to trace why
individual predictions are made. Since most high-accuracy
techniques produce opaque models, accuracy is, in practice,
regularly sacrificed for comprehensibility. One frequently
studied technique that can often reduce this accuracy vs.
comprehensibility tradeoff is rule extraction, i.e., generating
a second, transparent model from the opaque one. In this
paper, it is argued that techniques producing
transparent models, either directly from the dataset, or from
an opaque model, could benefit from using an oracle guide. In
the experiments, genetic programming is used to evolve
decision trees, and a neural network ensemble is used as the
oracle guide. More specifically, the datasets used by genetic
programming when evolving the decision trees consist of
different combinations of the original training data and
“oracle data”, i.e., training or test instances paired with the
corresponding predictions from the oracle. In total, seven
different ways of combining regular training data with oracle
data were evaluated, and the results, obtained on 26 UCI
datasets, clearly show that the use of an oracle guide improved
performance. In fact, trees evolved using only training
data had the worst test set accuracy of all setups
evaluated. Furthermore, statistical tests show that two setups,
both using the oracle guide, produced significantly more
accurate trees than the setup using only training data.
IEEE, 2009.
Keywords—oracle guides, rule extraction, genetic programming, machine learning
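
The idea of "oracle data" described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ensemble members are stand-in threshold functions rather than neural networks, the majority vote is an assumed way of combining them, and the setup names (`train_only`, `train_oracle`, `test_oracle`, `train_plus_test_oracle`) are hypothetical labels that do not correspond to the seven setups evaluated in the paper. The sketch only shows how a dataset for the tree learner could be assembled; the genetic programming step itself is omitted.

```python
from collections import Counter


def ensemble_predict(members, x):
    """Majority vote over ensemble members (each a callable x -> label)."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


def oracle_data(members, instances):
    """Pair each input instance with the oracle's prediction for it."""
    return [(x, ensemble_predict(members, x)) for x in instances]


def build_dataset(train, test_inputs, members, setup):
    """Assemble the data handed to the tree learner.

    The setup names are hypothetical illustrations, not the paper's setups.
    """
    train_inputs = [x for x, _ in train]
    options = {
        # Original training instances with their true labels.
        "train_only": list(train),
        # Training instances relabeled with oracle predictions.
        "train_oracle": oracle_data(members, train_inputs),
        # Test instances labeled by the oracle (true test labels are never used).
        "test_oracle": oracle_data(members, test_inputs),
        # True-labeled training data plus oracle-labeled test instances.
        "train_plus_test_oracle": list(train) + oracle_data(members, test_inputs),
    }
    return options[setup]


# Toy data: one numeric feature, binary labels; the last label is "noisy".
train = [((0.2,), 0), ((0.9,), 1), ((0.6,), 0)]
test_inputs = [(0.1,), (0.7,)]

# Stand-in ensemble: three threshold classifiers (assumed, for illustration).
members = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.4),
    lambda x: int(x[0] > 0.8),
]

combined = build_dataset(train, test_inputs, members, "train_plus_test_oracle")
relabeled = build_dataset(train, test_inputs, members, "train_oracle")
```

In this toy case, `relabeled` replaces the noisy label on the instance `(0.6,)` with the ensemble's vote, and `combined` extends the training data with test instances whose labels come from the oracle alone, so no true test labels are used.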