Information
Citation
T. Helmuth and L. Spector. Word Count as a Traditional Programming Benchmark Problem for Genetic Programming. In GECCO '14: Proceedings of the 16th annual conference on Genetic and evolutionary computation, pp. 919-926. July 2014. ACM.
Abstract
The Unix utility program wc, which stands for "word count," takes any number of files and prints the number of newlines, words, and characters in each of the files. We show that genetic programming can find programs that replicate the core functionality of the wc utility, and propose this problem as a "traditional programming" benchmark for genetic programming systems. This "wc problem" features key elements of programming tasks that often confront human programmers, including requirements for multiple data types, a large instruction set, control flow, and multiple outputs. Furthermore, it mimics the behavior of a real-world utility program, showing that genetic programming can automatically synthesize programs with general utility. We suggest statistical procedures that should be used to compare performances of different systems on traditional programming problems such as the wc problem, and present the results of a short experiment using the problem. Finally, we give a short analysis of evolved solution programs, showing how they make use of traditional programming concepts.
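For readers unfamiliar with wc, the target behavior described in the abstract can be sketched in a few lines of Python. This is only an illustration of the three counts for a single input string, not the evolved Push programs or the paper's test harness. The function name `wc_counts` is hypothetical, and the sketch assumes that a "word" is any maximal run of non-whitespace characters and that characters are counted as Python string characters rather than bytes.

```python
def wc_counts(text: str) -> tuple[int, int, int]:
    """Return (newlines, words, characters) for one input string.

    Assumptions (not taken from the paper): words are maximal runs of
    non-whitespace characters, and characters are Python str characters,
    not bytes.
    """
    newlines = text.count("\n")   # number of newline characters
    words = len(text.split())     # split() with no argument splits on runs of whitespace
    characters = len(text)        # total characters, including whitespace
    return newlines, words, characters


if __name__ == "__main__":
    sample = "hello world\nfoo bar baz\n"
    print(wc_counts(sample))
```

A program evolved for the wc problem has to produce all three of these values for each input, which is why the abstract lists multiple outputs and multiple data types among the problem's requirements.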
Full paper
Supplementary materials
Slides
PDF of slides presented at GECCO
Source code
An archive of the source code for Clojush, the Clojure version of Push used to run our experiments, can be found here. The most recent version of Clojush can be found in its GitHub repository.
Evolved solutions to the wc problem
The 11 solution programs we evolved in our experiments can be found here. Note that these programs have been automatically simplified to remove instructions that have no effect on the outputs of their programs.