Adaptive Selection in Evolutionary Algorithm thesis

- Transcript Header:
- Adaptive Selection in Evolutionary Algorithm thesis

- Transcript Body:

- 1. I An Adaptive Selection Scheme for Balancing Exploitation and Exploration By Muhammad Riyad Parvez Student ID: 200605043 Department of Computer Science and Engineering Bangladesh University of Engineering and Technology March 2012

- 2. II DECLARATION I, hereby, declare that the work presented in this thesis is the outcome of the investigation performed by me under the supervision of Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka. I also declare that no part of this thesis and thereof has been or is being submitted elsewhere for the award of any degree or diploma. Signature (Muhammad Riyad Parvez) Candidate Countersigned (Dr. Md. Monirul Islam) Supervisor

- 3. III Acknowledgements Firstly, I would like to express my profound gratefulness and honor to Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, for his continuous support, advice and care. His endless patience, scholarly guidance, continual encouragement, constant and energetic supervision, constructive criticism, valuable advice, reading many inferior drafts and correcting them at all stages have made it possible to complete this thesis. I would also like to thank all the faculty members and staff of Department of CSE, BUET, for their support and cooperation. Finally, I would like to express my deep respect and gratitude to my parents and my family. And to Charles Darwin from whom, evolutionary algorithms get their aesthetic beauty.

- 4. IV Abstract Evolutionary algorithms are successfully applied to problems where typical optimization algorithms and local search methods fail. Like any search algorithm, evolutionary algorithms have to face the conflicting goals of local exploitations and global explorations during the search process. The success of evolutionary algorithm lies in its ability to explore and exploit simultaneously. But EA is also error prone to loosing population diversity early generations which results into population trapped into local optima and degrading of its performance. Though numerous schemes are suggested for various stages of EA to prevent premature optimization, maintaining balance between exploration and exploitation isn’t tried at survivor selection stage. Existing selection strategies either focuses on exploiting or exploring. This thesis introduces Adaptive Survivor Selection Strategy (ASSS), a totally new concept that tries to maintain required amount of diversity at survivor selection stage. Along with new selection strategy this thesis paper also presents new criteria for measuring diversity both for individual and whole population. Key concept of this strategy is to measure diversity across the population, calculating needed amount of diversity at that time and try to gain that diversity level selecting survivors using newly introduced diversity measurement technique. ASSS uses both fitness of an individual and how much diverse is the individual regarding to current population to calculate an adaptive survivor selection fitness function. Using that fitness we simply weed out inferior individuals in terms of fitness and diversity. This survivor fitness function is adaptive which gives it control on selection pressure. Performance in maintaining required level of diversity at any time of algorithm is evaluated on a number of benchmark numerical optimization problems and results are compared with several existing selection schemes. Experimental result shows that ASSS shows significant performance gain in managing diversity for any sort of EA.

- 5. V Table of Contents Chapter 1: Introduction……………………………………….……1 1.1 General Introduction……………………………………….……...1 1.2 Objective of the Thesis……………………………………….……2 1.3 Thesis Organization……………………………………………….3 Chapter 2: Background………………………………………….….4 2.1 When EA is Needed……………………………………………….4 2.2 Advantages of EA…………………………………………………4 2.3 Disadvantages of EA……………………………………………...4 2.4 Canonical Structure of EA……………………………………….5 2.5 Representation of Gene…………………………………………...6 2.6 Major Branches of EA…………………………………………....7 2.6.1 Genetic Algorithm………………………………………………….…..7 2.6.2 Evolutionary Programming……………………………………….…..7 2.6.3 Evolutionary Strategy…………………………………………………7 2.6.4 Genetic Programming…………………………………………………8 2.6.5 Memetic Algorithm……………………………………………………9 2.7 Existing Work…………………………………………………….9 2.7.1 Dynamic Parameter Control…………………………………………9 2.7.2 Maintaining Diversity and Multi-population GAs…………………9 2.7.3 Memory Based Genetic Algorithm…………………………………10 2.7.4 Mutation Based Work………………………………………………10 2.7.5 Survivor Selection Based Work……………………………………12

- 6. VI Chapter 3: Proposed Algorithms………………………………..15 3.1 Dual Population Genetic Algorithm………………………….15 3.1.1 Advantages………………………………………………………….16 3.1.2 Disadvantages……………………………………………………….16 3.1.3 Recommendation……………………………………………………17 3.2 Modified DPGA Proposal……………………………………...17 3.2.1 Structure of Individual……………………………………………..17 3.2.2 Initialization………………………………………………………….18 3.2.3 Parent Selection……………………………………………………..18 3.2.4 Generating Parent Individual On The Fly………………………..18 3.2.5 Mutation…………………………………………………………….18 3.2.6 Survivor Selection…………………………………………………..19 3.2.6.1 Exploited Individual……………………………………...19 3.2.6.2 Explored Individual……………………………………....19 3.2.6.3 Normal Individual………………………………………..19 3.2.7 Schedule of T……………………………………………………….20 3.2.8 Advantages…………………………………………………………20 3.3 New Survivor Selection Strategy……………………………..20 3.4 New Mutation Strategy……………………………………….24 3.4.1 Laplace Distribution………………………………………………25 3.4.2 Slash Distribution…………………………………………………27 3.4.3 Students T-Distribution…………………………………………..28 Chapter 4: Experimental Study………………………………...29 4.1 Modified DPGA………………………………………………29 4.1.1 Pitfalls of Modified DPGA………………………………………29

- 7. VII 4.2 Adaptive Survivor Selection Strategy……………………….30 Chapter 5: Conclusion…………………………………………...32 5.2 Future Works……………………………………………….....32 5.2.1 Modified DPGA…………………………………………………...32 5.2.2 Adaptive Survivor Selection………………………………………33 5.2.3 New Distribution Based Mutation………………………………..33 References………………………………………………………...34 Appendix………………………………………………………….35

- 8. VIII List of Figures Table 1.1(a): Change in best fitness (best solution) with number of generations…………….1 Table 1.1(b): Comparison between Random Search, EA and Problem Tailored Search……..2 Table 2.4: Basic skeleton of an Evolutionary Algorithm………………………………….…..6 Table 2.6.4: Individual structure of GP………………………………………………….……8 Table 2.7.4: Probability Distribution Function (PDF) of Gaussian distribution…………..…11 Table 2.7.5: Truncation Selection……………………………………………………………12 Table 3.1(a): Offspring Generation of DPGA…………………………………………….…15 Table 3.1(b): Reserve Population Fitness Function……………………………………….…16 Table 3.4: Probability Density Function of Stable Family………………………….…….….24 Table 3.4.1(a): Probability Density Function of Laplace Distribution…………………...…..26 Table 3.4.1(b): Comparison of Gaussian and Laplace Distribution…………………….…....26 Table 3.4.2: Probability Density Function of Slash Distribution at different parameters........27 Table 3.4.3: Probability Density Function for Student’s t-distribution with different degrees of freedom……………………………………………………………………………28 Table 4.2(a): Change in diversity across generations………………………………………...31 Table 4.2(b): Number of buckets searched…………………………………………………..31

- 9. IX List of Symbols MPGA Multi Population Genetic Algorithm DPGA Dual Population Genetic Algorithm GA Genetic Algorithm StGA Standard Genetic Algorithm EA Evolutionary Algorithm EC Evolutionary Computing EP Evolutionary Programming MA Memetic Algorithm FEP Fast Evolutionary Programming ASSS Adaptive Survivor Selection Strategy

- 10. X

- 11. XI Chapter 1 Introduction 1.1 General Introduction Evolutionary Algorithm (EA) is the study of computational system which use ideas and get inspirations from natural evolution. It’s a generic population based meta-heuristic optimization algorithm. EA falls into category of bio-inspired computing. It uses selection, crossover, mutation mechanisms borrowed from natural evolution. And survival of the fittest principle lies in the heart of EA [1] [2]. Evolution Algorithms are often viewed as function optimizers, although the range of problems to which EAs are applies quite broad. One of the many advantages of EAs is they don’t require very broad domain knowledge. Although domain knowledge can be introduced in EAs. Fitness curve by generations for EA is asymptotic in nature. Fitness improvement in earlier generations of EA is rapid and decreasingly increasing. And after certain generations, improvement in best fitness throughout generations is negligible. That’s when we call population has converged. It’s expected that population will converge to good enough solution. But sometimes population converges to local optima which is not accepted result. This phenomenon is called premature optimization. Figure 1.1(a): Change in best fitness (best solution) with number of generations EAs performs better than random search because search because of its exploitative behavior. It uses random walk, but also tries exploit good solutions. It also outperforms local greedy

- 12. XII search. Local greedy searches are exploitative in nature, often trapped into local maxima. But EA has random walk and maintaining required level of diversity it’s less likely to be trapped into local maxima. Problem tailored searches outperform EA only for the problem in which the search is tailored and uses deep domain knowledge of that problem. Such deep domain knowledge isn’t readily available and incorporating to problem tailored search is difficult. Figure (1.1b): Comparison between Random Search, EA and Problem Tailored Search[4] 1.2 Thesis Objective This thesis mainly focuses into maintaining diversity of single population algorithms. It is frequently observed that populations lose diversity too early and their individuals are trapped into local optima. For lack of diversity trapped individuals can’t escape basin of local minima. This phenomenon is called Premature Convergence. Objective of this thesis paper is to investigate better schemes which can maintain diversity of a population and also give control on diversity. The quest is searching for an adaptive diversity maintaining scheme. Thesis is done in three focused areas: 1. Modifying Dual Population Genetic Algorithm (DPGA) so that it can properly manage diversity. 2. Seeking a survivor selection technique which is adaptive and gives more control on diversity at any time of algorithm. 3. Examining probability distributions other than already used distributions which can give appropriate amount of jumps in any stage of evolution.

- 13. XIII 1.3 Thesis Organization The rest of the thesis is organized as follows. Chapter 2 introduces the fundamentals of evolutionary algorithm, with its operators and processes. The essential terms related to evolutionary algorithm are explained with examples. The strengths, limitations, and applications of evolutionary algorithm are also mentioned. In Chapter 3, we introduce new evolutionary strategies, entitled as Modified DPGA, Adaptive Survivor Selection Strategy, New Mutation Based on Distributions, to balance the exploitative and explorative features of the standard evolutionary algorithm. The different stages, operators and procedures of Modified DPGA, ASSS, and Mutation Based on Distribution are described in details. It is also explained how they differ substantially from other existing works. Chapter 4 evaluates Modified DPGA and ASSS on a number of benchmark numerical optimization problems and makes comparisons with several other existing works. Although Modified DPGA didn’t perform well, but we gained valuable insight how we can modify this further to gain more performance. An in-depth experimentation with the parameters, operators and the stages of ASSS, with their effects on population fitness and diversity, is also carried out. Finally, in Chapter 5, we summarize our work and provide directions for future research.

- 14. XIV Chapter 2 Background Evolutionary Algorithms (EA) consist of several heuristics, which are able to solve optimization tasks by imitating some aspects of natural evolution. They may use different levels of abstraction, but they are always working on whole populations of possible solutions for a given task. EAs are an approved set of heuristics, which are flexible to use and postulate only negligible requirements on the optimization task. 2.1 When EA is Needed The search space is large, complex or poorly understood. Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space. Only target (fitness) function is provided. No mathematical analysis is available. Traditional search methods fail. Not the best solution but good enough solution is needed. Local search methods can’t give good enough solutions. Continuous optimization problems. 2.2 Advantages of EA Applicable to a wide range of problems. Useful in areas without good problem specific techniques. No explicit assumptions about the search space necessary. Easy to implement. Any-time behavior. 2.3 Disadvantages of EA Problem representation must be robust. No general guarantee for an optimum. No solid theoretically foundations (yet). Parameter tuning: trial-and-error Process (but self-adaptive variants in evolution strategies). Sometimes high memory requirements. Implementation: High degree of freedom.

- 15. XV 2.4 Canonical Structure of EA EAs are family of algorithms. There’s no definite structure exists among them. Although most of the EAs follow more or less following structure: 1. Initialization: The initial population of candidate solutions is usually generated randomly across the search space. However, domain specific knowledge or other knowledge can easily be incorporated. 2. Evaluation: Once the population is initialized or offspring population is created, the fitness value of the candidate solutions is evaluated. 3. Parent Selection: Selection allocates more copies of those solutions with higher fitness values and thus imposes the survival-of-the-ﬁttest mechanism on the candidate solutions. The main idea of selection is to prefer better solutions to worse ones, and many selection procedures have been proposed to accomplish this idea, including roulette-wheel selection, stochastic universal selection, ranking selection and tournament selection, some of which are described in the next section. 4. Recombination: Recombination combines parts of two or more parental solutions to create new, possibly better solutions (i.e. offspring). There are many ways of accomplishing this (some of which are discussed in the next section), and competent performance depends on a properly designed recombination mechanism. The offspring under recombination will not be identical to any particular parent and will instead combine parental traits in a novel manner. 5. Mutation: While recombination operates on two or more parental chromosomes, mutation locally but randomly modiﬁes a solution. Again, there are many variations of mutation, but it usually involves one or more changes being made to an individual’s trait or traits. In other words, mutation performs a random walk in the vicinity of a candidate solution. 6. Replacement: The offspring population created by selection, recombination, and mutation replaces the original parental population. Many replacement techniques such as elitist replacement, generation-wise re-placement and steady-state replacement methods are used in GAs. 7. Repeat steps 2–6 until a terminating condition is met.

- 16. XVI Figure (2.4): Basic skeleton of an Evolutionary Algorithm 2.5 Representation of Gene Individual representations are typically divided into two types: 1. Genotypic Representation: Genes are internal structures those determine physical characteristics of an individual. Usually represented by array of letters like genes in human DNA. In case of EA, it is represented by bit-string. Genotypic representation is used extensible in Genetic Algorithm. But it has some limitation. Most real world problems are not in form of genotypic representation. So we have to device a scheme to represent genotype by bit-string. Performance of algorithm is dependent on representation of bit-string. 2. Phenotypic Representation: Individuals are represented by real valued vectors. So there’s no need to convert them to any other representations. Algorithm directly works on real valued vectors of problems. Extensively used in Evolutionary Strategy and Evolutionary Programming. It’s used in real valued function optimization.

- 17. XVII 2.6 Major Branches of EA EAs are divided into four major branches. 2.6.1 Genetic Algorithm Genetic Algorithm (GA) was first formulated by John Holland. Holland’s original GA is called standard Genetic Algorithm which uses two parents, produces two offspring. It simulates Darwinian evolution. Search operators are only applied to genotypic representation; hence it’s called Genotypic Algorithm. It emphasizes the role of crossover and mutation as a background operator. GA uses binary string as representation of individuals extensively. 2.6.2 Evolutionary Programming Evolutionary Programming (EP) was first proposed by David Fogel [2]. It is closer to Lamarckian evolution. It doesn’t use any kind of crossover. Only mutation is used both for exploitation and exploration. Individuals are represented by two parts: object variables and mutation step size . are essentially real valued vectors i.e. phenotypes. So they are called Phenotypic Algorithm. 2.6.3 Evolutionary Strategies Evolutionary Strategies (ES) was first proposed by Ingred Rechenberg. Individuals are represented by real valued vectors. Good optimizer of real valued functions. Like EP, they are also Phenotypic Algorithm. Mutation plays the main role, crossover is also used. It has special self-adapting step size of mutation. ES has some basic notation: 1. (p,c) The p parents 'produce' c children using mutation. Each of the c children is then assigned a fitness value, depending on its quality considering the problem-specific environment. The best (the fittest) p children become next generations parents. This means the c children are sorted by their fitness value and the first p individuals are selected to be next generations parents (c must be greater or equal p). 2. (p+c) The p parents 'produce' c children using mutation. Each of the c children is then assigned a fitness value, depending on its quality considering the problem-specific environment. The best (the fittest) p individuals of both: parents and children become next generations parents. This means the c children together with the p parents are sorted by their fitness value and the first p individuals are selected to be next generations parents. 3. (p/r,c) The p parents 'produce' c children using mutation and recombination. Each of the c children is then assigned a fitness value, depending on its quality considering the problem-specific environment. The best (the fittest) p children become next generations parents. This means the c children are sorted by their fitness value and the first p individuals are selected to be next generation parents (c must be greater or equal p). 4. (p+c) The p parents 'produce' c children using mutation and recombination. Each of the c children is then assigned a fitness value, depending on its quality considering the problem-specific environment. The best (the fittest) p individuals of both: parents and

- 18. XVIII children become next generation parents. This means the c children together with the p parents are sorted by their fitness value and the first p individuals are selected to be next generations parents. 2.6.4 Genetic Programming Genetic Programming (GP) is put forward by John Koza. GP evolves computer programs. It is a specialization of genetic algorithms (GA) where each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task. Trees can be easily evaluated in a recursive manner. Every tree node has an operator function and every terminal node has an operand, making mathematical expressions easy to evolve and evaluate. Genetic programming starts with a primordial ooze of thousands of randomly created computer programs. This population of programs is progressively evolved over a series of generations. The evolutionary search uses the Darwinian principle of natural selection (survival of the fittest) and analogs of various naturally occurring operations, including crossover (sexual recombination), mutation, gene duplication, gene deletion. Genetic programming sometimes also employs developmental processes by which an embryo grows into fully developed organism. It uses both mutation and crossover. Trees are often used as data structure for individuals. Although non-tree representations have been suggested and successfully implemented. Although other fields of EA developed to be in mainstream usage, GP still is in its infancy. Because of representation of programs, huge search space, complex operation is needed to generate better individuals, GP isn’t mainstream yet. Figure (2.6.4): Individual structure of GP

- 19. XIX 2.6.5 Memetic algorithm Although Memetic algorithms don’t fall into EA category, they incorporate other searching techniques to EAs. The combination of Evolutionary Algorithms with Local Search Operators that work within the EA loop has been termed “Memetic Algorithms” (MA). Quite often, MA are also referred to in the literature as Baldwinian Evolutionary algorithms (EA), Lamarckian EAs, cultural algorithms or genetic local search. After generating individuals local search is performed on them. The frequency and intensity of individual learning directly define the degree of evolution (exploration) against individual learning (exploitation) in the MA search, for a given fixed limited computational budget. Clearly, a more intense individual learning provides greater chance of convergence to the local optima but limits the amount of evolution that may be expended without incurring excessive computational resources. Therefore, care should be taken when setting these two parameters to balance the computational budget available in achieving maximum search performance. When only a portion of the population individuals undergo learning, the issues on which subset of individuals to improve need to be considered to maximize the utility of MA search. 2.7 Existing Works 2.7.1 Dynamic Parameter Control A variety of previous works have proposed methods of dynamically adjusting the parameters of GA or other evolutionary algorithms. These methods include deterministic parameter control, adaptive parameter control, and self-adaptive parameter control. The simplest technique is the deterministic parameter control, which adjusts parameters according to a predetermined policy. Since it controls the parameters deterministically, it cannot adapt to the changes that occur during the execution of an algorithm. Adaptive parameter control exploits feedback from the evolution of a population to control the parameters. A notable example is the 1:5 adaptive Gaussian mutation widely used in the evolution strategy algorithms. According to this method, the mutation step size is increased if more than 20% of the mutations are successful and reduced otherwise. However, this method cannot be applied to algorithms adopting other than the real number representation. Finally, self-adaptive parameter control encodes the parameters into chromosomes and let them evolve with other genes. Although elegant, its applicability and effectiveness in a broad range of problems have not yet been shown 2.7.2 Maintaining Diversity and Multi-population Genetic Algorithms Multi population GAs (MPGAs) do so by evolving multiple subpopulations which are spatially separated [6]. Island-model GA (IMGA), which is a typical example of MPGA, evolves two or more subpopulations and uses periodic migration for the exchange of information between the subpopulations. The number and size of the populations of IMGA

- 20. XX are predetermined and kept unchanged during the algorithm’s execution. However, other MPGAs such as multinational GA forking GA the bi-objective multi population algorithm and variable island GA can adjust the number and size of populations dynamically by splitting a population into two smaller ones or combining two similar ones. The performance of IMGA is sensitive to the migration policy, migration rates and size, and the particular topology used, because they determine the spread speed of good solutions among the subpopulations. A variety of previous works have studied the effect of these parameters for migration both theoretically and experimentally. 2.7.3 Memory Based Genetic Algorithm Diploid GA, GA with unexpressed genes, dual GA (dGA), and primal-dual GA (PDGA) have adopted complementary and dominance mechanisms to maintain or provide population diversity. Most organisms in nature have a great number of genes in their chromosomes and only some of the dominant genes are expressed in a particular environment. The repressed genes are considered as a means of storing additional information and providing a latent source of population diversity. Diploid GAs use diploid chromosomes which are different from natural ones in that the two strands of the diploid chromosomes are not complementary. Only some genes in a diploid chromosome are expressed and used for ﬁtness evaluation by some predetermined dominance rules. GAUG is different from diploid GA in that it uses haploid chromosomes, but it also incorporates some unexpressed genes into its chromosomes. The unexpressed genes in GAUG are not used for ﬁtness evaluation but used for preserving diversity. dGAs and PDGAs also have haploid chromosomes in the population, but the chromosomes are sometimes interpreted complementarily to provide additional diversity. In dGA, each chromosome is attached with an additional bit which indicates whether the chromosome should be interpreted as it is or as complemented. In PDGA, some bad-looking chromosomes are interpreted both as complemented and original, and the original one is replaced by the complemented one if the latter gives better evaluations. Since the additional diversity provided by memory-based algorithms makes it easier to adapt to extreme environmental changes, these methods are frequently used for dynamic optimization problems. 2.7.4 Mutation Based Work ES and EP use mutations exclusively for both maintaining diversity and exploitation. Mutations can be divided into several categories. Mutation classification based on uniform ness across generations is: 1. Uniform Mutation: When mutation step size or mutation rate is uniform regardless of generation at any time of algorithm, then it’s called uniform mutation. Its usage not very high because of deterministic behavior regardless of generations.

- 21. XXI 2. Non Uniform Mutation: If mutation step size of mutation rate varies with respect to generation, then it’s called non uniform mutation. Usually at initial generations, step size or mutation rate is higher. As generation continues to increase step size or mutation rate is decreased gradually. It’s used frequently, because it gives option for governing diversity rate and also when diversity is needed it’s facilitated by large step size and convergence is needed it’s facilitated by small step size. For genetic algorithm, random bit-flipping is used for mutation. Random bit changing has some issues. For example, bit changing in higher position in bit-string has more effect on bit changing in lower position. And also for some bit-string going to immediate next or previous bit-string needs all bits changing. So exploitation becomes difficult. It’s called Hamming Cliff problem. Using gray code can mitigate effect of this problem. For mutation, random step size is needed to introduce random walk into search space. For random number generation, Gaussian distribution is most used. It’s a bell shaped curve. It’s defined by two parameters: position parameter (mean, µ), scale parameter (standard deviation, σ) and is denoted by . Always µ=0 and usually σ=3 i.e. is used for random number. generation (RNG). Mutations using Gaussian distribution is called Gaussian mutation. Algorithms using distribution based mutation. Figure (2.7.4): Probability Distribution Function (PDF) of Gaussian distribution Xin Yao uses two more distributions for RNG. They are: 1. Cauchy Distribution 2. Levy Distribution Gaussian, Cauchy and Levy they all have same bell curve shape PDF. Both of them have same parameter set like Gaussian. Both Cauchy and Levy have fatter tail than Gaussian. That means they are able to give more long jumps which can give more diverse individuals; less prone to getting trapped into local optima. Mutation using Cauchy and Levy distribution as

- 22. XXII RNG are called Cauchy mutation and Levy mutation respectively. Xin Yao uses adaptive mutation parameter. Every individual is represented by pair of , where is real values vectors, is adaptive mutation parameter, is size parameter or standard deviation of that distribution. 2.7.5 Survivor Selection Based Work Survivor selection is usually deterministic. In this phase of algorithm, selection pressure is applied to individuals. Several survivor selection schemes exist: 1. Naïve Survivor Selection: Basically follows survival of the fittest principle. Individuals are selected based on their fitness value for next generation. Lower fitness valued individuals are weed out. Sometimes risky, because lower fitness individuals can have latent genes which can give better individuals in later generations. 2. Elitist Selection: Population maintains spot for best individuals so that they didn’t get lost across the generations. Certain portion of best individuals is transferred directly to next generation without any modification. This ensures even if algorithm can’t make any solution any better than current solutions, the best solution must remain and in the end of the algorithm is returned. 3. Truncation Selection: Truncation selection simply retains the fittest x% of the population. These fittest individuals are duplicated into the next generation, so that the population size is maintained. Less fit candidates are culled even without being given the opportunity to evolve into something better. Very often results in premature convergence. Only advantage is rapid convergence. Figure (2.7.5): Truncation Selection

- 23. XXIII 4. Fitness Proportionate Reproduction: Same as roulette wheel selection scheme. Individuals are directly transferred to next generation based on their proportionate fitness value. Individuals of lower fitness still have some chances to survive, so that some genes that are latent can survive through generations even so they haven’t been able to generate good individuals. 5. Niching Methods: Niching methods strive to maintain niches [9] [10]. That means it ensures individuals of one niche don’t have to compete with individuals of other niches. The advantage is pre-existing diversity is maintained. But also makes convergence harder as selection pressure is lower. Niching methods are divided into two categories: i) Fitness Sharing: In nature, individuals of same species compete with each other for fixed resources [13]. Like nature, in fitness sharing, individuals in same region share fixed fitness values assigned to that region. Fitness is a shared resource of the population. Population is first divided into niches. Region is defined by sharing radius . Sharing Radius defines the niche size. This scheme is very sensitive to the value of assigned fitness per region and sharing radius. Population does not converge as a whole, but convergence takes place within the niches. Sharing can be done at genotypic or phenotypic level: 1. Genotypic level: Hamming distance and 2. Phenotypic Level: Euclidean distance. Sharing radius: if too small, practically no effect on the process; if too large, several peaks will ’melt’ individual peaks into one. ii) Crowding: Similar individuals in natural population, often of the same species, compete against each other for limited resources. Dissimilar individuals tend to occupy different niches, they typically don’t compete. Crowding uses individuals newly entering in a population to replace similar individuals. Random sample of CF (Crowding Factor) individuals is taken from the population. Larger crowding factor indicates less tolerance for the similar solutions, smaller values indicate similar solutions are more welcomed. New members of particular species replace older members of that species, not replacing members of other species. Crowding doesn’t increase the diversity of population; rather it strives to maintain the pre- existing diversity. It’s not directly influenced by fitness value. Crowding is divided into: (1) Deterministic Crowding: New individual will always replace the most similar individual if it has better fitness value. (2) Probabilistic Crowding: Primarily a distance based niching method. Main difference is the use of a probabilistic rather than deterministic acceptance function. No longer do stronger individuals win over weaker individuals, they win proportionally according to their fitness, and thus we get restorative pressure. Two core ideas of probabilistic crowding are to hold tournament between similar individuals and to let tournaments be probabilistic. 6. Deterministic Sampling: Average fitness of the population is calculated. Fitness associated to each individual is divided by the average fitness, but only the integer part of this operation is stored. If the value is equal or higher than one, the individual

- 24. XXIV is copied to the next generation. Remaining free places in the new population is fulfilled with individuals with the greatest fraction.

- 25. XXV Chapter 3 Algorithm Proposal 3.1 Dual Population Genetic Algorithm Dual Population Genetic Algorithm (DPGA) is a genetic algorithm which uses two populations instead of one to avoid premature convergence with two different evolutionary objectives [11] [12] [13]. The main population plays the role of that of an ordinary genetic algorithm. It evolves to find a good solution of high fitness value. The additional population is called reserve population is employed as reservoir for additional chromosomes which are rather different from chromosomes of main population. Two different fitness functions are used. Main population uses actual fitness function (like normal GA) and reserve population uses a fitness function which gives better fitness to the chromosomes more different from chromosomes of main population. Multi Population Genetic Algorithms use migration of chromosomes from one population to another population to exchange information. DPGA doesn’t use migration instead it uses another noble approach called crossbreeding. Crossbreeding is performed by taking one parent from main population and another parent from reserve population, making crossover between them. Newly born offspring are called crossbred offspring. Crossbred offspring then evaluated for both main population and reserve population for survival. DPGA also employs inbreeding, which takes two parents from the same population and makes offspring by crossover. These inbred offspring compete for survival in their respective parent population. Figure (3.1a): Offspring Generation of DPGA Mutation plays minimal role in DPGA and diversity is mainly provided by reserve population through crossbreeding. Crossbreeding plays the role of maintaining diversity in DPGA. The amount of diversity needed in any step of DPGA is specified by a self-adaptive parameter δ (0 < δ 0 and dn such that

- 36. XXXVI Figure (3.4): Probability Density Function of Stable Family So we can try other member distributions of stable family to generate random numbers for mutation. Other two members of stable family are: 1. Laplace Distribution 2. Slash Distribution 3.4.1 Laplace Distribution Like Gaussian distribution, it has two parameters: Location parameter, µ and Scale parameter, σ. Cauchy distribution is the result of Fourier transformation of Laplace distribution. The probability density function of the Laplace distribution is also reminiscent of the Gaussian distribution; however, whereas the Gaussian distribution is expressed in terms of the squared difference from the mean μ, the Laplace density is expressed in terms of the absolute difference from the mean. Consequently the Laplace distribution has fatter tails than the Gaussian distribution.

- 37. XXXVII Figure (3.4.1a): Probability Density Function of Laplace Distribution Figure (3.4.1b): Comparison of Gaussian and Laplace Distribution Above is a graph of Gaussian and Laplace distribution with same scale and location parameter. It is noticeable that Laplace has fatter tail than Gaussian and has a sharper peak than Gaussian. Laplace falls rather quickly in comparison with Gaussian. It is expected to have a higher probability of escaping from a local optima or moving away from a plateau, especially when “the basin of attraction” of the local optima or plateau is large relative to the mean step size. On the other hand, Gaussian has greater probability in the mid-range. From observation, we can conclude that, sharp peak of Laplace facilitates exploitation as it has more probability of producing short jump; it can also give long jump more than Gaussian. Although for the mid-range jump, Gaussian gives better result.

- 38. XXXVIII 3.4.2 Slash Distribution The Slash distribution is a continuous unbounded distribution developed as a deviation to the Gaussian distribution to allow for fatter tails kurtosis by altering the κ parameter, as illustrated in the plot below. When κ=0 the distribution reduces to a Gaussian(μ, σ). If Gaussian distribution is divided by a standard uniform random variable, then the resulting distribution is Slash distribution. It’s an example of ratio distribution. It has three parameters, like Gaussian distribution location parameter μ, scale parameter σ and an extra parameter κ. Figure (3.4.2): Probability Density Function of Slash Distribution at different parameters From above graph, we see that, as value of κ getting bigger, tail and peak of Slash distribution is getting bigger, slope is getting steeper and mid-range is getting smaller. By controlling the value κ, we can get an adaptive probability distribution which will facilitate two extreme ends: exploitation and exploration. The Slash distribution is used to fit to data that are approximately Gaussian distribution but have a kurtosis > 3. i.e. greater than the Gaussian distribution. The Slash distribution can readily be compared to a Gaussian distribution since they share the same mean μ and standard deviation σ parameters. Another distribution of which Gaussian distribution is a special form of family of distributions is called Student’s t-distribution.

- 39. XXXIX 3.4.3 Student’s t-distribution Student’s t-distribution (or simply the t-distribution) is a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown. The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean. Figure (3.4.3): Probability Density Function for Student’s t-distribution with different degrees of freedom The overall shape of the probability density function of the t-distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1. As t-distributions have similar bell curve shape of Gaussian distribution and when the number of degrees of freedom reaches infinity it converges to Gaussian. Although for practical purposes, when degree of freedom is 30, t- distribution converges to Gaussian. Careful observation reveals that when degree of freedom is low, t-distribution have much fatter tail and lower peak. As degree of freedom (DOF) increases, tails become thinner and peak becomes higher. That means, at low DOF, this distribution gives more long jumps and with increase of DOF distribution gives sorter jumps. So we can exploit this behavior of t-distribution. At the beginning of EA, DOF for t- distribution will be low; diversity is needed so t-distribution will produce long jumps to facilitate diversity. As generation increases we will increase DOF of t-distribution, it will give less short jumps and the algorithm will be less exploitative.

- 40. XL Chapter 4 Experimental Study 4.1 Modified DPGA We have implemented 1. Standard GA 2. DPGA 3. Modified DPGA Parameter setting for algorithms: Maximum generation = 1000 Population size = 500. Main population parents = 2 Reserve population parents = 2 Inbred main population offspring = 2 Inbred reserve population offspring = 2 Crossbred offspring (are produced by taking 1 parent individual from main population and another parent from reserve population) = 2 For crossover blend crossover method was used with parameter . Uniform Gaussian mutation with is applied for both of them. Tournament selection was used for parent selection. Naïve survivor selection method was adopted. 4.1.1 Pitfalls of Modified DPGA Theoretically the proposed algorithm should work better than DPGA. But in practice it doesn’t. At the beginning if the algorithm, we set which means generated individual should be at maximum distance possible. As a result, on the fly generated individual always go to the edge of search space. So generated individual only searches extreme ends of search space. Thus offspring produced by crossover taking generated individual as parent, are also on the boundary of search space or in its neighborhood. Mutating these offspring seldom works, because short or mid-range jumps will still keep individual near other individuals. And long jump needed to introduce diversity is very unlikely by current mutation operators.

- 41. XLI Even if we design a mutation operator which gives this sort of jump, it has the risk of taking individual out of search space and taking sufficiently diverse individual to already searched regions. If we have search space of n dimensional, every dimension has same lower bound, upper bound then the search space will be like n-dimensional hypercube. Literally this algorithm only searches the faces of hypercube and their neighborhoods, while core region of hypercube remains unsearched. Another modification can be made to start the algorithm with lower value of . This will prevent generating individual at the boundary of search space as well as offspring. But lower value of also means algorithm is unable to make long jumps. 4.2 Adaptive Survivor Selection Strategy We have used standard GA with different types of survivor selection scheme. Implemented schemes are: 1. Naïve survivor selection 2. Adaptive survivor selection Initial setting of parameters of adaptive survivor selection: Population size = 2000 Maximum generation = 500 Bucket edge length = 1.90734863e-6 Minimum number of generations required to be declared searched = 70 Minimum number of individuals required to be declared searched = 3 Survival adaptive parameter = 0.3 Diversity scaling factor = 50 Penalty factor = 10 Reserved number of best individuals (elite) = 20 Reserved number of most diverse individuals (elite) = 20 Experiment result for Ackley function by both adaptive survivor selection and naïve survivor selection given below:

- 42. XLII Figure (4.2a): Change in diversity across generations Figure (4.2b): Number of buckets searched 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Adaptive Survivor Selection Naive Survivor Selection

- 43. XLIII Chapter 5 Conclusion 5.1 Adaptive survivor selection is a noble approach which first introduces adaptive survivor selection with new diversity measurement. The best tool it provides that selection method can be adapted with respect to generation and diversity level. It checks the amount of diversity so that at any moment diversity doesn’t fall beyond lowest permitted value. It also incorporates elitist selection scheme not only for best individuals but also for most diverse individuals; so that when individuals are trapped into deep local optima, these most diverse individuals found by far helps to escape. Experiments show that adaptive survivor selection beats currently most used naïve survivor selection in terms of maintaining diversity exclusively. Although niching methods can maintain pre-existing diversity better than adaptive survivor selection sometimes, but we can mitigate this gap of performance by using proper initialization of adaptive diversity parameter and update rule. This scheme addresses one of the drawbacks of niching methods, they can’t control the diversity needed for at any generation. Actually niching methods and adaptive survivor selection have different goals. Niching methods mainly focuses on growing niches of individuals and maintaining niches, on the other hand our scheme focuses on maintain the level of diversity which can guide to individuals to global maxima. 5.2 Future Work 5.2.1 Modified DPGA It is obvious that value of δ caused this measurable performance of this algorithm. If we can change initialization and update rule of δ, hopefully this algorithm will perform better. One approach could be instead of initializing δ to 1, we will initialize δ to lower values. Thus risk of individuals going beyond the search space or only residing on the search space boundary will be mitigated. But this approach has a flaw. If we restrict δ to lower values, that means algorithm is now less capable of getting out of local optima and hence more prone to premature convergence. Assigning value to δ can be taken from a probability distribution. So that δ won’t be vulnerable to being too high or too low. After the initialization problem of δ is solved, update rule of δ is still needs to be revised.

- 44. XLIV 5.2.2 Adaptive Survivor Selection We have investigated new diversity measurement technique and using that diversity measurement technique, we have proposed new survival selection strategy which works better than existing survivor selection schemes. A pitfall of new diversity measurement is for some edge cases, diversity measurement gives high value of diversity although the population isn’t diverse at all. So detecting these edge cases and mitigating the error caused by these edge cases can be done in future. Also we can adopt fitness sharing to assign fitness to each bucket, where every individual of that bucket will share that fitness. Assigned fitness to a bucket will be dependent how much diverse that bucket is. That means instead of measuring diversity of individuals, we are measuring diversity of their container buckets. Once bucket is assigned fitness, then individuals of same bucket will share that fitness among them. 5.2.3 New Distribution Based Mutation We have investigated distributions which have similar properties of currently deployed distributions or have same family origin. These distributions have bell shaped curve similar to Gaussian to and also dependent on the same set of parameters like Gaussian, Cauchy or Levy distributions. Three distributions presented before has potential to replace current distribution based mutations. All of them have fatter tails and Laplace, Slash distributions have higher peaks, so theoretically both of them should give better performance in both exploration and exploitation. Student’s t-distribution has converged to Gaussian at DOF 30. So we can experiment on which initial DOF, we initiate our algorithm and how we can change the DOF as the generation increases.

- 45. XLV References [1] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989. [2] L.J. Fogel, A.J. Owens, and M.J. Walsh, Artificial Intelligence through simulated evolution, New York, John Wiley & Sons, 1966. [3] E. Eiben, R. Hinterding, and Z. Michalewicz, “Parameter control in evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 3, no. 2, pp. 124–141, Jul. 1999. [4] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization”, IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997. [5] D. E. Goldberg and J. Richardson, “Genetic algorithms with sharing for multimodal function optimization,” in Proc. 2nd Int. Conf. Genetic Algorithms (ICGA), 1987, pp. 41–49. [6] T. Jumonji, G. Chakraborty, H. Mabuchi, and M. Matsuhara, “A novel distributed genetic algorithm implementation with variable number of islands,” in Proc. IEEE Congr. Evolut. Comput., 2007, pp. 4698–4705. [7] Y. Yoshida and N. Adachi, “A diploid genetic algorithm for preserving population diversity-pseudo-Meiosis GA,” in Proc. 3rd Parallel Problem Solving Nature (PPSN), 1994, pp. 36–45. [8] M. Kominami and T. Hamagami, “A new genetic algorithm with diploid chromosomes by using probability decoding for nonstationary function optimization,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., 2007, pp. 1268–1273. [9] S. W. Mahfoud, “Crowding and preselection revisited,” in Proc. 2nd Parallel Problem Solving Nature (PPSN), 1992, pp. 27–37. [10] S. W. Mahfoud, “Niching methods for genetic algorithms,” Ph.D. dis-sertation, Dept. General Eng., Univ. Illinois, Urbana-Champaign, 1995. [11] T. Park and K. R. Ryu, “A dual population genetic algorithm with evolving diversity,” in Proc. IEEE Congr. Evol. Comput. , 2007, pp. 3516–3522. [12] T. Park and K. R. Ryu, “Adjusting population distance for dual-population genetic algorithm,” in Proc. Aust. Joint Conf. Artif. Intell., 2007, pp. 171–180. [13] T. Park and K. R. Ryu, “A Dual-Population Genetic Algorithm for Adaptive Diversity Control” in Proc. Aust. Joint Conf. Artif. Intell., 2009, pp. 191–210. [13] R. McKay, “Fitness sharing in genetic programming,” in Proc. of the Genetic and Evolutionary Computation Conference, Las Vegas, Nevada, 2000, pp. 435–442.

- 46. XLVI [14] R. K. Ursem, “Diversity guided Evolutionary algorithm,” in Proc. of Parallel Problem Solving from Nature (PPSN) VII, vol. 2439, J. J. Merelo, P. Adamidis, H. P. Schwefel, Eds. Granada, Spain, 2002, pp. 462–471. [15] T. Bäck and H.-P. Schwefel, “An overview of evolutionary algorithms for parameter optimization,” Evol. Comput., vol. 1, pp. 1–23, 1993. [16] K. Chellapilla, “Combining mutation operators in evolutionary programming,” IEEE Trans. Evol. Comput., vol. 2, pp. 91–96, Sept. 1998. [17] R. Mantegna, “Fast, accurate algorithm for numerical simulation of Lévy stable stochastic process,” Phys. Rev. E, vol. 49, no. 5, pp. 4677–4683, 1994. [18] X. Yao, G. Lin, and Y. Liu, “An analysis of evolutionary algorithms based on neighborhood and step size,” in Proc. 6th Int. Conf. Evolutionary Programming, 1997, pp. 297–307 [19] D. Thierens, “Adaptive mutation rate control schemes in genetic algorithms,” in Proc. Congr. Evol. Comput. , vol. 1. 2002, pp. 980–985. [20] G. Rudolph, “On takeover times in spatially structured populations: Array and ring,” in Proc. 2nd Asia-Paciﬁc Conf. Genetic Algorithms Applicat., 2000, pp. 144–151. [21] X. Yao, Y. Liu, and G. Lin, “Evolutionary programming made faster,” IEEE Trans. Evol. Comput., vol. 3, no. 2, pp. 82–102, Jul. 1999.
View More

# Adaptive Selection in Evolutionary Algorithm thesis

Adaptive Selection in Evolutionary Algorithm thesis