Titulo Estágio
Algorithms for the star discrepancy subset selection problem
Áreas de especialidade
Sistemas Inteligentes
Local do Estágio
DEI
Enquadramento
Given a point set P={x1,..,,xn}, its discrepancy is measured as the largest deviation from a perfectly evenly distributed point set [2]. In star discrepancy, we consider the d-dimensional unit cube and measure the discrepancy with respect to all intervals (axis-parallel boxes) in the form [0,u] := [0,u1] x [0,u2] x ... x [0,ud]. In a low-discrepancy set P of points, the number of points in P inside a box [0,u] should be proportional to its volume. Procedures that generate such set of points arise in Monte Carlo integration, statistics and computer graphics as well as in selection mechanisms within evolutionary algorithms [2].
Formally, the star discrepancy D*(P) of a point set P={x1,...,xn} in the d-dimensional unit cube is defined as
D*(P) = sup {| ( A(B,P) / n ) - Vol ( B ) |, B in J }
where Vol is the d-dimensional Lebesgue measure (or volume), A(B,P) is the number of points in P that fall into B, and J is the set of d-dimensional intervals (or boxes) of the form [0,u1] x [0,u2] x ... x [0,ud] where each ui is a real value between 0 and 1.
The problem of finding the star discrepancy is NP-hard. The smaller the star discrepancy of a point set, the more regular is its distribution with respect to all axis-parallel boxes. It has been shown that the number of intervals to consider in J is finite. An algorithm to this problem is discussed in [1].
Objetivo
Subset selection procedures arise in the selection mechanisms of evolutionary algorithms. One might be interested in a selection that
favours star discrepancy to increase diversity of the population in the objective space. The star discrepancy subset selection consists of finding the subset of k points of a point set P that maximizes the star discrepancy. The goal of this work is to develop implicit enumeration algorithms (based on dynamic programming or branch-and-bound) that are able to solve this problem for an arbitrary set of points P, number of dimensions and value of k. Code for computing the star discrepancy is available. Links with existent work for the hypervolume subset selection problem will also be established [4].
Plano de Trabalhos - Semestre 1
1) Literature review
2) Definition of upper and lower bounds on the optimal star discrepancy value
3) Implementation of a brute-force approach
4) Implementation of an implicit enumeration approach for 2 dimensions
5) Writing of the intermediate report
Plano de Trabalhos - Semestre 2
1) Implementation of an implicit enumeration approach for an arbitrary number of dimensions
2) Experimental analysis
3) Writing of the final report
Condições
This MSc project is for students that would like to deepen his knowledge
on algorithms. The MSc student can use the computational resources from
ECOS group for performing experiments. This project has no funding available.
Observações
This work involves the colaboration of Prof Michael Emmerich, University of Leiden and Prof Carola Doerr, Sorbonne University.
References
[1] D. Dobkin, D.Eppstein, Computing the discrepancy, Proceedings of the ninth annual symposium on Computational Geometry, pages 47-52, 1993
[2] A. Neumann,W.Gao , C. Doerr, F. Neumann, and M. Wagner. Discrepancy-based Evolutionary Diversity Optimization. Proc. of Genetic and Evolutionary Computation Conference (GECCO 2018), ACM, 2018, to appear.
[3] C. Doerr, M. Gnewuch, and M. Wahlstrom. Calculation of discrepancy
measures and applications. In G. T. W.W.L. Chen, A. Srivastav, editor, A Panorama of Discrepancy Theory, volume 2107 of Lecture Notes in Mathematics, pages 621?678. 2014.
[4] R. J. Gomes, A. P. Guerreiro, T. Kuhn, L. Paquete, Implicit enumeration strategies for the hypervolume subset selection problem. Computers & OR 100: 244-253 (2018)
Orientador
Luís Paquete
paquete@dei.uc.pt 📩