Proposta sem aluno

Gerado a 2025-07-18 23:45:35 (Europe/Lisbon).

Voltar

Titulo Estágio

Heuristics for the star discrepancy subset selection problem

Áreas de especialidade

Sistemas Inteligentes

Local do Estágio

ECOS/CISUC

Enquadramento

The star discrepancy disc(X) of a set of points x_1,...,x_n in [0,1]^d measures how much the distribution of X deviates from a perfectly uniform one [1]. More precisely, the star discrepancy of X is defined as the largest difference between the volume of an axis-parallel box [0,u] := [0,u_1] x [0,u_2] x ... x [0,u_d] and the fraction of points that are enclosed by this box. Thus, intuitively, a point set of small star discrepancy allows to approximate the volume of a box [0,u] by counting the number of points in it.

Point sets of small star discrepancy are needed in numerous applications, such as in the design of experiments, in computer vision, in astronomy, and in financial engineering. Its most famous applications, however, are in numerical integration, where low discrepancy point sets are known to outperform Monte Carlo integration methods, which rely on randomly chosen points. However, while low discrepancy point sets such as Latin Hypercube Samples, Sobol sequences, Halton point sets, etc. are currently gaining a lot of interest in various scientific disciplines, we currently lack a good understanding on which of these sets are favorable for which combinations of n and d -- a result of the fact that the computation of disc(X) is an NP-hard problem [2].

The best-known algorithm to compute star discrepancies has a complexity of n^(1+d/2) [3], which is prohibitively large for many real-world applications. Heuristic approximations such as the algorithm proposed in [4] are therefore in high demand.

Star discrepancies can also be used as a diversity measure for developing distinct problem instances, which are needed as training sets in the context of machine learning. A common way to derive candidate point sets is an iterative procedure which alternates between generating new candidate points and reducing the set to the desired size, see [5] for an example.

Objetivo

The above-described iterative procedures require to select from a set of n+k points those n points that minimize the star discrepancy. Put differently, we need to eliminate k points from the set, in a way that leaves us with the point set of smallest possible star discrepancy. This subset selection problem thus forms an optimization problem with a hard to evaluate objective function (since, effectively, we need to evaluate the star discrepancy of any of the subsets of size n). Exact optimization methods, such as full enumeration or branch-and-bound algorithms are not practical. We therefore seek to develop heuristic approaches to solve the subset selection problem.

The project comprises the design, the implementation, and the performance assessment of subset selection heuristics. Starting from an exact branch-and-bound solver, which is being developed in the context of a MSc thesis [6], we will consider different ways of speeding up the search through heuristic approximation.

Depending on the student's interest, theoretical aspects such as determining the complexity of the subset selection problem can also be considered as part of this project.

Plano de Trabalhos - Semestre 1

- Literature review: Notions of star discrepancy, subset selection problems, branch-and-bound, greedy and local search heuristics
- Development of greedy heuristics
- Preliminary experimental evaluation
- Writing of the intermediate report

Plano de Trabalhos - Semestre 2

- Development of improved heuristics based on local search and evolutionary approaches
- Experimental evaluation
- Study of particular variants
- Computational complexity (optional)
- Writing of the final report

Condições

This MSc project is for students that would like to deepen his knowledge on algorithms. The MSc student can use the computational resources from ECOS group for performing experiments. Depending on the student's interest, the focus can be set on empirical or on theoretical aspects of the subset selection problem.

This is a joint project with Prof Michael Emmerich, University of Leiden and Dr. Carola Doerr, Sorbonne University. This project has no funding available. However, a non-mandatory research visit to Carola Doerr's research group at Sorbonne University in Paris can be offered.

Observações

[1] J. Matousek. Geometric Discrepancy. Springer, 2010.

[2] M. Gnewuch, A. Srivastav, and C. Winzen. Finding optimal volume subintervals with k points and calculating the star discrepancy are NP-hard problems. Journal of Complexity, Volume 25, pages 115-127, Elsevier, 2009.

[3] D. Dobkin, D. Eppstein, and D. Mitchell. Computing the Discrepancy with Applications to Supersampling Patterns. ACM Transactions on Graphics, Volume 15, pages 354-376, ACM, 1996.

[4] M. Gnewuch, M. Wahlstroem, and C. Winzen. A New Randomized Algorithm to Approximate the Star Discrepancy Based on Threshold Accepting. SIAM Journal on Numerical Analysis, Volume 50, pages 781-807, SIAM Society for Industrial and Applied Mathematics, 2012.

[5] A. Neumann, W. Gao, C. Doerr, F. Neumann, and M. Wagner, Discrepancy-based Evolutionary Diversity Optimization. Proc. of Genetic and Evolutionary Computation Conference (GECCO 2018), pages 991-998, ACM, 2018.

[6] G. Martins, Algorithms for the Star Discrepancy Subset Selection Problem, MSc thesis, Department of Informatics Engineering, University of Coimbra, 2019 (in progress).

Orientador

Luís Paquete and Carola Doerr
paquete@dei.uc.pt 📩