Submitted Proposals

Generated on 2025-03-13 07:44:23 (Europe/Lisbon).

Internship Title

Characterizing HTML defects on the web

Areas of Specialization

Software Engineering

Internship Location



HTML is widely used to deliver rich content to users. Since its inception in 1990, many versions of HTML have been created, and developers have repeatedly upgraded their websites to support the latest standards. Although many versions of the HTML language are in use across the web, including many that do not comply with the standards (and fail when processed by browsers), we lack quantitative indicators of their conformance to the specifications (i.e., the most frequent errors developers make when building web pages) and of the specific versions in use. This information can be useful not only to train developers and improve HTML editing tools but also to support browser development and optimization.


The goal of this work is to design a tool that crawls the web to identify the HTML versions in use and, especially, the most common errors made by developers. The student is expected to produce a set of recommendations and lessons learned by the end of the work. In practice, the expected outcomes of this internship are:
– A toolset, built on currently available tools such as web crawlers, HTML validation tools, or APIs (including integration with web services), that allows us to easily extract and analyze metrics about web pages.
– A research paper, to be submitted to and presented at a top international conference, describing the mechanisms created or used and the lessons learned from the experiments.
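
As an illustration of one metric such a toolset could extract, the declared HTML version of a page can be approximated from its DOCTYPE. The following is a minimal Python sketch, assuming pages have already been fetched as strings; the function names and the mapping table are hypothetical, the list of public identifiers is deliberately incomplete, and nothing here is taken from the toolset the proposal describes.

```python
import re
from collections import Counter

# Hypothetical, non-exhaustive mapping from DOCTYPE public identifiers
# to version labels. More specific prefixes are listed first.
_PUBLIC_ID_VERSIONS = [
    ("-//W3C//DTD XHTML 1.1//", "XHTML 1.1"),
    ("-//W3C//DTD XHTML 1.0", "XHTML 1.0"),
    ("-//W3C//DTD HTML 4.01", "HTML 4.01"),
    ("-//W3C//DTD HTML 3.2", "HTML 3.2"),
]

_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)


def detect_html_version(markup: str) -> str:
    """Guess the declared HTML version of a page from its DOCTYPE."""
    # Only the start of the document is inspected (an assumption of this
    # sketch): a DOCTYPE is expected before the root element.
    match = _DOCTYPE_RE.search(markup[:2048])
    if match is None:
        return "no doctype"
    declaration = match.group(1).strip()
    if declaration.lower() == "html":
        # The short form <!DOCTYPE html> is the HTML5 declaration.
        return "HTML5"
    upper = declaration.upper()
    for public_id, label in _PUBLIC_ID_VERSIONS:
        if public_id.upper() in upper:
            return label
    return "unknown doctype"


def version_histogram(pages):
    """Count how many pages declare each version."""
    return Counter(detect_html_version(page) for page in pages)
```

A declared DOCTYPE only says which version a page claims to follow; whether the page actually conforms would still have to be checked with an HTML validator.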

Work Plan - Semester 1

[Some tasks might overlap; M=Month]
T1 (M1–M2): Knowledge transfer and state-of-the-art review on HTML, web crawlers, and HTML validators and APIs.
T2 (M3): Definition of the requirements for the toolset to be built / integrated.
T3 (M3–M4): Design of the preliminary architecture of the toolset, including the storage architecture.
T4 (M4): Implementation of a small proof-of-concept prototype.
T5 (M5): Writing the intermediate report.

Work Plan - Semester 2

[Some tasks might overlap; M=Month]
T6 (M6): Integration of the intermediate defense comments into the architecture and toolset.
T7 (M6–M7): Implementation of the tool and tests.
T8 (M8): Execution of tests and analysis of results.
T9 (M9): Writing a research paper and submitting it to a top international conference in the Dependability or Services areas (IEEE/IFIP Dependable Systems and Networks, IEEE Services Computing Conference, International Conference on Service Oriented Computing, etc.).
T10 (M10): Writing the thesis.


The selected student will be integrated into the Software and Systems Engineering group of CISUC, and the work will be carried out at the facilities of the Department of Informatics Engineering at the University of Coimbra, where a workspace and the necessary computing resources will be provided.


This internship follows up on the work published in:
Mendes J, Laranjeiro N, Vieira M. Toward characterizing HTML defects on the Web. Softw Pract Exper. 2017;1-8.

Please contact the advisor with any questions or requests for clarification.


Nuno Laranjeiro