Internship Title
Reconstructing Service Networks from Crunchbase.com
Areas of Specialization
Information Systems
Internship Location
DEI-FCTUC
Background
The vision statement “Locating Your Next Strategic Opportunity”, published by Harvard Business Review in March 2011, illustrates some of the innovations that service networks make possible. For example, service networks can be used to identify niche markets, which are fertile ground for exploring new ideas. Unfortunately, no attempts have been made to reconstruct such large-scale networks.
This project will address this limitation by reconstructing large-scale service networks from the rich information available in web registries. For example, CrunchBase.com contains the largest corpus of structured data for technological industries and includes profiles for 7,249 service providers, 224,392 companies, 269,171 key people, and 14,519 financial organizations. Other examples of registries include ProgrammableWeb (PW), venturebeat.com, and the COMPUSTAT financial databases.
Objective
Develop a crawling engine to reconstruct service networks from web data sources. Obtaining a good, large data set is one of the biggest challenges of this project. Fortunately, recent and still fairly unexplored initiatives, such as linked data, open (government) data, and crowdsourced data (e.g., Crunchbase), provide valuable new data sources that are accessible online, published in an open machine-readable format, and licensed to allow re-use.
We will follow a two-step process. First, we will use comprehensive web registries to create “skeleton” networks, a basic structure with minimal information on services and relationships. The goal is to bootstrap service networks using registries available under open licenses. Afterwards, we will query additional web data sources to extend and enrich this structure. This extension is needed since registries may not contain all the relevant services of an ecosystem.
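The two-step process described above can be sketched as follows. This is only a minimal illustration, not the project's design: the `ServiceNetwork` class, the `build_skeleton` and `enrich` functions, the record layout (`name`, `relationships`, `attributes`), and the sample service names are all hypothetical, and real registry responses (e.g., from Crunchbase) will have a different structure.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNetwork:
    """Hypothetical in-memory service network (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # service name -> attributes
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_service(self, name, attrs=None):
        # Create the node if it is missing, then merge any new attributes.
        self.nodes.setdefault(name, {}).update(attrs or {})

    def add_relationship(self, source, relation, target):
        # Make sure both endpoints exist before linking them.
        self.add_service(source)
        self.add_service(target)
        self.edges.add((source, relation, target))


def build_skeleton(registry_records):
    """Step 1: bootstrap a skeleton with minimal information."""
    network = ServiceNetwork()
    for record in registry_records:
        network.add_service(record["name"])
        for rel in record.get("relationships", []):
            network.add_relationship(record["name"], rel["type"], rel["target"])
    return network


def enrich(network, extra_records):
    """Step 2: extend and enrich the skeleton from additional sources."""
    for record in extra_records:
        network.add_service(record["name"], record.get("attributes"))
        for rel in record.get("relationships", []):
            network.add_relationship(record["name"], rel["type"], rel["target"])
    return network


# Hypothetical records, as if already parsed from JSON responses.
registry = [
    {"name": "ServiceA",
     "relationships": [{"type": "competitor", "target": "ServiceB"}]},
    {"name": "ServiceB"},
]
extra = [
    {"name": "ServiceB", "attributes": {"category": "payments"}},
    {"name": "ServiceC",
     "relationships": [{"type": "provider", "target": "ServiceA"}]},
]

network = enrich(build_skeleton(registry), extra)
print(sorted(network.nodes))
print(sorted(network.edges))
```

Keeping the skeleton and enrichment steps as separate functions means the second step can be rerun against new data sources without rebuilding the base network.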
Work Plan - Semester 1
(a) Study the main technologies to be used in the project (e.g., Crunchbase.com, JSON, Web API, Linked USDL, Linked Data, OSSR, and RDFS) (September to December 2014).
(b) Develop the first prototype of a crawling engine to reconstruct service networks (Minimum Viable Product) (November 2014 to February 2015).
Work Plan - Semester 2
(c) Test the crawling engine with real data (March 2015).
(d) Develop the final system prototype of a crawling engine for service networks (March to June 2015).
(e) Documentation (ongoing task from start to finish) (September 2014 to June 2015).
(f) Report writing and defense (April to June 2015).
Conditions
This work will be carried out at DEI/University of Coimbra.
Unpaid Master's internship.
Remarks
If the results of the project are complete, sound, and innovative, it will be possible to write a final research paper to be published in a scientific outlet.
Supervisor
Jorge Cardoso
jcardoso@dei.uc.pt