PhD Thesis Final Defense to be held on July 26, 2017, at 10:00

The examination is open to anyone who wishes to attend.

Thesis Title: Modeling, Analysis and Diversification of Legal Information


The information society poses new challenges to the discipline of legal informatics, mainly due to the volume and complexity of legal data. In this context, the management and dissemination of legal information, legal complexity, techniques that help users seek legal information, and methods that encourage citizens’ participation in regulatory planning activities are challenging research issues.
This doctoral thesis reports on studies of a) the management of legal sources with semantic standards; b) the modeling of civil law as a complex network; c) the application of diversification methods to legal information retrieval; and d) the application of diversification methods to public consultation texts and social networks.
We present a novel methodology that derives a semantic representation of legislation from unstructured formats by expressing the structure of legal documents as a set of syntactic rules, i.e., a domain-specific language for legal documents. Since legal documents are usually disseminated in unstructured formats, it is advisable to transform them into a format suitable for modeling legal sources: one that captures the internal organization of the text and the legal semantics, interlinks documents based on discovered legal references, and classifies them. This methodology has been integrated into a legal document management platform that aims to improve access to legal sources by offering advanced modeling, management and mining functions. The platform has been successfully deployed in a production environment operated by the public sector, providing citizens with semantic access to Greek tax law.
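The idea of describing document structure through syntactic rules can be illustrated with a small Python sketch. It is not the thesis's domain-specific language: a table of regular expressions (the hypothetical `RULES`, with English `CHAPTER`/`Article`/numbered-paragraph patterns invented for illustration) stands in for it. Each recognized unit is nested under the nearest enclosing unit, and references such as "Article 3" found in the text are collected so that articles can be interlinked.

```python
import re

# Hypothetical rule table: (level name, depth, heading pattern). These English
# patterns are illustrative only; real legislation needs jurisdiction-specific rules.
RULES = [
    ("chapter", 1, re.compile(r"^CHAPTER\s+([IVXLC]+)\b")),
    ("article", 2, re.compile(r"^Article\s+(\d+)\b")),
    ("paragraph", 3, re.compile(r"^(\d+)\.\s")),
]
REFERENCE = re.compile(r"\bArticle\s+(\d+)\b")

def parse(lines):
    """Build a chapter/article/paragraph tree from plain-text lines and record,
    for each node, the article numbers referenced in its body text."""
    root = {"level": "document", "depth": 0, "children": [], "refs": []}
    stack = [root]  # path from the root to the innermost open node
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        for level, depth, pattern in RULES:
            match = pattern.match(line)
            if match:
                # Close open nodes at the same or a deeper level, then nest under the rest.
                while stack[-1]["depth"] >= depth:
                    stack.pop()
                node = {"level": level, "depth": depth, "number": match.group(1),
                        "children": [], "refs": []}
                stack[-1]["children"].append(node)
                stack.append(node)
                body = line[match.end():]
                break
        else:
            body = line  # continuation text belongs to the innermost open node
        stack[-1]["refs"].extend(REFERENCE.findall(body))
    return root

def citation_edges(node, current_article=None):
    """List (citing article, cited article) pairs found anywhere in the tree."""
    if node["level"] == "article":
        current_article = node["number"]
    edges = [(current_article, ref) for ref in node["refs"] if current_article]
    for child in node["children"]:
        edges.extend(citation_edges(child, current_article))
    return edges

text = """CHAPTER I
Article 1
1. This law applies to income tax.
Article 2
1. Income is defined in Article 1.
2. Exemptions are listed in Article 3.
CHAPTER II
Article 3
1. Exempt income, see Article 2."""

tree = parse(text.splitlines())
print(citation_edges(tree))  # [('2', '1'), ('2', '3'), ('3', '2')]
```

The resulting citation pairs are the kind of links from which a network of legal sources can be assembled; a production grammar would also have to handle amendments, annexes and references to other acts.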
We also propose a novel approach to modeling civil law collections as a complex network. We applied our approach to the European Union legislation corpus and identified otherwise hidden organizing principles of the corpus, interpreted the influence of the network structure on individual legal sources, and quantified the relative importance of a legal source within the corpus. Among other findings, legal sources have a strong tendency to connect with legal documents of the same type, forming clusters of the same sector. Communication between these highly clustered but sparsely interconnected areas is maintained by a few hubs, since the Legislation Network is also highly heterogeneous with respect to node degree (the number of edges incident on a node); in particular, it is a small-world network with a power-law degree distribution. This heterogeneity may be attributed to a preferential attachment process, which amplifies the popularity of highly ranked sources. Further, we studied the temporal evolution of the legislation corpus and evaluated its tolerance to errors by performing a resilience test. Our approach aims to improve the efficiency of the legal system, and our findings can serve as a basis for future research.
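To make the network terms concrete, the following hypothetical Python sketch (standard library only; all function names are illustrative) grows a synthetic graph by Barabási-Albert-style preferential attachment, in which new nodes link to existing nodes with probability proportional to their degree, and then runs a simple resilience test comparing random node failures with the removal of the highest-degree hubs. It uses synthetic data rather than the EU legislation corpus and does not reproduce any result from the thesis.

```python
import random
from collections import Counter, defaultdict

def grow_network(n_nodes, m, seed=0):
    """Grow an undirected graph by preferential attachment: each new node links
    to m distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree >= 1.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def largest_component(edges, removed=frozenset()):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, size)
    return best

if __name__ == "__main__":
    edges = grow_network(2000, m=2, seed=42)
    deg = degrees(edges)
    mean_deg = sum(deg.values()) / len(deg)
    print("mean degree:", round(mean_deg, 2), "max degree:", max(deg.values()))

    # Resilience test: remove 5% of nodes at random vs. the 5% highest-degree hubs.
    k = len(deg) // 20
    random_removed = set(random.Random(1).sample(sorted(deg), k))
    hubs_removed = {v for v, _ in deg.most_common(k)}
    print("largest component, random failures:", largest_component(edges, random_removed))
    print("largest component, hub removal:   ", largest_component(edges, hubs_removed))
```

In such heavy-tailed graphs the maximum degree is far above the mean, and removing the few hubs fragments the network much more than removing the same number of random nodes, which is the intuition behind hubs maintaining communication between clusters.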
Additionally, we address the diversification of results in legal search as a means of helping users find useful information in large volumes of legal data. For example, a lawyer preparing arguments for a given case will find a diverse result, i.e., one containing several claims that vary in the type of court and other characteristics, more informative and helpful than a homogeneous set of relevant cases with similar features. We adopt several state-of-the-art methods from the web search, network analysis and text summarization domains. We also examine the contribution of diversification criteria specific to legal sources, which we incorporate into the algorithms. We provide an exhaustive evaluation of the methods and criteria in a variety of settings, using widely accepted metrics and real collections of legal documents from different legal systems, which we objectively annotated with relevance judgments for this purpose. The evaluation delineates the balance between reinforcing relevant documents and sampling the information space around the legal query.
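One widely used diversification heuristic from web search and summarization is Maximal Marginal Relevance (MMR), which trades a document's relevance against its similarity to documents already selected. The Python sketch below illustrates that general technique only; it is not necessarily one of the exact methods or criteria evaluated in the thesis. The case identifiers, relevance scores and feature sets (court type, subject, procedural stage) are invented, and Jaccard similarity over feature sets is an assumed stand-in for a real similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mmr(relevance, features, k, lam=0.5):
    """Greedy MMR: repeatedly pick the document maximizing
    lam * relevance - (1 - lam) * (max similarity to already picked documents)."""
    selected, remaining = [], list(relevance)
    while remaining and len(selected) < k:
        def marginal(d):
            redundancy = max((jaccard(features[d], features[s]) for s in selected),
                             default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical case-law results with invented scores and features.
relevance = {"case_a": 0.95, "case_b": 0.93, "case_c": 0.90, "case_d": 0.70}
features = {
    "case_a": {"supreme_court", "tax", "appeal"},
    "case_b": {"supreme_court", "tax", "appeal"},
    "case_c": {"supreme_court", "tax", "appeal"},
    "case_d": {"administrative_court", "tax", "first_instance"},
}
print(mmr(relevance, features, k=2, lam=1.0))  # ['case_a', 'case_b']
print(mmr(relevance, features, k=2, lam=0.5))  # ['case_a', 'case_d']
```

With `lam=1.0` the ranking is purely by relevance and returns two near-identical cases; at `lam=0.5` the second slot goes to a less relevant case from a different court, which is the kind of relevance/diversity trade-off the evaluation explores.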
Finally, taking into consideration citizens’ involvement in regulation through public consultation, as well as the widespread use of social networks, we address result diversification for user comments and microblog posts. To this end, we define diversification criteria specific to comments and microblog posts and apply them in heuristic diversification algorithms. Our experimental analysis shows that the diversity criteria we introduce yield distinctly diverse subsets of users’ posts.
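Post-specific criteria can be combined into a single distance and handed to a generic diversification heuristic. In this hypothetical Python sketch, the three criteria (author, hashtags, word overlap), their equal weights and the sample posts are assumptions made for illustration, not the criteria defined in the thesis; greedy max-min dispersion is used as one example of a heuristic diversification algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def post_distance(p, q, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted distance over three hypothetical criteria: author, hashtags, text words."""
    d_author = 0.0 if p["author"] == q["author"] else 1.0
    d_tags = 1.0 - jaccard(p["tags"], q["tags"])
    d_text = 1.0 - jaccard(set(p["text"].lower().split()), set(q["text"].lower().split()))
    w_author, w_tags, w_text = weights
    return w_author * d_author + w_tags * d_tags + w_text * d_text

def greedy_maxmin(posts, k, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Greedy max-min dispersion: start from the first (assumed most relevant) post,
    then repeatedly add the post whose minimum distance to the selected set is largest."""
    if not posts or k <= 0:
        return []
    selected = [0]
    while len(selected) < min(k, len(posts)):
        candidates = [i for i in range(len(posts)) if i not in selected]
        best = max(candidates,
                   key=lambda i: min(post_distance(posts[i], posts[j], weights)
                                     for j in selected))
        selected.append(best)
    return selected

posts = [
    {"author": "u1", "tags": {"tax"}, "text": "Raise the tax threshold"},
    {"author": "u1", "tags": {"tax"}, "text": "Raise the tax threshold now"},
    {"author": "u2", "tags": {"health"}, "text": "Fund public hospitals"},
    {"author": "u3", "tags": {"tax", "health"}, "text": "Tax revenue should fund hospitals"},
]
print(greedy_maxmin(posts, k=2))  # [0, 2]
print(greedy_maxmin(posts, k=3))  # [0, 2, 3]
```

The near-duplicate second post by the same author is never chosen while more distant posts remain; changing the weights changes which criterion dominates the notion of diversity.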

PhD student: Koniaris Marios

Supervisor: Vassiliou Yiannis (Professor Emeritus)