Piero Attanasio, Metadata management in the book supply chain
Identifiers, bibliographic metadata and thematic category schemes are at the heart of how the book supply chain functions. International standards exist for all these elements, and they allowed e-commerce to start up in the book trade before any other sector.
Over the last 50 years, the dialogue on metadata management between the book industry and the library community has not been as intensive as would be desirable. The challenges that the whole book world must cope with today and in the near future put us under pressure to change.
Building on lessons learned from the past, the presentation will focus on some upcoming challenges, such as copyright data management and AI applications, with the aim of identifying fields for future collaboration.
Renate Behrens, Standards in a new bibliographic world – community needs versus internationalisation
Internationalisation has been an undisputed goal for many years, also in the world of cultural preservation institutions such as libraries, archives and museums. Meanwhile, the joint production, the exchange and the common use of data should be a matter of course. On the other hand, there are the needs of individual user communities, some of which are the result of traditions, others of technical conditions. Modern standards must meet these requirements. In practice, this means that they must be flexible and at the same time binding and reliable. Keeping this balance is not easy, especially as many user communities are used to fixed rules. New modular procedures are needed that take into account the needs of the communities but are based on a common core set of requirements. However, it will not be enough to look at this aspect only from the perspective of the standards. Rather, the education and training of metadata specialists must also be adapted to the new conditions. Fulfilling this task will be a major challenge in the near future, forced not only by the development of new standards but also by economic factors. It will be of great importance not to work out isolated solutions but to involve as many metadata producers from different areas as possible right from the start.
Traditional UBC provides for the standardization of bibliographic records, the creation of guidelines dedicated to national bibliographic agencies, the creation of the UNIMARC format, and the curation of authority data. Three conditions are essential to achieve UBC's objectives: a widely shared canon of principles, standards and practices for the formulation and structure of cataloguing data; each national bibliographic agency must fulfil its responsibilities in a comprehensive and standard manner; and an infrastructure to support the efficient exchange of data between national bibliographic agencies must be created. Bibliographic control has evolved considerably since IFLA theorized it in the 1970s, owing to the availability of a very wide range of new bibliographic tools. At the beginning of the twenty-first century, UBC is quite different and involves new actors. Among these, Wikidata has a background very different from that of libraries as institutions: it is not devoted to bibliographic data, nor is it limited to personal authority control, but its value in authority control tools such as VIAF and national libraries' authority files is undisputed. In fact, since Wikidata items both identify and describe entities, one of its most relevant features seems to be that “the traditional distinction between authority and bibliographic data disappears in a Wikibase description” (Godby et al. 2020, 8). How, and to what extent, do Wikidata items describe and identify bibliographic entities? Will the existence, use and reuse of Wikidata affect the way the professional community thinks about UBC?
How we manage bibliographic data is changing drastically. Over the decades, best efforts have gone into developing MARC so that it could adapt to the evolution of resource description theory while still maintaining capacity for interchange. With the shift away from MARC underway, there is increasing opportunity to test theory and develop best practices. Efforts to simultaneously define models for creating native linked data descriptions and crosswalk these models with MARC have resulted in ontological differences and unique extensions. From the outside looking in, this may look more like bibliographic chaos than control. This apparent chaos, and the associated experimentation, is important for communities to chart a path forward, but it also points to a critical challenge ahead. Ultimately this collaborative innovation must be harnessed and consolidated so that open standards development supports the interoperability of library data. This analysis will focus on modelling differences between RDA and BIBFRAME, recent attempts at MARC to BIBFRAME conversion, and work on application profiles, in an attempt to define shared purpose and common ground in the manifestation of real-world data. The discussion of these areas will be framed by a particular focus on the balance between core standards development (RDA, MARC, BIBFRAME) and community-based extensions and practice (LC, PCC, LD4P, Share-VDE), and the need for a feedback loop from one to the other.
Vincent Boulet, Towards an identifiers' policy: the use case of the Bibliothèque nationale de France
Identifiers are at the crossroads of two interconnected, major evolutions which heavily impact national libraries: the massification of dataflow, redrawing the place libraries occupy within the global and national data ecosystem in a shared environment, and the strategic shift towards entity management underlying the new professional practices and standards. Given the experience and maturity libraries are gaining in this field, the time may have come to formalize them and to highlight the considerable impact libraries could have in a highly competitive landscape. This is the aim the Bibliothèque nationale de France is trying to reach by publishing an identifiers’ policy. It comes as the last part of a triptych after the new cataloguing policy (2016, including the indexing policy published in 2017) and the quality policy (2019). This identifiers’ policy is intended to clarify why and on what grounds a national library could, more or less, get involved in a given identifier, taking into account the diversity of scope, governance structure and business model of identifiers, be they international (for instance: ISNI, ISSN, ARK) or local (for instance: the BnF’s own identifiers). The identifiers’ policy therefore highlights why it is necessary to use permanent, trustworthy identifiers and to what extent they are helpful in the daily work and quality control processes carried out by cataloguers. This is why the identifiers’ policy is not limited to principles, but has a very concrete dimension, both for internal and external issues.
Thomas Bourke, Bibliographic control of research datasets: reflections from the EUI library
The exponential growth in the generation and use of research data has important consequences for scientific culture and library mandates. This paper explores how the bibliographic control function in one academic library has been expanded to embrace research data in the social sciences and humanities. Library bibliographic control (BC) of research datasets has emerged at the same time as library research data management (RDM). These two functions are driven by digital change; the rise of the open science and open data movements; library management of institutional repositories; and the increasing recognition that data sharing serves the advancement of science, the economy and society. Both the research data management function and the bibliographic control function can be enhanced by librarians’ awareness of scholarly projects throughout the research data lifecycle (input, elaboration and output) – and not only when research datasets are submitted for deposit. These library roles require knowledge of data sources and provenance; research project context; database copyright; data protection; and supporting documentation. This paper suggests that by creating synergies between the research data management function (during research projects) and the formal bibliographic control function (at the end of research projects), librarians can make an enhanced contribution to good scientific practice and responsible research.
Michele Casalini, The future of bibliographic services in light of new concepts of authority control
Over the last three decades, major changes in the field of cataloguing have been leading to the definition of new forms of authority control: the introduction of FRBR and then of IFLA LRM, the study over the years and, recently, the implementation of linked data in library catalogues, as well as the improvement of data models aiming to ensure the broadest possible interoperability between systems.
A new approach to authority control and its connected services can be based on combining manual and automatic processes of data validation and enrichment with the use of knowledge bases as authoritative sources. This will also allow wider data interoperability, which opens the way to a new level of cooperation among international institutions and organisations that care about the dissemination of knowledge.
John Chapman, Building a shared entity management infrastructure: moving from promise to production
Over the last decade, OCLC has joined a number of national libraries and library organizations in publishing linked data. With the explosion of different efforts and initiatives, attention now turns to the ways in which the metadata can be maintained and kept vital over the long term. In January 2020, OCLC received a 2-year grant from The Andrew W. Mellon Foundation to create a shared "entity management infrastructure" to provide libraries with a robust platform to create, maintain, use, and reuse entity descriptions for persons and creative works. In this session, we'll discuss the reasoning behind the initiative, the progress of the effort, and what it suggests for the future of metadata work in libraries.
After an overview of the current situation of the digital newspaper library of Biblioteca Nazionale Centrale di Roma (BNCR) and of the most important digitization projects in which BNCR is involved, the presentation – starting from a survey launched in December 2019 (RIDI, Riviste digitali e digitalizzate in Italia) – will focus on those bibliographic resources which are often not considered as unitary, such as free access online journals and digitizations of previously printed publications. The presentation will highlight the growing need for a library to give access to the vast world of open-access digital publications along with its physical heritage. It will then illustrate the state of the art in Italian open access periodicals, whether born-digital, continuations or parallel editions of previously printed publications, as well as some examples of bibliographic records in the SBN (Italian National Library Service) OPAC related to publications with both printed and digital editions. Finally, the main Italian and international digital cases will be illustrated, highlighting the problems of coordinating the various initiatives to improve the quantity and quality of what is offered, and presenting a model that provides the elements for a qualitative standardization of images, data, metadata and bibliographic histories of publications, in order to build a quality information network for the national digital newspaper library of the future.
Gordan Dunsire, Bibliographic control in the fifth information age
Bibliographic control is concerned with the description of persistent products of human discourse across all sensory modes. The history of recorded information is punctuated by technological inventions that have had an immediate and profound effect on human society. These inventions delimit five 'information ages'. The first starts with visual recordings in the form of cave paintings and portable carvings and ends with the invention of writing. The second age ends and the third begins with the invention of printing. The fourth starts with the development of high bandwidth long-distance telecommunication and ends with the invention of the world-wide web. We are now in the fifth information age. It is characterized by the ubiquitous use of powerful portable devices, the internet of things, for peer-to-peer communication across the entire planet. It is a fundamental necessity of digital communication technologies that all such discourse is recorded during transmission. Nearly all such products of human discourse are copied to persistent storage media and retained for a variety of reasons, such as the cost of deletion exceeding the cost of retention. The roles of cave 'artist', scribe, printer, publisher, encoder, broadcaster, librarian, and other mediators are no longer differentiated from 'author'. The creation of a work, its expression, and manifestation by a single person is now far more prevalent than the distribution of these activities between different individuals and groups. A product of the act of creation can be completed in a few seconds, and its creator may not be aware of how it is produced. In the fifth information age, the end-user is immersed in and interacts with a global ocean of recorded information. The interaction is continuous and ubiquitous, and never passive. Every interaction increases the volume of data; all aspects are recorded, including the time, place, and nature of the interaction, and details of the ‘reader’ and their ‘book’.
The distinction between data and metadata, the book and its catalogue card, is completely blurred; the same infrastructure and processes support both. The sequential certainty of data followed by metadata no longer pertains: data becomes metadata as soon as an information resource is named by its creator. A full-text index of a textual resource is metadata with the same data content but a different structure and context. The end-user is an individual with multiple, shifting contexts for their information retrieval activity. Personal expectations of the mechanics of retrieval evolve as rapidly as the intermediation devices, but are governed by cultural norms. The challenge for bibliographic control is the reconciliation of globalization and personalization via localization. The infrastructure of the fifth information age provides the means to meet this challenge, but the ecosystem is very different and the role of the professional cataloguer must evolve to successfully interact with the activities and imploded roles of the end-user. The seeds of the fifth information age were planted in the 1960s, and the last 50 years have seen profound changes in the mechanics and philosophy of bibliographic control. This paper will discuss the characteristics of the evolutionary pressure on the future of bibliographic control and speculate on what it might look like in another 50 years.
Pierluigi Feliciati, Call me by your name: the potential of cross-domain sharing of authority records control
An important and not often addressed topic – considering the difficulties in working on cross-domain projects – is the shared control of authority records, extending bibliographic control to other areas of documentary and humanistic sciences. This talk examines the potential opened up by the multidimensional and networked logics of representing information entities, towards which the document communities are converging. RDA and RiC, above all, encourage collaboration to develop meta-ontologies and reference models that facilitate the sharing (and control and enrichment) of authority records in the form of RDF assertions. The lines of research could start from the names of persons, corporate bodies, places and chronological contexts, qualifying the relationships between them. This approach would be especially valid from the users' point of view, since users are now forced to jump from one information environment to another and to confront different names, forms and attributes for the same entities.
The DREAM project is a large research project funded by Sapienza University of Rome, dealing with bibliographic data in non-Latin scripts. As the National Bibliographic Service catalogue (SBN) does not yet manage data in non-Latin scripts, the aim of DREAM is to offer researchers a catalogue searchable through original scripts (such as Arabic, Chinese, Russian, etc.). One of the most remarkable features of the project is the creation of an ILS-independent working context in which the cataloguer may find and retrieve data in original script from authoritative catalogues, starting from the existing romanized ones. From a technical standpoint, the ever-increasing Unicode support offered by modern operating systems, DBMSs and indexing engines makes the rapid development of the relevant software tools a concrete possibility. This in turn implies a shift in scientific focus towards the (often subtle) record linkage operations between different data sources. The authors hope that the DREAM project will attract the participation of other Italian libraries that perceive the same needs. Furthermore, as soon as SBN supports the management of data in non-Latin scripts, the DREAM project partners will be able to contribute their data.
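As an illustration of the kind of Unicode-based check that could precede record linkage between romanized and original-script data, the following minimal Python sketch labels the scripts found in a field and flags titles that exist only in Latin script. It is not the DREAM project's software: the function names are hypothetical, and taking the first word of a Unicode character name as a script label is a rough heuristic assumed here for brevity.

```python
import unicodedata


def scripts_in(text):
    """Return rough script labels (e.g. 'LATIN', 'CYRILLIC') for the letters in text.

    Heuristic assumption: the first word of the Unicode character name
    is used as the script label.
    """
    labels = set()
    for ch in unicodedata.normalize("NFC", text):
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels


def is_romanized_only(title):
    """True when every letter of the title is in Latin script."""
    return scripts_in(title) <= {"LATIN"}
```

A record whose title passes `is_romanized_only` would be a candidate for a lookup of its original-script form in an external authoritative catalogue; how that lookup and the subsequent matching are done is outside this sketch.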
Legal deposit, regulated by Law no. 106 of 15 April 2004 and Presidential Decree no. 252 of 3 May 2006, requires Italian publishers to deposit a copy of the published material with several libraries. Legal deposit involves long-term preservation and access to information on various media, not least computer networks. While traditional media are well regulated, digital legal deposit rules, albeit of strategic importance, are barely sketched out. The National Central Library of Florence (BNCF), in cooperation with the National Central Library of Rome (BNCR) and the Marciana National Library of Venice, created Magazzini digitali: a digital legal deposit project that allows the deposit through harvesting of doctoral theses (the only category for which deposit is required by law) and of e-journals produced by universities and research institutions, in addition to the deposit of ebooks and commercial journals. BNCF also started a remarkable web archiving activity a couple of years ago. Thanks to a collaboration with Horizons and Giunti, BNCR has started an experimental deposit of ebooks. They will be stored and described in repositories, then made available through MLOL. While awaiting the regulation on digital legal deposit, it is urgent to reopen the debate on this issue and to make the collaboration between the institutions involved in managing the digital library heritage more effective, so as to establish a coordination structure that will define the scientific guidelines and the appropriate technological and service choices.
Fulvio Guatelli, Maximising dissemination and impact of books: the scientific cloud
Metadata has become the protagonist of scientific communication. For example, let us think about Aristotle: he is indeed a very popular historical figure, and yet we do not know everything about him. So much so that even his best-known book “Physics” is a text reconstructed a posteriori, centuries after his death. Well, if we had Aristotle’s ORCID and the DOI of “Physics” we would instead have two perfectly defined entities, which can be reproduced and operated by a machine capable of carrying out countless services. The content of a publication, that is, what we discuss, evaluate and judge, is no longer the alpha and omega of a scientific publication or its exclusive centre of gravity. Books are gradually taking the form of an iceberg, whose emerged part is represented by the content, while the immersed part is reflected by metadata. In other words, in the current communication approach, the metadata and the dissemination of scientific discoveries go hand in hand, as the metadata of a work is also responsible for its success. In my presentation I will illustrate how, in the field of scientific publishing, best practices, simple metadata and cataloguing indicators such as DOI and ORCID are replacing what once were the chariots pulled by sturdy horses coming out of the workshop of Aldo Manuzio and bringing his books all over the world.
Klaus Kempf, The bibliographic control of music in the digital ecosystem: the situation in Germany-Bavaria. The case of the Bayerische Staatsbibliothek
The BSB’s music department (entrusted since 1949 with the management of the national information service on music) is one of the largest music libraries in the world in terms of the size and quality of its collection, but also in terms of the breadth and depth of its acquisition policy. The various materials are extensively catalogued and indexed in a very detailed way, using a wide range of catalogues and according to specific rules. The BSB currently uses RDA and MARC 21, according to national policies.
The Gemeinsame Normdatei (GND), the authority file of the German-speaking library world, is used both in cataloguing and in subject classification. The GND is nowadays used even outside the library world by archives, museums and other kinds of institutions, as well as for the cataloguing of websites.
The BSB participates in the RISM (Répertoire International des Sources Musicales) international online catalogue of music sources, and, together with the Staatsbibliothek zu Berlin, manages its OPAC.
The presentation will describe these projects, as well as the cataloguing workflow, the application of RDA in specific cases, the special rules (and cataloguing system) for personal archives and musical legacies (RNA), and finally the futuristic service ‘musiconn’. This last service is part of the national service for music information, Fachinformationsdienst Musikwissenschaft, and has been developed by the BSB: it offers the possibility to search by melody, as part of a project based on Optical Music Recognition (OMR), a software tool that allows the automatic recognition of printed compositions.
Françoise Leresche, Rethinking bibliographic control in the light of IFLA LRM entities: the ongoing process at the National library of France
When IFLA defined the concept of Universal Bibliographic Control (UBC) during the 1960s, the objective was to describe all resources published worldwide and split this task internationally by developing tools (such as the ISBD and UNIMARC) for the exchange of descriptive metadata. Today libraries are aiming to build web-oriented catalogues, based on the IFLA LRM model: when the ISBD “resource” is split into the WEMI entities, it seems necessary to adopt a new approach toward UBC and to define new criteria.
The BnF has initiated this process. This paper presents the criteria that engage the BnF’s responsibility as a provider of reference metadata identifying an instance of a WEMI entity or an agent. It also presents the quality approach developed by the cataloguing staff in order to reach its objectives and answer the various needs of the metadata users, in a context where the diversity of metadata sources is modifying traditional cataloguing methods. It also investigates the consequences for the exchange of metadata of the various stages of the implementation of IFLA LRM by libraries, and concludes with a commitment to maintain the distribution of reusable metadata for all libraries during a period still to be defined.
Anna Lucarelli, Thesauri in the digital ecosystem
In recent years, thesauri have taken on new roles, new functions, and have shown some advantages over other knowledge organization systems (KOS).
Thesauri are inventoried and assessed for their performance, for their connection to standards and for their level of semantic coverage. They have certainly proved to be dynamic tools, essential components for the integration of data on the web and for the development of mapping, as well as for the interoperability between heterogeneous resources. They have evolved more than other KOS, thanks to the adoption of the formats of the semantic web (such as RDF/SKOS), which make their free reuse possible, even in different frameworks. They have increased both their multilingualism and the implementation of conceptual equivalences, to varying degrees, among various subject indexing languages. Thanks to such equivalences, thesauri enable the connection between information and metadata produced by institutions of different countries. As authority control systems, they interact with Wikidata and help build ‘bridges’ between worlds that until recently were too far apart: libraries, archives and museums. Will the challenge of search engines, machine learning and artificial intelligence override them or will it make them even more involved?
Italy actively participates in this adventure, with the general Thesaurus of Nuovo soggettario realized by the National Central Library of Florence.
Andrew MacEwan, The International Standard Name Identifier: extending identity management across the global metadata supply chain
The presentation describes how ISNI is being adopted as a common identifier across disparate sectors of publishing. Whether publishing and distributing recorded music, film or text, ISNI is making good identity management a staple element in the global metadata supply chain. As the content creation industries become more engaged with the value of embedding good metadata from the point of publication, libraries can look forward to benefitting from a truly global revolution in the metadata supply flow. A case study describes how a British Library project has taken ISNIs already in the British National Bibliography and cross-matched them with data from UK publishers' own databases to embed ISNIs into the book supply chain. It also describes plans for ongoing publisher engagement through implementation of ISNI assignment into its cataloguing-in-publication workflows for UK legal deposit.
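For readers who handle ISNIs in supply-chain data, a basic syntactic check can catch transcription errors before any cross-matching. An ISNI has 16 characters, the last of which is a check character computed with ISO 7064 MOD 11-2. The following Python sketch shows that calculation; the function names are hypothetical and the sketch is not the British Library's matching workflow.

```python
def isni_check_character(first15):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni):
    """Check length, digits and check character; spaces and hyphens are ignored."""
    compact = isni.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_character(compact[:15]) == compact[15]
```

For example, `is_valid_isni("0000 0002 1825 0097")` returns `True`; this digit string is only an illustrative test value and is not an identifier taken from the presentation. A valid check character says nothing about whether the identifier has actually been assigned.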
Sabina Magrini, “Ti racconto in italiano”: management, description and indexing of oral sources. A project by the ICBSA (Istituto Centrale per i Beni Sonori e Audiovisivi)
The Istituto Centrale per i Beni Sonori e Audiovisivi (ICBSA) has just finished working, together with the Università degli Studi di Siena and the Università per Stranieri di Siena, on the project “Ti racconto in italiano”, which focuses on providing different access points to audio resources collected in the 1980s by ICBSA itself, as part of its mission to document Italian audio and audiovisual culture.
The corpus of audio resources comprises 36 unpublished interviews with Italian personalities drawn from among poets and writers, entrepreneurs, visual artists and architects. The main aim of the project is to create tools which will enable scholars of social history, art and literature to use these sources, and to provide original material for foreign students, at B1-B2 level, to practise their Italian. In order to facilitate access it has been necessary to create finding aids such as indexes and thesauri. For this purpose ICBSA has started a collaboration with the Biblioteca Nazionale Centrale di Firenze in order to employ the latter’s Nuovo Soggettario and to widen the range of areas it covers. This is not the first project to use the Nuovo Soggettario for the indexing of archival materials. Indeed, the Soprintendenza Archivistica e Bibliografica della Toscana worked in this direction a few years ago when treating the so-called Straw archives.
The ICBSA experience confirms the enormous potential of this approach.
The principles and conceptual models of universal bibliographic control and those of the semantic web share the common goal of organising the documentary universe by highlighting the relevant entities and mutual relationships, in order to ensure the widest possible access to knowledge. This drives a deep change in the entire information chain, from the analysis and structuring of the data to their dissemination and use.
From the point of view of the construction of bibliographic data models, the semantic web paradigm pushes the boundaries of the exchange of records among relatively homogeneous cataloguing systems and opens a transversal dialogue between different actors and systems, in a digital ecosystem that is not contained within cultural, linguistic, geographical or thematic limits.
In this context, it is necessary to engage in dialogue with heterogeneous communities, more or less authoritative, driven by the web and often created by institutions or groups of users quite different from the ones to which the cataloguing tradition was accustomed. The free reuse of data can take place in contexts very different from those of origin, multiplying for everyone the opportunities for universal access and the production of new knowledge.
Can different cataloging traditions coexist in such a changed context and integrate without losing their information value? Based on some recent experiences, this appears to be possible.
Paola Manoni, "Discoverability" in the IIIF digital ecosystem
The IIIF APIs have been used since 2012 by a community of research, national and state libraries, museums, companies and image repositories committed to providing access to image resources. The IIIF technical groups have developed compelling tools for the display of more than a billion IIIF-compatible images.
With hundreds of institutions participating worldwide, the possibilities for IIIF-based scholarship, for instance, are growing. One question therefore concerns how to discover the images relevant to one’s research interests, whether for consultation or, even more, for reuse.
While discussion of the IIIF specifications has focused on the machine-to-machine mechanisms of making IIIF resources harvestable, we have yet to implement an end-to-end solution that demonstrates how discovery might be accomplished at scale and across a range of differing standards for metadata arising from libraries, archives, and museums.
The coexistence of separate authority files within the main databases managed by the ICCU for entities of the same kind is going to be superseded in the new SRI portal through the integration, at the level of cooperative application, of the authority files for EDIT16 and Manus OnLine with those of SBN. The clustering of authority files, made possible through batch procedures and services provided by the application protocol SBNMARC, is intended to develop browsable links between different representations of the same entity. The presence of identifiers and link keys between information objects is therefore crucial to match data from the specialised databases EDIT16 and Manus OnLine, stored in the digital aggregator Internet Culturale, and shared through the collective catalogue SBN, which differ in quality and model but refer to the same resources and entities. The cluster of entities will be built upon the SBN Index, given the quantity of data already available in its authority file and in order to exploit existing services and infrastructures which make shared cataloguing possible. SBN will also provide the spine of the integrated representation of entities through the public access platform of the new portal.
Elisabeth Mödden, Artificial intelligence, machine learning and DDC Short Numbers
In the German National Library, both verbal and classificatory subject cataloguing are used for subject indexing. In the course of introducing automated subject cataloguing procedures, we also work on the automated assignment of Dewey Decimal Classification numbers. For this purpose, a set of abridged DDC numbers, based on, but not limited to, the DDC Abridged Edition and referred to as DDC Short Numbers, is being developed. First experiences in the automatic assignment of abridged numbers were gained with regard to DDC class 610, medicine. Since 2005, medical dissertations have been classified using a set of 140 DDC Short Numbers. Since 2015, these short numbers have been assigned automatically by utilizing artificial intelligence. Short Number sets for other DDC classes are currently being developed. We are planning to extend the automatic assignment of short numbers to all subjects, constantly reviewing the process and its results. In this presentation I will shed light on how artificial intelligence is used in this process; furthermore, I will talk about the challenges posed by the development of short numbers and machine-based classification for different scientific fields. In addition, the questions of labelling machine-assigned notations, data delivery and quality management will be addressed.
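To make the idea of an abridged notation concrete, the following Python sketch reduces a full DDC number to the longest entry of a short-number set that is a prefix of it. This is only an assumption about how abridgement could be modelled: the set below is invented for illustration and is not the German National Library's list of 140 Short Numbers, and the sketch does not represent the machine learning component that assigns the numbers.

```python
# Hypothetical short-number set for class 610, invented for this example.
SHORT_NUMBERS = {"610", "616", "616.1", "617"}


def to_short_number(ddc, short_numbers=SHORT_NUMBERS):
    """Return the longest short number that is a prefix of ddc, or None."""
    candidate = ddc.strip()
    while candidate:
        if candidate in short_numbers:
            return candidate
        # Drop the last digit; also drop a decimal point left dangling.
        candidate = candidate[:-1].rstrip(".")
    return None
```

With this invented set, `to_short_number("616.123")` yields `"616.1"` and `to_short_number("616.89")` yields `"616"`, while a number with no matching prefix yields `None`. A lookup of this kind could also serve to derive training labels from existing full DDC numbers, though the abstract does not say whether that is done.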
Oddrun Pauline Ohren, The Norwegian national bibliography. Policy and services
The operation of the National Library of Norway (NLN) is governed by the Legal Deposit Act of 1989, last amended in 2015. Under this law, all documents of any type made publicly available in Norway must be provided to the National Library for registration, preservation and dissemination. According to a regulation added in 2018, NLN may also require the digital version of printed documents, as well as core metadata. Another important policy document, issued by the Ministry of Culture and the Ministry of Education and Research, outlines a library strategy for the period 2020-2023. While covering all types of libraries, the strategy has a strong focus on NLN as a driving force and service provider for the rest of the Norwegian library sector, mandating NLN to support other libraries in a number of ways: financially through funding development projects, structurally by providing crucial infrastructure, and developmentally by conducting our own innovation activities. The national bibliography forms the backbone of most of the infrastructure services, such as the national repository of bibliographical data, which constitutes a single authorized source of metadata for Norwegian libraries, various authority files, and several thematic bibliographies. It also lies at the heart of enabling end users to access the vast collections of digitized material, even much of the IPR-restricted material, obtained through deals with rightsholder associations.
Tessa Piazzini, Bibliographic control and institutional repositories: welcome to the jungle
In 1994 cognitive scientist Stevan Harnad made what he called a “subversive proposal” to his colleagues: «immediately start self-archiving their papers on the Internet». Since then, institutional repositories have been developing chaotically, alongside disciplinary repositories. In the early 21st century the public debate was centered on their purposes and therefore on what they were supposed to contain; librarians joined the discussion and contributed to it by implementing descriptive standards such as Dublin Core and interoperability protocols (OAI-PMH). The themes under discussion were closely related to bibliographic and authority control, given that the quality of metadata has a profound impact on the quality of the services offered to users. Presently, we are still trying to answer some of those old questions: what (or whom) are IRs for? Is bibliographic control so necessary within an environment that has never failed in self-archiving? Can we consider IRs a bibliographic tool? We also need to deal with a wider vision: in a scenario that saw the transition from OPACs (created, managed and controlled by librarians) to current discovery tools (with their information redundancy and the related problems of data correctness and quality control), can librarians still be authoritative and act effectively?
Nathan Putnam, VIAF and the linked data ecosystem
In surveys looking at sources of linked data, data within the Virtual International Authority File (VIAF) ranks near the top in terms of usage and consumption. In the 2018 International Linked Data Survey for Implementers, 51% of respondents consumed VIAF data, topped only by id.loc.gov usage at 57%. Given the prevalent use of this data, this talk looks at VIAF linked data in cultural heritage ecosystems and explores potential futures for the VIAF data.
Pat Riva, The multilingual challenge in bibliographic description and access
Cataloguing has taken many steps towards greater internationalisation and inclusion, but one area remains stubbornly intractable: providing transparent access to users despite differences in language of descriptive cataloguing and language of subject access. As constructed according to present cataloguing practices, bibliographic records contain a number of language-dependent elements. This may be inevitable, but does not have to impede access to resources for a user searching in a language other than the language used for cataloguing. When catalogues are set up as multiple unilingual silos, the work of bridging the language barrier is pushed onto the user. Yet providing access through metadata is supposed to be the role of the catalogue. While a full theoretical approach to multilingual metadata is elusive, several pragmatic actions can be implemented to make language less of a barrier in searching and interpreting bibliographic data. Measures can be applied both in the creation of the metadata, and in adjusting the search. Authority control, linked authority files, and controlled vocabularies have an important part to play. Examples of the problem and the approaches will be drawn from a newly established catalogue shared by a consortium of English language and French language university libraries in Québec, Canada.
Philip Schreur, "I'm as good as you": the death of expertise and entity management in the age of the Internet
In 2017, Tom Nichols published a book entitled The Death of Expertise: the Campaign Against Established Knowledge and Why it Matters. In his book, Nichols details how the rise of the information age through the ubiquitous presence of the Internet has created a misguided intellectual egalitarianism. The shift towards linked data within the cataloging world is creating a similar tension. Authorities and authority record creation have been hidden behind a nearly impenetrable screen of complex instructions (RDA) and communication formats (Machine Readable Cataloging, or MARC) known only to libraries. With the rise of Google and Wikidata as authoritative sources of information, and linked data as the new, global means of communication for the Semantic Web, the structuring of data has become politicized. In his book, Nichols includes a quote by the Devil from C. S. Lewis’ Screwtape Letters, “I’m as good as you … is a useful means for the destruction of democratic societies.” As libraries embrace the transition to linked data and the freedom it creates, what will ultimately become of the balance of trust and provenance between established knowledge and the rise of populism?
Osma Suominen, Annif and Finto AI: developing and implementing automated subject indexing
Manually indexing documents for subject-based access is a labour-intensive process that can be automated using AI technology. Algorithms for text classification must be trained and tested with examples of indexed documents, which can be obtained from existing bibliographic databases and digital collections.
We have created Annif, an open source toolkit for automated subject indexing and classification. Annif is multilingual, independent of the indexing vocabulary, and modular. It integrates many text classification algorithms, including Maui, fastText, Omikuji, and a neural network model based on TensorFlow. Best results can often be obtained by combining several algorithms. Many document corpora have been used for training and evaluating Annif. Finding the algorithms and configurations that give the best quality is an ongoing effort.
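The remark that the best results often come from combining several algorithms can be illustrated with a minimal sketch of score fusion by weighted averaging. This is not Annif's own code: the function name, the backend labels used as dictionary keys and all scores are invented for illustration, and a subject that a backend does not suggest is assumed to score 0 for that backend.

```python
def combine_suggestions(per_backend, weights=None, limit=5):
    """Merge subject scores from several backends by weighted average.

    per_backend maps a backend name to a dict of {subject: score in [0, 1]}.
    Returns up to `limit` (subject, score) pairs, best first.
    """
    names = list(per_backend)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)

    subjects = set()
    for scores in per_backend.values():
        subjects.update(scores)

    combined = {
        subject: sum(weights[n] * per_backend[n].get(subject, 0.0) for n in names)
        / total_weight
        for subject in subjects
    }
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Invented scores from two hypothetical backends.
suggestions = {
    "backend_a": {"libraries": 0.8, "metadata": 0.4},
    "backend_b": {"libraries": 0.6, "indexing": 0.5},
}
for subject, score in combine_suggestions(suggestions):
    print(f"{subject}: {score:.2f}")
```

A subject proposed by both backends ends up ranked above subjects proposed by only one, which is the intuition behind combining algorithms with different strengths.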
In May 2020, we launched Finto AI, a service for automated subject indexing based on Annif. It provides a simple Web form for obtaining subject suggestions for text. The functionality is also available as a REST API. Many document repositories and the cataloguing system for electronic publications at the National Library of Finland are using it to integrate semi-automated subject indexing into their metadata workflows. In the future, we are going to extend Annif with more algorithms and new functionality, and to integrate Finto AI with other metadata management workflows.
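For readers who want to try the REST API, the following is a minimal sketch of a client using only the Python standard library. Everything specific in it is an assumption not stated in the abstract and should be checked against the service's own API documentation: the base URL, the project identifier `yso-en`, the `suggest` endpoint path, the form parameters `text` and `limit`, and the `results`/`label`/`score` fields of the response. The sample response is invented, and the network call is defined but not executed here.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: verify against the published API description before use.
BASE_URL = "https://ai.finto.fi/v1"


def build_suggest_request(text, project_id="yso-en", limit=5, base_url=BASE_URL):
    """Prepare a form-encoded POST request asking for subject suggestions."""
    url = f"{base_url}/projects/{urllib.parse.quote(project_id)}/suggest"
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Extract (label, score) pairs from a JSON response string."""
    data = json.loads(payload)
    return [(item["label"], item["score"]) for item in data.get("results", [])]


def suggest(text, **kwargs):
    """Send the request and parse the answer (needs network access)."""
    request = build_suggest_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(response.read().decode("utf-8"))


# Invented sample response, used to show the parsing step offline.
sample = json.dumps({
    "results": [
        {"uri": "http://example.org/subject/1", "label": "libraries", "score": 0.82},
        {"uri": "http://example.org/subject/2", "label": "metadata", "score": 0.47},
    ]
})

request = build_suggest_request("Metadata management in libraries")
print(request.get_method(), request.full_url)
print(parse_suggestions(sample))
```

In a semi-automated workflow of the kind described above, the returned suggestions would be shown to a cataloguer for acceptance or rejection rather than written directly into the record.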
Richard Wallis, Follow me to the library! Bibliographic data in a discovery driven world
Libraries are generally welcoming organisations and places, engaging with communities and inviting all comers to immerse themselves in the information-rich environment curated for the benefit of all, from the entertainment seeker to the educational specialist. Traditionally this immersion would take place in open, welcoming, impressive buildings at the heart of the town square or university campus.
However, as witnessed by the phenomena of the declining town centre and the lockdown Zoom culture of 2020, traditional routes to resources are changing rapidly. In the online discovery and delivery world that has emerged, metadata, especially quality metadata, about resources and information is key. Without a detailed understanding of available resources, it can be difficult if not impossible to direct those resources towards the people who might benefit from reading, watching, analysing, interacting with, or purchasing them.
In the field of metadata about resources, the library and associated communities are somewhat unusual. We have been focusing on the production of bibliographic and similar quality metadata for decades. Significant local and global efforts have been applied, often using the leading technologies of the time, resulting in new international data standards and exemplar projects demonstrating their benefit. Libraries are rightly recognised as leaders in the field of metadata.
Equally unusual, however, is the way that metadata is applied. Whereas other sectors often utilise their metadata to make others aware of the resources that may be useful to them, libraries tend to keep their metadata only for the use of themselves and their peers. Whereas other sectors wishing to attract consumers to their resources are sharing metadata in open cross-sector standards with global discovery services, libraries are focusing on the development of data standards that are of use only to others in the library sector and to their localised search interfaces.
It can quite rightly be argued, however, that those open cross-sector standards consumed by global discovery services are not of sufficient quality and granularity to support the needs and processes of libraries and associated communities. On the surface, libraries are presented with a metadata dichotomy: change direction and lower standards to be relevant in an increasingly discovery-driven global environment; or continue to invest in quality library-focused standards, improving user services and experience in an internal world increasingly bypassed by a global discovery-driven environment.
Fortunately, the semantic web and linked data technologies that underpin major developments in library metadata (BIBFRAME) and in the structured data consumed by global discovery services (Schema.org) serve to translate this dichotomy into a surmountable challenge. For both BIBFRAME and Schema.org, the key to quality data is the description of resources as an interlinked collection of entity descriptions: people, places, works, etc. Linked data technologies provide the tools that enable an entity, once defined and described using one vocabulary, such as BIBFRAME, to be described using another, such as Schema.org, using an agreed rule set.
It is the definition of such rules, and associated example processes, that is the focus of the Bibframe2Schema W3C Community Group. The initial objectives of the group are to create a reference mapping from BIBFRAME 2.0 to Schema.org and to share reference software implementation(s). The implicit ambition is to enable standard techniques for sharing a bibliographic view of resources that will guide users from their daily discovery universe towards resources curated by libraries.
By building upon the important and significant investments in the development and implementation of BIBFRAME linked data, Bibframe2Schema.org will enable, with small investment by comparison, the additional sharing of data describing and locating resources in a form easily understood, consumed, and linked into the de facto global discovery environment, thus creating a linked trail of metadata breadcrumbs to be followed to the library.
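The idea of re-describing an entity from one vocabulary in another by means of an agreed rule set can be sketched in a few lines. The rules below are illustrative guesses chosen for this example; they are not the reference mapping of the Bibframe2Schema group. The entity is represented as a plain dictionary with prefixed names rather than as real RDF, and the function name `redescribe` is hypothetical.

```python
# Illustrative rule set only: NOT the Bibframe2Schema reference mapping.
CLASS_RULES = {
    "bf:Work": "schema:CreativeWork",
    "bf:Person": "schema:Person",
    "bf:Place": "schema:Place",
}
PROPERTY_RULES = {
    "rdfs:label": "schema:name",
    "bf:subject": "schema:about",
    "bf:language": "schema:inLanguage",
}


def redescribe(entity):
    """Apply the rules to one entity; return (new description, unmapped terms)."""
    target = {"@id": entity["@id"]}  # the identifier links both descriptions
    unmapped = []
    for key, value in entity.items():
        if key == "@id":
            continue
        if key == "@type":
            if value in CLASS_RULES:
                target["@type"] = CLASS_RULES[value]
            else:
                unmapped.append(value)
        elif key in PROPERTY_RULES:
            target[PROPERTY_RULES[key]] = value
        else:
            unmapped.append(key)
    return target, unmapped


# Invented example entity described with BIBFRAME terms.
work = {
    "@id": "http://example.org/work/1",
    "@type": "bf:Work",
    "rdfs:label": "An example work",
    "bf:language": "it",
    "bf:originDate": "1501",
}

described, unmapped = redescribe(work)
print(described)
print("no rule for:", unmapped)
```

Terms without a rule are reported rather than silently dropped, which reflects the difference in granularity between the two vocabularies noted earlier in the abstract.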
Paul Weston, Should catalogues wade in open water?
In recent years, libraries, either on their own or in consortia, have carried out digitization projects which resulted in establishing criteria to make digital items accessible through the catalogue. Pushing the boundaries of the latter, cataloguers have considered the possibility of providing access to the digital version of a book whenever available in the public domain. In the aftermath of access restrictions due to Covid-19, the question is as timely as ever: librarians have started to ask whether the catalogue, moving past the idea of being just a citational tool, should open itself to the web as the place where users, thanks to quality data, can gain easy access to freely available digital bibliographic material. This should include digital publishing, such as that provided by Project Gutenberg, LiberLiber or Biblioteca Italiana, as well as DH projects, all of which are based on editions published in printed format. This scenario calls for quick policy answers: a) how should features which could act as search keys or filters be adequately described; b) how should the flexibility and changeability of digital objects be dealt with; c) how should traditional cataloguing procedures change as a consequence of the number and the peculiarities of these items; d) which criteria should be adopted in marking the new borderlines of the library catalogue's mission.
Paolo Wos Bellini, The Italian national bibliography today
The statistics on the records produced for the Italian National Bibliography (BNI) in the last decade show a stable development with a growing trend. At the same time, there has been a decrease in human resources never seen before in the history of the National Central Library of Florence (BNCF), which has drastically reduced the editorial staff of BNI to a few people. The institutional tasks of BNCF, laid down by law, have not changed, though. Among these is the archival function for the Italian bibliographic production and its representation through adequate cataloguing and bibliographic instruments.
Therefore, in order to maintain or increase the output of BNI while preserving its quality, some changes of a technical and organizational-managerial nature have recently been implemented, pursuing the following: 1) rapidity of cataloguing; 2) full implementation of the recommendations of the Central Institute for the General Catalogue of the Italian Libraries and for the Bibliographic Information (ICCU) as regards, for instance, cataloguing regulations, use of the codes of bibliographic qualification, and creation and management of the authority files for personal names and uniform titles; 3) constant attention to the role of BNCF in the cooperation within the National Library Service (SBN).
There are plans to address some critical difficulties that still remain. The emergency due to the Covid-19 pandemic has de facto imposed a significant and unforeseen reorganization of work management (from legal deposit to BNI and to the BNCF catalogue) through methods that could certainly be maintained in the future.