Visualization of Library Collections

Libraries have always produced numbers about themselves — counts of volumes, subject breakdowns, circulation statistics. For most of their history these figures lived in annual reports and spreadsheets, inert and largely unread. What changed was not the data, but the capacity to see it. When researchers began applying the emerging tools of information visualization to library holdings in the 1990s, a new kind of understanding became possible: not just knowing that a collection contained 334 records on Ukrainian local history, but seeing how the language of those records shifted after 1991, how publication geography correlated with political geography, how book format varied with city population. The numbers became landscapes.

This article traces that transformation — from the foundational IEEE infovis research of the 1980s and 1990s through the visual analytics movement, the cultural heritage visualization community, and into the present, where institutions like Europeana and the Digital Public Library of America treat their collections as datasets inviting computational exploration. Three interactive visualizations accompany the text: a timeline of key milestones, a collapsible intellectual lineage tree, and a thematic sunburst map of the field.

From vision to knowledge

The intellectual genealogy of collection visualization runs through a small community of computer scientists who gathered under the IEEE banner in the late 1980s. At Xerox PARC, Stuart Card, Jock Mackinlay, and George Robertson were building systems that treated the computer screen not as a document surface but as a cognitive prosthetic — a way of extending what the human mind could hold, compare, and reason about. Their 1989 Information Visualizer introduced three-dimensional animated navigation of large information spaces. Mackinlay's earlier 1986 paper had already established that the choice of visual encoding was not arbitrary: different mappings of data attributes to visual properties (position, color, size, shape) had different expressive power depending on data type. This seemingly technical point had profound implications — it meant visualization design could be reasoned about, even automated.

"The power of the unaided mind is highly overrated. Without external aids, memory, thought, and reasoning are all constrained. The real powers come from devising external aids."

— Donald Norman, Things That Make Us Smart (1993), quoted in Card, Mackinlay & Shneiderman, Readings in Information Visualization (1999)

Ben Shneiderman at the University of Maryland contributed two techniques that remain in daily use: treemaps (1992), which fill a rectangle with nested tiles proportional to a numeric attribute, originally for visualizing disk usage but immediately applicable to library subject hierarchies; and dynamic queries (1992), interactive sliders that filter a dataset in real time and update a visual display. His laboratory, the Human-Computer Interaction Lab (HCIL), also produced LifeLines (1996) with Catherine Plaisant: multi-track parallel timelines for personal histories, applied first to juvenile justice records and medical charts but conceptually transferable to any temporal collection of records. The 1999 volume Readings in Information Visualization: Using Vision to Think, edited by Card, Mackinlay, and Shneiderman, consolidated this work into a field.
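The treemap idea is compact enough to sketch in a few lines. What follows is a minimal illustration of the slice-and-dice layout, in which each level of the hierarchy divides its parent rectangle along alternating axes in proportion to weight. The subject hierarchy, the volume counts, and the names Node, weight, and slice_and_dice are all invented for this example; this is not code from any published system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0                      # own weight; used for leaves only
    children: list = field(default_factory=list)


def weight(node):
    """Total weight of a node: its own size for a leaf, else the sum over children."""
    if not node.children:
        return node.size
    return sum(weight(child) for child in node.children)


def slice_and_dice(node, x, y, w, h, depth=0, tiles=None):
    """Lay out leaves as (name, x, y, width, height) tiles inside the given rectangle.

    The split direction alternates with depth: even depths divide the width,
    odd depths divide the height.
    """
    if tiles is None:
        tiles = []
    if not node.children:
        tiles.append((node.name, x, y, w, h))
        return tiles
    total = weight(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        share = weight(child) / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, tiles)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, tiles)
        offset += share
    return tiles


# Made-up subject hierarchy with made-up volume counts, for illustration only.
collection = Node("Collection", children=[
    Node("History", children=[Node("Local history", 40), Node("Biography", 20)]),
    Node("Science", 40),
])

tiles = slice_and_dice(collection, 0, 0, 100, 50)
for name, x, y, w, h in tiles:
    print(f"{name:14s} x={x:6.2f} y={y:6.2f} w={w:6.2f} h={h:6.2f} area={w * h:8.2f}")
```

Each tile's area is proportional to its leaf weight, which is the property that lets a reader compare subject sizes at a glance. Slice-and-dice tends to produce long thin tiles in deep hierarchies; other layout variants trade its stable ordering for better aspect ratios.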

The key concept the volume introduced was knowledge crystallization — the process by which a user, interacting with a visualization, extracts insight that would not be available from the raw data alone. The metaphor is apt: something dispersed in solution suddenly takes structural form. For library collections, crystallization meant the ability to see a collection's character — its temporal gaps, its geographic concentrations, its linguistic history — without reading every record.

The geodigital library

While the infovis community was developing general-purpose visual tools, library and information scientists were asking a more specific question: what happens when you overlay a document collection onto a map? Ray Larson's 1996 work on spatial browsing gave users the ability to retrieve georeferenced documents by drawing a region on a digital map — a simple idea that opened a significant design space. The collections most amenable to this treatment were those about places: local history, cartographic archives, ethnographic records, travel literature.
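Reduced to its simplest form, region-based retrieval is a containment test between a drawn rectangle and each record's coordinates. The sketch below assumes every record carries a single latitude and longitude and that the region is an axis-aligned box that does not cross the 180° meridian. The Record class, the spatial_browse function, and the record titles are hypothetical, and real systems would typically use spatial indexes and footprint geometries rather than a linear scan over points.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    lat: float   # degrees north
    lon: float   # degrees east


def spatial_browse(records, south, west, north, east):
    """Return the records whose point lies inside the drawn rectangle.

    Assumes the rectangle does not cross the 180-degree meridian.
    """
    return [
        r for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    ]


# Hypothetical titles; coordinates are approximate city locations.
records = [
    Record("Hypothetical guide to Lviv", 49.84, 24.03),
    Record("Hypothetical history of Kyiv", 50.45, 30.52),
    Record("Hypothetical atlas of Odesa", 46.48, 30.73),
]

# A box drawn roughly over western Ukraine.
hits = spatial_browse(records, south=48.0, west=22.0, north=51.0, east=26.0)
print([r.title for r in hits])
```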

Olha Buchel's doctoral research at the University of Western Ontario (2012), conducted with Kamran Sedig, developed this direction systematically. Their prototype system VICOLEX (VIsual COLlection EXplorer) was applied to 334 records from the Library of Congress Ukrainian local history holdings, integrating Google Maps with coordinated secondary representations: Kohonen self-organizing maps, scatter plots, pie charts, hierarchical timelines, and embedded maps of publication places. The collection spanned 1917 to 2007, and the coordinated visualizations revealed findings that text browsing could not: the collection was dominated by post-1991 publications (reflecting Ukrainian independence), Polish-language books about Lviv increased sharply after 1981, and Russian-language materials clustered geographically in predominantly Russian-speaking regions. Each of these patterns crystallized from interaction with the visualization, not from reading records.
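The mechanism behind coordinated views can be illustrated with a toy example: one shared filter, in the spirit of a dynamic query, feeds several summaries at once, so that moving a date slider changes every view together. The six records below are invented for illustration and are not drawn from the Library of Congress holdings or from VICOLEX; the function names dynamic_query and coordinated_views are likewise assumptions of this sketch.

```python
from collections import Counter

# Invented records: year of publication, language, and place of publication.
RECORDS = [
    {"year": 1968, "language": "Russian", "place": "Kyiv"},
    {"year": 1975, "language": "Ukrainian", "place": "Lviv"},
    {"year": 1989, "language": "Russian", "place": "Kharkiv"},
    {"year": 1994, "language": "Ukrainian", "place": "Kyiv"},
    {"year": 1998, "language": "Ukrainian", "place": "Lviv"},
    {"year": 2003, "language": "Polish", "place": "Lviv"},
]


def dynamic_query(records, year_min, year_max):
    """Keep the records inside the year range, as a range slider would."""
    return [r for r in records if year_min <= r["year"] <= year_max]


def coordinated_views(selection):
    """Recompute every summary from the same selection so the views stay in step."""
    return {
        "by_language": Counter(r["language"] for r in selection),
        "by_place": Counter(r["place"] for r in selection),
        "by_decade": Counter(r["year"] // 10 * 10 for r in selection),
    }


before = coordinated_views(dynamic_query(RECORDS, 1917, 1991))
after = coordinated_views(dynamic_query(RECORDS, 1992, 2007))
print("1917 to 1991:", dict(before["by_language"]))
print("1992 to 2007:", dict(after["by_language"]))
```

Because every view is derived from the same selection, a single slider movement is enough to expose a shift in language or place that no individual chart would show on its own.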

The 2014 paper by Buchel and Sedig, published in Information Research, analyzed the role of interaction in this sensemaking process. Their central finding was that individual interactions — filtering, selecting, annotating, gathering — appeared insignificant in isolation but combined to substantially reduce cognitive effort. This echoed a broader principle from Sedig and Parsons's later framework (2016): that visualization design must attend not only to visual encoding but to the full space of human-information interaction, including the temporal and task-based dimensions of how users move through an information space.

Visual analytics and the scale problem

The visual analytics movement that emerged in the mid-2000s addressed a challenge the original infovis community had encountered but not fully solved: what happens when collections become too large for visual inspection even with good tools? The National Visualization and Analytics Center (NVAC), established at Pacific Northwest National Laboratory in 2004 partly in response to the intelligence community's post-9/11 information overload, articulated a new research agenda. Thomas and Cook's 2005 volume Illuminating the Path defined visual analytics as "the science of analytical reasoning facilitated by interactive visual interfaces" — a definition that elevated human reasoning, not just visualization, to the center of the enterprise.

The key distinction was the tight coupling of automated analysis with interactive display. Rather than presenting pre-computed visualizations to a human viewer, visual analytics systems let the human and the algorithm collaborate: the algorithm handles what is computationally tractable (clustering, dimension reduction, topic modeling), the human handles what requires judgment (interpretation, relevance assessment, anomaly flagging). For libraries, this meant that a collection too large to browse could still be navigated — its thematic structure extracted by a topic model, its temporal evolution animated, its geographic distribution rendered on a map — and that the visualization would update as the human interacted.

Text and document visualization

Christopher Collins, Canada Research Chair in Linguistic Information Visualization at Ontario Tech University from 2013 to 2023, developed several techniques specifically suited to document collections. DocuBurst (2009), created with Sheelagh Carpendale and Gerald Penn, arranged a document's vocabulary in a radial layout following WordNet's IS-A hierarchy — so that words appearing frequently in the document inflated their sectors, giving a visual summary of semantic content rather than mere word frequency. The structure of the display was not arbitrary but followed the deep organization of the English lexicon.
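The way frequent words inflate their sectors can be shown with a small calculation. This sketch rolls word counts up a tiny IS-A hierarchy and gives each node an angular extent proportional to its cumulative count. It is a simplification under stated assumptions: the hierarchy fragment and the counts are invented rather than taken from WordNet or any real document, the function names are hypothetical, and the published system may size and color its sectors differently.

```python
# Hypothetical IS-A fragment written as child -> parent; not copied from WordNet.
IS_A = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

# Made-up word counts for an imaginary document.
COUNTS = {"dog": 6, "cat": 2, "tree": 2}


def children_by_parent(is_a):
    """Invert the child -> parent map into parent -> list of children."""
    kids = {}
    for child, parent in is_a.items():
        kids.setdefault(parent, []).append(child)
    return kids


def subtree_count(node, kids, counts):
    """Occurrences of the node's own word plus all words beneath it."""
    return counts.get(node, 0) + sum(
        subtree_count(c, kids, counts) for c in kids.get(node, [])
    )


def layout(node, kids, counts, start=0.0, end=360.0, ring=0, out=None):
    """Assign each node a (ring, start_angle, end_angle) triple, angles in degrees.

    Children share their parent's angular extent in proportion to their
    cumulative counts; a child with no occurrences gets a zero-width sector.
    """
    if out is None:
        out = {}
    out[node] = (ring, start, end)
    children = kids.get(node, [])
    total = sum(subtree_count(c, kids, counts) for c in children)
    angle = start
    for child in children:
        share = subtree_count(child, kids, counts) / total if total else 0.0
        span = (end - start) * share
        layout(child, kids, counts, angle, angle + span, ring + 1, out)
        angle += span
    return out


kids = children_by_parent(IS_A)
sectors = layout("entity", kids, COUNTS)
for word, (ring, a0, a1) in sectors.items():
    print(f"ring {ring}  {word:7s} {a0:6.1f} to {a1:6.1f} degrees")
```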

Parallel Tag Clouds (2009), developed with Fernanda Viégas and Martin Wattenberg at IBM Research, combined the visual vocabulary of parallel coordinates with tag clouds to compare facets of very large text corpora. Applied to 600,000 US Circuit Court decisions over 50 years, the system revealed regional and linguistic differences between courts that were invisible in any single-facet view. The paper received the VAST Test of Time Award in 2019. A third contribution, VisGets (2008), created with Marian Dörk, Sheelagh Carpendale, and Carey Williamson, provided coordinated interactive query filters for web-based information, an early example of what would become the standard architecture for faceted collection browsing.

Generous interfaces and the cultural heritage turn

The concept most influential in shifting how cultural institutions think about collection interfaces came not from the IEEE but from Mitchell Whitelaw's 2015 essay "Generous Interfaces for Digital Cultural Collections." Whitelaw argued that the standard library interface — a search box that returns a ranked list — was structurally inhospitable to the exploratory, serendipitous discovery that physical browsing enables. A generous interface, by contrast, presents the full scope and character of a collection immediately, inviting exploration before any query is formed. Users should be able to see what a collection contains before they know what to ask for.

Marian Dörk's related concept of the "information flaneur" (CHI 2011), developed with Carpendale and Williamson, used the metaphor of the urban walker — making meaning through unhurried, curious traversal of a space — to argue for a more exploratory mode of information seeking. Both concepts influenced a generation of cultural heritage interface design, particularly in the GLAM (Galleries, Libraries, Archives, Museums) sector.

The most comprehensive survey of this work is Windhager et al.'s 2019 paper in IEEE Transactions on Visualization and Computer Graphics, which reviewed more than 100 visualization systems built for cultural heritage collections. Their analysis found that 80% used multiple coordinated views, that temporal visualization was present in 81% of systems, and that the field was moving toward richer integration of spatial, temporal, and thematic encodings. Katy Börner's work at Indiana University provided the bibliometric and science-mapping foundations that continue to inform collection analysis: the "Visualizing Knowledge Domains" survey (2003) with Chaomei Chen and Kevin Boyack, the Places & Spaces traveling exhibit (2004–present), and her Atlas trilogy (MIT Press, 2010, 2015, and 2021).

The three interactive visualizations below

The timeline shows key milestones from 1986 to 2019 across five categories: information visualization, library collections, Katy Börner's work, visual analytics, and digital heritage. Hover for a quick reading or click any point to open the full detail panel. The collapsible tree shows the same intellectual lineage organized hierarchically — click any branch node to expand or collapse it. The sunburst map organizes the field thematically rather than chronologically, grouping concepts by domain. All three are different lenses on the same territory.

Intellectual lineage

The collapsible tree below maps the same field as a hierarchy. Each branch represents a researcher or research tradition; each leaf is a specific contribution, and clicking a leaf opens its description. The tree makes visible something the timeline does not: the visualization of library collections is not a single lineage but a confluence of at least six parallel streams that converge on the same object, the collection, from different disciplinary directions.

A thematic map of the field

The sunburst below is modeled on Christopher Collins's DocuBurst — a radial, space-filling display in which the center represents the field as a whole, the first ring its major themes, and the outer ring individual concepts within each theme. The size of each segment reflects the relative weight of that concept in the literature. Hover over any segment to read a description; click to zoom into that theme.