It is increasingly acknowledged that the Digital Humanities have placed too much emphasis on data creation and that the major priority should be turning digital sources into contributions to knowledge. While this sounds relatively simple, doing it involves intermediate stages of research that enhance digital sources, develop new methodologies and explore their potential to generate new knowledge from the source. While these stages are familiar in the social sciences, they are less so in the humanities. In this paper we explore these stages based on research on the British Library’s Nineteenth Century Newspaper Collection, a corpus of many billion words that has much to offer to our understanding of the nineteenth century but whose size and complexity make it difficult to work with.