AI-Powered Archive Indexing: 6-Week Discovery

Peak Indicators Limited

An Azure architecture that uses Azure Cognitive Services to scan historical archives and build an AI search index in the cloud, so people can search the archives easily.

Peak Indicators (now TPXimpact Data & Insights) is a part of a digital transformation company on a mission to build a future where people, places and the planet are supported to thrive. This solution improves accessibility across heritage collections, using AI.

What's the problem?

  • Linking together historical collections (including artworks, artefacts and texts) is difficult as they are often unstructured and difficult to organise into a database.
  • Search capabilities are lacking because metadata (such as author, artist or date of creation), which could help categorise and link pieces without manual effort, often appears inconsistently across collections.

The solution needs to enrich, index and interlink the historical content, so users can access all collections from a single point of search, whether that is online or through consoles at visitor sites. This will empower users to search across multiple historic collections using cloud-based AI and digital technologies from Microsoft.

Here’s how it works:

  • We use AI techniques, like Natural Language Processing (NLP), to assign tags to items and convert audio/video recordings into text. NLP also identifies additional metadata to build new links between items across collections.
  • We use Azure Cognitive Search (from Microsoft) to index content and provide querying capabilities that support rich user search experiences. These include autocomplete and auto suggest capabilities, as well as the ability to identify similar items through metadata and semantic links.
  • We also use Azure Data Factory and Azure Data Lake Storage to manage the flow of information across collections and storage. With this, we create an automated, end-to-end solution that scales with the collection. Any new content added will flow into a central storage space (data lake) and become easily accessible via search.
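
To make the tagging, linking and autocomplete ideas above more concrete, here is a minimal Python sketch. It is a toy, in-memory illustration only, not the Azure Cognitive Search API and not the implementation used in this solution. The `Item` fields, the sample records and the function names are all hypothetical, and the tags are assumed to have already been produced by an upstream NLP step.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """One archive record. All field names are hypothetical."""
    item_id: str
    collection: str
    title: str
    tags: set = field(default_factory=set)


def build_tag_index(items):
    """Map each normalised tag to the ids of the items that carry it."""
    index = defaultdict(set)
    for item in items:
        for tag in item.tags:
            index[tag.strip().lower()].add(item.item_id)
    return index


def related_items(item, items, index):
    """Return ids of items in other collections that share at least one tag."""
    by_id = {i.item_id: i for i in items}
    related = set()
    for tag in item.tags:
        for other_id in index[tag.strip().lower()]:
            if by_id[other_id].collection != item.collection:
                related.add(other_id)
    return sorted(related)


def autocomplete(prefix, items, limit=5):
    """Suggest titles that start with the typed prefix (case-insensitive)."""
    p = prefix.strip().lower()
    matches = sorted(i.title for i in items if i.title.lower().startswith(p))
    return matches[:limit]


# Made-up sample records from three separate collections.
items = [
    Item("a1", "paintings", "Harbour at Dawn", {"Harbour", "1890s"}),
    Item("b7", "letters", "Letter from the Harbour Master", {"harbour", "shipping"}),
    Item("c3", "oral-history", "Harvest Memories", {"farming"}),
]
index = build_tag_index(items)
print(related_items(items[0], items, index))  # ['b7']
print(autocomplete("har", items))  # ['Harbour at Dawn', 'Harvest Memories']
```

Normalising tags before indexing is what lets the painting and the letter link up even though their metadata was recorded inconsistently ("Harbour" versus "harbour"). A managed search service would handle this, along with ranking and semantic matching, at a much larger scale.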

What's the impact?

  • The solution scans, deduplicates, automatically tags and creates a searchable index covering a high volume and variety of historic archives - removing the manual effort required
  • The data is imported into a modern, future-proof cloud system, with support for a variety of endpoints (website, mobile, interactive displays) - giving easy access to users
  • Security is built in, so data can be shared with external agencies, and the solution is scalable to support the continued growth of archives - meaning data is secure and the cloud platform will grow with your business

This solution is perfect for:

  • Any council with historic archives that need to be indexed and stored in the cloud
  • The backend of history websites
  • Any project where large scale AI automation would remove/reduce manual processes

Products Responsible:

  • Azure Data Factory
  • Azure Data Lake
  • Azure SQL Database
  • Cognitive Services
  • Cognitive Search
  • Azure Functions
  • Azure Machine Learning