https://store-images.s-microsoft.com/image/apps.10812.554eeb65-0ccf-4375-94ac-530cd4d09044.01452dd6-86ba-47b4-920d-72a4b46cdad7.d755aebd-f273-483a-97e7-7fe8f0164dd0

Reader-LM 1.5b

Jina AI

Reader-LM 1.5b

Jina AI

Small Language Models for Cleaning and Converting HTML to Markdown

Jina Reader-LM 1.5 b is a small language model that converts HTML content to Markdown content, which is useful for content conversion tasks. The model is trained on a curated collection of HTML content and its corresponding Markdown content.

Highlights:
  • Jina Reader-LM 1.5b is designed to efficiently convert noisy HTML into clean markdown, showcasing a novel approach to web content extraction that is both cost-effective and scalable.
  • Jina Reader-LM 1.5b has been optimized for long context support, handling up to 256K tokens, which is crucial for dealing with the intricacies of modern HTML, including inline CSS and scripts.
  • Jina Reader-LM 1.5b outperforms larger language models in the HTML-to-markdown conversion task, despite being significantly smaller in size, which is a testament to their specialized training and design for this specific task.