Important: For a step-by-step guide on how to set up this VM, please refer to our Getting Started guide.
Deploy a ready-to-use virtual machine powered by Ragflow and Ollama, fully loaded with leading open-source language models and optimized for high-performance GPU inference.
What’s Inside:
1. Ragflow – End-to-End RAG Workflow Orchestration
Ragflow is an open-source framework purpose-built for Retrieval-Augmented Generation (RAG) pipelines, with a focus on deep document understanding. It lets you easily build, manage, and deploy AI systems that combine LLM reasoning with your proprietary or domain-specific data.
Key features include:
Deep Document Understanding – Intelligent layout analysis, template-based chunking (for PDFs, tables, resumes, legal documents, etc.), visual chunking, and explainable citations that reduce hallucinations and support traceability.
“Quality-In, Quality-Out” – High-fidelity input leads to accurate, grounded outputs, even with large contexts or complex formats.
Broad Multimodal Support – Works across diverse sources, including Word, PowerPoint, Excel, images, scanned documents, web pages, and structured data.
Seamless Pipeline Orchestration – Provides both Workflow and Agentic Workflow on a unified canvas for low-code and prompt-driven logic, simplifying complex orchestration.
Deep Research Multi-Agent Engine – A built-in template that enables dynamic, iterative exploration of user queries across internal and external sources, using a robust agent hierarchy and prompt-engineered decision flows.
2. Ollama – Local LLM Inference
Ollama allows you to run large language models locally with ease. It’s designed for performance, portability, and low latency, making it well suited to developers and enterprises alike.
Preinstalled and ready to go with GPU acceleration, Ollama on this VM includes the following models:
DeepSeek-R1 – Family of open reasoning models
Qwen 2.5 – High-performing general-purpose model
Mistral – Compact and efficient model for reasoning tasks
Gemma – Open, lightweight LLM by Google
LLaVA – Vision-language model for image + text use cases
Llama 3.3 – Optimized for dialogue and chat use cases
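For illustration, here is a minimal Python sketch of sending a prompt to one of the preinstalled models through Ollama's local REST API. It assumes Ollama is listening on its default address (http://localhost:11434) and that the model tag you pass (for example "llama3.3" or "mistral") matches what "ollama list" reports on your VM. The helper names build_generate_request and generate are hypothetical, not part of Ollama or this image.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your VM exposes a different host or port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose "response" field holds the text.
    return body["response"]
```

On the VM, a call such as generate("mistral", "Explain RAG in one sentence.") would return the model's reply as a single string. Setting "stream" to False trades token-by-token output for a simpler, one-shot response, which is convenient for scripts and quick tests.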
3. NVIDIA GPU Support
Fully configured GPU-ready environment
Harness the power of GPU-accelerated inference to drastically reduce latency and increase throughput for LLM tasks
Works seamlessly with Ollama and Ragflow for high-speed GenAI workflows
Use Cases
Deep Research Agents – Autonomously break down research tasks into sub-tasks, retrieve across multiple sources, and synthesize executive-level reports.
Document Q&A & Knowledge Assistants – Tap into data across formats with accurate citations and transparency.
AI Copilots & Knowledge Workers – Leverage visual and text inputs to power multimodal assistants.
Secure, Scalable RAG Applications – Everything runs within your own cloud environment with full workflow control and observability.
Low-Latency LLM APIs – Direct deployment of Ollama LLMs for high-performance AI endpoints.
Why Choose This VM?
Full Data Control & Security: Everything runs in your isolated cloud environment, giving you full control over your infrastructure and data. Ideal for sensitive workloads, internal documents, and enterprise-grade compliance.
Flexible Model Support: Use your own embeddings, documents, and vector DBs with Ragflow. The VM comes with preinstalled LLMs (DeepSeek-R1, Qwen 2.5, Mistral, Gemma, Llama, LLaVA) and lets you easily add your own models via Ollama or any other LLM provider, giving you complete control over which models you use and how you run them.
All-in-One: Everything you need for GenAI development in a single VM
Instant Setup: No need to install anything; spin up the VM and start working
Multimodal Ready: Includes LLaVA for image+text inference
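As a rough illustration of image + text inference, the sketch below sends a local image to LLaVA through Ollama's /api/generate endpoint, which accepts images as a list of base64-encoded strings. It assumes the default local endpoint and the "llava" model tag; the helper names build_llava_payload and describe_image are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama's default local endpoint; the "llava" tag must match `ollama list` on your VM.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_llava_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> bytes:
    """Return the JSON body for a non-streaming image + text request."""
    # Ollama expects each image as a base64-encoded string inside the "images" list.
    encoded_image = base64.b64encode(image_bytes).decode("ascii")
    payload = {"model": model, "prompt": prompt, "images": [encoded_image], "stream": False}
    return json.dumps(payload).encode("utf-8")


def describe_image(path: str, prompt: str, timeout: float = 300.0) -> str:
    """Send a local image file and a text prompt to LLaVA and return its answer."""
    with open(path, "rb") as image_file:
        body = build_llava_payload(image_file.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The generated description is returned in the "response" field.
        return json.loads(response.read().decode("utf-8"))["response"]
```

For example, describe_image("scan.png", "Summarize the key fields in this document.") would ask LLaVA to read a scanned page; the file name here is only a placeholder.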
Disclaimer: Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and/or names or their products and are the property of their respective owners. We disclaim proprietary interest in the marks and names of others.