Apache Druid packaged by Data Science Dojo

Data Science Dojo

Monitor, ingest and query data from various sources using Apache Druid.

Data Science Dojo delivers data science education, consulting, and technical services to harness the power of data.

Trademarks: This software listing is packaged by Data Science Dojo. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.

About the offer:

Apache Druid is a real-time database for interactive analytics. It ingests, maintains, and segments data from a variety of sources, either as streams or in batches, which makes it flexible. It is a scalable distributed system that processes queries in parallel and stores datasets in a column-based structure. Druid keeps the data safely in deep storage and uses indexing and time-based partitioning to speed up filtering and search. Users can query the ingested datasets with Druid's optimized SQL engine. Druid also provides automatic summarization and algorithmic approximation of data.
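
As an illustration of querying with Druid SQL from outside the web console, the sketch below posts a query to Druid's SQL HTTP endpoint (/druid/v2/sql) using only the Python standard library. The address "http://yourip:8888" is a placeholder for your instance, and the helper names build_sql_request and run_sql are made up for this example. It assumes the endpoint is reachable without authentication, which may not match your setup.

```python
import json
import urllib.request

# Assumption: placeholder address of the Druid instance; replace yourip.
DRUID_URL = "http://yourip:8888"


def build_sql_request(sql, base_url=DRUID_URL):
    """Build a POST request for Druid's SQL endpoint (/druid/v2/sql)."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, base_url=DRUID_URL, timeout=30):
    """Send the query and return the parsed JSON response."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, run_sql('SELECT COUNT(*) AS "row_count" FROM "my_datasource"') would return the row count of a datasource, where "my_datasource" is a hypothetical name standing in for a dataset you have already ingested.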

Who benefits from this offer:

The following can benefit from Apache Druid:

  • Students
  • Teams of Developers
  • Data Engineers
  • Data Scientists
  • DevOps
  • Companies focusing on web and mobile analytics
  • Solutions architects who want to monitor network performance
  • BI developers
  • And anyone else interested in data science tools

What is included in this offer:

  • Web accessible Apache Druid application service
  • Rich server user interface
  • In-browser SQL environment
  • Flexibility to load data from several sources
  • High uptime and fast aggregations
  • Easy to operate and user friendly
  • Tuning and partitioning of data

Our instance of Apache Druid supports the following data sources:

  • Apache Kafka
  • HDFS
  • HTTP(S)
  • Local disk
  • Azure Event Hub
  • Paste Data
  • Other custom sources

By specifying credentials and adding extensions, you can also ingest from:

  • Azure Data Lake
  • Google Cloud Storage
  • Amazon S3 & Kinesis
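
To make the idea of loading from one of these sources concrete, here is a minimal sketch that builds a native batch ingestion spec for the HTTP(S) source as a Python dictionary. The function name build_http_ingestion_spec and all argument values are hypothetical. The spec assumes newline-delimited JSON input and day-sized segments, so adjust it to your data before use.

```python
import json


def build_http_ingestion_spec(datasource, uris, timestamp_column, dimensions):
    """Return a native batch (index_parallel) spec that reads JSON over HTTP(S)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # "auto" lets Druid detect the timestamp format.
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "segmentGranularity": "day",  # assumption: one segment per day
                    "queryGranularity": "none",
                    "rollup": False,  # keep raw rows, no summarization
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "http", "uris": list(uris)},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def spec_as_json(spec):
    """Serialize the spec so it can be pasted into the console or sent over HTTP."""
    return json.dumps(spec, indent=2)
```

The resulting JSON can be submitted as a task in the web console or posted to the task endpoint (/druid/indexer/v1/task). The other sources listed above use a different inputSource type and may need credentials and extensions, as noted.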

Technical Specifications:

Apache Druid throughput improves with more CPU cores, more RAM, and faster disks.

  • Minimum memory: 8GB RAM
  • Minimum vCPU: 2 vCPUs
  • Operating System: Ubuntu 20.04

How to access:

By default, Apache Druid listens on port 8888. You can access the web interface at http://yourip:8888, where yourip is the IP address of your instance.
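
Before opening the browser, you may want to confirm that the service is up. The sketch below calls Druid's health endpoint (/status/health) on the default port. The function names console_url and is_healthy are made up for this example, and it assumes the port is open in your network security rules and that no authentication is required.

```python
import json
import urllib.error
import urllib.request


def console_url(host, port=8888):
    """Build the base URL of the Druid web interface."""
    return f"http://{host}:{port}"


def is_healthy(host, port=8888, timeout=5):
    """Return True if /status/health answers with JSON true, else False."""
    url = console_url(host, port) + "/status/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8")) is True
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, closed port, timeout, or a non-JSON reply.
        return False
```

Calling is_healthy("yourip") with your instance's IP address in place of yourip returns True once Druid is ready to serve the web interface.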