StreamSets Transformer is a data pipeline engine designed for any developer or data engineer to build ETL and ML pipelines that execute on Spark, without coding. Transformer pipelines also provide unparalleled visibility into the execution of Spark applications through data previews and easy troubleshooting, reducing the time to design and operate pipelines on Spark for developers of all skill levels.
Design and operate ETL pipelines that execute directly on HDInsight, Databricks and SQL Server Big Data Clusters. Batch load data into Azure Storage, and perform ETL and ML operations to land data into Azure Synapse or Databricks Delta Lake for analytics.
StreamSets Transformer (Large) allows a maximum of twenty executors per pipeline.
It takes a few minutes for the VM to be deployed. Once the instance has started, StreamSets Transformer runs as a service and the web-based UI is available on port 19630.
Note: The instance is automatically configured to allow TCP traffic from anywhere on the default Transformer port, 19630. StreamSets highly recommends restricting access to this port according to your organization's rules.
To access Transformer, enter the following URL in the address bar of your browser: http://[Public IP of the VM]:19630
For example, if the Public IP of the VM is 220.127.116.11, enter http://220.127.116.11:19630 in the browser.
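Because the VM takes a few minutes to deploy, it can help to confirm that the port is accepting connections before opening the browser. The following is a minimal Python sketch, not part of the StreamSets product: the helper names transformer_url and port_is_reachable are hypothetical, and the only values taken from this page are the default port 19630 and the URL pattern.

```python
import socket

# Default Transformer port as stated on this page.
TRANSFORMER_PORT = 19630


def transformer_url(public_ip: str, port: int = TRANSFORMER_PORT) -> str:
    """Build the Transformer UI URL for a VM's public IP (hypothetical helper)."""
    return f"http://{public_ip}:{port}"


def port_is_reachable(host: str, port: int = TRANSFORMER_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    This only checks that something is listening on the port; it does not
    verify that the Transformer UI itself has finished starting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling transformer_url("220.127.116.11") yields the URL to paste into the browser, and port_is_reachable("220.127.116.11") reports whether the port responds yet. If the check keeps failing after the VM is up, a firewall or network rule restricting port 19630 may be blocking your source address.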
To log in to the Transformer UI, use the following credentials: Username: admin Password: admin
For more information, please refer to the documentation.