I am a passionate data engineer and analytics consultant with over 10 years of working experience. I help my customers overcome the challenges they have with data, allowing them to unlock value and gain valuable insight in their data.
Self-serve, or self-service, has been around in various guises for decades from Automatic Teller Machines (ATMs) through to self-serve analytics. With the advent of spreadsheeting tools, self-serve analytics has arguably been around for a long time, and with the more recent introduction of dedicated self-serve Business Intelligence tools, self-serve has become embedded in the analytics vernacular. But what do we really mean when we say “self-serve”? In my experience, we all have different interpretations and understanding of what we mean - and those interpretations will be specific to the context in which we operate.
Problem Interactive and SQL Warehouse (formerly known as SQL Endpoint) clusters take time to become active. This can range from around 5 mins through to almost 10 mins. For some workloads and users, this waiting time can be frustrating if not unacceptable. For this use case, we had streaming clusters that needed to be available for when streams started at 07:00 and to be turned off when streams stopped being sent at 21:00.
Context A project that I’m working on uses Azure Synapse Serverless as a serving layer option for its data platform. The main processing and transformation of data is achieved using Databricks, with the resulting data being made available as a Delta file. Our processes ensure that the Delta files are registered automatically within Databricks as Delta Tables, but there is no native way to register Delta objects in Synapse. Therefore, we’ve gone down a route of creating a series of Stored Procedures in Synapse - which can be called from Databricks - which register the Delta files as views within Synapse.