Our Tech Stack
Because we are consultants, the technology we use varies widely with each client's needs. However, we often work in the Azure ecosystem, so learning a few common components will help you get up to speed.
Languages
Before looking at the commonly used components, here are the programming languages you should aim to become expert in:
- Python (PySpark)
- SQL
You should also understand Spark and distributed compute.
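To make the distributed compute idea concrete, here is a minimal sketch of the partition-then-combine pattern that Spark aggregations rely on. It is not Spark: it uses only the Python standard library, with threads standing in for executors. The function names and the sample `sales` data are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_totals(partition):
    """Aggregate one partition independently of all the others."""
    totals = Counter()
    for key, amount in partition:
        totals[key] += amount
    return totals


def total_by_key(rows, num_partitions=4):
    """Aggregate each partition in parallel, then merge the partial results."""
    partitions = split_into_partitions(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_totals, partitions))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds to existing totals
    return dict(combined)


# Hypothetical sample data: (region, sales amount)
sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
result = total_by_key(sales, num_partitions=2)
print(result)
```

In PySpark the same aggregation would typically be written as something like `df.groupBy("region").sum("amount")`, with the engine handling the partitioning and the merge across the cluster. The point of the sketch is only that each partition is processed independently before the partial results are combined.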
Technology
Platforms
- Databricks: This is the platform you should invest the most effort in mastering.
- Azure Synapse Analytics: This platform is secondary to Databricks, but its Pipelines feature (built on Azure Data Factory) is also worth learning.
The Azure ecosystem is huge, but the components we use most commonly are:
- Blob Storage / Data Lake
- Serverless SQL Pool (Synapse Analytics)
- Dedicated SQL Pool (Synapse Analytics)
- Event Hubs (for event-triggered pipelines)
Less commonly used, but important:
- Cosmos DB (Advanced)
Version Control
Version control should be used on all client projects. The exact tooling may vary, but it is typically one of:
- GitHub
- Azure DevOps Repos
Power BI
Power BI is a useful tool that offers numerous advantages for data engineers. Here are some ways it can be used:
- Data Integration: Power BI can connect to a wide variety of data sources, including databases, cloud services, and on-premises data warehouses, allowing you to integrate diverse data sets into a single platform.
- Data Transformation: With Power Query, data engineers can clean, transform, and reshape data before loading it into Power BI. This capability simplifies the ETL (Extract, Transform, Load) process and ensures that the data is in the right format for analysis.
- Visualization and Reporting: Power BI provides robust visualization tools that help data engineers create interactive and insightful dashboards and reports. This makes it easier to communicate complex data insights to stakeholders.
- Data Modeling: Power BI includes features for creating data models, defining relationships, and creating calculated columns and measures.
- Data Sharing: Power BI allows for easy sharing and collaboration. Data engineers can publish reports and dashboards to the Power BI service, making them accessible to team members and stakeholders.
- Integration with Azure: Power BI integrates with Azure services, enhancing capabilities around data storage, processing, and advanced analytics.
Coming Soon...
- dbt