Before adopting SDF, the team at Deel manually tracked data flow across a data warehouse of more than 7,000 models while also overseeing data governance. At this scale, maintaining complete control of the warehouse is challenging and requires a team of detail-oriented experts.
As a custodian of a large body of data, Deel remains committed to protecting high-risk data and PII, such as SSNs. The team used several different tools to track the hashing and flow of data, and it processed thousands of data transformation models each day.
With a desire to scale up data management and consolidate all sensitive data reporting, the Deel team turned to SDF Labs.
The primary value of SDF lies in its ability to enhance developer productivity by providing key benefits such as improved data quality, governance, management, and transformations. The Deel team discovered SDF through a recent blog post and was eager to test its capabilities, including SQL types and column-level lineage.
The speed of complete data warehouse compilation and the immediate understanding of the data map allowed the team to identify exactly how data flowed, no matter how complex. When a RevOps manager requested the deletion of a Salesforce column, the data engineer was able to immediately understand the downstream dependencies. This common and error-prone request normally requires a complex analysis of queries, tables, and schemas, taking up several hours of data development time.
Utilizing SDF Cloud, the team could visualize their sales data and understand the interdependencies within seconds.
The visualized and command-line-based lineage not only bolstered the Deel data engineering team's confidence in their decisions but also allowed for the simulation of data deletion before runtime. SQL changes could be re-compiled with SDF and further analyzed to understand potential impacts on downstream dependencies.
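To make the idea of a downstream impact analysis concrete, the sketch below walks a small column-level lineage graph to list everything that depends on a column slated for deletion. It is a generic illustration, not SDF's implementation or API: the column names, the `LINEAGE` mapping, and the `downstream_columns` helper are all hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
# These names are invented for illustration and do not come from Deel or SDF.
LINEAGE = {
    "salesforce.account.region": ["staging.accounts.region"],
    "staging.accounts.region": [
        "analytics.revenue_by_region.region",
        "analytics.pipeline.region",
    ],
    "analytics.revenue_by_region.region": ["reports.quarterly.region"],
}


def downstream_columns(lineage, start):
    """Return every column reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Simulate deleting the Salesforce column by listing what would be affected.
for column in downstream_columns(LINEAGE, "salesforce.account.region"):
    print(column)
```

A breadth-first walk lists the nearest dependents first, which is usually the order in which an engineer would want to review them.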
Early in the proof-of-concept phase, the Deel team expressed interest in SDF's ability to further enhance its strict control over PII and sensitive data. Implementing SDF on 300 tables, which accounted for less than 5% of the data warehouse, allowed for its effects to propagate to every affected table and increased visibility into which tables contained PII.
The team started by classifying three tables where SSNs were copied from a PostgreSQL database and stored in Snowflake. After compilation, the SSN classifiers were propagated to all downstream dependencies. Wherever a hashing function was applied, the classifier was updated to "PII Hashed" to indicate that the value had been transformed into a hash.
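The propagation described above can be sketched as a simple fixed-point pass over lineage edges. This is only an illustration of the idea under stated assumptions, not how SDF works internally: the edge list, the column names, the raw `SSN` tag name, and the `propagate` helper are hypothetical, and only the "PII Hashed" label comes from the case study.

```python
RAW = "SSN"            # assumed name for the raw classifier
HASHED = "PII Hashed"  # label used once a hashing function has been applied

# Hypothetical lineage edges: (source column, target column, transformation).
EDGES = [
    ("postgres.employees.ssn", "snowflake.raw_employees.ssn", "copy"),
    ("snowflake.raw_employees.ssn", "snowflake.stg_employees.ssn_hash", "hash"),
    ("snowflake.raw_employees.ssn", "snowflake.stg_payroll.ssn", "copy"),
    ("snowflake.stg_employees.ssn_hash", "analytics.headcount.employee_key", "copy"),
]

# Only the source column is classified by hand.
SEED = {"postgres.employees.ssn": {RAW}}


def propagate(edges, seed):
    """Push classifiers along edges until no column gains a new tag."""
    labels = {column: set(tags) for column, tags in seed.items()}
    changed = True
    while changed:
        changed = False
        for source, target, transform in edges:
            for tag in list(labels.get(source, ())):
                # Hashing a raw SSN downgrades the tag; other tags pass through.
                new_tag = HASHED if (transform == "hash" and tag == RAW) else tag
                target_tags = labels.setdefault(target, set())
                if new_tag not in target_tags:
                    target_tags.add(new_tag)
                    changed = True
    return labels


for column, tags in sorted(propagate(EDGES, SEED).items()):
    print(column, sorted(tags))
```

Because tags are only ever added to a column, the loop is guaranteed to stop once every reachable column has been labeled.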
The data engineering team was also able to run a simple SDF Check to confirm that no SSNs were exposed to analytics.
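As a rough picture of what such a check asserts, the sketch below scans classified columns and reports any analytics column that still carries a raw SSN tag. It does not use SDF's check syntax: the classifier map, the `analytics.` prefix, the `SSN` tag name, and the `exposed_columns` helper are hypothetical stand-ins.

```python
# Hypothetical classifier assignments, as they might look after propagation.
COLUMN_CLASSIFIERS = {
    "staging.employees.ssn": {"SSN"},
    "analytics.headcount.employee_key": {"PII Hashed"},
    "analytics.headcount.country": set(),
}


def exposed_columns(column_classifiers, schema_prefix, forbidden_tag):
    """Return the columns under `schema_prefix` that carry `forbidden_tag`."""
    return sorted(
        column
        for column, tags in column_classifiers.items()
        if column.startswith(schema_prefix) and forbidden_tag in tags
    )


violations = exposed_columns(COLUMN_CLASSIFIERS, "analytics.", "SSN")
if violations:
    print("Raw SSNs exposed to analytics:", ", ".join(violations))
else:
    print("No raw SSNs exposed to analytics")
```

In a deployment pipeline, a non-empty result would typically fail the build so that the exposure is fixed before the change ships.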
Deel plans to place this test, along with others, into their CI/CD process as part of normal code deployment to ensure new analytics pipelines comply with their standards for sensitive data management.
Deel is the all-in-one HR and payroll platform for global teams. It helps companies simplify every aspect of managing a workforce, from onboarding, compliance and performance management, to global payroll, HRIS and immigration support. Deel works for independent contractors and full-time employees in more than 150 countries, compliantly. And getting set up takes just a few minutes.