Please strictly adhere to the following resume naming convention:
ALL CAPS, NO SPACES; SEPARATE SEGMENTS WITH UNDERSCORES
PTN_US_GBAMSREQID_CandidateBeelineID
Example: PTN_US_9999999_SKIPJOHNSON0413
MSP Owner: Michelle Lee
Location: Marlborough, NH
Duration: 6 months
Skill ID: 10715995
Experience Required
5+ years of hands-on data engineering experience
3+ years focused on the Databricks / Spark ecosystem
Key Responsibilities
Data Pipeline Development
Design, develop, and deploy robust, scalable batch and streaming data pipelines (see the sketch after this list) using:
PySpark
Spark SQL
Delta Live Tables
Ingest data from multiple source systems, including:
Point-of-Sale (POS)
E-commerce platforms
Loyalty systems
Marketing clouds
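For illustration only, a minimal PySpark sketch of the batch-plus-streaming ingestion pattern this role calls for; the bucket paths, table names, and schemas are invented placeholders, not details from this requisition.

```python
# Minimal ingestion sketch; all paths and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer360_ingest").getOrCreate()

# Batch: land daily POS extracts into a Bronze Delta table.
pos_raw = spark.read.parquet("s3://example-bucket/pos/2024-01-01/")
(pos_raw
    .withColumn("_ingested_at", F.current_timestamp())
    .write.format("delta").mode("append")
    .saveAsTable("bronze.pos_transactions"))

# Streaming: e-commerce click events via Databricks Auto Loader.
clicks = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/clicks")
    .load("s3://example-bucket/ecommerce/clicks/"))
(clicks.writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/clicks")
    .toTable("bronze.ecommerce_clicks"))
```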
Data Modeling & Transformation
Implement complex data transformations and business logic using the Medallion Architecture:
Bronze
Silver
Gold
Build, optimize, and maintain Gold-layer customer dimension tables as the single source of truth for Customer 360 use cases, as sketched below.
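A rough sketch of the Bronze-to-Silver-to-Gold flow; the table and column names are assumptions made for illustration.

```python
# Medallion sketch: Bronze -> Silver -> Gold; schema is invented.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Silver: deduplicate and conform raw POS records.
silver = (spark.table("bronze.pos_transactions")
    .dropDuplicates(["transaction_id"])
    .withColumn("email", F.lower(F.trim("customer_email")))
    .filter(F.col("transaction_ts").isNotNull()))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.pos_transactions")

# Gold: one row per customer, the Customer 360 single source of truth.
gold = (spark.table("silver.pos_transactions")
    .groupBy("customer_id", "email")
    .agg(F.min("transaction_ts").alias("first_purchase_ts"),
         F.max("transaction_ts").alias("last_purchase_ts"),
         F.sum("amount").alias("lifetime_value")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.dim_customer")
```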
Data Quality & Reliability
Design and implement data quality frameworks and cleansing routines (see the expectations sketch below).
Ensure accuracy, consistency, and trustworthiness of Customer 360 datasets.
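One common way to express such rules on Databricks is Delta Live Tables expectations. The sketch below is illustrative only; the rule names, conditions, and source table are assumptions.

```python
# DLT expectations sketch; rules and table names are illustrative only.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleansed customer records feeding Customer 360")
@dlt.expect_or_drop("valid_customer_id", "customer_id IS NOT NULL")
@dlt.expect_or_drop("plausible_email", "email RLIKE '^[^@]+@[^@]+$'")
@dlt.expect("has_recent_activity", "last_purchase_ts IS NOT NULL")  # warn, keep row
def silver_customers():
    return (dlt.read("bronze_customers")
            .withColumn("email", F.lower(F.trim("email"))))
```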
Performance & Cost Optimization
Proactively monitor, debug, and tune Databricks jobs and Spark clusters.
Apply best practices (sketched after this list) for:
Partitioning
Caching
Delta Lake data layout
Optimize workloads for performance and cost efficiency.
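A few representative tuning moves, sketched below with invented table names; which ones actually pay off depends entirely on the workload.

```python
# Illustrative tuning commands; table names and columns are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a common filter key.
spark.sql("OPTIMIZE gold.dim_customer ZORDER BY (customer_id)")

# Partition large fact tables on a low-cardinality column such as date.
(spark.table("silver.pos_transactions")
    .write.format("delta")
    .partitionBy("transaction_date")
    .mode("overwrite")
    .saveAsTable("silver.pos_transactions_by_date"))

# Cache a hot dimension only while a job reuses it, then release it.
dim = spark.table("gold.dim_customer").cache()
dim.count()      # materialize the cache before repeated joins
dim.unpersist()  # free executor memory when done
```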
Infrastructure as Code & CI/CD
Partner with DevOps teams to manage:
Databricks environments
Clusters
Job deployments
Use Infrastructure as Code (IaC) tools such as:
Terraform
Azure DevOps
GitHub Actions
Champion CI/CD best practices for data pipelines.
Data Governance & Security
Implement governance capabilities using Databricks Unity Catalog (example after this list), including:
Data lineage tracking
Role-based access controls
Data masking
Ensure compliance with organizational security and data standards.
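For a flavor of what this looks like in practice, a hedged sketch of Unity Catalog grants and a column mask, run from a Databricks notebook where `spark` is predefined. Every catalog, schema, group, and function name here is a placeholder, and the column-mask syntax should be verified against the workspace's runtime.

```python
# Unity Catalog governance sketch; all object and group names are invented.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `customer360_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `customer360_analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.dim_customer TO `customer360_analysts`")

# Column mask: only members of a PII group see raw emails.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.gold.mask_email(email STRING)
    RETURN CASE WHEN is_account_group_member('pii_readers') THEN email
                ELSE '***redacted***' END
""")
spark.sql("ALTER TABLE main.gold.dim_customer "
          "ALTER COLUMN email SET MASK main.gold.mask_email")
```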
Collaboration
Work closely with:
Functional Consultants
Data Scientists
Analytics Engineers
Translate data requirements into well-structured, consumption-ready datasets.
Required Skills & Qualifications
Databricks & Spark Expertise
Deep hands-on experience with the Databricks Lakehouse Platform, including:
Delta Lake
Structured Streaming
Delta Live Tables
Cluster configuration and optimization
Programming & Data Engineering
Expert-level proficiency in Python and PySpark.
Advanced SQL skills for transformation, validation, and analysis.
Strong understanding of ETL / ELT design patterns.
Data Warehousing Concepts
Strong knowledge of data modeling principles (a worked sketch follows this list), including:
Dimensional modeling (Kimball)
Data warehousing fundamentals
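As one concrete instance of these principles, a sketch of a Kimball-style Type 2 slowly changing dimension maintained with a Delta MERGE; the table names, the single tracked attribute, and the surrogate-key handling are simplified assumptions.

```python
# SCD Type 2 sketch on Delta; schema and names are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
updates = spark.table("silver.customer_changes")
dim = DeltaTable.forName(spark, "gold.dim_customer_scd2")

# Step 1: expire the current row when a tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.email <> u.email",
        set={"is_current": "false", "end_ts": "current_timestamp()"})
    .execute())

# Step 2: append a fresh current row for new or changed customers.
current = (spark.table("gold.dim_customer_scd2")
    .filter("is_current = true")
    .select("customer_id", "email"))
new_rows = (updates.join(current, ["customer_id", "email"], "left_anti")
    .withColumn("is_current", F.lit(True))
    .withColumn("start_ts", F.current_timestamp())
    .withColumn("end_ts", F.lit(None).cast("timestamp")))
new_rows.write.format("delta").mode("append").saveAsTable("gold.dim_customer_scd2")
```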
Cloud & Software Engineering
Proven experience with a major cloud platform:
AWS, Azure, or GCP
Strong familiarity with cloud storage (e.g., S3 or equivalent).
Hands-on experience (example test after this list) with:
Git version control
Code reviews
Testing
CI/CD pipelines
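A hypothetical example of the testing half of that toolchain: a pytest unit test for a small PySpark transformation, runnable locally and wired into any CI pipeline. The function and schema are invented.

```python
# Unit test sketch for a pipeline transformation; names are invented.
import pytest
from pyspark.sql import SparkSession, functions as F

def normalize_emails(df):
    """Transformation under test: trim and lower-case the email column."""
    return df.withColumn("email", F.lower(F.trim("email")))

@pytest.fixture(scope="module")
def spark():
    return (SparkSession.builder
            .master("local[1]")
            .appName("pipeline-tests")
            .getOrCreate())

def test_normalize_emails(spark):
    df = spark.createDataFrame([("  Alice@Example.COM ",)], ["email"])
    rows = normalize_emails(df).collect()
    assert rows[0]["email"] == "alice@example.com"
```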
Preferred Qualifications (Nice to Have)
Databricks Certified Data Engineer - Professional certification.
Experience with Terraform or other IaC tools.
Experience delivering data solutions in retail or e-commerce environments.
Familiarity with orchestration tools such as Airflow.
Experience with modern data stack tools, including:
dbt
Snowflake
Fivetran
Experience with Customer Data Platforms (CDPs) or Master Data Management (MDM) solutions.