About
Data Engineer with 5 years of experience designing, building, and optimizing large-scale data pipelines and analytical platforms in big data and cloud environments. Track record in finance, telecom, and analytics, using ETL/ELT, Apache Spark, PySpark, and AWS (EMR, S3, Lambda, Athena) to improve data quality, reduce processing times, and support data-driven decision-making at scale. Experienced in integrating diverse data sources, implementing data governance frameworks, and collaborating with cross-functional Agile teams.
Work
Marlabs
Data Engineer
Recife, Pernambuco, Brazil
Summary
Currently leading big data pipeline development and operations on AWS for Serasa Experian, focusing on sensitive data governance, quality, and cost efficiency to deliver standardized datasets.
Highlights
Architected, implemented, and orchestrated scalable data pipelines using EMR (Spark/PySpark/Scala), Airflow, and Lambda, integrating diverse sources and formats, including TXT, CSV, Parquet, Iceberg, mainframe, SQL databases, and APIs.
Qualified, prioritized, and classified sensitive data, strengthening data governance and ensuring LGPD/PII compliance across massive datasets (billions of records).
Standardized and enriched data through schema normalization, consistent naming, data typing, deduplication, cleansing, and masking, reducing duplicates by ~15% and increasing critical field completeness to ~98%.
Stored and exposed data within the Silver layer using EMR/Scala with Glue Data Catalog, enabling efficient querying via Athena for multiple business units.
Automated deployment and continuous monitoring with Jenkins and Airflow DAGs, ensuring high availability and meeting critical SLAs.
Collaborated with Agile Scrum squads and business stakeholders to define and prioritize data engineering requirements.
Datainfo
Data Engineer
Recife, Pernambuco, Brazil
Summary
Engineered and optimized large-scale fiscal data pipelines for SEFAZ-PE, ensuring high-quality datasets for auditors and directors to drive financial analysis and decision-making.
Highlights
Developed and tuned data ingestion and transformation pipelines using Hadoop, Spark, Hive, Impala, and SQL, processing millions of daily fiscal records from diverse sources (databases, XML, TXT).
Implemented robust data quality checks, typing, and calculated fields before publishing, ensuring trusted datasets for downstream consumption by auditors and BI analysts.
Contributed to strategic fiscal modernization projects (NF3-e and DIMP), centralizing financial transaction monitoring across the state.
Provided trusted data for thousands of monthly queries, improving decision accuracy and increasing fiscal data audit speed by 90%.
Supported Pernambuco's largest fiscal modernization initiatives, impacting thousands of taxpayers and enhancing revenue effectiveness.
Accenture
Data Engineer
Recife, Pernambuco, Brazil
Summary
Integrated and modeled sales and marketing data for Oi Place's e-commerce marketplace, supporting strategic decision-making and KPI analysis.
Highlights
Developed and optimized ETL/ELT pipelines to ingest data from Mirakl sales/marketing APIs and Google Analytics 4 (GA4), consolidating into Cloudera CDP data warehouses and data lakes.
Created multidimensional data models to support KPI analysis for sales performance, marketing campaign results, user registration, and customer engagement.
Calculated sales and marketing KPIs for executive dashboards, reducing analysis time by 30% and speeding up decision-making.
Participated in the first Oi squad to work natively on Cloudera CDP, developing all pipelines and ensuring >95% reliability.
Processed millions of daily sales and access records in a big data environment (Hadoop, Hive, Impala, Spark, PySpark).
Collaborated in an Agile Scrum environment, ensuring continuous delivery and strong alignment with business and technical teams.
Produced weekly executive reports tracking KPIs for thousands of SKUs (e.g., smartphones, appliances, air conditioners).
Received leadership recognition (Distinctive Achievement) for ensuring delivery continuity during team transitions, taking on increased technical responsibilities and managing a team member.
Accenture
Software Engineer
Recife, Pernambuco, Brazil
Summary
Customized and maintained Oracle BRM 12.0 for Oi's billing team, ensuring adherence to business rules and optimizing billing and revenue routines.
Highlights
Customized and maintained Oracle BRM 12.0, developing and tuning MTAs and Opcodes (C), pipeline configuration, and data modeling via PODLs and NAPs to ensure billing rule adherence.
Automated billing and revenue routines using Shell Script (Unix), SQL, and PL/SQL, creating batch jobs and deployments via Azure DevOps.
Generated and maintained invoices and reports with Oracle BI Publisher, including creation/updates of RTF templates and standardized layouts.
Provided integration and support for bill run cycles, troubleshooting incidents using logs and database queries, fixing bugs, and delivering continuous performance improvements.
Managed source control with Git and delivered solutions within Agile Scrum teams.
Education
Instituto Federal da Paraíba
Bachelor's degree
Computer Engineering
Languages
English
Portuguese
Certificates
Data Engineering with Databricks, SQL, and Spark
Issued by Databricks
Skills
Python
SQL
AWS
Azure
Apache Spark
PySpark
Spark SQL
Git
Bash
Shell Script
Docker
Jenkins
Azure DevOps
Databricks
AWS Lambda
Amazon EMR
Amazon S3
Apache Iceberg
Hadoop
HiveQL
Hue
HDFS
Hive
Impala
Sqoop
Azure Data Lake
DBeaver
Oracle Database
Microsoft SQL Server
IBM DB2
PostgreSQL
MongoDB
Azure SQL Database
Amazon RDS
JETL
DataStage
Sagent DataFlow
Azure Data Factory
Airflow
Cloudera CDP
Scrum
Agile Methodologies
ETL/ELT
Data Lake
Data Lakehouse
Scala
Data Governance (LGPD/PII)
AWS Glue Data Catalog
Athena
ODS
C
PODLs
NAPs
Oracle BRM 12.0
Oracle BI Publisher
Google Analytics 4 (GA4)
REST APIs
Data Modeling