Eduard Ara Paredes — Data Engineer

About me

Data Engineer · Barcelona

Data Engineer focused on building scalable data pipelines and efficient data architectures. Proficient in Python, SQL, and cloud technologies, with experience in data processing, warehousing, and Big Data environments.

Strong analytical mindset with the ability to solve complex problems and deliver data-driven solutions. Background in machine learning and analytics, providing a well-rounded understanding of modern data ecosystems.

Team-oriented and adaptable, with strong communication skills and fluent English.

Working at SDG Group

Barcelona, Spain

2 Years Experience

Remote · Hybrid

English C1

Experience

SDG Group

Data Analyst · Data Engineer · Data Governance

Sep 2026 – Present

Kanguro

Junior Business & Data Analyst

Jan 2026 – Aug 2026

Kanguro

Business & Data Analyst Intern

Oct 2025 – Jan 2026

Projects

Data Analytics Projects

Database & Data Systems Projects

ML, Predictive & Neural Network Models

Specialization

• End-to-end data engineering solutions
• ETL pipelines and automation
• Distributed data systems
• Design of cloud-native architectures

Data Engineer · Data Scientist · AI Engineer · Data Analyst

Availability

Currently working

Open to new opportunities (full-time / hybrid / remote)

Mobility

B driving licence · own vehicle · open to relocation

Languages

Catalan (native) · Spanish (native) · English (C1)

Education & Certifications

UAB

University Degree

Bachelor's Degree in Data Engineering

Universitat Autònoma de Barcelona

Python · SQL · NoSQL · ETL pipelines · Data Warehousing · Databases · Big Data (Spark, Hadoop) · AWS (S3, Redshift, Lambda) · Google Cloud (BigQuery, Dataflow) · Azure (Data Factory, Synapse) · Cloud Computing · Docker · Git · Data Structures · Machine Learning · Neural Networks · Data Science · Data Analytics · Tableau · Qlik

2021 – 2025

Completed

Certifications

DeepLearning.AI

Machine Learning & AI · Jul 2025

Neural Networks and Deep Learning

Completed

Neural Networks · Deep Learning · Python · TensorFlow · Keras

Improving Deep Neural Networks: Hyperparameter Tuning

Completed

Hyperparameter Tuning · Regularization · Optimization

Convolutional Neural Networks

Completed

CNNs · PyTorch · Computer Vision · Algorithms · Supervised Learning

IBM

Data Engineering · Apr – May 2025

Introduction to Data Engineering

Completed

Data Management · ETL · Big Data · Data Science · Network Security

ETL and Data Pipelines with Shell, Airflow & Kafka

Completed

Kafka · Airflow · Shell

Databases and SQL for Data Science with Python

Completed

SQL · Python · Databases

dbt

dbt Labs

Data Transformation · Oct 2025

dbt Fundamentals

Completed

dbt Core · dbt Cloud · ETL · Data Modelling · SQL Transformation

CAM

Cambridge University Press

English Language

B2 First Certificate in English (FCE)

Completed

C1 Advanced Certificate in English (CAE)

In Progress

My Stack

Languages

Core programming

Python

Proficient

SQL

Proficient

Advanced

Intermediate

Bash

Basic

MATLAB

Basic

Data Engineering

Pipelines & orchestration

ETL / ELT

Proficient

Apache Airflow

Advanced

Apache Spark

Advanced

ODI

Intermediate

Apache Kafka

Intermediate

dbt

Intermediate

Cloud & Infra

Platforms & managed services

AWS

Basic

S3 · Redshift
Lambda · Glue

Azure

Basic

Data Factory
Synapse · ADF

GCP

Intermediate

BigQuery
Dataflow · Pub/Sub

Snowflake

Intermediate

Warehousing
Data Sharing

Databases

Storage engines

PostgreSQL

Proficient

MySQL

Proficient

BigQuery

Advanced

MongoDB

Intermediate

Neo4j

Intermediate

Redis

Intermediate

Machine Learning

Modelling & analysis

Pandas

Proficient

NumPy

Proficient

Scikit-learn

Advanced

Matplotlib

Advanced

Deep Learning

Neural networks

TensorFlow

Proficient

PyTorch

Proficient

Neural Networks

Advanced

Keras

Intermediate

Analytics & BI

Visualisation & reporting

Tableau

Advanced

Excel / Sheets

Advanced

Qlik

Intermediate

Looker Studio

Intermediate

DevOps & Tools

Environment & workflow

Jupyter/VS Code

Proficient

Git · GitHub

Advanced

Docker

Advanced

REST APIs

Advanced

Work Experience

Kanguro

Oct 2025 – Jan 2026 · 4 months

Intern · Business & Data Analyst

First professional experience in data analysis within a fast-paced insurtech startup. Designed and maintained interactive dashboards using Power BI and Looker Studio to support strategic decision-making, and automated manual reporting workflows using Python and Google Sheets. Built predictive models to optimize the expansion of Kanguro point networks, leveraging package volume and demand trends.

SQL · PostgreSQL · Power BI · Looker Studio · Metabase · Python

Kanguro

Jan 2026 – Aug 2026 · 7 months

Junior Business & Data Analyst

Promoted to Junior after demonstrating strong impact. Owned end-to-end KPI dashboards using Power BI and Looker Studio, and managed relational databases (SQL), ensuring data integrity, consistency, and performance. Contributed to database migration projects and built scalable ETL pipelines using Python to transform data from transactional systems into a centralized data warehouse. Applied data modeling best practices to support analytical use cases and collaborated closely with Product and Operations teams to enable data-driven decision-making.

SQL · PostgreSQL · Power BI · Looker Studio · Metabase · Python

Current Position

SDG Group

Sep 2026 – Present · +8 months

Data Engineer · Data Analytics · Data Governance

CaixaBank

• Security Control & Access Management Migration: Migrated CaixaBank's security control data environment from on-premise infrastructure to Google Cloud Platform (BigQuery). Defined the migration strategy and automated end-to-end data workflows with Apache Airflow, improving the reliability and scalability of the data pipelines. Wrote SQL queries for data construction and transformation, and added a Python layer to automate and orchestrate workflows from Cloud Composer.

• ETL Pipelines & Dashboarding Ecosystem: Built and maintained ETL pipelines within on-premise environments, integrating multiple data sources (Qlik, R, Jupyter Notebooks) to support analytical workloads. Contributed to the development of interactive dashboards in Qlik and ensured the maintenance and reliability of existing reporting solutions used by business stakeholders for decision-making.

Artifacts

• Metadata Lineage: Developed a metadata lineage model that extracts and centralizes metadata from multiple data sources, including BigQuery, MySQL, and PostgreSQL. Built a metadata artifact layer to structure and unify this information, enabling a data lineage system that propagates metadata across tables and data models to improve traceability and governance. Implemented in Python.

Google Cloud Platform (GCP) · BigQuery · Apache Airflow · Cloud Composer · Python · SQL · ETL Pipelines · Data Lineage · Metadata Management · Data Governance · MySQL · PostgreSQL · Qlik
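
The idea of propagating metadata along lineage, as described in the Metadata Lineage item, can be illustrated with a small sketch. This is not the project's code: the table names, the tag values, and the `propagate_metadata` helper are hypothetical, and the real system may model lineage quite differently.

```python
from collections import deque

def propagate_metadata(lineage, tags):
    """Push each table's tags to every table derived from it (breadth-first).

    lineage: hypothetical mapping of upstream table -> list of downstream tables
    tags:    hypothetical mapping of table -> set of metadata tags it carries
    """
    result = {table: set(t) for table, t in tags.items()}
    queue = deque(tags)  # start from the tables that carry their own tags
    while queue:
        source = queue.popleft()
        for target in lineage.get(source, ()):
            before = len(result.setdefault(target, set()))
            result[target] |= result[source]
            if len(result[target]) > before:  # revisit only if something new arrived
                queue.append(target)
    return result

# Hypothetical lineage graph and source-level tags
lineage = {
    "raw.users": ["staging.users"],
    "staging.users": ["mart.access_report"],
    "raw.roles": ["mart.access_report"],
}
tags = {"raw.users": {"pii"}, "raw.roles": {"security"}}
propagated = propagate_metadata(lineage, tags)
```

In this toy example `mart.access_report` ends up with both tags because it is built from both source tables. Tag sets only grow, so the loop also terminates on cyclic lineage.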

Selected Projects

Deep Learning

AI-Based Poker Hand Classification

Intelligent system for Texas Hold'em that classifies poker hands and estimates win, draw, and loss probabilities using a CNN trained on a custom-generated dataset. Full-stack web app with a Flask backend and Angular frontend combining deep learning with Monte Carlo simulations.

Python · CNN · Flask · Angular · Deep Learning

Deep Learning · Computer Vision

ANPR: Automatic License Plate Recognition

End-to-end pipeline for detecting and recognising Spanish license plates through plate localisation with YOLOv5, classical image processing for character segmentation, and a hybrid CNN + SVM classifier achieving high accuracy across real-world lighting conditions and perspectives.

YOLOv5 · CNN · SVM · OpenCV

Deep Learning · Computer Vision

Real-Time Vehicle Tracking & Counting

Real-time vehicle tracking and counting system integrating YOLOv5 and YOLOv8 with DeepSORT and Kalman filtering for consistent identity tracking across frames. Motion-based filtering and reference-line logic determine vehicle direction while frame skipping and image cropping achieve near-real-time performance.

YOLOv8 · DeepSORT · Kalman Filter · Python
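
The reference-line logic mentioned for this project can be sketched roughly as follows. It assumes a horizontal counting line and per-track centroid y-coordinates already produced by a tracker. The function names, the track data, and the line position are illustrative assumptions, not the project's implementation.

```python
def crossing_direction(prev_y, curr_y, line_y):
    """Return 'down', 'up', or None for one track between two consecutive frames."""
    if prev_y < line_y <= curr_y:
        return "down"  # image y grows downward, so increasing y means moving down
    if curr_y < line_y <= prev_y:
        return "up"
    return None

def count_crossings(tracks, line_y):
    """Count line crossings per direction over all tracks.

    tracks: hypothetical mapping of track id -> list of centroid y positions per frame
    """
    counts = {"up": 0, "down": 0}
    for positions in tracks.values():
        for prev_y, curr_y in zip(positions, positions[1:]):
            direction = crossing_direction(prev_y, curr_y, line_y)
            if direction:
                counts[direction] += 1
    return counts

# Made-up centroid trajectories for three tracked vehicles
tracks = {1: [100, 140, 190, 230], 2: [300, 250, 180], 3: [50, 60, 70]}
counts = count_crossings(tracks, line_y=200)
```

Here track 1 crosses the line moving down, track 2 crosses moving up, and track 3 never reaches it.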

Deep Learning · Computer Vision

Helicobacter Pylori Detection in Histopathology

Automated detection of H. pylori in whole slide images using two complementary approaches: an AutoEncoder-based anomaly detection model trained exclusively on negative samples, and an attention-based model for direct patient-level classification aggregating patch-level features.

AutoEncoder · Attention · Deep Learning · WSI

Deep Learning · Computer Vision

Epileptic Seizure Detection from EEG Signals

Automated seizure detection from multichannel EEG data using CNN with spatial and channel attention, LSTM for temporal modelling, and a hybrid CNN-LSTM architecture achieving the strongest recall on the CHB-MIT clinical dataset under both population-based and personalised cross-validation strategies.

CNN · LSTM · EEG · Deep Learning

Graph Analysis

Graph-Based Music Artist Recommendation

Spotify's related-artist network crawled using BFS and DFS algorithms to build directed, undirected, and weighted graphs across 384 artists. Girvan-Newman community detection achieved modularity 0.722, with centrality analysis revealing that network hubs and information brokers are structurally non-overlapping roles.

Python · NetworkX · Gephi · Spotify API
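
A breadth-first crawl of a related-artist network, as described for this project, might look like the sketch below. The `get_related` callback stands in for whatever API call returns related artists. That callback, the `max_artists` budget, and the toy data are assumptions for illustration only.

```python
from collections import deque

def bfs_crawl(seed, get_related, max_artists):
    """Breadth-first crawl returning visited artists and directed edges.

    get_related: hypothetical callable mapping an artist to its related artists
    max_artists: crawl budget; artists beyond it are not added to the graph
    """
    visited = {seed}
    queue = deque([seed])
    edges = []
    while queue:
        artist = queue.popleft()
        for related in get_related(artist):
            if related not in visited:
                if len(visited) >= max_artists:
                    continue  # budget reached: skip artists not yet seen
                visited.add(related)
                queue.append(related)
            edges.append((artist, related))
    return visited, edges

# Toy related-artist mapping in place of a real API
related_map = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}
visited, edges = bfs_crawl("A", lambda a: related_map.get(a, []), max_artists=3)
```

With a budget of three artists, "D" is never added, while edges between already-visited artists are still recorded. Swapping the `deque` for a stack would give the depth-first variant.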

Graph Database

Historical Census Analysis — Neo4j

Graph database modelling historical census data (1833–1889) from two Catalan municipalities using Neo4j and Cypher. Includes 13 thematic queries covering kinship networks and cross-census identity resolution, plus GDS library analysis for Weakly Connected Components and PageRank centrality.

Neo4j · Cypher · GDS · PageRank

NoSQL Database

Comic Book Database — MongoDB

Document-oriented NoSQL database for a comic book publishing system, restructuring an ER model into two MongoDB collections using extended reference and attribute design patterns. Includes a Python ingestion script and ten aggregation pipeline queries covering cross-collection lookups and complex array filtering.

MongoDB · Python · NoSQL · Aggregation

Machine Learning

Amazon Recommender Systems

Comprehensive recommender system applying Item-Item collaborative filtering, SVD matrix factorisation, and Content-Based filtering on large-scale Amazon datasets across Books, Electronics, and Beauty. Models evaluated with Precision@K, Recall@K, MAE, RMSE, and NDCG to identify the most scalable approach.

Python · SVD · Collaborative Filtering · Scikit-learn
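
For reference, two of the ranking metrics named above, Precision@K and Recall@K, can be computed as in this minimal sketch. The item IDs and the helper name are made up for illustration. The project's own evaluation code is not shown here and may differ, for example in how it handles users with no relevant items.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Compute Precision@K and Recall@K for one user.

    recommended: ranked list of item ids (best first)
    relevant:    set of item ids the user actually liked
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and ground-truth relevant items
recommended = ["b1", "b7", "b3", "b9", "b4"]
relevant = {"b3", "b4", "b8"}
p, r = precision_recall_at_k(recommended, relevant, k=3)
```

In this example one of the top three recommendations is relevant, so both metrics equal 1/3. Averaging over users gives the system-level figures.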

Data Visualization

Moderna & COVID-19 — Data Visualization

End-to-end study comparing Moderna's global vaccine distribution against Pfizer, AstraZeneca, and Sinopharm. Data cleaning and EDA in R; interactive dashboards, animated maps, and time-series charts in Tableau and Shiny correlating vaccination rates, case counts, and socioeconomic indicators.

R · Tableau · Shiny · EDA

Statistics

Statistical Analysis Project

Full statistical data analysis in R covering data cleaning, exploratory analysis, and inferential statistics. Applies descriptive techniques, hypothesis testing, and visual exploration to extract insights from real datasets, with results documented in a structured reproducible report.

R · Statistical Analysis · EDA · Hypothesis Testing

Let's Connect

I'm open to new opportunities — full-time, hybrid or remote.
Feel free to reach out through any of the channels below.

Barcelona, Spain

B driving licence · own vehicle · open to relocation

Catalan (native) · Spanish (native) · English (C1)

eduardaraparedes03@gmail.com

EduardAraParedes / Portfolio
