↳ DATA ENGINEER · BARCELONA

Eduard Ara Paredes


Turning raw data into scalable systems and real-world impact.

Barcelona, Spain · 2 Years Exp. · Remote · Hybrid

Profile

About me

Data Engineer focused on building scalable data pipelines and efficient data architectures. Proficient in Python, SQL, and cloud technologies, with experience in data processing, warehousing, and Big Data environments.
Strong analytical mindset with the ability to solve complex problems and deliver data-driven solutions. Background in machine learning and analytics, providing a well-rounded understanding of modern data ecosystems.
Team-oriented and adaptable, with strong communication skills and fluent English.
2 Years Exp
11 Data Projects
+5 Programming Languages
+6 Database Systems
Working at SDG Group
Barcelona, Spain
2 Years Experience
Remote · Hybrid
English C1
Experience
SDG Group
Data Analyst · Data Engineer · Data Governance
Sep 2026 – Present
Kanguro
Junior Business & Data Analyst
Jan 2026 – Aug 2026
Kanguro
Business & Data Analyst Intern
Oct 2025 – Jan 2026
Projects
+1 Data Analytics Projects
+4 Database & Data Systems Projects
+3 ML, Predictive & Neural Network Models
Specialization
• End-to-end data engineering solutions
• ETL pipelines and automation
• Distributed data systems
• Design of cloud-native architectures
Data Engineer · Data Scientist · AI Engineer · Data Analyst
Availability
Currently working
Open to new opportunities (full-time / hybrid / remote)
Mobility
B driving licence · own vehicle · open to relocation
Languages
Catalan (native) · Spanish (native) · English (C1)

Credentials

Education & Certifications

University Degree
Bachelor's Degree in Data Engineering
Universitat Autònoma de Barcelona
Python · SQL · NoSQL · ETL pipelines · Data Warehousing · Databases · Big Data (Spark, Hadoop) · AWS (S3, Redshift, Lambda) · Google Cloud (BigQuery, Dataflow) · Azure (Data Factory, Synapse) · Cloud Computing · Docker · Git · Data Structures · Machine Learning · Neural Networks · Data Science · Data Analytics · Tableau · Qlik
2021 – 2025
Completed
Certifications
DeepLearning.AI
Machine Learning & AI · Jul 2025
Neural Networks and Deep Learning
Completed
Neural Networks · Deep Learning · Python · TensorFlow · Keras
Improving Deep Neural Networks: Hyperparameter Tuning
Completed
Hyperparameter Tuning · Regularization · Optimization
Convolutional Neural Networks
Completed
CNNs · PyTorch · Computer Vision · Algorithms · Supervised Learning
IBM
Data Engineering · Apr – May 2025
Introduction to Data Engineering
Completed
Data Management · ETL · Big Data · Data Science · Network Security
ETL and Data Pipelines with Shell, Airflow & Kafka
Completed
Kafka · Airflow · Shell
Databases and SQL for Data Science with Python
Completed
SQL · Python · Databases
dbt Labs
Data Transformation · Oct 2025
dbt Fundamentals
Completed
dbt Core · dbt Cloud · ETL · Data Modelling · SQL Transformation
Cambridge University Press
English Language
B2 First Certificate in English (FCE)
Completed
C1 Advanced Certificate in English (CAE)
In Progress

Technical

My Stack

Languages
Core programming
Python
Proficient
SQL
Proficient
R
Advanced
C
Intermediate
Bash
Basic
MATLAB
Basic
Data Engineering
Pipelines & orchestration
ETL / ELT
Proficient
Apache Airflow
Advanced
Apache Spark
Advanced
ODI
Intermediate
Apache Kafka
Intermediate
dbt
Intermediate
Cloud & Infra
Platforms & managed services
AWS: S3 · Redshift · Lambda · Glue
Basic
Azure: Data Factory · Synapse · ADF
Basic
Google Cloud: BigQuery · Dataflow · Pub/Sub
Intermediate
Warehousing · Data Sharing
Intermediate
Databases
Storage engines
PostgreSQL
Proficient
MySQL
Proficient
BigQuery
Advanced
MongoDB
Intermediate
Neo4j
Intermediate
Redis
Intermediate
Machine Learning
Modelling & analysis
Pandas
Proficient
NumPy
Proficient
Scikit-learn
Advanced
Matplotlib
Advanced
Deep Learning
Neural networks
TensorFlow
Proficient
PyTorch
Proficient
Neural Networks
Advanced
Keras
Intermediate
Analytics & BI
Visualisation & reporting
Tableau
Advanced
Excel / Sheets
Advanced
Qlik
Intermediate
Looker Studio
Intermediate
DevOps & Tools
Environment & workflow
Jupyter/VS Code
Proficient
Git · GitHub
Advanced
Docker
Advanced
REST APIs
Advanced

Career

Work Experience

Kanguro
Oct 2025 – Jan 2026  ·  4 months
Intern · Business & Data Analyst

First professional experience in data analysis within a fast-paced insurtech startup. Designed and maintained interactive dashboards using Power BI and Looker Studio to support strategic decision-making, and automated manual reporting workflows using Python and Google Sheets. Built predictive models to optimize the expansion of Kanguro point networks, leveraging package volume and demand trends.

SQL · PostgreSQL · Power BI · Looker Studio · Metabase · Python
Kanguro
Jan 2026 – Aug 2026  ·  7 months
Junior Business & Data Analyst

Promoted to Junior after demonstrating strong impact. Owned end-to-end KPI dashboards using Power BI and Looker Studio, and managed relational databases (SQL), ensuring data integrity, consistency, and performance. Contributed to database migration projects and built scalable ETL pipelines using Python to transform data from transactional systems into a centralized data warehouse. Applied data modeling best practices to support analytical use cases and collaborated closely with Product and Operations teams to enable data-driven decision-making.

SQL · PostgreSQL · Power BI · Looker Studio · Metabase · Python
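To illustrate the kind of ETL step described above, here is a minimal sketch that extracts rows from a transactional table, aggregates them, and loads a warehouse fact table. It is not the production pipeline: `sqlite3` stands in for PostgreSQL so the example runs anywhere, and the `orders` and `fact_daily_packages` tables, their columns, and the `run_etl` function are hypothetical names chosen for illustration.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract raw orders, aggregate per day and point, load a fact table."""
    # Extract: read rows from the (hypothetical) transactional table.
    rows = source.execute(
        "SELECT order_date, point_id, packages FROM orders"
    ).fetchall()

    # Transform: drop incomplete rows and sum packages per (day, point).
    totals = {}
    for order_date, point_id, packages in rows:
        if order_date is None or point_id is None or packages is None:
            continue
        key = (order_date, point_id)
        totals[key] = totals.get(key, 0) + packages

    # Load: rebuild the fact table so that reruns give the same result.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_daily_packages ("
        "order_date TEXT, point_id INTEGER, packages INTEGER, "
        "PRIMARY KEY (order_date, point_id))"
    )
    warehouse.execute("DELETE FROM fact_daily_packages")
    warehouse.executemany(
        "INSERT INTO fact_daily_packages VALUES (?, ?, ?)",
        [(day, point, n) for (day, point), n in sorted(totals.items())],
    )
    warehouse.commit()
    return len(totals)


def demo():
    """Run the sketch on a tiny in-memory dataset and return the loaded rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (order_date TEXT, point_id INTEGER, packages INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("2026-01-05", 1, 3), ("2026-01-05", 1, 2),
         ("2026-01-05", 2, 4), (None, 2, 1)],
    )
    warehouse = sqlite3.connect(":memory:")
    run_etl(source, warehouse)
    return warehouse.execute(
        "SELECT order_date, point_id, packages "
        "FROM fact_daily_packages ORDER BY point_id"
    ).fetchall()
```

Deleting and reloading the fact table keeps the sketch idempotent. A real pipeline would more likely load incrementally.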
Current Position
SDG Group
Sep 2026 – Present  ·  +8 months
Data Engineer · Data Analytics · Data Governance

CaixaBank

Security Control & Access Management Migration: Migrated CaixaBank's security control data environment from on-premise infrastructure to Google Cloud Platform (BigQuery). Defined the migration strategy and automated end-to-end data workflows with Apache Airflow, improving the reliability and scalability of the data pipelines. Used SQL queries to build and transform the data, and added a Python layer to automate and orchestrate the workflows from Cloud Composer.

ETL Pipelines & Dashboarding Ecosystem: Built and maintained ETL pipelines within on-premise environments, integrating multiple data sources (Qlik, R, Jupyter Notebooks) to support analytical workloads. Contributed to the development of interactive dashboards in Qlik and ensured the maintenance and reliability of existing reporting solutions used by business stakeholders for decision-making.

Artifacts

Metadata Lineage: Developed a metadata lineage model in Python that extracts and centralizes metadata from multiple data sources, including BigQuery, MySQL, and PostgreSQL. Built a metadata artifact layer that structures and unifies this information, enabling a data lineage system that propagates metadata across tables and data models to improve traceability and governance.

Google Cloud Platform (GCP) · BigQuery · Apache Airflow · Cloud Composer · Python · SQL · ETL Pipelines · Data Lineage · Metadata Management · Data Governance · MySQL · PostgreSQL · Qlik
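The idea of propagating metadata across tables can be sketched as a breadth-first walk over a lineage graph. This is an illustrative sketch only, not the artifact described above: the `propagate_metadata` function, the dictionary-based graph, and the example table and tag names are all assumptions.

```python
from collections import deque


def propagate_metadata(lineage, tags):
    """Push each table's tags to every table derived from it.

    lineage: dict mapping a table name to the tables built directly from it.
    tags: dict mapping a table name to the set of tags declared on it.
    Returns a new dict holding declared plus inherited tags for every table.
    """
    tables = set(lineage) | set(tags)
    for targets in lineage.values():
        tables.update(targets)
    result = {table: set(tags.get(table, ())) for table in tables}

    for source, source_tags in tags.items():
        # Breadth-first walk over everything downstream of `source`.
        queue = deque(lineage.get(source, ()))
        seen = set()
        while queue:
            table = queue.popleft()
            if table in seen:
                continue  # already handled; also guards against cycles
            seen.add(table)
            result[table] |= source_tags
            queue.extend(lineage.get(table, ()))
    return result


# Hypothetical example: a "pii" tag on a raw table reaches the final report.
example_lineage = {
    "raw.users": ["staging.users"],
    "staging.users": ["mart.access_report"],
    "raw.logins": ["mart.access_report"],
}
example_tags = {"raw.users": {"pii"}, "raw.logins": {"security"}}
example_result = propagate_metadata(example_lineage, example_tags)
```

In this example `mart.access_report` ends up with both tags because it sits downstream of both raw tables.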

Work

Selected Projects

01
Deep Learning

AI-Based Poker Hand Classification

Intelligent system for Texas Hold'em that classifies poker hands and estimates win, draw, and loss probabilities using a CNN trained on a custom-generated dataset. Full-stack web app with a Flask backend and Angular frontend combining deep learning with Monte Carlo simulations.

Python · CNN · Flask · Angular · Deep Learning
02
Deep Learning · Computer Vision

ANPR: Automatic License Plate Recognition

End-to-end pipeline for detecting and recognising Spanish license plates through plate localisation with YOLOv5, classical image processing for character segmentation, and a hybrid CNN + SVM classifier achieving high accuracy across real-world lighting conditions and perspectives.

YOLOv5 · CNN · SVM · OpenCV
03
Deep Learning · Computer Vision

Real-Time Vehicle Tracking & Counting

Real-time vehicle tracking and counting system integrating YOLOv5 and YOLOv8 with DeepSORT and Kalman filtering for consistent identity tracking across frames. Motion-based filtering and reference-line logic determine vehicle direction while frame skipping and image cropping achieve near-real-time performance.

YOLOv8 · DeepSORT · Kalman Filter · Python
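The reference-line logic mentioned above can be illustrated with a small sketch. It assumes each tracked vehicle is already reduced to a list of centroid positions per frame and that the reference line is horizontal. The `count_crossings` function and its input format are hypothetical, not the project's actual code.

```python
def count_crossings(tracks, line_y):
    """Count tracks that cross a horizontal reference line, by direction.

    tracks: dict mapping a track id to its (x, y) centroids in frame order.
    line_y: y coordinate of the line in pixels (image y grows downward).
    """
    counts = {"down": 0, "up": 0}
    for centroids in tracks.values():
        # Compare consecutive positions of the same tracked vehicle.
        for (_, prev_y), (_, curr_y) in zip(centroids, centroids[1:]):
            if prev_y < line_y <= curr_y:
                counts["down"] += 1
                break  # count each track at most once
            if prev_y >= line_y > curr_y:
                counts["up"] += 1
                break
    return counts


# Hypothetical tracks: one moving down, one moving up, one never crossing.
example_tracks = {
    1: [(10, 80), (12, 95), (13, 110)],
    2: [(50, 130), (51, 105), (52, 90)],
    3: [(70, 20), (71, 40)],
}
example_counts = count_crossings(example_tracks, line_y=100)
```

Counting each track only once relies on the tracker keeping identities stable across frames.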
04
Deep Learning · Computer Vision

Helicobacter Pylori Detection in Histopathology

Automated detection of H. pylori in whole slide images using two complementary approaches: an AutoEncoder-based anomaly detection model trained exclusively on negative samples, and an attention-based model for direct patient-level classification aggregating patch-level features.

AutoEncoder · Attention · Deep Learning · WSI
05
Deep Learning · Computer Vision

Epileptic Seizure Detection from EEG Signals

Automated seizure detection from multichannel EEG data comparing three models: a CNN with spatial and channel attention, an LSTM for temporal modelling, and a hybrid CNN-LSTM architecture. The hybrid achieved the strongest recall on the CHB-MIT clinical dataset under both population-based and personalised cross-validation strategies.

CNN · LSTM · EEG · Deep Learning
06
Graph Analysis

Graph-Based Music Artist Recommendation

Spotify's related-artist network crawled using BFS and DFS algorithms to build directed, undirected, and weighted graphs across 384 artists. Girvan-Newman community detection achieved modularity 0.722, with centrality analysis revealing that network hubs and information brokers are structurally non-overlapping roles.

Python · NetworkX · Gephi · Spotify API
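A breadth-first crawl of a related-artist network, as described above, might look like the sketch below. It is a simplified illustration under stated assumptions: `get_related` is a hypothetical callable standing in for an API lookup, and the toy dictionary replaces real Spotify data.

```python
from collections import deque


def bfs_crawl(seed, get_related, max_artists):
    """Breadth-first crawl returning the visit order and directed edges.

    get_related: hypothetical callable taking an artist id and returning
    a list of related artist ids (a stand-in for an API request).
    max_artists: upper bound on the number of artists in the crawl.
    """
    visited = [seed]
    seen = {seed}
    edges = []
    queue = deque([seed])
    while queue:
        artist = queue.popleft()
        for related in get_related(artist):
            if related not in seen:
                if len(visited) >= max_artists:
                    continue  # node budget reached: ignore unseen artists
                seen.add(related)
                visited.append(related)
                queue.append(related)
            # Only edges between crawled artists are recorded.
            edges.append((artist, related))
    return visited, edges


# Toy network used in place of real API responses.
toy_network = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
example_visited, example_edges = bfs_crawl(
    "a", lambda artist: toy_network.get(artist, []), max_artists=3
)
```

Swapping `queue.popleft()` for `queue.pop()` gives a stack-based, depth-first style crawl that explores one branch before moving on.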
07
Graph Database

Historical Census Analysis — Neo4j

Graph database modelling historical census data (1833–1889) from two Catalan municipalities using Neo4j and Cypher. Includes 13 thematic queries covering kinship networks and cross-census identity resolution, plus GDS library analysis for Weakly Connected Components and PageRank centrality.

Neo4j · Cypher · GDS · PageRank
08
NoSQL Database

Comic Book Database — MongoDB

Document-oriented NoSQL database for a comic book publishing system, restructuring an ER model into two MongoDB collections using extended reference and attribute design patterns. Includes a Python ingestion script and ten aggregation pipeline queries covering cross-collection lookups and complex array filtering.

MongoDB · Python · NoSQL · Aggregation
09
Machine Learning

Amazon Recommender Systems

Comprehensive recommender system applying Item-Item collaborative filtering, SVD matrix factorisation, and Content-Based filtering on large-scale Amazon datasets across Books, Electronics, and Beauty. Models evaluated with Precision@K, Recall@K, MAE, RMSE, and NDCG to identify the most scalable approach.

Python · SVD · Collaborative Filtering · Scikit-learn
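Two of the ranking metrics named above, Precision@K and Recall@K, can be sketched in a few lines. This is a generic illustration rather than the project's evaluation code. The function names and the example item ids are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    # Dividing by k (not by the list length) penalises short lists.
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical ranked recommendations and ground-truth items for one user.
example_recommended = ["b1", "b2", "b3", "b4", "b5"]
example_relevant = {"b2", "b5", "b9"}
example_p3 = precision_at_k(example_recommended, example_relevant, 3)
example_r5 = recall_at_k(example_recommended, example_relevant, 5)
```

Here one of the top three items is relevant, and two of the three relevant items appear in the top five. Averaging these values over all users gives the reported metric.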
10
Data Visualization

Moderna & COVID-19 — Data Visualization

End-to-end study comparing Moderna's global vaccine distribution against Pfizer, AstraZeneca, and Sinopharm. Data cleaning and EDA in R; interactive dashboards, animated maps, and time-series charts in Tableau and Shiny correlating vaccination rates, case counts, and socioeconomic indicators.

R · Tableau · Shiny · EDA
11
Statistics

Statistical Analysis Project

Full statistical data analysis in R covering data cleaning, exploratory analysis, and inferential statistics. Applies descriptive techniques, hypothesis testing, and visual exploration to extract insights from real datasets, with results documented in a structured reproducible report.

R · Statistical Analysis · EDA · Hypothesis Testing

Get in touch

Let's Connect

I'm open to new opportunities — full-time, hybrid or remote.
Feel free to reach out through any of the channels below.

Barcelona, Spain
B driving licence · own vehicle · open to relocation
Catalan (native) · Spanish (native) · English (C1)