TaskQ
Distributed task scheduler — senior thesis project
The Context
For my senior thesis at Cairo University, I wanted to parallelize large batch computations across commodity machines on a local network. I was frustrated by long-running batch jobs written as single-threaded scripts, so I set out to build a fault-tolerant system that let a user submit a directed acyclic graph (DAG) of tasks to a cluster and walk away.
Architecture & Execution
TaskQ used a centralized coordinator to manage task distribution. A client submitted a dependency graph of Python functions. The coordinator parsed the DAG to find tasks whose dependencies were already satisfied and dispatched them to worker nodes over a ZeroMQ message bus. I implemented a rudimentary work-stealing algorithm between workers and used SQLite on the coordinator to persist scheduler state for crash recovery. If a worker missed three consecutive heartbeats, the coordinator reassigned its in-flight tasks.
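The core of the coordinator's scheduling loop can be sketched in a few lines. This is an illustrative reconstruction, not the original TaskQ code: `ready_tasks` and the example DAG are hypothetical names, and the real system persisted state to SQLite between waves.

```python
def ready_tasks(dag, completed):
    """Return tasks whose dependencies have all finished.

    dag maps task name -> set of task names it depends on.
    """
    return {
        task for task, deps in dag.items()
        if task not in completed and deps <= completed
    }

# A toy dependency graph (illustrative only).
dag = {
    "load": set(),
    "clean": {"load"},
    "featurize": {"clean"},
    "train_a": {"featurize"},
    "train_b": {"featurize"},
    "report": {"train_a", "train_b"},
}

done = set()
waves = []
while len(done) < len(dag):
    wave = ready_tasks(dag, done)
    waves.append(sorted(wave))  # each wave can run in parallel
    done |= wave

# "train_a" and "train_b" land in the same wave, so the coordinator
# can dispatch them to different workers simultaneously.
```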
Post-Mortem Lessons
Coordination overhead is the silent killer of distributed systems. For tasks that took less than 500ms to execute natively, the time spent scheduling, serializing data, and handling network I/O completely erased the gains from parallelism.
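A back-of-the-envelope model makes the break-even point concrete. This is a simplified sketch (the function name and the fixed per-task overhead figure are assumptions for illustration): if every dispatched task pays a fixed scheduling, serialization, and network cost, sub-second tasks can easily run slower than a serial loop.

```python
def parallel_speedup(task_ms, n_tasks, n_workers, overhead_ms):
    """Naive model: serial runtime divided by parallel runtime,
    where each dispatched task pays a fixed coordination overhead."""
    serial = task_ms * n_tasks
    parallel = (task_ms + overhead_ms) * n_tasks / n_workers
    return serial / parallel

# With ~400 ms of per-task overhead and 4 workers:
# 100 ms tasks are *slower* in parallel (speedup < 1),
# while 5 s tasks still get close to the ideal 4x.
fast = parallel_speedup(task_ms=100, n_tasks=100, n_workers=4, overhead_ms=400)
slow = parallel_speedup(task_ms=5000, n_tasks=100, n_workers=4, overhead_ms=400)
```

The model ignores queueing and stragglers, but it captures why batching many small tasks into one dispatch unit was the obvious fix I never got around to implementing.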
Clock synchronization across different nodes is dramatically harder than you expect. Timestamps that were off by just a few hundred milliseconds made debugging race conditions nearly impossible.
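One remedy I only learned about afterwards: logical clocks sidestep wall-clock skew entirely by ordering events causally. A minimal Lamport clock sketch (not part of TaskQ; shown as what would have made cross-node logs debuggable):

```python
class LamportClock:
    """Logical clock: orders events causally without synchronized wall clocks."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the logical time.
        self.time += 1
        return self.time

    def send(self):
        # Stamp an outgoing message with the sender's logical time.
        return self.tick()

    def recv(self, msg_time):
        # On receipt, jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time

coordinator, worker = LamportClock(), LamportClock()
t_send = coordinator.send()     # coordinator event
t_recv = worker.recv(t_send)    # worker event, causally after the send
```

Tagging every heartbeat and state update with a logical timestamp would have given a consistent event ordering even with hundreds of milliseconds of clock skew between nodes.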
ZeroMQ is brilliant for simple messaging patterns, but fighting its core connection-management assumptions to implement complex recovery logic consumed weeks of my life. I should have used a broker with built-in delivery guarantees, like RabbitMQ.
The system worked on paper and during the academic demo. It did not survive production-like loads for more than ten minutes. It taught me more about what *not* to do than what to do.
[ Coordinator Node ]
- DAG analysis
- Critical-path scheduling
↓
[ ZeroMQ Message Bus ]
↓
[ Worker Nodes (1..N) ]
- Task sandboxes
- State updates