
Antreas Antoniou

AI Researcher & Engineer

Cofounder at Axiotic AI

About

Who I Am

I'm an AI researcher and engineer with a PhD in Machine Learning and Meta-Learning from the University of Edinburgh. I've worked at Google, Amazon, and as principal scientist across AI startups in the UK and US. I specialize in large language models, multimodal learning, and self-supervised methods.

I believe intelligence emerges from structure, interaction, and information—not from sheer scale alone. The field has converged on one dominant recipe: make it bigger. But that's not how natural intelligence works. Brains compress experience, maintain state, and build hierarchical representations. They don't re-ingest the universe every time they need to think.

My work focuses on the first principles of learning—how structure, interaction, and information theory can replace massive parameter counts. I'm driven by that Cambrian Explosion spirit of the 2010s: new architectures, new paradigms, genuine exploration of the research tree.

What drives me:

  • Smarter learning, not bigger models — efficiency through insight, not brute force
  • First principles — understanding why things work, not just that they work
  • Open science — releasing code, models, and datasets to push the field forward
  • Building things — ideas are cheap, working systems are what matter

Philosophy

How I Think About Research

Overarching Theme

My goal is to emulate the trajectory of human-like representation learning—starting from foundational representations akin to those seen in human infants, but without requiring eons of evolutionary fine-tuning. This serves as a springboard for my broader ambition: investigating how these infant-like representations can be fine-tuned in concert with higher-level abstract concepts to pave the way for general artificial intelligence. To achieve this, my research focuses on scalable, data-efficient, and generalizable self-supervised learning in a multimodal setting. I integrate insights from neuroscience and evolutionary computation to explore optimal learning sequences and curricula, paying close attention to architectural choices and their corresponding training recipes.

Research Focus

Leading my research interests is Multi-Modal Learning—the synergistic integration of text, images, audio, and video. This is followed by Self-Supervised Methods inspired by infant learning and evolutionary computation.

Multi-Modal Learning · Self-Supervised Methods · Meta-Learning · Adversarial Learning · Evolutionary Optimization · Computational Efficiency · Memory-Augmented Networks

Research Philosophy

I operate within a pragmatic framework, aiming to identify high-leverage focal points conducive to in-depth investigation. This allows for efficient allocation of both computational and cognitive resources. In line with evolutionary tenets and the Pareto Principle, my methodology focuses on the "fittest" 20% of research avenues likely to contribute 80% of impactful results.

Current Work

What I'm Building

🔬

Axiotic AI

Cofounder & Research Lead

An open-science AI research lab paired with a consultancy. The consultancy funds the lab, the lab sets the agenda. We're exploring what's beyond the current scaling paradigm—better learning signals, novel architectures, and systems that actually understand.

Learn more →

Also partnering with Pieces for Developers on efficient on-device AI systems.

Research Agenda

What I Think About

I'm fundamentally inspired by information theory, evolution, and biology—how can we use what we observe about human learning and nature to build better artificial intelligence?

Improved Tasks

Beyond next-token prediction. Predicting masked past, present, and future states to force causal structure, long-horizon planning, and robust memory.
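The idea can be sketched in a few lines of plain Python (toy sequence and helper names are hypothetical, not from any specific implementation): standard next-token prediction is the special case where only the final position is hidden, while this objective samples masked positions anywhere in the sequence.

```python
import random

def sample_masked_positions(seq_len, n_masked, rng=random):
    """Choose positions to hide anywhere in the sequence --
    past, present, or future -- not only at the end."""
    return sorted(rng.sample(range(seq_len), n_masked))

def next_token_positions(seq_len):
    """Next-token prediction: only the final position is hidden."""
    return [seq_len - 1]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
masked = sample_masked_positions(len(tokens), n_masked=2)
targets = {i: tokens[i] for i in masked}  # what the model must predict
context = [t if i not in masked else "<mask>" for i, t in enumerate(tokens)]
```

Predicting hidden past states rewards memory, hidden present states reward in-filling, and hidden future states reward planning, which is the point of the generalized task.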

Multimodal Learning

Inspired by the brain's multimodal neurons—"synesthetic" architectures processing text, images, and audio in a shared latent space.

Memory & Recursion

Explicit memory and latent reasoning loops turning transformers into differentiable computers—infinite-context learning without expensive CoT tokens.

Hierarchical Attention

Fractal attention patterns operating at multiple resolutions simultaneously—seeing the forest and the trees without brute-force scaling.

Novel Architectures

New connectivity patterns, activation mechanisms, and dense routing schemes where deep layers attend directly to shallow features.

AI Flywheels

Data quality is the new frontier. Closed-loop engines where models generate, filter, and curriculum-sort their own training data.

Embodied Learning

Intelligence requires grounding. Using simulation as a primary data source, training agents on interaction physics before sim-to-real transfer.

Evolution meets SGD

SGD is greedy; evolution generalizes but is slow. Researching how to synergize them for robust, efficient learning.

Information & Distillation

Distillation through an information-theoretic lens—cracking the physics of compression to train small models to their maximum potential.
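As a hedged illustration (the classic distillation objective, not the specific information-theoretic method alluded to above), a student is trained to match a temperature-softened teacher distribution under KL divergence:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): extra bits needed to encode samples from p using q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Higher temperature exposes the teacher's relative probabilities
    over wrong classes, which carry much of the transferable signal."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * kl_divergence(p, q)
```

The information-theoretic reading is direct: the loss measures how many extra bits the student's code wastes relative to the teacher's, so minimizing it is literally a compression objective.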

Local Learning Rules

Efficiency in nature is achieved through local learning rules. Leveraging this for efficient learning without global backprop.
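The best-known local rule is Hebbian learning: each weight updates using only its own pre- and post-synaptic activity, with no globally backpropagated error. A minimal sketch under that assumption:

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Hebbian rule: dw_ij = lr * post_i * pre_j.
    Every update uses only information local to the synapse."""
    return [
        [w + lr * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]

w = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]   # pre-synaptic activity
post = [1.0, 1.0]  # post-synaptic activity
w = hebbian_update(w, pre, post)
# weights grow only where pre and post units fire together
```

Because no update depends on any other layer, such rules parallelize trivially and avoid the memory cost of storing activations for a backward pass, which is the efficiency angle above.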

Asynchronous Dynamics

Rhythms across multiple temporal scales define natural systems. Exploring how neural architectures can internalize these for efficient learning.

Bio-Inspired Learning

Nature solved learning over billions of years. Drawing from biological neural networks, synaptic plasticity, and adaptive mechanisms.

Publications

Selected Research

See all on Google Scholar →

NeurIPS 2024

einspace: Searching for Neural Architectures from Fundamental Operations

L. Ericsson, M. Espinosa, C. Yang, A. Antoniou, A. Storkey, S.B. Cohen, S. McDonagh, E.J. Crowley

A unified search space for neural architecture search based on einsum operations and fundamental building blocks.

NeurIPS 2024

EEVEE and GATE: Finding the Right Benchmarks for Vision-Language Models

A. Antoniou, E. Triantafillou, H. Larochelle, S. Montella, F. Rezk, K. Kim, L. Ericsson, P. Vougiouklis, J. Engelmann, E.J. Crowley, S. Humbarwadi, Y. Liu, G. Yang, J.Z. Pan, A. Storkey

Rethinking evaluation benchmarks for vision-language models to better measure true capabilities.

Preprint 2024

Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts

K. Kim, S.N. Gowda, P. Eustratiadis, A. Antoniou, R.B. Fisher

Using adversarial augmentation to improve robustness of action recognition models against natural distribution shifts.

ICML 2023

ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging

A. Fontanella, A. Antoniou, W. Li, J. Wardlaw, G. Mair, E. Trucco, A. Storkey

Using adversarial counterfactual attention to improve interpretability in medical imaging classification and detection.

ICLR 2023

Contrastive Meta-Learning for Partially Observable Few-Shot Learning

A. Jelley, A. Storkey, A. Antoniou, S. Devlin

Combining contrastive learning with meta-learning for few-shot scenarios where only partial observations are available.

NeurIPS Workshop 2023

Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months

F. Rezk, A. Antoniou, H. Gouk, T. Hospedales

Critical evaluation of VeLO, the largest learned optimizer to date. We found it's not necessarily better than tuned Adam.

Journal 2023

Development of a Deep Learning Method to Identify Acute Ischemic Stroke Lesions on Brain CT

A. Fontanella, W. Li, G. Mair, A. Antoniou, E. Platt, P. Armitage, E. Trucco, J. Wardlaw, A. Storkey

Deep learning for automated detection of acute ischemic stroke lesions from CT scans, achieving 72% accuracy with better performance on larger lesions.

PhD Thesis 2020

Meta Learning for Supervised and Unsupervised Few-Shot Learning

A. Antoniou

Survey 2020

Meta-Learning in Neural Networks: A Survey

T. Hospedales, A. Antoniou, P. Micaelli, A. Storkey

A comprehensive overview of the meta-learning landscape—how it works, why it matters, and where it's going.

NeurIPS Workshop 2020

Defining Benchmarks for Continual Few-Shot Learning

A. Antoniou, M. Patacchiola, M. Ochal, A. Storkey

What happens when few-shot learning meets continual learning? We built the benchmarks to find out.

NeurIPS 2019

Learning to Learn via Self-Critique

A. Antoniou, A. Storkey

A meta-learning approach where models learn to critique and improve their own learning process.

ICLR 2019

How to Train Your MAML

A. Antoniou, H. Edwards, A. Storkey

MAML is elegant but tricky to train. We figured out what actually matters and what doesn't.

Preprint 2019

Assume, Augment and Learn: Unsupervised Few-Shot Meta-Learning

A. Antoniou, A. Storkey

Few-shot meta-learning via random labels and data augmentation—no labeled data required.

Dataset 2018

CINIC-10 is not ImageNet or CIFAR-10

L.N. Darlow, E.J. Crowley, A. Antoniou, A.J. Storkey

A new benchmark dataset bridging the gap between CIFAR-10 and ImageNet.

ICANN 2018

Data Augmentation Generative Adversarial Networks

A. Antoniou, A. Storkey, H. Edwards

Using GANs to create training data. Especially useful when you only have a few examples to work with.

Preprint 2018

Dilated DenseNets for Relational Reasoning

A. Antoniou, A. Słowik, E.J. Crowley, A. Storkey

IJCNN 2016

A General Purpose Intelligent Surveillance System for Mobile Devices

A. Antoniou, P. Angelov

Deep learning for real-time surveillance on mobile devices.

Background

Experience

Cofounder & Research Lead

2025 - Present

Axiotic AI

Building an open-science research lab funded by consultancy. Exploring efficient learning, novel architectures, and what's beyond scaling.

Principal Research Partner

2025 - Present

Pieces for Developers

Research partnership focused on efficient on-device AI and practical ML systems.

Principal AI Scientist & Head of ML

2025 - 2026

Pieces for Developers

Led ML research agenda. Built nano-models and foundation models for efficient deployment across CPUs, GPUs, NPUs, and LPUs.

Lead Research Scientist

2024

Malted AI

Efficient LLM training, distillation, synthetic data generation, and LLM-as-a-Judge systems.

Research Associate

2019 - 2024

University of Edinburgh

Supervised by Prof. Amos Storkey. Part of BayesWatch and the Adaptive and Neural Computation (ANC) institute.

PhD in Machine Learning

2016 - 2020

University of Edinburgh

Thesis: "Meta Learning for Supervised and Unsupervised Few-Shot Learning"

Community

Outreach & Open Source

Building communities and democratizing ML research

🤖

LeRobot Edinburgh Hackathon

7th Worldwide

Lead Organizer & Competitor · 2025

Conceived and initiated the effort to bring the global LeRobot Hackathon to the University of Edinburgh. Built the organizing team and handled roughly 60% of the organizational work. Sourced and provided all specialized hardware and high-performance GPU infrastructure, and mentored all teams throughout the 30-hour competition. Also competed, leading my team to 7th place out of 1000+ teams worldwide.

🖥️

EIDF A100 GPU Cluster Community

Early Adopter & Community Lead · 2023

One of the early adopters of Edinburgh's EIDF A100 GPU cluster and a key community-support member. Created the Slack workspace, answered hundreds of user questions, convened community meetings on key issues, and served as a bridge between users and developers. Authored early documentation and developed kubejobs, a Python package that simplifies Kubernetes job specifications.

🎯

Workshop & Event Organizing

Co-organizer · 2024-2025

Co-organized the International Workshop on Efficient Generative AI 2024, a GAIL-funded event bringing together leading researchers from academia and industry to discuss efficient approaches to generative AI.

🎤

Guest Lectures & Invited Talks

University of Edinburgh · 2025

Model Compression

Guest lecture for ML Systems course (Dr. Luo Mai)

Slides

GenAI Superpowers

Teach-A-Thon presentation

Slides

LLMs for Teaching

Workshop on using LLMs in education

Slides

3000 Hours with ChatGPT

What I learned from extensive LLM usage

Slides

AI-Assisted Development: A Crash Course

Practical guide to coding with AI assistants

Slides

EIDF Cluster, Kubernetes & Docker Primer

Introduction to UoE's compute infrastructure

Slides
📚

Research Talks & Presentations

To Learn or Not to Learn

A journey across modern neural network inductive biases

Slides

Transferable Representation Learning

Discovering learning priors across domains, tasks, and modalities

Slides

TALI Dataset

Temporally and Semantically Aligned Audio, Language and Images

Slides

Better DL Benchmarks & Datasets

...and other mythical creatures

Slides

Continual Few-Shot Learning

NeurIPS Workshop paper presentation

Slides

Learning to Learn via Self-Critique

NeurIPS 2019 paper presentation

Slides

Parting Talk: UoE 2024

Farewell presentation at University of Edinburgh

Slides

How to Cat

Cat ownership, biology, evolution & psychology

Slides
🎙️

Podcasts & Interviews

Nano Models, Transformers, and Long-Term Memory AI

Pieces for Developers Podcast · 2025

Watch
✍️

Writing & Articles

Beyond the Cloud: SLMs, Local AI, and Agentic Constellations

Pieces Blog · August 2025 — A vision for local-first AI built on biology-inspired architectures

Read

Too Much of a Good Thing: How Chasing Scale is Stifling AI Innovation

Pieces Blog · July 2025 — The "Great Amnesia" of AI's scaling monoculture

Read
🌐

Open Source Contributions

My conviction in the democratization of ML research stems from the irreplaceable value of individual expertise and the power of collective collaboration. Open source is not merely a development model—it's a fundamental necessity for driving innovation and maintaining ethical standards in the field.

🔧

Infrastructure & Tooling

University of Edinburgh · 2021-2024

  • Secured $20K in Google Cloud Platform Research Credits and a Google TRC Compute Grant
  • Procured a £50K deep learning research server through market analysis and vendor negotiations
  • Built and deployed a Kubernetes cluster for the research group with Python tooling and tutorials
  • Authored a minimal ML research framework following best practices
  • Built a comprehensive wiki documenting best practices, tools, and resources for the group

Playground

Demos & Datasets

Interactive explorations of my research. Dive in!

🚀 Featured Dataset

TALI: Temporally & Semantically Aligned Audio, Language and Images

TALI is my response to the growing demand for multimodal understanding in deep learning. It's a large-scale, tetramodal dataset that aligns text, video, images, and audio—a playground for innovative self-supervised learning tasks and multimodal research. With TALI, we're exploring how different modalities and data/model scaling affect downstream performance. I'm excited about the diverse research ideas it will inspire!

More demos coming soon...

Teaching

Sharing Knowledge

Current Teaching

Guest Lecture: LLM 101 and Model Compression

Machine Learning Systems Course

Machine Learning Practical (2017-2019)

Lead TA at University of Edinburgh. Created comprehensive tutorials, coursework materials, and supervised research projects. Course Website →


Teaching Awards & Recognition

Staff Award for MLP TA Service

2020-21

University of Edinburgh — Recognition for exceptional teaching assistance

View Award →

Teaching Award Nominations

View Letters →
Best Practice in Inclusive Learning Award 2019
Best Support Staff Award 2019
Best Student Who Tutors Award (2 nominations) 2019
Best UK PhD Tutor Award 2019
Best Student Who Tutors Award 2018

Contact

Let's Talk

Interested in research collaboration, consulting, or just want to chat about AI?