AI Superpowers

A Strategic Framework for High-Velocity Research

Antreas Antoniou

The Researcher's Dilemma: The Time Barrier & The Quality Trade-off

Turning a research idea into a robust experiment is fundamentally slow, often taking days, not hours. This time barrier is a major bottleneck.

To go faster, we are forced to compromise on code quality, leading to technical debt that slows down future progress.

The question: Can AI shorten the cycle from days to hours AND break the speed vs. quality trade-off?

The Solution: The AI-Accelerated Development Loop

A structured, iterative loop that integrates AI as a partner at every stage:

  1. Plan: Discuss with AI, document the goal, and write failing tests (see the test sketch after this list).
  2. Implement & Fix: Use AI to generate code to pass the tests in a tight loop.
  3. Review: Conduct a rigorous `git diff` review once tests pass.
  4. Document: Capture the trajectory and outcomes for future context.
  5. Demo & Critique: Visually inspect the result with the AI to find the next goal.
  6. Evaluate & Repeat: Assess the outcome and begin the loop again.
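
To make the "failing test" in step 1 concrete, here is a minimal sketch: a test written before the code it exercises exists, so it fails first and serves as the definition of done. The module `myproject.rewards` and the function `normalize_rewards` are hypothetical illustrations, not artifacts from the talk:

```python
# test_rewards.py -- written BEFORE the implementation, so it fails first
# and defines "done" for the AI. `myproject.rewards` and `normalize_rewards`
# are hypothetical names used only to illustrate the pattern.
import pytest

from myproject.rewards import normalize_rewards  # does not exist yet -> fails


def test_normalized_rewards_have_zero_mean():
    out = normalize_rewards([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(out) / len(out)) < 1e-8


def test_empty_input_is_rejected():
    with pytest.raises(ValueError):
        normalize_rewards([])
```

Running `pytest test_rewards.py` now fails on the import, which is the point: the AI's job in step 2 is to make it pass.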

Pillar 1: Strategic Oracle Selection

Don't use a single generalist model. Build a team of specialists.

  • The Architect (e.g., Gemini Pro): For high-level planning, brainstorming APIs, and structuring the project.
  • The Builder (e.g., Claude 4.0 Sonnet): A powerful and cost-effective model for the nuts-and-bolts implementation.
  • The Specialist (e.g., Claude Opus): For the truly hard problems where deep reasoning is required.
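
One lightweight way to document this roster in a project is a small config mapping roles to model identifiers. A sketch only: the model-ID strings below are placeholders, not official API names, and should be swapped for your provider's actual identifiers:

```python
# oracle_roster.py -- a sketch of documenting your model roster as code.
# Role names follow the talk; the ID strings are illustrative placeholders.
ORACLE_ROSTER = {
    "architect": "gemini-2.5-pro",    # planning, API brainstorming
    "builder": "claude-4.0-sonnet",   # day-to-day implementation
    "specialist": "claude-4.0-opus",  # genuinely hard problems only
    "utility": "gemini-2.5-flash",    # summaries, cheap glue tasks
}


def pick_model(role: str) -> str:
    """Return the model ID for a role, failing loudly on unknown roles."""
    try:
        return ORACLE_ROSTER[role]
    except KeyError:
        raise ValueError(f"Unknown role {role!r}; choose from {sorted(ORACLE_ROSTER)}")
```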

Oracle AI Roster: A Capability & Cost Matrix

| Tier | Model | Grade | Expertise | Cost ($/M tokens, In/Out) |
| --- | --- | --- | --- | --- |
| Daily | Gemini 2.5 Pro | S | All-Rounder, Multimodal, Reasoning | ~$2.50 / $15.00 |
| Daily | Claude 4.0 Sonnet | A+ | Code Gen, Synthetic Data, Instruction Following | ~$3.00 / $15.00 |
| Cost-Efficient | Gemini 2.5 Flash | A | Utility Tasks, Summaries, Free Tier | ~$0.60 / $1.80 (Free on Vertex) |
| Specialist | Claude 4.0 Opus | S+ | Exceptionally Hard Coding Problems | ~$15.00 / $75.00 |
| OSS | Kimi-K2-Instruct | A- | Open Source Alternative | (Hardware/Operational) |
| Emerging | Grok-Code | A- | Emerging Code Leader | (~$20/mo Subscription) |

*Note: Costs are approximate as of Sept 2025 and can vary. Grade is a subjective measure of capability for core tasks.

Pillar 2: The Art of Context Engineering

Great context is not ruthless minimization. It's a gentle game of composition, like music.

Your Role: Be a Context Conductor, artfully blending the essential "melody" (core code, docs, tests) with high-value "harmony" (risky ideas, obscure papers) to create a breakthrough moment.

The Conductor's Toolkit: Selecting Good Context

  1. The Map (`map_directory.py`): A high-level overview of the codebase (a minimal sketch appears after this list).
  2. The Curated Bookshelf (`ingest_content.py`): Bundles key local files or remote repos into a single context file.
  3. The Definition of Done (Failing Test): The clearest way to communicate intent.
  4. The 'Gold Standard' (Exemplar Code): An example of the style and quality you want.
  5. The Problem (Error Logs & Diffs): The most direct context for debugging.
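
The talk doesn't show the script's source, so here is a minimal sketch of what `map_directory.py` could look like, using the pathlib/rich/fire stack named in the DNA example below. The skip list and depth limit are illustrative choices, not the author's exact tool:

```python
# map_directory.py -- a minimal sketch of the "Map" context tool.
# Walks a repo and prints a compact tree, skipping noise directories.
from pathlib import Path

import fire
from rich import print
from rich.tree import Tree

SKIP = {".git", "__pycache__", ".venv", "node_modules"}  # illustrative


def build_tree(path: Path, tree: Tree, depth: int, max_depth: int) -> None:
    """Recursively add children of `path` to `tree`, up to `max_depth` levels."""
    if depth >= max_depth:
        return
    for child in sorted(path.iterdir()):
        if child.name in SKIP:
            continue
        if child.is_dir():
            branch = tree.add(f"[bold]{child.name}/[/bold]")
            build_tree(child, branch, depth + 1, max_depth)
        else:
            tree.add(child.name)


def map_directory(root: str = ".", max_depth: int = 3) -> None:
    """Print a high-level tree of `root`, ready to paste into a prompt."""
    root_path = Path(root)
    tree = Tree(f"[bold]{root_path.resolve().name}/[/bold]")
    build_tree(root_path, tree, 0, max_depth)
    print(tree)


if __name__ == "__main__":
    fire.Fire(map_directory)
```

Run it as `python map_directory.py --root . --max_depth 3` and paste the tree into your context.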

Pillar 3: The Collaborative Hub

The solution emerges from a collaborative dialogue. The human is the Strategist, the AI is the Implementer.

  • Your role: Set the goal, conduct the context, and guide the overall direction.
  • The AI's role: Take your guidance and perform the bulk of the code-writing and testing.
  • The `git diff` review isn't to approve an idea; it's to validate the *implementation* of the idea you've already agreed upon.
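
If you want that review step to be unskippable, you can script the pause. A minimal sketch, assuming you are inside a git repository; `self_review.py` is a hypothetical helper, not a tool from the talk:

```python
# self_review.py -- a sketch of forcing the git-diff review step.
# Prints the current diff and asks for explicit approval before you continue.
import subprocess
import sys


def self_review() -> None:
    diff = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    ).stdout
    if not diff:
        print("Working tree clean; nothing to review.")
        return
    print(diff)
    if input("Approve this implementation? [y/N] ").strip().lower() != "y":
        sys.exit("Rejected: revise before running the tests again.")


if __name__ == "__main__":
    self_review()
```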

Crafting Your AI's DNA: The Principles

A great `.cursorrules` file has three layers:

  1. The Persona (The 'Who'): Define the role for the AI. 'You are a senior Python developer.'
  2. The Principles (The 'Why'): Your development philosophy. 'Prioritize readability. Never use relative imports.'
  3. The Process (The 'How'): Your specific workflow. 'Always write a failing test before implementation.'

Crafting Your AI's DNA: An Example

Here's how those principles look in a real file:


# 1. PERSONA
You are Cursor, an AI assistant for Antreas, an expert in ML.

# 2. PRINCIPLES
- Surgical Precision: All code modifications must be direct and necessary.
- Python Stack: Use pathlib, rich, fire.
- Imports: ALWAYS use full absolute imports.
- Value System: 1. Readability, 2. Maintainability...

# 3. PROCESS
- Phase 1: Map the repository (`map_directory.py`).
- Phase 2: Create a failing test or demo script first.
- Phase 3: Modify one file, then `git diff` and self-review.
- Phase 4: Run the test/demo and confirm success from stdout.

The Hackathon Project: From Problem to Prize

The Problem: Most robotics datasets have simple labels ('pick up the cube'), wasting the rich information in the robot's own video.

The Solution: Use a multimodal LLM to watch the video and generate rich, hierarchical instructions, turning a single label into a detailed guide.
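
As a sketch of the idea (not the team's actual pipeline): sample frames from the episode video, then prompt a vision-language model to expand the flat label into a hierarchy. `call_multimodal_llm` is a hypothetical placeholder for whichever multimodal client you use:

```python
# enrich_labels.py -- illustrative sketch of the hackathon idea.
# `call_multimodal_llm` is a hypothetical stand-in for a real
# vision-language API client; wire up your own provider here.
from pathlib import Path


def call_multimodal_llm(prompt: str, frames: list[Path]) -> str:
    """Placeholder: swap in your provider's multimodal client."""
    raise NotImplementedError


def build_prompt(flat_label: str) -> str:
    return (
        f"This robot episode is labeled only: '{flat_label}'.\n"
        "From the attached frames, rewrite it as a hierarchical instruction:\n"
        "1) the high-level goal, 2) ordered sub-steps, 3) object and motion\n"
        "details that are actually visible in the video."
    )


def enrich_episode(frames: list[Path], flat_label: str) -> str:
    """Turn a single flat label into a rich, hierarchical instruction."""
    return call_multimodal_llm(build_prompt(flat_label), frames)
```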

The Outcome: 7th place out of 1000+ teams. View the demo and the official results.

Presenter Cue: The links on this slide are clickable. I encourage you to open the demo link now to have it ready.

Case Study: A 90-Minute Hackathon Sprint

The Method: I applied the Development Loop under pressure.

  • Plan: Discussed the API with Gemini & wrote a failing test.
  • Implement: Used Codex to write code to pass the test.
  • Review: Validated the `git diff`.
  • Document & Demo: The demo evolved from a simple CLI to a full browser-based interface for richer critique.

I just kept turning the crank on this loop for 90 minutes. Now, let's watch the demo...

Audience Quickstart Checklist

  1. Draft Your Oracle Roster: Consciously choose and document your models for planning, coding, and critique.
  2. Build Your Context Toolkit: Use scripts like `map_directory.py` and a `state_of_build.md`.
  3. Define Your AI's DNA: Start a `.cursorrules` file with a simple persona.
  4. Run a 90-Minute Sprint: Pick a small feature and practice the orchestration. The goal is practice, not perfection.
  5. Ship & Retro: Ship an artifact, then ask: What context was missing? Which model struggled?