AI Superpowers

A Strategic Framework for High-Velocity Research

Antreas Antoniou

The Researcher's Dilemma: The Time Barrier & The Quality Trade-off

Turning a research idea into a robust experiment is fundamentally slow, often taking days, not hours. This time barrier is a major bottleneck.

To go faster, we are forced to compromise on code quality, leading to technical debt that slows down future progress.

The question: Can AI shorten the cycle from days to hours AND break the speed vs. quality trade-off?

The Solution: The AI-Accelerated Development Loop

A structured, iterative loop that integrates AI as a partner at every stage:

  1. Plan: Discuss with AI, document the goal, and write failing tests (see the test sketch after this list).
  2. Implement & Fix: Use AI to generate code to pass the tests in a tight loop.
  3. Review: Conduct a rigorous `git diff` review once tests pass.
  4. Document: Capture the trajectory and outcomes for future context.
  5. Demo & Critique: Visually inspect the result with the AI to find the next goal.
  6. Evaluate & Repeat: Assess the outcome and begin the loop again.
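
To make the "failing test" in step 1 concrete, here is a minimal sketch: a test written before the code it exercises exists, so it fails first and serves as the definition of done. The module `myproject.rewards` and the function `normalize_rewards` are hypothetical illustrations, not artifacts from the talk:

```python
# test_rewards.py -- written BEFORE the implementation, so it fails first
# and defines "done" for the AI. `myproject.rewards` and `normalize_rewards`
# are hypothetical names used only to illustrate the pattern.
import pytest

from myproject.rewards import normalize_rewards  # does not exist yet -> fails


def test_normalized_rewards_have_zero_mean():
    out = normalize_rewards([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(out) / len(out)) < 1e-8


def test_empty_input_is_rejected():
    with pytest.raises(ValueError):
        normalize_rewards([])
```

Running `pytest test_rewards.py` now fails on the import, which is the point: the AI's job in step 2 is to make it pass.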

Pillar 1: Strategic Oracle Selection

Don't use a single generalist model. Build a team of specialists.

  • The Architect (e.g., Gemini Pro): For high-level planning, brainstorming APIs, and structuring the project.
  • The Builder (e.g., Claude 4.0 Sonnet): A powerful and cost-effective model for the nuts-and-bolts implementation.
  • The Specialist (e.g., Claude Opus): For the truly hard problems where deep reasoning is required.
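
One lightweight way to document this roster in a project is a small config mapping roles to model identifiers. A sketch only: the model-ID strings below are placeholders, not official API names, and should be swapped for your provider's actual identifiers:

```python
# oracle_roster.py -- a sketch of documenting your model roster as code.
# Role names follow the talk; the ID strings are illustrative placeholders.
ORACLE_ROSTER = {
    "architect": "gemini-2.5-pro",    # planning, API brainstorming
    "builder": "claude-4.0-sonnet",   # day-to-day implementation
    "specialist": "claude-4.0-opus",  # genuinely hard problems only
    "utility": "gemini-2.5-flash",    # summaries, cheap glue tasks
}


def pick_model(role: str) -> str:
    """Return the model ID for a role, failing loudly on unknown roles."""
    try:
        return ORACLE_ROSTER[role]
    except KeyError:
        raise ValueError(f"Unknown role {role!r}; choose from {sorted(ORACLE_ROSTER)}")
```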

Oracle AI Roster: A Capability & Cost Matrix

| Tier | Model | Grade | Expertise | Cost ($/M tokens, In/Out) |
| --- | --- | --- | --- | --- |
| Daily | Gemini 2.5 Pro | S | All-Rounder, Multimodal, Reasoning | ~$2.50 / $15.00 |
| Daily | Claude 4.0 Sonnet | A+ | Code Gen, Synthetic Data, Instruction Following | ~$3.00 / $15.00 |
| Cost-Efficient | Gemini 2.5 Flash | A | Utility Tasks, Summaries, Free Tier | ~$0.60 / $1.80 (Free on Vertex) |
| Specialist | Claude 4.0 Opus | S+ | Exceptionally Hard Coding Problems | ~$15.00 / $75.00 |
| OSS | Kimi-K2-Instruct | A- | Open Source Alternative | (Hardware/Operational) |
| Emerging | Grok-Code | A- | Emerging Code Leader | (~$20/mo Subscription) |

*Note: Costs are approximate as of Sept 2025 and can vary. Grade is a subjective measure of capability for core tasks.

Pillar 2: The Art of Context Engineering

Great context is not ruthless minimization. It's a gentle game of composition, like music.

Your Role: Be a Context Conductor, artfully blending the essential "melody" (core code, docs, tests) with high-value "harmony" (risky ideas, obscure papers) to create a breakthrough moment.

The Conductor's Toolkit: Selecting Good Context

  1. The Map (`map_directory.py`): A high-level overview of the codebase (a minimal sketch appears after this list).
  2. The Curated Bookshelf (`ingest_content.py`): Bundles key local files or remote repos into a single context file.
  3. The Definition of Done (Failing Test): The clearest way to communicate intent.
  4. The 'Gold Standard' (Exemplar Code): An example of the style and quality you want.
  5. The Problem (Error Logs & Diffs): The most direct context for debugging.
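
The talk doesn't show the script's source, so here is a minimal sketch of what `map_directory.py` could look like, using the pathlib/rich/fire stack named in the DNA example below. The skip list and depth limit are illustrative choices, not the author's exact tool:

```python
# map_directory.py -- a minimal sketch of the "Map" context tool.
# Walks a repo and prints a compact tree, skipping noise directories.
from pathlib import Path

import fire
from rich import print
from rich.tree import Tree

SKIP = {".git", "__pycache__", ".venv", "node_modules"}  # illustrative


def build_tree(path: Path, tree: Tree, depth: int, max_depth: int) -> None:
    """Recursively add children of `path` to `tree`, up to `max_depth` levels."""
    if depth >= max_depth:
        return
    for child in sorted(path.iterdir()):
        if child.name in SKIP:
            continue
        if child.is_dir():
            branch = tree.add(f"[bold]{child.name}/[/bold]")
            build_tree(child, branch, depth + 1, max_depth)
        else:
            tree.add(child.name)


def map_directory(root: str = ".", max_depth: int = 3) -> None:
    """Print a high-level tree of `root`, ready to paste into a prompt."""
    root_path = Path(root)
    tree = Tree(f"[bold]{root_path.resolve().name}/[/bold]")
    build_tree(root_path, tree, 0, max_depth)
    print(tree)


if __name__ == "__main__":
    fire.Fire(map_directory)
```

Run it as `python map_directory.py --root . --max_depth 3` and paste the tree into your context.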

Pillar 3: The Collaborative Hub

The solution emerges from a collaborative dialogue. The human is the Strategist, the AI is the Implementer.

  • Your role: Set the goal, conduct the context, and guide the overall direction.
  • The AI's role: Take your guidance and perform the bulk of the code-writing and testing.
  • The `git diff` review isn't to approve an idea; it's to validate the *implementation* of the idea you've already agreed upon.
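
If you want that review step to be unskippable, you can script the pause. A minimal sketch, assuming you are inside a git repository; `self_review.py` is a hypothetical helper, not a tool from the talk:

```python
# self_review.py -- a sketch of forcing the git-diff review step.
# Prints the current diff and asks for explicit approval before you continue.
import subprocess
import sys


def self_review() -> None:
    diff = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    ).stdout
    if not diff:
        print("Working tree clean; nothing to review.")
        return
    print(diff)
    if input("Approve this implementation? [y/N] ").strip().lower() != "y":
        sys.exit("Rejected: revise before running the tests again.")


if __name__ == "__main__":
    self_review()
```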

Crafting Your AI's DNA: The Principles

A great `.cursorrules` file has three layers:

  1. The Persona (The 'Who'): Define the role for the AI. 'You are a senior Python developer.'
  2. The Principles (The 'Why'): Your development philosophy. 'Prioritize readability. Never use relative imports.'
  3. The Process (The 'How'): Your specific workflow. 'Always write a failing test before implementation.'

Crafting Your AI's DNA: An Example

Here's how those principles look in a real file:


# 1. PERSONA
You are Cursor, an AI assistant for Antreas, an expert in ML.

# 2. PRINCIPLES
- Surgical Precision: All code modifications must be direct and necessary.
- Python Stack: Use pathlib, rich, fire.
- Imports: ALWAYS use full absolute imports.
- Value System: 1. Readability, 2. Maintainability...

# 3. PROCESS
- Phase 1: Map the repository (`map_directory.py`).
- Phase 2: Create a failing test or demo script first.
- Phase 3: Modify one file, then `git diff` and self-review.
- Phase 4: Run the test/demo and confirm success from stdout.

The Hackathon Project: From Problem to Prize

The Problem: Most robotics datasets have simple labels ('pick up the cube'), wasting the rich information in the robot's own video.

The Solution: Use a multimodal LLM to watch the video and generate rich, hierarchical instructions, turning a single label into a detailed guide.
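
As a sketch of the idea (not the team's actual pipeline): sample frames from the episode video, then prompt a vision-language model to expand the flat label into a hierarchy. `call_multimodal_llm` is a hypothetical placeholder for whichever multimodal client you use:

```python
# enrich_labels.py -- illustrative sketch of the hackathon idea.
# `call_multimodal_llm` is a hypothetical stand-in for a real
# vision-language API client; wire up your own provider here.
from pathlib import Path


def call_multimodal_llm(prompt: str, frames: list[Path]) -> str:
    """Placeholder: swap in your provider's multimodal client."""
    raise NotImplementedError


def build_prompt(flat_label: str) -> str:
    return (
        f"This robot episode is labeled only: '{flat_label}'.\n"
        "From the attached frames, rewrite it as a hierarchical instruction:\n"
        "1) the high-level goal, 2) ordered sub-steps, 3) object and motion\n"
        "details that are actually visible in the video."
    )


def enrich_episode(frames: list[Path], flat_label: str) -> str:
    """Turn a single flat label into a rich, hierarchical instruction."""
    return call_multimodal_llm(build_prompt(flat_label), frames)
```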

The Outcome: 7th place out of 1000+ teams. View the demo and the official results.

Presenter Cue: The links on this slide are clickable. I encourage you to open the demo link now to have it ready.

Case Study: A 90-Minute Hackathon Sprint

The Method: I applied the Development Loop under pressure.

  • Plan: Discussed the API with Gemini & wrote a failing test.
  • Implement: Used Codex to write code to pass the test.
  • Review: Validated the `git diff`.
  • Document & Demo: The demo evolved from a simple CLI to a full browser-based interface for richer critique.

I just kept turning the crank on this loop for 90 minutes. Now, let's watch the demo...

Audience Quickstart Checklist

  1. Draft Your Oracle Roster: Consciously choose and document your models for planning, coding, and critique.
  2. Build Your Context Toolkit: Use scripts like `map_directory.py` and a `state_of_build.md`.
  3. Define Your AI's DNA: Start a `.cursorrules` file with a simple persona.
  4. Run a 90-Minute Sprint: Pick a small feature and practice the orchestration. The goal is practice, not perfection.
  5. Ship & Retro: Ship an artifact, then ask: What context was missing? Which model struggled?