A Strategic Framework for High-Velocity Research
Antreas Antoniou
Turning a research idea into a robust experiment is fundamentally slow, often taking days, not hours. This time barrier is a major bottleneck.
To go faster, we are forced to compromise on code quality, leading to technical debt that slows down future progress.
The question: Can AI shorten the cycle from days to hours AND break the speed vs. quality trade-off?
A structured, iterative loop that integrates AI as a partner at every stage:
Don't use a single generalist model. Build a team of specialists.
| Tier | Model | Grade | Expertise | Cost ($/M tokens In/Out) |
|---|---|---|---|---|
| Daily | Gemini 2.5 Pro | S | All-Rounder, Multimodal, Reasoning | ~$2.50 / $15.00 |
| Daily | Claude 4.0 Sonnet | A+ | Code Gen, Synthetic Data, Instruction Following | ~$3.00 / $15.00 |
| Cost-Effective | Gemini 2.5 Flash | A | Utility Tasks, Summaries, Free Tier | ~$0.60 / $1.80 (Free on Vertex) |
| Specialist | Claude 4.0 Opus | S+ | Exceptionally Hard Coding Problems | ~$15.00 / $75.00 |
| OSS | Kimi-K2-Instruct | A- | Open Source Alternative | (Hardware/Operational) |
| Emerging | Grok-Code | A- | Emerging Code Leader | (~$20/mo Subscription) |
*Note: Costs are approximate as of Sept 2025 and can vary. Grade is a subjective measure of capability for core tasks.*
Great context is not ruthless minimization. It's a gentle game of composition, like music.
Your Role: Be a Context Conductor, artfully blending the essential "melody" (core code, docs, tests) with high-value "harmony" (risky ideas, obscure papers) to create a breakthrough moment.
The solution emerges from a collaborative dialogue. The human is the Strategist, the AI is the Implementer.
A great .cursorrules file has three layers:
Here's how those principles look in a real file:
# 1. PERSONA
You are Cursor, an AI assistant for Antreas, an expert in ML.
# 2. PRINCIPLES
- Surgical Precision: All code modifications must be direct and necessary.
- Python Stack: Use pathlib, rich, fire.
- Imports: ALWAYS use full absolute imports.
- Value System: 1. Readability, 2. Maintainability...
# 3. PROCESS
- Phase 1: Map the repository (`map_directory.py`).
- Phase 2: Create a failing test or demo script first.
- Phase 3: Modify one file, then `git diff` and self-review.
- Phase 4: Run the test/demo and confirm success from stdout.
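Phase 1 references a `map_directory.py` helper. The actual script isn't shown in the talk, so here is a minimal sketch of what such a repository-mapping utility could look like, using only `pathlib` (per the stack rules above); the skip list and depth limit are assumptions:

```python
# Hypothetical sketch of a map_directory.py-style helper (Phase 1):
# walk a repo and emit a compact indented tree the AI can ingest as context.
from pathlib import Path

# Directories we assume should be excluded from the map.
SKIP = {".git", "__pycache__", ".venv", "node_modules"}

def map_directory(root: Path, max_depth: int = 3) -> list[str]:
    """Return indented lines describing the tree under `root`."""
    lines: list[str] = []

    def walk(dir_path: Path, depth: int) -> None:
        if depth > max_depth:
            return
        for entry in sorted(dir_path.iterdir()):
            # Skip hidden files and known noise directories.
            if entry.name in SKIP or entry.name.startswith("."):
                continue
            lines.append("  " * depth + entry.name + ("/" if entry.is_dir() else ""))
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 0)
    return lines

if __name__ == "__main__":
    print("\n".join(map_directory(Path("."))))
```

The output is deliberately terse: a shallow, sorted tree is cheap in tokens but usually enough for the model to ask for the right files.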
The Problem: Most robotics datasets have simple labels ('pick up the cube'), wasting the rich information in the robot's own video.
The Solution: Use a multimodal LLM to watch the video and generate rich, hierarchical instructions, turning a single label into a detailed guide.
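The relabeling idea can be sketched in a few lines. Everything here is an assumption about the shape of the pipeline, not the competition code: `call_multimodal_llm` is a hypothetical stand-in for whatever multimodal API you use, and the JSON schema is one plausible way to structure "hierarchical instructions":

```python
# Hedged sketch: expand one flat label ('pick up the cube') into a
# hierarchical instruction by prompting a multimodal model on the episode video.
import json

PROMPT = """Watch this robot episode (original label: "{label}").
Return JSON with keys:
  "goal": one-sentence task summary,
  "subtasks": ordered list of short sub-goals,
  "steps": fine-grained motion descriptions."""

def relabel_episode(video_path: str, label: str, call_multimodal_llm) -> dict:
    """Turn a flat label into a hierarchical instruction dict.

    `call_multimodal_llm` is a hypothetical callable (video, prompt) -> str.
    """
    raw = call_multimodal_llm(video=video_path, prompt=PROMPT.format(label=label))
    return json.loads(raw)

# Stubbed model, only to show the output shape:
def fake_llm(video, prompt):
    return json.dumps({
        "goal": "Pick up the cube and place it in the bin.",
        "subtasks": ["approach cube", "grasp cube", "move to bin", "release"],
        "steps": ["open gripper", "lower arm", "close gripper", "..."],
    })

if __name__ == "__main__":
    instr = relabel_episode("episode_0001.mp4", "pick up the cube", fake_llm)
    print(instr["goal"])
```

The design point is that the video, not the label, carries the information; the model is only asked to transcribe structure that is already visible.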
The Outcome: 7th place out of 1000+ teams. View the demo and the official results.
Presenter Cue: The links on this slide are clickable. I encourage you to open the demo link now to have it ready.
The Method: I applied the Development Loop under pressure.
I just kept turning the crank on this loop for 90 minutes. Now, let's watch the demo...