o3-mini is insane at simulating computations

OK, this is wild. I just watched o3-mini (regular) precisely simulate (calculate?) the output of a fairly complicated computation. Well, complicated for a human, at least… and no, it didn’t use the code interpreter (more on that later).

The prompt

It all started with a prompt like this (sic!, sorry for the typos):

How deoes TfidfVectorizer behave with {“use_idf”: True, “binary”: True} ? Please create a toy samples to illustrate that behavior in different edge cases to exemplify. Think about all possible cases. I want to know how both of thos optins behave. That means I want to understand how the vectortizer works with and without them.

As you can see, I needed to explain the intricacies of TfidfVectorizer to someone, so I asked for code that would help me illustrate its behavior.

The model reasoned for 8 seconds and produced the following piece of code (for brevity I am omitting both the reasoning and the full response):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Define a toy corpus with edge cases:
corpus = [
    "apple apple orange",            # 'apple' repeated, 'orange' once
    "apple banana banana banana",    # 'banana' repeated, 'apple' once
    "banana orange",                 # both appear only once
    "kiwi"                           # a document with a unique term
]

# 1. Default: use_idf=True, binary=False
vec_default = TfidfVectorizer(use_idf=True, binary=False)
tfidf_default = vec_default.fit_transform(corpus)
print("Default (use_idf=True, binary=False):")
print("Features:", vec_default.get_feature_names_out())
print("TF-IDF matrix:\n", tfidf_default.toarray())

# 2. Binary: use_idf=True, binary=True
vec_binary = TfidfVectorizer(use_idf=True, binary=True)
tfidf_binary = vec_binary.fit_transform(corpus)
print("\nBinary (use_idf=True, binary=True):")
print("Features:", vec_binary.get_feature_names_out())
print("TF-IDF matrix:\n", tfidf_binary.toarray())

# 3. For comparison: Disabling idf with binary=True (i.e. use raw binary counts)
vec_no_idf = TfidfVectorizer(use_idf=False, binary=True)
tf_no_idf = vec_no_idf.fit_transform(corpus)
print("\nBinary, no idf (use_idf=False, binary=True):")
print("Features:", vec_no_idf.get_feature_names_out())
print("TF matrix (with binary counts):\n", tf_no_idf.toarray())

The Simulation

I’m used to working with 4o and its code interpreter, so purely out of habit I asked o3-mini to run that code, not realizing that the code interpreter isn’t available there. However, what it produced surprised me greatly!
The model reasoned for just 18 seconds and produced this:

Below is the simulated output of running the code. Note that the actual numerical values may have small variations due to floating‐point precision and the internal workings of scikit‐learn, but the structure will be as shown.

Default (use_idf=True, binary=False):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF-IDF matrix:
 [[0.89442719 0.         0.         0.4472136 ]
  [0.31622777 0.9486833  0.         0.        ]
  [0.         0.70710678 0.         0.70710678]
  [0.         0.         1.         0.        ]]

Binary (use_idf=True, binary=True):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF-IDF matrix:
 [[0.70710678 0.         0.         0.70710678]
  [0.70710678 0.70710678 0.         0.        ]
  [0.         0.70710678 0.         0.70710678]
  [0.         0.         1.         0.        ]]

Binary, no idf (use_idf=False, binary=True):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF matrix (with binary counts):
 [[0.70710678 0.         0.         0.70710678]
  [0.70710678 0.70710678 0.         0.        ]
  [0.         0.70710678 0.         0.70710678]
  [0.         0.         1.         0.        ]]

Before it finished generating that, I realized I had to run the code myself, so I switched to another window…

The actual output

The actual output I got from running the original code is practically identical:

Default (use_idf=True, binary=False):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF-IDF matrix:
 [[0.89442719 0.         0.         0.4472136 ]
 [0.31622777 0.9486833  0.         0.        ]
 [0.         0.70710678 0.         0.70710678]
 [0.         0.         1.         0.        ]]
Binary (use_idf=True, binary=True):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF-IDF matrix:
 [[0.70710678 0.         0.         0.70710678]
 [0.70710678 0.70710678 0.         0.        ]
 [0.         0.70710678 0.         0.70710678]
 [0.         0.         1.         0.        ]]
Binary, no idf (use_idf=False, binary=True):
Features: ['apple' 'banana' 'kiwi' 'orange']
TF matrix (with binary counts):
 [[0.70710678 0.         0.         0.70710678]
 [0.70710678 0.70710678 0.         0.        ]
 [0.         0.70710678 0.         0.70710678]
 [0.         0.         1.         0.        ]]
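A small aside on the numbers themselves: the last two matrices are identical in this toy corpus. With binary=True every term frequency becomes 1, and the terms that co-occur within each document here happen to share the same idf, so the per-document l2 normalization cancels the idf factor entirely. A quick sanity check (a sketch reusing tfidf_binary and tf_no_idf from the code above):

# Sketch: in this particular corpus the co-occurring terms share the same idf,
# so l2 normalization cancels it and both binary variants coincide.
print(np.allclose(tfidf_binary.toarray(), tf_no_idf.toarray()))  # True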

Honestly, I have no idea how it managed to pull that off, but now I know one thing for sure: if we are living in a simulation, it is not running on old 20th-century hardware, and in that case we can’t tell the difference.
