Where Do Rule-Based Systems Break Down?
Bank transfers and tax calculations are deterministic systems: the same input always produces the same output. The conditions are clear and the range of inputs is bounded, so conventional code handles them reliably. The trouble starts in areas where rules are blurry, like recognizing a cat in a photo. A rule like "pointy ears mean cat" collapses on a fold-eared Scottish Fold, and covering every combination of angle, lighting, breed, and pose triggers a combinatorial explosion in the number of rules.
Maintenance is another problem. The spam filters of the 1990s are the classic case: block the word "free" and senders evade it with variants like "fr3e" or "f.r.e.e". Each added rule strengthens the system only against patterns it has already seen, so it still fails on every new variation. The rulebook grows larger and more fragile at the same time.
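The brittleness of a keyword rule can be sketched in a few lines. This is a minimal illustration, not a reconstruction of any real 1990s filter: the blocklist, the function name is_spam_by_rule, and the sample messages are all made up for this example.

```python
# Hypothetical one-word blocklist, standing in for a hand-written rulebook.
BLOCKED_WORDS = {"free"}


def is_spam_by_rule(message: str) -> bool:
    """Flag a message if any blocked word appears as a whole word."""
    words = message.lower().split()
    # Strip punctuation from the ends of each word before comparing.
    return any(word.strip(".,!?") in BLOCKED_WORDS for word in words)


# The rule catches the exact pattern it was written for...
print(is_spam_by_rule("Get your FREE prize now"))      # True
# ...but obfuscated spellings slip straight through.
print(is_spam_by_rule("Get your fr3e prize now"))      # False
print(is_spam_by_rule("Get your f.r.e.e prize now"))   # False
```

Catching the two misses means adding two more rules, and the next spelling variant needs yet another one.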
What Did Machine Learning Flip?
Machine learning inverts the premise. Traditional programming combines data with rules to produce results; machine learning combines data with answers (labels) to produce the rules, that is, a model. Train it on spam and legitimate mail, and the training process finds the mathematical boundary separating the two classes on its own; when a new mail arrives, the model answers with a probability, like "93% likely spam." In the 2000s, recommendation systems, credit scoring, and card-fraud detection were put to practical use this way.
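To make the inversion concrete, here is a minimal sketch of a learned spam filter. It uses naive Bayes, which is one simple technique chosen for illustration and not necessarily what any production filter used. The six training messages and the function names train and spam_probability are invented for this example.

```python
import math
from collections import Counter

# Tiny made-up training set: (message, label). Purely illustrative.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting moved to monday morning", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(examples):
    """Count words per label; these counts play the role of learned rules."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def spam_probability(text, model):
    """Return P(spam | text) under naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        log_score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = word_counts[label][word] + 1
            log_score += math.log(smoothed / (total_words + len(vocabulary)))
        log_scores[label] = log_score
    # Normalize the two log scores into a probability between 0 and 1.
    highest = max(log_scores.values())
    weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
    return weights["spam"] / sum(weights.values())


model = train(TRAINING_DATA)
for mail in ("claim your free prize", "report for the monday meeting"):
    print(f"{mail!r}: {spam_probability(mail, model):.0%} likely spam")
```

Nobody wrote a rule about the word "prize" here. The weight it carries comes entirely from the labeled examples, and the output is a probability, not a yes or no.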
But which features to look at still had to be decided by humans. The quality of feature engineering (word frequencies, sender domains, delinquency counts) largely determined model performance, and this manual work hit its limits on unstructured data like images and audio. Deep learning addressed this by handing feature extraction over to the model as well.
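Hand-built feature engineering can look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name extract_features, the watched domain, and the choice of four features are hypothetical, not taken from any real system.

```python
import string

# Hypothetical list of sender domains a human analyst chose to treat as a signal.
WATCHED_DOMAINS = {"example-promo.test"}


def extract_features(sender: str, body: str) -> dict:
    """Turn one raw email into the numbers a classic model would be fed."""
    words = [w.strip(string.punctuation) for w in body.lower().split()]
    domain = sender.rsplit("@", 1)[-1].lower()
    return {
        "count_free": words.count("free"),          # word frequency
        "count_exclamation": body.count("!"),       # punctuation signal
        "sender_domain_watched": int(domain in WATCHED_DOMAINS),
        "word_count": len(words),
    }


features = extract_features("deals@example-promo.test", "FREE gift! Claim it now!")
print(features)
# {'count_free': 1, 'count_exclamation': 2, 'sender_domain_watched': 1, 'word_count': 5}
```

Every key in that dictionary is a human decision. For text, a person can guess which signals matter; for the raw pixels of a photo, there is no comparably obvious list to write down.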
Why Did Deep Learning Only Take Off in 2012?
The concept of artificial neural networks already existed in the 1980s. It exploded only in 2012 because three things finally came together at once: data, in the form of ImageNet (2009), whose competition subset offered about 1.2 million training images labeled across 1,000 categories; compute, in the form of GPUs, born for game graphics but a precise fit for parallel matrix multiplication (CUDA was introduced in 2006); and accessible tooling, meaning early open-source libraries plus cloud services renting GPUs by the hour. Frameworks like TensorFlow (2015) and PyTorch (2016) arrived a few years later and widened access further.
That year, AlexNet recorded a top-5 error rate of 15.3% in the ImageNet competition, beating the runner-up (26.2%) by more than 10 percentage points. In 2015, ResNet, with 152 layers, drove the top-5 error rate down to 3.57%, making it the first competition winner to dip below the roughly 5% error rate often cited for humans. Speech recognition, translation, AlphaGo, and medical imaging followed, with the same approach overtaking existing techniques one by one.
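For readers unfamiliar with the metric: a top-5 prediction counts as correct if the true label appears anywhere among the model's five highest-ranked guesses. The sketch below shows the calculation on four made-up images; the function name top5_error_rate and the sample labels are assumptions for illustration, not the competition's actual evaluation code.

```python
def top5_error_rate(ranked_guesses, true_labels):
    """Fraction of images whose true label is missing from the top five guesses."""
    if len(ranked_guesses) != len(true_labels):
        raise ValueError("need exactly one true label per image")
    misses = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth not in guesses[:5]
    )
    return misses / len(true_labels)


# Made-up model output: each row lists one image's guesses, best first.
ranked_guesses = [
    ["cat", "lynx", "fox", "dog", "tiger"],   # truth "cat": hit at rank 1
    ["dog", "wolf", "fox", "cat", "lynx"],    # truth "lynx": hit at rank 5
    ["car", "truck", "bus", "van", "tram"],   # truth "bicycle": miss
    ["fox", "dog", "wolf", "cat", "lynx"],    # truth "wolf": hit at rank 3
]
true_labels = ["cat", "lynx", "bicycle", "wolf"]

error = top5_error_rate(ranked_guesses, true_labels)
print(f"top-5 error: {error:.1%}")  # top-5 error: 25.0%

# The 2012 gap quoted above, in percentage points.
gap = round(26.2 - 15.3, 1)
print(gap)  # 10.9
```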
What Should Non-Developers Take From This History?
The takeaway for non-developers is this: today's AI is a machine that outputs probabilities, not rules. That is why the first question SH Consulting asks when decomposing work in AX consulting is whether a task is solved by rules or by patterns. Send settlement and transfers to code, judgment and classification to models, and half the question of what to delegate to AI is already settled.
The deep learning models of that era, however, were task-specific. A translation model could not read medical images, and every new task required labeled data built from scratch. The question of whether one model could handle many kinds of work with language at the center is what leads to today's LLMs.