During her presentation at QCon London, Ines Montani, co-founder and CEO of explosion.ai (the maker of spaCy), stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.
Montani opened her presentation by asking for a show of hands to identify the open-source users in the audience. The vast majority raised their hands, demonstrating that open source is ubiquitous (“it would be easier to ask who doesn’t use open-source”). She pointed out the multiple benefits of open-source software (OSS), underlining “zero cost” as an additional advantage.
Although machine learning (ML) involves more than just code, there is a growing ecosystem of open-source models that share everything (code, data and weights). Generally, the models can be split into three categories:
- Task-specific models – focused on a single domain (spaCy, DaCy, Sci, Stanza and Hugging Face models). They are usually small, especially by today’s standards, fast, and very cheap to run but “they don’t generalise well” – additional data is needed for fine-tuning them for a specific use case.
- Encoder models – a “bit bigger”, and still in need of some fine-tuning data if “we want something specific”. Examples such as BERT and its local variations, ELECTRA and T5 are usually the base for task-specific models with an additional degree of specialisation. They generalise better than task-specific models and, even though they require more resources, they can still be run in-house.
- Large generative models – developed from encoder models, they form a category that evolves quickly. Innovation in the space was “shockingly simple: make the models bigger, a lot bigger”. Examples in this category are models like Falcon, Mixtral, Qwen, Smaug and “different plays on llamas and alpacas”.
The “very significant” difference between encoder models and large generative models is that the former have a task-specific network that outputs structured, domain-specific data. The latter rely on prompting to narrow the problem and output free text that still needs to be parsed.
The first two categories have been used in production for quite some time, while large generative models require more resources, making them harder to deploy for most companies.
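The contrast between structured output and free text that needs parsing can be illustrated with a minimal Python sketch. It is not taken from the talk: both model functions are hypothetical stand-ins with hard-coded results, and the JSON reply format is an assumption about what a prompt might request.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str


def task_specific_output() -> list[Entity]:
    # Hypothetical stand-in for a task-specific model:
    # the prediction already arrives as structured data.
    return [Entity("London", "GPE"), Entity("Ines Montani", "PERSON")]


def generative_output() -> str:
    # Hypothetical stand-in for a generative model's reply to a prompt:
    # free text that happens to contain the requested JSON.
    return (
        "Here are the entities I found:\n"
        '[{"text": "London", "label": "GPE"}, '
        '{"text": "Ines Montani", "label": "PERSON"}]'
    )


def parse_free_text(reply: str) -> list[Entity]:
    # Extra step needed for the generative route: locate and validate
    # the structured part inside the free-text reply.
    start = reply.find("[")
    end = reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model reply")
    items = json.loads(reply[start : end + 1])
    return [Entity(item["text"], item["label"]) for item in items]


if __name__ == "__main__":
    print(task_specific_output())
    print(parse_free_text(generative_output()))
```

In the sketch both routes end with the same entities, but only the generative route needs a parsing step that can fail when the reply does not follow the requested format.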
Montani continued the presentation with a rhetorical question:
“How can companies take something freely available and ask money for it? How does that make sense? […] Well, the reason is economies of scale.”
The scale of large technology companies gives them multiple advantages (e.g. access to talent, compute at wholesale prices, etc.). In the context of AI, the ability to batch requests is an especially valuable one, given the “incredible” parallelisation capabilities of GPUs. Batching is similar to train schedules: “In a small town, it is not viable for trains to come every five minutes, while in London trains and tubes run all the time and there are always enough people there”.
Montani underlined the importance of understanding the difference between the human-facing application and the machine-facing model. The human-facing applications are plain applications that relay the end user’s request to the model and ensure that the output is safe to consume. By decoupling the two layers of the product, companies like OpenAI ensured that their large generative models could be swapped at any moment. The data these companies collect relates to consumer habits and requests; it can help improve the human-facing application (its UX/UI, features, guardrails, etc.) but can’t be used for training or tweaking the model. So, she pointed out, the company is the leader of the “human virtual assistant space” but not of the large generative models space. The latter is based on freely available research and trained on freely available data.
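The train analogy can be made concrete with a little arithmetic. The sketch below is an illustration only: the batch size and request rates are made-up numbers, not figures from the talk, and it assumes requests arrive at a steady rate.

```python
from typing import Iterator, Sequence


def seconds_to_fill_batch(batch_size: int, requests_per_second: float) -> float:
    # Time the first request in a batch waits before the batch is full,
    # assuming a steady arrival rate (a simplifying assumption).
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return batch_size / requests_per_second


def make_batches(requests: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    # Group queued requests so that each group can be processed in one pass.
    for start in range(0, len(requests), batch_size):
        yield list(requests[start : start + batch_size])


if __name__ == "__main__":
    batch_size = 32  # illustrative value
    for name, rate in [("small provider", 2.0), ("large provider", 200.0)]:
        wait = seconds_to_fill_batch(batch_size, rate)
        print(f"{name}: {wait:.2f} s to fill a batch of {batch_size}")
```

With these assumed numbers, the small provider either waits 16 seconds to fill a batch or runs it mostly empty, while the large provider fills one in a fraction of a second.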
Montani classified the usual tasks of large generative models in practice:
- Generative tasks: Summarisation of single or multiple documents, reasoning, problem-solving, question answering, paraphrasing, style transfer
- Predictive tasks: Text classification, entity recognition, relation extraction, coreference resolution, grammar and morphology, semantic parsing, discourse structure. These tasks produce more structured data.
She went on to say that the underlying problem remains “how to tell computers what to do”, and that the way we answer it keeps evolving. It started with humans writing rules or instructions in the “programming” era, moved to “programming by example” during the “ML era”, and has now arrived at a mixture of rules, instructions and examples with “in-context learning”.
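The three ways of telling a computer what to do can be put side by side in a small Python sketch. Everything in it is an assumption made for illustration: the complaint-detection task, the example sentences, the toy word-count classifier standing in for a trained model, and the prompt layout.

```python
from collections import Counter

# Labelled examples reused by approaches 2 and 3 (made up for this sketch).
EXAMPLES = [
    ("I want a refund for this order", "complaint"),
    ("The package arrived broken", "complaint"),
    ("What are your opening hours", "other"),
    ("Do you ship to Norway", "other"),
]


def classify_by_rule(text: str) -> str:
    # 1. Programming: a human writes the rule.
    return "complaint" if "refund" in text.lower() else "other"


def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    # 2. Programming by example: word counts per label are "learned"
    # from labelled data (a toy stand-in for a real ML model).
    counts: dict[str, Counter] = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def classify_by_example(model: dict[str, Counter], text: str) -> str:
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))


def build_prompt(instruction: str, examples: list[tuple[str, str]], text: str) -> str:
    # 3. In-context learning: instruction and examples go into the prompt
    # that would be sent to a generative model (no model is called here).
    lines = [instruction, ""]
    for example_text, label in examples:
        lines.append(f"Text: {example_text}\nLabel: {label}")
    lines.append(f"Text: {text}\nLabel:")
    return "\n".join(lines)


if __name__ == "__main__":
    query = "My order arrived broken"
    print(classify_by_rule(query))
    print(classify_by_example(train(EXAMPLES), query))
    print(build_prompt("Label each text as complaint or other.", EXAMPLES, query))
```

In this toy setup the hand-written rule misses the query because it never mentions a refund, while the example-based classifier picks it up from the labelled data; the prompt carries the same examples but leaves the decision to a model.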
Even if they are labour-intensive, these approaches are powerful, and large generative models can help skip the “cold-start problem” and build the prototype. Knowledge distillation and transfer learning then allow engineers to iteratively evolve their solution until it is as accurate and efficient as the task requires, and to move from prototype to production very rapidly. The resulting system has benefits similar to those of open-source software development: it is modular, free of vendor lock-in, testable, extensible, flexible, efficient (some models run on a CPU with sizes as small as 10 MB), private, programmable, predictable and transparent.
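A heavily simplified sketch of that workflow follows. It is not the method shown in the talk: the "teacher" is a hard-coded function standing in for a prompted large generative model, the "student" is a toy word-count classifier, and all texts are invented. The point is only the shape of the loop: the large model labels raw text, a small model is trained on those labels, and the two are compared on held-out text.

```python
from collections import Counter
from typing import Callable


def teacher_label(text: str) -> str:
    # Hypothetical stand-in for a large generative model prompted to label
    # text; hard-coded so that the sketch runs offline.
    markers = ("broken", "refund", "late", "damaged")
    return "complaint" if any(m in text.lower() for m in markers) else "other"


UNLABELLED = [
    "The parcel arrived broken",
    "I would like a refund please",
    "My delivery is late again",
    "What time do you open",
    "Do you sell gift cards",
    "Where is your nearest shop",
]


def train_student(texts: list[str], labeller: Callable[[str], str]) -> dict[str, Counter]:
    # The student only ever sees the teacher's labels, never human ones.
    counts: dict[str, Counter] = {}
    for text in texts:
        counts.setdefault(labeller(text), Counter()).update(text.lower().split())
    return counts


def student_label(student: dict[str, Counter], text: str) -> str:
    words = text.lower().split()
    return max(student, key=lambda label: sum(student[label][w] for w in words))


def agreement(student: dict[str, Counter], texts: list[str]) -> float:
    # Share of held-out texts on which the student matches the teacher.
    matches = sum(student_label(student, t) == teacher_label(t) for t in texts)
    return matches / len(texts)


if __name__ == "__main__":
    student = train_student(UNLABELLED, teacher_label)
    held_out = ["The refund is late", "Do you open on Sunday"]
    print(f"student/teacher agreement: {agreement(student, held_out):.0%}")
```

In a real project the teacher's labels would typically be reviewed and corrected by people before training, and the student would be a proper task-specific model rather than a word counter.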
Montani concluded her presentation by pointing out that the only feasible monopoly strategy in the AI space is regulation (“You have a monopoly because the government says so”). To avoid this, we must look past the lobbying groups to ensure we regulate use cases rather than technology (human or machine-facing). Otherwise, we might gift somebody a monopoly.
She considered that other monopoly strategies, such as compounding advantages from economies of scale or network effects, or control of a resource (“there are no mines or telephone lines”), don’t apply in this case.