The Open Source Initiative seeks to standardize exactly what the term means when applied to software and operating systems. (Source: Adobe Stock)

Organization Says Tech Industry Needs Standards for Calling Code ‘Open Source’

The term “open-source AI” is becoming increasingly popular in the tech industry, with companies like Meta committing to creating open-source general AI. However, there’s no industry-wide agreement on what “open-source AI” actually means. This lack of consensus is problematic because powerful tech companies could bend the term to suit their own needs, potentially entrenching the dominance of today’s leading players, according to a story on

The Open Source Initiative (OSI), a nonprofit acting as the custodian of the Open Source Definition, is attempting to address this issue. The organization has assembled a 70-strong group of researchers, lawyers, policymakers, activists, and representatives from big tech companies to come up with a working definition of open-source AI.

However, establishing a universally acceptable definition is not straightforward. While many companies like Meta and OpenAI have released models described as open source, there’s considerable disagreement about whether these models meet the criteria. For example, some come with licenses that restrict what users can do with the models, which contradicts open-source principles. Additionally, the sheer number of components that go into today’s AI models, from training data to underlying architecture, complicates the issue, as it’s unclear which ingredients are necessary to study and modify models in a meaningful way.

Data availability is a major sticking point. While all major AI companies have released pretrained models, they have not shared the data sets on which these models were trained. Some argue that this seriously constrains efforts to modify and study models, automatically disqualifying them as open source.

Despite these challenges, it’s crucial for the tech industry to settle on a definition so that it can enjoy the same benefits software developers gained from the open-source concept, such as a shared understanding and lower compliance costs.

The reluctance of companies to share training data, a key ingredient in AI model development, may be due to its value as a competitive advantage. However, withholding training data is arguably not in the spirit of open source, and it raises questions about whether it’s possible to truly study a model without knowing what information it was trained on.

There’s also a lack of clarity regarding what people hope to achieve by making AI “open source.” The community needs to decide whether it’s simply following market trends or aiming to make the market more open. The open-source community needs to coalesce around a single standard; otherwise, the industry will simply ignore it and decide for itself what “open” means.