How do you quantify the risks of using web services that make calls to various APIs, available commercially or in the public domain for “free” usage?
In the past, I have explored the challenges of managing web services in applications, including the ones that use open source. In this blog post, I describe a methodology that our research team has developed to quantify the risks of using web services that call various APIs, whether commercial or available in the public domain for “free” usage.
Although the definition of “free” is subjective, every API comes with a set of obligations, which are typically documented in various (legally binding) agreements (for example, terms of service, developer agreements, and privacy statements) that govern the usage of the API and its underlying data and functionality. According to our research, four key factors affect the governance of API usage.
Arguably, a larger number of agreements means more legal and/or technical requirements that could impose constraints on the usage of an API (factor 1). The larger the agreement, the more time-consuming it is to conduct the due diligence needed before using the API (factor 2). Furthermore, the more often agreements change, the more frequently due diligence is needed to re-evaluate the usage of the API (factor 3). Finally, understanding the natural language of the agreements is important for using an API in a clear and compliant way (factor 4).
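To make the first three factors concrete, here is a minimal sketch of how they might be measured. The `Agreement` structure, its field names, and the choice of word count as a proxy for size are my assumptions for illustration, not the data model our team uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Agreement:
    """One governing document for an API (hypothetical structure)."""
    kind: str                   # e.g. "terms of service", "privacy statement"
    text: str                   # full text of the current version
    revision_dates: List[date]  # dates on which a new version was published


def agreement_count(agreements: List[Agreement]) -> int:
    """Factor 1: how many agreements govern the API."""
    return len(agreements)


def total_words(agreements: List[Agreement]) -> int:
    """Factor 2: size of the agreements, using word count as a rough proxy."""
    return sum(len(a.text.split()) for a in agreements)


def revisions_per_year(agreements: List[Agreement], start: date, end: date) -> float:
    """Factor 3: average number of revisions per year within [start, end]."""
    if end <= start:
        raise ValueError("end must be later than start")
    years = (end - start).days / 365.25
    revisions = sum(
        1 for a in agreements for d in a.revision_dates if start <= d <= end
    )
    return revisions / years
```

Factor 4, the language itself, does not reduce to a simple count, which is why it needs natural language processing.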
Natural language processing is a nontrivial problem and still an active area of research. Fortunately, various computational techniques are available that can be used (with varying degrees of accuracy) to extract the semantic meaning of agreements written in natural language. Essentially, these techniques let us represent the meaning of each statement as numerical values, which can then be used to quantify the risks associated with using an API.
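As a deliberately simple illustration of turning a statement into numbers, the sketch below uses a bag-of-words vector and cosine similarity. This is a stand-in technique I chose for brevity, not the method our team uses, and it captures word overlap rather than real semantics; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(statement: str) -> Counter:
    """Lowercase the statement and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", statement.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

One possible use is to compare a new statement against statements that experts have already rated, and to surface the closest matches as a starting point for review.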
The intuition here is that each statement poses some “risk” in terms of how the corresponding API must be used. Usually, these risks are evaluated by experts, such as lawyers, who have a good understanding of technical and legal compliance issues in intellectual property matters. Table 1 summarizes some statements (from real-world API agreements) and their risks as assigned by our researchers and experts.
|No.||Statement||Risk|
|1||Your device may have sensors that provide information to assist in a better understanding of your location. For example, an accelerometer can be used to determine things like speed, or a gyroscope to figure out direction of travel.||Low|
|2||The IP address assigned to your device is used to send the data you requested back to your device. For example, if you have many different sharing options, enables sharing with others quickly and easily.||Low|
|3||We may share aggregated, non-personally identifiable information publicly. When lots of people start searching for something, it can provide very useful information about particular trends at that time.||Medium|
|5||Your domain administrator may be able to view statistics regarding your account, like statistics regarding applications you install.||High|
|6||You grant us and our partners or sublicenses the right to use the name that you submit in connection with such content if they choose. You acknowledge that (a) we have not tested or screened third-party content, (b) your use of any third-party content is at your sole risk, and (c) third-party content may be subject to separate license terms as determined by the third party.||High|
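Once statements carry a rating, the ratings can be mapped onto a numeric scale and summarized per agreement. The sketch below assumes a simple ordinal scale (Low = 1, Medium = 2, High = 3); the scale and the summary fields are illustrative choices of mine, not values from our methodology.

```python
# Assumed ordinal scale for the expert ratings; the numbers are illustrative.
RISK_VALUES = {"Low": 1, "Medium": 2, "High": 3}


def summarize(labels):
    """Summarize a non-empty list of risk labels for one agreement."""
    scores = [RISK_VALUES[label] for label in labels]
    return {
        "mean": sum(scores) / len(scores),            # overall level
        "max": max(scores),                           # worst single statement
        "high_count": scores.count(RISK_VALUES["High"]),
    }


# The ratings from the five rows of Table 1, in order.
table_1 = ["Low", "Low", "Medium", "High", "High"]
print(summarize(table_1))
```

Reporting the maximum alongside the mean matters because a single high-risk clause can outweigh many harmless ones.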
One key challenge here is scale: there are thousands of APIs and hundreds of thousands of agreements that govern them. Constantly tracking changes and quantifying risks at that scale is a nontrivial problem that human experts cannot solve alone. In this context, computational techniques for natural language processing play an important role in assisting human experts such as lawyers, technical architects, and developers.
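One small piece of change tracking that is easy to automate is detecting that an agreement's text has changed at all, so that experts only re-review what is new. The sketch below compares SHA-256 fingerprints of whitespace-normalized text; the function names and the dictionary layout are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the text after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_agreements(stored: dict, current: dict) -> list:
    """Return the IDs of agreements that are new or whose text has changed.

    stored:  {agreement_id: fingerprint recorded at the last review}
    current: {agreement_id: text fetched now}
    """
    return sorted(
        agreement_id
        for agreement_id, text in current.items()
        if stored.get(agreement_id) != fingerprint(text)
    )
```

A fingerprint only says that something changed, not whether the change matters; deciding that still needs language analysis or a human reviewer.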
Besides the four factors mentioned earlier, many more factors could contribute to API risk, such as the reliability of the API provider and the country of origin of the API and its underlying data. All these factors can eventually be converted into numerical values to quantify their prospective risks. My purpose here is not to list all the possible factors, but to discuss some of them to convey the basics of a methodology that can potentially quantify the risks associated with APIs.
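Once each factor is a number, one straightforward way to combine them is a weighted average of normalized values. The sketch below is only an assumption about how such a combination could look: the factor names, the caps used for normalization, and the weights are all hypothetical and would need to be set by experts.

```python
def normalize(value: float, cap: float) -> float:
    """Scale a raw factor value into [0, 1], treating `cap` as the maximum."""
    return min(max(value / cap, 0.0), 1.0)


def combined_risk(factors: dict, caps: dict, weights: dict) -> float:
    """Weighted average of the normalized factors; the result lies in [0, 1]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weights[name] * normalize(factors[name], caps[name]) for name in weights
    )
    return weighted_sum / total_weight


# Hypothetical values for one API.
factors = {"agreement_count": 5, "total_words": 20000, "revisions_per_year": 2}
caps = {"agreement_count": 10, "total_words": 40000, "revisions_per_year": 4}
weights = {"agreement_count": 1.0, "total_words": 1.0, "revisions_per_year": 1.0}
print(combined_risk(factors, caps, weights))
```

Additional factors, such as provider reliability, would slot in as extra entries in the three dictionaries.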
The Synopsys Cybersecurity Research Center (CyRC) is tracking 50,000+ web services, 25,000+ providers, 500+ categories of web services, and 100,000+ agreements (including terms of service, privacy statements, and legal acts) to implement the methodology described above. Furthermore, our technology allows us to discover vulnerable libraries and SDKs that enable various APIs from various providers. Using such vulnerable libraries and SDKs may lead to data security and/or privacy issues. The technologies and methodologies developed by our research team allow our customers to discover the use of web services in their codebase and to quantify and mitigate the risks in an efficient and cost-effective way.