Facebook launches static code analysis tool for Python

Facebook has open-sourced a static code analyzer to help developers find and fix flaws in Python code.

The Python Static Analyzer, or Pysa, examines Python source code and tracks how data flows through an application to identify security vulnerabilities, Facebook said. Many attacks rely on getting user input to reach parts of the codebase it was never meant to reach, or to produce a result that was not intended. A security or privacy breach can often be described as a situation where data has landed where it shouldn’t.

The in-house developed tool detected 44% of all security bugs in Instagram’s server-side Python code in the first half of 2020, Facebook said. Pysa found 330 unique issues in the proposed code changes, of which 15% (49) were classified as “significant issues” and 40% (131) were “real but had extenuating circumstances that made them less severe”, Facebook’s Graham Bleaney and Sinan Cepel said.

Pysa looks for connections between sources, where important data originates, and sinks, where data from a source should not be able to end up. If Pysa finds a path from a source to a sink, the tool flags it as a problem. Common sources are places where user-controlled data enters the application. Sinks can include APIs that execute code or access the file system.
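The source-to-sink idea can be illustrated with a toy graph search. This is only a sketch of the concept, not Pysa’s algorithm: the graph, the node names and the `find_flows` helper are all hypothetical, and a real analyzer derives the flows from the code itself.

```python
from collections import deque

# Hypothetical data-flow graph: an edge A -> B means "data from A can reach B".
# Every node name here is made up for illustration.
FLOWS = {
    "request.GET['cmd']": ["build_command"],
    "build_command": ["os.system"],
    "request.GET['name']": ["render_greeting"],
    "render_greeting": [],
    "os.system": [],
}

SOURCES = {"request.GET['cmd']", "request.GET['name']"}  # user-controlled input
SINKS = {"os.system"}  # executes code, so tainted data must not arrive here


def find_flows(flows, sources, sinks):
    """Return one shortest path for each sink reachable from each source."""
    found = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                found.append(path)  # a source has reached a sink: report it
                continue
            for nxt in flows.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    for path in find_flows(FLOWS, SOURCES, SINKS):
        print(" -> ".join(path))
```

In this toy graph only the `cmd` parameter reaches `os.system`, so only that path is reported; the `name` parameter never reaches a sink and is left alone.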

Focus on data flows

Pysa can verify that internal frameworks designed to prevent unauthorized access to or disclosure of user data are used properly. Pysa can also detect cross-site scripting and SQL injection flaws. For example, code used to upload a user’s profile picture would not be flagged, because the data it receives is restricted. However, if there were a path from user-controlled input to a SQL query, Pysa would flag the problem.
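The kind of flow described here, from user-controlled input to a SQL query, looks roughly like the following. The example uses Python’s standard `sqlite3` module; the table, the function names and the payload are hypothetical and are not taken from Facebook’s code.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # User-controlled `name` is spliced into the SQL text itself:
    # this is the source-to-sink flow a taint analyzer is meant to flag.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # The value is passed as a bound parameter and never becomes SQL text.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(lookup_unsafe(conn, payload))  # every row is returned
    print(lookup_safe(conn, payload))    # no row matches the literal string
```

With the crafted payload, the string-built query returns every user’s secret, while the parameterized query treats the payload as an ordinary name and returns nothing.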

However, Pysa’s focus on data flows limits the types of security issues it can detect. Not all security or privacy issues are data-flow problems. Pysa cannot, for example, ensure that an authorization check has been performed before a privileged operation starts. The fact that Python is a dynamic language also complicates the task of a static code analyzer: code can be imported dynamically at almost any time, and the analyzer cannot know in advance what that code does.

“Python’s dynamism means there are endless pathological examples of data flows that Pysa cannot detect,” Facebook said.
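A small example shows why dynamic imports are hard to analyze statically. The `load_handler` helper below is hypothetical; it uses the standard `importlib` module to pick a function by name at runtime, so nothing in the source text says which code will actually run.

```python
import importlib


def load_handler(module_name, function_name):
    # Both names are plain strings decided at runtime (for example, read from
    # a config file), so the call target is not visible in the source text.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


if __name__ == "__main__":
    handler = load_handler("json", "dumps")
    print(handler({"ok": True}))
```

Here the strings are literals, so a reader can see what is loaded. Once they come from user input or configuration, a tool that only reads the code has no way to follow the data into the imported function.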

Speed priority

A static code analyzer that can scan something as big as Instagram’s codebase needs to be fast. If a scan takes too long, the tool is less likely to be used, because waiting for results can mean missing release windows or delaying the shipping of code. Pysa can work through millions of lines of code in anywhere from 30 minutes to hours, Facebook said. A manual code review can take weeks or months.

The emphasis on speed meant trading some precision and accuracy for performance, Facebook said. In addition to being fast, Pysa has to find security vulnerabilities, so it is designed to avoid false negatives and catch as many problems as possible. A false negative is a case where a tool fails to detect a genuine security issue. This meant accepting a high rate of false positives, where the tool reports a security issue that isn’t really there. Facebook said nearly half of the results originally returned by Pysa when scanning Instagram code were false positives.

To reduce the number of false positives that have to be reviewed, Facebook added sanitizers and other features to Pysa that filter results after analysis.
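In taint analysis generally, a sanitizer is a function whose output is treated as safe, so flows that pass through it need not be reported. The sketch below shows the idea with the standard library’s `html.escape`; the render functions are hypothetical, and how Pysa itself is told about sanitizers is not shown here.

```python
import html


def render_comment_unsafe(comment):
    # User text goes straight into the page: a cross-site scripting risk.
    return f"<p>{comment}</p>"


def render_comment_sanitized(comment):
    # html.escape neutralizes markup, so this flow could be treated as safe.
    return f"<p>{html.escape(comment)}</p>"


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    print(render_comment_unsafe(payload))
    print(render_comment_sanitized(payload))
```

An analyzer that knows nothing about `html.escape` would report both functions. Marking the escaping step as a sanitizer removes the second report, which is one way to cut false positives without hiding the real issue.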

“Even with Pysa’s bias to avoid false negatives and our willingness to accept a good number of false positives, we still managed to limit false positives to 150 (45%) of reported issues,” said Facebook.
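The figures Facebook gave add up, as a quick check shows. The counts below are the ones quoted in this article; the category labels and the `percentages` helper are only for illustration.

```python
# Counts quoted in the article for the first half of 2020.
TOTAL = 330
COUNTS = {
    "significant issues": 49,
    "real but less severe": 131,
    "false positives": 150,
}


def percentages(counts, total):
    """Return each category's share of `total`, rounded to a whole percent."""
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    assert sum(COUNTS.values()) == TOTAL  # the three categories cover all 330
    for label, pct in percentages(COUNTS, TOTAL).items():
        print(f"{label}: {COUNTS[label]} ({pct}%)")
```

The three categories account for all 330 issues, and the rounded shares match the 15%, 40% and 45% reported.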

Framework support

Pysa is extensible and can be used with different Python frameworks and libraries. Facebook uses the Django and Tornado frameworks, but Pysa can support other frameworks with “a few lines of configuration” that tell the tool where data enters the server.

“Because we use open-source Python server frameworks such as Django and Tornado for our own products, Pysa may begin to find security issues in projects using these frameworks on the first run,” Facebook said.

Zulip, an open-source team chat platform, has integrated Pysa into its code base, Facebook said. Pysa was used to find a vulnerability in Zulip Server’s image thumbnail manager (CVE-2019-19775).

Facebook earlier built Zoncolan, a static analysis tool capable of finding “thousands of potential security issues” in more than 100 million lines of code. Pysa uses the same algorithms to perform static analysis and shares code with Zoncolan.

Along with publishing Pysa on GitHub, Facebook also provided the bug definitions used to help find problems.

“Overall, we’re happy with the compromises we’ve made with Pysa to help security engineers scale, but there’s always room for improvement. We built Pysa for continuous improvement, through close collaboration between security engineers and software engineers,” Facebook said.
