Cointime

How AI Can Enhance Cybersecurity: A Primer on Deep Learning and its Applications

AI-powered tools such as ChatGPT have attracted considerable interest in the software development community. These tools have many applications, including improving the security of Web3 platforms and smart contracts. We have already discussed their potential to improve the auditing process by summarizing program functions and helping to find code vulnerabilities.

ChatGPT and similar models are built on deep learning, a subset of machine learning. Deep learning has been researched as a means of improving software security over the past decade. While using a chatbot like ChatGPT to ask questions about code is one approach to enhancing software security, there are many other exciting ways in which deep learning can be applied. Much of this research is still at an early stage, and many ideas are only now reaching implementation and testing. In this article, we review the current state of the art in applying deep learning to software security.

Deep Learning

Deep learning is a type of machine learning that trains multi-layer neural networks to recognize patterns in data. In the supervised setting described here, the system is provided with a set of inputs and their corresponding classifications, and it learns to identify patterns that differentiate one class of input from another. Once trained, the system can be used to classify inputs it hasn't seen before. This allows for the creation of powerful models that can recognize complex patterns in data and support tasks such as locating bugs or understanding programs.

Traditional Software Security Techniques

Current approaches to identifying vulnerable code include heuristics, static analysis, formal verification, and fuzzing, among others.

Heuristics are methods for identifying insecure coding patterns based on rules of thumb or experience-based techniques. Common heuristics include flagging calls to unsafe functions, improperly handled user input, hard-coded passwords, and inputs that are processed without validation.
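As a minimal sketch of how such rules of thumb can be automated, the following scanner flags lines that match a small set of patterns. The rule list, the `scan` function, and the sample snippet are hypothetical and chosen only for illustration; real heuristic checkers use far richer rule sets.

```python
import re

# Hypothetical rule set: each rule pairs a label with a regex for a risky pattern.
RULES = [
    ("unsafe function", re.compile(r"\b(strcpy|gets|sprintf)\s*\(")),
    ("hard-coded password",
     re.compile(r"password\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE)),
]


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_label) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Illustrative C-like input, not taken from any real project.
SAMPLE = '''char buf[8];
strcpy(buf, user_input);
const char *password = "hunter2";
'''

if __name__ == "__main__":
    for lineno, label in scan(SAMPLE):
        print(f"line {lineno}: {label}")
```

Pattern matching of this kind is cheap but shallow: it cannot tell whether a flagged call is actually reachable with attacker-controlled data, which is one reason heuristic findings still need review.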

Code bugs can also be detected using tools that conduct static code analysis both at the source code and bytecode levels. These tools have the ability to pinpoint vulnerable code patterns and create visual representations to aid in the understanding of smart contracts. Over time, the accuracy and effectiveness of the tools will increase as the database of discovered issues grows.

In addition to static analysis, CertiK performs formal verification of code to ensure it meets its requirements. Formal verification is a mathematical process used to prove that a program functions as anticipated by expressing program properties and expected behavior as mathematical formulas and using automated tools to verify them.

Fuzzing, also known as fuzz testing, is a technique used for software testing that involves injecting invalid, uncommon, or arbitrary data into a program. The software is then observed for any crashes, failures in built-in assertions, or potential vulnerabilities that may arise as a result of this input.
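The loop described above can be sketched in a few lines. This is a toy random fuzzer under stated assumptions: `parse_record` is a hypothetical target with a deliberately planted bug, and `fuzz` treats `ValueError` as an expected rejection and any other exception as a crash worth reporting.

```python
import random


def parse_record(data: bytes) -> int:
    """Toy target (hypothetical): first byte is a length field, the rest is payload."""
    if not data:
        raise ValueError("empty input")  # expected, well-handled rejection
    length = data[0]
    payload = data[1:]
    # Planted bug: the length field is trusted without a bounds check.
    return payload[length - 1]


def fuzz(target, iterations=1000, seed=0):
    """Feed random byte strings to `target` and collect unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(0, 9)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue  # input was rejected cleanly
        except Exception as exc:  # anything else counts as a crash
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(f"{len(found)} crashing inputs, first: {found[0]}")
```

Production fuzzers add coverage feedback, input mutation, and crash deduplication on top of this basic generate-run-observe cycle.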

How Deep Learning Can Enhance Traditional Software Security

Despite their strengths, current vulnerability identification techniques often encounter challenges out in the wild, especially when performing large-scale smart contract audits. For example, it is not feasible to manually classify the more than 60,000 findings in CertiK's database of manual audit reports. Aside from the sheer number of bugs, the diversity of natural language used to describe them renders traditional phrase matching ineffective.

As another instance of a scaling challenge, static tools generate findings that must be manually verified by security engineers to eliminate false positives and improve the quality of results. False positives occur when a static tool identifies a benign or non-critical issue as a bug. Often, false positives cause developers to waste time and effort investigating and fixing issues that do not exist. In some cases, false positives can also lead to legitimate bugs being overlooked, as developers may become desensitized to the large number of false positives generated by the tool.

Large projects also pose challenges to formal verification. There are usually white papers and design documents in English describing what smart contracts are supposed to do. Extracting mathematical rules from the white papers, design documents, and programs is a difficult and time-consuming process. Additionally, the complexity of smart contracts can result in a large number of possible execution paths, and with larger systems, fuzzing all of them becomes infeasible.

Due to the challenges of current techniques, it is essential to find new and more efficient ways to improve smart contract and program security. Deep learning can alleviate the challenges associated with existing vulnerability detection techniques. Deep learning has been applied to vulnerability classification, specification inference, automated bug explanation, predicting program properties, reducing the number of false positives reported by traditional tools, and enhancing fuzzing. We will now examine how deep learning can be used to improve some existing techniques.

Code Classification

Code classification aims to classify code fragments (with or without their documentation) into a variety of classes. This can be a multi-class classification over the type of bugs present in a piece of code (e.g., coding style, logical issue, mathematical issue, etc.) or over its functionality (e.g., sorting algorithms, sending funds, etc.). There is an abundance of research on code classification using deep learning, and it can greatly help in navigating and analyzing large databases of bug findings.

Clone Detection

In code clone detection, identical or similar pieces of source code or bytecode, known as clones, are found within or between software projects. Clone matches can help security engineers avoid missing previous findings when examining similar projects and functions. To do this, a similarity score is computed for each pair of code fragments; if the score exceeds a certain threshold, the pair is considered similar. Syntactic sugar and the diversity of coding styles can create problems for traditional (or rule-based) approaches to computing similarity scores. By contrast, a deep learning model can take the complex semantics of code snippets or binaries into account when computing scores. There is even the possibility of detecting code matches between different programming languages.
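The score-and-threshold scheme can be illustrated with a simple rule-based baseline. Note that this sketch is not a deep learning model: it renames identifiers to a placeholder and compares sets of token bigrams with Jaccard similarity. The keyword list, the `0.8` threshold, and the Solidity-like snippets are assumptions made for the example; a learned model would replace `similarity` with a comparison of learned code embeddings.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumed keyword list for a Solidity-like language; everything else is an identifier.
KEYWORDS = {"function", "return", "if", "else", "for", "while", "uint", "require"}


def normalize(code: str) -> list[str]:
    """Tokenize and replace every non-keyword identifier with the placeholder ID."""
    out = []
    for tok in TOKEN.findall(code):
        is_identifier = tok[0].isalpha() or tok[0] == "_"
        out.append("ID" if is_identifier and tok not in KEYWORDS else tok)
    return out


def similarity(code_a: str, code_b: str) -> float:
    """Jaccard similarity over the sets of adjacent token pairs (bigrams)."""
    a, b = normalize(code_a), normalize(code_b)
    grams_a, grams_b = set(zip(a, a[1:])), set(zip(b, b[1:]))
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def is_clone(code_a: str, code_b: str, threshold: float = 0.8) -> bool:
    return similarity(code_a, code_b) >= threshold


ADD = "function add(uint a, uint b) { return a + b; }"
SUM = "function sum(uint x, uint y) { return x + y; }"
PAY = "function pay(address to) { to.transfer(1); }"

if __name__ == "__main__":
    print(is_clone(ADD, SUM), is_clone(ADD, PAY))
```

Because identifiers are normalized, renamed copies such as `ADD` and `SUM` score as identical, while a baseline like this would still miss clones that reorder statements or use different syntax for the same logic, which is where learned representations are expected to help.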

Vulnerability Detection

An alternative or complementary approach to bug detection using static analysis is neural bug finding: finding bugs using deep learning techniques. In general, bug detection can be seen as a classification problem that can be addressed with deep neural networks trained on examples of buggy and non-buggy code. Researchers have proposed various instances of such a technology, including frameworks such as CodeTrek. CodeTrek uses a combination of traditional static analysis and relational databases integrated into deep learning to identify bugs in large, real-world projects. It is interesting to note that there are some bugs that traditional logic-based analyzers are fundamentally unable to detect. For instance, variable misuse bugs, in which a developer uses a wrong but type-compatible variable, cannot be defined using logic rules. Therefore, even the most powerful static analysis framework cannot find them, whereas various neural architectures have shown remarkable promise in spotting such bugs. Learning-based detection is especially useful since new types of attacks emerge frequently, and traditional rule-based approaches may not be able to keep up.

A significant challenge in vulnerability detection using deep learning is the lack of high-quality real-world data generated by human experts. In fact, some vulnerability detection models are trained on datasets that do not necessarily reflect how vulnerabilities appear in real-world code. CertiK's large database of previously identified bugs provides a promising dataset for developing a learning-based vulnerability detection tool.

Prioritizing Bugs

In software development, prioritizing bugs helps code owners focus their resources on fixing the most critical bugs first. Deep learning can assist in prioritizing bugs by analyzing bug reports and identifying patterns that indicate the severity of the issue. Such a model can be trained on a dataset of bug reports that have already been ranked, and it can learn to recognize patterns in the data that are associated with high-priority bugs. To ensure that the model is accurate and effective, it is imperative to use high-quality data for training and testing. This includes ensuring that the bug reports are labeled correctly and that the dataset includes a representative sample of bugs across a range of severity levels. At CertiK, security engineers assign severity scores to bug findings after manual inspection. These scores act as a ranking scheme. Such a collection of high-quality data is invaluable for training an effective model.

Explaining Bugs

Deep learning can be used to explain bugs in software by analyzing log files and identifying patterns that indicate the root cause of the issue. This approach can help developers diagnose and fix bugs more quickly and accurately, saving time and improving the quality of software. A deep learning-based model can generate natural language explanations for software bugs by learning from a large corpus of bug fixes. To generate a natural language explanation for a bug, a model must leverage structural and semantic information about the program and bug patterns. This can be particularly useful in large and complex software systems, where it can be difficult and time-consuming to manually diagnose and fix every bug. Additionally, a bug-finding model that provides reasonable explanations offers the transparency that security engineers and experts need before they can trust and rely on it in real-world, high-stakes conditions.

Reducing False Positives

False positives are a common challenge for bug-finding tools. As we mentioned earlier, false positives occur when a static tool identifies a benign or non-critical issue as a bug. Deep learning can help reduce the false positive rate of these tools. Using reports that have already been triaged, a deep learning-based model can learn which program structures tend to trigger false alarms in a static tool. These learned correlations can then be used to build a true-positive versus false-positive classifier that predicts whether a new error report is likely to be a false alarm.
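To make the idea of a true-positive versus false-positive classifier concrete, here is a deliberately small sketch. It uses logistic regression (a single-layer model) as a stand-in for a deep network, and the two features and the four triaged reports are invented for illustration; they do not come from any real tool or dataset.

```python
import math

# Hypothetical features of a static-analysis warning:
#   [reported_in_test_code, value_checked_earlier_in_function]
# Label 1 = confirmed bug (true positive), 0 = false alarm.
TRIAGED = [
    ([0.0, 0.0], 1),
    ([1.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([1.0, 1.0], 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(model, x) -> float:
    """Predicted probability that a warning with features x is a real bug."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = score((weights, bias), x) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(data)
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias


def is_true_positive(model, x) -> bool:
    return score(model, x) >= 0.5


if __name__ == "__main__":
    model = train(TRIAGED)
    for x, label in TRIAGED:
        print(x, label, round(score(model, x), 3))
```

In practice the features would be learned from program structure rather than hand-picked, and the model would be evaluated on held-out reports rather than on its own training data.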

Specification Inference

A challenging step in applying formal verification is extracting the intended behavior of software, in the form of logical or mathematical formulas, from its source code and/or documentation. Automatically inferring this behavior, which is known as specification inference, can remedy this challenge. Researchers have proposed various approaches to specification inference, such as inferring loop invariants. In one study, the authors presented a method for automatically determining assertions for specific program points using program execution traces. The approach is based on the use of deep neural networks and has shown promising results in detecting real-world software errors. Another study by Prof. Ronghui Gu, CertiK's co-founder, and other researchers at Columbia University proposes a deep neural network approach to infer loop invariants, which, again, models loop behavior from program execution traces. The authors showed that their method outperforms state-of-the-art techniques in terms of accuracy and efficiency. Additionally, some studies have focused on learning nonlinear loop invariants, which can be challenging to infer using traditional methods.
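The role of execution traces in invariant inference can be shown with a simplified, non-neural sketch. Here a hand-written list of candidate invariants is filtered against states recorded at the head of a toy loop. The loop, the candidates, and the function names are all hypothetical; in the neural approaches mentioned above, a model proposes the candidates instead of a fixed list.

```python
def run_loop(n: int) -> list[dict]:
    """Toy loop (hypothetical): sums 0..n-1, recording the state at each loop head."""
    trace = []
    i, total = 0, 0
    while i < n:
        trace.append({"i": i, "total": total, "n": n})
        total += i
        i += 1
    trace.append({"i": i, "total": total, "n": n})  # state at loop exit
    return trace


# Hand-written candidate invariants; a learned model would propose these instead.
CANDIDATES = {
    "i <= n": lambda s: s["i"] <= s["n"],
    "2 * total == i * (i - 1)": lambda s: 2 * s["total"] == s["i"] * (s["i"] - 1),
    "total == i": lambda s: s["total"] == s["i"],
    "total >= i": lambda s: s["total"] >= s["i"],
}


def infer_invariants(traces: list[list[dict]]) -> list[str]:
    """Keep only the candidates that hold in every recorded state."""
    states = [state for trace in traces for state in trace]
    return [name for name, holds in CANDIDATES.items()
            if all(holds(state) for state in states)]


if __name__ == "__main__":
    traces = [run_loop(n) for n in range(6)]
    print(infer_invariants(traces))
```

Candidates that survive the traces are only likely invariants: they have not been contradicted by the observed runs, and a verifier is still needed to prove that they hold on all executions.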

Improving Fuzzing

Deep learning can be used to improve program fuzzing by leveraging its ability to learn patterns and extract features from large amounts of data. It can enhance fuzzing by guiding the generation of inputs towards more interesting instances that are more likely to trigger crashes. Models can be trained to learn which input parameters or sequences of inputs are more likely to cause a crash or trigger a buffer overflow. Guiding input generation in this way can improve the coverage of the fuzzer by exercising different parts of the program, which in turn can help discover bugs or vulnerabilities that traditional fuzzing techniques might have missed.

Conclusion

The application of deep learning to improve software security has great potential. The technology has already demonstrated its efficacy in several areas, including vulnerability detection, code classification, clone detection, and more. Traditional software security techniques, such as heuristics, static analysis, formal verification, and fuzzing, have their strengths but also face scaling challenges. Deep learning can provide more efficient and effective ways to address these challenges and enhance the security of smart contracts and programs. Although research into deep learning's application to software security is still in its infancy, there are exciting opportunities for continued innovation and development in this area. As the industry continues to grow and expand, the integration of software security and deep learning technologies will play an increasingly pivotal role. This will enable us to improve the security and reliability of software systems.

Read more: https://www.certik.com/resources/blog/3cBFKLdUbaRlskx4IKApuQ-how-ai-can-enhance-cybersecurity-a-primer-on-deep-learning-and-its
