Cointime

A Feature Engineering Case Study in Consistency and Fraud Detection

Validated Venture

Main Takeaways

  • As the world’s largest crypto exchange, it’s crucial we have a risk detection system that is fast yet doesn’t compromise on accuracy. 
  • The challenge we encountered was ensuring our models always used up-to-date information, especially when detecting suspicious account activity in real-time. 
  • To achieve stronger feature consistency and greater production speed, we now make reasonable assumptions about our data and combine our batch and streaming pipelines. 

Discover how our feature engineering pipeline creates strong, consistent features to detect fraudulent withdrawals on the Binance platform. 

Inside our machine learning (ML) pipeline — which you can learn more about in a previous article — we recently built an automated feature engineering pipeline that funnels raw data into reusable online features that can be shared across all risk-related models. 

In the process of building and testing this pipeline, our data scientists encountered an intriguing feature consistency problem: How do we create accurate sets of online features that dynamically change over time?

Consider this real-world scenario: a crypto exchange (in this case, Binance) is trying to detect fraudulent withdrawals before money leaves the platform. One possible solution is to add a feature to the model that measures the time elapsed since the user's last specific operation (e.g., a login or a mobile binding). It would look something like this:

user_id | last_bind_google_time_diff_in_days | ...
1       | 3.52                               | ...
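As a minimal sketch of how such a value could be derived, the snippet below computes the elapsed days between a user's last operation and the moment of inference. The function name `time_diff_in_days` and the timestamps are illustrative assumptions, not part of the pipeline described here.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

def time_diff_in_days(last_operation_time, now):
    """Days elapsed between the last operation and `now` (hypothetical helper)."""
    return (now - last_operation_time).total_seconds() / SECONDS_PER_DAY

# Illustrative timestamps only: the user bound Google 2FA 3.52 days before inference.
last_bind_google_time = datetime(2024, 1, 1, 0, 0, 0)
inference_time = last_bind_google_time + timedelta(days=3.52)

last_bind_google_time_diff_in_days = time_diff_in_days(last_bind_google_time, inference_time)
```

Note that the value depends on `now`, so it keeps growing even when the user does nothing. That property is what makes the feature hard to keep fresh.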

The Challenge of Implementation

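Keeping this feature current in an online feature store is impractical because of the number of keys involved: the elapsed time grows continuously for every user, so every user's key would need to be recalculated and updated constantly, whether or not that user has done anything. A streaming pipeline, such as one built on Flink, cannot do this on its own, since it only computes features for users whose records are arriving in Kafka at that moment.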

As a compromise, we could use a batch pipeline and accept some delay. Let's say the model can tolerate features that are about one hour old when it fetches them from the online feature store for real-time inference. If the feature store can finish calculating and ingesting the data within that hour, the batch pipeline would, in theory, solve the problem.

Unfortunately, there's one glaring issue: such a batch pipeline is highly time-consuming. Finishing within one hour is infeasible when you're the world's largest crypto exchange, dealing with approximately a hundred million users and a TPS limit for writes.
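A rough back-of-the-envelope calculation makes the constraint concrete. The user count and the one-hour window come from the text. The write limit of 5,000 TPS is a purely hypothetical figure chosen for illustration, since the actual limit is not stated.

```python
def required_writes_per_second(num_users, window_seconds):
    """Write rate needed to refresh every user once within the window."""
    return num_users / window_seconds

def hours_to_ingest(num_users, tps_limit):
    """Time needed to write one record per user at a fixed write limit."""
    return num_users / tps_limit / 3600

NUM_USERS = 100_000_000        # "approximately a hundred million users"
WINDOW_SECONDS = 3600          # the one-hour budget

needed_tps = required_writes_per_second(NUM_USERS, WINDOW_SECONDS)

# HYPOTHETICAL limit, for illustration only; the real TPS limit is not given.
hours_at_limit = hours_to_ingest(NUM_USERS, tps_limit=5_000)
```

Refreshing every user within the hour would need roughly 27,800 writes per second, and under the assumed limit a full refresh would take more than five hours.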

We’ve found that the best practice is to make assumptions about our users, thereby shrinking the amount of data going into our feature store. 

Easing the Issue With Practical Assumptions

Online features are ingested in real-time and are constantly changing because they represent the most up-to-date version of an environment. With active Binance users, we cannot afford to use models with outdated features.

It’s imperative that our system flags any suspicious withdrawals as soon as possible. Any added delay, even by a few minutes, means more time for a malicious actor to get away with their crimes. 

So, for the sake of efficiency, we assume recent logins hold relatively higher risk:

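  • A fixed ingestion delay matters far less for old operations than for recent ones. With a delay of 0.125 day (3/24), reporting 250 days instead of 250.125 days is a relative error of about 0.05%, whereas reporting 1 day instead of 1.125 days is a relative error of 12.5%.
  • For most users, the time since the last operation won't exceed a certain threshold; let's say 365 days. To save time and computing resources, we omit users who haven't logged in for over a year.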
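The two assumptions above can be checked with a few lines of arithmetic. This is only a sketch: the helper names are made up for illustration, and the error is measured relative to the stale value the feature store would report.

```python
def relative_error(delay_days, reported_diff_days):
    """Size of the ingestion delay relative to the reported feature value."""
    return delay_days / reported_diff_days

def is_tracked(days_since_last_operation, max_age_days=365):
    """Keep only users whose last operation falls inside the activity window."""
    return days_since_last_operation <= max_age_days

DELAY_DAYS = 3 / 24  # the 0.125-day delay from the example

old_operation_error = relative_error(DELAY_DAYS, 250)   # 0.0005, i.e. 0.05%
recent_operation_error = relative_error(DELAY_DAYS, 1)  # 0.125, i.e. 12.5%

# Illustrative values: days since each user's last login.
days_since_login = {"user_a": 3.5, "user_b": 250.0, "user_c": 400.0}
tracked_users = [u for u, d in days_since_login.items() if is_tracked(d)]
```

The same delay is 250 times more damaging for the one-day case, which is why recent operations need a faster path than the batch job.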

Our Solution

We use a lambda architecture, which combines batch and streaming pipelines, to achieve stronger feature consistency.

What does the solution look like conceptually?

  • Batch Pipeline: Performs feature engineering for a massive user base.
  • Streaming Pipeline: Remedies batch pipeline delay time for recent logins.
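The division of labor between the two pipelines can be sketched as follows. This is an illustrative model in plain Python, not the production Flink or batch code. The record layout (`user_id`, `value`, `event_time`) and both function names are assumptions.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

def batch_features(last_op_times, snapshot_time, max_age_days=365):
    """Batch path: recompute the feature for every user inside the activity window."""
    records = []
    for user_id, last_op in last_op_times.items():
        age_days = (snapshot_time - last_op).total_seconds() / SECONDS_PER_DAY
        if 0 <= age_days <= max_age_days:
            records.append(
                {"user_id": user_id, "value": age_days, "event_time": snapshot_time}
            )
    return records

def stream_feature(user_id, op_time):
    """Streaming path: an operation just happened, so the elapsed time restarts at zero."""
    return {"user_id": user_id, "value": 0.0, "event_time": op_time}

# Illustrative data: user 1 was active two days before the snapshot, user 2 long ago.
snapshot = datetime(2024, 1, 10)
last_ops = {1: datetime(2024, 1, 8), 2: datetime(2022, 1, 1)}

batch = batch_features(last_ops, snapshot)                    # covers the large user base
fresh = stream_feature(1, snapshot + timedelta(minutes=20))   # covers the batch delay
```

The batch path handles volume, while the streaming path only touches users who act while the batch result is still in flight.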

What if a record is ingested into the online feature store during the batch ingestion delay?

Our features still maintain strong consistency even when records are ingested during the one-hour batch ingestion delay period. This is because the online feature store we use at Binance returns the latest value based on the event_time you specify when retrieving the value.
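One plausible reading of this behavior is a store that keeps, per key, the record with the most recent event_time, regardless of the order in which writes arrive. The class below is a hypothetical in-memory stand-in written to illustrate that idea. It is not the API of the feature store used at Binance.

```python
from datetime import datetime, timedelta

class LatestByEventTimeStore:
    """Toy online store: per user, keep the record with the newest event_time."""

    def __init__(self):
        self._records = {}  # user_id -> (value, event_time)

    def put(self, user_id, value, event_time):
        current = self._records.get(user_id)
        # A write only wins if its event_time is at least as new as the stored one.
        if current is None or event_time >= current[1]:
            self._records[user_id] = (value, event_time)

    def get(self, user_id):
        record = self._records.get(user_id)
        return None if record is None else record[0]

store = LatestByEventTimeStore()
snapshot = datetime(2024, 1, 10)

# The streaming record arrives first, stamped 20 minutes after the batch snapshot.
store.put(1, 0.0, snapshot + timedelta(minutes=20))
# The batch record lands later in wall-clock time but carries the older event_time.
store.put(1, 2.0, snapshot)

value_after_both = store.get(1)
```

Under this assumption the late batch write cannot overwrite the fresher streaming value, so the model still reads 0.0 for the user who acted during the delay.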
