
ADVANCES IN FINANCIAL MACHINE LEARNING

Academic materials for Cornell University's ORIE 5256 course.

AUTHORS YEAR TITLE ABSTRACT
Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 1/10

The Pitfalls of Econometric Analysis.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 2/10

Financial Applications of Machine Learning.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 3/10

Data Analysis.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 4/10

Modelling.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 5/10

Backtesting I.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 6/10

Backtesting II.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 7/10

Machine Learning Portfolio Construction.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 8/10

Useful Financial Features.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 9/10

High-Performance Computing.

Lopez de Prado, Marcos 2018 Advances in Financial Machine Learning: Lecture 10/10

The 7 Reasons Most Machine Learning Funds Fail.

Lopez de Prado, Marcos 2019 Advances in Financial Machine Learning: Numerai's Tournament

Preparation for Numerai's Tournament.

 

RECENT SEMINARS AND ACADEMIC LECTURES

The best part of giving a seminar is the opportunity to meet people who have also thought deeply about that topic, and may have reached different conclusions. I have found these encounters very productive in advancing my own research.

AUTHORS YEAR TITLE ABSTRACT
Lopez de Prado, Marcos 2024 The Role of Causal Inference in the Scientific Method

Every student of statistics learns that correlation does not imply causation, but students are rarely exposed to the reasons behind this statement. This seminar discusses the central role that causality plays in the scientific method, and how the standard statistical toolkit has led to numerous false discoveries. Finally, the seminar proposes several solutions to the replication crisis that currently afflicts scientific research.

Lopez de Prado, Marcos; Zoonekynd, Vincent 2024 Why Has Factor Investing Failed?

We show that: (1) factor strategies that over-control for colliders can yield systematic losses, even if all correlations remain constant and the risk premia are estimated with the correct sign; and (2) specification errors explain the erratic performance of factor investing strategies.

Lopez de Prado, Marcos 2023 Can Factor Investing Become Scientific?

I differentiate between type-A and type-B spurious claims, and explain how both types prevent factor investing from advancing beyond its current pre-scientific stage. This seminar analyzes the current state of causal confusion in the factor investing literature, and proposes solutions with the potential to transform factor investing into a truly scientific discipline.

Lopez de Prado, Marcos 2021 Escaping The Sisyphean Trap: How Quants Can Achieve Their Full Potential

While investment firms have attracted scientific talent, they have done a poor job of developing it. Firms hire specialists, but entice them to become generalists (e.g., portfolio managers). Under the ubiquitous silo/platform structure, quants succumb to the Sisyphean trap and do not achieve their full potential.

Lopez de Prado, Marcos 2021 Detection of False Investment Strategies through FWER and FDR

This seminar explains how to detect false investment strategies by controlling for the familywise error rate (FWER) and the false discovery rate (FDR) of an organization.
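Both error rates can be controlled with standard multiple-testing procedures. The following is a minimal sketch, assuming each backtested strategy yields one p-value: Holm's step-down procedure for the FWER and the Benjamini-Hochberg step-up procedure for the FDR. These textbook procedures are chosen for illustration and are not necessarily the ones presented in the seminar. The function names and the p-values are hypothetical.

```python
def holm_fwer(pvalues, alpha=0.05):
    """Holm step-down procedure: controls the familywise error rate at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The rank-th smallest p-value (0-based) is compared with alpha / (m - rank)
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg_fdr(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery rate at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:  # reject the k_max smallest p-values
        reject[i] = True
    return reject


# Hypothetical p-values from eight backtested strategies
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(holm_fwer(pvals))
print(benjamini_hochberg_fdr(pvals))
```

With these eight p-values, Holm's procedure rejects only the smallest one, while Benjamini-Hochberg rejects the two smallest. FDR control tolerates some false discoveries in exchange for fewer missed ones.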

Lopez de Prado, Marcos 2020 Interpretable Machine Learning: Shapley Values

This seminar demonstrates the use of Shapley values to interpret the outputs of ML models. With the help of interpretability methods, ML is becoming the primary tool of scientific discovery, through induction as well as abduction.
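A Shapley value is a player's marginal contribution, averaged over all coalitions it can join. When used for ML interpretation, the players are features and the payoff is the model's output on a subset of features. The sketch below computes exact Shapley values by enumerating coalitions. The payoff table is hypothetical, and exact enumeration is exponential in the number of players, so practical tools approximate it.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to that coalition's payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S that excludes p
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution of p
        phi[p] = total
    return phi


# Hypothetical payoffs: model score using no feature, "a" only, "b" only, or both
payoff = {
    frozenset(): 0.0,
    frozenset({"a"}): 1.0,
    frozenset({"b"}): 2.0,
    frozenset({"a", "b"}): 4.0,
}
phi = shapley_values(["a", "b"], payoff.__getitem__)
print(phi)
```

The values sum to the payoff of the full coalition (the efficiency property). The interaction gain of 1.0 is split equally between the two features.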

Lopez de Prado, Marcos 2020 Three Machine Learning Solutions to the Bias-Variance Dilemma

This seminar explores why machine learning algorithms are generally more appropriate for financial datasets, how they outperform classical estimators, and how they solve the bias-variance dilemma.

Lipton, Alexander; Lopez de Prado, Marcos 2020 Exit Strategies for COVID-19: An Application of the K-SEIR Model

We introduce a new mathematical model (called K-SEIR) to simulate the propagation of epidemics, and evaluate the outcomes of various government interventions. Unlike the standard SEIR model, K-SEIR computes the dynamics of K population groups with different mortality rates, thus allowing the implementation of targeted lockdowns and flexible exit strategies.
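The abstract does not spell out the K-SEIR equations, so the following is only a generic multi-group SEIR sketch. It assumes homogeneous mixing across groups, a group-specific mortality rate, and a hypothetical per-group `contact` factor that mimics a targeted lockdown. The class, function, and all parameter values are illustrative assumptions, not the published model.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """State of one population group (head counts); all parameters are hypothetical."""
    S: float  # susceptible
    E: float  # exposed (infected, not yet infectious)
    I: float  # infectious
    R: float  # recovered
    D: float  # deceased
    mortality: float      # share of resolved infections that end in death
    contact: float = 1.0  # 1.0 = no lockdown; below 1.0 = targeted lockdown


def seir_step(groups, beta, sigma, gamma, dt):
    """One explicit-Euler step of a multi-group SEIR model with homogeneous mixing."""
    alive = sum(g.S + g.E + g.I + g.R for g in groups)
    # Infectious pressure from every group, scaled by each group's contact rate
    pressure = sum(g.contact * g.I for g in groups) / alive
    out = []
    for g in groups:
        infected = beta * g.contact * g.S * pressure * dt  # S -> E
        onset = sigma * g.E * dt                           # E -> I
        resolved = gamma * g.I * dt                        # I -> R or D
        out.append(Group(
            S=g.S - infected,
            E=g.E + infected - onset,
            I=g.I + onset - resolved,
            R=g.R + resolved * (1 - g.mortality),
            D=g.D + resolved * g.mortality,
            mortality=g.mortality,
            contact=g.contact,
        ))
    return out


# Two hypothetical groups: low mortality, and high mortality under a targeted lockdown
groups = [
    Group(S=9_990, E=0, I=10, R=0, D=0, mortality=0.001),
    Group(S=2_000, E=0, I=0, R=0, D=0, mortality=0.05, contact=0.3),
]
for _ in range(3_000):  # 300 days with dt = 0.1
    groups = seir_step(groups, beta=0.3, sigma=0.2, gamma=0.1, dt=0.1)
print([round(g.D) for g in groups])
```

Varying the `contact` factor per group, and over time, is one way to compare lockdown and exit scenarios in a model of this kind.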

Lopez de Prado, Marcos 2020 Three Quant Lessons from COVID-19

Many quantitative firms have suffered substantial losses as a result of the COVID-19 selloff. In this note we highlight three lessons that quantitative researchers could learn.

Lopez de Prado, Marcos 2020 Overfitting: Causes and Solutions

When used incorrectly, machine learning (ML) carries an extremely high risk of overfitting. However, ML has sophisticated methods to prevent both (a) train set overfitting and (b) test set overfitting. Thus, the popular belief that ML overfits is false. A more accurate statement would be that (1) in the wrong hands, ML overfits, and (2) in the right hands, ML is more robust to overfitting than classical methods.

Lopez de Prado, Marcos 2020 Clustered Feature Importance

In classical statistics, p-values are routinely used to determine the variables involved in a phenomenon. However, p-values suffer from various limitations that often lead to false positives and false negatives. Machine learning offers powerful feature importance methods that overcome many of the limitations of p-values.

Lopez de Prado, Marcos 2020 Statistical Association

Despite its popularity among economists, correlation has many known limitations in the context of financial studies. In this seminar we explore more modern measures of codependence, based on Information Theory, which overcome some of the limitations of correlation.
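Two common information-theoretic measures are mutual information and the (normalized) variation of information. The sketch below shows how they can detect a nonlinear dependence that correlation misses. It assumes a simple histogram estimator with a hypothetical bin count of 10; the bin choice affects the estimates, and the seminar's exact estimators may differ.

```python
import numpy as np


def mutual_info_and_nvi(x, y, bins=10):
    """Histogram estimates of mutual information (nats) and normalized variation of information."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                   # joint probabilities
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginal probabilities

    def entropy(p):
        p = p[p > 0]  # 0 * log(0) is taken as 0
        return -np.sum(p * np.log(p))

    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    mi = hx + hy - hxy   # I(X;Y) = H(X) + H(Y) - H(X,Y)
    vi = hxy - mi        # VI = H(X) + H(Y) - 2 I(X;Y)
    return mi, vi / hxy  # normalized VI lies in [0, 1]


rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = x ** 2 + 0.1 * rng.standard_normal(10_000)  # strong but nonlinear dependence
z = rng.standard_normal(10_000)                 # independent of x

print(np.corrcoef(x, y)[0, 1])       # close to zero despite the dependence
print(mutual_info_and_nvi(x, y)[0])  # clearly positive
print(mutual_info_and_nvi(x, z)[0])  # near zero (small estimation bias only)
```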

Lopez de Prado, Marcos 2020 Clustering

Many problems in finance require the clustering of variables or observations. Despite its usefulness, clustering is almost never taught in Econometrics courses. In this seminar we review two general clustering approaches: partitional and hierarchical.
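Of the two approaches, hierarchical clustering is the simpler one to illustrate. The sketch below groups variables by correlation using SciPy's agglomerative routines. The distance d = sqrt((1 - rho) / 2), the average-linkage method, and the synthetic two-factor data are all assumptions made for this example. A partitional method (e.g., k-means) is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_by_correlation(returns, n_clusters, method="average"):
    """Agglomerative (hierarchical) clustering of the columns of `returns`."""
    corr = np.corrcoef(returns, rowvar=False)
    # Correlation distance d = sqrt((1 - rho) / 2), a proper metric in [0, 1]
    dist = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, None))
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # upper triangle as a 1-D vector
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical data: six series driven by two latent factors (three series each)
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
noise = 0.3 * rng.standard_normal((500, 6))
returns = np.repeat(factors, 3, axis=1) + noise  # columns: f0, f0, f0, f1, f1, f1
print(cluster_by_correlation(returns, n_clusters=2))
```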

Lopez de Prado, Marcos 2019 Machine Learning Asset Allocation

We introduce the nested clustered optimization (NCO) algorithm, a method that tackles both sources of the efficient frontier's instability. Monte Carlo experiments demonstrate that NCO can reduce the estimation error by up to 90% relative to traditional portfolio optimization methods (e.g., Black-Litterman).

Lopez de Prado, Marcos 2019 Quantitative Research Through Investment Tournaments

This presentation explores how data and experience barriers impact the quality of quantitative research, and how investment tournaments can help deliver better investment outcomes by overcoming those two barriers.

Lopez de Prado, Marcos 2019 The 7 Reasons Most Econometric Investments Fail

This presentation reviews the main reasons why investment strategies discovered through econometric methods fail. As a solution, it proposes the modernization of the statistical methods used by financial firms and academic authors.

Lopez de Prado, Marcos 2018 Type I and Type II Errors in Finance

Most papers in the financial literature control for Type I errors (false positive rate), while ignoring Type II errors (false negative rate). This is a mistake, because a low Type I error can only be achieved at the cost of a high Type II error. In this presentation we derive analytical expressions for both, after correcting for Non-Normality, Sample Length and Multiple Testing.
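The trade-off between the two error types can be illustrated with a textbook power calculation for a one-sided test of H0: SR = 0 against H1: SR = sr_alt. The sketch below assumes K independent trials (Šidák correction of the familywise alpha) and uses the skewness/kurtosis-adjusted standard error of the Sharpe ratio estimator. It is not necessarily the exact expression derived in the presentation, and all inputs are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal


def type_ii_error(sr_alt, T, alpha=0.05, K=1, skew=0.0, kurt=3.0):
    """Beta (false negative rate) of a one-sided test of H0: SR = 0 vs H1: SR = sr_alt.

    sr_alt: non-annualized Sharpe ratio; T: number of return observations;
    alpha: familywise Type I error across K independent trials (Sidak correction);
    kurt: non-excess kurtosis (3.0 for Normal returns).
    """
    alpha_1 = 1.0 - (1.0 - alpha) ** (1.0 / K)  # per-trial significance level
    # Critical value of the estimated SR under H0 (standard error 1/sqrt(T-1) when SR = 0)
    crit = Z.inv_cdf(1.0 - alpha_1) / (T - 1) ** 0.5
    # Standard error of the SR estimator under H1, adjusted for skewness and kurtosis
    se_alt = ((1.0 - skew * sr_alt + (kurt - 1.0) / 4.0 * sr_alt ** 2) / (T - 1)) ** 0.5
    # Probability that the estimate falls below the critical value although SR = sr_alt
    return Z.cdf((crit - sr_alt) / se_alt)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(type_ii_error(sr_alt=0.1, T=250, alpha=alpha), 3))
```

Tightening alpha (or increasing the number of trials K) raises the critical value, and therefore raises beta. This is the trade-off the abstract describes.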

Lopez de Prado, Marcos 2018 Ten Financial Applications of Machine Learning

In this presentation, we review a few practical cases where machine learning solves financial tasks better than traditional methods.

Lopez de Prado, Marcos 2018 Market Microstructure in the Age of Machine Learning

In this presentation, we analyze the explanatory (in-sample) and predictive (out-of-sample) importance of some of the best known market microstructural features. Our conclusions are drawn over the entire universe of the 87 most liquid futures worldwide, covering all asset classes, going back through 10 years of tick-data history.

Lopez de Prado, Marcos 2018 A Practical Solution to the Multiple-Testing Crisis in Financial Research

Most discoveries in empirical finance are false, as a consequence of selection bias under multiple testing. This may explain why so many hedge funds fail to perform as advertised or as expected, particularly in the quantitative space. These false discoveries might have been prevented if academic journals and investors had demanded that any reported investment performance incorporate the false positive probability, adjusted for selection bias under multiple testing.

Lopez de Prado, Marcos 2018 Financial Machine Learning in 10 Minutes

Most publications in Financial ML seem concerned with forecasting prices. While these are worthy endeavors, Financial ML can offer so much more. In this presentation, we review a few important applications that go beyond price forecasting.

Lopez de Prado, Marcos 2018 How the Sharpe Ratio Died, But Came Back to Life

Selection bias under multiple backtesting makes it impossible to assess the probability that a strategy is false. As a consequence, most quantitative firms invest in false positives. The goal of this presentation is to explain a practical method to prevent selection bias from leading to false positives.

Lopez de Prado, Marcos 2018 The Myth and Reality of Financial Machine Learning

In recent years, Machine Learning (ML) has been able to master tasks that until now only a few human experts could perform. Some of the most successful hedge funds in history apply ML every day. However, myths about Financial ML have proliferated. In this presentation we will review the rationale behind those claims.

Lopez de Prado, Marcos 2017 The 7 Reasons Most Machine Learning Funds Fail

The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures.

Lopez de Prado, Marcos 2017 Supercomputing for Finance: A gentle introduction

This presentation introduces key concepts needed to operate a high-performance computing cluster.

Lopez de Prado, Marcos 2016 Mathematics & Economics: A Reality Check

Economics (and by extension finance) is arguably one of the most mathematical fields of research. However, economists’ choice of math may be inadequate to model the complexity of social institutions.

Lopez de Prado, Marcos 2016 Financial Quantum Computing

Quantum computers can be used to solve some of the hardest problems in Finance. In this presentation we discuss some applications.

Lopez de Prado, Marcos 2016 Building Diversified Portfolios that Outperform Out-Of-Sample

Mean-Variance portfolios are optimal in-sample; however, they tend to perform poorly out-of-sample (even worse than the naïve 1/N portfolio!). We introduce a new portfolio construction method that substantially improves the out-of-sample performance of diversified portfolios.

Lopez de Prado, Marcos 2015 Quantum Computing (in 5 minutes or less)

The purpose of our work is to show that, in the near future, Quantum Computing algorithms may solve many currently intractable financial problems, and render obsolete many existing mathematical approaches.

Lopez de Prado, Marcos 2015 Multi-Period Integer Portfolio Optimization Using a Quantum Annealer

Computing a trading trajectory in general terms is an NP-complete problem. This note illustrates how quantum computers can solve this problem in its most general form.

Lopez de Prado, Marcos 2015 Backtesting

Empirical Finance is in crisis: Our most important “discovery” tool is historical simulation, and yet, most backtests published in the top Financial journals are wrong. We present practical solutions to this problem.

Lopez de Prado, Marcos 2015 Illegitimate Science: Why Most Empirical Discoveries in Finance Are Likely Wrong, and What Can Be Done About It

The proliferation of false discoveries is a pressing issue in Financial research. For a large enough number of trials on a given dataset, it is guaranteed that a model specification will be found to deliver sufficiently low p-values, even if the dataset is random. Most academic papers and investment proposals do not report the number of trials involved in a discovery. The implication is that most published empirical discoveries in Finance are likely to be false. This has severe implications, especially with regard to the peer-review process and the backtesting of investment proposals. We make several proposals on how to address these problems.

Lopez de Prado, Marcos 2014 Optimal Trading Rules Without Backtesting

Calibrating a trading rule using a historical simulation (also called backtest) contributes to backtest overfitting, which in turn leads to underperformance. In this paper we propose a procedure for determining the optimal trading rule (OTR) without running alternative model configurations through a backtest engine.

Lopez de Prado, Marcos 2014 Deflating the Sharpe Ratio

The Deflated Sharpe Ratio (DSR) corrects for two leading sources of performance inflation: Non-Normally distributed returns, and selection bias under multiple testing.
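The DSR can be read as a Probabilistic Sharpe Ratio (PSR) evaluated against a raised benchmark: the Sharpe ratio one would expect from the best of N unskilled trials. The sketch below uses the expected-maximum approximation based on the Euler-Mascheroni constant that is commonly used for this purpose. It assumes a non-annualized SR estimate, non-excess kurtosis, n_trials > 1, and a known variance of the SR estimates across trials. All numeric inputs are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal
EULER_GAMMA = 0.5772156649015329


def probabilistic_sr(sr, sr_star, T, skew, kurt):
    """P[true SR > sr_star] given an estimated (non-annualized) SR from T observations."""
    num = (sr - sr_star) * sqrt(T - 1)
    den = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)  # non-Normality adjustment
    return Z.cdf(num / den)


def expected_max_sr(n_trials, var_sr):
    """Approximate expected maximum SR among n_trials unskilled trials (true SR = 0)."""
    return sqrt(var_sr) * ((1 - EULER_GAMMA) * Z.inv_cdf(1 - 1 / n_trials)
                           + EULER_GAMMA * Z.inv_cdf(1 - 1 / (n_trials * e)))


def deflated_sr(sr, T, skew, kurt, n_trials, var_sr):
    """DSR: the PSR measured against the expected maximum SR under multiple testing."""
    return probabilistic_sr(sr, expected_max_sr(n_trials, var_sr), T, skew, kurt)


sr, T, skew, kurt = 0.1, 1250, -0.5, 5.0  # hypothetical daily SR and return moments
print(probabilistic_sr(sr, 0.0, T, skew, kurt))                    # ignores multiple testing
print(deflated_sr(sr, T, skew, kurt, n_trials=100, var_sr=0.001))  # accounts for 100 trials
```

The same track record looks far less convincing once the benchmark accounts for the number of configurations tried, and it degrades further as `n_trials` grows.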

Lopez de Prado, Marcos 2014 Stochastic Flow Diagrams add Topology to the Econometric Toolkit

Just as Geometry could not help Euler solve the “Seven Bridges of Königsberg” problem, neither Econometric analysis nor Linear Algebra alone can answer many key questions about how financial markets coordinate. Statistical tables are detailed in terms of reporting estimated values; however, that level of detail also obfuscates the logical relationships between variables. Stochastic Flow Diagrams (SFDs) add Topology to the Statistical and Econometric toolkit. SFDs are more insightful than the standard collection of statistical tables because they shift the focus from the algebraic solution of the system to its logical structure: its topology.

Lopez de Prado, Marcos 2013 What to look for in a Backtest

A large number of quantitative hedge funds have historically sustained losses. In this study we argue that the back-testing methodology at the core of their strategy selection process may have played a role. Most firms and portfolio managers rely on back-tests (or historical simulations of performance) to allocate capital to investment strategies. If a researcher tries a large enough number of strategy configurations, a back-test can always be fit to any desired performance for a fixed sample length. Thus, there is a minimum back-test length (MinBTL) that should be required for a given number of trials. Standard statistical techniques designed to prevent regression over-fitting, such as hold-out, are inaccurate in the context of back-test evaluation. Virtually all published back-tests fail to report the number of trials involved, and thus we must assume those results may be overfit.
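Under the assumptions of independent trials, unskilled strategies (true SR = 0), and annualized Sharpe ratios, the MinBTL is the number of years at which the expected maximum in-sample SR among N trials falls to a chosen target. The sketch below uses the same expected-maximum approximation as the Deflated Sharpe Ratio. The function name and inputs are hypothetical.

```python
from math import e, log
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def min_backtest_length(n_trials, target_sr):
    """Years of data needed so that the best of n_trials independent, unskilled
    configurations is not expected to reach an annualized in-sample SR of target_sr."""
    z = NormalDist().inv_cdf
    # Approximate expected maximum of n_trials standard Normal draws
    exp_max = ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
               + EULER_GAMMA * z(1 - 1 / (n_trials * e)))
    return (exp_max / target_sr) ** 2


# Approximation vs. the simpler 2 ln(N) / SR^2 upper bound, for N = 45 and target SR = 1
print(min_backtest_length(45, 1.0), 2 * log(45))
```

With 45 independent configurations, roughly five years of data are needed before an annualized in-sample SR of 1 stops being the expected outcome of chance alone. With less data, trying that many configurations makes such an SR likely even without skill.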

Lopez de Prado, Marcos 2013 How long does it take to recover from a Drawdown?

Investment management firms routinely hire and fire employees based on the performance of their portfolios. Such performance is evaluated through popular metrics that assume IID Normal returns, like Sharpe ratio, Sortino ratio, Treynor ratio, Information ratio, etc. However, investment returns are far from IID Normal. We find that firms evaluating performance through Sharpe ratio are firing up to three times more skillful managers than originally targeted. This is very costly to firms and investors, and is a direct consequence of wrongly assuming that returns are IID Normal. An implication is that an accurate performance evaluation methodology is worth a substantial portion of the fees paid to hedge funds.

Lopez de Prado, Marcos 2013 A Journey through the "Mathematical Underworld" of Portfolio Optimization

It has been estimated that the current size of the asset management industry is approximately US$58 trillion. Portfolio optimization is one of the problems most frequently encountered by financial practitioners. It appears in various forms in the context of Trading, Risk Management and Capital Allocation. The Critical Line Algorithm (CLA) is the only algorithm specifically designed for inequality-constrained portfolio optimization problems, and it guarantees that the exact solution is found after a predefined number of iterations. Surprisingly, open-source implementations of CLA in a scientific language appear to be nonexistent or unavailable. The lack of publicly available CLA software, commercial or open-source, means that trillions of dollars are likely to be suboptimally allocated as a result of practitioners using general-purpose quadratic optimizers.

Lopez de Prado, Marcos 2012 Low-Frequency Traders in a High-Frequency World: A Survival Guide

Multiple empirical studies have shown that Order Flow Imbalance has predictive power over the trading range. The PIN Theory (Easley et al. [1996]) reveals the Microstructure mechanism that explains this observed phenomenon. VPIN is a High Frequency estimate of PIN, which can be used to detect the presence of Informed Traders.
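In a simplified form, VPIN splits trading into equal-volume buckets, estimates the buy and sell volume in each, and averages the absolute order imbalance over a rolling window of buckets. The sketch below assumes buckets of constant volume, classifies their volume with a simple bulk volume classification (a Normal CDF of the standardized per-bucket price change), and uses hypothetical flows and window length. It is an illustration, not a reference implementation.

```python
from statistics import NormalDist, pstdev


def bulk_volume_classify(price_changes, bucket_volume):
    """Split each equal-volume bucket into (buy, sell) volume via bulk volume classification."""
    sigma = pstdev(price_changes)  # dispersion of per-bucket price changes
    std_normal = NormalDist()
    flows = []
    for dp in price_changes:
        buy = bucket_volume * std_normal.cdf(dp / sigma)  # larger up-moves -> more buying
        flows.append((buy, bucket_volume - buy))
    return flows


def vpin(flows, bucket_volume, window):
    """Rolling VPIN: mean absolute order imbalance over the last `window` buckets,
    expressed as a fraction of the (constant) bucket volume."""
    imbalance = [abs(buy - sell) / bucket_volume for buy, sell in flows]
    return [sum(imbalance[i - window:i]) / window
            for i in range(window, len(imbalance) + 1)]


# Hypothetical flows: five balanced buckets followed by five one-sided buckets
flows = [(500.0, 500.0)] * 5 + [(900.0, 100.0)] * 5
print(vpin(flows, bucket_volume=1_000.0, window=5))
print(bulk_volume_classify([0.0, 0.5, -0.5], bucket_volume=1_000.0))
```

VPIN rises from 0 to 0.8 as the order flow turns one-sided. This is the kind of imbalance the abstract associates with the presence of Informed Traders.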

Lopez de Prado, Marcos 2012 Managing Risks in a Risk-On/Risk-Off Environment

Every structure has natural frequencies. Minor shocks at these frequencies can bring down any structure, e.g. a bridge. An Investment Universe also has natural frequencies, characterized by its eigenvectors. A concentration of risks in the direction of any such eigenvector exposes a portfolio to the possibility of greater than expected losses (indeed, maximum risk for that portfolio size), even if that portfolio is below the risk limits. This is particularly dangerous in a risk-on/risk-off regime. Managing Risk is not only about limiting its amount, but also about controlling how this amount is concentrated around the natural frequencies of the investment universe.
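One simple way to measure how a portfolio's risk is concentrated along the eigenvectors of the covariance matrix uses the identity w'Vw = sum_i lambda_i (e_i'w)^2, which splits portfolio variance into one term per eigenvector. The function name, the 4-asset covariance matrix, and the two portfolios below are hypothetical illustrations, not the method from the presentation.

```python
import numpy as np


def eigen_risk_shares(cov, weights):
    """Share of portfolio variance w'Vw lying along each eigenvector of the covariance V.

    Uses w'Vw = sum_i lambda_i * (e_i'w)**2, sorted from the largest eigenvalue down.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    loadings = eigvecs.T @ weights          # projection of w on each eigenvector
    contrib = eigvals * loadings ** 2       # variance contributed per direction
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    return eigvals[order], contrib[order] / contrib.sum()


# Hypothetical 4-asset universe with a strong common ("risk-on/risk-off") factor
cov = 0.8 * np.ones((4, 4)) + 0.2 * np.eye(4)
long_only = np.full(4, 0.25)
long_short = np.array([0.25, -0.25, 0.25, -0.25])
print(eigen_risk_shares(cov, long_only)[1])   # risk concentrated on the top eigenvector
print(eigen_risk_shares(cov, long_short)[1])  # risk spread over the remaining directions
```

The long-only portfolio places all of its variance on the dominant eigenvector, which is the concentration the abstract warns about. The long-short portfolio has none of its risk in that direction.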

Lopez de Prado, Marcos 2012 The Sharp Razor: Performance Evaluation with Non-Normal Returns

Because the Sharpe ratio only takes into account the first two moments, it wrongly “translates” skewness and excess kurtosis into standard deviation. As a result: (a) It deflates the skill measured on “well-behaved” investments (positive skewness, negative excess kurtosis). (b) It inflates the skill measured on “badly-behaved” investments (negative skewness, positive excess kurtosis). Sharpe ratio estimates need to account for higher moments, even if investors only care about two moments (Markowitz framework).
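One standard way to carry the higher moments into Sharpe ratio inference is the Probabilistic Sharpe Ratio (PSR). The expressions below are a sketch. They assume $T$ return observations, a non-annualized estimate $\widehat{SR}$, sample skewness $\hat\gamma_3$, sample (non-excess) kurtosis $\hat\gamma_4$, a benchmark $SR^*$, and the standard Normal CDF $Z[\cdot]$. The presentation itself may state them differently.

```latex
\hat\sigma_{\widehat{SR}} = \sqrt{\frac{1 - \hat\gamma_3\,\widehat{SR} + \frac{\hat\gamma_4 - 1}{4}\,\widehat{SR}^2}{T-1}},
\qquad
\widehat{PSR}(SR^*) = Z\!\left[\frac{\widehat{SR} - SR^*}{\hat\sigma_{\widehat{SR}}}\right]
```

For a positive $\widehat{SR}$, negative skewness ($\hat\gamma_3 < 0$) and fat tails ($\hat\gamma_4 > 3$) enlarge the standard error, so the same $\widehat{SR}$ earns less confidence. This counteracts the inflation described in (b). Under Normal returns ($\hat\gamma_3 = 0$, $\hat\gamma_4 = 3$), the numerator under the square root reduces to $1 + \widehat{SR}^2/2$.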

Lopez de Prado, Marcos 2012 Concealing the Trading Footprint: Optimal Execution Horizon

Market Makers adjust their trading range to avoid being adversely selected by Informed Traders; Informed Traders reveal their future trading intentions when they alter the Order Flow; consequently, Market Makers’ trading range is a function of the Order Flow imbalance. The Optimal Execution Horizon (OEH) algorithm presented here takes order imbalance into account to determine the optimal participation rate.

Lopez de Prado, Marcos 2012 Portfolio Oversight: An Evolutionary Approach

An analogy can be drawn between (a) the slow pace at which species adapt to an environment, which often results in the emergence of a new distinct species out of a once homogeneous genetic pool, and (b) the slow changes that take place over time within a fund, with several co-existing investment styles that mutate over time. A fund’s track record provides a sort of genetic marker, which we can use to identify mutations. The biometric procedure presented here can detect the emergence of a new investment style within a fund’s track record. In doing so, we answer the question: “What is the probability that a particular PM’s performance is departing from the reference distribution used to allocate her capital?”