ADVANCES IN FINANCIAL MACHINE LEARNING
Academic materials for Cornell University's ORIE 5256 course.
RECENT SEMINARS AND ACADEMIC LECTURES
The best part of giving a seminar is the opportunity to meet people who have also thought deeply about that topic, and may have reached different conclusions. I have found these encounters very productive in advancing my own research.
AUTHORS  YEAR  TITLE  ABSTRACT 
Lopez de Prado, Marcos  2020  Interpretable Machine Learning: Shapley Values 
This seminar demonstrates the use of Shapley values to interpret the outputs of ML models. With the help of interpretability methods, ML is becoming the primary tool of scientific discovery, through induction as well as abduction. 
Lopez de Prado, Marcos  2020  Three Machine Learning Solutions to the Bias-Variance Dilemma
This seminar explores why machine learning algorithms are generally more appropriate for financial datasets, how they outperform classical estimators, and how they solve the bias-variance dilemma.
Lipton, Alexander; Lopez de Prado, Marcos  2020  Exit Strategies for COVID-19: An Application of the K-SEIR Model
We introduce a new mathematical model (called K-SEIR) to simulate the propagation of epidemics, and evaluate the outcomes of various government interventions. Unlike the standard SEIR model, K-SEIR computes the dynamics of K population groups with different mortality rates, thus allowing the implementation of targeted lockdowns and flexible exit strategies.
Lopez de Prado, Marcos  2020  Three Quant Lessons from COVID-19
Many quantitative firms have suffered substantial losses as a result of the COVID-19 selloff. In this note we highlight three lessons that quantitative researchers could learn.
Lopez de Prado, Marcos  2020  Overfitting: Causes and Solutions 
When used incorrectly, the risk of machine learning (ML) overfitting is extremely high. However, ML has sophisticated methods to prevent: (a) train set overfitting, and (b) test set overfitting. Thus, the popular belief that ML overfits is false. A more accurate statement would be that: (1) in the wrong hands, ML overfits, and (2) in the right hands, ML is more robust to overfitting than classical methods.
Lopez de Prado, Marcos  2020  Clustered Feature Importance 
In classical statistics, p-values are routinely used to determine the variables involved in a phenomenon. However, p-values suffer from various limitations that often lead to false positives and false negatives. Machine learning offers powerful feature importance methods that overcome many of the limitations of p-values.
Lopez de Prado, Marcos  2020  Codependence 
Despite its popularity among economists, correlation has many known limitations in the context of financial studies. In this seminar we will explore more modern measures of codependence, based on Information Theory, which overcome some of the limitations of correlations.
Lopez de Prado, Marcos  2020  Clustering 
Many problems in finance require the clustering of variables or observations. Despite its usefulness, clustering is almost never taught in Econometrics courses. In this seminar we review two general clustering approaches: partitional and hierarchical. 
Lopez de Prado, Marcos  2019  Machine Learning Asset Allocation 
We introduce the nested clustered optimization algorithm (NCO), a method that tackles both sources of the efficient frontier's instability. Monte Carlo experiments demonstrate that NCO can reduce the estimation error by up to 90%, relative to traditional portfolio optimization methods (e.g., Black-Litterman).
Lopez de Prado, Marcos  2019  The Past and Future of Quantitative Research 
This presentation explores how data and experience barriers impact the quality of quantitative research, and how investment tournaments can help deliver better investment outcomes by overcoming those two barriers. 
Lopez de Prado, Marcos  2019  The 7 Reasons Most Econometric Investments Fail 
This presentation reviews the main reasons why investment strategies discovered through econometric methods fail. As a solution, it proposes the modernization of the statistical methods used by financial firms and academic authors. 
Lopez de Prado, Marcos  2018  Type I and Type II Errors in Finance 
Most papers in the financial literature control for Type I errors (false positive rate), while ignoring Type II errors (false negative rate). This is a mistake, because a low Type I error can only be achieved at the cost of a high Type II error. In this presentation we derive analytical expressions for both, after correcting for Non-Normality, Sample Length and Multiple Testing.
Lopez de Prado, Marcos  2018  Ten Financial Applications of Machine Learning 
In this presentation, we review a few practical cases where machine learning solves financial tasks better than traditional methods. 
Lopez de Prado, Marcos  2018  Market Microstructure in the Age of Machine Learning 
In this presentation, we analyze the explanatory (in-sample) and predictive (out-of-sample) importance of some of the best known market microstructural features. Our conclusions are drawn over the entire universe of the 87 most liquid futures worldwide, covering all asset classes, going back through 10 years of tick-data history.
Lopez de Prado, Marcos  2018  A Practical Solution to the Multiple-Testing Crisis in Financial Research
Most discoveries in empirical finance are false, as a consequence of selection bias under multiple testing. This may explain why so many hedge funds fail to perform as advertised or as expected, particularly in the quantitative space. These false discoveries might have been prevented if academic journals and investors demanded that any reported investment performance incorporate the false positive probability, adjusted for selection bias under multiple testing.
Lopez de Prado, Marcos  2018  Financial Machine Learning in 10 Minutes 
Most publications in Financial ML seem concerned with forecasting prices. While these are worthy endeavors, Financial ML can offer so much more. In this presentation, we review a few important applications that go beyond price forecasting. 
Lopez de Prado, Marcos  2018  How the Sharpe Ratio Died, But Came Back to Life 
Selection bias under multiple backtesting makes it impossible to assess the probability that a strategy is false. As a consequence, most quantitative firms invest in false positives. The goal of this presentation is to explain a practical method to prevent selection bias from leading to false positives.
Lopez de Prado, Marcos  2018  The Myth and Reality of Financial Machine Learning 
In recent years, Machine Learning (ML) has been able to master tasks that until now only a few human experts could perform. Some of the most successful hedge funds in history apply ML every day. However, myths about Financial ML have proliferated. In this presentation we will review the rationale behind those claims. 
Lopez de Prado, Marcos  2017  The 7 Reasons Most Machine Learning Funds Fail 
The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures. 
Lopez de Prado, Marcos  2017  Supercomputing for Finance: A gentle introduction 
This presentation introduces key concepts needed to operate a high-performance computing cluster.
Lopez de Prado, Marcos  2016  Mathematics & Economics: A Reality Check 
Economics (and by extension finance) is arguably one of the most mathematical fields of research. However, economists’ choice of math may be inadequate to model the complexity of social institutions. 
Lopez de Prado, Marcos  2016  Financial Quantum Computing 
Quantum computers can be used to solve some of the hardest problems in Finance. In this presentation we discuss some applications. 
Lopez de Prado, Marcos  2016  Building Diversified Portfolios that Outperform Out-Of-Sample
Mean-Variance portfolios are optimal in-sample; however, they tend to perform poorly out-of-sample (even worse than the 1/N naïve portfolio!). We introduce a new portfolio construction method that substantially improves the Out-Of-Sample performance of diversified portfolios.
Lopez de Prado, Marcos  2015  Quantum Computing (in 5 minutes or less) 
The purpose of our work is to show that, in the near future, Quantum Computing algorithms may solve many currently intractable financial problems, and render obsolete many existing mathematical approaches. 
Lopez de Prado, Marcos  2015  Multi-Period Integer Portfolio Optimization Using a Quantum Annealer
Computing a trading trajectory in general terms is an NP-Complete problem. This note illustrates how quantum computers can solve this problem in the most general terms.
Lopez de Prado, Marcos  2015  Backtesting 
Empirical Finance is in crisis: Our most important “discovery” tool is historical simulation, and yet, most backtests published in the top Financial journals are wrong. We present practical solutions to this problem. 
Lopez de Prado, Marcos  2015  Illegitimate Science: Why Most Empirical Discoveries in Finance Are Likely Wrong, and What Can Be Done About It 
The proliferation of false discoveries is a pressing issue in Financial research. For a large enough number of trials on a given dataset, it is guaranteed that a model specification will be found to deliver sufficiently low p-values, even if the dataset is random. Most academic papers and investment proposals do not report the number of trials involved in a discovery. The implication is that most published empirical discoveries in Finance are likely to be false. This has severe implications, especially with regard to the peer-review process and the Backtesting of investment proposals. We make several proposals on how to address these problems.
Lopez de Prado, Marcos  2014  Optimal Trading Rules Without Backtesting 
Calibrating a trading rule using a historical simulation (also called backtest) contributes to backtest overfitting, which in turn leads to underperformance. In this paper we propose a procedure for determining the optimal trading rule (OTR) without running alternative model configurations through a backtest engine. 
Lopez de Prado, Marcos  2014  Deflating the Sharpe Ratio 
The Deflated Sharpe Ratio (DSR) corrects for two leading sources of performance inflation: Non-Normally distributed returns, and selection bias under multiple testing.
Lopez de Prado, Marcos  2014  Stochastic Flow Diagrams add Topology to the Econometric Toolkit 
Just as Geometry could not help Euler solve the “Seven Bridges of Königsberg” problem, Econometric analysis or Linear Algebra alone are not able to answer many key questions about how financial markets coordinate. Statistical tables are detailed in terms of reporting estimated values, however that level of detail also obfuscates the logical relationships between variables. Stochastic Flow Diagrams (SFDs) add Topology to the Statistical and Econometric toolkit. SFDs are more insightful than the standard collection of statistical tables because SFDs shift the focus from the algebraic solution of the system to its logical structure, its topology. 
Lopez de Prado, Marcos  2013  What to look for in a Backtest 
A large number of quantitative hedge funds have historically sustained losses. In this study we argue that the backtesting methodology at the core of their strategy selection process may have played a role. Most firms and portfolio managers rely on backtests (or historical simulations of performance) to allocate capital to investment strategies. If a researcher tries a large enough number of strategy configurations, a backtest can always be fit to any desired performance for a fixed sample length. Thus, there is a minimum backtest length (MinBTL) that should be required for a given number of trials. Standard statistical techniques designed to prevent regression overfitting, such as holdout, are inaccurate in the context of backtest evaluation. Practically all published backtests fail to report the number of trials involved, and thus we must assume those results may be overfit.
Lopez de Prado, Marcos  2013  How long does it take to recover from a Drawdown? 
Investment management firms routinely hire and fire employees based on the performance of their portfolios. Such performance is evaluated through popular metrics that assume IID Normal returns, like Sharpe ratio, Sortino ratio, Treynor ratio, Information ratio, etc. However, investment returns are far from IID Normal. We find that firms evaluating performance through Sharpe ratio are firing up to three times more skillful managers than originally targeted. This is very costly to firms and investors, and is a direct consequence of wrongly assuming that returns are IID Normal. An implication is that an accurate performance evaluation methodology is worth a substantial portion of the fees paid to hedge funds. 
Lopez de Prado, Marcos  2013  A Journey through the "Mathematical Underworld" of Portfolio Optimization
It has been estimated that the current size of the asset management industry is approximately US$58 trillion. Portfolio optimization is one of the problems most frequently encountered by financial practitioners. It appears in various forms in the context of Trading, Risk Management and Capital Allocation. The Critical Line Algorithm (CLA) is the only algorithm specifically designed for inequality-constrained portfolio optimization problems, which guarantees that the exact solution is found after a predefined number of iterations. Surprisingly, open-source implementations of CLA in a scientific language appear to be nonexistent or unavailable. The lack of publicly available CLA software, commercially or open-source, means that trillions of dollars are likely to be suboptimally allocated as a result of practitioners using general-purpose quadratic optimizers.
Lopez de Prado, Marcos  2012  Low-Frequency Traders in a High-Frequency World: A Survival Guide
Multiple empirical studies have shown that Order Flow Imbalance has predictive power over the trading range. The PIN Theory (Easley et al. [1996]) reveals the Microstructure mechanism that explains this observed phenomenon. VPIN is a High Frequency estimate of PIN, which can be used to detect the presence of Informed Traders. 
Lopez de Prado, Marcos  2012
Every structure has natural frequencies. Minor shocks in these frequencies can bring down any structure, e.g. a bridge. An Investment Universe also has natural frequencies, characterized by its eigenvectors. A concentration of risks in the direction of any such eigenvector exposes a portfolio to the possibility of greater than expected losses (indeed, maximum risk for that portfolio size), even if that portfolio is below the risk limits. This is particularly dangerous in a risk-on/risk-off regime. Managing Risk is not only about limiting its amount, but also controlling how this amount is concentrated around the natural frequencies of the investment universe.
Lopez de Prado, Marcos  2012  The Sharp Razor: Performance Evaluation with Non-Normal Returns
Because the Sharpe ratio only takes into account the first two moments, it wrongly “translates” skewness and excess kurtosis into standard deviation. As a result: (a) It deflates the skill measured on “well-behaved” investments (positive skewness, negative excess kurtosis). (b) It inflates the skill measured on “badly-behaved” investments (negative skewness, positive excess kurtosis). Sharpe ratio estimates need to account for higher moments, even if investors only care about two moments (Markowitz framework).
Lopez de Prado, Marcos  2012  Concealing the Trading Footprint: Optimal Execution Horizon
Market Makers adjust their trading range to avoid being adversely selected by Informed Traders; Informed Traders reveal their future trading intentions when they alter the Order Flow; consequently, Market Makers’ trading range is a function of the Order Flow imbalance. The Optimal Execution Horizon (OEH) algorithm presented here takes into account order imbalance to determine the optimal participation rate.
Lopez de Prado, Marcos  2012  Portfolio Oversight: An Evolutionary Approach
An analogy can be drawn between: (a) the slow pace at which species adapt to an environment, which often results in the emergence of a new distinct species out of a once homogeneous genetic pool, and (b) the slow changes that take place over time within a fund, with several coexisting investment styles which mutate over time. A fund’s track record provides a sort of genetic marker, which we can use to identify mutations. The biometric procedure presented here can detect the emergence of a new investment style within a fund’s track record. In doing so, we answer the question: “What is the probability that a particular PM’s performance is departing from the reference distribution used to allocate her capital?”