What Do We Know About the Second Moment of Financial Markets?

Recent research shows that the vast majority of scientific studies published in leading finance journals fail scientific replication (Hou, Xue, and Zhang, 2020; Harvey, Liu, and Zhu, 2016). This study argues that p-hacking, publication pressure, and the selection bias of leading finance journals are perhaps not the underlying root cause of this issue. We show that standard methodologies often used in finance research are inevitably sample-specific due to the very nature of financial markets. While the consensus of earlier research postulates a rejection of the time-honored Lévy hypothesis, our results strongly indicate that the variance of the variance does not exist in any of the key financial markets we consider. An unexpected finding of this study is that the variance process governing the U.S. dollar foreign exchange rate market generates more extreme events than the Bitcoin market. Our results cast doubt on the validity of methodologies currently used in finance research.

"Truth - or more precisely, an accurate understanding of reality - is the essential foundation of any good outcome. Most people fight seeing what's true when it's not what they want it to be."
(Ray Dalio, Founder of Bridgewater Associates and Author of 'Principles')

I. Introduction
In a recent study, Hou, Xue, and Zhang (2020) investigate whether cross-sectional asset pricing phenomena documented in the finance literature hold up to currently acceptable standards for empirical finance. In their study, the authors implement a scientific replication of 452 asset pricing anomalies that are based on 111 original research papers. 1 Imposing the higher multiple-testing hurdle of 2.78, as proposed by Harvey, Liu, and Zhu (2016), they find that 82% of those anomalies fail scientific replication. This is shocking news.
We take Hou et al.'s (2020) study to its logical conclusion by addressing the question of what the fundamental problem in finance research is. Against the background of the agency problem elaborated in Harvey's (2017) study, Hou et al. (2020) argue that authors sometimes engage in specification search, selecting sample criteria and test procedures until insignificant results become significant (p-hacking), which could, in turn, result in an embarrassingly large number of false positives that cannot be replicated in the future. Ray Dalio, founder of the Wall Street hedge fund giant Bridgewater Associates, calls in his well-known book Principles for distinguishing proximate causes from root causes. Dalio defines proximate causes as actions (or lack of actions) that lead to problems, whereas root causes run much deeper: "You can only truly solve your problems by removing their root causes, and to do that, you must distinguish the symptoms from the disease." (Dalio, 2017, p.176).
The current study takes a novel view by arguing that p-hacking - often referred to as 'cherry-picking' - cannot be the root cause of the high failure rate of replicated academic studies documented in Hou et al. (2020). First of all, and most importantly, we note that cherry-picking may be committed intentionally or unintentionally. Researchers who intentionally suppress evidence and thus deliberately deceive their addressees certainly fall into a category that can rightfully be referred to as 'charlatans', whereas researchers who unintentionally report research results that fail scientific replication do not belong in this category. Moreover, in practice it is perhaps difficult (or virtually impossible) to differentiate between those two groups. In fact, we can only observe the consequences, and the consequences are unfortunately the same. We argue that, irrespective of the underlying intentions, a hypothetical root cause could be that many researchers rely on traditional research methodologies that unfortunately are not applicable in financial research contexts due to the very nature of financial market data. 2 Correctly using incorrect methodologies would be alarming news because Hou et al. (2020) stress that armies of academics and investment managers actively engage in searching for significant anomalies; with trillions of dollars invested in factor-based exchange-traded funds and quantitative hedge funds worldwide, the financial interest is overwhelming, and, we argue, so is the hidden risk. Referring to the bankruptcy of the hedge fund Long-Term Capital Management (LTCM), in which Robert Merton Jr. and Myron Scholes were founding partners, Taleb (2010, p.288) points out that the consequences of relying on wrong methods can be destructive: "…during the summer of 1998, a combination of large events, triggered by a Russian financial crisis, took place that lay outside their models. It was a Black Swan. LTCM went bust and almost took down the entire financial system with it, as the exposures were massive."
2 We note that another possibility for the high rate of failing replications could be that equity markets were in the process of moving towards a higher level of efficiency. However, we can rule out this argument because Hou et al. (2020) document that the large-scale replication failure is not due to extended samples.
The ultimate purpose of our study is to test whether research methodologies often used in traditional financial research, such as Ordinary Least Squares (OLS) or the Generalized Method of Moments (GMM), are applicable to financial market data. The rationale for setting up our research design is straightforward: First (i), we note that most research in empirical finance typically relies on t-statistics derived from, for instance, OLS or GMM, for evaluating the validity of results. Second (ii), irrespective of (a) which type of research method is used, and/or (b) which type of t-statistics is used, any t-statistic can be used for drawing statistical conclusions if and only if the kurtoses of the model variables exist. 3 This is definitely not a trivial issue because if the kurtosis is either infinite or does not exist at all, we are not in a research environment that allows us to draw conclusions based on (any) t-statistics, because this metric would be inevitably sample-specific.
To explore our research question, we analyze whether or not the second moments of five key financial market variables are stable. Using a research approach based on realized variances, we define a financial market variable as stable if and only if the variance of the variance exists.
Since realized variances are heavily fat-tailed processes, we follow a recent stream of literature and fit power law distributions to the realized variances of the following key financial markets: equities, commodities, currencies, and cryptocurrencies. To test whether or not our power law null hypothesis is plausible, we employ hypothesis tests based on Kolmogorov-Smirnov distances, as proposed in the seminal paper by Clauset, Shalizi, and Newman (2009). Moreover, we also consider various subsamples, different data frequencies, and simulation experiments.
3 The most often used method is perhaps the Ordinary Least Squares (OLS) technique, used in different settings (time series regressions, cross-sectional regressions, panel regressions). Since OLS estimation requires some strict assumptions, Hansen (1982) derived the so-called Generalized Method of Moments (GMM) estimator that relaxes many of the OLS assumptions. However, both the OLS and GMM estimators require that the kurtoses of the input variables exist. Moreover, in attempts to address dependency structures in the first and/or second moments of the (financial) variables used in the estimation procedures, various adjustments such as the Heteroscedasticity Consistent Covariance Matrix Estimator (HCCME), the Heteroscedasticity and Autocorrelation Consistent (HAC) covariance estimator, or some type of bootstrapped t-statistics have been discussed in the literature (see White, 1980; Newey and West, 1987; Godfrey, 2009).
This study has some clear and fundamentally important contributions. The most important contribution is that we take the implications of the evidence documented in Hou et al.'s (2020) study to a logical conclusion by exploring whether traditional research methodologies often used in financial market research fail to deliver reasonable results when applied to financial market data.
As we assume that the vast majority of researchers does not intentionally report research results that fail scientific replication, the potential root cause for the high rate of replication failures in financial market research would then be how the data are processed.
From a broader perspective, our paper contributes to the literature on tail risks that appear to be a trademark of human-engineered systems. In this regard, the study of Clauset, Shalizi, and Newman (2009) is an often-cited work exploring whether 24 real-world data sets from a range of different disciplines follow power law distributions. The evidence documented in Clauset et al. (2009) supports Taleb's (2010) view that power law distributions govern many real-world phenomena and help to better understand man-made phenomena. Another popular study in this stream of literature is that of Gabaix (2009), documenting that a variety of variables such as income and wealth, the size of cities and firms, trading volume, international trade, or executive pay, for instance, are governed by different power law processes. Our study contributes to this stream of literature by first (i) exploring whether the variances of five key financial market variables are governed by power laws, and second (ii) by identifying whether the second moments of the variances exist.
From the perspective of finance research, power law distributions are used to model the return variation of financial assets. Since power laws are one-sided distributions, it may not be surprising that most research uses the absolute amount of an asset return, that is, |r|, for modeling power law functions, as pointed out in Lux and Alfarano (2016). Based on the seminal paper by Mandelbrot (1963), early contributions in this stream of literature are Gopikrishnan, Plerou, Amaral, Meyer, and Stanley (1999), Jansen and de Vries (1991), Mantegna and Stanley (1995), and Lux (1996). The studies by Gabaix (2009) and Lux and Alfarano (2016) provide detailed overviews of that literature. The current study extends this literature first by modeling the variation of asset returns using realized variances computed from daily high and low prices, which incorporate more information than two arbitrary points in the data series (the closing prices). We will see later in this study that this is not a trivial issue. Additionally, we make use of a realized volatility measure based on daily data to compute monthly realized variances. Another novel feature of our study is that it also investigates the variance of the largest cryptocurrency market, that is, Bitcoin, exhibiting a market capitalization in excess of $1 trillion as of April 29, 2021. In this regard, Fry and Cheah (2016, p.350) highlight that "from an economic perspective the sums of money involved [in cryptocurrency markets] are substantial". Obviously, studying the variation of cryptocurrency prices is both an important and timely issue.
The results of our study indicate that the daily variances of all five key asset markets are governed by power law processes. Statistically, we cannot reject the power law null hypothesis.
We show that our results are neither sample- nor method-specific. Notably, our findings strongly suggest that the variance of the variance does not exist statistically for any of those asset markets.
Paradoxically, the foreign exchange market is more prone to extreme events than the Bitcoin market.
Our findings have fundamentally important implications that cannot be swept under the carpet: First, due to the non-existence of the variance's variance, standard statistical analysis based on OLS or GMM inevitably leads to sample-specific results. Second, in 66% of synthetic samples, the sample variances are underestimated, which, given finite data samples, results in inflated t-statistics. As a consequence, our results cast doubt on the validity of methodologies often used in financial market research.
And here is how our story unfolds: In the next section we describe the background. The third section describes the processing of the data, whereas the fourth section outlines the statistical model. The fifth section provides robustness checks and the last section concludes.

II. Background
In asset pricing research, the statistical significance of cross-sectional asset pricing phenomena is typically assessed using regression models fulfilling the purpose of adjusting asset returns for potential risk factor exposures. In this regard, the Fama and French factor models have received enormous attention in academic work and are often used as benchmark models. Recently, Fama and French (2018) proposed a six-factor model that accounts for the momentum factor, given by

R_{i,t} = α_i + b_i MKT_t + s_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t + m_i MOM_t + ε_{i,t},   (1)

where R_{i,t} typically denotes the excess return of an equity portfolio i at time t, MKT_t denotes the excess market factor at time t, SMB_t and HML_t denote the size and value factors at time t (Fama and French, 1992; 1993), RMW_t and CMA_t denote the profitability and investment factors at time t (Fama and French, 2015), and MOM_t denotes the momentum factor at time t (Fama and French, 2018).
It is important to understand that similar types of factor models are also widely used for assessing the significance of risk-adjusted returns in other financial asset markets, such as traditional foreign exchange markets or cryptocurrency markets (Lustig, Roussanov, and Verdelhan, 2011; Shen, Urquhart, and Wang, 2020). Hence, the same issues discussed here for the equity market apply to any other financial asset market as well.
Following standard econometric modeling, we can stack the risk factors, including a Tx1 vector of ones denoted as ι, in a regressor matrix defined as

X = [ι, MKT, SMB, HML, RMW, CMA, MOM],

which has the dimension Tx7. Denoting the Tx1 vector of portfolio excess returns as R and the Tx1 residual vector as ε, we know from standard econometrics classes that, using Ordinary Least Squares (OLS), the estimated covariance matrix of the point estimator β̂ is given by

Ĉov(β̂) = σ̂² (X'X)^{-1},   (2)

where σ̂² denotes the estimated residual variance. Furthermore, in standard econometrics it is typically assumed that ε_t ~ N(0, σ²) and, hence, β̂ is normally distributed also. A severe problem in this model framework arises if σ̂² is sample-specific, because if σ̂² is sample-specific, then Ĉov(β̂) will be sample-specific too, and critical values derived from the normal distribution will be meaningless. For instance, Fergusson and Platen's (2006) study provides evidence that the unconditional distribution of daily returns appears to be remarkably close to the Student t distribution with ν = 3 degrees of freedom. Assuming that ε_t ~ t(3), it follows that Var(ε_t) = 3, whereas Kurt(ε_t) = ∞.
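The point can be illustrated with a short simulation (our own sketch, not part of the paper's analysis; all names and sample sizes below are illustrative): even though a t(3) error distribution has a finite variance, its infinite kurtosis makes the residual-variance estimate entering the OLS covariance matrix far more dispersed across samples than under normal errors.

```python
import numpy as np

rng = np.random.default_rng(42)

def ols_sigma2(T, draw_errors):
    """Residual variance estimate s^2 that feeds Cov(beta_hat) = s^2 (X'X)^{-1}."""
    x = rng.standard_normal(T)
    X = np.column_stack([np.ones(T), x])
    y = draw_errors(T)                      # true coefficients are zero
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / (T - 2)

reps, T = 2000, 500
s2_norm = np.array([ols_sigma2(T, rng.standard_normal) for _ in range(reps)])
s2_t3 = np.array([ols_sigma2(T, lambda n: rng.standard_t(3, n)) for _ in range(reps)])

# Both error laws have finite variance (1 and 3), but t(3) has infinite kurtosis,
# so the relative dispersion of the variance *estimate* across samples is far larger:
cv_norm = s2_norm.std() / s2_norm.mean()
cv_t3 = s2_t3.std() / s2_t3.mean()
print(cv_norm, cv_t3)
```

The coefficient of variation of the variance estimate under t(3) errors is several times that under normal errors, which is exactly the sample-specificity of the critical values described above.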
In turn, an infinite kurtosis implies that the variance is not stable. In this regard, Taleb (2020, p.50) vehemently stresses that if the kurtosis does not converge, "the sample error is huge; or it may not exist so the measurement is heavily sample dependent. If we don't know anything about the fourth moment, we know nothing about the stability of the second moment. It means we are not in a class of distribution that allows us to work with the variance, even if it exists." Using a simulation experiment, we will later learn more about the implications of this issue.
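Taleb's non-convergence point can be demonstrated directly (an illustration we add here, under our own choice of sample size and seed): the sample kurtosis of normal data settles near its theoretical value of 3, whereas the sample kurtosis of t(3) data, whose theoretical kurtosis is infinite, is dominated by a handful of extreme observations and never settles.

```python
import random

random.seed(5)

def sample_kurtosis(xs):
    """Raw sample kurtosis m4 / m2^2."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def student_t3():
    """One Student-t(3) draw: standard normal over sqrt(chi-square(3)/3)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / (chi2 / 3.0) ** 0.5

n = 100_000
k_norm = sample_kurtosis([random.gauss(0.0, 1.0) for _ in range(n)])
k_t3 = sample_kurtosis([student_t3() for _ in range(n)])
print(k_norm)   # settles near 3 as n grows
print(k_t3)     # dominated by a handful of extremes; does not settle
```

Rerunning with a different seed leaves the normal estimate essentially unchanged but can move the t(3) estimate by an order of magnitude, which is precisely the sample dependence Taleb describes.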
Furthermore, Hansen (1982) proposes the Generalized Method of Moments (GMM) estimation technique that relaxes some of the standard OLS assumptions. In this regard, the key point in GMM estimation is that the employed variables exhibit ergodic stationarity which, again, implies that the fourth moments of the variables must be finite. If the fourth moment is infinite, GMM is, in turn, as sample-specific as OLS.
While the Student t distribution could be an interesting approximation for the unconditional distribution of daily returns in equity market settings, it is probably not able to capture the extremely fat tails of variance distributions, especially the variances in cryptocurrency markets which appear to exhibit extremely high levels of uncertainty (Baur, Hong, and Lee, 2018). Taleb (2020, p.91) highlights that "there are a lot of theories on why things should be power laws, as sort of exceptions to the way things work probabilistically. But it seems that the opposite idea is never presented: power laws should be the norm, and the Gaussian a special case." Hence, in what follows, we model the variance processes of our five key asset markets as power laws and test whether these models reasonably describe the data generating processes.

III. Data
We downloaded publicly available daily data on the S&P 500, gold, crude oil, the exchange rate of the U.S. dollar against the British pound, and Bitcoin from finance.yahoo.com. Due to data availability, the data sample ranges from September 17, 2014 to March 31, 2021 for Bitcoin, and from April 20, 1982 to March 31, 2021 for the S&P 500. Specifically, the data include the highest daily prices, lowest daily prices, and closing prices for each trading day of the samples.

A. Realized variance
We compute realized variances for each asset market i, where i ∈ {S&P 500, gold, crude oil, US$/UK£, Bitcoin}. Realized annualized daily variances are computed in line with Parkinson (1980), that is,

σ²_{i,t} = (D / (4 ln 2)) · [ln(H_{i,t} / L_{i,t})]²,   (3)

where H_{i,t} and L_{i,t} denote the highest and lowest price for asset market i on day t, σ²_{i,t} denotes asset market i's corresponding realized annualized variance at time t, and D = 365 for the cryptocurrency market because this market allows for trading 24/7, whereas we use D = 250 for any other asset market. Since the Parkinson (1980) estimator uses the price range of intraday asset prices, Chou, Chou, and Liu (2010) emphasize that it incorporates substantially more information than two arbitrary points in this series (the closing prices). 4 When computing the realized variance for crude oil, we exclude the observations on April 20, 2020 and April 21, 2020 because the lowest prices were negative on those days, and hence, the realized variance in line with Equation (3) is not defined. 5
In Table 1, the descriptive statistics are reported, and Table 2 reports the shares of the top 1% and the top 20% of the cumulative total of the distribution. From Table 1 in association with Table 2, we observe that the realized variance processes for all asset markets are heavily fat-tailed, which is not a surprising feature per se.
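Equation (3) can be implemented in a few lines; the helper name and toy numbers below are our own illustration (with D = 365 substituted for Bitcoin, as described above).

```python
import numpy as np

def parkinson_rv(high, low, D=250):
    """Annualized daily realized variance via the Parkinson (1980) range estimator:
    sigma^2_t = D / (4 ln 2) * (ln(H_t / L_t))^2."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return D / (4.0 * np.log(2.0)) * np.log(high / low) ** 2

# toy day whose high/low range is 2%
rv = parkinson_rv([102.0], [100.0], D=250)
print(rv[0])   # about 0.0354
```

Note that the estimator is undefined whenever a low price is non-positive, which is exactly why the two negative-price crude oil days are excluded.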
However, a surprising observation from these tables is that the variance process governing the U.S. dollar foreign exchange rate market - which is obviously the exchange rate market of the most important national currency - is considerably more heavily fat-tailed than that of the cryptocurrency market. This is surprising because Baur, Hong, and Lee (2018) argue that Bitcoin returns exhibit an extremely high kurtosis with relatively more tail events compared to other assets, and that Bitcoin therefore serves rather as a speculative asset than as a medium of exchange. On January 27, 2012, the exchange rate U.S.$/U.K.£ ranged between the highest value of 1.57 and the lowest value of 0.64; that is, on the same day, the exchange rate dropped by 41%. For comparison, the largest daily drop in the history of the S&P 500 occurred on October 19, 1987, when the S&P 500 ranged between the highest index value of 282.70 and the lowest index value of 224.83, corresponding to a relative drop of 20%. We see that rare events have a considerably stronger impact in the U.S. dollar foreign exchange market than in the U.S. equity market.
Next, from Table 2 we observe that 1% of the largest observations in the U.S. dollar variance process correspond to 73.50% of the cumulative total of observations, whereas the share of the top 1% of the realized variance processes of the other asset markets comprises between 18.34% (Bitcoin market) and 22.96% (crude oil market) of the cumulative total of observations. Comparing these numbers with Table 3 in Taleb (2010, p.265) strongly suggests a Paretian tail with power law exponents close to 2.5. In fact, the variance of the U.S. dollar foreign exchange rate market considered here is the most extreme process in terms of its Paretian tails. Specifically, the traditional 80/20 Pareto distribution - which is the archetype of a power law process - suggests that 20% of the largest observations comprise 80% of the cumulative total of observations. As pointed out in Taleb (2010, p.235), owing to its scalability this suggests, in turn, that 1% of the largest observations comprise about 50% of the cumulative total of observations. With respect to the foreign exchange market's variance considered here, this feature is even more pronounced. No other distribution class than power laws allows for the type of extreme fat tails that we observe here. A fundamental follow-up question then arises: If Bitcoin does not fulfill the requirements of being a medium of exchange or store of value due to its high uncertainty, how can the U.S. dollar be considered stable? Since the 80/20 Pareto distribution does not have a variance, any estimated t-statistic is consequently sample-specific. Next, we fit power laws to the variance processes of our five key asset markets and then test our power law null hypothesis.
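Taleb's scalability argument can be checked with simple arithmetic (our own illustration). In the survival-function parameterization P(X > x) = (x/x_min)^{-a}, the share of the cumulative total held by the top fraction q of observations is q^(1 - 1/a); calibrating a to the 80/20 rule then yields a top-1% share of roughly one half, as quoted above.

```python
import math

def top_share(q, a):
    """Share of the cumulative total held by the top fraction q of observations
    under a Pareto survival function P(X > x) = (x / x_min) ** -a, with a > 1."""
    return q ** (1.0 - 1.0 / a)

# Calibrate `a` so that the top 20% hold 80% of the total (the 80/20 rule)
a = 1.0 / (1.0 - math.log(0.8) / math.log(0.2))
share_top1 = top_share(0.01, a)
print(round(a, 4))          # about 1.1610
print(round(share_top1, 3)) # about 0.528: the top 1% hold roughly half the total
```

A top-1% share of 73.50%, as observed for the U.S. dollar variance process, therefore implies a tail even heavier than the archetypal 80/20 Pareto.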

C.1. Moments of power laws
To investigate the stability of our variance processes, we model the realized variances using the following power laws:

p(x) = C x^{-α},   (4)

where C = (α - 1) x_min^{α-1} with α ∈ {ℝ | α > 1}, x ∈ {ℝ | x_min ≤ x < ∞}, x_min is the minimum value of realized variance that is governed by the power law process, and α is the magnitude of the specific tail exponent. 6 Regarding α, Taleb (2020, p.34) observed that the tail exponent of a power law function captures via extrapolation the low-probability deviation not seen in the data, which plays a disproportionately large share in determining the mean. Using our model framework, it can be shown that the expectation of the variance, defined as E[x], is given by

E[x] = ((α - 1) / (α - 2)) x_min,   (5)

and that the second moment E[x²], or the variance of the variance, is defined as

E[x²] = ((α - 1) / (α - 3)) x_min².   (6)

Higher moments of order m are analogously defined as

E[x^m] = ((α - 1) / (α - 1 - m)) x_min^m.   (7)

From Equations (5) and (6), we know that the mean only exists for α > 2, whereas the variance only exists for α > 3.
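As a numerical sanity check on the moment formulas (our own illustration, not part of the paper), one can draw from a power law by inverse-transform sampling and compare Monte Carlo moments with the closed forms. We deliberately use a safe α = 4.5 so that both moments exist; for the α < 3 values estimated later in the paper, no Monte Carlo average of x² would settle down.

```python
import random

random.seed(7)

def pl_draw(alpha, xmin):
    """Inverse-transform draw from the density p(x) = C * x**-alpha, x >= xmin."""
    return xmin * (1.0 - random.random()) ** (-1.0 / (alpha - 1.0))

alpha, xmin, n = 4.5, 1.0, 1_000_000   # alpha > 3, so both moments exist
xs = [pl_draw(alpha, xmin) for _ in range(n)]

m1_cf = (alpha - 1.0) / (alpha - 2.0) * xmin        # Equation (5), needs alpha > 2
m2_cf = (alpha - 1.0) / (alpha - 3.0) * xmin ** 2   # Equation (6), needs alpha > 3
m1_mc = sum(xs) / n
m2_mc = sum(x * x for x in xs) / n
print(m1_cf, m1_mc)   # closed form 1.4 vs Monte Carlo
print(m2_cf, m2_mc)   # closed form 2.333... vs Monte Carlo
```

Even at α = 4.5 the fourth moment already diverges, so the Monte Carlo second moment converges noticeably more slowly than the mean, a small preview of the instability the paper exploits.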

C.2. Maximum-Likelihood Estimation
In line with White, Enquist, and Green (2008) and Clauset et al. (2009), who found that maximum likelihood estimation (MLE) performs best for estimating power law exponents, we estimate the tail exponent as

α̂ = 1 + n [ Σ_{i=1}^{n} ln(x_i / x_min) ]^{-1},   (8)

where α̂ denotes the MLE estimator, n is the number of observations satisfying x_i ≥ x_min, and other notation is as before.
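The continuous MLE of Clauset et al. (2009) is straightforward to implement; the following sketch (ours, with illustrative names and a synthetic data set) checks it against a known exponent.

```python
import math
import random

def alpha_mle(xs, xmin):
    """Clauset et al. (2009) continuous MLE of the power law exponent:
    alpha_hat = 1 + n / sum(ln(x_i / xmin)) over the tail x_i >= xmin."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail), n

# sanity check on synthetic power-law data with a known alpha = 2.5
random.seed(1)
xs = [1.0 * (1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(50_000)]
a_hat, n = alpha_mle(xs, 1.0)
print(a_hat)   # close to the true 2.5
```

With 50,000 tail observations the asymptotic standard error (α̂ - 1)/√n is below 0.01, so the estimate lands very close to the true exponent.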
Figures 1-5 plot the estimated parameters α̂ depending on the value of x_min for all five asset market variances. 7 A crucial issue is how to determine the corresponding values of α̂ and x_min to accurately estimate the probability density functions. Clauset et al. document that it is common practice to choose the value of x_min beyond which α̂ is stable. From Figure 3 in Clauset et al. (2009, p.670), we observe that this value corresponds to the saddle point in an α̂/x_min graph. From our Hill plots it is evident that, for most asset market variances, α̂ appears to be stable below a value of three. However, it is not clear which x_min is optimal.
6 We follow the notation in Clauset et al. (2009). To keep our notation clear, we drop the index i denoting the respective realized variance of the individual asset market. 7 These graphs are often referred to as Hill plots.

C.3. Kolmogorov-Smirnov test statistics
Determining the exact value of x_min is, however, not a trivial issue. Clauset et al. emphasize that if one chooses too low a value for x_min, one obtains a biased estimate of α, since one is attempting to fit a power-law model to non-power-law data. On the other hand, if one chooses too high a value for x_min, one effectively removes legitimate data points x < x_min, which increases both the statistical error on α̂ and the bias from finite size effects. To address this issue, Clauset et al. propose an approach that chooses the value of x_min that makes the probability distributions of the measured data and the best-fit power-law model as similar as possible above x_min. Since the analyzed data are non-normal, the authors make use of the Kolmogorov-Smirnov or KS statistic, which is common practice, and which is the maximum distance between the CDFs of the data and the fitted model:

D = max_{x ≥ x_min} | S(x) - P(x) |,   (9)

where S(x) is the CDF of the data for the observations with value at least x_min, and P(x) is the CDF for the power law model that best fits the data in the region x ≥ x_min. The estimate x̂_min is the value of x_min that minimizes D. Clauset et al. show that their proposed method gives excellent results in practice and outperforms other methods. Hence, we use Clauset et al.'s approach and report the corresponding estimates α̂ and x̂_min for the variances of our five asset markets in Table 3. 8
8 We use the code plfit written by Aaron Clauset to estimate α̂ for each asset market variance. Since the code does not provide the corresponding x̂_min as additional output, we assess the corresponding x̂_min directly from our Hill plots. The code is available at http://www.santafe.edu/~aaronc/powerlaws/. We thank Professor Clauset for making this code available.
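The selection procedure can be sketched as follows (a simplified illustration of ours, not the plfit code the paper actually uses): scan candidate x_min values, fit α by MLE on each tail, and keep the pair that minimizes the KS distance.

```python
import math
import random

def mle_alpha(tail, xmin):
    """Continuous MLE of the exponent on the tail x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(sorted_tail, xmin, alpha):
    """Max distance between the empirical CDF and the fitted power law CDF."""
    n = len(sorted_tail)
    d = 0.0
    for i, x in enumerate(sorted_tail):
        fit = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - fit), abs(i / n - fit))
    return d

def fit_power_law(xs, min_tail=500, step=50):
    """Scan candidate xmin values; keep the (xmin, alpha) pair minimizing D."""
    xs = sorted(xs)
    best = (None, None, float("inf"))
    for j in range(0, len(xs) - min_tail, step):
        xmin, tail = xs[j], xs[j:]
        alpha = mle_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if d < best[2]:
            best = (xmin, alpha, d)
    return best

random.seed(3)
xs = [2.0 * (1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(5000)]
xmin_hat, alpha_hat, _ = fit_power_law(xs)
print(xmin_hat, alpha_hat)   # alpha_hat near the true 2.5
```

The `min_tail` cutoff guards against fitting on too few tail points; the production plfit code performs the same scan more carefully, including over discrete data.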

C.4. Estimated power law exponents
From Table 3 it is evident that the power law exponents for the variance processes of the S&P 500, gold, crude oil, and the U.S. dollar are below 3. From Equations (5) and (6) we infer that the variances of the variances do not exist for those asset markets. This means, in turn, that t-statistics based on the estimator in Equation (2) and its derivatives will be, as a consequence, sample-specific. Given that α ∈ {ℝ | α > 1}, the 95% confidence interval for the relevant one-sided test is (1; 3.0883]. 9 Hence, we infer that we cannot reject the null hypothesis, implying that the variances of variances for all asset markets do not exist statistically. 10 This result may come as somewhat of a surprise, given that Gabaix (2009) and Lux and Alfarano (2016) argue that the consensus in the literature is that the absolute amount of an asset return, denoted as |r| and modeled as P(|r| > x) ∝ x^{-ζ}, exhibits a power law exponent of ζ ≅ 3. Interpreting |r| as a measure of an asset's price fluctuation, and hence as a measure of the asset's variation, ζ ≅ 3 would imply that the variation of the variation exists, implying that asset returns are not Lévy distributed. Using range volatility models, which according to Chou, Chou, and Liu (2010, pp.1273-1281) incorporate substantially more information than two arbitrary points in this series (the closing prices), the evidence in the current study does not support ζ ≅ 3.
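The one-sided test can be made concrete using the asymptotic MLE standard error (α̂ - 1)/√n from Clauset et al. (2009). The tail count below is hypothetical, chosen only for illustration; the paper's own confidence bound of 3.0883 rests on its actual tail counts.

```python
import math

def alpha_upper_bound(alpha_hat, n_tail, z=1.645):
    """One-sided 95% upper confidence bound for the power law exponent, using the
    asymptotic MLE standard error (alpha_hat - 1) / sqrt(n) of Clauset et al. (2009)."""
    se = (alpha_hat - 1.0) / math.sqrt(n_tail)
    return alpha_hat + z * se

# hypothetical tail count of 100 observations with the paper's S&P 500 estimate
ub = alpha_upper_bound(2.58, 100)
print(round(ub, 3))   # 2.84, below the critical value of 3
```

Whenever the upper bound falls below 3, the existence of the variance of the variance is rejected at the 5% level.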

C.5. Implications of non-existing variances of variances of asset markets
The question arises: what are the implications of non-existing variances of variances? Taleb (2020, p.50) emphasizes that if the kurtosis of a random variable does not exist, the second moment will be unstable. Unstable second moments imply, in turn, that t-statistics will be heavily sample dependent.
Specifically, if the kurtosis does not exist, we are not in a class of distribution that allows us to work with the variance, even if it exists. In our framework we worked directly with the variances using realized variances. Hence, we can interpret the second moment in our model representation as the corresponding fourth moment of the return distribution. As the variances of our asset markets' variances do not exist, t-statistics are sample-specific.
We argue here that this issue could explain the enormous failure rate in replicating academic studies in financial economics. It is interesting to note that the sample-specificity of financial data is obviously nothing new to some practitioners. In this regard, in a Bloomberg seminar covering the topic 'Safe Havens', Mark Spitznagel, hedge fund manager at the company Universa, which gained an incredible return of 3600% in March 2020, stressed that "stock markets are non-ergodic". 11 Since non-ergodicity implies sample-specificity, our findings indicate that this appears to be the case for any other financial asset market also. 12 To harden our argument, we run the following simulation experiment: Using Equation (4), it follows that

P(x) = 1 - (x / x_min)^{-(α - 1)},   (10)

and one can show that

x(P) = x_min (1 - P)^{-1/(α - 1)}.   (11)

In Equation (11), x(P) denotes the corresponding value of the power law function that is associated with the probability P(x). Employing our estimates α̂ = 2.58 and x̂_min = 6.36 for the S&P 500 and a random number generator giving us values between 0 and 1, we can use Equation (11) to generate synthetic samples of realized variances; in 66% of these synthetic samples, the sample variances are underestimated. From Equation (2) it follows that if the sample variance is underestimated, the corresponding t-statistics are inflated. Hence, inflated t-statistics could be one of the potential reasons why the majority of academic studies fails to replicate. Dealing with power laws where α < 3, the popular Law of Large Numbers works too slowly, and given that we are dealing with finite samples, we do not observe the mean of the distribution (Taleb, 2020).
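A minimal version of this simulation experiment might look as follows (our sketch; since the excerpt does not spell out the paper's exact benchmark, we compare each sample's variance with the pooled-sample variance, and the sample sizes are our own choices):

```python
import random

random.seed(11)
alpha, xmin = 2.58, 6.36   # S&P 500 estimates; alpha < 3, so the variance diverges

def draw(n):
    """Synthetic realized variances via Equation (11): x = xmin * (1-P)^(-1/(alpha-1))."""
    return [xmin * (1.0 - random.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

samples = [draw(1000) for _ in range(500)]
pooled = sample_var([x for s in samples for x in s])
frac_under = sum(sample_var(s) < pooled for s in samples) / len(samples)
print(frac_under)   # well above one half: most finite samples understate dispersion
```

Because the dispersion is driven by rare extremes that most finite samples never see, the typical sample understates it, and t-statistics built on such samples are inflated.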

D.1. Are the results sample-specific? A statistical replication.
Due to data availability, the data that we use in the MLEs vary between 2384 daily observations for Bitcoin's realized variance and 9821 daily observations for the S&P 500's realized variance. To explore the stability of our estimated power law exponents, we restrict all samples to include only the last 2384 observations, so that our estimates across asset markets line up with the sample comprising the least number of observations. 13 The results reported in Table 4 show that the power law exponents are very close to the figures reported in Table 3. Using the hypothesis test discussed in section C.3, all power law exponents are statistically significantly below 3, implying that none of the asset market variance processes exhibits a defined second moment. The results of these robustness checks strongly support our previous evidence.

D.2. Are the results method-or sample-specific? A scientific replication.
While the previous subsection provided a statistical replication, we now implement a scientific replication by first (i) using a similar but not identical methodology to estimate the realized variance, and second (ii) implementing this methodology for a different population. Furthermore, one can argue that finance research typically operates with monthly as opposed to daily data. 14 To address these concerns, we retrieve daily data for the S&P 500 covering the period from March 4, 1957, when the original S&P 500 companies were added to the index, until March 31, 2021. 15 We compute realized monthly variances as Σ_j r²_{j,t}, where r_{j,t} denotes the daily return of the S&P 500 on day j in month t. Specifically, assuming 22 trading days per month, we compute the realized monthly variances using non-overlapping squared daily observations. For instance, the realized variance for the first month in this sample is the sum of squared daily S&P 500 returns from March 5, 1957 until April 3, 1957, whereas the realized variance for the second month is the sum of squared daily S&P 500 returns from April 4, 1957 until May 6, 1957, and so on. This approach results in 732 consecutive (non-overlapping) realized monthly variances covering the March 1957 to March 2021 period and is plotted in Figure A.1 in the appendix. 16 Descriptive statistics are reported in Table A.1 in the appendix. Using these data, we again implement the MLE as outlined in section C.
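The monthly aggregation step can be sketched as follows (function name and toy data are our own illustration):

```python
import numpy as np

def monthly_realized_variances(daily_returns, block=22):
    """Non-overlapping 'monthly' realized variances: sums of squared daily returns
    over consecutive blocks of 22 trading days."""
    r = np.asarray(daily_returns, dtype=float)
    n_blocks = len(r) // block
    r = r[: n_blocks * block].reshape(n_blocks, block)
    return (r ** 2).sum(axis=1)

# toy check: 44 days of constant 1% daily moves
rv = monthly_realized_variances([0.01] * 44)
print(rv)   # two blocks, each 22 * 0.01**2 = 0.0022
```

Any trailing days that do not fill a complete 22-day block are dropped, which mirrors the non-overlapping construction described above.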
Strikingly, we find that α̂ = 2.56, which is virtually the same estimate as reported in Table 3. Further, the estimated x̂_min = 26.83 suggests that 20% of the sample is governed by the power law process. 17 The KS test results in a p-value of 0.6680, suggesting that we cannot reject the power law null hypothesis. We interpret this as strong evidence supporting our key results. 18

E. Conclusion
Recent research documented that the vast majority of studies fails scientific replication. Why is that?
While earlier research argued that 'cherry-picking' could be one possible explanation, we argue that cherry-picking may be committed intentionally or unintentionally. We do not believe that the vast majority of researchers intentionally suppresses evidence, and we argue that cherry-picking is probably not the underlying root cause of this issue. Instead, we argue that it is possible to identify whether or not research methodologies employed in a specific research environment are valid. Hence, a hypothetical root cause for the high rate of replication failures in financial economics could be that many researchers correctly use incorrect methods, that is, methods that do not work well given the very nature of financial markets.
Furthermore, Lux and Alfarano (2016, p.5) [...] of the distribution is faster than allowed by this family of distributions. Modeling the variation of financial asset returns using realized variances computed from daily high and low prices, which incorporate more information than two arbitrary points in the data series (the closing prices), and using MLE, our results strongly indicate that the power law exponent is statistically significantly less than 3 across different financial asset markets. We also show that our results are neither sample-specific nor method-specific.

16 [...] 1987 and March 2020.
17 For 147 out of 732 observations, v_i ≥ v_min is satisfied.
18 As an additional robustness check, one might implement Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models for the return series, using a t-distribution for the innovation process in an attempt to account for fat tails. If the optimal degrees of freedom were less than 5, our results would be supported, as this would imply an infinite kurtosis. Unreported results show that such a model implemented for the S&P 500 suggests instability because the sum of the point estimates in the variance equation is larger than one. Hence, this study does not make use of any GARCH-type models.
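The instability criterion mentioned in footnote 18, namely the point estimates of the GARCH variance equation summing to more than one, can be illustrated with the deterministic variance-forecast recursion of a GARCH(1,1) model. This is a minimal sketch of the textbook recursion with hypothetical parameter values, not the (unreported) model the authors estimated:

```python
def expected_variance_path(omega, alpha1, beta1, horizon, sigma2_0=1.0):
    """Iterate the GARCH(1,1) variance forecast
    E[sigma2_t] = omega + (alpha1 + beta1) * E[sigma2_{t-1}].
    It converges to omega / (1 - alpha1 - beta1) iff alpha1 + beta1 < 1;
    if the coefficients sum to one or more, the forecast diverges."""
    path = [sigma2_0]
    for _ in range(horizon):
        path.append(omega + (alpha1 + beta1) * path[-1])
    return path

# stationary parameterization (0.06 + 0.88 = 0.94) vs. explosive (0.15 + 0.90 = 1.05)
stable = expected_variance_path(0.05, 0.06, 0.88, 500)
explosive = expected_variance_path(0.05, 0.15, 0.90, 500)
```

The stable path settles at omega / (1 - alpha1 - beta1), whereas the explosive path grows geometrically, which is why a fitted variance equation whose coefficients sum above one signals instability.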
We believe that our results have some important implications for the education system.
It is interesting to note that the well-known psychologist Jordan Peterson, in his publicly available lecture on 'openness, intelligence and creativity', expressed his concern that students in psychology are not sufficiently educated in using power laws, that is, Pareto distributions, despite the fact that creative production in any given domain is governed by power laws. 19 Peterson also highlights that power laws are "the inevitable consequence of multiple trades that are conducted randomly." 20 Assuming that financial markets, which serve the fundamental purpose of trading, would not be governed by power laws seems irrational. In this regard, Taleb (2020, p.91) stresses that "there are a lot of theories on why things should be power laws, as sort of exceptions to the way things work probabilistically. But it seems that the opposite idea is never presented: power laws should be the norm, and the Gaussian a special case." Given the evidence documented in Hou et al. (2020) in association with the results of the current research, our findings suggest that power laws should be a part of the standard education in statistical methodologies used in the social sciences.

19 Peterson emphasizes that the natural law governing this process is a Pareto distribution, which was studied in detail in the domain of scientific productivity by De Solla Price (1965). The lecture is available on Peterson's YouTube channel: https://www.youtube.com/watch?v=fjtBDa4aSGM&t=0s.
20 Ibid.

Tables

Table 1. Descriptive statistics
This table reports the descriptive statistics for the annualized daily realized variance for the S&P 500, gold, crude oil, the exchange rate of the U.S. dollar against the British pound, and Bitcoin. The annualized daily realized variances for each asset market i, where i ∈ {S&P 500, gold, crude oil, U.S.$/U.K.£, Bitcoin}, are, in line with Parkinson (1980), computed as RV_{i,t} = (N / (4 ln 2)) (ln H_{i,t} − ln L_{i,t})², where H_{i,t} and L_{i,t} denote the highest and lowest price for asset market i on day t, RV_{i,t} denotes asset market i's corresponding realized annualized variance, and N = 365 for the cryptocurrency market because this market allows for trading 24/7, whereas we use N = 250 for any other asset market. Publicly available daily data on the S&P 500, gold, crude oil, the exchange rate of the U.S. dollar against the British pound, and Bitcoin were retrieved from finance.yahoo.com.

This figure shows the Hill plots for the S&P 500 variance. The Hill plot shows the estimated α̂ as a function of v_min, defining the minimum value of the variance that is governed by the power law, given by the maximum likelihood estimator (MLE), α̂ = 1 + n [Σ_i ln(v_i / v_min)]^(−1),

where α̂ denotes the MLE estimate, v_i is the annualized daily realized variance of the S&P 500, provided v_i ≥ v_min, and n denotes the number of observations for which v_i ≥ v_min is satisfied. (Note: for improved visualization, the graph is cut off at observation [...], which is common practice.)

This figure shows the Hill plots for the gold variance. The Hill plot shows the estimated α̂ as a function of v_min, defining the minimum value of the variance that is governed by the power law, given by the maximum likelihood estimator (MLE), α̂ = 1 + n [Σ_i ln(v_i / v_min)]^(−1), where α̂ denotes the MLE estimate, v_i is the annualized daily realized variance of gold, provided v_i ≥ v_min, and n denotes the number of observations for which v_i ≥ v_min is satisfied. (Note: for improved visualization, the graph is cut off at observation [...], which is common practice.)

This figure shows the Hill plots for the crude oil variance. The Hill plot shows the estimated α̂ as a function of v_min, defining the minimum value of the variance that is governed by the power law, given by the maximum likelihood estimator (MLE), α̂ = 1 + n [Σ_i ln(v_i / v_min)]^(−1), where α̂ denotes the MLE estimate, v_i is the annualized daily realized variance of crude oil, provided v_i ≥ v_min, and n denotes the number of observations for which v_i ≥ v_min is satisfied. (Note: for improved visualization, the graph is cut off at observation [...], which is common practice.)

This figure shows the Hill plots for the U.S.$/U.K.£ exchange rate variance. The Hill plot shows the estimated α̂ as a function of v_min, defining the minimum value of the variance that is governed by the power law, given by the maximum likelihood estimator (MLE), α̂ = 1 + n [Σ_i ln(v_i / v_min)]^(−1), where α̂ denotes the MLE estimate, v_i is the annualized daily realized variance of the U.S.$/U.K.£ exchange rate, provided v_i ≥ v_min, and n denotes the number of observations for which v_i ≥ v_min is satisfied. (Note: for improved visualization, the graph is cut off at observation [...], which is common practice.)
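The two quantities defined in the captions above, the annualized Parkinson (1980) variance and the tail-exponent MLE swept across candidate thresholds (the Hill plot), can be sketched as follows. This is our own illustrative code with assumed variable names, not the authors' implementation:

```python
import numpy as np

def parkinson_variance(high, low, annualization=250):
    """Annualized daily Parkinson (1980) variance, (N / (4 ln 2)) * (ln H - ln L)^2.
    Per the text, N = 365 for Bitcoin (24/7 trading) and N = 250 otherwise."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return (annualization / (4.0 * np.log(2.0))) * (np.log(high) - np.log(low)) ** 2

def hill_plot(v, k_values):
    """Tail-exponent MLE as a function of k, where v_min is set to the
    k-th largest observation: alpha_hat = 1 + k / sum(log(v_i / v_min))."""
    v_desc = np.sort(np.asarray(v, dtype=float))[::-1]  # descending order
    return np.array([1.0 + k / np.log(v_desc[:k] / v_desc[k - 1]).sum()
                     for k in k_values])
```

For example, a day with a high of 101 and a low of 99 gives an annualized Parkinson variance of roughly 0.036 at N = 250; the Hill plot is then obtained by plotting `hill_plot` estimates against k (or the corresponding v_min).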
Using P(V > v) = (v / v_min)^(1−α) with α = 2.58 and v_min = 6.36, we create M = 100,000 synthetic samples. Each sample has n = 500 data observations. For each sample we compute the sample variance. Figure 6 plots the estimates of the sample variances across our synthetic samples.
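This Monte Carlo exercise can be reproduced in outline with inverse-transform sampling from the fitted power law. The sketch below uses a reduced number of samples (the text uses M = 100,000):

```python
import numpy as np

def synthetic_sample_variances(alpha, vmin, n_obs=500, n_samples=1000, seed=0):
    """Draw n_samples power-law samples of size n_obs via inverse-transform
    sampling from P(V > v) = (v / vmin)^(1 - alpha), then return the
    sample variance of each sample."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_samples, n_obs))
    draws = vmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))  # (1 - u) avoids u == 0
    return draws.var(axis=1)

# parameters from the text: alpha = 2.58, v_min = 6.36
variances = synthetic_sample_variances(2.58, 6.36)
```

Because α < 3, the second moment of the fitted distribution is infinite, so the sample variances are wildly dispersed across samples; this is the instability Figure 6 illustrates.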