How to become a better programmer Posted by EspressoLover on 2018-08-20 21:08

Honestly, the answer to this question may just be to go work for Google for a year. Finance as a whole has some pretty horrendous habits, beliefs and practices when it comes to the discipline of software engineering. Personally I blame the tendency of our industry to always be fighting fires. A lot of shoddy hacks get rushed out, merged into git-master, then completely forgotten about until the brittle code fails spectacularly in prod six months down the line. Devs play second fiddle to the "front office", who are always pushing for new revenue-generating features. So code rarely gets refactored or cleaned up.

Another issue is that product requirements are often vague and underspecified. The PMs don't know exactly what they need until the researchers run the analytics. But the researchers can't run analytics until the framework gets built. And the framework can't be built until the PM can outline to the devs what's needed. For all its warts, Silicon Valley does tend to be ahead of the curve in this area.

EDIT: s/SV/Silicon Valley/

The magic of the compound interest rate Posted by EspressoLover on 2017-02-05 22:53

Manhattan ROI is being under-estimated because you're not counting rent, only capital appreciation. It's like estimating equity returns but ignoring dividends. Even if (land) rental yields averaged 40 bps, Manhattan wins in CAGR.

Anyway, there's a reason that the majority of very long-lived wealth is real estate. It tends to be much more robust against both hostile governments and moron managers. Financial assets tend to be seized or devalued much more readily than real assets. Bank accounts are frozen, companies nationalized, currencies devalued, etc. Land reform does occur, but not as often, and compensation typically tends to be at least somewhat fair. Faceless bankers and speculators are less politically sympathetic than the local lord, whose family has taken care of the peasants for generations.

Second, it's hard for some gullible and none-too-bright heir to run a land-holding into the ground. The 13th Duke of York may mis-manage things, but the 14th Duke of York can probably recover as long as he still holds title. Land's value is pretty intrinsic and hard to alienate. Financialization disproportionately hurts the low-IQ, as many bankrupt professional athletes can attest. A few bad investments, some shady advisors, and over-leverage can wipe out even massive financial fortunes in a generation or two.

The magic of the compound interest rate Posted by EspressoLover on 2017-02-07 00:11

Definitely agree, it's not a fair comparison at all. I thought maybe using the Louisiana Purchase or Seward's Folly might be a little more representative of (American) land in general. Using some rough estimates I get capital appreciation CAGRs of 5.5% and 6.1% respectively. So, Manhattan hasn't actually been *that* much better of an investment than the rest of American real estate.

But your point still stands about Amero-centricity. I'd guess European land in general probably did much worse over a comparable period. Most of the US went from largely unsettled to settled. By comparison Europe was already relatively dense 200 years ago, so the T-0 prices were probably too high for significant capital appreciation. (Rental yields must have been better though.) Plus 1860-1930 saw a substantial rise in wages relative to food prices, which is always bad for land appreciation.
Which of course is the same period in which most of the aristocratic houses went bankrupt. I'd still attribute that outcome to financialization. The Black Plague, 1330-1380, saw a similar labor/food price rebalance, but most of the noble houses survived. The difference was the availability of liquid credit markets. In the 14th century a noble family would just cut back their expenses. By the 19th, they could make up for falling incomes by mortgaging the real estate. If you're financially savvy this is great; if you're not, it's like playing with dynamite. Unfortunately for the old money families, your great-grandson is a lot more likely to be the Duke of Marlborough than Baron Rothschild.

Regarding your specific examples, anywhere that went Communist is pretty much a 100% loss for both land and financial assets. Tragically, pretty much the same holds true if you're in a persecuted group and live under fascism. Property rights aren't high on the priority list of totalitarian regimes. With the possible exception of art or jewelry, which can be smuggled, there's not really any investment that's robust against secret police.

I would guess Argentine landowners probably did better than Argentine financial asset holders over the past century. Not that the land did great, but just because there have been so many bouts of hyper-inflation, bank runs, defaults, currency shenanigans, etc. that financial assets have been nearly zero'd out multiple times. The reasoning's probably extensible to most other banana republics. I'm not sure about the losing powers in WWI/WWII, but it does seem like the industrialists made out better than the landowners in Germany post-1945. The Quandt and Thyssen families seem to have stayed massively wealthy from their corporate holdings, much more so than any landed Junkers. Chalk a point up for financial assets.

Vol surface changes as underlying moves Posted by EspressoLover on 2016-02-29 19:33

Don't know what horizon you're looking at, but one stylized fact to consider is that time-of-day effects are in play. For example in US equities, IVs are less sensitive to late afternoon price moves than at other times of day.

Vol surface changes as underlying moves Posted by EspressoLover on 2016-03-03 01:52

You may want to consider fitting the time-series of IV against underlying returns. There does tend to be short-term localized persistence, with longer-term reversion to historical relationships. For example YTD-2016, S&P IV has tended to undershoot the underlying. To be honest I'm not sure if the vol surface snapped at a single point in time reflects this or not. My experience here is with stat-arb on vol futs, so I don't know how much marginal information these time-series signals add to the surface. However, even with large moves the time-series fitted IV-underlying relationship does tend to be very stable.

Does there exist a so called roll Greek (not theta!)? Posted by EspressoLover on 2017-09-21 19:07

Roll and theta are the same concept. They just differ by how you define "prices stay the same". Say you're looking at monthly Eurodollar futures. So you have a 1-month contract expiring in October, a 2-month in November, etc. For simplicity let's also consider monthly theta, instead of daily. What does it mean in this context for prices to stay the same? One interpretation is that the October contract ends up at the same price as it has today: October->October, November->November, December->December, etc.

Another way of viewing this is that next month's 1-month contract should end at the same price as today's 1-month. In which case the November contract ends up at today's October price, December->November, etc. Now if you extend it to daily, you might have a 25-day contract today, which becomes a 24-day tomorrow. A 55-day that becomes a 54, etc. You don't have exactly the same expiry set day-to-day. So you interpolate between fixed points. A 55-day contract is 5/6 weight on the canonical 2-month contract and 1/6 weight on the 1-month. But in one day it will become more weighted to the 1-month. So, even if the curve doesn't move, this individual contract should slightly change prices to reflect its change in weight. Like a wheel, where one full rotation is each monthly expiration, the contract "rolls" down a theoretically stable curve.

In fixed income, sometimes it's more natural to think about things in terms of their maturity, rather than specific securities. We care about "10-year treasury" yield and risk, not specifically CUSIP bond 9128282R0. If we wanted to compare our portfolio from 2 years ago to today, we wouldn't directly benchmark against CUSIP 9128282R0. We'd try to find two respective securities from each time period that best correspond to "10-year treasury".
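To make the interpolation arithmetic concrete, here's a minimal sketch of the 55-day example above. The 30/60-day grid corresponds to the 1-month/2-month points discussed; the curve prices themselves are made-up numbers, purely for illustration.

```python
# Hypothetical static curve: prices at the canonical 1-month (30-day) and 2-month (60-day) points.
CURVE = {30: 97.50, 60: 97.30}

def constant_maturity_price(days_to_expiry: float) -> float:
    """Linearly interpolate a contract's price off the fixed 30/60-day curve points."""
    w_30 = (60 - days_to_expiry) / 30   # a 55-day contract gets 1/6 weight on the 1-month point
    return w_30 * CURVE[30] + (1 - w_30) * CURVE[60]

# One day passes and the curve itself doesn't move: the 55-day contract re-weights toward
# the 1-month point. The resulting price change is the "roll".
roll_1d = constant_maturity_price(54) - constant_maturity_price(55)
print(round(roll_1d, 4))   # ~0.0067 in this made-up example
```

Swap in a real curve snapshot and the same interpolation gives the roll for any intermediate maturity.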
Sharpe Ratio - Occasional Trading Strategy Posted by EspressoLover on 2017-10-26 08:13

> However if we null them, and don't include the data point,

This isn't the right approach. A strategy that trades every day, ceteris paribus, has an inherent advantage over one that trades fewer times a year. Normally we think of diversification in the space of assets, but there's also diversification over time. If every trading day samples from the same distribution, then a 252-trading-day strategy can achieve the same expected returns with 1/16 the volatility, drawdown, and VaR compared to a strategy that trades one day a year. Nulling the non-traded days ignores this disadvantage and pretends that the 252-day and 1-day strategies are equivalent. Reductio ad absurdum.

> If you include them at a zero, then it dampens the mean return an awful lot

This also isn't the right approach, at least in most cases. The question boils down to: what's the investor's next best opportunity? Let's pretend your strategy *did* trade 252 days/year. Where would your typical investor be allocating money from to fund the investment? I'm guessing it's very unlikely that the opportunity cost is cash. Very likely the funding competes with an allocation to some liquid risk-type asset.

The proposition is very simple. If you were selling a full-time strategy, you'd have to convince the client that your strategy is better than their next best opportunity. Now all you're saying instead is "my strategy's better than your next best opportunity 30 days of the year. But the other 220 days when it's not available just do what you were doing before. You're better off some days, and no worse off the rest of the time."

In short, pick some standard "ambient risk asset". Probably the S&P 500, but maybe you can get fancier. Interlace the ambient returns on top of the strategy's censored trading days. E.g. if you're only tradable on Tuesday and Friday, the returns for the week become [SPY, Strat, SPY, SPY, Strat]. With the full interlaced series in hand, calculate the standard portfolio statistics. Also, to be serious you should add the transaction costs to toggle in/out of the ambient asset, but for SPY on a daily horizon that's gonna be de minimis.
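A minimal sketch of that interlacing step, assuming both daily return series come as pandas Series indexed by date and that the ambient series covers every trading day (the function name is just illustrative, and the in/out toggle costs mentioned above are left out):

```python
import numpy as np
import pandas as pd

def interlaced_sharpe(strat: pd.Series, ambient: pd.Series) -> float:
    """Splice an occasionally-trading strategy into an ambient risk asset (e.g. SPY).

    strat:   daily returns, indexed by date, only for the days the strategy actually trades
    ambient: daily returns of the ambient asset for every trading day in the sample
    """
    combined = ambient.copy()
    combined.loc[strat.index] = strat          # overwrite the active days with strategy returns
    return np.sqrt(252) * combined.mean() / combined.std()   # annualized Sharpe, rf ignored
```

The same interlaced series also feeds any other portfolio statistic (drawdown, VaR, etc.), which is the point: the strategy gets judged on the full calendar, not just its active days.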
Sharpe Ratio - Occasional Trading Strategy Posted by EspressoLover on 2017-10-31 21:06

> Mentally I want to treat them different, but not sure I should. Thoughts?

Short answer: As long as active days infrequently overlap, the two strategies are pretty much separate things.

Long justification: Let's say strategy A and B trade occasionally on certain days throughout the year. The approximate upper bound on their correlation is basically ||Intersect(ADays, BDays)|| / ||Union(ADays, BDays)||. E.g. assume A is active for some 126 days of the year, and B is active for some 126 days, and the active days are randomly drawn independently. Then it'd be very unlikely that their long-term aggregated correlation significantly exceeds 33% (63 expected co-active days divided by 189 expected active days).

To see why, consider an A-strat that trades on Wednesday, and a B-strat that trades on Friday. Their days never overlap, so daily correlation must be zero. But is it possible that the correlation of their aggregated weekly returns could be non-zero? Numerically, yes. Economically, not likely. If they had non-zero correlation, that would imply that A's Wednesday performance forward predicts B's returns on Friday. That's a clearcut violation of market efficiency. I'm not denying that market inefficiencies exist. Otherwise I'd be a tennis instructor. But when they do they're usually rare, hard to find, and small in magnitude. For daily returns on liquid assets just stumbling on a signal with 5%+ correlation would be extremely unlikely. (And if you do, consider it a mitzvah. Forget the rest of this crap, and just go print money with that thing.) Hence the non-contemporaneous portion of correlation is almost certainly puny relative to typical contemporaneous correlations. The correlation is closely upper-bounded by the proportion of overlapping days. Analogous arguments can be extended to other co-moments.

(Digression for pedants: This assumes all trading days within a strategy draw from the same distribution of returns. If there's heteroskedasticity between different days, then the overlapping days can be more (less) volatile. In which case the upper bound is larger (smaller) in proportion to the disproportionate concentration of volatility on those overlapping days.)

You can approximate two strats as separate black boxes as long as 1) they both trade infrequently, and 2) their active days don't overlap significantly more than you'd expect from chance. If active days are drawn independently, two strats that trade with frequency proportion P should overlap with frequency P^2. Hence correlation is bounded by P^2/(2*P - P^2). For small values, that approximates to P/2, which by our previous definition of P should be small. E.g. two infrequent strategies that each trade 12 randomly drawn days of the year should have annualized cross-correlations of less than 2.5%.

OTOH, if 1) is violated and P is large, then maybe this isn't the right approach. If we're talking about strategies that trade over 50% of the time, then it's probably time to swap back to traditional portfolio analysis. If 2) is violated, then there's probably some deeper connection between the two strategies that warrants more than a separate-black-boxes approach. For example two strategies that each trade 12 randomly drawn days a year should only overlap a handful of days a decade. But if you find them overlapping six days a year, then there's probably some underlying mechanism that's tying them together. Even if the returns themselves don't appear to correlate, it's probably worth a deeper investigation.
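A quick simulation sketch of that bound for the 12-day example, with independently drawn active days over a 252-day year (numbers chosen only to match the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_active, n_sims = 252, 12, 20_000

overlap_share = []
for _ in range(n_sims):
    a = rng.choice(n_days, n_active, replace=False)
    b = rng.choice(n_days, n_active, replace=False)
    inter = len(np.intersect1d(a, b))
    overlap_share.append(inter / (2 * n_active - inter))   # |intersect| / |union|

p = n_active / n_days
print(np.mean(overlap_share))       # simulated overlap share, ~2.4%
print(p**2 / (2 * p - p**2))        # closed-form bound P^2 / (2P - P^2)
print(p / 2)                        # small-P approximation
```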
Sharpe Ratio - Occasional Trading Strategy Posted by EspressoLover on 2017-11-02 18:34

@goldorak

> In my opinion, the use of correlations for risk management is a sure recipe for over-leveraging positions.

I'm not entirely unsympathetic to this criticism. But this issue goes far beyond infrequently overlapping strategies. Even a tenet as orthodox as "bonds diversify stocks" starts to look shaky if you think this way. Yes, bonds and stocks have negative correlation, but they have pretty strongly positive cokurtosis. Their prices tend to move in opposite directions but their volatility moves together. (Which can be seen by plotting VIX against TYVIX.) For a risk metric sufficiently loaded on extreme moves, like CVaR or drawdown, the impact of cokurtosis starts to outweigh correlation. During periods of extreme stock turbulence, bonds are more likely to rise but they're also more likely to have extreme down moves. Bias your utility function enough on the latter over the former, and the Markowitz-implied benefits of bond diversification shrink or even reverse. So yeah, you are making a very good point. One worth paying attention to. But it does open a whole can of worms that unfortunately takes the discussion way beyond the scope of the thread.

@rashomon

> For US equities this is not the case [that daily distributions are identically distributed]

Definitely agree. That was just a simplifying assumption for a toy model. But the correction is relatively straightforward as long as you have a reliable estimator of the time-varying underlying vol. (Which I think is pretty easily done for US equity indices, and a little harder but still doable for US single-name equities.) When calculating the proportion of overlapping days, just scale the set intersect/union cardinality by the weight of underlying volatility.

Practically speaking, I can't really see the adjusted method producing substantially different results, unless you're talking about strategies that almost have to be specifically constructed to give funky results. For most normal cases it'd be pretty hard to imagine overlapping days having 50% more volatility than non-overlapping days. Most likely the difference would be substantially less than that. Going back to the original 12 day vs 12 day example, the upper bound on correlation would rise to 3.25% from 2.5%. In general, if P is O(small), then vol-scaled-P is very likely still O(small).

what's the case for sharpe anyways? Posted by EspressoLover on 2020-05-10 22:06

Optimal Kelly sizing is linearly proportional to Sharpe ratio. Therefore for a log-utility investor in an infinite-period game without leverage constraints, growth scales quadratically with Sharpe regardless of higher moments. Optimal log-returns are equal to Sharpe^2. For example a strategy of Sharpe 0.5 can achieve 25% annualized log-returns at optimal leverage. Sharpe 1.0 can achieve 100%. Sharpe 2.5 can achieve 625% returns. Sharpe 0.1 can achieve 1%. Etc.

Another way to think about this is to consider the law of large numbers. Consider period-to-period returns of an asset as an i.i.d. process. (For any reasonably liquid and efficient asset, some sufficiently large period will have sufficiently small auto-correlation between periods.) In the long run, the investor's log-wealth converges to a normal random variable.
The variance of this cumulative log-wealth variable is linearly dependent on the variance of the asset's per-period returns. The contribution of any higher moments converges to zero. In other words, for a sufficiently long-term investor without leverage constraints, all risk metrics besides Sharpe become irrelevant.

what's the case for sharpe anyways? Posted by EspressoLover on 2020-07-11 21:00

Well, just to illustrate I grabbed daily return data on SPY going back to inception (1993). Realized annualized Sharpe is 0.48, so it's a pretty good example. Annualized return is 10%, so getting to 25% requires adding 150% leverage. 2.5X SPY has a number of major drawdowns. Nothing as serious as 99% of capital. But you would lose 85% of your capital in the GFC, 80% during the dot-com bust, and 70% to Covid. But keep in mind that those losses are fully recovered by 2012, 2006, and (nearly) summer 2020 respectively. 94% of rolling 5-year periods are positive, and the median return is 242%.

That being said, I don't advocate full Kelly sizing as a practical matter. In particular, I think most people drastically over-estimate their strategy's Sharpe ratio. Mostly because of a combination of over-fitting, over-confidence, performance decay and the Peso problem. Fractional Kelly, with the fraction somewhere between one fifth and one half depending on an honest and introspective Bayesian assessment, is usually prudent.

AdaBoost in allocation/parameter fitting Posted by EspressoLover on 2015-11-05 09:02

AdaBoost doesn't work in a regression context, which is almost assuredly what you're interested in when predicting forward returns or most anything else in a trading context. Sure, you can hack the regression problem into a classification problem in some way, e.g. predict next tick up/down, +/- sigma moves, etc. But why not avoid introducing needless complexity and use gradient boosting instead?

I will say that in my experience most of the traditional classification-centric ML techniques usually have lower performance when adapted to regression. Particularly in the low signal/noise environment found in financial data. Layering a shrinkage estimate with a validation set seems to be pretty effective at fixing this problem. Indicator, hinge and logistic loss functions really don't penalize overconfidence in the same way as square loss. Hence many state-of-the-art ML systems tend to generate way too large alpha magnitudes. That means taking on unacceptable risk or t-costs. Shrinking the estimates in one way or another with a traditional regression approach can help neutralize this effect.

Dynamic PCA Posted by EspressoLover on 2015-12-01 01:18

> In-sample and out-of-sample the r-sq was through the roof. Residuals were non-existent. Then I looked at the stability of the coefficients and it was clear that it was not worth trying to reap the benefit of the explanatory power.

Best guess as to what was happening here: as sampling frequency increases, correlations start to fall to 0. Price-discreteness and microstructure effects dominate over smooth stochastic diffusion. At sub-second intervals, you're mostly modeling which stocks are ticking together. Particularly thick-book, low-priced stocks. The dynamic eigenvectors are proxying large-scale portfolios that are currently rebalancing. E.g. say [IBM, MSFT and AAPL] are highly correlated on a low-frequency basis. But currently there's a large rebalancing portfolio that's concentrated in IBM and MSFT, but not AAPL. The execution algo is probably sending orders at the same time for IBM and MSFT.
AAPL will re-converge, but it will take time for price-discovery to disperse across symbols. Sampled at high-enough frequency you'll see a high cross-correlation at that time for IBM-MSFT, but low for AAPL-IBM/MSFT.

A simple way to make money off this might be to check for eigenvectors that persistently form at certain times of day or at common rebalance times (e.g. on the hour, end of month, etc.). Large portfolios often trade on fairly predictable schedules. By the time the dynamic eigenvector appears in your Kalman filter it's probably too late to monetize it. But if you can reliably predict a similar eigenvector at the same time tomorrow, then you can get in front of it. Particularly if you can predict its directional bias.

Dynamic PCA Posted by EspressoLover on 2015-12-01 00:59

@Nonius

Quick and dirty approach. Roll an N-window PCA for [[T, T+N], [T+1, T+N+1], ...]. For a large enough N relative to sample time, the period-to-period change in PCA eigenvectors should be sufficiently smooth. It should be trivial to map the eigenvectors at time T to their counterparts at time T+1 simply by looking at the cross-correlation of projections in the common dataset [T+1, T+N]. Even if eigenvector ranks change, just follow the map. You could even get a little fancier by setting a correlation threshold where you assume an eigenvector has "dropped out".

Constructing a minute-by-minute volatility curve Posted by EspressoLover on 2016-01-25 23:37

Have very little experience in options, so just throwing this out there. Is there any reason not to consider time smoothing as well as surface smoothing? I.e. option contract X had a liquid quote 2 minutes ago, and the surface hasn't significantly changed since then. Seems like you should put some weight on that previous quote, even if it isn't the smoothest surface interpolation.

> [A]re there any features of commodities that would make vol surface construction and interpolation very different to, for example, equities or rates?

Again hardly any experience with options, but speaking from experience with futs, commodities tend to have much more frequent discontinuities across expiries. E.g. the Jan NG contract can trade highly disconnected from the Nov and Mar contracts, in a way that you'd be unlikely to see in the ED contracts.

Constructing a minute-by-minute volatility curve Posted by EspressoLover on 2016-01-30 05:00

Given that OMMs are getting picked off on delta, is there much benefit to incorporating high-frequency alpha on the underlying? Seems like it's very common to hit stale option orders after the underlying moves, so there should be some potential in predicting a move at least a few milliseconds before it happens. At least to get in front of other participants. But I haven't heard much about OMMs focusing on alpha in the underlying. (Then again I really know very little about OMM.)

On the flip-side, it sounds like front-month options may not be a bad way to get cheap execution when you don't need a lot of liquidity. Even if you can't turn a profit from picking off stale contracts, you may be able to get delta exposure at much lower cost than in the more efficient underlying. Managing the greeks, even if small, could be more headache than it's worth though.

Intraday Volatility Modeling Posted by EspressoLover on 2016-02-03 06:28

Hawkes process

http://users.iems.northwestern.edu/~armbruster/2007msande444/report2.pdf

Intraday Volatility Modeling Posted by EspressoLover on 2016-02-05 01:42

Have I used the exact model from the paper? No.
I just wanted to link a demonstrative example of applying Hawkes to intraday financial returns. If desired, seasonality can be trivially stacked on top of a Hawkes vol model. Though to be fair to the authors, ACD on high volume, liquid products should proxy for a lot of intraday seasonality.

Constructing a minute-by-minute volatility curve Posted by EspressoLover on 2016-02-16 19:28

@radikal. Ah, I see. Thanks for clarifying, that makes more sense.

Market Making Illiquid Parts of a Futures Curve Posted by EspressoLover on 2016-04-01 06:38

Say you're looking at a futures curve like Nat Gas. (Nothing particularly special about NG, just thought it's a nice archetype.) Volume is heavily concentrated in one or two expiries at the front of the curve. But there's still liquidity and regular volume for a dozen or so expiries further out in the curve. They might trade at 1-10% of the volume of the front-month contract, but they still have pretty good liquidity. Usually a spread no more than a few ticks wide, with pretty good size on the touch throughout the day. It would seem that market makers are pretty active in these contracts and quote pretty competitively, at least given the relatively low volume.

What are the common approaches that liquidity providers here are taking? Based on this thread, I would guess it's not altogether too different than quoting front-month options. That is, make most of your edge when the front contract moves and you pick off resting orders in back contracts. Lean your quotes in the direction of keeping higher-order curve-factor and expiry-specific exposure to a minimum. Maybe collect some spread edge when the front month is looking stable. Does this reasonably describe how most market participants are operating?

The main component I'm unsure about is how exchange-supported calendar spreads change this equation. Obviously a cal-spread has much lower net exposure to curve level/PC1, at least relative to higher-order curve factors. I'm not sure whether the calculus of spread quoting significantly changes the outright markets.

Market Making Illiquid Parts of a Futures Curve Posted by EspressoLover on 2016-04-05 08:17

@Patrik

> I'd assume [machines] live more in the statistical world in terms of how they make decisions... Winter gas vs summer gas can be almost like 2 independent commodities (depending on storage levels etc). At higher frequencies that may not be so important, i.e. you're literally executing legs pretty much at same time, but for the market maker example above it's a big risk.

I think this is an insightful point. In the "statistical world" when you're trading on tight electronic quotes most of your counter-parties are other bots. Getting picked off is probably much more of a risk than dealing with "natural flow" like producers looking to hedge. I think the principles are generally the same: get paid to take risk, keep exposures contained, be cognizant of hidden factors accumulating in the portfolio. But in the electronic world you're dealing with substantially more toxicity, particularly with regards to price discovery in the liquid contracts.

W.r.t. the NG winter vs summer, that's an interesting observation. This may be going off on a tangent from the original point, but playing around with the data I've found that the NG curve behaves differently at different times of the year. Front-year summer vs winter contracts seem to move more tightly together during the summer, with maximum disconnection peaking around November.
But even during November the longer-dated summer contracts are still trading with betas of 0.3 or higher to the front winter months. If you're market making in the former that gives you a little more freedom, but you can still be picked off quickly if the front contract moves a few ticks.

@HockeyPlayer

Ahh, that seems to explain a lot. Great point, thanks for pointing that out. I was looking at consolidated quotes and assuming that the contribution of implied calendar liquidity was negligible. Bad assumption. I'm going to have to revisit the data... I'm guessing the matching engine updates implied quotes across contracts atomically, which changes the nature of the game. If most of the liquidity is coming from spreads, it's probably directly or indirectly chained to the liquid front month. If that's the case and the front ticks up, that probably propagates a change in spread-implied quotes down the curve. In which case picking off stale quotes, or even hoping to reprice an outright quote after a tick, is hopeless. Though you could still try to anticipate front-month ticks with some sort of alpha.

Just thinking out loud, it may also be the case that spread-implied quotes cause back-curve prices to overreact to front-month price changes. On the sizable majority of futures curves the "true" beta of price changes from the back to the front month is well less than 1.0. There may still be opportunity to pick off quotes in the illiquid back, but it might be after, not before, the front-month price tick propagates.

Estimating correlation of predictor Posted by EspressoLover on 2016-04-21 00:16

Well,

Corr(X,Y) = Covar(X,Y) / (SD(X) * SD(Y))

Since you're taking unit variances, SD(X) * SD(Y) = 1
=> Corr(X,Y) = Covar(X,Y) = E((X - E(X)) * (Y - E(Y)))

Since you're de-trending the returns and predictor, E(X) = 0 and E(Y) = 0
=> Corr(X,Y) = E(X*Y)

Then you're trying to find the best statistical measure. If you do one long concatenation then

Corr-Stat-Concat(X,Y) = Sum(Xn * Yn) / (# of obs)

If you do each stock individually then:

Corr-Avg-Of-Stocks(X,Y) = Sum over stocks( Sum(X-stock-n * Y-stock-n) / (# of obs in stock) ) / (# of stocks)

Assuming you have an equal number of observations per stock, these two results are mathematically equivalent. If you don't have equal numbers of observations, the per-stock averaging gives more relative weight to low-observation stocks. Basically if you're dealing with zero-mean variables, correlation can be averaged linearly across multiple series.

If you decide not to force zero-mean then the two aren't equivalent. (But since we're talking about stock returns, the mean should be very small relative to the standard deviation. The residual effect should have a tiny impact.) The concatenated version will treat each stock's returns/predictor mean as the global mean across all stocks. Whereas the per-stock-average version allows each stock subset to fit its own mean. By pre-treating the data and de-trending at the stock level, you would be doing the equivalent of the latter.

Forecasting Methodologies Posted by EspressoLover on 2016-06-08 18:58

Deep (recurrent for time series) nets.

http://arxiv.org/abs/1407.5949

Forecasting Methodologies Posted by EspressoLover on 2016-06-08 20:29

Don't disagree at all. In fact I'm pretty skeptical that deep nets would work all that well on asset price prediction. The signal-to-noise ratio is just too low. It's too easy for the auto-encoding to blow all its statistical power on randomness. But c'mon man.
There's plenty of time for tweaking boring Kalman filters over the next 30 years of sitting in a soul-crushing office. A master's thesis is a last hurrah to do something totally impractical, but still badass cool.

Forecasting Methodologies Posted by EspressoLover on 2016-06-10 01:10

@radikal [Sorry to go off on a thread tangent.]

By speed, I assume you mean evaluating an already trained model. Large-scale neural nets, especially deep nets, are basically unworkable in a low-latency context. But have you investigated any of the research on model compression? (Link to an example below.) My best luck has been with training large ensembles during learning, then using something like LSH or very pruned nets to get the latency to something acceptable.

https://arxiv.org/pdf/1504.04788.pdf

Forecasting Methodologies Posted by EspressoLover on 2016-06-11 02:37

@radikal

Thanks for the links. Agree that even at the limits of compression, this is probably still beyond O(10us) HFT applications. Even with a fast NN library like FANN, that's only enough clock-cycles for ~20k connections.

@jslade

Deep learning is definitely over-hyped. But I think you're throwing the baby out with the bathwater. There are strong theoretical bounds on the compactness of deep nets. For certain types of problems that makes shallow learners effectively useless, even if they are universal. The perennial problem with the ML hype cycle is that everybody forgets the NFL theorem. Once some new method X makes a breakthrough on problem Y, the entire community leaps to the conclusion that it should be used everywhere and that everything else is now obsolete.

http://nicolas.le-roux.name/publications/LeRoux10_dbn.pdf

Forecasting Methodologies Posted by EspressoLover on 2016-06-13 19:32

@jslade - "I'd be curious what kinds of real world problems you're referring to"

Definitely not an esoteric math thing. Deep learning's comparative advantage is when the problem has layers of abstraction. This is why it does so well in image classification. E.g. detecting a certain type of animal in an image requires building up recursive layers of representation. Extract the edges from raw pixels. Then convert edges to curves and contours. From there try to map to simple body parts: eyes, mouths, paws, ears, antennae, etc. Finally if those body parts are arranged the right way, then pick out which animal. (Check out the first link below for an example of this with image classification.)

Directly mapping raw pixels to a kitty is not impossible. But there's so many different ways that a cat can be oriented in a picture that a single hidden layer or shallow learner ensemble grows exponentially with the number of pixels. First mapping to a simplified intermediate layer, like edge and contour detection, compacts the parameter space by discarding mostly irrelevant data like noisy variations between neighboring pixels. (The second link is the best non-hype-y paper I could find related to layers of abstraction.)

All that being said, I'm still generally skeptical that deep learning has that much to offer quant trading. I just don't think that there are actually that many layers of abstraction to markets. Sometimes you see something like it shoehorned in by some egghead who watched Pi too many times. "At Fund X we use a hyper-advanced system of different trading models depending on a classification of 1 of 67 different market regimes." Almost every single time, those systems tend to be overfitted vaporware.
The second major problem is that deep learning is really, really susceptible to adversarial behavior in the distribution. (Third link.) A DNN trading system without some sort of clever mitigating kludge would probably get chopped up by adverse selection.

http://engineering.flipboard.com/assets/convnets/yann_filters.png
http://arxiv.org/pdf/1502.04042.pdf
https://arxiv.org/pdf/1511.07528.pdf

Forecasting Methodologies Posted by EspressoLover on 2016-06-13 20:03

Re: Meat Consumption [sorry il_vitorio for completely chopping up your thread]

The problem with vegetarianism is that it's a lifestyle choice that mostly suits the young. The older we get the more important meat consumption becomes. Basically leucine consumption is really critical to prevent age-related muscle loss. And the number of plant-based proteins high in leucine is basically zip. Maintaining high lean muscle mass is really *really* important to keeping a lid on advanced-age mortality.

http://www.nature.com/nrcardio/journal/v8/n4/images/nrcardio.2010.209-f4.jpg
http://digitalcommons.wku.edu/cgi/viewcontent.cgi?article=1296&context=ijesab
http://jn.nutrition.org/content/130/11/2630.full

Formula for Optimal Portfolio of 2 Assets when No Shorting Allowed? Posted by EspressoLover on 2016-08-01 17:54

With two assets the only local maximum is the value from the formula you posted. Basic convex optimization would select the boundary of the constraint. (If the optimal point falls within the constraints, then obviously use that.) E.g. if the formula says -34% A/134% B, then the optimal constrained solution would be 0% A/100% B.

Formula for Optimal Portfolio of 2 Assets when No Shorting Allowed? Posted by EspressoLover on 2016-08-01 21:29

The issue with negative returns isn't from the boundary conditions, it's that the formula you're using isn't defined for assets with non-positive returns. Consider the example you posted. The formula outputs an optimal unconstrained negative weight for Asset A. Does that make sense? Asset A has positive returns and Asset B negative returns. Shorting Asset A to buy Asset B would result in negative expected returns. Obviously simply buying 100% Asset A and 0% Asset B would be superior to that, so the formula's clearly erroneous in defining optimal allocations under these conditions. As another example, consider two 0-return assets (with a 0 risk-free rate): the formula isn't even mathematically defined. (The problem comes from the denominator becoming non-positive.)

If you want to properly handle negative-return assets, just invert the asset. So in your example transform asset B to asset B-prime:

Rb-prime = -Rb = 22%
Corr(A, B-prime) = -Corr(A, B) = 16%
StdevB-prime = StdevB = 10%
Wb-prime = Wa - 1
Wb-prime is constrained in [-1, 0]

Then solve normally and invert the transform: Wb = -Wb-prime

Formula for Optimal Portfolio of 2 Assets when No Shorting Allowed? Posted by EspressoLover on 2016-08-01 22:49

Sorry, typo in my original formulae:

Wb-prime = 1 - Wa
Wb = -Wb-prime => Wb = Wa - 1

So the unconstrained optimization is 27% Wa, -73% Wb. The constraint boundary is 100% Wa, 0% Wb.

Formula for Optimal Portfolio of 2 Assets when No Shorting Allowed? Posted by EspressoLover on 2016-08-02 00:58

First Example

The constraint on Wb-prime is [-1, 0]. So once you enforce the constraint, Wb-prime -> 0.

Wa = Wb-prime + 1 = 0 + 1 = 1
Wb = 1 - Wa = 0

The constrained solution is 100% Wa, 0% Wb. I think you may have an error in your solver program. 100% Wa has a Sharpe of 0.0476 (Ra/StdevA = .01/.21). 90% Wa/10% Wb has a lower Sharpe of 0.0325 by my calculations:

E(Return-portfolio) = .9 * Ra + .1 * Rb = 0.6%
Stdev(Portfolio) = Sqrt((.9 * StdevA)^2 + (.1 * StdevB)^2 + 2 * .9 * .1 * StdevA * StdevB * Corr) = 18.4%
Sharpe(Portfolio) = 0.0325
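For anyone following along at home, here's a small sketch for double-checking numbers like these. It implements the standard two-asset tangency-weight formula (its denominator matches the one worked through in the follow-up below) plus the boundary comparison. The function names are mine, and the risk-free rate is assumed to be zero unless passed.

```python
import numpy as np

def two_asset_sharpe(wa, ra, rb, sa, sb, corr, rf=0.0):
    """Sharpe of a two-asset portfolio with weights (wa, 1 - wa)."""
    wb = 1.0 - wa
    ret = wa * ra + wb * rb
    var = (wa * sa) ** 2 + (wb * sb) ** 2 + 2 * wa * wb * sa * sb * corr
    return (ret - rf) / np.sqrt(var)

def unconstrained_weight(ra, rb, sa, sb, corr, rf=0.0):
    """Classic two-asset optimal weight on asset A; can land outside [0, 1], and blows up
    when the denominator goes non-positive (the singularity discussed below)."""
    ra, rb = ra - rf, rb - rf
    sab = corr * sa * sb
    return (ra * sb**2 - rb * sab) / (ra * sb**2 + rb * sa**2 - (ra + rb) * sab)

def no_short_weight(ra, rb, sa, sb, corr, rf=0.0):
    """No-shorting optimum: compare both boundaries plus the interior critical point
    (if it falls inside [0, 1]) and keep whichever has the highest Sharpe."""
    candidates = [0.0, 1.0]
    w = unconstrained_weight(ra, rb, sa, sb, corr, rf)
    if 0.0 < w < 1.0:
        candidates.append(w)
    return max(candidates, key=lambda x: two_asset_sharpe(x, ra, rb, sa, sb, corr, rf))
```

Comparing boundary Sharpes directly sidesteps the sign-of-the-denominator issues entirely, which is handy when the inputs get pathological.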
Second Example

Inverting the asset is technically a hack as it shifts the unconstrained solution domain. The optimal weight formula solves within the domain of [Wa, Wb] such that Wa + Wb = 1. By inverting the asset you're solving in the domain of [Wa, Wb] such that Wa - Wb = 1. Once you enforce the constraint, in most scenarios the two solutions project to the same boundary. In this example the assumption fails.

Another approach you can take is checking the denominator of the optimal weight formula. If the denominator is non-negative (as is the case here), the formula's properly defined, even if the asset returns are non-positive. If the denominator's negative, there's a singularity in the optimization.

Formula for Optimal Portfolio of 2 Assets when No Shorting Allowed? Posted by EspressoLover on 2016-08-05 06:36

1) Yes, meant the second case. I think you have a mis-calculation. The denominator is positive:

Ra = .35, Sa = .15, Rb = -.02, Sb = .1
Sab = -.72 * .1 * .15 = -.0108
Ra * Sb^2 + Rb * Sa^2 - (Ra + Rb) * Sab
= .35 * .1^2 + -.02 * .15^2 - (.35 - .02) * -.0108
= .0035 - .00045 + .003564
= .006614 > 0

2) The optimal weight formula forms a hyperbola with respect to the asset returns. Try it yourself with the above example. Calculate the optimal weight while varying Ra from -25% to 35%. You'll see the plot forms a hyperbola with a singularity around .032. As Ra approaches this value from the right, the optimal (unconstrained) weight goes to +Infinity. For anything at or below the singularity, the optimal weight is still +Infinity. In fact anytime the denominator is negative the optimal value will be +/- Inf.

Implied volatility after acquisition Posted by EspressoLover on 2016-10-28 23:19

I'd say that even the framework in which you're approaching the problem is incomplete. Especially at short tenors, a significant proportion of volatility is driven by noise trading. The corporate structure itself is probably insufficient to determine how the new entity behaves. You also have to take into consideration the buy-side. Even more succinctly, I'd be willing to bet money that under the exact same merger terms, the new entity's volatility behaves a lot more like A if it keeps A's ticker. And more like B if it uses B's ticker.

Machine learning for pre announcement index rebal Posted by EspressoLover on 2017-04-09 23:13

Seems like engineering a good feature set is much more important than model selection here. I'd just start with a random forest. RFs are pretty much the closest thing to a free lunch in machine learning. It might not be the optimal model, but if your features work at all, they'll almost definitely work in an RF right out of the box. If not, back to the drawing board.

Machine learning pipeline for trading Posted by EspressoLover on 2018-05-15 23:47

When applying ML to any problem domain, it's always useful to keep the no-free-lunch theorem in mind. For every world where X is the right approach, there's another situation where X is exactly the wrong approach. Sometimes those worlds are absurdly Kafkaesque and don't match reality at all. But you should always be able to articulate where, when and why your favorite approach breaks down. In the spirit of NFL, you need to think about what kind of inductive biases are true of the application that you're working on.
Without at least some domain knowledge, a blind ML approach is almost never going to be useful. In my experience, when working on financial signals, there's a recurring set of stylized facts that are usually true. The below is my goto list (in order of how universal they seem to be), with the caveat that YMMV depending on your specific problem.

* Markets are mostly efficient, and the signal-to-noise ratio when predicting forward returns is *very* low. Make sure that whatever model you're using is extremely robust to noise in the dependent variable. Luckily there's pretty extensive literature on this topic.

* Returns are mostly normal-ish. At least enough that minimizing MSE is almost always the best approximation for the model's MLE. MSE is stable, tractable, easily computable and data-efficient. Deviating from it isn't worth the minor gain to MLE. (This doesn't apply to data errors, like zeroed prices, which you should clean up before modeling.) (Also keep in mind this is an entirely separate topic from risk management, where deviations from normality *are* important.)

* The unconditional prior on returns should be zero. It's almost always worth it to spend the degrees of freedom to hold datapoints out-sample, then single-OLS a shrinkage coefficient on top of the fitted signal. This is particularly true of ML models like trees and nets, which are optimized for classification. Remember in classification, there's no penalty for overconfidence. Whereas in trading there's almost always a *very big* penalty for overconfidence.

* The correct param set for the subset of "special" tradable points (like when the signal is above the cost threshold) is usually in close proximity to the unconditional param set. MSE is a clear winner and we shouldn't deviate from it in the objective function. It's better to take an EM approach. First unconditionally fit your params. Then use the net signal to score the specialness of each point. Re-fit giving higher weights to more special points. Re-score with the re-fitted params. Repeat until you converge.

* Most finished trading systems tend to be a collection of mostly orthogonal sub-components. Therefore large magnitude signals tend to actually not be that special, as they're mostly driven by the random coincidence of the orthogonal sub-signals.

* With regards to the OP's question, don't think too hard about trading a single signal in isolation. Very likely it's going to be mixed with a bunch of other signals, plus some spiffy monetization logic. We're more interested in something that "plays nice" as a building block. Linear models trained with tractable objective functions almost always play nice. Sometimes it's worth it to deviate, but be aware of "downstream costs".

* Within a market, the correct parameters for separate instruments tend to live in proximity to each other. Therefore it's usually better to fit a market-wide model with the cardinality of the entire dataset. Then boost instrument-specific models on top of the market model's shrunken out-sample prediction.

* Less liquid instruments and periods tend to have higher predictability. Therefore it's appropriate to weight MSE in proportion to liquidity or capacity.

* That being said, the less equal the weights the lower the effective cardinality of the data set. If you're data-starved and living near the variance-dominated end of the bias-variance spectrum, sometimes flattening the weights costs less in bias than it frees up in variance.

* Param sets change over time, but in a slow, continuous way. Therefore it's usually better to fit a model using longer periods, then boost using shorter, more recent periods. Alternatively you can weight more recent points more heavily using something like exponential decay on the weights.
* Shorter horizons are more predictable than longer ones. Optimal long-horizon parameters tend to live in close proximity to short-horizon params. Often it's best to fit with the shortest horizon that makes sense. Optionally boost, shrink and/or stretch to get a long-horizon model.

* Interactions between features tend to be pretty shallow. Deep models don't tend to work well in quant trading. Well-thought-out feature engineering usually guarantees minimal interaction. Consider spending more time on feature engineering if you find yourself getting big gains from depth. Also consider using randomized features as a sanity benchmark.

Machine learning pipeline for trading Posted by EspressoLover on 2018-05-18 04:23

> How you overcome trading costs with R-squared below 5% intraday?

Well, let's just play with a toy model. Let's say we're looking at a 1-second signal on FB. FB's open-to-close volatility averages about 1.25%. Simplifying to assume that returns are i.i.d., that's about 1 bp of 1-second volatility. If our signal has 2% R-squared, its standard deviation will be 0.14 bps. At $182 that's a magnitude of $0.0025 in dollar terms. Let's say that we're taking liquidity by crossing the spread, the bid-ask spread is almost always 1 tick, and we pay zero fees or commissions. Therefore our one-way transaction costs $0.005. Relative to our alpha, that's a 2-sigma threshold. Let's simplify again and assume the signal is normally i.i.d. distributed. With 23,400 seconds in a session, we'd expect to get 532 profitable trading opportunities per day with an average net profit of $0.0009.

If we have good execution we can do even better. Start with evaluating the signal continuously in real-time rather than at fixed time slices. That gives more opportunities to "catch" a profitable trade. If we execute on BYX and get $0.0015 in rebates for taking liquidity, then the profitability goes way up because we can trade a lot more and net larger profits. Or if we can get significant price improvement from NBBO, either by providing liquidity or hitting dark liquidity.

> From my experience in sub second frequency the "winner" take all the cake.

I generally agree. Some exceptions I can think of though... Different operations have "jitter" between their systems, even if they're modeling exactly the same phenomenon. Your coefficient and model are never going to exactly match mine. So there will always be cases where Alice's signal triggers, whereas Bob's does not. This is more the case when the alpha tends to "drift". If you have an alpha that "jumps", then most of the time the state instantly changes to the point where the opportunity is obvious to everyone. At which point latency wins.

It also depends on the monetization strategy on top of the signal. Taking lit liquidity is definitely biased towards winner-take-all. Once the signal triggers, the trader is going to move the price instantly. But for liquidity providers, many participants can simultaneously be trying to fill in the direction of the alpha. Since the space of all providing strategies is much larger and higher dimensional than liquidity-taking, there's often more room for multiple niches, even with identical alphas shared by participants.
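The toy-model arithmetic above is easy to reproduce. Here's a rough sketch using the same rounded inputs from the post (these are illustrative figures, not market data), with scipy's normal distribution doing the threshold math:

```python
from scipy.stats import norm

signal_sd_usd = 0.0025    # sqrt(2% R^2) * ~1 bp of 1-second vol * $182, rounded as above
cost          = 0.005     # one-way cost of crossing a 1-tick spread, no fees or rebates

z        = cost / signal_sd_usd              # ~2-sigma threshold
p_hit    = 1 - norm.cdf(z)                   # chance a given 1-second signal clears it (one side)
n_opps   = 23_400 * p_hit                    # ~532 opportunities per session
# Expected signal size conditional on clearing the threshold (truncated-normal mean), net of cost.
net_edge = signal_sd_usd * norm.pdf(z) / p_hit - cost
print(round(n_opps), round(net_edge, 4))     # ~532 trades, ~$0.0009 average net profit per trade
```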
Machine learning pipeline for trading Posted by EspressoLover on 2018-05-18 22:22

Well, to start, the mechanics of placement. Do you improve, join or post away? Will you take the liquidity of a dying level to be the first to post in the opposite direction? How deep do you layer the book? How do you trade off queue-holding at deeper levels versus using your capital at more fillable prices? How do you divide your quotes between venues? Do you use mid-point pegs? Do you post with ALO?

Then you have to consider that a limit order has an entire life cycle. Unlike an IOC, the order lives for a long time, so there's the continuously re-evaluated decision about whether to cancel, modify or keep it alive. It's not just a one-time calculation of profitability. It's this whole recursive decision process: maybe I'll expect to lose if I get filled in the next iota, but if I don't I'll likely move up in the queue by X. But then what's the likelihood that I wind up canceling anyway...

You also have to manage adverse selection and conditionality. (Taking does have an element of adverse selection, especially in the presence of latency competition, but it's definitely a lot less than providing.) Not only do you need an alpha model, but you also need a toxicity model. Even the alphas may need to be conditioned on your order's position. In addition you have to take into account your pre-existing inventory, and estimate how long it takes to exit positions. (This isn't a problem without adverse selection, because the unconditional drift is zero.)

Finally you have to consider market impact. With a fast-decaying take strategy, you just swipe all the liquidity as fast as you can. But if you're a decent size relative to the market, then putting out a giant visible limit order will push the market away from you before getting filled. That being said, remember that many high-dimensional optimization problems have low effective dimensionality.

Machine learning pipeline for trading Posted by EspressoLover on 2018-05-17 18:12

> What kind of R-squared can I realistically expect?

It really depends a lot on context. Shorter horizons, less liquidity, thicker books, higher t-costs, less developed markets, more event-driven sampling and noisier price measures all increase R-squareds. In general R-squareds above 5% intraday (time-sampled) and 1% interday definitely smell. That's usually a sign that there's some sort of overfitting or lookahead bias leaking into the model. Or that your price metric has a pathology, like bid-ask bounce. On the flip side you may still have a great signal with a much lower R-squared than the above.

> Are you saying multiply the fitted signal by a factor less than 1?

Yes, I'm suggesting holding out some of the dataset. Let's say you're using kNN regression, which is heavily overconfident in the presence of noise. The signals would be way too large and you'd over-trade like crazy. Say you have 10,000 points; keep 1,000 in reserve and use the rest to train the kNN. Then generate out-sample kNN predictions for the reserved points. If you apply single OLS, you'll get a coefficient between 0 and 1. That tells you how much to shrink the kNN predictions by. If your kNN model spits out +20 basis points, and your shrinkage coef is 0.25, then your net output would be +5 basis points.

> I have a hard time imagining how to fit multiple instruments in the same model.

It's easy to see this approach with something like SGD.
There's a fitness landscape of parameterizations, and the "height" of each point is goodness of fit. Our dataset is just a finite sample from some Platonic "true model", so MSE is a noisy measure of a point's height in the landscape. Depending on our flavor of SGD we generally have some sort of scheme where we start with larger steps to avoid pits of noise, until we end up in the neighborhood of a local maximum. At which point we start taking smaller, more careful steps to try to find the exact peak.

Now think of a multi-instrument and a single-instrument fitness landscape. The landscape for AAPL is similar, but not exactly the same as the landscape for all S&P 500 stocks aggregated together. However we have 500 times more data points for the S&P 500, so our estimates are much less noisy. It makes sense to start by looking for an optimal point on the S&P. After finding it we can be pretty confident AAPL's optimal point is in the neighborhood. We get a much better starting point this way. The gradient in the proximity of a local max is much less noise-driven. In this way we significantly reduce the empirical risk from a small single-instrument dataset.

> if interactions between features are shallow, why don't we treat them as separate sub-signals and just focus on one feature per model?

Well, first off I'm distinguishing "interaction" from "multicollinearity". (For simplicity, I'm talking about linear OLS, but most of this analogizes to other supervised learning techniques.) If you have non-orthogonal features, then regressing each one individually produces a worse fit than regressing the features together. A toy example is some pairs-signal between X and Y. If X moved down last period and Y moved up, we'd predict X's price to go up. However if X and Y both went up, then there's no divergence and we'd have zero signal. If X and Y rarely diverge, then single-regressing X or Y's last move would offer little predictive value. But by regressing the two together we effectively filter out the non-divergent moves, generating a much stronger signal.

I'm using interaction in the sense of "interaction term", i.e. non-additive influence on the signal. Volume is a good example. We may expect certain events to predict price more or less depending on whether they were accompanied by unusual volume. In effect the regression would be Y ~ X + X*Volume. Whereas volume as a standalone term (i.e. Y ~ X + Volume) would be insignificant. This is a 2-depth interaction because it involves two features. Image recognition is a "deep learning" problem because you can't just build an additive model of individual pixel values, or even small combos of pixels.

> Why don't we put the sub-signals in a single model in the first place?

There's plenty of reasons you want to segregate features and fits into separate sub-signals. First is just practical. You have different researchers working on different problems. Keeping the teams modular with minimal overlapping concerns is easier if each group delivers a separate semi-finalized alpha. Second is that if feature set A is orthogonal to feature set B, then there's no gain to fitting them together. So, why complicate things? Even if they're non-orthogonal, that dependence often just compresses down to their net alpha. Order book features often are co-linear with relative value features, however that's largely an artifact of liquidity providers leaning on RV signals.
If you fit an order book alpha, then fit an RV alpha, then regress the two together to get combo weights, you'll often end up with a net signal that's basically the same as if you had just regressed all the features together.

Third, you usually have different Bayesian priors on qualitatively different categories of features. Simple example: signal X includes hundreds of features, many of which are spurious and unstable. Whereas signal Y has a few rock-solid features. If you're doing LASSO regression on X it's likely that you'll need tight regularization. Once you throw the two together, you'll probably over-shrink Y's features and under-shrink X's. In this case you want to keep your hyperparameters pooled separately.

Sampling methods Posted by EspressoLover on 2018-07-14 19:10

Unless computational resources are constrained, just EM resample. Start with some base sampling scheme (fixed or tick) and fit the signal. Using the expectation from that signal, tag all the points where you'd have an actionable trade. (Probably something like signal flips while exceeding some t-cost threshold.) Now you've got a new sample set. Fit the signal again on that new sample. Then repeat and repeat. Eventually you'll converge to some point where the signal is hardly changing at all.

Also keep in mind, when evaluating continuously you need to be careful about how you define actionable trade points. There will be certain market data events where the price moves away faster than your latency window. These should never be included in the training set, otherwise the signal will likely be overconfident.

It's hard to imagine this problem being non-convex for anything but the most insane signals/markets. So, you're pretty much guaranteed to converge to the globally optimal sampling scheme.

smart processing for noisy option chain snapshots? Posted by EspressoLover on 2018-08-12 00:31

Not an options guy, but I do have a lot of experience with the pitfalls of unreliable data. Let me throw out an alternative. Have you considered just buying raw book data directly from the exchange? Erroneous prices or asynchronous updates are very rarely an issue with exchange book data. Even between venues, the clocks are probably going to be synchronized to no more than a few milliseconds. Obviously this is a lot more expensive than buying repackaged vendor data. But from a statistical learning standpoint, having a few months of clean data is often worth more than years of unreliable data. I would at least go through the exercise of calculating how much data you could get if you spent your budget on book data from the exchanges. Even if not, sometimes just getting a small sliver of exchange data is worth it to benchmark vendor data against.

Statistical significance for a subset of a non-normal distribution Posted by EspressoLover on 2018-09-10 19:33

Maybe you're putting the cart before the horse. Just because the underlying distribution is skewed and leptokurtic doesn't mean that the sample statistics deviate significantly from normality. Unless the sample size is very small (less than 100 independent points) or the distribution is very non-normal, the CLT probably means that you can use plain ole' t-tests.

At the very least, I'd get a handle on the issue by bootstrapping the relevant sample stats. (I'm assuming that you primarily care about the difference between the population means.) Histogram out the values you get from bootstrapping. If you can't really eyeball a significant non-normality from this, then you're probably fine assuming the CLT applies.
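A minimal sketch of that bootstrap, assuming the sub-sample and its complement are already split out as numpy arrays of (roughly independent) observations:

```python
import numpy as np

def bootstrap_mean_diff(sub, complement, n_boot=10_000, seed=0):
    """Bootstrap distribution of the difference in means between a sub-sample and its complement."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        s = rng.choice(sub, size=len(sub), replace=True)
        c = rng.choice(complement, size=len(complement), replace=True)
        diffs[i] = s.mean() - c.mean()
    return diffs   # histogram this; its percentiles give a rough confidence interval for the difference
```

If the histogram looks roughly normal and zero sits far out in its tails, a plain t-test will usually tell you the same story.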
If you can't really eyeball a significant non-normality from this, then you're probably fine relying assuming CLT applies Statistical significance for a subset of a non-normal distribution Posted by EspressoLover on 2018-09-11 18:22 > Can we actually assume i.i.d. considering that these are sample and a sub-sample? However, I'll try that tomorrow. I would suggest just comparing the sub-sample with the complement of the sub-sample. That eliminates any issues of overlapping sample points. The null hypothesis is that the distribution of the sub-population is different than the distribution of the population. That's true if and only if the sub-population is different than the complement-sub-population. So comparing sub-sample and complement tells you the same thing. #drawdown I just want to add here that estimating drawdown is a whole 'nother bucket of worms. Unlike mean/variance/skew, it's not a population summary statistics. It's a time-series property, because the sequence of returns matters. In which case coming up with a tractable, analytical statistical test is a lot harder. Plus drawdown has all sorts of pathological issues in terms of its mathematical behavior. For example a shorter-sample will always have lower expected drawdown than a longer sample, even if the returns are i.i.d. drawn from the same population. Now add on top that you want a nonparametric test, which is really hard to do with time series. Personally, I'll use drawdown to give me a "gut feeling" about how a strategy behaves. But I'd just pass in terms of using it any formal statistical sense (like testing the hypothesis that one series has larger expected drawdown than another series). I think as long as you estimate mean, variance, skew, kurtosis, auto-correlation, heteroskedasticity and regime effects that pretty much captures everything relevant unless your time series is super-weird. > assume that this is a process for signal validation, involving the actual prices and P&L quality is not the right approach because it will mask the signal quality. If you're talking about signal fitting, not portfolio/risk management, then just use R-squared. Technically least-squares is MLE for normal distribution, but the target variable has to be really skewed or leptokurtic to meaningfully change the results. Very rare for returns, even VIX-type returns to be affected. You can try for yourself, cap your dependent variables at three-standard deviations from the mean and re-fit the least-squares model. I'm willing to bet this fitted model and the vanilla model have 90%+ correlation with each other. PCA and continuous futures time series Posted by EspressoLover on 2018-10-18 19:25 @deeds In some sense On-The-Run/Off-The-Run spread kind of does what you're suggesting. The reason we only use a three-factor model, isn't because there's no additional structure outside those eigenvectors. It's because the PCA approach itself is not capable of recovering that additional structure. One reason is the random matrix noise is too large relative to the magnitude of the smaller eigenvalues. I'm pretty sure the 21 year coupon strip still co-moves with the 22 year coupon strip even after accounting for level/duration/skew. But the noise from the other 28 years washes it out. The second reason is if the structure itself does not conform to projection onto fixed maturities. OTR spread is a specific example. You can't just come up with a static weighting of spot rates. 
The components of the spread changes over time as individual bonds rotate out of on-the-run status. Another example would be an intra-curve momentum factor. Again the weights on this factor change dynamically over time, so PCA will never recover it. PCA is one tool in the toolbox. And its capabilities and limitations have been tested again and again in this context. It's not going to give you anything new and surprising that hasn't already been discovered a thousand times over the past three decades. But it is a very good way to build a starting framework in a way that streamlines more exotic analysis. US Futures Liquidity Execution Limits per trade Posted by EspressoLover on 2019-04-11 16:59 The max you can fill at one time is mostly just going to be the touch size. ES almost always quotes one tick wide. Hidden liquidity isn't a significant proportion of the book at any given time. Swiping the ask (bid), especially when the touch size is large, will almost always result in a new bid (ask) level formation at that price. You can probably juice more by using a limit order larger than the touch. E.g. if the ask is 600 contracts, then send a buy limit at the price for 1000, and let the remaining 400 rest as the new best bid. You'll be at the front of the queue, so the probability is high that you get filled. Be aware though that this exposes you to adverse selection, particularly if your resting order is large. So, in some sense you're still "paying" more in T-Costs than you are on the marketable shares. That being said, do you have a time window where you can wait for the best opportunity, or do you have to fill at an arbitrary time? If it's the latter, then your answer's pretty much just what the average touch size is. Unfortunately, that's gone down a lot. As the price of the S&P rises the tick size represents a proportionally smaller bid-ask spread. Therefore the paper you read is likely to be out of date unless it's very recent. The current average touch size is around 150 contracts. If you can fill within an arbitrary window, then the answer is pretty much whatever the largest average touch size is in that window. Just sit and wait until you see a big ask before firing off your buy order. There's some complexity over the online process of knowing when you're actually seeing the largest touch, but you should mostly hit somewhere close to it everyday. However if you approach it from this angle, be aware that you're exposing yourself to book pressure drift. In the short-run, market direction's highly sensitive to the relative size between the bid and the ask. If you're selling when the bid is very large, then most likely the market will move against immediately after your fill. Therefore this approach imposes additional TCosts beyond the normal spread. Two needles in a haystack Posted by EspressoLover on 2019-08-18 03:01 Sounds pretty similar to classic anomaly detection. In which case your best bet for someone with practical experience is fraud detection at a transaction processor. Where to find inspiration for signal ideas Posted by EspressoLover on 2020-06-25 16:21 > Our collaboration process to increase R^2 for inspiration include watching videos on xvideo.com Ahh... So that explains the web browsing habits of a former co-worker of mine... Anyway, regarding OP's question, I'd take a step back. Who are the ultimate consumers of the signal? There's going to be a big distinction based on that. 
A market making system will potentially want to use very different alphas than a directional liquidity taking strategy or an execution algo for for large positions. It sounds like you're fitting alphas on a dataset of arbitrary time slices. Which is fundamentally a rebalance-based approach. At that horizon, it's usually better to think in terms of discrete events that trigger trading actions. You can identify candidate sets that constitute potential events, and specifically fit your signals at those points. Rather than random points in time. If you're providing liquidity, then those events are obviously times when liquidity gets filled. But even if you're taking, there's probably a narrow range of events that constitute the vast majority of your triggers. Things like a narrowing of the spread, new level formation, large trade at the touch or a tick in the index futures or leading instrument. Also, what makes you so confident that the limitation is the lack of signal? 9% out-sample R^2 on 1-minute returns seems pretty decent to me. (Depending on liquidity, microstructure, tick size, book thickness, volatility, etc.) What would you be happy with? Do you have benchmarks against competitors? Are you sure this is the bottleneck? Are you sure there isn't lower-hanging fruit by reducing latency, or smarter order placement, or better risk management, or lowering trading costs, or a wider instrument universe? Now to actually answer your question... It seems like the biggest thing missing from your model is accounting for price discovery that happens in other instruments. Say, you're trading NQ, then you definitely want to be incorporating the action in ES. And most likely the bond futures, VX, and even the cash equities market if possible. You can start by looking up the academic literature on the Epps effect for to help get started. You mention book pressure, but that means very different things to different people. At the most basic level, you're just taking the difference between the mid-price and weighted mid-price. And that works great, but you can definitely get a lot more sophisticated. If you have order book data, you can qualify the individual orders that make up the queue. Depending on the instrument, the deep book may have predictive value. And most importantly the historical evolution of the book is as, if not more important, than the instantaneous state of the book. For example a market's that quoting 500x500 then has a market buy order for 400 lots is a lot more bullish at that instant then a market that's been quoting 500x100 for the past ten thousand milliseconds. This kind of segues into a third source of signal in that you can profile the order flow to derive signal. I'd check out some of the archives at the blog Mechanical Markets for some more ideas. mean reversion and ornstein uhlenbeck in trade time Posted by EspressoLover on 2020-06-25 16:32 @grisha The continued use of last trade as a metric of price is pretty much just a holdover convention from the bad old days when an electronic LOB didn't exist or wasn't easily visible. A lot of the foundational academic literature on microstructure was done in the 80s and early 90s, when trade prices were the only real historical series available. Plus I think there's maybe some stickiness from the fact that "last traded price" is a lot easier to explain without getting into gritty details that define mid-price. 
So, anything civilian-facing, like Yahoo Finance or Bloomberg, will just use the convention that "price" is synonymous with "last traded price". In practice, I've never heard of a serious HFT operation that treats last traded price as the fundamental price metric. For any reasonably liquid market, mid-price or weighted mid-price seem nearly universal. forecasting volatility with high frequency data Posted by EspressoLover on 2021-04-14 20:05 To a first order approximation, high frequency volatility scales within the arrival intensity of sqrt(trade size). You can certainly get fancier, but 4 times out of 5 that approach will work good enough. forecasting volatility with high frequency data Posted by EspressoLover on 2021-04-16 20:22 > but the value in other sources of data, derived from non-quant, Interesting. Any chance you could share more color on what kind of non-quant data you're looking at or how you're integrating it with traditional market data? Certainly understandable though if mum's the word because it's secret sauce. paper request Posted by EspressoLover on 2015-11-20 11:23 What would be the general community interest in setting up a private torrent tracker, so we don't have to keep requesting/sending the same paper N! times? how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-25 16:51 > This article makes some points against it: http://www.priceactionlab.com/Blog/2016/03/machine-learning-rediscovered/ > "Attempting to discover trading algos via machine learning is a dangerous practice because of the problem of multiple comparisons." Nope > But when many independent tests are performed, the error gets out of control and significance at the 5% level does not suffice for rejection of the null hypothesis. Nope > In practice, machine learning works by combining indicators with entry and exit rules That's not how it works at all. > [T]his process... was abandoned in favor of other more robust methods, such as the testing of unique hypotheses based on sound theories of market microstructure and order book dynamics. These approaches also led to the discovery of HFT algos for market making and statistical arbitrage. That's not even close to being right. how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-26 07:17 @Mat001 I don't disagree with the assertion that multiple hypothesis require higher p-thresholds. What I disagree with is that this is in anyway a problem related to machine learning. If anything this is far more of an issue in classical statistics than ML. One of the central tenets of ML is that computer cycles are cheap, and statisticians who derive closed-form solutions to weird distributions are expensive. It's easier to measure error by simply sub-dividing the data umpteen ways, re-running the entire fit process on each sub-division, and directly sampling out-sample performance. You can go hog-wild on an insane number of parameter and meta-parameter degrees of freedom, without worrying about multiple hypothesis testing. A cross-validated ML system always tests a single hypothesis: the out-sample error rate. Goodness-of-fit is completely abstracted from the fit algorithm. In contrast with classical statistics, you're only fitting once on the entire data set. You derive some sort of probability distribution for the parameter estimate errors. Then you use that (possibly with some Bayesian prior) to determine the significance of the fit. 
As the fitted model becomes increasingly complex, the fragility and intractability of the significance test gets of out control. At a certain complexity, it's almost guaranteed that some subtle assumption is violated. The p-values become increasingly meaningless. Good luck trying to come up with some a prior significance test on the parameters of a random forest. Of course, overfitting bias isn't impossible with ML. That is, if you as the researcher are repeatedly data-mining the same data. I.e. if you try some method on the data, look at the result, try another method, and repeat until you achieve an acceptable result. But that's nothing specific to ML, the very same pitfall is present in classical statistics. In fact, in ML if you were foresightful, you could encode this meta-logic in your system. If you plan on using methods A, B and C, picking the best performance, then simply apply that in cross-validation. Voila, you've now just eliminated the multiple hypothesis bias with a simple wrapper function. Applying the same logic with classical statistics in contrast is much harder, largely because of the possible dependence between the estimation error distributions of methods A, B and C. > AFAIK, everyone is doing it this way... If you know of successful ML applications please let us know. ML algorithms usually don't work out of the box in quant trading for two reasons. First the EMH means that the signal to noise ratio in financial markets is extremely low. For example deep learning is awesome in a lot of contexts, but its extremely hard to get right in a market context. If I'm trying to predict which pictures are of cute cats, the relationship between the raw pixel values and image classification may be extremely complex. But it still exists and the former strongly predicts the latter. We know this because humans can correctly classify cute cat pictures 99%+ of the time. In contrast, even the best possible alpha still predicts far less than 1% of the variance of major asset returns. (Except at extremely the short horizons.) The corollary guarantees that the vast majority of the information of any indicator is worthless to predicting returns. Deep learning heavily relies on encoding the structure of the independent variables before even trying to predict the dependent variable. In trading that makes it easy to fall into equilibriums where the encoder thinks it's doing a great job, but it's still entirely missing the relevant 1% of information. I use deep learning as an example, because it starkly lays out the contrast between traditional ML problem domains and quant trading. But the same issues still apply to some extent in nearly every ML model. Which brings us to the second challenge using ML in quant trading: almost all algorithms are designed for classification not regression. With good reason. Classification is a simpler problem domain than regression. Often something that works well in classification will fail miserably in regression, or at least need some major tweak or layer added to it. Trading is fundamentally about the expected return to an asset. If I'm considering buying hog futures today, I want to know how much I can expect to make. Classifying whether an asset goes up or down could be useful, but frequently isn't. 80% of options go to zero, a classifier would tell you that selling puts is the best investment ever. You frequently see people, like the guy in the video you link, try to kludge a solution by using a classifier to determine exit/entry points. 
That is a very bad approach. It stems from day traders erroneously thinking in terms of exit/entry rather than expected returns and risk. A properly designed alpha system should be completely abstracted from the trading logic. To work with ML methods, that frequently means making changes to get them working in a regression context. That's not a trivial task. how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-28 21:01 @levkly 1) You want to make sure that there's little to no correlation in residuals across separate CV bins. Think about keeping contiguous points in atomic groupings. That is make sure every point in an atom ends up in the same bin. For example let's say you're training 24-hour returns on the previous 24 hour returns. You don't want 10:00 and 11:00 from the same day to be in different bins, they're going to have heavy residual-correlation because they share the 23 out of 24 hours. If cases like this frequently wind up in separate bins, then CV error will significantly under-estimate true out-sample error. In this particular case you probably want to atomically group at the granularity of month or coarser. If looking at a pure intraday strategy with no overnight returns or indicators you can group at the level of trading day. 2) Depends on the strength of the signal you're fitting (weaker signals requires more history) and its horizon (shorter usually allows for less history). Most times it's better to use more history than less. Yes regimes change, but explanatory power from larger training sets usually outweighs the difference. Plus regimes may also change in the future, so training across different historical periods can increase robustness (at the cost of reducing regime-specific fit). 3) I say if you have the development time and computing resources treat history length and retraining frequency as another meta-parameter to be selected in cross-validation. Try multiple history lengths and rolling re-train schedules and select what works best. As with anything YMMV. Frequently the answer is highly dependent on the data and model being used. Developing an intuitive understanding of both is is really one of the most important aspects of quant research. @ElonMust "[Expected return] is not what trading is fundamentally about..." Whatever, dude. This statement is so incorrect on a basic level, that I'm not even going to bother disputing it. I'm trying to help you, by pointing out how something you read is mis-leading. If you want to ignore what people in the industry actually do, and instead read the next Seeking Alpha article about Fibonacci sequences, then go for it. Every major quant shop in the world abstracts out alpha and monetization. RennTech, KCG, PDT, Citadel, Jump, Teza, AQR, Two Sigma, Tower, and HRT all start by building models that predict expected return. But you could ignore what the best trading shops in the world do, in favor of a blog written by a charlatan who's only expertise lies in selling worthless, overpriced software to gullible day traders. how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-29 10:32 > If I have 10 years of historical data I divide it for example for monthly bins, I have 120 bins. Well you have 120 "atoms", but that doesn't mean you have to use 120 bins. There's tradeoffs to how many bins to use in CV, particularly when it comes to the variance of the estimate of out-sample error. You should google this topic and check it out for yourself. 
But say you decide to use 5 bins, and your data starts in 2003. Well you could assign Jan-2003 to bin 1, Feb-2003 to 2, Mar-2003 to 3, Apr-2003 to 4, May-2003 to 5, June-2003 to 1, July-2003 to 2, etc. > In the first post you told that every time you perform CV you choose different bins (random) otherwise you will burn your data with more then few retries. I think I must have mis-communicated my point. Data gets "burned", in a sense, anytime you or your software makes a choice about your model or model parameters based on the results from a previous analysis on the same data. Any time you interact with the data more than once, you're potentially biasing your error estimates. Binning the data in a different way doesn't really fix this problem. Say you decide to try a linear regression, look at the results. You're disappointed so try a Gaussian process instead. Results look good and you decide to go with the GP. In some sense you're probably biasing your error. If the Gaussian process results had been unacceptable, you probably would have tried something else, so the performance of your final model is upwardly biased by this sort of online selection process. That being said if you try linear regression then decide to try a GP you have two options. You could just try a Gaussian process directly and compare the results. Or you encode the selection between the two within each run of the CV. On certain runs, regression might get selected, so this captures some of the noise associated with model selection. That being said there is still some final bias because you did look at the aggregate performance of regression. Obviously interacting with the data is inevitable in any sort of serious endeavor. Given that, it's a good idea to keep some of the data in total reserve. Don't use it, backtest it or even look at it until you've completely decided on a final model that you're happy with. Then even if you are introducing bias in your error, you're not too worried because there's still untouched data left to get a true out-sample estimate. how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-30 03:07 @Lebowski Thanks, man! It's nice to know the effort's appreciated. 1) If you're targeting some sort of serial correlation, I'd definitely advocate making sure your "atomic groups" are significantly larger than the correlation horizon you're looking at. This not only minimizes how much the training sets are chopped up, but also avoids potential correlation in the residuals across the bins. The latter happens because consecutive points tend to have very similar long-run history. If you're looking at lagged returns from the previous month, consecutive days will tend to have very similar independent variables. Dividing your data set into every odd and even day doesn't really produce uncorrelated data sets. Let's say the longest horizon loopback your using is a week. If you atomically group at the level of 3 months the effect of binning is pretty minimal. Only 6% of your points will touch any history outside the bin. If you had a lot of history, you could set atom size to 1 year, and that falls to 1.5% of points. Assign atoms to bins, not individual points. E.g. if Jan 2010-Mar 2010 is one atom, then every point in this range always gets the same bin value. 2) The chopping effect becomes less, not more, of a problem with more bins. Remember you're training on the complement of the bin. 
Say we have 100 points and are using 2 bins (disregarding atomic grouping for the example). [1,3,5,7,...] forms bin 1, so you train on points [2,4,6,8,...]. Every point is 'chopped' in the sense of missing its neighbor. Now say we go the other extreme and use one-against-all (i.e. 100 bins). Bin 57 comprises [57], and its training set is [1,2,3,...,56,58,...]. Out of the 99 training points only 2 are chopped in the sense of missing their neighbors. how to use "Machine Learning" to build trading system? Posted by EspressoLover on 2016-03-31 04:44 Well I have very little background in options, so take what I say with a grain of salt. But here's how I would approach that problem. In trading, your actual objective is to maximize PnL, but that's a noisy, discontinuous, largely intractable function to optimize. So instead we use MSE because it's much better behaved and in most cases closely proxies the thing we care about. But you want to think about how you might modify the objective function to more closely align with what you care about. I think the easiest solution for that type of problem is to use weighted MSE. If there's some method or algorithm that uses MSE, it's usually trivial to adapt it for weighted MSE. You want to put a higher weight on the points representing options that you care about. There's a number of metrics that you might use to scale weight: volume, log-volume, OI, near-moneyness, inverse of spread as a percent of price, etc. You'd probably want to play around until you find something that looks intuitively correct. But the jist is that far-OTM illiquid, .04delta options are going to have much less of an influence on parameter training and model scoring than the important contracts. Any paper/book recommendation for mean reversion algorithms (preferably commodities)? Posted by EspressoLover on 2016-06-17 12:13 This is kind of the authoritative textbook on statical time series analysis. There's nothing trading-specific, but knowing the fundamentals is much more important. For trading-specific stuff just go to scholar.google.com and search for "pairs trading" for an idea of how this type of strategy typically works. Finally, nothing specific on books but two quick observations. Are you sure the daily prices are snapped at the same time? If you're comparing the price of Heating Oil at 15:00 to the price of Gasoil at 16:00, you're going to observe a mean-reverting process that doesn't exist. Second be very mindful of what your transaction costs estimates will be. Too high and you'll likely never get any trading opportunities. Too low and you'll probably fit a strategy that works in backtest but not in produciton. Best of luck. Janet Tavakoli, D̲e̲c̲i̲s̲i̲o̲n̲s̲ Posted by EspressoLover on 2016-08-27 21:10 > there are still people who looks on ratings after this usa mortgage backed CDS crisis? As much as the ratings agencies get criticized I still have trouble imagining an alternative world without them. There's mountains of investment capital that basically needs to be rock-solid safe. Bank deposits, insurance reserves, money-markets, pensions, etc. Without ratings agencies, what's to stop portfolio managers from taking a spin at the roulette wheel. Just bet all those bank deposits from widows and orphans on Wild Bill's Oil Explorers. There's a major principal-agent problem. For the PM, it's heads I win, tails the depositors lose. It's not like the customers will discriminate against the cowboys. 
If Joe Sixpack is buying homeowners insurance, he may be able to compare on a lot of metrics. But he's certainly not doing sophisticated credit analysis on reserve portfolios. Yes, when the clients are institutions, rather than individuals, they can be sophisticated. But then you're back to the principal-agent problem for the downstream clients of those institutions. The only real workable system is regulation requiring the certification of investment safety by neutral and trust-worthy third-party accreditors. For those accreditors to be at least reasonably neutral and trust-worthy, they have to use some sort of transparent and objective metrics. No objective measure of something as nuanced as investment soundness is 100% reliable. And as long as the metrics are transparent then they're probably game-able. So you're always going to get people structuring the crappiest investments possible in just the right way to get just past the line of rating accreditation. Every-time it blows up, you get a chorus of critics saying "Tsk, tsk. Should have known better. The writing was on the wall." And it's a pretty shitty flaw in the system. But honestly, what else can actually be done? What is the viable alternative? paper request Posted by EspressoLover on 2017-05-02 01:02 > Moreover, does anyone have more similar recent papers published by this same quant team ? Whole bunch of their older research reports are publicly available here: https://www.google.com/#safe=off&q=deutsche+bank+site:quantresearch.info paper request Posted by EspressoLover on 2017-11-23 16:07 @purbani It's on SciHub. (Always worth checking for these things) http://sci-hub.bz/10.1080/17415977.2016.1178257 paper request Posted by EspressoLover on 2018-03-20 14:06 A little bit off-topic for this thread... But does anyone have a PDF or scan of Modernist Bread by Nathan Myhrvold. I have Modernist Cuisine and Modernist Cuisine at Home, if anyone wants to trade. paper request Posted by EspressoLover on 2019-08-26 00:19 Sci-hub, my man. Best books of 2020 Posted by EspressoLover on 2020-11-25 00:52 The Master and Margarita - Picked this up on a whim, without really knowing anything besides the recommendation of a few lit nerds here and there. It honestly may be favorite my novel ever. Everyone should read this before they die. I wish I could get amnesia, so I could go back and re-read it for the first time. 2666 - Bolano's an amazing writer, and the obvious successor to Borges. He makes the most mundane, irrelevant shit come across like it's the most fascinating narrative since the Iliad. But goddamn, is this book brutal. It makes American Psycho look like Chicken Soup for the Soul. I got about two thirds of the way through and had to take a break. Not sure if I can finish. Treasure Island - My daughter and I started listening to this, but had to stop because the pirates got too scary. (I'm aware of the irony here.) Anyone have good suggestions of audiobooks for precocious pre-schoolers? Best books of 2020 Posted by EspressoLover on 2020-11-27 21:27 Thanks guys. Great suggestions. Got Starcatchers for the post-Thanksgiving break, and put a library hold on Alice. Cool Best books of 2020 Posted by EspressoLover on 2020-12-01 15:07 @Strage I read the Karpelson translation, just because it's what my local library had. I don't know how it compares to others, but I quite liked it. The punchiness of the prose kinda reminded me of Douglas Adams. 
patterns, methods, techniques for real-time trading system client Posted by EspressoLover on 2015-01-07 22:33 I'd start by reasonably spec'ing what you want. For example you asked if STL maps are an appropriate way to store order books. That depends on a number of factors. How performance critical do you need that component to be? Do you need to track the entire queue per level, or is price/qty/count fine? Do you need to track only the top-X levels or every point? Thinking about your requirements ahead of time will potentially save you a lot of needless work. The best shops use highly-optimized data structures that store full details for the entire book. But that represents orders of magnitude more man-hours than what's needed for an okay-speed, top 5 level book. My 2-cents is that if you're getting a third-party API datafeed and executing through a broker API, then your network latency is much higher than anything you'll incur from processing. No need to bend over backwards trying to pick the most efficient data structures. I'd say write something that's simple and works, then profile production if performance becomes an issue. MongoDB for tick data Posted by EspressoLover on 2015-11-15 23:24 > The following presentation on MongoDB and Market Data of AHL may be of some interest The speedup numbers they were getting from NoSQL vs SQL are unrealistic (x500). They must have either been using a crappy SQL engine, or their tuning/queries were just awful. My guess is they could have gotten nearly the equivalent speedup, with 1/10 the work, from simply migrating to Postgre and spending a few hours reading the manual. Most likely some engineer at Man just hosed his employer to get Mongo experience on his resume. When it comes to these types of ambiguous technical decisions, the grumpy neckbeards are usually right nine out of ten times. On web scraping Posted by EspressoLover on 2015-11-20 11:14 Instead of Tor, why not just use a VPN? PIA is like $5 a month, and has essentially an essentially infinite pool of IPs, just rotate across the servers. Like DrGrumpy said, don't abuse the Tor network. D-Wave Quantum Computer Works (Claims Google) Posted by EspressoLover on 2015-12-10 23:24 http://www.technologyreview.com/news/544276/google-says-it-has-proved-its-controversial-quantum-computer-really-works/ I keep suspecting that this D-Wave thing is our generation's cold fusion. But it seems like the evidence keeps pouring in that it's legit. Why Agile sucks Posted by EspressoLover on 2015-12-10 23:45 [Delete double post] Why Agile sucks Posted by EspressoLover on 2015-12-10 23:49 Well I doubt that I'm going to make any new general points about Agile. But specific to quant trading, Agile tends to be a good approach. We're not making monolithic software products with a final shipping date. The requirements beyond "make money" are incredibly vague. Quickly deploying a minimally-viable strategy and learning from its live performance is invaluable. I doubt there's a single successful hedge fund or prop shop running a major strategy that was fully conceptualized before its first trade. Or even getting a basic research platform up and running, then figuring out what features are needed based on the data. YAGNI is a pretty good rule to live by in a low signal-to-noise environment. Why Agile sucks Posted by EspressoLover on 2015-12-14 02:44 Several of the brand-name HFT shops have developers with seven figure comps. 
GETCO back in the day was known to PnL slope equal amounts to strategists and developers (20%/20%). Scalalab Fslab, functional/quasi functional languages Posted by EspressoLover on 2016-01-14 00:31 R is already a pretty functional language. Functions are first-class citizens, closures are fully supported and the standard higher-order function set (lambdas/map/reduce/filter/curry/etc.) already exist or are trivially easy to implement. > I tried building things to replace slowpoke Matlab/R R's really not slow as long as its vectorized. Neural networks are one of the few things that R is slow at, because they can't be. But you can always throw SNOW on top of your R instance. In most cases adding spinning up more instances in the cloud is probably cheaper than porting code from R to a faster language. Deep learning Posted by EspressoLover on 2016-01-29 05:25 Even in chess, centaurs (human-computer teams) still exhibit quite an edge over pure machines. I'd imagine that a deep learning AI is much more "fragile" than the brute force exhaustion AI's in chess. Centaurs should have an even larger and longer-lasting advantage in Go. Scalalab Fslab, functional/quasi functional languages Posted by EspressoLover on 2016-02-19 05:30 Why not Haskell? It's blazing fast, easily hooks into C, and the degree of abstraction allows for rapid prototyping in research. Downsides are the learning curve and lack of the great numeric library you get with R/Matlab/Numpy. Java for finance Posted by EspressoLover on 2016-02-26 03:42 It's not just finance, Java/JVM is on the upswing everywhere. Mostly a combination of Hadoop and Java8. Scalalab Fslab, functional/quasi functional languages Posted by EspressoLover on 2016-03-14 20:09 Yeah, but then you have to use Matlab, which is, by far, the worst of the major numerical languages. Not to mention the licensing headaches if you want to cluster anything. R vectorization Posted by EspressoLover on 2016-04-27 21:53 Vectorizing code isn't really that different than functional programming. R can be written in an imperative paradigm, but it's not suppose to be. Good R code should "feel" a lot like Haskell or Lisp. Make variables immutable. Keep functions under 4 lines. Remember functions are first class citizens. Take heavy advantage of the "indexing" builtins (head/tail, which, order, '['). Learn the idiosyncrasies of all the *apply (tapply especially) functions. R vectorization Posted by EspressoLover on 2016-05-03 10:31 The fatal design flaw in the R language is that there's no scalar data type. Every data object is either a vector or a list. In C if you want to pass one number to a function, you just pass the few bytes needed to deal with that number. In R, a single number is a one-length vector. So handling a single scalar requires all the overhead of an entire vector. It'd be like if C++ required every function using an int to use a std::vector instead. Try if yourself: > object.size(3) 48 bytes "Vectorization" isn't any fancy application of the CPU pipeline. It's simply a way to minimize the (unnecessary) overhead of dealing with a vector at every function call. A naive for-loop has to "re-vectorize" every time you index a single element. The *apply functions avoid this by being implemented in R's C internals, where they can deal directly with the raw primitives. R vectorization Posted by EspressoLover on 2016-05-05 10:01 @jslade Although R suffers from the standard loop-reinterpretation overhead, that's not the main drag on performance. 
On non-vectorized problems, R still runs an order of magnitude slower than Python or even Ruby. The problem with R is that every scalar has to be represented as a heap-allocated and garbage collected vector with 40 bytes of vector overhead. If you write a function like: for i in 1:length(n) { sumTotal += increment(n[i]) } You have to call a heap-allocation, vector constructor, vector destructor and garbage collection N times. CPython may re-interpret the loop internals N times, but it still saves a lot of performance by simply passing n[i] as a scalar onto the stack. This paper has a good summary of these issues: http://r.cs.purdue.edu/pub/ecoop12.pdf "There's fancy interpreters... you effectively have to write vectorized code" That's actually not true with modern interpreters. PyPy has very good performance on loops. In most programs the vast majority of loop-interpreter overhead is heavily concentrated in a handful of loops. A simple heuristic of pre-compiling "hot loops" after ~20 or so traces tends to be extremely effective and robust across a wide range of code. In most programs this results in avoiding 90%+ of loop re-interpretations, while only having to pre-compile <10% of the source code. The below essay on the design of PyPy has a very good summary of these and other issues related to interpreted language performance: http://www.aosabook.org/en/pypy.html Data frame specific storage format...R & python Posted by EspressoLover on 2016-05-14 20:11 > if the csv reader/writer gizmos in R and Python weren't so abysmally slow. If you're dealing with pure numeric data: scan() -> matrix(). As long as you're using gz'd data, it's very fast. read.csv sucks mostly because every time it reads a row it stores strings rather than numerics. If you pre-specify the type during reading (like scan does), the parsing itself is very little overhead relative to pure binary data. Java C/C++ local inter-process communication Posted by EspressoLover on 2016-06-29 12:25 How structured is the data? And how coupled do you need the two codebases? Are you just handing off a big chunk of plain-old-data between the two systems? If you're just sending a whole bunch of numeric and text fields in standard layout, I'd consider just using Unix pipes and a plaintext CSV representation. It's definitely not as efficient as binary, but both C++ and Java are pretty f'in fast at text formatting/parsing. If it meets your performance requirements, then plaintext piping is way more maintainable and lower overhead then having a dedicated message broker, dealing with clients/servers, etc. If that's too slow, I'd suggest looking into Redis pub/sub. With ZeroMQ/RabbitMQ you're dealing with more headaches for features your probably don't need in this context, like persistent queues. If your data's really structured, or you really need tight coupling between the C++ and Java code, then Avro. But if you're just going for repeated passes of the same object, then this is way more overhead then you need. Coupling two separate codebases in completely different languages makes maintainability challenging. to my high freak friends. Posted by EspressoLover on 2016-07-25 21:20 Regardless of the approach you take, but particularly so with busy-spinning, you should consider scheduler affinity to pin a process to a single core. The latency reduction can be pretty substantial. 
https://news.ycombinator.com/item?id=4893866 The D Language Posted by EspressoLover on 2016-07-31 16:40 Using a singleton for a database connection is definitely a code smell. There's a number of good arguments why you should treat db conns as instances on a per-object basis: 1) Mocking and unit-testing is much easier to handle. 2) What if you want an application with multiple simultaneous connections? 3) Some database drivers are not thread-safe out-of-box and you're more likely to get in trouble with the singleton. 4) Dependencies between components are less transparent. That makes the architecture a lot less maintainable. Not that the approach is always inadvisable. Particularly if you're rolling out a quick and dirty solution. But I'd say singletons for DBs (or configs) are at best a convenient evil. The D Language Posted by EspressoLover on 2016-08-05 05:57 Fair enough. Does anybody here use MPI? Posted by EspressoLover on 2016-08-18 22:57 I'd recommend considering whether you really need MPI. The interface really operates at a pretty low-level abstraction, requiring a lot of boilerplate code and tedious debugging. Alternative frameworks often "just work" in a way that MPI doesn't. The lack of fault tolerance in particular can be really painful. I'd say go for Hadoop if you can frame it as map-reduce. Spark if it's easily parallel but requires iteration. If you really need a messaging-oriented design, then nine times out of ten RabbitMQ+cluster-orchestration is going to be the better choice. If you're starting from a fresh codebase Erlang's not a bad option either. The real compelling use case for MPI is if you want to leverage fancy HPC hardware or configurations like Infiniband or exotic topologies. However if you do go with MPI, regardless of the specific implementation I'd highly recommend scheduling with Slurm. OneTick Posted by EspressoLover on 2016-10-11 23:04 Throw out MongoDB from the candidate set. Bringing MongoDB into a tech stack is like starting a coke habit to enhance workplace productivity. It seems really effective at first, and for a small subset of users it continues to work out fine at long term equilibrium. But most people will find that, the once minor annoyances become increasingly problematic, while the benefits seem to evaporate. At which point, extricating it from your life is painful, messy and difficult. On the flip side, why not consider Postgres? It's fast as hell, very well documented and tested, and can easily scale up on WORM-like market data. It has minimal time-series specific support, so certain queries definitely run less efficiently. But you can buy a hell of a lot of EC2 instances for the cost a kdb license. How to build a software library? Posted by EspressoLover on 2016-10-21 21:52 I'd recommend picking up a copy of Code Complete, which is pretty much the flagship text on software architecture. If you have time after that I'd read The Pragmatic Programmer and Clean Code. Finally, a piece of tangential advice. If possible I'd write as much of the library *not* in your firm's internal language. If you can, put as much logic as possible in some major language(s), e.g. python, Java, etc. Then just keep the internal language interfaces as close to a thin wrapper as possible. Finance's strange obsession with internally developed languages is one of the worse cultural pathologies of not-invented-here syndrome. I say this for a number of reasons. 
Major languages will have much fewer bugs and better documentation, so it's less of a brainfuck when you're chasing down some bug you can't figure out. They have much better libraries and tooling. If you need to do X, you may just be able to import a python module, rather than building basic functionality in the internal language. Finally major languages are going to have long-term support. Python isn't going anywhere, and updates and patches will be continued to be released until the sun burns out. Big Data and Deep Learning, a technology revolution in trading or yet another hype? Posted by EspressoLover on 2016-10-26 00:58 @jslade At the risk of rehashing our previous debate about the merits of deep learning... While I'm pretty skeptical of whether deep learning has any real application for trading, the general hype is still pretty justifiable. It's definitely arguable whether the deep-net victories in supervised learning represent a categorical improvement, or just a reflection of more invested effort and tuning. But the killer app isn't classification, it's fantasizing. As quant financiers, we generally (with good reason) ignore the generative in favor of the discriminative. However it's hard to argue that deep-nets aren't a quantum leap forward when it comes to fantasizing highly structured data-sets. Not only are there significant improvements in the distributions, but sampling on deep nets is really easy. In ten years, it's pretty feasible that some variant of recurrent-nets will be able to generate spit out mediocre sitcom episodes or formulaic pop songs on demand. That's a *really* big deal. Probabilistic Graphical Models Posted by EspressoLover on 2016-11-04 23:40 I'd be curious, from those who've had good luck with PGMs, what the context was. My general experience has been pretty disappointing in alpha/trading type problems. Financial returns are just too noisy. You just don't get that much out of the hidden state. The conditional distribution of the latent variables never really differs that much from the unconditional diffused state. So you add a lot additional complexity for little improvement over OLS or some other simple, direct model. But then again, I'm willing to admit that my experience isn't universal, or that I'm missing some important technique or insight. database capacity & where you store (home PC or cloud) Posted by EspressoLover on 2016-11-05 23:53 Store the data local to where you need it. If you're doing research on machines at home, then store the data at home. If you're using EC2 for research, then store the data on AWS. That cost estimate seems way too high. I can't see that data-set being more than 500 GB after compression. That's only $15/month on S3. S3 transfer IN is free. As long as you're doing research at AWS, then why would you need to have any significant data transfer OUT. Major data transfer should really only involve colo->cloud (live capture to market data archive). So that's covered under free IN transfer. I can't imagine an application where the trading system needs local access to the entire historical data set. Cloud->colo should only be small caching transfers. Maybe a few dollars a month in transfer prices at most. If you're going to go home server, then just get a file server with the bare minimum on CPU, memory, everything besides disk. This is just a file server for WORM data that's primarily sequentially accessed. You don't need anything like a traditional DB. Get a cheap 4U+ like an old-gen 2900. 
It's cheaper to buy more smaller disks and RAID10 over them. Since it's in your house, rack-space isn't a constraint. Pay up for SAS reliability, because swapping out failed disks gets old fast. But you don't need anything more than minimal rotation speed. A single user running backtests hardly uses any I/O. Backup to Glacier, as that's by far the simplest, cheapest reliable method. Probabilistic Graphical Models Posted by EspressoLover on 2016-11-07 01:33 Cool, thanks. That makes sense. Storing EOD data PostgreSQL Posted by EspressoLover on 2016-11-26 17:57 FDAX's schema is great, if you decide to go forward. But I'd ask yourself first, do you even need a database? If so, why? You could end up doing a lot less work by skipping points 4) and 5). Just store the raw (compressed) CSVs, and use the scripts from 2) to perform the calculations on demand. If you need point 3) just use GNU join. Pipe the calculator scripts directly into the consuming application. Yes that means you re-run the calcs on every access, but processor power and SSD access is *way* cheaper than developer effort. No need to deal with any of the headaches of a persistence layer. I'd really doubt if you need SQL. Do you need ACID transactions? Indexing? Scalability? Probably not. It's a WORM dataset small enough to fit into L3 cache, with a max of under a hundred dozen simultaneous readers. When in doubt - KISS. Order Book Visualization Posted by EspressoLover on 2016-11-30 20:30 Anyone know of a decent, open library for order book visualization. Particularly one that has good support for highlighting the specific position of (real or simulated) orders across time. I'm more concerned with support for historical playback rather than live trading. Order Book Visualization Posted by EspressoLover on 2016-12-01 13:20 Thanks for the suggestion. I'll give it a try. It may not be perfect, but still seems to have some potentially useful tools. Ideally in my head, I'm visualizing the level book view that trading platforms give you. Then inside that some indication of the queue structured. Particularly the FIFO/matching position of the orders that are tagged as relevant. Put that all on top of a Netflix-style playback scroller, so you can arbitrarily rewind, slowdown or jump to specific times. Maybe have the option to select events you want highlighted on the scroll bar. E.g. bid/ask changes or trades over a certain size. Finally you don't want to miss some 2ms flurry, so switch the option to scale playback either to real-time or event activity. But also if you're in one time space you want to be aware of the passage of the other, so use some sort of thermal color indicator to indicate how fast things are moving. Like if you're in event-time, and events are coming in every 100 microseconds, the real-time indicator should be blazing red hot. Okay, at this point I'm just fantasizing. But man, this sounds awesome. Someone should build this! Wouldn't even be that hard on top of Qt. I wish I had an intern... Order Book Visualization Posted by EspressoLover on 2016-12-05 16:22 @radikal You may have a point here. Reading logs isn't that hard. Replay visualization probably doesn't add that much more insight than a few well-thought out R/pandas commands and plots. Thanks, for the insights. @darkmatters Awesome. That looks pretty promising, and seems to potentially cover a lot of what I'm looking for. Thanks! Big Data and Deep Learning, a technology revolution in trading or yet another hype? 
Posted by EspressoLover on 2017-01-06 10:43 @rashomon > You can have a regular statistical model "fantasize" by resampling with appropriate weights Well, I'm assuming we're talking about generative models here. Remember many of the common models are purely discriminative. Regression, trees, SVM, random fields, vanilla nets, etc. There's no way to assign weights to points in X. But even compared to other generative models, deep learning still has a sizable advantage. Particularly with very high-dimensional data, like images or speech. While you can still theoretically apply Gibbs sampling or really any MCMC algorithm, the process becomes computationally intractable. In high-dimensions most of the probability weight lives on a thin shell. The proportion of random steps that miss goes to 1.0, and the mixture time is O(2^d). In contrast RBMs contain some nice properties which make (approximate) sampling dirt cheap. Those properties can pretty much be extended to any other type of deep-net. Checklist - Open Source Posted by EspressoLover on 2017-01-27 15:30 Redmine. There's a plugin for recurring tasks. There's no explicit hosting service, but there's a free AMIs on AWS. A t2.nano instance is fine; 10 GB is more than enough storage. So you're only talking about ~$6/month. Yahoo Finance Posted by EspressoLover on 2017-05-17 21:36 Anyone know what's up with their historical data CSV page? The original URL (icharts.yahoo.com) has been unresponsive this entire week. It's seemed to move to a new format (query1.finance.yahoo.com). However this now requires a GET param cookie (?...&crumb=[key]). Can't figure out how to get this working in curl. So... 1) Does anyone know if the icharts.yahoo.com is permanently discontinued? The page says "undergoing maintenance", but there's no other announcements from yahoo. 2) Anyone figure out a hack for the crumb= parameter to get it working outside the browser? 3) Any other alternatives for free historical interday data, which gets updated daily? I'd get a running CSI subscription, but I'm a cheap bastard. Yahoo Finance Posted by EspressoLover on 2017-05-17 23:46 Thanks! That worked for me. Here's a shell script version in case anyone's interested Yahoo Finance Bash Script Edit Addendum: Heads up, the new CSV sheet appears to have reversed the time order from the previous one. Also some symbols now include rows with null values if you go back far enough. So make sure your parser handles these conditions. Learning Linux Posted by EspressoLover on 2017-09-04 21:50 Mint vs Ubuntu effectively reduces to the question of GUI: Unity vs Cinnamon. That's pretty much just a question of personal preference. There's no single right answer. My suggestion, before going through the hassle of installing, try each one in a VM. Get a feel for how you like each desktop interface, then decide based on that. any time-series database opensource projects with any inertia? Posted by EspressoLover on 2018-01-08 11:30 Without knowing your specific use case, I'm 95% certain that Postgres will work fine for you. Unless your data's huge or your queries are very bespoke, you probably don't need explicit time series support. Seconding, @jslade, there isn't a good open source TSDB. Which is curious, because it really isn't that hard to design a good one from an architectural standpoint. 
What stack are you using for Logs/Monitor/Alerts Posted by EspressoLover on 2018-01-08 11:53 I'm curious what other NP'ers who are running automated trading systems are using when it comes to logging, monitoring, and alerts. I'm poking my nose in this topic, since I want to upgrade my current setup to something shinier. I haven't really put much effort on this side of things. Up until now, I pretty much get by dumping output to stdout, piping to log files, then just regularly checking things with grep/sed/awk by shelling into the production machine. However, I have a baby at home and am doing a lot of trading in a different timezone. So, I'd like to make it easier to step away, plus offload some of the responsibility to a non-technical person on my team. It'd be interesting to hear what other solutions people are using in this area. Particularly any good open source or relatively cheap software that can just be plugged in and turned on. It's hard to do research in this area, since everything's so web-dev focused. Off the top of my head here's a rough outline of what I'm looking at (critique or suggestions definitely welcome): - Log in application to syslog (instead of stdout) - Logstash for sync'ing logs from prod to archive - Nagios to let me know if the server blows up or quoter dies - Logstash/Splunk to pub/sub trading events from the quoter output - Pagerduty to blow up my phone in case shit hits the fan - Some sort of web frontend for easy monitoring: refresh PnL, positions, trades, other strat-specific stats. - Bonus points if that frontend could also plot intraday PnL, etc. Unfortunately can't really find any good type of project that does this out of the box. Would be nice if Graphite or Kibana could be easily shoehorned into doing this... Monorepos vs Multirepos Posted by EspressoLover on 2018-03-03 23:22 Curious to hear what the phorum has to say on the topic of monorepos vs multirepos. I.e. do you put your org's entire codebase into a single VCS repository? Or do you subdivide into modular projects and libraries that are separated into independent repositories? Maybe you split the difference and use some of federations of multiple repos. There seems to be a growing field of tooling to allow for tighter coupling between multirepos? I've always been a subscriber to the multirepo camp. I think it's a natural extension to the Unix philosophy of self-contained systems with small surface area that do one thing and do it well. I think it also promotes modularity and good software design hygiene. You're forced to think about projects as standalone products, rather than just bespoke sub-systems. It also just seems to fit in better with git and its workflow. Why have a tool that allows for light-weight, flexible repos if you're just going to throw everything in a single behemoth. However these days, it seems like a lot of smart people and companies are on board the monorepo train. Google, Facebook, Twitter and Netflix all seem to use monorepos. The arguments are fairly compelling. Easy dependency across the entire org's codebase. Less overhead when making changes to an external facing API. Single view to access all code at once. One-button builds and environment setups. A lot of this you can get with smart tooling around multirepos, but at that point are you just wasting your time? There's a lot already written on this topic all over the internet. But I brought it up for discussion here, because I'm curious to get a finance-specific viewpoint. 
There's a lot of considerations that may be specific to our industry, or at least more pronounced: - Proprietary and confidential code. If I have a contractor writing a market data parser, I don't necessarily want to give him access to source code for all the alphas. - A lot of code that gets written for some tentative purpose, used a few times, then thrown away. The traditional software industry benefits from more focused development: here's the product spec, now write something that fulfills it. Finance people want to quickly test and iterate through strategy ideas that start nebulous and morph into something completely different. - A lot of code that gets written in weird languages, niche frameworks and bespoke stacks. There's way more freely available tooling and dependency management for your typical webdev. - Fast release cycles. A lot of patches that get done when things are on fire. - Dozens of other considerations I'm sure I'm missing. So, what are people's experiences and opinions on this topic? Even if it's just half-baked, I'm definitely interested in hearing any perspective. Need an excuse to learn Fortran Posted by EspressoLover on 2018-04-04 14:57 +1 to Patrik's suggestion. Fortran can definitely have higher numerical performance than C. (Or at least naive C not optimized by some guru.) But most everything where Fortran outperforms already has pre-existing libraries like BLAS. Fortran is pretty terrible from a language standpoint, and writing it is definitely a lot more painful than even idiomatic C. Before proceeding, I'd also profile to verify the assumption that the performance bottleneck actually lives in the numerical computations. Another approach might be to offload the foreign-language calls to a separate process behind a REST API. Obviously this doesn't work if you're calling a thousand times a second. But if you can refactor to batch the calls, then network latency becomes de minimis since it's amortized. Becoming a Low Latency Java Developer Posted by EspressoLover on 2018-05-10 16:28 With the caveat that many smarter and more successful people would disagree with me... why the fuck would you pick Java for low-latency in the first place? Yes, it can be tuned to be nearly competitive with C/C++. But the whole raison d'être for Java over C++ is the streamlined transparency. Garbage collection, memory safety, the JVM, WORA, simplified builds, everything's an object, autoboxing, etc. That stuff is great, because it lets you disregard a lot of mental overhead and just have the language/runtime take care of it. E.g. no need to worry about when X gets freed, because the GC handles it automatically. But when you're trying to squeeze out microseconds, those abstracted conveniences become major pitfalls. Not that you can't develop around them to achieve low latency, but you have to spend a lot of effort doing so. E.g. how do I avoid triggering the GC in the middle of the hot path, or how do I make sure my JVM isn't interpreting bytecode during critical sections. Rather than reducing the developer's cognitive burden, these features are now increasing the design's complexity. You have all the challenge of low-latency programming in a bare-metal language, plus the headache of wrangling Java's bullshit into a context it was never meant for. Yes, C/C++ is inherently unsafe. But if you really need Java-like safety and memory integrity, I'd say go with Rust. The abstractions in Rust are much more simpatico with the needs of the typical low-latency application.
As to your question about what to study: if you don't have a CS background, I'd suggest starting with the lecture notes or textbook of intro courses in algorithms, OS, compilers, and architecture. Yes, there are specific tricks and hacks to achieving ultra low latency that won't be covered by these general classes. But 90% of latency is just good design principles and awareness of how computers actually work. Without this foundation, you're not going to understand the logic behind the hacks anyway. Becoming a Low Latency Java Developer Posted by EspressoLover on 2018-05-11 09:42 I am not a Java expert, but my understanding is that C4 is far from a silver bullet. Pauseless GC is only achieved by encapsulating all objects in read barriers. (The original Azul product had to use custom hardware because of this.) This leads to all kinds of funky performance issues. Code that should intuitively be very fast can run slow for very unintuitive reasons, often during unpredictable corner cases. Allocations also become a lot more expensive with read barriers. Finally keep in mind that while the C4 GC is pauseless, the Zing JVM as a whole still has pauses. I'll stand by my original thesis. If you're already stuck in a JVM codebase, then use Zing. For low latency, it's almost certainly low-hanging fruit. But if you're not, or you have the resources to redevelop, then just use Rust or C++17. The point of a GC is so you don't have to think about it. Even with Zing, we still have to think about GC (and its downstream effects) *a lot*. In contrast, modern-day smart pointers and RAII have pretty much made non-GC memory management painless. Becoming a Low Latency Java Developer Posted by EspressoLover on 2018-05-29 18:32 IMHO... Leaving language design aside and just talking about performance. If your 99%-latency constraints are in the low milliseconds, then Golang works great. But if you're aiming for tens of microseconds, I don't see any viable alternatives outside Rust. Go's GC is fast, but pauses are still on the order of 1000 µs. (Plus you give up heap compaction for this speed.) Go's allocations are also opaque and based on esoteric escape analysis. In fact allocation isn't even defined in the language standard, and is compiler dependent. In Rust, allocations are explicit (nothing gets heap-allocated behind your back), so reasoning about Rust code performance is pretty much as straightforward as C++. (Not saying this is a free lunch, with Rust you pay for the borrow checker with horrendous compile times.) Then there's the callstack shenanigans that come along with goroutines. (Which are great for lightweight threading, but suck for deterministic latency on deep callstacks.) But the biggest issue is that cgo involves too much overhead to be called in the hot path. That means you've got to throw away your entire C library or any third-party C library you may be using. Any low-latency code path has to be pure Go. In contrast, with Rust it's super easy and fast to go back and forth between C/C++. Since Rust has no runtime, it can be seamlessly embedded in C programs and vice versa. Outside of very weird hacks, if you want Golang anywhere, you're required to launch from a Golang main(). Agree about floating point math. Math-heavy subroutines should always be in C/C++/Fortran, to take advantage of ffast-math, CPU targeting, and SIMD. That being said, I don't really know if Rust is worth it. C++17 pretty much gives you all the features being used to sell Rust. As long as you mandate stdlib smart pointers, you get the same memory safety.
The only thing really missing is modules. cargo and "go get" are indeed way way way better than whatever two-bit nonstandard options exist in C space. But the real strength of Rust is the ability to replace C/C++ piecemeal. The best approach is to play around with trying one or two new components or subroutines in Rust, and seeing how it works out. That's going to give you a very intuitive feel for where it shines and where it doesn't. And it's something you can't practically do with Golang, JVM, .Net, etc. What stack are you using for Logs/Monitor/Alerts Posted by EspressoLover on 2018-05-30 19:33 Finally got around to doing this properly. It turned out to be a fun little project, and thought I'd add my personal retrospective. Thanks so much to everyone who contributed in this thread. All of the suggestions were great, and even when I didn't use the specific tools, seeing the case studies really helped me build a mental framework for this domain. Approaching the Problem Given the Cambrian explosion of tools written by the hordes of hipster devops at SV unicorns, getting started can be overwhelming. Plus the boundaries between project categories can be fairly nebulous. There are lots of products that are both complements and substitutes to each other. There are lots of products that span categories in weird ways, so that if you use X you might not need Y, but if you replace X with Z then you definitely need Y, but can drop T. It makes it hard to plan out a full stack. That being said, as someone who was a total neophyte I really thought this podcast was a good way to get into the mindset of people who do this for a living. James Turnbull also wrote a book (Art of Monitoring), which was good, but promoted his hobbyhorse project too much. An important question to keep in mind: are your servers pets or cattle? If you're monitoring a handful of co-located trading boxes, then it's definitely the former. And that makes your needs very different from the typical web startup using an elastic cloud. Another is: what metrics and processes do you really care about? It's easy to get feature bloat trying to track everything under the sun, but it's probably not worth the effort. Finally, push- vs pull-based is one of the big ideological divisions in this space, so at least be aware of the difference and the pros/cons. Circuit Breakers This was outside the scope of my project, but I just want to echo goldorak here. Monitors and alerts are no substitute for safe behavior inside the application. When a box is touching real money, it absolutely needs built-in safeguards. Millions can be vaporized in milliseconds, and the best dashboard in the world isn't going to fix that. Bugs are going to crop up and do crazy shit in well under human reaction time. Monitoring only exists to help the system operator clean up after the fact. Logging Ultimately decided that ELK was probably overkill. My chosen solution was a lot more rudimentary. Log output to stdout, save a single compressed flat file per quoter instance, then stash the individual files in S3 tagged by date, market and server. Anything that needs to be formally structured (like market data or trading activity) can be pulled out in an ETL pipeline at end-of-day. I think logstash is a great product, but I don't think you really need it in a trading context. First, you're not running continuously, so real-time ingestion isn't really necessary. Formal, persistent and structured data is probably only required after the market closes for the day. Second, you probably do everything in a single binary, and don't need to collect data from a bunch of disparate sources and microservices. Third, elasticsearch is good when you have a deluge from a bunch of undifferentiated hosts. Something weird might happen and you have no idea where it came from, so you need the ability to carefully search everything. In contrast, if you want to know what happened with MSFT yesterday, you already know which quoter traded it. It's as simple as grabbing the specific log file from S3. (Or the production machine, if it's before end-of-day rotation.)
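For flavor, the end-of-day stash step really is only a few lines. A rough sketch; the bucket name and key layout are made up, and it assumes boto3 with AWS credentials already configured:

    import datetime
    import gzip
    import os
    import shutil

    import boto3  # assumes AWS credentials are already set up

    def stash_log(local_log, market, server, bucket="my-trading-logs"):
        """Compress one quoter instance's stdout log and archive it in S3,
        keyed by date/market/server so it's trivial to find later."""
        today = datetime.date.today().isoformat()
        gz_path = local_log + ".gz"
        with open(local_log, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        key = "{}/{}/{}/{}.gz".format(today, market, server,
                                      os.path.basename(local_log))
        boto3.client("s3").upload_file(gz_path, bucket, key)

    # e.g. from an end-of-day cron job:
    # stash_log("/var/log/quoter/es1.log", market="CME", server="prod-ny4")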
Monitoring Prometheus for the backend, Grafana for the frontend. I'd recommend Grafana hands down. It just works, and produces beautiful, responsive dashboards. Dead simple to set up. At first I was hesitant to go with Prometheus. My intuition was that I really needed something check-based or event-based, rather than a metric-based solution. I looked at Nagios/icinga/sensu. Nagios is terrible, and no new project in the 21st century should be using it. The other options are big improvements, but are still working off Nagios' core design flaws. Ultimately I think event-based is just a flawed paradigm. Almost all the solutions suffer from some combination of painfully inefficient setup, heavyweight resource usage, ill-defined data models, confusing separation between collection and storage, poor scalability, and/or unreliability. In Prometheus "everything is a metric", which seems unintuitive at first, but is a more powerful data model. For one, that means everything drops out into nice time series that you can throw up on a Grafana dashboard. That makes diagnosing problems and understanding behavior a lot easier. It also means that you don't need to maintain separate pipelines for continuous metrics and discrete events. With Prometheus you get a pretty rich metadata model and query language that lets you turn metrics into events and vice versa. Sync'ing data between servers is dead simple. Nagios-style checks can easily be done with the pushgateway just by piping your pre-existing scripts to curl. That technique is also what I used to move my pre-existing check scripts into the monitoring framework. Rather than adding explicit scrapes to the quoter binary, it was really simple just to have crontab periodically scrape each instance's stdout logs and send the relevant metrics to pushgateway. There are a few trading-specific things that Prometheus isn't designed to natively deal with. E.g. I don't care about alerts that occur outside market hours. The good news is the metadata and query language is flexible enough to handle this. But you may have to scratch your head for a second, and read the docs. Overall Prometheus takes 30 seconds to get running, but there's a decent learning curve for its deeper features. Metrics The thing I like about Prometheus is that both the time-series database and the collection agent live in the same system. If you use Graphite or InfluxDB to store metrics, then you probably need some sort of scraper or collection agent like statsD. Consolidating these two functions simplifies the stack, allows for richer metadata, and makes collection easier. What you give up is having a professional-grade TSDB. You could use InfluxDB for long-term persistent storage of data, and even heavy-duty research. Like, I wouldn't use Prometheus for backtesting, but I might use InfluxDB. But for me it was helpful to divide data into two buckets. Use Prometheus to store ephemeral but real-time metrics, recognizing that these are only used for production monitoring, without the need to keep them clean or maintained for more than a day or two. Then ingest long-term persistent data into a separate structured solution at the end of each day. I think this is a better approach, but you do give up some cool capabilities by foregoing a unified system that ingests in real-time.
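To make the pushgateway trick from the monitoring section concrete: the cron-driven scripts don't need anything fancier than an HTTP request carrying the metrics in Prometheus's plain-text exposition format. A rough sketch, where the host, job, instance and metric names are all made up:

    import requests  # assumes a pushgateway listening on its default port 9091

    PUSHGATEWAY = "http://monitor-box:9091"

    def push_metrics(job, instance, metrics):
        """Push a dict of {metric_name: value} to the pushgateway."""
        body = "".join("{} {}\n".format(name, value)
                       for name, value in metrics.items())
        url = "{}/metrics/job/{}/instance/{}".format(PUSHGATEWAY, job, instance)
        requests.put(url, data=body, timeout=5).raise_for_status()

    # e.g. called from a crontab script that scrapes the quoter's stdout log:
    push_metrics("quoter", "es1", {"quoter_open_position": 1200,
                                   "quoter_realized_pnl": 5431.25})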
Alerts Pagerduty pretty much just works. I'd definitely recommend it, although you have to pay $10-30 per person per month. It's got pretty much every possible feature you'd need. It routes to phone, email, SMS, and Slack. It integrates natively with Prometheus, and pretty much all the other monitoring solutions. Straightforward setup. Allows you to pre-define on-call schedules, and set escalation policies of arbitrary complexity. Only criticism is that maybe it comes with too many superfluous features. (I don't really need to see a chart of notifications broken down by phone vs email over time.) Anyway, unless money is tight, forget about rolling your own alert scripts. Just use Pagerduty. Next tech boom Posted by EspressoLover on 2018-06-29 20:11 Self-driving cars are the wildcard. They're the only tech on the horizon that has a realistic chance of having a meaningful economic impact. For all the hype surrounding apps and mobile and social networking, nothing has actually had any real effect on total factor productivity since email and spreadsheets. Information technology did substantially grow productivity in the 90s, at almost the same rate as the first and second industrial revolutions. Unfortunately it proved much shorter-lived than its antecedents. After the initial wave of wide-scale PC adoption, we've returned to pre-computer productivity trends. The info-tech industry is basically just making new toys rather than new tools at this point. Self-driving cars could really buck this trend. In the US at least 5% of the labor force could be put out of a job. Not just truck drivers, deliverymen and bus drivers. It also means less auto manufacturing (most people can dispense with owning a full-time car), fewer body shops and mechanics (fewer accidents), fewer ER docs/nurses (fewer car accidents), a much smaller car insurance industry, less road construction (less traffic), and even less housing construction (people can just move further away, and sleep during their commute). That being said, I've lately heard a lot of skepticism from insiders about whether the tech is anywhere on the horizon. It might be a lot like how speech recognition seemed deceptively close in the 1980s. It was easy to get the accuracy from 0% to 95%. The threshold for usability is 99%, so it seemed like success was imminent. But the last mile ended up being way harder than all the work that came before it. scipy.optimize.minimize? Posted by EspressoLover on 2018-07-03 23:23 Assuming the objective function is backtest PnL? If that's the case, your fitness landscape is probably non-smooth. That's because as parameters change, the objective function will vary discontinuously. Trades are included/excluded discretely, so if a param changes by an epsilon, then the objective function either changes by zero or suddenly jumps to a different value. That means any algorithm that uses the gradient would be unstable. (Aside: theoretically, you could define your own function for deriving a smooth gradient. The way to do this would look something like changing the param enough that the trade set is altered, then interpolating to an epsilon step.
This is very likely more effort than it's worth.) Unless you're compute constrained, probably just stick with Nelder-Mead. Trading parameters have such huge generalization error that the slight accuracy differences in finding exact local maxima are irrelevant. The only potential annoyance with NM is setting the termination tolerances. You could also try Powell or COBYLA, which are also gradient-free, but they often require more hand-tuning than NM. The easiest way to handle constraints is just to bake them into the objective function. I.e. if an evaluated point steps outside the bounds, then add a penalty of infinity. This isn't the most computationally efficient approach, and you need to make sure your initial point starts inside the constraints. But again, it's the route of least headaches. Also most trading parameter optimizations are non-convex. Therefore you should probably use scipy.optimize.basinhopping to find the global maximum, not just whatever local maximum is in the neighborhood of your initial point. scipy.optimize.minimize? Posted by EspressoLover on 2018-07-04 18:08 I'd consider just ditching scipy.optimize altogether. Given how expensive evaluating the backtest is, you need to parallelize and utilize all the cores at your disposal. That's not really easy to do in python. You're also outside the realm of traditional optimization and pretty far into metaheuristics. scipy.optimize doesn't give you a lot in this domain. An aside on the dimensionality: I'd guess that the parameter space is effectively low-dimensional. That is, the objective function really only varies along some much smaller manifold embedded in the full high-dimensional space. For any given param-set all that really matters is where it projects onto the manifold. There's lots of reasons this might be the case: some parameters are irrelevant, some parameters basically have the same effect, some group of params only matter through their L2 norm, etc. So the search space is actually much smaller than it appears, but we only know how to look in the larger space. With low effective dimensionality, random search tends to perform pretty well. The space is too large for grid search, but even a small number of random points in the embedding space will tend to pretty closely approximate grid search on the manifold. Random search also benefits from being brain-dead easy to parallelize, even across multiple machines. A final benefit is that you'll probably get better generalization error. Overfitting tends to be a bigger issue with peaks in the fitness landscape that are narrow and steep. In one sense, we want to bias towards local maxima that are high-volume as well as high-height. Random search naturally does this because it's more likely to land in high-volume regions. So, I'd keep random search as your baseline to beat. Beyond that, the next logical approach would be simulated annealing. Which is basically what basinhopping does, but you want to use a library that you can parallelize across cores, and probably even machines. Again, more compute time is going to trump better algorithms. If you really want to get into the weeds of metaheuristics you could try the various flavors of iterated local search, tabu search, or particle swarms. Honestly though, I doubt whether the potential improvement over random search would be large enough to justify your time. EDIT: Based on your last post, I'd also maybe suggest starting with the params you get from the individual fits. Then use those points as seeds for the starting values of simulated annealing or some other hill-climbing algorithm. In fact I'd use three sources of seed points: completely random points, fully fitted points, and individual strategy fitted points with randomized portfolio-level params.
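To make the random-search-plus-penalty idea concrete, here's a rough sketch. The backtest function, bounds, and point counts are placeholders for whatever your setup actually looks like; the only real content is the -inf penalty and the process pool:

    import numpy as np
    from multiprocessing import Pool

    BOUNDS = np.array([[0.0, 5.0],     # hypothetical: entry threshold
                       [1.0, 100.0],   # hypothetical: holding period
                       [0.0, 1.0]])    # hypothetical: max position fraction

    def backtest_pnl(params):
        # Stand-in for the real (expensive) backtest evaluation.
        return -float(np.sum((params - 1.0) ** 2))

    def objective(params):
        # Bake the constraints into the objective: out-of-bounds points
        # score -inf, so they can never win.
        if np.any(params < BOUNDS[:, 0]) or np.any(params > BOUNDS[:, 1]):
            return -np.inf
        return backtest_pnl(params)

    def random_search(n_points=10000, n_procs=16, seed=0):
        rng = np.random.default_rng(seed)
        candidates = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1],
                                 size=(n_points, len(BOUNDS)))
        with Pool(n_procs) as pool:    # embarrassingly parallel across cores
            scores = pool.map(objective, candidates)
        best = int(np.argmax(scores))
        return candidates[best], scores[best]

    if __name__ == "__main__":
        params, pnl = random_search()

The same skeleton extends across machines by just splitting the seed range per host and comparing the winners at the end.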
scipy.optimize.minimize? Posted by EspressoLover on 2018-07-05 19:35 My last post was made in a race condition with @strange. Given the followup information, I'm a lot less confident that he's dealing with an effectively low-dimensional situation. Seems more likely to be a situation of (nearly-)orthogonal subspaces along the axis of each sub-strategy. I do agree with the others, I'd definitely start by fitting strategies individually, then using something like traditional portfolio optimization to combine. The possible gains I see from pooled fitting are 1) pushing the individual strategy parameters to be more anti-correlated with each other, and 2) transaction cost reductions by pushing strategies to internally cross each other. Neither is possible with individual optimization, because the sub-strategy parameters are already fixed by the time you optimize around any portfolio effects. Kempf/Memmel is definitely a clever heuristic to gauge the impact, but even if it turns up negative, pooled fitting might still produce the above gains. That being said, the portfolio-optimal params probably live in close proximity to their individually optimal levels. So your best bet is probably starting off with individual fit params, then doing something like iterated local search on the pooled param set. Another approach that might work is something like coordinate descent. Start off with the individual fits, then do the portfolio fits. Then start with the first sub-strat. Adjust that strategy's params, and only that strategy's params, with the objective of portfolio performance. Leave all the other sub-strat params fixed. Then do the same with the second sub-strat. And so on, repeating until convergence. This approach makes it easier to get stuck in local minima and saddle points. You give up improvements requiring coordinated changes between strats. But that being said, you drastically reduce the dimensionality at each optimization step, and still get the opportunity to reap most of the plausible pooled-parameter gains. > Everybody talks about this manifold, but who has ever seen one? Absolutely disagree, how else would you describe an autoencoder? How could image recognition even be possible unless the problem was effectively lower dimensional? Project Software Posted by EspressoLover on 2018-08-03 21:57 Redmine's pretty underrated. At its core it's an issue tracker. But it also includes support for calendars, Gantt charts, wikis, time tracking, git integration, todo lists and forums. Plus a whole galaxy of plugins for pretty much anything else you'd need. It has a fairly intuitive and simple-to-use interface, is rock solid, and has been around forever. Biggest downside is that it's not as visually pretty as Asana or Trello. Looks like it'd be at home in Windows 95, no slick single-page web app, etc. (It still runs fast and feels responsive, plus doesn't have the typical Web 2.0 page size bloat.) Also there's no hosted option, but there are pre-installed AWS images which easily run on a micro instance. IMO the biggest problem with Asana is that it's overly "todo" oriented. There's no good workflow for the task that I have no firm plans for, but still want to write down if/when someone gets around to it.
Like a known bug we've decided to live with for the time being. Or a wishlist feature that sounds cool but is only worth implementing when resources are less constrained. With an issue tracker like Redmine, I can just spawn an issue and file it away. With a todo list like Asana, the damn thing sits in your face every day, and distracts you until you're compelled to get it over with. Business needs often mean prioritizing the 20% that needs to be done to reach 80% functionality, and using Asana/Trello this way just goes against the grain. I've mostly settled into a polyglot philosophy here. Using different tools for different purposes seems to work pretty well, even though it does add another layer of challenges to keeping things organized. Trello as a type of visual board to help everyone keep the big picture in mind, Asana for active tasks and collaboration because it's a joy to use, and Redmine as a type of mailroom filing cabinet for long-term tracking. IMO it's better to use the tool best suited to a particular workflow than to keep everything in the org under a unified system. If you need to mix together different systems, either for different purposes or different teams, it's worth it. You give up on having a single source of truth, but the bigger risk is having people not use any project management system at all because it doesn't feel natural. charts timeseries Posted by EspressoLover on 2018-08-13 16:55 +1 for Grafana. Look up some of the more popular cryptocurrency dashboards, and my guess is you can find a lot of the functionality you're looking for. go and rust Posted by EspressoLover on 2018-09-16 00:13 @gax That's a fair point, and the team at Clang has done some really impressive stuff with clang-analyzer and clang-tidy. But the nature of C++ is that it's taken some really smart people and some Herculean efforts to get something that falls well short of what you get out of the box with the Rust compiler. Clang's static analysis runs much slower than the C compiler, produces hard-to-decipher output, makes a lot of false positives, and still doesn't cover a lot of what should be obvious issues. This is largely because of the preprocessor and the way C++ implements templates (which is philosophically basically an extension of the preprocessor). From a purely functional perspective, you can always abstract away the preprocessor. You don't even need to compile down to IR or even an AST. You can just expand the text and SFINAE the templates. But practically, now you have something disconnected from the original source. Naively you have to duplicate analysis on every single #include directive, which makes things run unusably slow. As well as figure out which imported text represents "user code" and which represents "external code". Now all of this shit can be mitigated by various clever tricks. And Clang pulls out all the stops. But at that point you're well beyond just treating the preprocessor as a black box. You're really getting into the weeds of routing around all the design mistakes of the preprocessor and templates. Essentially you're hacking a module system under the hood. And even then things aren't so simple, e.g. you can #include the same file multiple times, with completely different effects depending on what's #define'd at directive time. go and rust Posted by EspressoLover on 2018-09-14 11:25 Go was originally meant to be a C++ replacement, but everyone's pretty much concluded that it doesn't quite cut it as a systems language.
It's something to use when you need something a little faster, closer to the metal, better at concurrency, or more prod-ready than python, the JVM, or the CLR. But you don't want to deal with the bullshit, mental complexity, verbosity, slow compile times, platform dependency, and shitty package management of C++. Everything in systems world comes down to memory management. Using a GC is a non-starter. (To be fair this isn't a consensus opinion, just ask ESR.) Even if you make it optional, like D, nearly every single library will wind up having some sort of GC dependency. Rust's borrow checker is fundamentally the best approach, and is basically the direct application of modern PL theory. That being said, Rust may die from the Haskell curse. If a system is beautifully architected but cognitively inaccessible to the majority of programmers, it'll never get widespread adoption. In particular, validating complex data structures with the borrow checker can feel like abstract algebra. C++17, with smart pointers and move semantics, is nearly on par. But safety isn't enforced by the compiler. (And the preprocessor makes third-party static analysis toothless.) Without that, a large codebase is just going to naturally accrue undeclared unsafe sections. Which kind of defeats the purpose of having the guarantees in the first place. Plus even if you do religiously follow best practices, Rust is still safer in certain regards, in particular with concurrency and uninitialized data. The biggest juncture will be how well modules are designed and received in C++20. The fact that we're still using makefiles in 2018 is fucking insane. If modules actually work and are used, then we can finally get a decent package manager. It also means that compile times (which have been one of the biggest negatives of all the C++0x features) will finally start getting faster. Long-term my hope is that if WG21 gets modules right, that'll build enough buy-in to deprecate the preprocessor entirely. With just an AST and no text expansion, enforcing/determining safety on large segments of code becomes much easier. Ultimately the preprocessor has been the most intractable design flaw from the original C++ spec. It's really at the root of all the evil we associate with the language. Or maybe we'll just all end up using javascript for everything, right on down to the O/S. What stack are you using for Logs/Monitor/Alerts Posted by EspressoLover on 2018-09-23 05:34 @prikolno Thanks for the very insightful and detailed post. A lot of wisdom from the trenches. Gonna have to digest some of the more philosophical points a little bit more. But I did want to mention something, which is potentially very convenient if you don't already know... > you don't want to drop your timestamp calls into the kernel, On modern versions of the linux kernel, clock_gettime() syscalls never leave userland thanks to the vDSO. The overhead is something on the order of 50 nanos depending on your specific setup. ZFS/dragonfly vs mongo Posted by EspressoLover on 2018-11-01 20:16 > thoughts on using a … filesystem rather than a db with all its special APIs? For batched WORM data like market data, you don't need any ACID guarantees or transactions. Right off the bat that's a major reason to prefer flat files.
Every single language, tool and environment can read files out of the box, filesystems are rock solid, POSIX gives you a ton of capabilities, the structure is completely transparent, directory trees can be ported anywhere (including S3), and compression tools are easy to use and highly efficient. The biggest reason to move market data out of flat files is secondary indexing. E.g. say you slice your data into the format /[date]/[symbol]. Then it's easy to get the data across every symbol for a single date, but painful to get all the data for one symbol across a date range. Databases obviously can fix this problem. So can indexed file schemes like hdf5 or parquet, but you give up a lot by dropping the POSIX support, portability, and stability that come with native flat files. But that being said, storage is pretty damn cheap. There's nothing inherently wrong with duplicating data, as long as you're hygienic about keeping a single source of truth. You can easily slice and store the same data multiple ways corresponding to different indexing schemes. The marginal cost of a GB is like $0.25 on even the highest-end drives. > ZFS + distributed filesystem ZFS is really cool technology, but its experimental and unsupported status in linux probably doesn't make it worth it over ext4. Most run-of-the-mill needs are more than met by a linux NFS server on 10 GbE with RAID10 and a couple dozen SSDs. Unless the I/O rate is super-high, or a single instance of the app requires more than one rack of nodes. If you're scaled out of NFS, and talking about filesystems distributed over multiple hosts, most of the technology is pretty lacking. Lustre's the only decent option, but it requires custom kernel support. GlusterFS is just terrible and unreliable. And I've heard similar things about using Ceph at the filesystem level. Fundamentally, I don't think it's possible to make a decent userland filesystem. If you do need that kind of scale, you're probably better off following Spark's philosophy and moving the computation to the data, rather than the data to the computation. > I noticed several years ago that AHL was using mongo. That AHL presentation was so ridiculous that part of me suspects it might be satire. The underlying dataset's a terabyte. Small enough that their data distribution platform could literally just be USB thumbdrives. Yet for some reason they need a 17-node cluster of high-end servers backed by an Infiniband SAN. I'd highly discourage considering Mongo. Googling will reveal no shortage of issues and shortfalls. There's maybe a very narrow technically justified use: you can confidently predict that you'll never ever need even the slightest bit of relational logic, your data is structured but that specific structure is constantly changing, and you don't care about performance, stability or integrity. Otherwise, it's very easy to be seduced into using it because it's so easy to use in dev, whereas all its warts don't appear until you're stuck with it in prod. There's maybe a broader business justification that explains its current popularity. That's the stereotypical tech startup with an Instagram-esque growth rate, tons of funding, but a constant shortage of manpower. No need to waste any time, because the learning curve is so short and there's virtually no DBA effort required. As long as you're flush with VC cash it's easy to throw more hardware at the problem and horizontally scale out. Eventually you're going to regret picking Mongo, but by that time you'll have grown into a unicorn with deep reservoirs of engineering talent.
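Footnote to the flat-file indexing point above: duplicating the same data under a second layout is about a ten-line job. A sketch, with made-up paths and naming:

    import glob
    import os
    import shutil

    SRC = "/data/mktdata/by_date"    # .../by_date/<date>/<symbol>.csv.gz
    DST = "/data/mktdata/by_symbol"  # .../by_symbol/<symbol>/<date>.csv.gz

    # Copy the same files into a symbol-major layout, so "one symbol across
    # many dates" becomes as cheap as "one date across many symbols".
    for path in glob.glob(os.path.join(SRC, "*", "*.csv.gz")):
        date = os.path.basename(os.path.dirname(path))
        symbol = os.path.basename(path).replace(".csv.gz", "")
        out_dir = os.path.join(DST, symbol)
        os.makedirs(out_dir, exist_ok=True)
        shutil.copy2(path, os.path.join(out_dir, date + ".csv.gz"))

Keep one tree as the source of truth and treat the other as a disposable index that can be regenerated at any time.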
Eventually you’re going to regret picking Mongo, but by that time you'll have grown into a unicorn with deep reservoirs of engineering talent. kafka vs zeromq Posted by EspressoLover on 2018-11-17 03:45 Why not include RabbitMQ in the running? Seems like a potentially happy medium, where it would be easier than Kafka to adapt a pre-existing ZeroMQ system. > off-site geo-awareness + complex failover logic. Have you just considered keeping zeroMQ but adding a service discovery layer (etcd, consul, or zookeeper)? Depending on your specific needs, geo-location and high-availability can probably be handled inside service discovery in a way that's agnostic to the underlying message layer. Any advancement on compiled vector oriented languages Posted by EspressoLover on 2018-11-21 04:25 If you're not latency constrained, why not just run python in production? Google, Facebook, Spotify and a lot of other major tech firms run python servers. With PyPy performance is reasonably good, probably better than whatever network latency you're dealing with. Containers and/or virtualenv make it nearly easy to deploy as pushing a compiled binary. Alternatively if you already have low-level code (market data parsing, order routing, account management, etc.) in C++, is there any reason you can't just split into multiple services communicating over IPC. Something like protocol buffers, avro or even just JSON depending how rich you need the interchange to be. If latency's not an issue, then this shouldn't be a problem. Especially if the services all run on the same host. Any advancement on compiled vector oriented languages Posted by EspressoLover on 2018-11-21 21:05 I really wouldn't enjoy being stuck in a python codebase at that scale either. In software everybody's situation is different in weird, subtle and unpredictable ways. Who knows what kind of idiosyncratic constraints someone's under. But in general, there's very little reason in this day and age to be locked into any one specific language or framework. In 2018, there's tons of fantastic tooling and infrastructure that makes refactoring into services simple, safe and straightforward. It's almost a stereotype of Silicon Valley that a company gets launched with a half-baked Rails app then later down the road gets refactored into a polyglot service-oriented architecture. I'm not even a fan of super-isolated, single-function microservices. But if someone's tired of their aging, ill-suited codebase, then there's no reason they're permanently stuck with it. In the aforementioned python codebase from hell, wait till you're banging your head against the wall working on some sub-component where python's a really bad fit. Re-write it in Visual HaskellLang.js instead of hacking on the pre-existing python. Replace the python module with a dumb client which wraps the API calls to REST, RPC or message queues depending on what works best. Then just roll the existing app and the service together in Kubernetes. Repeat until it's no longer frustrating working on the original python app. Or until it's been refactored out of existence. Besides migrating to containers, which you probably should be if you haven't anyways, there's no overhead to this approach. The dev just spends his time refactoring instead of polishing a turd. Ops just deploys containers instead of artifacts. End-users see the same frontend and APIs. The primary reason this strategy might now apply is if the original codebase is so tightly coupled that there's no clean way to slice it into sub-components. 
This definitely doesn't apply if you're doing anything HFT or latency-critical. Anything that's ever called in the hot path needs to live on a pet (not cattle) server, run bare metal, avoid IPC, and have a meticulously tuned API. But let's be realistic, well less than 1% of the code written on Wall Street will ever be invoked in this context. Any advancement on compiled vector oriented languages Posted by EspressoLover on 2018-11-21 21:43 But Julia's also dynamically typed. Even though they tend to be correlated, we should remember that typing and interpreted/compiled are two different things. Going back to the original motivation for the question, there's an inherent tradeoff. Static typing is good for large, complex, long-lived systems with a lot of interconnected pieces where runtime failure is expensive. But dynamic typing makes research easier by facilitating rapid prototyping and flexible coding standards. If the goal is to unify research and production code, then you inevitably have to land somewhere on this spectrum. But in its defense, python3's typing module does provide a pretty nice system in this context. It makes it pretty easy to barf out untyped research code, then gradually annotate it with types as it gets promoted along to full production. (Though to be fair, my understanding is that Julia takes a similar approach.) Otherwise the alternative is to go with the flow of research and production having, in many ways, fundamentally different technological needs. Often the best tech stack for one will have major drawbacks for the other. But if that's your philosophy, you're going to have to accept the inevitable burden of re-writes when logic graduates into production. Either way there's no silver bullet, and there's always some bullshit that comes with the research/production dichotomy. Database management system fit for text mining? Posted by EspressoLover on 2018-11-27 21:47 I don't do much in text, but it seems to me that you should be using Spark. Its raison d'être is exactly what you say: it keeps data local in memory through successive transformations. Most of your pre-existing SQL logic could probably be ported over as long as you use the dataframe API (which you pretty much want to use anyway for performance). Your custom functions that can't be easily expressed in SQL can be done with UDFs, which pretty much let you use arbitrary Java/Scala/Python/R code. You give up ACID transactions by moving away from a DBMS. But it sounds to me like you're using this in a research context for WORM data that's loaded in ETL batches, not as a production system. On Spark's plus side, you get pretty seamless horizontal scalability, so you can just spin up more VMs to make the process go as fast as it needs to. I'm not a text guy, so take this with a grain of salt, but you may also want to consider Elasticsearch for the underlying data layer. If most of your queries are searching and parsing text, then performance is going to be a lot better than Postgres. But if you're doing relational logic, like joins, need consistency, or have significant writes relative to reads, then stick with SQL. Any advancement on compiled vector oriented languages Posted by EspressoLover on 2018-11-28 19:04 @jslade You've hit peak curmudgeon.
I half expected your post to end with an invective against the integrated circuit and a paean to the unsung virtues of vacuum tubes. @rickvik > matlab, which is faster by orders of magnitude Matlab's relative speed comes from using MKL. OpenBLAS is quickly catching up, and the relative difference is well less than orders of magnitude. Even if not, MKL is now a free product, and you can build Numpy against it. Numba+MKL will have similar performance to Matlab, because they both compile down to pretty much identical assembly. Personally it would piss me off to pay insane licensing fees to Mathworks based on performance they're getting for free from Intel. While we're on the Matlab hate train: regardless of its numeric or ecosystem merits, the language itself is objectively terrible. It's what happens when a bunch of scientists decide that PL research is for wimps. There's a reason it's one of the most dreaded languages, right up there with Visual Basic, Cobol and Perl. Any advancement on compiled vector oriented languages Posted by EspressoLover on 2018-11-29 03:54 keeping it all together Posted by EspressoLover on 2018-12-09 16:51 Disagree with others. DVCS is a game changer. The only real upside to the centralized approach is better handling of binary assets. But those should live in artifact repositories, not source control, anyway. When you move to git, what you realize is that commits and branches are super light-weight and local. With svn the tendency is only to commit "version changes" with sprawling footprints. With git you tend towards a separate commit every time you change a few lines of code. This just isn't feasible with svn, because each commit changes the repo for everyone else. With git, it doesn't matter if you break the build, because it's only local until you push. Having much finer granularity on commit history enables all kinds of productivity boosts. git revert essentially becomes Ctrl-Z in your local workspace. You can "git log -p | grep" to effortlessly find exactly when, where and why some change was made. git bisect is literally a one-button solution to diagnosing bugs. Not to mention source control functionality stays completely available even when you don't have an internet connection. Same story with branching. With svn, branches are a pain in the ass, and you probably only use them for major version changes. With git, I'll use local branches as ways to isolate even the smallest change sets. Let's say I'm working on adding some feature X to the codebase, when I notice some orthogonal refactoring Y that I want to do. Using git branch, you can easily toggle back and forth between each change, keeping the workspaces single-focused and the changelogs isolated. If you're collaborating with Alice on X, you can push that branch to her, then collaborate with Bob on Y. Neither has to worry about the tasks outside their purview. Specific to your question, it also makes lightweight experimentation simple. You can fork a "skunk works" repo to add some experimental features. Keep it as long-lived as you like, merge downstream changes from master as needed, and selectively promote changes back into master. You can keep master hooked into a CI pipeline, so that you don't have to worry about untested experimental fork changes accidentally polluting the stable codebase. keeping it all together Posted by EspressoLover on 2018-12-09 19:01 On the topic of organizing the research process, I'd say there's two major separate challenges.
One is code stability and maintainability, while keeping experimentation low-overhead. Two is data provenance. Code From a software engineering perspective, research is fairly challenging. The vast majority of code produced in a research context gets thrown away or never used again. Most regular software is produced against a relatively fixed spec. I.e. there's a product spec that calls for X. When we write code to do X, it's likely it or some future version of it will stick around for the life of the product. Against that, it's justifiable to keep strict requirements in terms of software quality. Test coverage, coding standards, documentation, maintainability, code review, backwards compatibility, etc. This isn't really what you want for research. The median line of research code gets written once, used a couple times in the same environment by the same person who wrote it, then forgotten about in a few days. Stability is a much lower priority than making it easy for researchers to quickly experiment with ad-hoc solutions without a lot of formal overhead. Then there's another twist, in that some unpredictable subset of research code will eventually be promoted into production systems. Dealing with this isn't simple. It's easy to get lazy and avoid good software engineering standards by pretending that core production code is still research. Vice versa, once you've been burned you may go overboard with formal requirements, effectively shutting down innovation. Plus keep in mind that in most orgs there's a power struggle between researchers and engineers. At the end of the day it takes honest actors with good judgement to decide when, where and how to vary the standards between different parts of the codebase. One thing that does help is at least being explicit about it. Keep written standards for different levels of code, with everyone on the same page, and be clear about when code graduates from one level to another. Alice shouldn't have the gut feeling that this is still informal experimental code, while Bob is shipping it in a critical system. I prefer to keep the division simple, two levels: "skunk works", which is the wild west, and "core", which should always be production safe. Obviously only core should ever be called by core. But even once something's starting to get used across different places in skunk works, it should get promoted. Once a sub-project is used outside a single research team, or revisited outside its original sprint, or grows past a few thousand LoC, or starts splitting into multiple layers of abstraction, then it should probably be promoted. YMMV. Depending on your use cases and org's personality a different approach is justifiable. Maybe more granularity than just two levels, or different guidelines around promotion, or some other variation. I don't think the details are as important as just articulating a coherent philosophy that you can justify. Data Most research is just churning out all sorts of intermediate datasets and derived parameters. Some of which get used further downstream to make more datasets and parameters. Some are getting pushed right into production. Some are getting put in front of a human researcher who's trying to glean an insight or make a decision. It's often not really clear, when you're actually generating the data, what it will wind up being used for. The way the data's generated depends on all kinds of subtle structure and logic. Data isn't like code. It's not self-explanatory. It's just a blob of bytes, and how exactly we created those bytes is not inherently represented inside the data itself. The challenge is to keep a provenance of how the data was generated and what it actually represents. I.e. metadata.
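Concretely, the sort of thing I mean is a little sidecar file written next to every derived artifact. The field names below are just an illustration, not a schema recommendation:

    import datetime
    import json
    import subprocess

    def write_sidecar(artifact_path, inputs, params):
        """Record how a derived dataset was produced, next to the dataset itself."""
        meta = {
            "artifact": artifact_path,
            "created": datetime.datetime.utcnow().isoformat() + "Z",
            "git_commit": subprocess.check_output(
                ["git", "rev-parse", "HEAD"]).decode().strip(),
            "inputs": inputs,   # e.g. paths of the raw capture logs
            "params": params,   # e.g. date range, symbol set, seeds, hyperparams
        }
        with open(artifact_path + ".meta.json", "w") as f:
            json.dump(meta, f, indent=2)

    # write_sidecar("alpha_coefs.csv",
    #               inputs=["s3://raw-capture/2018-12-07/"],
    #               params={"date_range": ["2018-01-01", "2018-11-30"], "seed": 42})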
Making metadata useful is really tough, especially when the data comes from complex transformations. Metadata could potentially be much higher-dimensional than the underlying data itself. For example, for fitted alpha coefficients you might have to track all the parameters used in cleaning and pre-processing, the version of libsvm used, the random seed used in the fit, the date range, the symbol set, all kinds of hyperparameters, etc. The less data you need to provenance, the better. Ideally the only canon would be ground truth data (e.g. raw capture logs), and code that's sub specie aeternitatis. Derivations of data (including parameters) are done on the fly as needed, and discarded after they're finished. That doesn't mean that you can't keep derived data cached, but it's treated as scratch work rather than a canonical source of truth. As soon as there's any question about where it came from (e.g. was this made with the most recent version of the library?), you just discard and regenerate, rather than trying to investigate the origins of the current dataset. There may be certain barriers to why this doesn't work. One is computational constraints. If it costs $500,000 in computer time and takes three weeks to fit some parameters, then generating on the fly won't work for you. Another is if one-button regeneration isn't practical. Maybe at some point in the pipe a human actually needs to use their judgement to make a decision. Or your current software doesn't have the hooks for it (although something like Apache Airflow should make this easy, even if you're wrapping a bunch of disparate clunky systems). And most of the time actual production parameters should be stable, and not reset because of some minor commit in the fit library. Even then, it's still helpful to focus on making the surface area for provenance small. If you're compute constrained, only canonize the most upstream transformation that's past the compute barrier. Anything downstream, derive on the fly when it's cheap. The fewer artifacts that need metadata, the simpler the schema you can use. Millions of artifacts in canon are going to require a machine-readable schema with every possible dimension included to be on the safe side. But for a single production param set, the metadata can just be a plain-English changelog. The strategist can just use her personal judgement about when a refresh is needed. Ray Posted by EspressoLover on 2019-01-06 17:32 I think Spark's the right choice. Ray's basically designed to be Spark++ in the same way that Spark is Hadoop++. That being said, it doesn't really have any lower "software engineering" overhead than Spark, and it's a much less mature, much less widely used product. The main point is to make distributed tasks have less overhead and support more nested dependencies. Basically bridging some of the gap with workflows that currently live in MPI because they don't cleanly fit into the RDD/DAG framework. One thing to consider is maybe all you need is a cluster scheduler, instead of a full-fledged distributed computing platform. The relevant question is how compute- vs. data-intensive your individual task workloads are. If it's the former, you can basically abstract away any considerations about data locality. Just pick some centralized store (S3, NFS, Redis, etc.), launch the tasks, grab the inputs, then write back the outputs to the datastore. If your tasks do a lot of compute, and don't shuffle that much data, then the bandwidth+IO inefficiency is de minimis. In which case you can treat slave nodes as fungible resources. Just launch tasks on any node with room as needed. (Plus maybe some resiliency support, if your cluster's large enough that the probability of node failure during a job is more than epsilon.) There's plenty of options in this space: Slurm, Mesos, even Kubernetes these days. And after initial setup, they pretty much "just work" as a transparent layer. Set it and forget it.
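The nice thing about the compute-heavy case is how dumb each worker can be. Something like the sketch below, submitted as an ordinary Slurm/Mesos/Kubernetes job, is basically the whole integration. The paths and the "compute" are stand-ins:

    # worker.py -- one task = read inputs from shared storage, compute, write back.
    # No data-locality awareness: any node with a free slot can run it.
    import json
    import sys

    def run_task(task_spec):
        with open(task_spec["input"]) as f:     # e.g. a file on NFS, or synced from S3
            data = json.load(f)
        result = {"n": len(data)}               # ...the heavy compute would go here...
        with open(task_spec["output"], "w") as f:
            json.dump(result, f)

    if __name__ == "__main__":
        # e.g. an sbatch --array wrapper passes each task its own spec file
        with open(sys.argv[1]) as f:
            run_task(json.load(f))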
But if your workflow requires data locality awareness, then that's a whole 'nother can of worms. Unfortunately there's really no way to just abstract away the distributed layer. You'll always have to spend some mindshare on how the underlying system operates. Either because the platform restricts you to a non-generalized compute paradigm, like MapReduce. And/or because it gives you enough rope to hang yourself, like Spark lineages growing unbounded inside iterative algorithms. Distributed computing is hard. Even on a theoretical level. Trying to pick one platform to rule them all is probably a futile effort. It'd be nice if a single option would cover all possible usage cases that we could imagine. But it won't happen. The better approach is to figure out the requirements of your current workloads and select the best framework(s) suited to that. With full awareness that the choice may need to be re-evaluated in the not-too-distant future. Apache Pulsar Posted by EspressoLover on 2019-08-02 22:16 Well, I'd say being hard-nosed practical and gravitating to proven and familiar technologies is rarely a fault in our line of work. :) Apache Pulsar Posted by EspressoLover on 2019-08-02 17:34 All I can say about Apache Pulsar is I've heard only good things about it vis-a-vis Kafka. Using BookKeeper to abstract the state out of the broker layer seems like an unmitigated win. It's just a fundamentally better architecture. The biggest downside I see is that Pulsar has about two orders of magnitude fewer contributors than Kafka. I'm always hesitant to put a software project that has a high risk of dying at the core of my infrastructure. To hijack this topic on a bit of a tangent (sorry maggette)... I've increasingly found myself reaching for a service mesh in places where my knee-jerk impulse is a message bus architecture. There's a lot of hard-earned distributed systems wisdom in the end-to-end principle. To the extent that state and logic can be moved from the core of the network to the endpoints of the network, that almost always reduces the brittleness of the system. The modern tooling around service meshes makes it easy to replace a lot of the functionality of message brokers in a more peer-to-peer way. To start, it seems like at least half the message buses in the wild are pretty much just used as a form of service discovery. Instead of finding each other, producers/consumers just subscribe to a pre-determined topic. But Istio, or even just vanilla Kubernetes, makes this a trivially easy problem to solve. Some other reasons to use a message broker: * Load balancing: Envoy and/or HAProxy now does a much better job of this than Kafka/RabbitMQ * Fault tolerance: Kubernetes+Istio can easily spin up new consumers/producers if one dies and handle failover.
* Encryption/Authentication: Handled by Envoy automagically * Monitoring/Tracing: Istio definitely excels beyond any of the message brokers here. * Buffering on bursts of data: Client/server requests with exponential backoff (which you get for free with Envoy) often achieve the same goal as middleware buffering for many workloads. * Asynchronous requests: Most languages have great async libraries nowadays. Plus gRPC and/or Istio can help here. * Streaming: gRPC/HTTP2 is now quite good at high-performance streaming * Fanout: Probably the weakest pillar of the service mesh approach. There are some solutions, but definitely nothing as smooth or elegant as message brokers, I'll admit. Anyway, I don't want to oversell the point I'm trying to make. There's still many, many cases where a message bus is the right architecture. But I do think that people tend to have a cognitive bias to reach for a message broker before considering other solutions. It's very tempting to abstract everything into a single core system where all the state and logic can be centrally administered. But most times you tend to pay for the simplicity in the design phase with a lot more headaches in the maintenance phase. zeromq and friends for low latency systems communication Posted by EspressoLover on 2020-06-02 04:04 Low latency means very different things depending on who you ask. To a web guy, low latency might mean 100 milliseconds. To someone doing custom FPGAs, it might mean 100 nanoseconds. If you're the former, RabbitMQ is fine; if you're the latter, it definitely is not. The real question is: what's the underlying reason you need low latency? Are you trying to achieve latency supremacy in an open market? E.g. capturing first position in the queue or latency-arbing between venues? Or are you simply trying to hit some pre-defined threshold? Like rebalancing at 1-second intervals or taking advantage of a last-look window as a liquidity provider (you mentioned this was FX)? My opinion is if it's the latter, then you pretty much know your SLA requirements already. Or at least it shouldn't be too hard to figure them out. Once you have those, most of the major stacks have publicly available benchmarks (including the ones you're looking at). So just pick the best fit that clocks in under your SLA. But if you're living in the Hobbesian jungle that comes with playing the latency supremacy game, then you probably have to go back to the whiteboard. I can't say for sure how competitive your market is, but unless you have some sort of monopoly, I'd assume you have competitors measuring tick-to-trade latency in the tens of microseconds. (Be careful just looking at the spacing of events in the data feed. That's not an accurate reflection of latency requirements, since it incorporates both client- and exchange-side latency. Plus if I recall, a lot of FX ECNs coalesce events in the datafeed.) If you're aiming to be the fastest gun, I think the architecture's a non-starter. The problem is that you're introducing unnecessary hops between machines in the tick-to-trade path. It doesn't matter what the transport layer looks like. The guy you're racing against is running the whole stack not only on the same machine, but on the same thread in the same binary. Secondarily, if it's multi-venue, you likely need microwave uplinks to truly compete. Or at the very least colocation at each separate datacenter. Finally, if the ECNs expose native protocols, you won't be able to compete while running FIX.
Anyway, it really all depends on what your underlying objectives are. Low-latency is certainly a rabbit hole that you can keep digging further and further to eke out increasingly marginal gains. So, it's important to define a clear picture of what constitutes acceptable performance before you begin.

zeromq and friends for low latency systems communication Posted by EspressoLover on 2020-06-03 16:14
One approach you may want to consider is off-loading some of the latency-critical, low-level logic to the edge nodes. Especially since you already have the facility to run a full trading stack in-core. Think about it this way. Right now you have your edge nodes which handle market data parsing and order routing. Behind those sit your strategy engines, which contain heavy-weight logic and make all trading decisions. But I'm guessing there are certain times when the effective trading logic is simple and latency critical. That could be offloaded to lightweight fast-responders that live directly in-core inside the edge node. The analogy here is autonomic reflexes in the human body. There are certain actions that are so important and time-critical that the nervous system has built-in machinery that bypasses the latency of conscious processing. Similarly there may be times your strategies are following simple but latency critical directives. Say you want to buy 1000 lots by joining the best bid. If the price moves, you want to quickly move your order to stay with the touch. That's an easy-to-implement directive, but managing it through the slow strategy-engine path means that you'll wind up with worse queue positions and fills. So instead of directly managing those quotes, the strategy engine asynchronously tells the edge system that its directive is to quote 1000 lots at the best bid until filled or the directive is countermanded. In a lot of ways this gives you the best of both worlds. You can identify the small subset of the most latency critical logic and move it into the edge nodes. Yet you still have complex, heavy-weight, frequently changing strategy code running on segregated machines. You don't have to worry about the performance or reliability of frequently-changing strategy code bringing down the entire edge node. Managing strategies as services across the network is a lot cleaner from a devops perspective.

zeromq and friends for low latency systems communication Posted by EspressoLover on 2020-06-04 19:17
One thing I think is worth doing is reading through the technical documentation for ITCH or a similar system. Not because you want to replicate it, but rather because the system architects thought a lot about the problem domain. Beyond even the protocol considerations there's some fundamental questions that you should think about if you haven't already:
* What do you actually need represented client-side? Do you need the full order book? Or is a price book, possibly with the top N levels, sufficient?
* What kind of time granularity is needed on the client side? Do they need the full unaltered history with every update or are they fine with throttled updates during bursts or even periodic snapshots?
* Do you intend to distribute snapshots or deltas in the messages? Deltas are smaller and faster, but you need a reliable recovery system, because one dropped packet corrupts the book forever.
* Along those lines, how do you intend to deal with recovering from missed messages? Consider everything from a single dropped packet all the way to a 30 minute network partition.
* How do you intend to deal with late-joiners? Do you re-broadcast the full history or consolidate a snapshot? Even if you don't intend to support late-joining, sometimes client systems will need to be bounced in the middle of the day.
* What's the average messaging rate? Peak messaging rate?
ITCH-style multicast works very fast and is pretty reliable when set up correctly. With high-end hardware and correctly tuned settings, packet loss should be under one in a hundred million. That being said the biggest risk of message loss isn't the network stack, it's the client systems themselves. If someone pushes some slow strategy code that can't process peak market data rates, then the best network stack in the world won't make a difference. To second @doomanx, ZeroMQ is rock-solid and a lot easier to work with than multicast. It won't be the part of the stack to lose messages. That being said with a bunch of heterogeneous client systems, sooner or later one will fall over during peak activity and need to recover. Conceptually this shouldn't look any different than a late-joining client that starts fresh. ITCH solves this by broadcasting deltas, then having an out-of-band request-reply snapshot service. A late-joiner or recovering client starts buffering the delta stream. Then it requests the latest snapshot over a TCP session. The snapshot is stamped with the sequence in the delta stream. All the buffered deltas *after* that stamp are applied, and the client is then current. This is definitely an approach to consider. You combine a low-latency fast broadcast (say ZeroMQ), with a slow but reliable out-of-band system to let clients catch up (say RabbitMQ or Kafka or even a REST API). Another option is to bake recovery/durability directly into the ZeroMQ system. That certainly simplifies things, since you only have to manage one system. You can use the Majordomo pattern as a broker with different levels of durability, but that does add latency overhead. One thing to be aware of is that if recovery involves replaying the entire day, then you need to persist *a lot* of data in the broker. ZeroMQ is not optimized for that much durability. ITCH gets around this by only snapshotting the orders that are currently open in the book. However, you need to build this yourself. No messaging protocol will do this out of the box, since it requires awareness of how to construct an order book. Finally an option is to just flatten the representation to the point that you don't need historical recovery. Obviously this is impossible with a stream of order-book deltas. But maybe the clients don't need anything beyond the top 5 levels of the price book. Then you can easily fit the full snapshot in a single message packet. If a buffer overflows or a client disconnects, then it instantly has an uncorrupted view of the book on the next message in the stream. Worst case, a client loses some history that it's dependent on for trading logic. But if you append a sequence stamp in each message, then the client's aware of this and can act accordingly. Losing a message in a normally functioning client should be a rare enough event that the occasionally censored history doesn't make a meaningful difference to strategy performance.

administering a busy loop Posted by EspressoLover on 2020-07-29 16:28
IMO, the best approach is the simplest design that's inside your SLAs. So step 1 is to formally define your requirement. Step 2 is figuring out what the easy approach is for your runtime environment.
For example, the Java approach will probably cut against the grain in Golang. One thing that was a little ambiguous: the time-dependent parameters. Do you need to use their time-synchronized value, or can you always use the latest real-time update? So for example, say params-X are injected at T+5 and params-Y at T+25. Then you're evaluating the buffer snapshot-Q at T+24. Because of race conditions, params-Y have already been ingested. Are you required to defer back to params-X because they were active at T+24? Or is it fine to use params-Y, even though they're not formally active at the time stamp? This seems pretty pedantic, but it actually has pretty deep implications for the architecture. I mostly agree with @nikol and @steamed_hams. Two threads with shared memory is the way to go. Assumptions: Linux C++ with standard library, latest real-time injection is fine, the SLAs are 99th percentile O(10uS) and O(100ms), you're using kernel bypass (i.e. no syscall overhead), param injection times are uncorrelated with buffer update arrival times, and the param struct is O(1KB) or can be updated in atomic chunks of O(1KB). This is all pretty typical for low-latency auto traders, where trading parameters get periodically updated by a human operator. A ring buffer is an elegant lock-free solution. But to be honest, I think you'll be just fine with mutex locking. Modern Linux uses futexes underneath pthread_mutex. Acquiring an uncontested lock takes 50 nanoseconds or less. Not enough to meaningfully change performance. Lock contention on the hot thread should be extremely rare. The injection thread can do all its parsing work outside the lock, build a prototype param struct, then only needs to hold the lock to copy the pre-constructed object. You're maybe talking about a ~500 nanosecond contention window per injection. Even if injections occur every 10ms, we'd only expect high-latency lock contention once per 20,000 updates. I.e. not enough to impact the 99th or even 99.99th percentile SLA.

Jd database? Posted by EspressoLover on 2020-08-05 15:11
I'll second @maggette in being a Spark proponent. Especially with the workflow he recommended of using S3 as a backing store. A really nice feature of that is that you're totally cloud elastic. You can arbitrarily scale up or down on dirt-cheap preemptible VMs. Run it on Kubernetes with auto-scaling groups, fire off a single command on your local machine, and all the compute allocation is provisioned elastically and transparently. Zero machines to sysadmin. All cattle, no pets. A lot of people's impression of poor Spark performance comes from the older versions. Modern Spark with Kryo, Tungsten and Catalyst is actually pretty competitive with traditional SQL databases. (And then consider that you can buy 5 preemptible worker cores for the cost of 1 database core.) If you're getting bad performance make sure to check your query plans and also to add cache() and broadcast() at chokepoints. Also evaluate your RDD partition scheme and move shuffle operations to as late in the query plan as possible. Another thing to keep in mind is that you can always call out to pipe() and run sub-components of the pipeline in other software. For example, I use pipe() to construct order books inside an optimized C++ binary, as that task is much faster there than in relational algebra. In the limit case, you can treat Spark as a very convenient cloud-native cluster scheduler. Embarrassingly, I have to admit I've never used the legendary kdb.
Both philosophically and practically, I strongly prefer open-source. But also, every time I see the price tag I just think of how much hardware one could buy instead. If your dataset is under 10 TB, you can easily fit that into the memory of a single cloud instance that costs way less than a kdb license. I can't imagine that vectorized R isn't nearly equivalent in performance, and it's a much more expressive language to boot. But that's just my uninformed opinion. Having never used kdb, what am I missing here?

What does your process monitoring look like? Posted by EspressoLover on 2020-09-09 14:12
Went through a similar process as you, the details of my journey are here. Still pretty much using the same setup that I left off at in the post. If I could distill one piece of advice: pay the $10 a month for PagerDuty. It's way more reliable than rolling your own alerts service. https://www.nuclearphynance.com/Show%20Post.aspx?PostIDKey=188059

gpu for trading model research Posted by EspressoLover on 2020-09-18 14:42
Have you exhausted the possibility of squeezing more efficiency out of the existing process? I'd double check to make sure that you're using a parsing library optimized for large datasets. You mentioned using R; fread() is several orders of magnitude faster than read.csv(). On a shitty MacBook, I can parse and load CSVs close to a rate of 1 GB per second. You can provision a 224 core machine for $2.50 an hour. Even avoiding any sort of multi-node map/reduce clustering, you should be able to parse a 1 petabyte dataset in about 90 minutes. While benchmarks aren't perfect and everyone's workload's different, this is the most fair and comprehensive comparison between the performance of different big data systems. It includes a lot of GPU-based benchmarks. Take a look. It's useful because it should indicate how far away your current process is from near max efficiency. It also should indicate how much gain you can expect from moving to an alternate paradigm.

Hybrid FPGA Posted by EspressoLover on 2020-09-23 18:56
I've been looking to tighten up the latency for an HFT strat. Of course, FPGAs are always a potential answer when it comes to this question. Rather than trying to scrape away at the software stack at O(10 uSec), it's tempting to just nuke it from orbit down to the O(1 uSec) that comes with FPGAs. I don't have the manpower to re-write the entire quoter in Verilog. But the vast majority of latency-critical events seem to be easy to infer at eval time. Things like new level formation, a large trade, or a tick in the index futures. Most of the complex logic can be pre-computed in the software stack. Then the software just asynchronously hands off flat event triggers to the FPGA. This would seem to vastly reduce the complexity of FPGA development. 99% of the logic stays in the pre-existing quoter software. In most cases the FPGA just acts as a NIC card, passing along north/south bound packets to/from the CPU. The FPGA only needs a simple hot-path that tests an incoming packet against a set of triggers. And if tripped, injects a pre-defined message into the gateway session. The FPGA layer doesn't have to build the book or even parse anything but a few critical fields. All of this seems deceptively easy to implement. Of course, there's nothing new under the sun. The idea seems common enough that more than a handful of vendors already sell products that purport to provide something like this out of the box.
It's tough to tell much about these products just from Googling, because they only seem to put up light-on-details sales sheets. Anybody with reviews, positive or negative, of any of the products from this space? Pricing also seems to be completely opaque. Overall my bias is leaning towards build instead of buy. Anyone with general opinions on this topic who can share? (I realize this is skirting competitive proprietary information.) The paradigm sounds pretty simple in theory. But in theory, theory and practice are the same. In practice they're quite different. There are always hidden pitfalls when you go from whiteboard into the weeds. In particular, I'd care to hear about any "unknown unknown" that I'm overlooking. I'd also be curious if anyone's had experience with both this approach and putting the full quoter stack entirely into the FPGA. I.e. removing the CPU/software layer entirely. Did you think the marginal gains, either in performance or maintainability, from the full stack FPGA were worth the much steeper development curve?

Hybrid FPGA Posted by EspressoLover on 2020-10-08 17:55
Thanks so much guys! A lot of great points that I wasn't aware of before. I'm still in the early stages here, but you guys pointed out a lot of things that should keep me from wasting time on a wrong avenue. I'll update the board if/when I have more progress to report.

Hybrid FPGA Posted by EspressoLover on 2020-10-21 20:46
I erroneously assumed that having an FPGA inject packets into a pre-existing TCP/IP session would be a solved problem with off-the-shelf products. As it turns out, this is the "and then a miracle occurs" step in the plan. The cheapest quote I've found for a TCP/IP block is $100k. And there don't appear to be any decent open-source solutions. It seems like most people in this space run TCP on a softcore or an on-board hardcore. But the consultants that I've talked to are telling me that's 5uS of latency minimum, which nearly defeats the advantage over the most optimized software stack. (As an aside, I'm starting to wonder if putting flat triggers into C code on a SoC SmartNIC is actually the low-hanging fruit here...) That brings me to my question. Anybody who has experience here with TCP/IP on an FPGA and can share? Even outdated perspectives or vague recollections would help me get the lay of the land.

Apple Silicon Posted by EspressoLover on 2020-11-11 15:50
Anybody here thinking about the possibility of using Apple's new processors for HFT, algo trading, or research workloads? At the end of the day, these are still consumer products, so I'm not betting money on it. And the first-gen M1 will almost certainly not be competitive with Xeons. But there are some potentially killer features that might tip the scales:
* Neural engine. Could potentially be huge for liquidity-taking strats that lean on more complex ML models. I've never been able to get inference on even modest-sized nets under 30 micros on a Xeon. On Apple processors, inference should be near instant.
* Low latency memory. Sure if you're super-optimized, you should barely get any cache misses, but this saves a lot of effort spent staring at Valgrind.
* GPU on a low-latency SoC interconnect. Makes it more feasible to call the GPU in the hot-path.
* ARM chipset. It's probably not competitive with x86 yet, but theoretically RISC-y ARM desktop processors should have higher potential by ditching all the legacy architecture in Intel chips.
Of course, Apple's not the first ARM or SoC in the datacenter.
Nor do I think they'll even explicitly target enterprise. It might also be a dealbreaker if they never put ethernet on the SoC. But given the sheer scale of their resources, and looking at how they pushed performance on the A14, I'd expect them to deliver some really high-powered chips in a couple of generations. Thoughts? Comments? Insults?

Test Harness Design for HFT Posted by EspressoLover on 2021-03-29 19:33
I think you may be confounding two separate problems. At least I've always thought of this in terms of two orthogonal applications. One is backtesting a strategy. Two is testing/profiling/benchmarking code that will run in live trading. For profiling, you want to be able to run as close to a full production environment as feasible. (But it also helps to have a reduced stack for lower fidelity but more convenient profiling.) This means matching the real-time cadence of live market data. (Sometimes it's convenient to squash long time gaps between packets, when you know for sure you'll just be idling.) This is where you run integration tests on the application, clock latencies, measure your SLAs, run Valgrind, figure out runtime bottlenecks, etc. For backtesting, you don't need the full stack. You definitely don't want to touch the network; keep everything in a single process. Code your backtest environment so it exposes the same order entry and market data API as prod. The strategy layer shouldn't even be aware of whether it's inside live or sim. Pump pre-normalized market data to a synchronous consumer. Latency buffering is handled on the simulator side based on the logical timestamps of the market data, not the physical wall clock. Simulated latencies are derived from the SLAs you measure in the profiling step. (Although always be sure to backtest under a variety of latency conditions.) A running backtest should use 100% of CPU. The goal in a backtest simulation is to 1) have as much fidelity to live strategy performance as possible, and 2) process at a very high throughput rate. Reconciling sim and live is hard enough, but extra hard when they're actually separate codebases. So, make sure you can use the same strategy core on both sides by keeping APIs transparent. This also helps in terms of throughput, since all the optimization effort put into live code automatically makes simulation faster. Quantity has a quality all of its own. When you put in the effort to run fast backtests, it has a drastic impact on research productivity. It's really nice to be able to backtest an entire day in a few minutes.

Flat Files vs DB for Book Data Posted by EspressoLover on 2021-04-30 17:32
How big is the data? On my shitty laptop I can ingest from a gzip flat file at around one million order book deltas per second per core. It uses maybe 70 MB of memory. That's all customized C++ code though. For order book deltas, it probably doesn't make sense to pay the overhead of a database. There's not really much you can query on. You pretty much always need to grab deltas atomically on any single symbol-date. Therefore the only thing you can condition on is date or symbol. SQL is kinda overkill. The big decision with flat files is whether you do one file per date or one file per symbol-date. That depends on how wide your universe is. For the former, you may waste time parsing the whole date just to get one symbol's data. For the latter, sequencing multiple symbols into time order has a lot of overhead. Personally, I think S3 storage is cheap enough that I'd just duplicate the data and store both.
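Just to make the flat-file option concrete, here's roughly the shape of the reader loop I have in mind. This is only a sketch, not production code: the row layout (timestamp, side, price, size) and the aggregate-level book are made up for illustration, it leans on zlib's gzFile interface, and a real version would batch I/O and skip sscanf in the hot loop.

#include <zlib.h>
#include <cstdio>
#include <map>

// Aggregate price book: price -> displayed size at that level.
struct PriceBook {
    std::map<double, long> bids, asks;   // iterate bids in reverse for the best bid
};

// Apply one delta: qty is the new aggregate size at the level, zero removes it.
static void apply_delta(PriceBook& book, char side, double px, long qty) {
    auto& levels = (side == 'B') ? book.bids : book.asks;
    if (qty == 0) levels.erase(px);
    else          levels[px] = qty;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s deltas.csv.gz\n", argv[0]); return 1; }
    gzFile f = gzopen(argv[1], "rb");    // one gzipped flat file per symbol-date
    if (!f) { std::fprintf(stderr, "can't open %s\n", argv[1]); return 1; }

    PriceBook book;
    char line[256];
    long n = 0;
    // Hypothetical row format: epoch_ns,side,price,size
    while (gzgets(f, line, sizeof(line))) {
        long long ts; char side; double px; long qty;
        if (std::sscanf(line, "%lld,%c,%lf,%ld", &ts, &side, &px, &qty) != 4) continue;
        apply_delta(book, side, px, qty);
        ++n;
    }
    gzclose(f);
    std::printf("applied %ld deltas, %zu bid / %zu ask levels resting\n",
                n, book.bids.size(), book.asks.size());
    return 0;
}

Compile with -lz and point it at one symbol-date file. The appeal is that there's nothing between you and the data: no schema, no query planner, no connection pool, just a loop you can make as fast as you care to.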
Golf balls in a school bus... Posted by EspressoLover on 2015-11-06 01:02
I'd posit that there's massive overfitting in most people's assessment of "Trait {X} makes for good job {Y}". How many hires does somebody make? Let's say 100 at the upper end. How many potential features are in the selection space? Once you start getting into things like "likes meat", "does Judo" or "built a car", you've probably expanded to sorting through 1000+ potential traits. Even if you're regularizing by trying to sparsify down to a handful of common traits, the problem is heavily over-parameterized. This was basically what Google found when it used an actually statistically sound method to evaluate hires. Basically hiring managers' assessments of candidates had meaningless correlation with subsequent performance. Humans are wired to look for patterns in the tea leaves, so we treat random attributes like "stays up late" as deep insights. Doubly so when it comes to interpersonal biases, where we want to favor people like ourselves ("I like chess. He's a great trader. He also likes chess. It's probably because chess makes great traders"). The best approach is to reduce the search space to a small number of objective traits that you already have high Bayesian priors on being important. Intelligence, background, affability etc. Definitely fewer than 10, unless you've personally hired hundreds of candidates. The reality is you probably can't pick out great traders based on their hobbies, eating habits or sleep schedules.

Master's degree but no full-time work experience. Where to start? Posted by EspressoLover on 2015-11-11 00:34
Depends if you're at a target school. Masters or even bachelors in stats with a lot of CS classes from Wharton, Harvard and Stanford are frequently recruited for top-tier hedge fund and bulge-bracket quant positions.

Master's degree but no full-time work experience. Where to start? Posted by EspressoLover on 2015-11-11 01:04
You might have better luck in the Chicago prop scene. They tend to be less Ivy-discriminatory. You just have to be a lot more selective and do some background research before accepting an offer. There are a lot of joke shops out there that you don't want to have on your resume. (There are also a lot of fantastic ones.) The biggest comparative advantage you have from a state school is the much larger size of your alumni network. Your best bet is to reach out to alumni contacts at the places you might be interested in. For example Penn State always seems to have a surprisingly large Wall Street presence, if not just because Penn Staters love other Penn Staters, and there are so many of them everywhere. Also I'll add that if you've had 5 internships, you should reach out to people from those. Even if you don't want to work for any of those companies, you're likely to have contacts that have moved on to other places, or at least know people. Getting a single recommendation, even from a low-level employee, gives you a massive advantage relative to an unsolicited resume drop.

Master's degree but no full-time work experience. Where to start? Posted by EspressoLover on 2015-11-11 09:02
> By the way, UPenn and Harvard do not have strong CS programs
I don't disagree, but that's more relevant for a PhD than a Bachelors. General school prestige matters much more than specific department strength at the undergraduate recruiting level. A CS major from Harvard is at least an order of magnitude more likely to get a front-office quant-ish job at Goldman or Citadel than a CS major from UIUC.
> [Y]ou don't usually take non-Wharton classes at Penn because of constraints about elective coursework and credit.
On this point, I think you're misinformed. Many Penn undergrads dual-major across the Wharton and Engineering schools (including Stats/CS) and graduate in four years.

Living in Bermuda Posted by EspressoLover on 2015-12-21 19:27
"As investigated by attention restoration theory, natural environments, such as forests, mountain landscapes or beaches, appear to be particularly effective for restoring attention, perhaps because they contain a vast amount of diverse, relatively weak stimuli, thus inciting the mind to wander freely while relaxing its strict focus.[10]" https://en.wikipedia.org/wiki/Directed_attention_fatigue

Living in Bermuda Posted by EspressoLover on 2015-12-23 13:57
@aickley It's not the cold so much as the lack of bright sunlight. If you move from somewhere with 2000 hours of sunlight a year to somewhere with 3000, the difference in background happiness is very noticeable. (It's particularly true for dark-eyed people, who have higher sunlight thresholds than light-eyed people.) In fact the impact is so large that bright-light therapy is now starting to be used even for non-seasonal depression. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695376/ http://www.ncbi.nlm.nih.gov/pubmed/11816051 http://time.com/4118775/light-therapy-non-seasonal-depression/

URGENT Advice Posted by EspressoLover on 2016-05-20 21:53
Sorry, what does MV job mean? When you say regional internet company, that means non-US? Here's my two cents, take it with a grain of salt. People do move from ML tech jobs to quant research, particularly on the prop side. (A lot of ad-tech work is basically quant trading by another name.) But they typically tend to be from brand-name top-tier tech companies. The main challenge is getting your resume past HR. You're going to get more mileage out of Google or Uber. Assuming you know your stuff, the resume doesn't really matter as much as just crushing the whiteboard. The resume's mostly just a filter to get seated for an interview. So the real question would be: are there quant firms that would recognize the new company? If not, you may have to first trade up to a higher-tier tech firm before moving to quant. Once you're in one serious quant firm, jumping to any other is much much easier. On the flip side, the bank is probably a recognized name, but the work is less interesting and educational. Your whiteboard skills will probably lag. Also, rightly or wrongly, most buy-side quant firms tend to look down on banks. Usually my recommendation for these types of questions is to trawl LinkedIn for people who previously worked in similar roles at similar firms. Where are they today? That will give you a pretty decent idea of the distribution of career trajectories.

Advice Posted by EspressoLover on 2016-06-03 12:49
Finish the Comp Sci MS. That's an extremely portable degree, much more so than an MFE. It'll get your foot in the door at a lot of different places. Including fields outside finance, if you want options.

Beating the expected returns of a Harvard Law degree Posted by EspressoLover on 2016-10-15 23:29
Eh, I feel like you're not comparing apples to apples here. First, not all HL graduates go to BigLaw, so already that's a self-selected group. Second, the upward trajectory is biased, because there's pretty large attrition between year 1 and year 8. Only a small minority of hired associates make partner.
They don't explicitly fire you, but you are "encouraged" to look for other employment starting around Year 5. So most of the people still in BigLaw by Year 8 are "partner material" from the top of their associate class. That being said, if shortfall is your biggest concern then I'd do a Big5 tech firm. That by far is your lowest variance option. Not that Uber or AirBnB is really that different; they're definitely much closer to Google than what any person who didn't live in the 21st century Bay Area would call a "startup". Even a large, established hedge fund like Citadel is going to have much more variable compensation. You'll probably fall below your threshold in at least one year (even if you double it in some others).

Beating the expected returns of a Harvard Law degree Posted by EspressoLover on 2016-10-17 04:53
> I'm actually trying to anticipate how to keep stable some power dynamics / cultural expectations within my current romantic relationship.
The findings that show a correlation between higher female earnings and divorce are primarily driven by chronic unemployment or a college/non-college education gap. I really doubt that whether your combined household income is $600k or $700k is going to be the cause of a divorce. Anecdotally, I can say a good number of my wife's close friends are single women in the latter part of NY BigLaw associate programs. First off, many don't really want to stay on that track. Most have expressed a desire to switch to a more family-friendly, low-impact job, if/when they get into a serious relationship. It might be worth it to communicate, because it's possible that you're reasoning from faulty assumptions. It's worth it to actually discuss what your joint expectations are, before making decisions that are going to affect the rest of your lives. Second, I've never heard any of them say they refuse to date a man who makes less money than them. Yes they certainly want an educated partner, with a stable career and a modicum of ambition. But no, they're not going to refuse to date a guy because he's "only" a software engineer for Google instead of a PM at Two Sigma. Considering that in New York, college-educated 20-something women outnumber college-educated 20-something men 2:1, that'd be a pretty unrealistic standard. Anyway, I'd stop thinking about your relationship in terms of "power dynamics". If you're in a mature, adult relationship, then you shouldn't have to think that way. It's self-destructive, and constantly framing things that way is a recipe for unhappiness. Your spouse should be a source of support, not a hurdle to overcome. If you think she's going to leave you because you end up making $300k instead of $350k, what do you think is going to happen if you end up developing ALS or Alzheimer's?

General advice Posted by EspressoLover on 2016-12-05 16:40
Re-calculate the historical performance as if you had hedged out market beta. (Possibly sector and FF exposure as well.) S&P's been about 1.0 Sharpe since 2013, and you seem to be very long-beta biased. The first question any investor's going to ask is whether this is just glorified beta exposure. I don't agree with the Sharpe hate. Sharpe's the worst strategy metric... except for all the rest. But the second question any investor's going to ask is what the drawdowns look like. Even if Sharpe's not great, if you can promise small drawdowns you'll get a lot more interest. That's because an investor can just set you up with a tight stop-loss, and mostly not worry too much about strategy internals.
Third question an investor's going to ask is how this would have performed in bear markets. 2013-2016 isn't exactly a diverse collection of market environments. Is it systematic? If so, can you backtest it going further back? If you can get 2008 in your performance, that'd be pretty informative. If you can get September 11, LTCM and the 90's tech bubble, even more so. +1 @Hitman. Speed it up. Your turnover's way too low. You're never going to get much better performance with your current holding periods. And that should be your primary goal. Having a 1 Sharpe strategy is like being the best player on your high school football team. Having a 2 Sharpe strategy is like getting signed to the NFL.

CV/interview advice: whether to mention the track record in PA and how? Posted by EspressoLover on 2017-02-17 04:43
Why not lever it up more? At 20% return/20% vol, you're way under-Kelly sized. Since you've been running it successfully for 4 years, it's not like you're worried about implementation falling short of backtests. It's also your PA, so it's not like you have to please dipshit institutional money which fetishizes 10% vol targets. Consider, if you scale up to 40/40 and keep similar performance, it's very unlikely you'll underperform 20/20 after ten years of (monthly/daily) compounding. And most likely you'll have multiples more ending capital.

CV/interview advice: whether to mention the track record in PA and how? Posted by EspressoLover on 2017-02-17 19:35
@NeroTulip I agree with most of your points. We're pretty much on the same page. I think I gave the wrong impression by saying "way under-Kelly sized". It wasn't to advocate full Kelly sizing, just saying that I thought a 0.2 (estimated) Kelly fraction seemed too low for OP. For all the reasons you list, picking the right Kelly fraction is more art than science. I suggested 0.4 as a fraction, which I think is justified given OP's situation. He has actual realized live data for four years, which gives a relatively tight standard error of 0.5 on the Sharpe estimate. It's also long enough to reasonably expect this alpha not to disappear overnight. I would hope and presume he also has much longer backtests, and performance is in line with live trading. Finally, carry/trend/vol aren't some weird black-box strategies. They're very well known and studied, and documented to have worked for decades in nearly every single market.
> Blowup risk. People tend to think that Kelly cannot go broke. This is true with discrete distributions, but not with continuous ones. As Kelly maximizes the *expected* growth rate, not all paths avoid the absorbing barrier.
Continuous Kelly assumes continuous re-balancing, so the further away you are from this assumption the less appropriate it becomes. If you're trading illiquid stuff (or illiquid relative to your portfolio size) then it's definitely possible to become stuck in an over-leveraged position after a sharp decline. However I really don't think this applies to OP at all. It sounds like he's trading major liquid instruments on a retail account. Daily re-balancing pretty much makes it impossible to go broke even at full Kelly size. He'd have to hit a one-day 100% decline to become insolvent. At 40% annualized vol, that's 2.5% daily vol. The insolvency scenario would require a 40-sigma downside move. Even with just monthly re-balancing, you'd still have to pull an 8.5 sigma drawdown. Maybe that's feasible in some esoteric quant-factor, but I think it's extremely unlikely in trend/carry/vol.
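For anyone who wants to sanity-check those sigma numbers, the back-of-the-envelope (ignoring drift and fat tails, and just scaling vol by the square root of time) is roughly:

\sigma_{\text{daily}} \approx \frac{40\%}{\sqrt{252}} \approx 2.5\%, \qquad \frac{100\%}{2.5\%} = 40 \text{ sigmas}

\sigma_{\text{monthly}} \approx \frac{40\%}{\sqrt{12}} \approx 11.5\%, \qquad \frac{100\%}{11.5\%} \approx 8.7 \text{ sigmas}

Call it 8.5-ish depending on how you round; either way, a total wipeout between rebalances is not a realistic failure mode for liquid trend/carry/vol at that kind of leverage.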
Working (and living) in Norway Posted by EspressoLover on 2018-03-01 14:56
Sure, but how many Winter Olympic medals per capita does your country have?

quant career advice Posted by EspressoLover on 2018-03-10 16:24
Let's pretend that you invented a perpetual motion machine. How hard do you think it would be to get people to pay attention? Think of how many thousands of cranks are trying to sell perpetual motion devices every single year. If you're in establishment physics or engineering, it's just an endless deluge of failed promises. If you unmute the noise, even a little, you'll waste countless amounts of time, effort and money. When it comes to novel systematic trading strategies that deliver actual alpha in real trading, the ratio of real opportunities to quacks maybe isn't quite as small as it is for perpetual motion machines. But it's close. Let's say that you actually have a gem. How do potential investors know that you didn't just overfit the hell out of it? How can they be sure that any out-of-sample walk-forward results aren't just made up? How do they know that your backtest assumptions accurately reflect reality? So, if you did have a legitimate perpetual motion machine, how would you go about getting it out into the world? The first step is to distinguish your device from the endless masses of charlatans. You need to signal, in the econ sense of the word. Meaning show something that would be hard for a charlatan to fake. Option one: Some real-world demonstration of the perpetual motion machine. Designs on paper are easy. Building and running something is not. If you can start by powering a car or your house with the device, that doesn't prove it with 100% certainty. Charlatans could employ some Penn & Teller trick. But at the very least it helps you rise above the noise. Maybe if you can get the attention of some people, then you can acquire the resources to run a bigger demonstration and so on. In quant trading a demonstration is some track record involving real-life money. The more money involved the more legit the demonstration. Question number one: have you put any real money into your strategy? Personal money, friends and family, business associates? Number one, that shows you at least have personal confidence. And number two, live track records are more costly to fake than backtests. Option two: Acquire some sort of social standing in the physics community, such that people will pay attention to you. If a Nobel prize winner claims to have invented a perpetual motion machine, people will pay a lot more attention than they would to Joe Schmoe. If you really believe in it, that maybe means getting a physics PhD, working in a physics lab, publishing papers on more conventional topics, building up a citation rank, getting a professorship. All of those things confer personal legitimacy. When you've acquired more status in the community, then you can start making extraordinary claims without being dismissed out of hand. You might even have to let a more established physicist take the lion's share of the credit just to boost the idea's credibility. What does this look like for quant trading? It would mean becoming involved in industry or academia in some way. A PhD in a quant-trading-adjacent field would help. Getting a job at a major firm would help. They're not going to hire you to build perpetual motion machines, but they'll probably hire you to do backend programming. That still builds up your resume and network. After enough years and with enough reputation, people will pay more attention to your backtests.
It might involve just handing over the strategy to a portfolio manager, without any guarantees for yourself. If you're a great employee, you may even convince your firm to start by putting a little bit of real-life money behind your ideas. If they do as well as you believe, they'll eventually clamor to put more. career after quant? Posted by EspressoLover on 2018-05-21 18:54 > But they'll pay you £50k, not £150k. The pay discrepancy between tech workers in the US and Europe never ceases to astound me. $67k USD (50k GBP) would be comp for entry level engineers working a 9-5 lifestyle-oriented job in a dirt-cheap market like Boise or San Antonio. A quant exile, who'd be slotted into something like "senior data scientist", would probably make at least $200k total comp in a major coastal metro. Like, I just don't understand how it continues to persist. Why doesn't every single new startup, satellite campus and corporate IT office open in the EU? I understand there's more labor regulations, a culture of fewer hours, and non-salary costs. But still, these are huge discrepancies. Why wouldn't Amazon just build HQ2 in London or Berlin? career after quant? Posted by EspressoLover on 2018-05-22 18:23 Those reasons make sense. But the puzzle is why those things disproportionately affect tech workers? Other high-skilled professionals- physicians, corporate lawyers, accountants, investment bankers, management consultants, business executives, (non-software) engineers- don't have the very large discrepancy. They do make more in the US than the EU, but generally not 100-200% more. I'll admit I have no direct experience in the European labor market. But in general, labor regs and culture can't be that much of a drag. Depending on the Northern European country, nominal GDP per capita averages 65-110% American levels. Maybe tech requires a uniquely high level of labor dynamism. But if that's so, why's the industry so heavily concentrated in high-regulation California? Broad Question: Caltech for Aspiring Quant? Posted by EspressoLover on 2018-10-02 18:19 It's helpful to have some idea behind the mechanics of how the graduate job market works. Most people get their first out of school job through on-campus recruiting (OCR). This doesn't apply to every job or firm, but you're substantially more likely to end up working for a company that recruits at your school. The way OCR works is that AcmeCo will pick a list of "target schools". Each school they add to the list demands time and resources. A group of AcmeCo employees will have to take time out of their busy schedule to travel to the school, run recruiting events, interview candidates, etc. Needless to say the larger the firm, the wider net they'll cast in terms of target schools. The way firms decide on their "target schools" is basically some mixture of reputation and prestige, previous success (or lack thereof) of candidates from that school, how much a pain in the ass it is to get there, how easy the school's career services make things, and personal loyalty to fellow alumni. You might think the curriculum or rigor or the programs and degrees matter, but it largely does not. In almost any job 95% of what you learn in school is irrelevant. AcmeCo will pretty much just pick a target background based on the belief that the students are intelligent and ambitious, and therefore will probably be reasonably successful at any cognitively challenging role. I don’t have anything specific to say about CalTech. 
But if you’re dead-set on doing something in field X, your best bet is to go to a school and choose a degree where a lot of institutions recruit for X at the school and a high percent of graduating seniors in that program get a role doing X. Push the admissions and career offices at your prospective colleges for concrete information about this. A word of unsolicited advice, though. I would be wary of focusing on any one field in general at your age. I’m in my thirties and have been doing this for a decade plus, and even I’m not 100% sure that I’ll be in quant finance 5 years from now. It’s good to keep your options open and your ear to the ground. Who knows - maybe your passions will change, maybe you’ll do an internship in finance and hate it, maybe the job market will shift, maybe some new interesting field will pop up out of left field, maybe you'll discover a hidden talent for playing ukulele. A second piece of unsolicited advice. It’s common for very ambitious people your age to naturally gravitate towards the hardest, most difficult path possible for their college years. It’s the no-free-lunch intuition. The harder I work, the bigger the reward. Nowhere in life is this less true than undergrad. Oftentimes one particular school or program will have more prestige and better prospects despite being less rigorous, more lackadaisical and easier than its counterpart. It’s not fair, but it is the way things are. Even if your cost function doesn’t penalize work and stress, at the very least judge your options based on their outcomes, not their inputs.

Schengen for non Schengen citizens Posted by EspressoLover on 2019-01-14 21:32
Aren't most continental universities free or near-free? And many of the classes are pretty much just a final exam, right? So, "enroll" to get a student visa. You won't fail out until the end of the semester, so that's got you covered for a quarter. Probably even two or three, assuming they have academic probation.

Schengen for non Schengen citizens Posted by EspressoLover on 2019-01-14 21:32 [duplicate]
Edit: Might as well add my $0.02 since I have a duplicate post anyway... IMO the biggest comparative advantage to the US is affordable family formation. Things like land for a 200 square meter house with a backyard, petrol for your gas-guzzling minivan, domestic help, groceries, plastic junk from Amazon, are all pretty cheap vis-a-vis Western Europe. Plus the IRS throws in a ton of tax breaks for kids. I would say that regarding mtsm's question, there is indeed a pretty reliable way to have a European lifestyle in the US. Get a security clearance, then become a federal contractor or employee. This should be relatively easy if you're a citizen with any sort of technical background. Pay's decent, benefits are good, stress is low, overtime is rare, and job security's high. The DC area is pretty nice too, plenty of good food and architecture. Downside is that most of the work isn't very interesting, and the upside's definitely limited.

Schengen for non Schengen citizens Posted by EspressoLover on 2019-01-15 16:09
> If you don't mind Sauerkraut for breakfast have a look here: "Germany is not only a beautiful country, it is also one of the most powerful countries in Europe."
This is the most German sales pitch I've ever heard.

Which field would you enter if you were to start your career now?
Posted by EspressoLover on 2019-08-02 16:55
FaceApp is a pretty good reminder that it's still quite possible to make it huge in tech without touching the Silicon Valley nexus. All of those downsides are mostly avoidable by being lean enough to do without VC lucre, and the loss of autonomy that comes with it. It's just tough to forego because the funding flows so freely. But most startups with $3 million seed rounds could pretty much do the same thing by rotating through $20k credit card limits. That probably means paying early-stage employees with actually decent stock options to keep salaries low, relocating outside the insanely priced tech center metros, writing C instead of javascript, and not paying through the nose for hyped up cloud-based serverless crap. The way it looks, the FANG/VC/SV ecosystem is today's version of IBM circa 1975. Nobody could imagine a future of computing that wasn't dominated by Big Blue. Until it happened; then it seemed obvious in hindsight. Today's tech giants have built Robber Baron-like monopoly juggernauts. But at its core it's all built on top of the most decentralized platform in the history of the world. That's a house built on sand if I've ever seen one. There's no way that business model survives in the long-run. The walled gardens are a lot more pregnable than they look. Right now, this generation's Apple Computer is in a garage somewhere. That garage almost certainly isn't in Palo Alto, and Andreessen Horowitz definitely won't be in at the ground floor. If you're young, insanely ambitious, and want to light the world on fire, imagine some version of 2030 where Google, Facebook or Amazon have been rendered irrelevant. Now figure out: how do we get from here to there?

Pushing 40 and moving to Quant Posted by EspressoLover on 2019-12-17 21:54
I second @gax. Focus more on total comp rather than base. Most high-end jobs load comp in bonuses, stock, deferred, etc. Unless you're a physician, getting a $200k+ salary is pretty unusual. Even CEOs of S&P 500 companies have a median base of something like $250k.

Small team, high Sharpe Posted by EspressoLover on 2019-12-17 18:06
On the other hand, even if it is selling tail risk, do you really care if you can put it in a thinly capitalized LLC, add a shit-ton of leverage, and constantly pull out profits? At most you can only lose a thin layer of margin capital. Imagine you put $1 in LTCM and every month redeemed capital above the initial principal. Even accounting for the blowup, you would have made pretty decent money. At some point, if someone's good enough at disguising tail risk, then they'll probably also fool the broker/clearinghouse. At which point the tail risk mostly becomes their problem.

Small team, high Sharpe Posted by EspressoLover on 2019-12-19 17:05
Fraud requires either material misrepresentation of a fact or failure to disclose duty-bound facts. At least in the US, a client has no positive obligation of disclosure to a broker. Unless you make an outright statement like "my positions are incapable of losing more than their margin capital", there is no legal basis for civil or criminal fraud. If making excessively risky bets within a corporate entity was illegal, half of Wall Street would be in jail. If you lose more than the account value, the broker may attempt to pierce the veil and recover losses from the owner and/or parent company. But corporate personhood is extremely well-protected by Delaware and Cayman courts.
As long as the entity keeps separate records and bank accounts, it's nearly impossible to pierce the veil outside criminal malfeasance. I'm not aware of a single example in the history of finance where the limited partners were personally liable for the losses of an investment fund. If this was the case, then private equity would be an unviable business model, since the funds, and ultimately the investors, could be made liable for the debts and product liability of their portfolio companies. To the second point, if you're doing a lot of business you can definitely find some broker to give you a lot of leverage. It might not be a "prestige name", but someone's hungry enough for business to ignore the risk. For example you mentioned strategies that layer 100 levels deep in the book. It's easy to find FCMs that advertise intraday leverage 20x higher than overnight CME margin. That comes out to 500:1 leverage on ES notional exposure. And that's just for the guy off the street. If you're pushing 10k+ contracts a day in volume, most off-label FCMs are happy to give you whatever leverage you want. But the broader point I want to make, is that even if the strategy doesn't have tail risk, you should still use this approach to quarantine the exposure of any type of black box strategy. Even without tail risk, algo trading systems can easily burn all their money due to simple operational errors. This is doubly true in the context of a third-party manager. You can layer on external risk controls, but all systems are fallible. starting as a Quant Researcher buy side a smart move? Dying industry? Posted by EspressoLover on 2020-11-02 16:15 > I have also been talking to a few AI-labs on the IB side, and also to a few tech companies. Salary a bit lesser than QR buy-side Have you looked at the FAANG companies and the major unicorns. I'd be pretty surprised if they don't beat the finance offers. An ML PhD with coding chops can start as an E5 at Facebook, which comps around $370k. If you're good, and get in a high-impact area like ad targeting, you should be able to hit E7 in four or five years, which comps around a million. And that's not even taking into account being roguish about it, and flipping jobs every year or two to negotiate better packages. Forget who it was, but one of the senior executives at Microsoft said they're paying top-talent deep learning candidates more than first-round NFL draft picks. Overall FANG-tech is probably less intellectually rewarding than quant trading though. Depends how much you like the pure life of the mind, versus "getting shit done". Tech work is a lot more emphasis on "doing" rather than "thinking". starting as a Quant Researcher buy side a smart move? Dying industry? Posted by EspressoLover on 2020-11-04 17:52 > what's stopping those rockstars from becoming PMs themselves? Honestly, it's extremely stressful to be directly responsible for PnL. Even if your payout is ultimately tied to a volatile process, cashing the bonus checks every quarter is a whole lot less emotionally draining than watching the day-to-day time series. I think it's relatively rare that you find a single person with the personality confluence of a curious, thoughtful scholar with a cold-blooded calculated poker player. Plus like @Sharpe mentioned, most of the major shops do a pretty good job at keeping people siloed just enough that they don't see the full picture. Top Mathematical Finance/Quant Programs In UK That Are Target Schools For Top Quant HFs Or Prop Shops? 
Posted by EspressoLover on 2020-11-17 21:49
Take what I say with a grain of salt. It's not like I've managed my own career that well. But why Mathematical Finance specifically? I'd go for a more portable degree, like CS. I don't really think MathFin is going to help you that much in getting your foot in the door on the buyside. The curriculum's more oriented towards structuring and hedging derivatives, and the main demand for that skill is from banks. Prop shops are almost certainly going to hire more CS grads, and quant funds are probably 50/50 at worst. The biggest consideration is that if you decide you don't like finance, CS gives you way more options outside the industry.

How can I improve my CV? Posted by EspressoLover on 2020-11-24 03:55
95% of unsolicited resumes get filtered by automated keyword scans. Try to get a human, any human, in the loop. Even if just to give you actionable feedback about why you aren't the right candidate. Internal company recruiters are probably your best bet, since their whole job is to look for candidates. If you can get in touch with hiring managers, whether on LinkedIn or through a network, even better. Headhunters are hit-or-miss, but even then getting a human on the phone's going to be a big improvement over just getting repeatedly bounced off a keyword filter. Idk about Europe, but at least in the US in prop/HFT/stat-arb world there are plenty of quants with only a bachelors to their name. IMO, all people ultimately care about is whether you add to the bottom-line or not. (Well, not totally true, being a pleasant and fun person to spend 40 hours a week around also matters.) Nobody cares if you're a Fields medalist or dropped out of kindergarten, as long as you make money.

How can I improve my CV? Posted by EspressoLover on 2020-11-24 17:37
Out of genuine curiosity, how many "pure quants" (i.e. quants who don't touch software) are there actually still around? This could just be my provincialism talking because I've always been in data driven roles/strats/orgs. But I can't imagine what use there is for a math PhD who doesn't code. What do these "pure quants" even do all day? Sit in the corner and solve differential equations from 9-5? Maybe I could see it in the 90s and early 2000s when there was an explosion of new products and strategies. But I really doubt that yet another variant on Levy jump diffusion is moving the needle for any major firm. ("Okay, this time it's a vol swap on a cliquet of caplets with a barrier on Tom Brady's odds of making the playoffs") And even if so, it's 2020, and compute is essentially too cheap to meter. I think everyone sensible understands that it's far more practical to brute force numerical solutions with cheap, elastic clock cycles. It sure beats the care and feeding of high-maintenance philosophes, just for the satisfaction of a crisp elegant analytic derivation. (Just to clarify, I'm not knocking math PhDs. I've known a lot in the industry, and all of them, full-stop, were competent, if not excellent, at shipping code. All the way from a Jupyter notebook here and there, to maintaining large-scale production systems. At least IME, the industry does a pretty good job of keeping the philosophes in the math department.)

Switching from Data Science to Quant Research? Posted by EspressoLover on 2021-03-26 02:12
At least in America, plenty of people without a graduate degree hold a QR title at "top firms". Nobody cares as long as you can make money.
Don't get me wrong, a PhD can still help get your foot in the door because it's a credible signal of intelligence. But it's definitely not the only way. As for QA vs QR, it's largely a distinction without a difference. As others have said, everything is engineering these days. I think in the 2000s this was different, because there was a Cambrian explosion of new products and strategies that kept requiring entirely new models to be built from scratch. Nowadays, there isn't really anything new. We're pretty much refining the same shit we've been doing for the past decade and a half. There's no "pure research" left. Just tinkering and implementation. You're way more likely to spend your time wrangling a Spark cluster than solving differential equations. To the extent that there are any pure research positions left, they're probably a low-impact career dead-end. Adding one more Levy term to a jump diffusion model is not exactly moving the needle for the average desk at Goldman these days. If you're really interested in doing academic research in an industry setting, I'd go for tech not finance. Somewhere like Facebook Research really does run like an academic department. A position there will let you do "pure research", and actually get rewarded for it. Nothing like that exists in finance anymore.

HEALTH Posted by EspressoLover on 2016-06-23 05:44
Maybe try e-cigs, gum or smokeless tobacco? The bad part of smoking is inhaling smoldering organic compounds. Nicotine itself is, at worst, mostly harmless. At best, the evidence keeps mounting that it is strongly protective against Alzheimer's, Parkinson's, dementia and age-related cognitive decline. Either way, swapping cigarettes for any smoke-free alternative is a huge win.

I smell a Dragon getting pissed off. Posted by EspressoLover on 2016-07-26 00:21
Geopolitical pessimists have successfully predicted 57 of the past 2 world wars. Not saying this type of thing can't blow up into something bigger, but historical comparisons tend to suffer from availability bias. For every assassination in Sarajevo, there's a dozen Fashoda incidents. Successfully defused crises rarely feature prominently in the history books, so we tend to imagine contemporary events ending in the worst case scenario.

How much data does a strategy needs to back test against to be considered usable? Posted by EspressoLover on 2016-01-14 22:50
Rule of thumb: you need about (2/Sharpe)^2 years of validation data to be confident at the 95% level. E.g.:
- 0.5 Sharpe (about the same level as historical S&P 500) strategy needs 16 years of validation data.
- 1.0 Sharpe (about the level of Soros' Quantum Fund) needs 4 years.
- 2.0 Sharpe (about equivalent to Medallion) needs 1 year.
- 4.0 Sharpe (good HFT strategy) needs 3 months.
This assumes that the historical data in the validation set draws from the same distribution as present-day data. If there's regime change involved then the rules change, and there's no simple way to account for it. For example in 2008, even brain-dead simple quant strategies were printing money, because all the multistrats were liquidating to meet redemptions. Knowing this history, if I saw 2008 generate more than half the profit on a ten-year backtest, I would discard that period. Also over time most sources of alpha tend to decay in effectiveness. What was a source of 20% out-performance and known to only a handful of firms ten years ago, may only be worth 5% today after being extensively published.
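For what it's worth, the arithmetic behind that rule of thumb: treating returns as roughly i.i.d., the t-statistic of a track record is the annualized Sharpe times the square root of its length in years, so requiring a t-stat of about 2 gives

t \approx \mathrm{SR} \cdot \sqrt{T} \gtrsim 2 \quad \Longrightarrow \quad T \gtrsim \left(\frac{2}{\mathrm{SR}}\right)^{2} \text{ years}

Plugging in 0.5, 1.0, 2.0 and 4.0 recovers the 16 years / 4 years / 1 year / 3 months above.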
mtm correlation swap Posted by EspressoLover on 2016-01-18 20:22 Correlation can't be incrementally updated this way because its Covar(X,Y)/SD(X)*SD(Y). The standard dev terms are not only nonlinear, but they're also used in the denominator so even if they were it wouldn't matter. However you can do a covariance swap that can be daily decomposed. For an N-day covariance swap, the value at T+1 is ([Return on X in T] * [Return on Y in T]) / N + [Implied Covar at T+1] * N-1/N. (Assuming that the mean of X/Y is pre-parameterized at 0). mtm correlation swap Posted by EspressoLover on 2016-01-19 01:24 Not sure what's common, but here's a quick and dirty approach I would take. Start with breaking Correlation down into covar and vol: Cor(x,y) = Covar(x,y) / (Stdev(X)*Stdev(Y)). A simple approximation that's going to drastically simplify things: Stdev(X) * Stdev(Y) ~= Var((X+Y)/sqrt(2)) This works because volatility across almost all assets tends to move together and is close to the same order of magnitude. For example with the daily returns of MSFT and AAPL from 2012 to today, SD(MSFT) * SD(SPY) = 25.12 bps Var(MSFT/2 + SPY/2) = 32.8 bps Furthermore if you calculate these values by month, the pairwise correlation between the first value and the approximation is 95.2%. If the assets have persistently different vols, you may have to calibrate the approximation by a scaling factor. This can be done by just dividing the true value by the approximation on a long-range data set. (E.g. 0.77 in the above example). The correlation approximation becomes a covariance swap divided by a variance swap. Both of which are linearly decomposable: Cor(x,y) ~= Covar(x,y) / Var((x+y)/sqrt(2)) The remaining problem is that the variance swap is sitting in the denominator, so even though its a linear term it isn't applied linearly. However at the scale of daily financial returns subtracting the variance term will be a very close approximation to dividing by it (scaled to the starting ratio). The change in values of the constituent swaps are: dCovar/dT = (X@t * Y@t) / N dVar/dT ~= approxScaleFactor * (X@t + Y@t)^2 / (sqrt(2) * N) dCor/dT = dCovar/dT * ImpliedVar - dVar * ImpliedCovar This gives you a time-wise linear decomposition of swap value. However the linear exposures are not static over time. The correlation swap has convexity to the underlying covar and var swaps. distance from the cross Posted by EspressoLover on 2016-03-03 21:54 Betting on Trump is like being short gamma. He's way ahead in both delegates and polls. If the status quo remains undisturbed he'll sail right into the nomination. The biggest risk is him sticking his foot in his mouth or some line of attack piercing the teflon. Look at the Betfair chart: his odds tend to trend up and gap down. If I was making Betfair markets, I'd be willing to offer much more immediate liquidity to Trump backers than layers. If a big news story breaks which shifts suddenly jumps the price, it's much more likely to be downward. > Mainly though I'm surprised that the implied probability of Trump winning the GoP nomination is so low, from my limited media viewing this side of the Atlantic he looks like a shoe-in ? As with most things election related, Nate Silver has the best pure analysis of the horse race aspect. I'd say 70-80% odds in favor of Trump are about right. Is market making a "winner take all" venture? Posted by EspressoLover on 2016-04-07 23:17 Winner-take-all games usually have low or effectively low-dimensionality. 
Arbitrage is winner-take-all because everyone's basically trying to do the exact same thing. The only relevant difference tends to be speed. On the other extreme, you have things like fundamental-driven stock-picking. Shop A can be fantastic at forecasting quarterly earnings, leveraging sell-side research and channel checking. Shop B can be better at DCF analysis, mosaic theory and predicting acquisitions. They both can survive by exploiting mutually exclusive niches. I'd say there's a style of market making that's winner-take-all, or at least a few winners-take-all. It involves being very fast, being first to join on level formation, and aggressively maintaining queue position by being able to cancel quick when the market moves against you. The need for sophisticated research is pretty minimal. In fact it's kind of precluded. Simple logic executes faster. It's better to trade off smarter decisions in edge cases, for faster decisions in cut and dry cases. This approach tends to work best in markets with large liquidity rebates and thick books (more queue driven). Even within this style, I'd still say there's some dimensionality beyond speed. I don't think it's as much winner-take-all compared to arbitrage. The main reason being that arbitrage (theoretically) never holds inventory, whereas market making does. Modeling the adversity of order flow and determining risk appetite is not a simple problem. Particularly on days with high trading activity. Even if Shop X is the fastest, they may not be quoting two-sided because they've hit risk limits. I suspect (but have no firm evidence) that market-making profits are much less concentrated during the highest volume or most news-driven sessions. Outside this style, there's a long-tail of varying strategies, which leaves room for a diversity of participants. Major internalizers and dark-pool prop desks are usually going to have some non-neglibile lit-market making presence. (If not just because they have the advantage of avoiding interaction with their own toxic flow.) There's also participants specialized in open/close auctions that spill over to liquidity provision in continuous trading. There's an endless array of strategies heavily reliant on some idiosyncratic fee structure, designated status, order type or matching engine quirk found on specific ECNs. There are people patiently layering orders far down the book, putting on a lot more risk per dollar than vanilla market makers are willing to tolerate. As others have mentioned this starts becoming fuzzy about what qualifies as market making. Where does market-making end and opportunistic trading begin? I'm sure PDT on net provides a huge amount of liquidity, but few would call them market makers. Consider a strategy that collects $0.0075/share in spread+rebates, loses $0.008 in flow toxicity, but make $0.001 in high-frequency directional alpha. They may even quote two-sided more often than not. Like when alpha's near zero, to hold queue. Are they market makers or stat-arb traders? Without the signals they'd run into the ground. But monetizing that alpha requires transaction-costs well beyond anything available to civilians. Is market making a "winner take all" venture? Posted by EspressoLover on 2016-04-11 21:58 @radikal >There are many exchange mechanisms that provide strong guarantees that a "winner does not take all" Hmm... Interesting. Any generalized examples you can share? Is market making a "winner take all" venture? Posted by EspressoLover on 2016-04-17 01:52 @radikal Thanks! 
That was a great explanation. PCA for yield curve basics Posted by EspressoLover on 2016-09-16 00:08 Doesn't mean anything. PCA sign is completely arbitrarily, and flipping it has no functional effect. Whatever your software picks is just an esoteric artifact of the linear algebra internals. My only advice if you end up using PCA regularly, is to enforce some sort of consistency convention. Otherwise you risk running into a bug, where you upload new production PCs for the month, but the sign has flipped, and all your models are running backwards. Codifying the first term as always positive for any PC isn't a bad convention. "The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R." PCA for yield curve basics Posted by EspressoLover on 2016-09-22 23:34 As long as the correlation matrix is the same*, then the PC loadings and percent of variance explained will remain the same. So if you're getting significantly** different loadings for different time-frames or price measures, that can only be due to the Epps effect. If the co-movement of the prices were perfectly synchronized, then PCs would be the same at every horizon and measure. To that extent the best measure to use is probably the one that maximizes percent of variance explained. You want to be using PCs derived from the "true" correlation matrix, i.e. how the instruments would co-move absent any trading frictions. Asynchronous trading and discretization almost always decrease co-movement. So if one measure has stronger PCs, that's usually the measure least influenced by friction. In the end though, I wouldn't sweat the choice too much. It's a pretty noisy measure, so minor methodology decisions are unlikely to have significant impact on the end trading model. *This assumes that you're using correlation based PCA (e.g. princomp(..., cor=TRUE)). If you're using co-variance, then it's still true as long as A = yB, where A and B are the respective co-variance matrices, and y is a scalar. Either way, with asset returns, if two measures have the same correlation matrix, then this holds true for their co-variance matrices, except under very weird circumstances. ** Since returns are noisy, the sampled correlation matrix is never going to exactly equal the true correlation matrix from that measure. So even if two measures are processes with equal correlation, their samples are never going to have exactly the same PCs. You can either do a whole bunch of pain-in-the-ass random matrix math to test the null hypothesis of whether two PCAs are different. Or you can use sensible judgement and see if the differences are minor enough to probably not be due to any substantial difference. I'd recommend the latter. Liquid prompt stuff and curve plays? Posted by EspressoLover on 2016-12-17 19:19 Prompt means contracts with expiries before the front month. Usually day ahead, at least for US electricity markets. future of ECN's etc. Posted by EspressoLover on 2017-01-27 19:59 Over the long-term, the closest thing to Moore's law in finance, is that volumes double roughly every 5 years: Attached File: tmp.pdf We've kind of stalled since 2008. So it really depends on your view. Either the process is over (even the real Moore's law can't go on forever), or we'll return to trend. I'll assume the latter. That means in 10 years times, equity markets will be handling about 500% more trading than they are today. 
Given common estimates of trading volume elasticities, average end-user transaction costs will have fall over 60%. For US equities most of the major names already trade at $.01 bid/ask nearly 100% of the time. Exchange fees, commissions and market impact matter, but even if those fall to 0, bid/ask spread costs still probably constitute well more than 40% of aggregate t-costs. There's just no way that penny tick sizes can stick around at projected volumes So either A) volumes break their 50 year trend of continuously rising, B) the SEC implements sub-decimilization, C) nearly all volume goes dark, D) or there's some sort of hack that mimics sub-decimilization on lit exchanges (maybe a market-share coup by exchanges that rebate liquidity takers). Limit Order Placement Question Posted by EspressoLover on 2017-03-13 11:54 Yes, they're trying to gain queue position. Placing an order far away from the top of the book is like a free option. The probability of getting filled before getting close to the top is nearly zero. By the time the level gets close to the top the order should have significant time priority and be close to the front of the queue. If you don't like your queue position by the time you get near the top, then just cancel. The only downside is that outstanding limit orders require working capital commitments by most brokers at most times. Scaling correlation between asset returns for time Posted by EspressoLover on 2017-08-03 19:28 For a continuous time stochastic processes: * Assuming the instantaneous covariance is stationary... * Variance scales with sqrt(horizon). *Unless* the process exhibits auto-correlation. Either mean-reversion, which means variance scales slower. Or momentum: variance scales faster. * Covariance scales with the the square root of horizon length. *Unless* past returns on process A have non-zero correlation with future returns on process B. I.e. there's some lead-lag dependency. Since correlation is the just the ratio of covariance to variance, we can say that it should remain invariant across all horizons as long as the above conditions hold. The latter two conditions imply a violation of market efficiency. That is, if they're broken it implies that we can with some non-zero accuracy predict forward returns on one or both assets from their past returns. That's not to say markets are perfectly efficient, but it's unlikely that there's gross violations of efficiency. Particularly if you're not looking at very short horizons where microstructure effects come into play. Therefore you should expect that correlations at any medium-frequency or longer horizon should be nearly identical. (Minor caveat being as Silverside mentioned, if you're using different times, like comparing European close prices against American close prices. That introduces lead-lag effects without actually violating market efficiency) As for stationarity, there's two likely sources of violation: periodic and trend. For the former, a common case might be overnight vs intraday returns. Two stocks may have repeatedly different correlations for their overnight returns. So if you're comparing minute-wise returns against daily this affect may come into play. Anything with inter-day or longer though, it's pretty unlikely there'd be major periodic affects. Maybe weekend or some January effect, but that's usually pretty weak. As for trend non-stationarity, this is probably most likely due to using derivatives. 
An example might be looking at the correlation between some specific future and the spot index. Correlation tends to rise as you get closer to expiry, so you'd expect longer horizons, sampled at the same starting point, to exhibit higher correlation. Statistical analysis of backrest results Posted by EspressoLover on 2018-09-11 19:03 In terms of point 2, here's a grab bag of some things I find useful from that perspective. Returns by Horizon: Take every fill, then look at some metric of "market-price" at some fixed time interval after the fill. What's the average return over that interval? E.g. if you fill a bid $100, and weighted-mid-price 100 milliseconds later is $99.97 then you're return for that horizon is -3 bps. Average that over all trades, or some interesting sub-sample of trades. Now do that for a sequence of horizons from very short to long and plot the curve. This tells you a lot of interesting things. First it kind of helps you know what the general time-frame of your edge is. If almost all your PnL is realized in under a second, you want to focus on getting out of positions very quick. Or if the curve keeps realizing even well past the point of your strategy's average turnover, you probably want to trade less. It's also a pretty good spot-check for overfitting. You generally want to see a smooth rise on the curve, with the fastest rate of return in the short-horizon, then a general leveling off, with maybe a slow decay on long-horizons as toxicity overwhelms alpha. Generally overfit signals have some sort of weird-ass structure, like all the returns coming from the middle horizons. PnL by Source: I like to break this down into spread-capture (or payment), fees/rebates, impact from the fill trade (e.g. if you're market making and the level got swiped, causing an immediate mark-to-market loss), post-trade alpha/drift/toxicity (see above). These are good metrics to keep in mind because they help you determine where to lean to squeeze more juice out of the strategy. E.g. if you're doing great on everything else, but losing all your money on fill impact, then you probably need to focus more on getting better queue position. Also good to look at how these numbers breakdown based on different subsets of trades, e.g. morning vs. afternoon or narrow-spread vs. wide-spread. Risk-increasing vs. risk-reducing trades: Do trades that increase your total inventory have meaningfully different returns than those that result in a decrease. A lot of times, you'll see that the former have much better return characteristics than the latter. This could be for a number of reasons. One it's a sign of overfitting. To see why, think about an overfitted case, where some optimizer makes all it's PnL in backtest from doing the same trade a million times in a row. It's basically a sign that your fit has fewer degrees of freedom than the number of trades. Two, assuming it's not overfit, it's a sign that your profit is compensation from some priced risk factor that others are offloading on to you. That's not necessarily a bad thing, for economic, institutional or even mechanical reasons the compensation could still be well worth it. But it's something to be aware of. It also increases the risk of fat-tailed drawdowns, since a common priced risk is more likely to be synchronously sold off by the entire market. A somewhat related variant to this is looking at returns by how many times in a row you've traded in the same direction. 
If you're trading a portfolio of instruments you can also consider risk-reducing/increasing from the perspective of single-instrument or portfolio beta. Distribution of touch sizes on fills: If you're quoting at the bid, what's the average and typical range of bid size at the time of your fill? This is a good metric in terms of estimating scalability of a strategy. It's also a good way to subset trades to determine how likely you are to worsen if you size up. E.g. if returns are much better when the touch is showing Order lifetime characteristics: Things like how long the order was alive before it was filled, how long it spent at the touch vs. outside, what was the range of alpha signals during it's lifetime, how far did it move up the queue during it's life, how many orders joined behind it, what percent of the orders it shared a queue with were cancelled, etc. Calculating the average trade returns based on these types of things can you point to ways to optimize the strategy. E.g. maybe all your PnL comes from times you quoted deep-in-the-book and it's not worth it to join the top of the queue at the inside. [Placeholder if I think of anything else] Oil vs Gas Trade Idea Posted by EspressoLover on 2018-11-20 19:15 The energy market's littered with the corpses of plucky traders who dived headfirst into some historically aberrant spread divergence, only to watch their portfolios wiped out as the spread diverged even further. WTI vs. Brent, diesel vs. gasoline, Henry Hub vs. nat gas basis, gas vs. power, corn vs. ethanol, crude prices vs. energy sector valuations, pretty much anything Enron did ever, etc. You can't really treat energy spreads as a statistical black box because of the high risk of sudden regime change. If you throw a pairs trade on between MCD and BKC, it's pretty unlikely that Burger King is going to suddenly decide to completely change over to a cloud computing business. Your biggest risk is maybe M&A, but on a single name basis that's a pretty low, easily diversifiable risk. With energy you have all these relatively close substitutes on top of rigid inelasticities. It's easy for some small shift in supply or demand to all of a sudden invert the economics on some part of the complex. The spread looks stable, stable, stable, then boom there's a violent phase change. Oil vs Gas Trade Idea Posted by EspressoLover on 2018-12-19 03:19 Talked to a nat gas trader a few weeks back, and this was his take on it. Take with a grain of salt, as this was one guy over drinks... As oil prices collapse, a lot of the shale fields become unprofitable and production slows down. However those wells tend to output a lot of nat gas as an ancillary byproduct, and are now a major source of US supply. So in effect, falling oil prices "wag the dog", and raise NG prices through reduced production. While the two products are substitutes on the demand side, they're now complements on the supply side. In regimes where the latter side of the equation is driving the market, we'd actually expect the two to become anti-correlated. Estimating amount of speculation in FX markets Posted by EspressoLover on 2019-04-22 17:00 The problem's even thornier then you think. A lot of "speculative" order flow, particularly the dumb retail variety, is internalized well before it ever reaches anything resembling a real market. I really doubt Plus500's CFDs are showing up on any BIS survey. My gut sense is that most of the volume is gambling, but the bulk of the positions at any given time are "real money". 
You have a lot of day traders who have small position limits, but ridiculously high turnover. However I'd guess that most of the major movements are being driven by central banks and other whales. They don't contribute that much volume but do take such huge positions that they soak up a ton of liquidity. Pulling An Unsophisticated Market Maker In Before Hitting Their Bid/Ask Posted by EspressoLover on 2019-08-21 16:38 Lots of good replies here, that I think have pretty much covered everything. I'll just add one more idea, that may or may not help. You might try flashing your order on a periodic basis. E.g. say you're hoping to buy at $2.75, but the MM is sitting on the ask at $3.00. Submit a limit bid at $2.75, let it sit for 5ms, cancel, wait 50ms, repeat. Keep trying this for 20-100 cycles. The MM may be willing to meet you at $2.75, instead of his normal $3.00 reserve for two reasons: 1) It credibly signals that your order isn't trying to take advantage of some latency arbitrate. Since the order's arriving on periodic, predictable intervals, you're proving that the order isn't a direct response to events in other markets. 2) It credibly signals that you don't have a big inventory sitting behind the order. Because if you did, you'd be interested with interacting with as much liquidity as possible. Not just low-latency market makers. Therefore instead of cancelling the order after 5ms, you'd let it rest so that it interacts with the natural flow in the market. Whether this works or not is completely dependent on whether the MM's bot is programmed to recognize the behavior. So, definitely can't tell you whether it will work in your case or not. But it's probably worth giving it a shot. Reverse-engineering aggressive orders from ITCH feed Posted by EspressoLover on 2020-03-17 17:52 > specifically to assume that "E"-messages following each other with 1 ns between them come from the same aggressive order... I have been informed that this assumption is not advisable I can't comment on Stockholm specifically, but I have experience with similar ITCH systems. The timestamps are based on the matching engine event. So the entire marketable quantity of an order should have the same timestamp. Even if the messages spill across multiple MoldUDP packets. Subsequent orders represent a new matching engine event, with a different wall clock. So, your assumption should hold. E-messages share a timestamp if and only if they belong to the same marketable order. The only time this may not be true is during an auction. During the cross, many orders may get executed on the same matching engine event. Also be mindful to distinguish the marketable and resting part of an order. I may enter a limit buy for 2000 shares against an ask of 1500 shares. 1500 would be executed at arrival. Then 500 would rest on the book at the bid, which may get crossed at a later time. One way to empirically verify that your ITCH system behaves like mine is to check for add (type-A) messages with overlapping timestamps. Assuming, your order book doesn't support mass-quote functionality, then each new OUCH message should result in at most one add message on the ITCH feed. If you see A-messages with overlapping timestamps, then that tells you the timestamps aren't coming from matching engine event time. non-quant trading Posted by EspressoLover on 2020-04-07 10:38 > Any asset class with holding period hours to weeks (not HFT, not buy and hold). FX, commodities, macro bets, stocks, trend following, technical analysis, etc etc. 
Who are some legit non-quant traders, and what is their thought process when making trading decisions? George Soros' two books on trading (Alchemy of Finance and The New Paradigm) might fit what you're looking for here. I think arguably he's the most successful trader within your criteria. They're definitely not instruction manuals, any more so than spending an afternoon with Terry Tao will turn you into a Fields Medalist. But it peels back a layer on Soros' thought process. And afterwards if you squint, you can maybe start imagining how he might approach different scenarios. But again those book won't give you any alpha unless you're already starting with something pretty close to Soros' mind. Along similar lines, if you can get ahold of it, check out Trader, the 1987 documentary on Paul Tudor Jones. PJT bought up all the copies and had them destroyed, but I believe there's a torrent floating around Pirate Bay. I'd second the Market Wizards series. Opalesque TV has some good interviews with a wide array of people who might fit your description. One final point is be careful about comparing those who were successful in the past with the present-day. If you're talking about an hourly timeframe, markets are *way* more efficient than they were 30 years ago. I think there were a lot of successful non-quant traders from the past who were basically proxying for what's now done by stat arb desks. If you're interested in this from a historical perspective, maybe look up some of the literature on the turtle traders. non-quant trading Posted by EspressoLover on 2020-04-08 13:16 @doomanx I really enjoyed Jim Rogers' travelogues as a kid too. Even if you completely overlook the investing discussions, it was fun to read about a super-rich dude driving around the world on a motorcycle with his model girlfriend. #DEShaw I can't speak to DE Shaw specifically, but a lot of the funds with quant DNA take a unique approach to fundamental/discretionary equities. Basically you start with human analysts who stack rank the stocks in their universe. That's then treated as an alpha, which is dumped into a stat-arb portfolio optimizer pooled across the entire group. I think this is actually a pretty good approach. One problem with human portfolio managers, even otherwise good stock pickers, is the don't do enough to diversify across orthogonal bets. The human tendency is to take your strongest conviction idea, and just plow everything into it. Even if risk management limits your single-name concentration, often times a PM's positions will just be different riffs on the same theme. If Michael Burry had been running 25 orthogonal bets instead of being obsessed with a single thesis, maybe he'd only have half the final returns, but with virtually none of the drawdown. So, I think overlaying stat-arb portfolio construction can also drastically increase the Sharpe ratios even without any improvement to stock picking skill. The secondary benefit is that it becomes a much more tractable and less noisy challenge to assess the skill of your individual analysts. non-quant trading Posted by EspressoLover on 2020-04-08 15:00 Yeah, you're right. Technically it's not kosher, because you're treating ordinal rankings as cardinal scores. But in practice, you can usually pick an arbitrary zero-centered mapping from rank to signal, and the exact details rarely make that much of a difference. Scaling magnitudes isn't that important, because turnover on a fundamental long/short portfolio doesn't incur that much in transaction costs. 
For example, pretty much every academic paper on cross-sectional anomalies does this by assigning -1/0/+1 to the bottom decile, middle brackets, and top decile of the universe respectively. Even this simple approach is pretty much good enough, that it's been the standard approach for 50 years. Although, I've never been directly involved in the process, I'm sure most of the real-world systems have some sort of accommodation for conviction. Both within an analysts' universe as well as between analysts. By this point, some of the funds have been doing something like this for nearly two decades. So, they should have pretty good empirical data for making evidence-driven tweaks. Unintuitive thesis results Posted by EspressoLover on 2020-04-10 20:33 Scenario One: I go to the grocery store and see that the toilet paper shelves are nearly sold out. I decide to grab whatever's still left, even though toilet paper wasn't on my original shopping list. Scenario Two: I'm interested in dating a girl at my office. We've been back and forth flirting for a few months now. One day she gets flowers from an anonymous admirer. I decide I better ask her out sooner rather than later. Unintuitive thesis results Posted by EspressoLover on 2020-04-12 16:12 Maybe, I'm misunderstanding the conversation. But there are economic reasons to cross the spread, besides just inventory effects. Some participants have alphas, and sometimes those alphas both exceed the spread cost and decay too fast to be monetized passively. A good example is after the index futures tick up, it likely makes sense to sweep the touch on cash equities. I also think there are also economic reasons to use IOC. Resting liquidity, even on new level formation, is subject to adverse selection costs in a way that pure IOC isn't. If I'm trading off a spurious alpha signal, others are more likely to quickly swipe my resting limit order. This goes double when you're not operating under latency supremacy, because you're probably arriving at Nth place instead of at the front of the queue. legging a spread is like spreading the legs Posted by EspressoLover on 2020-04-14 22:53 Here's how I would approach it, but YMMV. First, I'm guessing (but could be mistaken): O(TCost Illiquid Leg) >> O(TCost Liquid Legs) In which case the overriding objective is to minimize the t-costs on the most expensive, illiquid leg. And that almost certainly means working the schedule passively and patiently. Like you suggest, let the illiquid execution dictate the cheap legs. If it costs a lot less to leg in to the liquid instruments, you maybe don't care how they get filled. The relatively de minims difference between simple and efficient execution is maybe not worth extra complexity. But optimal execution of the liquid legs would basically involve trading off the risk of being unhedged against the marginal cost of faster execution. And that basically involves coming up with a risk budget. How much is X% in additional portfolio volatility worth to save Y% drag on the strategy returns? (I'm assuming the unhedged exposure has zero drift, but if not it's easy to add a term for that.) So, you start by patiently working the illiquid leg. Then immediately after you get that fill, you solve for best execution on the liquid legs. Minimizing X% against Y%. The biggest wrench to the above approach would be if the t-costs on the liquid legs still makes up a significant proportion of the strategy's aggregate t-costs. 
(For example VX is very liquid, but still expensive to trade because of the wide tick size.) Then it essentially becomes a 3-body problem, and the only way to solve it is to throw clock cycles at a numerical optimizer. Particularly if you're dealing with multiple thick-book instruments, you probably want to be working in each leg's respective queue right from the start. But that poses a problem where you'll get filled at different times and have non-deterministic unhedged positions. One way to hack around this is if you can find a "bridge hedge". Some instrument that you temporarily hedge with while the actual legs are getting filled. Maybe it's not the perfect hedge, but its cheap and fast to trade, and does a good enough job for the execution window. For example if you don't want to cross the spread on STIR futures, you can proxy an equivalent duration hedge with ZB for a few hours while you work the Eurodollar queue. Corporate Action Edge Case Posted by EspressoLover on 2020-04-29 22:27 Oh man, you don't know the rabbit hole you're opening up here. Welcome to the world of odd-lot arbitrage. To paraphrase Kissinger, the fights are vicious because the stakes are so low. The short answer is that the tender price is often pre-set well before the actual date it occurs. And even then the tender price does not often match the actual market price to any degree. (Especially when management is crooked.) This often creates situations where the tender price significantly differs from the real price price, giving rise to arbitrage opportunities. Also it usually isn't just fractional shares, but anything less than 100-shares. At least in the US, where 100 is the odd-lot threshold. There's a number of reasons. Trading or holding an odd-lot often incurs high costs, so it helps avoid shareholder lawsuits. Also the point of a reverse split is usually to reduce the number of shareholders, so tendering out small-holders is often the point. Unless extreme precision is really important, I'd just assume that anything less than 100 shares get tendered at last market close. But if hyper-realism is very important, than you'll need to read the actual legal body of every tender offer closely. Tail Risk Strats Posted by EspressoLover on 2020-05-06 02:12 I'll bite as someone who's generally skeptical of tail-risk strategies. In particular, here's what bedevils me about the tail risk thesis in general. Let's say Universa does have some sort of secret sauce in terms of gaining long-vol exposure and reducing the bleed with sophisticated trading. Why couldn't they simply transmute that to a vanilla alpha product, by overlaying a standard short-vol hedge on top of their sophisticated tail-risk core? The value proposition is they know how to cheaply buy vol that pay off similarly to OTM index puts in the case of a major drawdown. So isn't the next step to take that long-vol portfolio and short the equivalent in VX futures and collect the carry? Now instead of just avoiding the bleed, they'll be collecting the rich variance premium every month (while only paying the cheap premium on their core tail-risk position). This provides the high-Sharpe regular income of selling puts in good times, while avoiding blowups that normally accompany that strategy. That seems like a much easier product to market than a tail-risk fund that pays off once a decade. At the very least there's no reason they couldn't sell *both* products. 
But the fact that they don't makes me suspicious that they're not doing anything much different than vanilla OTM put buying. To caveat, I'm not a vol-guy and this is a very off-the-cuff analysis. So, I'm assuming I'm definitely wrong about something here. Newbie question re modelling and identifying trading opportunities Posted by EspressoLover on 2020-06-26 02:40 Great username! My general experience is that most alphas are pretty robust to the details of how they're modeled. If there's something "real" underneath, then a simple approach or rule of thumb should still do a pretty good job. It won't necessarily capture as much of the alpha as a sophisticated model, but it should still be pretty reliable. If someone tells you that an alpha doesn't work at all without a very complex model, that's usually a sign that it's spurious and will fall apart out-sample. You'd be pretty surprised how many $100 million trading desks basically run on the type of simple rules like the examples you mention. That being said there's two major drawbacks to the rule of thumb approach. First they're deceptively easy to overfit, because the bias-variance tradeoff can't be evaluated. The advantage of classical statistical techniques is that the results come with error bars, which make it easy to evaluate the null hypothesis. Even modern-day ML approaches like neural nets don't give confidence bounds, but do give a fully automated pipeline from ground truth to finished product. Therefore they can still be evaluated with empirical risk minimization and cross validation. In contrast, the rules of thumb approach involves some totally opaque human intuition, who's inevitably biased by what he's seen before. You can't just "forget" about everything outside the immediate dataset and work from scratch. So, if you're going to take this approach you have to take pains to minimize the effective degrees of freedom and implicit number of hypothesis tested. It's pretty easy to think "oh, let me just trying bumping this threshold a little or add this condition...Hmm, that didn't really work, let me just try a slight variant..." And then pretty soon you're overfitted like crazy on what's a seemingly simple rule. In the rule of thumb approach, come up with a simple a priori hypothesis, test it, then take it or leave it. Once you start adding bells and whistles to juice it, you pretty much gotta move statistical methods. The second major drawback for rules of thumbs are that they don't generalize. Say your rule is buy when X is bigger than a threshold, and Y hasn't moved for 10 seconds? What about when Y hasn't moved for 30 seconds, should you lower your threshold? Or if X is way bigger than threshold, can you weaken the condition to if Y hasn't moved for 5 seconds? What if Y has moved, but not when you beta neutralize it? How do you combine multiple rules into a single strategy? What if you want to adjust for different trading costs? Or skew your inventory for risk purposes? Almost by definition, the rule of thumb approach means you're leaving money on the table. Not that this is the worst thing in the world. Done is better than perfect. And there's certainly value to quickly turning around an MVP into production, getting validation in live trading, then iterating based on real-world learning. Explaining market making Posted by EspressoLover on 2021-04-26 16:29 You're going to think I'm being sarcastic. 
But honestly, if you want to grok market making on an intuitive level, watch some of those pawn shop shows on the History Channel. Explaining market making Posted by EspressoLover on 2021-04-29 19:12 A market maker is just a business. A business which sells the product of liquidity. Like pricing any product, there's more than one way to skin a cat. A slick-talking used car salesman will use gut feeling and intuition to set prices. Carvana will prices its inventory based off an algorithm built by PhD data scientists. There's room enough in the market that both approaches can co-exist in a stable ecosystem. Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-07 21:56 I’m a regular lurker on these phorums, have a lot of respect for the community here. I’m somewhat at the end of my rope and out of ideas, so I’d greatly appreciate any honest input. Basically for the past four years I’ve been running a one-man HFT operation. I had some initial modest success, especially when vol was higher, but the situation has become increasingly untenable. I don’t think I can compete without unfeasibly large investments in infrastructure. I trade CME equity and rates futures, not for any particular product specific reason, rather because they’re liquid and only need a single co-location point. I believe my models, alphas and strats are actually pretty impressive considering my operation’s size (high R^2s, robust performance, etc.). However, even if I had the best alphas in the world, there are several issues that I think prevent me from competing. First I’m eating higher exchange fees. The KCGs of the world with self-clearing and exchange membership are paying less than half per contract. Second, latency is too high. I have DMA and pretty much the fastest possible commodity hardware setup (latency from incoming multicast to outgoing packet on the order of 10’s of microseconds). I’m lucky if I get the 50th matched trade against some signal event. Without going to custom hardware I don’t see much feasible improvement. I also suspect that only having a single CME gateway is a big issue. KCG can route duplicate IOCs through every gateway, realizing the lowest congestion pathway. Finally my lack of inter-colocation feeds is a big disadvantage, for example trading ES without seeing the activity on SPY. In 2011 these weren’t fatal issues, but at current vols/volumes every alpha event is picked dry. Adverse selection is now much worse. Unfortunately I don’t see many options to improve this situation, short of dumping in a ton of money. If I was relatively sure of a positive NPV, I’d be willing to invest $500k into the infra. But I think that’s far less than what’s needed. The one advantage relative to KCG is a way lower cost basis. A few $k/day would keep the lights on. My best hope is to find some a small niche that leverages what I already have (good alphas/strats, pretty fast customized software, good research libraries). I have a lot of ideas: less liquid futures, equities, longer horizon strats, after-hours, partnership with good infra. But given my resources and risk tolerance, I’m going to have to throw in the towel soon, so I gotta get the next idea right. I’d really appreciate an objective opinion from a removed third party. There’s a lot of really intelligent people on these boards, so I’m definitely looking forward to your advice. Also there seems to be a fairly good representation here from medium-sized shops. 
Any general insights about competing without eight-figure infra budgets? Thanks in advance to anyone who responds! Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-08 03:17 @darkmatters Thanks for the response! I’ve dropped you a PM at your profile email. With regards to positioning, I’d conservatively estimate 30th would be minimum viability. Put another way, for liquidity taking strategies, I’m getting around 18-20% fill rates when trying to hit the resting bid/ask, but I need to be somewhere around 35-40%. Obviously quantifying liquidity providing is a little hairier, but the overall gist is pretty similar. However I think with a decently fast equities feed in production, the margin wouldn’t be so tight. I’ve done some research looking at this from a historical perspective. A good deal of my adverse selection on the index futures seems attributable to relative richness/cheapness of the mega-liquid equities: SPY, BAC, AAPL, VXX, etc. Anecdotally it does seem like more price discovery has shifted to New York from Chicago. @a* I do appreciate the frank advice. If you wouldn’t mind, could you follow-up? Let’s assume I do have the requisite passion, and am willing to push on. (Not claiming its true.) What would you do in my scenario? You seem to think a one-man, 6-digit operation doesn’t have a value proposition, and nowadays I don’t necessarily disagree with you. Given that I don’t the resources to be much more than a one-man, 6-digit operation; what would you say is the best way to continue in the field? Keep trying to grind it out, or folding into a larger group? I do have a not altogether small amount of risk tolerance and skin in the game. I left a pretty decent research position at a KCG-peer during the halcyon days. But at some point, even total risk-neutrality means you change tactics when you find yourself in a negative EV process. Again, thanks for giving me your honest opinion. Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-08 03:18 @FDAX Thanks for the awesome insights! Your post really captures a major dilemma when deciding how much to invest in infra. Simulations have pretty limited fidelity at this granularity. I’m hesitant to spend limited resources making improvements that have uncertain benefit. On the other hand there’s always the possibility that I’m pushing against a wall, because I don’t know how much the lack of [X] is handicapping me. I’ve had a lot of wasted man-hours trying to juice the research to get over barriers that ultimately required infra improvements. My median wire-to-wire latency is approximately 45us, so already I know I’m getting shredded by FPGAs. Most of the lat comes from generating model signals. I’ve tested compressing/pruning the model to evaluate a little faster, just to ballpark the impact of latency improvements. Running at median 35us seems to barely affects realized positioning. I would assume that most of the latency-superior competitors are on custom hardware, so I don’t see high returns from pushing commodity systems further. Diversification does seem to be the best approach the more I think about it. It was hubris trying to build general-purpose systems for ES or ZN, thinking they would then be competitive in other markets. Well, easier said than done. I really believe that my alphas have a fair degree of orthogonality to them. There may be many markets where the comparative advantage outweighs the adverse selection, I just have to find them. 
Longer horizon definitely would not be an easy transition for all the reasons you mention. I do have some limited experience in equity stat-arb. I’ve toyed with running a medium turnover strategy, using HFT alphas and infra to minimize execution t-costs. This would at least keep the short and long-term systems intellectually abstracted. Though it would certainly require a lot more risk capital. Again, I really appreciate such a well-thought out, helpful response. Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-09 22:21 Again, thank you very much to everyone who took the time to respond. The value of what I’ve learned from you guys went way beyond my expectations. Every single response has been immensely useful to me. (As an aside I’ve added an email to my profile). @goldoark That’s along the lines of what I’m thinking. Achieving stable cash flow would make it easier to invest in infra, bring other people on, divide labor, etc. It’d be easier to re-tackle the competitive HFT side with less resource constraints, and a few other minds. That’s an interesting point about finding people who need better execution. Hadn’t thought much along that angle. @HitmanH Based in Florida, but somewhat amenable to re-location. I have been thinking about platforms. The only reason I hesitate, is I’ve heard bad things from former colleagues who went down this route. Platforms that misrepresented their latencies, inter-colocation capabilities, etc. Then again I’ve experienced building a system from top-to-bottom, so I think I’d be reasonably good at due diligence. @signalseeker I think that’s definitely a good point. I’d analogize it to trying to win mid-level chess tournaments, but you only start with a single bishop. Even if you’re Magnus, who even knows if it’s possible? More to the point, who cares? You’re spending a lot of time trying to improve at single-bishop chess, and a lot of the effort isn’t even directly translatable to regular chess. Like I said to HitmanH, the only reason I hesitate to go down the bucket shop route, is worrying that I sign up and they end up having sub-par infra. Any general insight on how to pick a good bucket shop? @a* That makes a lot of sense. Owning proportion of a viable business is a lot better than 100% of zero. Increasingly it seems like those might be the best options. If you don’t mind though, how do these arrangements typically work with the pre-existing IP? I’d hesitate about losing any rights to everything I’ve done on day 1. Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-09 22:23 @radikal Thanks very much for the offer on the intros. I may send you an email over the upcoming weeks, as I decide what to do. Also thanks a lot for the data points on the latency and fees. That gives me a much better handle on the environment. Can you give any color on what the other CME specific issues you mentioned were? (Totally understandable if you can’t go into it). @ESMaestro I do have a Rule 106.D seat lease. Can’t afford to buy into membership. My clearing broker charges $0.07/contract at the top volume tier (which admittedly I could probably lean on harder). CME clearing fees for leasees are $0.21. Globex fees basically disappear at HFT volume levels. So that’s $0.28/contract. As I understand it members who self-clearing are only paying $0.095/contract. Though to be honest, I’m not sure even with this fee schedule if I could break-even with my crappy positional latency. 
It would certainly bring me a lot closer, but I’d have to re-fit some of my alphas to optimize at these tcost levels. That’s really helpful to know that you’re seeing opportunities on CME at those ticks/horizons. I have some experience in equity stat arb, so I’m familiar with trading on those horizons in equities. One reason I never looked too much at this, is because I wasn’t aware if any significant opportunities exist in futures space. Knowing that people do have success in the space, gives me a lot more confidence. Definitely great advice about remaining flexible with regards to strategies. @andre I would certainly appreciate the advice. I’ll drop you an email later today, after I have a chance to get together some of my technical details and sub-component latencies. Competing in HFT Space without Massive Infra Posted by EspressoLover on 2015-01-11 19:37 @harryb, Thanks, for the opinion. Seems to be pretty good advice. @a* Awesome! Thank you for the clear explanation. @radikal Okay, well good to know that there is ambiguity in assessing platform infra quality. It's really helpful to know, so I can be on the lookout. Maybe ask around former colleagues. The research and alpha IP does seem valuable to me, but I could be wrong. Based on my time at big-HFT, the alpha researchers were the most heavily poached, and the alpha code was definitely the "secret sauce". My alphas and research systems are nowhere near as good, but I believe they are fairly decent. That being said, that was four years ago. Based on what I hear, HFT alpha has become more of a solved problem and more of the competitive focus has shifted to the infra side. So I could be outdated in my views. Queue Position Posted by EspressoLover on 2015-11-05 09:06 > you could keep track of queue position on a firm wide basis. so if traderA cancels his order it can get reallocated internally to someone else. This is a good idea, but traderB has to be aware that she is, to a certain extent, consuming traderA's toxicity. Quant credit trading Posted by EspressoLover on 2015-11-06 00:17 Citadel has a large and sophisticated star-arb-esque credit group. As I understand the main inhibitor isn't necessarily clean data. It's running these strategies in OTC dealer markets. It's a losing proposition for market makers to trade with counter-parties with alpha on the same time horizon as their turnover. This works in equities because everyone trades on anonymized centralized order books with time-price priority. But in CDS space, the broker-dealer is going to know which clients are ripping them off and simply stop trading with them. Citadel gets around this, because they're large enough and do enough other non-quant credit business that no broker in their right mind will cut them off. But a fund that was solely devoted to quant credit wouldn't be able to survive. Quant credit trading Posted by EspressoLover on 2015-11-09 03:35 Unfortunately no. Just had a couple of close friends work in the space. Execution on Thick-Book/Wide-Tick Equities/ETFs Posted by EspressoLover on 2015-11-09 04:03 Anyone have insight here on what's the best approach for getting good execution on thick-book US equities? I.e. lower-priced, high-volume stocks that nearly always sit at one penny NBBO spreads with large size on the touch. BAC or VXX would be good examples. The baseline would obviously be hitting the ask, which eats $.005 in spread cost and another $.003 in exchange fees. 
But what kind of improvement is feasible from a relatively simple execution system or off-the-shelf algo? The alphas I'm working with are relatively patient (time horizons of 5-30 minutes) and the execution isn't bound by any fixed schedule. I can join the bid or even 1 level away and probably still get filled with regularity. I vaguely remember from ITG's curves that 1/4-1/3 spread is a good rule of thumb for low-participation, high patience execution that tries to mostly execute passively. Don't know how accurate that is. Also I'm thinking that since the 1-tick spread is so wide there's a good amount of opportunity for dark price improvement at the mid or better. I'd appreciate any insight phorumers might have around this issue. Thanks. Queue Position Posted by EspressoLover on 2015-12-04 00:24 > I wouldn't randomise it as Baltazar suggested. You want your simulations to be repeatable. You could always use pseudo-random numbers from a hash function applied to the market data itself. If you want to sample over multiple random instances, just add another few bits to the hash that take an int corresponding to which Nth random instance it is. Unless the historical market data or its format is modified, runs should be totally repeatable. (Not necessarily endorsing the benefit of randomness as worth this effort, only pointing out the option). Re: what does AndyM think of T-Notes? Posted by EspressoLover on 2015-12-12 03:36 Insanely low risk tolerance in the market now. Implied vols are at a 70% premium to 1-month historical vols. All over a quarter point rate hike, which is priced in already. Historical Stock 1 Min Data. Posted by EspressoLover on 2015-12-14 02:41 TickData.com https://www.tickdata.com/historical-market-data-product/quote-bars/ Can't vouch for them being the cheapest though. Queue Position Posted by EspressoLover on 2015-12-16 02:30 CME consolidates order updates, so 1a and 1b do not fully exhaust the ways that queue structure becomes unknown. You can (and frequently do) get situations where the update has +10 qty and +3 orders. Now all you know is that there are 3 orders at the end of the queue that sum to 10. Except you don't even know that because maybe the update also contained a cancel or modify as well. That's rarer, because usually bunched activity tends to be in the same direction, but its far from unheard of. Maybe there were 4 adds with total quantity +15, and 2 cancels of 5 qty. You can really start to go down the rabbit hole where you chain these aggregated orders to each other. As information reveals on one, you modify the conditional bounds on the other. Doing this accurately is challenging, doing this accurately in O(10 uS) is Herculean. Another point is that trade updates tell you the size of the resting orders that were matched. You can use this to identify traded unique-size orders as no longer present in the queue. Here's my higher level critique: Your approach is centered on determining the queue structure that can be known with total certainty. I think a fair number of things can be known with simple rules. Accounting for the more sophisticated dependency and complexity will let you know even more. But I don't think the tradeoff of effort for marginal information is worth it. At the end of day most of the queue structure is going to be un-knowable, even with perfect tracking and unlimited computational resources. There's simply to many 1-qty orders washing everything out in a sea of noise. 
I think the effort is better spent on models that infer statistical expectations rather than perfect certainty. There's an analogy to fluid dynamics here. Even if all the data is known and the system follows deterministic laws, sometimes its just too overwhelmingly complicated. Simplified statistical approximations may sacrifice total accuracy, but they're still often the best appraoch. Queue Position Posted by EspressoLover on 2015-12-17 04:42 Yeah, using trade messages is the standard way to calibrate queue estimation algorithms absent the ability to actually use live orders. The behavior and format of trade messages is summarized here: http://www.cmegroup.com/confluence/display/EPICSANDBOX/MDP+3.0+-+Trade+Summary+Order+Level+Detail#MDP3.0-TradeSummaryOrderLevelDetail-TradeSummaryMessageStructure PA short HY credit Posted by EspressoLover on 2015-12-17 04:54 If you want a rebalanced short ETF to behave like a classical short position, all you have to do is rebalance your own position in the opposite direction. E.g. say you start with 1 share of $100 in etf SHORT, which targets -1X daily in asset A. If A falls 10% on day 1, then rises 10% on day 2, the value of SHORT falls by $1. In contrast a classical short sale would have $0 PnL. But if on day 1 you sold the $10 worth of SHORT, maintaining a $100 constant exposure, by day 2 your PnL would be also be flat. Then at the end of day two you'd buy $10 more of SHORT to keep your exposure constant. PA short HY credit Posted by EspressoLover on 2015-12-17 14:15 Depends how liquid the ETF is, and how volatile the underlying is (which scales how much per day you have to rebalance). Consider SDS, which is actually a -2X inverse ETF to SP500 (there's few liquid -1X ETFs). The logic's still the same, you just sell into gains and buy into losses to maintain constant exposure. The mean absolute daily move in SDS over the past few years is about 1.2%. It's $20/share, nearly always trades at a penny bid/ask, and let's say costs you another $.005 per share in exchange fees and commissions. That adds up to 15 basis points per annum in transaction costs, which could very likely still make this attractive versus the borrow rates for a traditional short. Not to mention the risk of short squeeze. (I know in this particular case futures are also an option, but I'm just illustrating the specific case of equity shorts vs inverse ETFs). Now of course, the ETF itself is also rebalancing their portfolio daily, as well as borrowing. So those costs, on top of management fees, do drag the performance of the ETF. In recent history SDS has suffered a mean performance drag of about -0.248 bps/day, or 62 basis points per annum. Altogether this works because an institutional fund likely can negotiate better terms for its short sale than a retail investor. Queue Position Posted by EspressoLover on 2015-12-18 18:43 @Leboswki That's a good point about determining your own queue position. A minor point to be aware of though. The final entry in a trade message can be a partial fill. E.g. an order of size 15, can get filled 8 on one trade then 7 on the next. If you have a qty-7 as another unique order you might erroneously assume that it's been filled. So you need someone way to track the last possible partial against your existing uniques. A series of 1-order trades can always be a cumulative partial. Once you see a multiple order trade message you know the previous partials complete. The level summary message will tell you the order count at the level. 
So if that went down by the same number of orders as the number of orders in the trade message, then it's likely that it wasn't a partial. Vice versa, if it went down by one less than the number of orders (0 in the case of a 1-order trade message), it most likely was a partial. However this isn't guaranteed, since it's always possible that there was other add/remove activity on the level that was bunched with the trade. Also orders can be modified down in size without losing queue priority. If you have a 21-lot early in the queue and a 14-lot later, the 21 could have been modified down to 14. You might see a 14 fill and assume you're earlier in the queue than you are. As another poster alluded to in this thread, this can be an explicitly adversarial strategy to "shape the book". You'll see it much more often than a random distribution of order sizes and modifications would imply. You Know Who - Renaissance Watch Posted by EspressoLover on 2016-03-09 01:30 The patenting is really insidious. It's not just about making money from infringement damages, it also lets you get a good look at competitors' secret sauce in discovery. Since HFT is so reticent to publicly divulge even basic strategic elements, it will be really difficult to challenge based on prior art. You Know Who - Renaissance Watch Posted by EspressoLover on 2016-03-09 04:47 Well here's a case of a patent troll suing Citadel for the legendary idea of relative value trading between a leveraged ETF and its basket. As ridiculous as this sounds, it's still really hard to defend against. Of course, everyone who's been in the industry for more than 5 minutes realizes that leveraged ETFs have convexity effects relative to the benchmarks. But that's surprisingly difficult to show in court. ETF market makers have no reason to publicize even basic aspects of their strategy. Unfortunately prior art requires public documentation, and most HFT strategy tends to exist as oral lore. http://news.priorsmart.com/leveraged-innovations-v-citadel-l9pC/ VIX and VXX Posted by EspressoLover on 2016-03-18 18:22 VXX Return = VIX Return + (Spot/30-Day Futs Basis) Return + Carry + (Some minor ETF basis effects) The former two terms are mean-reverting, the latter two integrate to a unit root. So the first thing to note is that even if spot VIX is lower, the 30-day futs price is still 10% higher than the Nov low. 30-day futures in Nov touched 15.96, today they're 17.73. VXX, as of now, is only 6% above its Nov low. Carry, in fact, has actually been net negative over the period, cumulatively dragging VXX's price down. Since Nov-19, average VXX carry has been below -40% annualized. To be fair this is less than the historical average of -80% annualized. But on net the slope of the futures curve has still lowered VXX's price over the period, just less so than it normally does. Tick Data Research Project Available Posted by EspressoLover on 2016-04-13 09:01 Drop me an email with the specifics at the address in my profile. Can't promise anything. But if it seems like something I can easily shell-fu out with my existing tool set, I'd do it. I ain't proud. A few months of co-located Eurex data isn't a spectacular bounty, but it's worth an hour or so of work. Equity position limits by name Posted by EspressoLover on 2016-04-15 00:22 Rule of thumb, at least in Chicago prop-land a few years back, is that HFT strats tend to hit friction when they push 10% ADV in trading. In terms of pos-limits that's going to depend on strategy turnover.
But I'd say at a minimum you should be doing 5X daily turnover, otherwise it's probably something not quite HFT. All in all, HFT pos-limits in excess of 2% ADV start to smell funny (that's the 10% ADV of trading divided by 5X turnover). For the E-minis, the major HFT participants tend to keep position-limits on the order of 0.1% ADV. (See below CFTC paper). But ES is going to be more liquid and turn over more than most equities, so that's probably a bit low. http://www.cftc.gov/idc/groups/public/@economicanalysis/documents/file/oce_riskandreturn0414.pdf Equity position limits by name Posted by EspressoLover on 2016-04-16 01:58 > If so, should that be 10% participation, rather than 10% ADV? Probably. Usually most HFTs fit static pos-limits on the order of weeks or months. Selecting a percentage of 3-month ADV is pretty easy in this context. Targeting %participation would require quite a bit more finesse than static monthly pos-limits. Monetization tends to be much less of a precise science than alpha. > Does "friction" meaning slippage/market impact? Sort of. But usually in HFT you're dealing with rapidly decaying signals. I tend to think of market impact as the long-con. If your alpha's going away in 100ms, then you might as well be a greedy pig feeding at the liquidity trough. Who cares if the market moves big, you're just as likely to want to trade in the opposite direction soon enough. I think it has more to do with the adversity of mutual information. In the limit case, as you approach 100%Volume, every single trade goes through your system. Pretend you have an oracle that exactly predicted the long-term price trajectory of every security. If the existing resting bid/ask is profitable then price improve it by an epsilon, if not then cross it. You'll be a party to every trade, and you're assured to make a profit regardless of the inventory you take on. Now, slightly adjust the scenario. Say the oracle randomly betrays you 1% of the time. Not only does she give you bad information, but she relays the truth to your enemies and explicitly informs them that you're misinformed. If you insist on continuing the above strategy, your enemies can bankrupt you during the betrayal periods. By exploiting your willingness to make arbitrarily large trades, you will lose more during that 1% than you will make in the 99% of normal trading. A simple way to fix this is to set position limits, which cap losses incurred during betrayal periods. But the consequence is that you will participate at less than 100%Volume. Even during normal periods you will miss out on trades as you randomly hit position limits. Even though you still have by far the best trading system on the street, the existence of even a small amount of mutual information limits your ability to capture the entire market. So I think the "friction" mostly is a consequence of increasing adversity. As you scale up, you go from betting on "this system beats other systems in most scenarios" to "this system has to beat other systems in every scenario". Does WalkForward represent a curve-fitting exercise? Posted by EspressoLover on 2016-05-05 22:09 Instead of trying to fit trades, fit signals. Define some thesis, let's say GBPEUR falls over the next three minutes every time 50 Cent makes an Instagram post. The wrong approach is to start by defining a trading system, short GBP every time this occurs, then start fitting trading parameters in some way to maximize PnL and minimize risk. Instead, define a target variable: GBPEUR 3 minute returns.
Now define an indicator: whether a 50 Cent Instagram post had just recently occurred. This is a much cleaner optimization. It's amenable to standard statistical learning methods, e.g. linear regression, as well as measures of goodness of fit, e.g. t-stats and cross-validation. In contrast jumping directly into selecting trading parameters pretty much only works with grid search. And goodness-of-fit is a nightmare. Once you've defined your signal, then you load a monetization scheme on top of it. You might say, trade every time your signal crosses a certain threshold. Maybe you want to add a stop-loss for risk mitigation. But you need to be clear, the point of monetization isn't to generate its own edge. It's to harvest the alpha that you've discovered. Stop-losses should simply be a risk management tool. If your strategy's unprofitable unless your stop-losses/take-profits are set at the right level, that's a bad sign. You're trying to generate edge inside monetization, and that's considered harmful. The other advantage of monetization/alpha separation is that it makes performance attribution much easier. You can evaluate the out-sample performance of your signal as orthogonal to your trading PnL. For example maybe GBPEUR has been declining strongly on every 50 Cent post, but your strategy waits for an exit signal in the form of a Kim Kardashian Snapchat. And in between the signals the market has tended to trend against your position. That tells you something fundamentally different from the original signal being broken. Docker Posted by EspressoLover on 2016-05-16 20:39 Hitler's Opinion of Docker VIRTU Posted by EspressoLover on 2016-06-03 12:44 That's not that unusual. Most of the top-tier shops run 20+ Sharpes on their core HFT strategies. It's just the law of large numbers. Some back of the envelope calculations: Assume your edge is $.002/share traded. Average share price about $40. Let's say average holding time is 20 minutes. That's ~39 trades per unit of inventory per day. The average return per unit of notional inventory is 16 basis points a day. The typical stock has about 1.5% daily vol. Most likely you're not holding overnight, so knock that down to 1%/day. That's an annualized Sharpe, per stock, of 2.5. Now scale that by 400 liquid names. Across stocks your portfolio's going to be nearly uncorrelated. Yes there will be some residual beta exposure, but that's easy enough to hedge. That's a portfolio Sharpe of 50. On a daily level, a losing day is a 3-sigma event. 1 day in 3 years with a normal distribution. Which isn't too far off from Virtu. The lesson here is that achieving such high risk-adjusted returns requires extreme diversification. Both "horizontal diversification" in the sense of trading as many names as possible. (Combined with aggressively hedging any common factor exposures). As well as "vertical diversification", i.e. having a trading signal that flips frequently. Which allows you to monetize more trades for the same notional inventory exposure. VIRTU Posted by EspressoLover on 2016-06-03 18:18 @Phantom309 "If we want to guess how Virtu probably makes their money, I would start with (1) payment for order flow and (2) maker-taker rebates." There are a number of HFT-futures operations that basically never have down days. Yet in that space neither payment for order flow nor maker-taker applies. In the E-minis alone, double-digit Sharpes are not uncommon (and that's just on a single instrument).
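To put rough numbers on the law-of-large-numbers point, here's a quick Python sketch of the same back-of-the-envelope arithmetic from my earlier post (all of the inputs are the illustrative figures above, nothing real):

```python
from math import sqrt
from statistics import NormalDist

# Illustrative inputs from the back-of-the-envelope above (not real figures)
per_name_daily_ret = 0.0016   # ~16 bps/day per unit of notional inventory
per_name_daily_vol = 0.01     # ~1% daily vol per name (no overnight holding)
n_names = 400                 # liquid names, treated as ~uncorrelated after hedging

per_name_sharpe = per_name_daily_ret / per_name_daily_vol * sqrt(252)
portfolio_sharpe = per_name_sharpe * sqrt(n_names)  # uncorrelated Sharpes add in quadrature

# Under a normal approximation, the chance of a losing day
daily_sharpe = portfolio_sharpe / sqrt(252)
p_down_day = NormalDist().cdf(-daily_sharpe)

print(f"per-name Sharpe  ~ {per_name_sharpe:.1f}")    # ~2.5
print(f"portfolio Sharpe ~ {portfolio_sharpe:.0f}")   # ~50
print(f"P(losing day)    ~ {p_down_day:.2%}")         # ~0.1%, i.e. a losing day every few years
```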
Horizontally scale that strategy across 15-20 major contracts and that's a 40+ Sharpe. In equity space you can diversify across even more names. "For example, 25% of HFTs have a Sharpe ratio greater than 9.10, and 10% of HFTs have a Sharpe ratio greater than 12.68" http://www.cftc.gov/idc/groups/public/@economicanalysis/documents/file/oce_riskandreturn0414.pdf @katastrofa To clarify, I didn't mean 16 bps spread capture. My example was 16 bps per inventory unit per day. My assumption was $0.002/share spread capture. At $40 a share price, that's *0.5 bps* spread capture. 16 bps/day comes from having ~40x daily turnover. VIX jump on no S&P decline Posted by EspressoLover on 2016-06-13 22:00 Anyone have any color on why VIX is acting so fucky as of Thursday (June 9)? Up 33% in two days with SPY down less than 2%. Biggest jump in history on such a small decline. Closest thing seems to be Gulf War I and LTCM. But there's no real news. Fed Futures are only pricing a 2% chance of a rate hike on Wednesday. Brexit's still priced at 25% odds in betting markets. Recent historical vol is only running at 9-10%, so options at VIX 20 have crazy expensive decay. I'm guessing some big vol-arb desk or quant fund is liquidating. Or could just be day traders chasing momentum in the vol ETFs... VIX jump on no S&P decline Posted by EspressoLover on 2016-06-14 08:36 @chiral3 Thanks for the pointer. The static VIX options basket as of June 1 does seem to be only up 19% (vs 47% on the index). So there definitely might be something to the strike constituent re-balancing. Then again there's been two weeks of fairly heavy gamma decay on those maturities, so not sure if the discrepancy is that far out of the ordinary. (Unfortunately I've never tried this exercise in "normal markets", and don't have the historical data). If you get the chance to peruse your notes, I'd appreciate any additional color you could share. @goldorak The back-end of the curve has a <<1.0 beta to the front-end. If one's hedging properly then you'd expect to hold 2-3X back-end contracts for every front. (Not that this is done in practice). VXZ's been up about 11% since last Friday, and VXX has been up 27%. Assuming the trade was done right, it hasn't hurt so much. The real pain has been for those short vol who've hedged with equities. Even more so because the 30-day VX futs have moved in closer-than-historical lockstep with the index. When VIX goes off the reservation, the futures tend to be pretty dampened. (The benefit of an instrument with an actual tradable price.) That's what makes me suspect this is driven by the ETFs (nearly 100% of the volume of which is in the 30-day futs). @martinghoul/radikal Hmmm. I probably am underestimating the Brexit factor here... Occam's razor and whatnot. VIX jump on no S&P decline Posted by EspressoLover on 2016-06-15 22:57 USD and GBP denominated betting markets are quoting nearly exactly the same odds. Seems like simple arbitrage. Bet on Remain at the GBP bookies, Leave at the USD bookies. Either way you get paid out in the more valuable currency. Edit addendum: - USD odds at Betfair - GBP bookie odds VIX jump on no S&P decline Posted by EspressoLover on 2016-06-15 23:25 @theDude Looking at the new VIX settlement series for this week, put strike expansion does seem to account for a lot of the discrepancy. The stock positioning argument makes sense. {Jan, Feb} 2016 had really low VIX volatility relative to SPX volatility at the time. Can't really see what else would have driven that besides positioning.
@chiral Thanks for the info! Very interesting. The only thing that throws me off about the Aug [19-28] comparison is that the VX futs curve became heavily backwardated. Today the futures are still trading at a premium to spot. If distortions in the index calculation are the cause, you'd expect them to dampen out in 30+ days. Normally the futs reflect this expectation. But it could just be that the market's filled to the brim with particularly dumb money from all the Brexit hedgers. At any rate it seems like a pretty crappy hedge. Since Brexit odds started going up, SPX has only fallen half as much as FTSE. Hedgers are paying astronomical IVs on SPX puts. And the worst part is that being hedged requires twice as many of them. IOC and FOK Posted by EspressoLover on 2016-06-17 01:09 Don't think that data's available anywhere major. But even if it is, I'm not sure how useful it is. A lot of HFTs spam IOCs that have little to no chance of getting filled. I'd imagine that the ratio of cancelled IOCs to filled IOCs is pretty close to the maximum allowable limit most days. Fund Check Carlisle Management Posted by EspressoLover on 2016-06-18 21:52 > At some point your view has to be that you either are better than lifecos at forecasting mortality (you're not) Is that really the case? The physicals for life insurance don't strike me as being that extensive. What if you just sequence people's genomes at 23AndMe? As far as I know the insurers can't even legally do that. An eviler man than I might find a pool of healthy-looking 50-somethings who are likely to develop some early-onset inherited disease. At the very least the APOE allele accounts for ~15% of mortality variance. Throw in a DEXA body-comp scan and the sitting test, and you're miles ahead of the standard height/weight/blood-pressure mortality tables. VIX jump on no S&P decline Posted by EspressoLover on 2016-07-26 12:03 VIX/S&P relative value works very well. The paper's implementation is much simpler than what's done in practice. There's room for major enhancements. If anything performance has only improved since the first papers were published. Also tends to be pretty robust across a wide variety of market regimes. Biggest downside is that turnover's fairly high, and you're limited by the liquidity in the VIX products. It would probably be challenging to scale over O(100 mn) AUM. The vast majority of traders in VIX products are using the instruments as a proxy for a leveraged bet on the market. UVXY is basically like 8X inverse S&P. Your typical leverage-constrained gambling-addicted day-trader is insensitive to 50 bps of roll yield in the VX futs. With all that dumb money sloshing around, prices do tend to get seriously out-of-whack. VIX jump on no S&P decline Posted by EspressoLover on 2016-07-26 13:20 Without spilling too much secret sauce, some important points of consideration: 1) VIX is mean-reverting, while S&P is not. Also the VIX index is subject to all kinds of mechanical idiosyncrasies. Sometimes the contango between the index and futs represents genuine risk premium, sometimes it represents reasonable expectations about how the index will drift or reset. Not all contango is made equal. 2) Getting the VIX/S&P hedge correct is tricky. The hedge ratio is pretty dynamic, and can change fairly substantially on a daily or even intraday horizon. Treating this problem like fitting any old equity beta is mediocre. Doing it right requires a fair bit of statistical voodoo.
Because VIX is so volatile, being wrong with the right hedge ratio can oftentimes be better than being right with the wrong ratio. 3) Like the paper says, it is the case that only the front futs are liquid enough for a large portfolio. But there still is an actively traded term structure, which contains pretty relevant information. Don't ignore the dynamics in the back-months or the shape of the overall term structure. Docker Posted by EspressoLover on 2016-09-22 23:39 > That way my "installation handbook" of the server was cut from 30 to 10 pages. Moving the production to another environment should be possible in about a day now. Good to hear. If you're happy with the results, I'd recommend considering an IaC, like Chef or Ansible, as the next step. You'll be able to ditch the installation manual completely, hit one button, and spin up a production server in ten minutes. Deutsche Bank =Lehman Brothers? Posted by EspressoLover on 2016-10-01 19:07 > Sadly that disregards banking is about confidence and if you cannot robustly address your investors' concerns it's academic and you'll cease being a viable entity. I definitely have no expertise in this area, so take what I say with a grain of salt. But my understanding is that the sizable majority of DB's funding is secured credit. The issue with Lehman wasn't that people lost faith in the entity, but they lost faith in the collateral. That's not an issue with DB, as the collateral's mostly quality sovereign bonds, not CMOs. On the political side, I think the DOJ's just going to arbitrarily adjust the size of the fine until DB's spread is under distress levels. No way that Obama wants to bookend his presidency with another financial crisis. Eighth-year Obama has seen some shit, and is definitely way less anti-establishment than when he started. There's an instructive analogy here with his veto of the Saudi sovereign immunity bill. The administration's much more interested in preserving stability, rather than uncompromising punitive justice. Deutsche Bank =Lehman Brothers? Posted by EspressoLover on 2016-10-02 22:18 @Kitno Thanks for the interesting points. I've bumped up my priors on a DB unwind. Maybe the trade here is to sell short-dated DB vol and offset with the long-dated. Implied's are 10-15% higher for October than January. It seems like the adults in the room are in agreement that DB isn't collapsing next month. Demand for near-term protection seems mostly driven by dumb-money and prudence-signaling. If we're on the path to the end, we'll see spreads and vol widening before jump-to-default. Vega should outperform gamma until then. Of course, if it doesn't collapse, you just harvest a nice carry. You Know Who - Renaissance Watch Posted by EspressoLover on 2016-11-22 00:58 "Eventually the scientists went so far as to develop an in-house programming language for their models rather than settle for a numbercentric option such as ASCII, which was popular at the time. " ...Hammertime You Know Who - Renaissance Watch Posted by EspressoLover on 2016-12-08 21:23 From their latest ADV, they've appeared to have shut down RIFF. There's also a new fund RIDGE, that appears to basically just be RIDA without the futures. 
https://www.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=380746 Quantopian – why I don’t take part Posted by EspressoLover on 2017-01-13 08:02 > should I spill my guts out and give away my code to Kaggle (who will probably award the win to someone who data mined their loss function), or ask for an angel investment where I get to keep real profits and build a business that means something? Never underestimate the extreme lengths that modern-day 25-year old geeks will go to avoid making a phone call or putting on a suit. Estimating price impact in the opening auction Posted by EspressoLover on 2017-02-08 20:40 Why not do A/B testing? Randomly pick some small percent of symbol-days where trading is shut off. Then compare the regressions between the on-group and off-group. That removes the problem of distinguishing between your alpha and your impact. Virtu Posted by EspressoLover on 2017-03-13 10:45 > The economics of these trades are obvious, but the real question is why one firm dominates them while others are doing more predictive modeling, longer holding period, slightly lower Sharpe strategies What makes you think Virtu dominates? You don't think Citadel, KCG, Jump, etc. have O(100mn) strategies with O(0) down days? I suppose Virtu's relatively unique in that they don't even seem to try to do the quant-y stuff. Maybe my understanding of the firm's off, but they only care about doing purely mechanical strategies. Rejecting the quant stuff seems more about business focus, rather than some sort of secret sauce. Chasing quants can get really expensive really quick, and that style tends to go through more hit-or-miss periods. Plus Virtu's reputed corporate culture isn't well suited for high-level alpha work. You can slave drive people 16 hours a day, and the market data parsers will still get written. But burning people out doesn't lend itself to deeper creative work. The best quant firms tend to be like RennTech, more university department than investment bank. Virtu Posted by EspressoLover on 2017-03-14 08:28 > Everyone is at a LOT of places, but Virtu has been the most aggressive. Which brings up my biggest quandary about Virtu. Why the hell don't they run an internalization pool? (At least as far as I can tell they don't, and if they do it's either not under their name or not big enough to be in league tables). Given that they have such high market share in equities, wouldn't it make sense to wholesale retail order flow? Internalization is already barely any different than market making on lit venues. The major barriers are institutional clout and infrastructure, which Virtu has in spades. The only explanation I can think of is that they can't match SIG, Two Sigma, ATD, et al's payment-for-order flow rates. Without alphas they can't price improve enough to win any contracts from major brokers. Still you'd think they'd be able to grab at least some of the retail wholesale market. It'd be worth it to make some push. Dark flow is just so much more profitable than lit flow. Monetizing order flow in futures Posted by EspressoLover on 2017-05-25 16:51 The topic is whether captive order flow has any value in a space like futures. I.e. a market you can't internalize and have to trade on a single centralized exchange. My knee jerk reaction: of course not! Payment for order flow only works, because you can preferentially cross incoming flow in your dark pool before routing to exchanges. The value comes from the priority your liquidity gets against uninformed flow. 
Even in a space like options, without dark trading but with multiple exchanges, you can kind of hack this by routing to the venue you're quoting at. But futures, this doesn't work. Everyone's in the same order book. Even if you're sourcing the order flow, your liquidity doesn't enjoy any special privilege. There's no leg up. The fact that there's no major payment for order flow operations (AFAIK) in futures space seems to confirm this world view. However, I don't know if this is exactly true. I'm not sure if it's legal or practical, but I do think you can monetize captive uninformed flow, even if you have to trade at a centralized venue. Let's say some instrument is quoting 10.50x10.53, and a retail market sell order comes in. What's to stop the broker from submitting a bid at 10.51 immediately before? It still captures $0.005 in spread against the mid. At first glance this seems like it should be prohibited as front-running. I don't know the letter of the law, however it's not against the spirit of the law. Front-running involves looking at a client order and trading ahead of it in the same direction. That's bad because it worsens the quality of the liquidity the client accesses (worse fill prices, smaller size, and larger market impact). This has the exact opposite impact. The client is getting better execution. In the example she sells at the higher price of 10.51 instead of 10.50. The logic isn't any different from internalization, except the cross occurs inside the venue instead of outside. Even if it's theoretically possible, there's the question of whether it actually would generate any real money. Exchange fees still apply. You have to improve by at least a tick (unlike the minuscule improvements done by equity internalizers). Also most major futures trade at tick-tight spreads, where that kind of price improvement isn't possible. Still, even if you can only join, there's some value in knowing the queue is about to get smaller due to uninformed flow. Probably not a huge amount given retail order sizes relative to queue sizes, so maybe it's really not worth it. But in instruments with non-FIFO matching, there may be bigger advantages. Particularly pro-rata or size priority. For example you can flash a very large quote for the tiny interval you know the client order will be arriving. By dominating the quote size, you'll match against almost the entire incoming order. If you queue up [Quote]->[Client Order]->[Cancel] on the NIC, another ATS won't have time to process and lift your quote. Anyone ever hear of something like this before? Monetizing order flow in futures Posted by EspressoLover on 2017-05-26 20:31 @dVega Yeah, I neglected to talk about informed flow. I agree there's value to segmenting informed and uninformed flow. Trade with the former and against the latter. The way I look at it though, in the long run captive flow is pretty much just uninformed flow. Traders who are sophisticated enough to to be informed, are also usually sophisticated enough to do TCA. Given enough time follow-on trading will show up as high market impact and crappy execution. Then they'll drop you. Is that true everywhere and always? Of course not. But from the marketing side, keeping informed flow captive is a very hard proposition. Somewhat anecdotally confirming this, the sizable majority of payment-for-order-flow business comes from retail traders, who are pretty much the archetype of dumb flow. @HitmanH Good point, about OTC trades. 
I'm not really familiar with the less liquid futures, or how OTC works with exchange products. But my general understanding is that there's a whole bunch of rules and restrictions, that the facilities mostly exist for large block trades, and that the ability to do OTC entirely exists at the pleasure of the exchange. I'd imagine if you started trying to divert major natural volume off-venue, the exchange would pretty much shut you down. Again, this is just a vague understanding. In FIFO matching, it seems like the sure shot is getting inside the spread. However even in tight spreads, you do still get the option of being the first to replenish the queue. E.g. if the bid is 150 contracts, and a retail order hits 10, you're going to know about this before anyone else gets it on the data feed. Essentially you have the first option to refill the queue by adding 10 back to the bid. Or maybe add 2, expecting others will refill ~8 behind you. Legally the advantage here is you don't even have to trade in front of the customer order. But I'm pretty skeptical that this would be valuable enough to cover the cost of paying the brokerage. Definitely agree with you regarding retail flow characteristics. @ronin I see where you're coming from, but disagree. (i) and (ii) are different due to the differing information content of the matching trade. Assume that the captive flow is characteristic of retail order flow: small, uninformed, uncorrelated orders. That means 1) the market impact of incoming orders is near-zero, 2) the direction of orders in the captive flow does not predict subsequent market movement, and 3) each incoming order is 50/50 long or short regardless of the prior orders. I can cross an incoming order at 10.51, and the bid moves back to 10.50. But all I have to do is wait for an offsetting buy order. At that point I can jump in front of the 10.53 ask by quoting at 10.52, closing out the position and netting $0.01. In expectation, I will only have to route a single incoming sell order to market before getting a buy that I can cross to close. 1 small sell order will have minimal price impact, and is very unlikely to move the market. Yes, the non-captive flow could move the market against my position. But uninformed flow is by definition unpredictive of the behavior of other market participants. So the market's as likely to move with me as against. It adds risk, but in expectation it doesn't lower profits. Price improving against general market flow isn't the same story. My counter-party could be anyone. It could be someone working a huge position, a system with a directional alpha, or a massive market order that wipes out the next level as well. All problems basically precluded in the other scenario. Either way, conditional on being filled there's now some expectation that the market moves against the open position. That means this scenario should have lower expected PnL than the captive flow scenario. Overall one should be able to quote tighter against pure uninformed flow, and still make money. Monetizing order flow in futures Posted by EspressoLover on 2017-05-30 18:50 @ronin Totally agree with your caveats. Very important points to consider. The fact that no one seems to pay for order flow (AFAIK) does seem to imply that these issues break the model. Pairs Trading - Selection Methodology? Posted by EspressoLover on 2017-06-09 00:06 +1 @nonius The problem doesn't stem from selecting the right pairs. The problem stems from pairs-trading being obsolete in DM equities at anything longer than an HFT horizon.
To the extent that it used to work, it did so because stat-arb traders were significantly less sophisticated. If Ford moves, low-latency algos will slam GM in about 1 millisecond. And anyone holding longer-term risk isn't trading GM against Ford, they're trading GM against the S&P 500, the industrials sector, the auto industry, the US dollar and a whole galaxy of multi-colored exotic style factors. Plus they're conditioning on news, earnings, analyst activity, order flow, and microstructure. It doesn't matter if you can pick the single best twin every time. A single co-pairing will always carry way less information. It's like you're trying to win the Tour de France by making the most nutritious trail mix, when all your competitors are on HGH and EPO. Trading strategy questions Posted by EspressoLover on 2017-06-14 21:53 #2 The way to think about this problem is with Kelly sizing. At least if the following conditions hold: 1) You only care about the long-run growth, and expect the process to continue for a long enough time to sufficiently approximate "long-run". Easier said than done. Psychologically people really have a tough time pushing through drawdowns, even if they know it's mathematically optimal. 2) Your utility function for wealth is logarithmic. 3) The process continuously compounds. Or you can rebalance frequently enough that it's effectively continuous. Daily should meet the criteria for pretty much all financial products. 4) You don't have any limits or costs to leverage, or at least none significant enough to meaningfully affect your strategy. In which case Kelly says the targeted volatility should be [Sharpe]. So if your expected annualized Sharpe is 1.0, your targeted annualized volatility should be 100%. Or if you're rebalancing daily, the targeted daily vol should be 6.25%. If the unlevered strategy produces 2% daily volatility, you'd want to lever up the positions by 213% to reach target vol. (And rebalance daily to keep the leverage constant). However like you say, forward expected performance is likely to be lower relative to your previous estimates. Overestimating Kelly sizing also has an asymmetrical impact on performance relative to underestimating. So you have to decide on some fraction to shrink your Sharpe by. There's not really an easy answer here. It depends on how confident you are in the statistics. However, shrinking the Sharpe ratio by 50% seems to be a decent rule of thumb. (Equivalent to applying a fraction of 0.5 to the computed Kelly size). For a 1.0 Sharpe strategy, you'd treat it like a 0.5 Sharpe. In which case Kelly would dictate targeting 50% annualized volatility or 3% daily vol. Then lever or allocate accordingly. #3 I'm a big fan of the Copernican Principle. There's some process generating abnormal returns, and it will live for a finite amount of time. You're probably not a "special observer", so most likely your discovery of the process represents a random sample of time from its life. Unless you have some other priors on the life expectancy of the process, Gott would dictate that the median expected remaining life would be the same as the current age of the process. So if your backtests indicate that this strategy has been profitable for the past 5 years, you'd expect it to last for another 5. With 50% confidence you'd expect it to survive somewhere between one-third and three times its current age, i.e. roughly 20 months to 15 years.
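Since both of these rules of thumb are just arithmetic, here's a minimal Python sketch of the half-Kelly sizing from #2 and the Gott-style interval from #3 (nothing beyond the standard formulas and the example numbers above; treat it as illustrative, not a sizing tool):

```python
from math import sqrt

def kelly_target_vol(expected_sharpe, fraction=0.5):
    # Full Kelly targets annualized vol equal to the Sharpe ratio;
    # shrinking the Sharpe by `fraction` gives fractional Kelly.
    return fraction * expected_sharpe

def kelly_leverage(expected_sharpe, unlevered_annual_vol, fraction=0.5):
    # Leverage on the unlevered strategy needed to hit the vol target.
    return kelly_target_vol(expected_sharpe, fraction) / unlevered_annual_vol

def gott_interval(age_years, confidence=0.5):
    # Copernican / Gott estimate: with probability `confidence`, remaining life
    # falls in [age*(1-c)/(1+c), age*(1+c)/(1-c)]. For c=0.5 that's [age/3, 3*age].
    c = confidence
    return age_years * (1 - c) / (1 + c), age_years * (1 + c) / (1 - c)

# Example numbers from the post: Sharpe 1.0, 2% daily vol, a 5-year-old edge
print(kelly_target_vol(1.0))                  # 0.5  -> ~50% annualized vol at half-Kelly
print(kelly_leverage(1.0, 0.02 * sqrt(252)))  # ~1.6 -> lever the unlevered book ~1.5-1.6x
print(gott_interval(5.0))                     # (1.67, 15.0) years, ~20 months to 15 years
```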
However when the process dies, you will only be informed through a noisy channel, and it will take some amount of time to decide with statistical confidence that the gravy train is over. If you're a Bayesian, then the longer the process has lived the longer a stream of bad returns it will take before declaring it dead. This represents a "cost" in that at some point in the future you will eat volatility without excess return while trading a dead process. This cost can be quantified using Kelly. For example if you expect 20% of the time you spend trading this strategy to be "dead", you need to shrink the expected Sharpe ratio appropriately. That costs you 36% of the long-run expected return due to the smaller Kelly size (Kelly growth scales with Sharpe squared, and 0.8^2 = 0.64). You have to decide the optimal p-value which trades off missed profit from type 1 errors with Kelly costs from type 2 errors. I'd say once you've decided on the procedure, pre-commit to it. Avoid the temptation to ignore your instruments and fly by the seat of your pants. Yes, human wisdom may give you insight into what's going on. But it's usually outweighed by emotional hysteria. Trading strategy questions Posted by EspressoLover on 2017-06-15 22:07 @nerotulip Yes, you are correct. Should have written that targeted return is [Sharpe]^2. Which, of course, implies that targeted vol is [Sharpe]. Thanks for pointing this out! @energetic With regard to marketability, have you considered just beta-neutralizing? It's at least worth trying in backtest. There are two approaches: you could either hedge the asset with some S&P 500 instrument on days when you're long, OR you could just permanently hold a short position equal to [Asset Beta]*[Mean Days Long]. Either way this should push the long-run beta to 0. However the relative performance between the two hedging schemes does tell you something interesting. The former should reduce the volatility more than the latter. All things equal that should lead to better Sharpe performance. However if the latter performs significantly better, then that tells you a component of your strategy's edge is coming from timing the general market. @energetic [w.r.t. Sharpe timeframe] Just wanted to clear up something you alluded to. Especially because my previous errata (corrected by NeroTulip) may have caused this misconception. Kelly leverage is actually invariant to horizon. Targeted return is [Sharpe]^2. Leverage should equal [Sharpe]^2 / [Expected returns]. Sharpe scales O(sqrt(horizon)) and returns scale O(horizon). A change in horizon cancels out in the numerator and denominator. (Obviously the same reasoning holds true for fractional Kelly as well.) As an example, let's say it's some strategy with unlevered annual returns of 20% and annualized Sharpe of 1.0. Kelly would say target 100% annualized returns, and lever at a 5.0 ratio. Now say you use a 2.5-year horizon instead. Unlevered returns at this horizon are 50%. Unlevered Sharpe at a 2.5-year horizon is sqrt(2.5). Kelly would say target returns of (sqrt(2.5))^2 = 250%. Kelly leverage would still be 5.0, same as the annualized solution.
You can then stitch together the months to get an out-sample backtest for the entire series. One more step is to fit the parameter set using the entire series, then backtest the entire series to get an in-sample backtest. The difference between the former and the latter estimates the overfitting bias introduced by parameter selection. Because you're comparing over the same time period, there's less noise than comparing in-sample/out-sample on two different time periods, where the regime may also differ. Finally let me just add the caveat that this only accounts for overfitting bias from parameter selection. It's important to be aware that there's some overfitting due to selection of the model itself. Presumably before you even started you were exploring and thinking about an array of strategies. We probably wouldn't be here talking about some strategy with Sharpe 0.1. Unfortunately gauging how much overfitting this process incurred is not so easy. @energetic (2/2) It may be worth re-trying conviction sizing once beta-neutralized. It may not have looked great before because of the uncompensated volatility from the SPY exposure. It may also be worth considering going short, in addition to flat and long. With beta-neutralization the cost of shorting should be significantly less. Portfolio Construction Posted by EspressoLover on 2017-09-20 21:03 I agree with ronin, if you're talking about dynamic sophisticated trading strategies, CoVar is likely to be highly regime-dependent. Not because of any fundamental linkage, but because investors tend to treat all dynamic trading strategies the same during risk-off periods. For example non-agency MBS and equity stat-arb had zero correlation until August 2007, at which point they suddenly had near perfect correlation. The same multistrat redemptions that hit one strategy spread contagion to the other. The Moskowitz paper on the Carry Factor is another example. Carry strategies across asset classes tend to have low correlation, except when the Global Carry Factor is in a steep drawdown, at which point coordinated unwinds across asset classes change the regime. Doesn't always mean correlations go to 1, sometimes they're even inverted. Short-term liquid strategies, like HFT, often do better during risk-off periods. The impact of distressed unwinds is small relative to the enhanced opportunities from market dislocations. Just taking a return matrix as a black-box doesn't really work. Even if you had perfect knowledge about future returns, that isn't enough. If you can quickly go to cash at the push of a button when shit hits the fan, that's worth a lot. Also depends on the tolerance of the invested capital. If things get really dislocated, will your investors tolerate the losses to stay around for the reversion? buy on rumor sell on news Posted by EspressoLover on 2017-12-22 08:30 I'm definitely not a bull, but to play devil's advocate: earnings. If analysts are anywhere close to the mark, 2018 should be a gangbuster year for corporate earnings. GDP growth is consistently clocking above 3%, consumer confidence is high, homebuilders are on a tear, and labor costs are still cheap. There's a really well-timed synchronized business cycle expansion across all the major global economies. US corporates now make something like 40% of their profits overseas, so they're benefiting from high EM growth rates. Domestically, companies keep acquiring more monopolistic pricing power and consolidating their markets, because apparently antitrust law has become a hypothetical concept.
I mean even given all of that, equities still scare the hell out of me. At current valuations everything has to keep firing on all cylinders. If growth slows down in even just one major economy, companies will start missing their quarterly targets. Inflation picks up: rates rise and P/Es must contract from their sky-high levels. Oil prices collapse and there's real risk of contagion from the energy sector. Chinese hard landing: duration loses its bid, rates rise and P/Es contract. Real estate contraction: consumer confidence collapses because rising home values are the only thing making up for stagnant wages. That's not even getting into all the macro headline risk below the surface. We're only four weeks away from a government shutdown. The North Korea situation hasn't been solved at all. Saudi Arabia and Iran are a hair away from turning the Persian Gulf into a scud missile volleyball court. ISIS terrorists are apparently morons, but if they keep making attacks every couple weeks, given enough tries they'll eventually pull off a 9/11-esque massacre. The US and Russia have basically restarted the Cold War. Anti-EU populists are still improving their polling in nearly every Eurozone country. And there's a very real possibility of a constitutional crisis in the US within the next 12 months. Plus, who knows how much of the inflow to US equities is hot-money performance chasing. Seems like if equities correct even a little, we'll see a big unwind. But still, if things don't go bad, 2018 could possibly be as good as 2017 for US equities. So, I say the market's basically short gamma. No news seems like the best news for bulls. buy on rumor sell on news Posted by EspressoLover on 2018-01-08 11:59 > For instance, simply beautiful soup x SPX x reuters x general inquirer gives rather good T+10 signals Very interesting... Wonder if this would work with idio returns on single names. T+10's a really nice sweet spot for this, long enough that T-costs aren't a major issue but short enough that Sharpes still look good with decent R-squared. Might be able to get really excellent risk-adjusted returns if you could do the same with 100+ orthogonal alphas. Looked at something similar years ago, but didn't really find much. (Different horizon, library and source though.) Then again this current market's in a regime all of its own. So maybe this might have some juice. Correction Posted by EspressoLover on 2018-02-06 12:27 The below is unsubstantiated hearsay from some friends in rates space, so take it with a grain of salt... Japanese pensions and insurers have been loading up on duration to reach for extra yield. Up until last year, this mostly worked, because rates were going to stay 0 forever. As the trade got more crowded they kept increasing exposure to maintain the same yield enhancement. Well, obviously this has been pretty painful so far in 2018. Finally the pain of losses in treasuries broke the camel's back on Friday. There's been a major long squeeze in treasuries. As positions get liquidated, the contagion has spilled over into equities. Regardless of the specific details, it seems pretty clear that this selloff originated in treasuries. XIV termination Posted by EspressoLover on 2018-02-07 14:38 I believe the "indicative price" just comes from a published feed put out by the ETF issuer. The basic point is to allow the ETF market makers to keep track of what price the issuer will create/redeem shares at the end of the day.
There's no obligation, legal or otherwise, to actually make sure that it accurately reflects the market value of the underlying. The only incentive is business: accurate indicative values = happy market makers = better liquidity = more trading = more NAV. My guess is that once Credit Suisse decided to de-activate the listing, they no longer gave a shit about publishing the indicative value. They probably just turned off the server. It's not like they were going to be creating/redeeming shares at the end of day anyway. They had enough fires to fight that day, so why even bother? If you want an accurate picture of the intraday NAV, you'll probably have to look at the underlying. XIV tracks the 24-hour move in the 30-day constant-maturity VX futures. You can pull this data from CBOE. The weightings should be about 25% on the February contract and 75% on the March. (It also sometimes uses swaps, but those should pretty closely track the futures). Now that being said, I'm not sure if there was any type of intraday margin call. If that was the case then XIV portfolio management would have had to deviate from their mandate. XIV termination Posted by EspressoLover on 2018-02-07 20:02 VXG8 Chart VXH8 Chart CBOE VX Margin Requirements Friday, Feb 02: VXG8 closes at $15.625. VXH8 closes at $14.965. XIV is holding approximately 25% of NAV short VXG8, and 75% of NAV short VXH8. XIV closes near $115. CBOE's margin requirements are $6200 for VXG8 and $4000 for VXH8. (Contract multiplier is 1000X.) Monday, Feb 05: VXG8 reaches an intraday high of $33.35. A short position entered into at Friday's close would have lost 113% of its equity. VXH8 reaches an intraday high of $29.25. Equivalently a short would have lost 95% of its equity. Therefore, assuming XIV held the same position from Friday's close to this intraday high point, its portfolio would have lost 99.5% of its equity. It also would have recovered had it continued to hold this position into Tuesday morning. That position would currently be only 37% off its Friday close. However, a margin call seems very likely, because peak losses were far outside CBOE margin limits. The threshold for an XIV margin call would be a 63% loss from Friday's closing value. Based on this and eyeballing the charts, XIV blew through its margin all the way to its theoretical 99% loss within 20 minutes, between 5 PM EST and 5:20 PM EST. (This bar incidentally contained a huge amount of VX trading - 2.4 billion in notional volume. It seems quite likely this was where the big beast was finally felled.) Now normally, XIV rebalances around 4 PM, as per its mandate. At that time its position would have been 42% off its Friday close. However to rebalance, XIV would have had to buy to cover 84% of its starting NAV. The combined market cap between XIV and the equivalent SVXY was over $3 billion prior to Monday. So that's something like $2.4 billion in VX futures, all to be done in a short window. To put in perspective how huge a trade that is, it's over 100% of the entire VX curve's ADV over the last 3 months. I think something like a bear raid happened. The more the VX futures rose, the more XIV would have to buy back to stay balanced. And the more they had to buy, the higher the price went. And so on, with the market ultimately knowing that once they passed 63% losses, regardless of what they did CBOE would close them out. And the market knows exactly the logic that this humongous portfolio is working under, because it's all published in the prospectus.
So they're front-running the hell out of them the entire time. For the traders at XIV, it's a catch-22. Trade too slow, and the bear raid has more time to push you to the margin-call point. Trade too fast, and you have huge market impact and suffer huge losses, only worsening the feedback loop. This used to be less of an issue, because UVXY/TVIX was also pretty big and traded in the opposite direction. However short-vol's performance has been so good that XIV got huge compared to UVXY/TVIX. Lesson: The market can stay irrational longer than you can stay solvent. Especially if you're a levered ETF operating under pre-set, publicly known rules. > Have you had a chance to have look at their prospectus? Just browsed through it. It would seem that the issuer has unilateral discretion in declaring a market disruption event, in which case they can set the indicative value to be whatever number they decide. > If their prospectus allows liquidation clause to be triggered by a vol spike during EH trading Doesn't matter what the prospectus allows. If CBOE margin calls you during EH, then you have to trade. And not in a way that's going to contain market impact. XIV termination Posted by EspressoLover on 2018-02-07 22:41 1. FCMs have to segregate client collateral. VXX and XIV are two separate funds, and hence two separate client accounts. Therefore CS can't pledge collateral from one fund to support the positions of the other. The only thing CS could have done would be to pledge its own proprietary funds as collateral. Edit: As NeroTulip pointed out, this may not necessarily be the case. 2. Idk. As far as I can tell, the prospectus says they can announce liquidation at any time after NAV has fallen 80%. So they got margin called Monday night, liquidated out of their position, were stuck holding the bag, and lost 90% of investor money. Then, probably after a lot of late-night/early-morning calls with lawyers, they made the official announcement on Tuesday. I don't think they were under any legal obligation to announce immediately after a margin call. 3. You're describing the roll rebalance. I'm talking about the leverage rebalance. Say you have an inverse ETF on asset A. You raise $1 million in NAV, with the mandate of targeting -1X of daily returns on A. On day 0 you go out and short $1 million, which leaves you with $2 million in cash/collateral and $1 million in equity. The next day A rises 50%. You still have $2 million in cash, but your short position is now marked at $1.5 million. You only have $500 thousand in equity. Effectively, you're now -3X levered ($1.5mn short / $500k account value). To get back to the -1X target, you have to reduce your short position to $500 thousand. Which means you go out and buy back $1 million of A. Edit: Errata correction - thanks Energetic. All daily inverse ETFs have to trade 2X% of their underlying every day, where X% is the daily move. So if the price rises 10% they have to buy back 20% of their opening NAV. If the price falls 50% they have to go out and short another 100%. (Incidentally, my original math was under by a factor of 2.) This isn't an issue for VXX, because it's not levered, it only has to roll. But for every other levered ETF it is. It's an even bigger issue with inverse ETFs because they have to trade into extreme market moves. XIV termination Posted by EspressoLover on 2018-02-09 03:11 @Energetic Yes, thanks for pointing that out. XIV termination Posted by EspressoLover on 2018-02-09 14:04 CBOE kind of shoots themselves in the foot by running such a shitty futures exchange.
I think a lot more participants wouldn't bother with the ETF if the underlying futures were better. Trading the ETF as a proxy is a much smoother experience than dealing with the actual VX futs. Why would I pay a $0.05 tick bid/ask on a $20 contract when I can pay $0.01 and likely no exchange fees on a $50+ ETF? That's an order of magnitude higher costs. Plus if you're doing anything co-located, not only do you have to pay for an entire presence at CFE just to trade VX (because literally nothing else important trades there), but you have to deal with CFE's garbage tech stack ("Not just 1 level of book data, but 4!"). Finally if you're doing any sort of VIX relative value, you can't cross-margin shit, because again nothing important trades at CFE. Move the VX to the CME, drop the tick size to $0.001, cross-margin with ES, and offer zero exchange and data fees for the first 12 months. At least half the liquidity and trading activity would move out of the ETFs to the futs. Index trading strategy Posted by EspressoLover on 2018-03-27 12:06 You should consider trying the same approach with VSTOXX futures. I'd assume the logic could pretty much be ported directly. Might get some diversification and higher Sharpe for free. > I don't have VIX futures data pre-2004. Have you considered just using ATM implied vol? I.e. replace 1 month VX futures with the vol of a 1-month ATM straddle, 2 month with 2 month, and so on. Obviously the numbers don't line up perfectly. But in general if the VX curve is backwardated, then the ATM vol curve is probably backwardated too. This frees you from the constraint of needing an active vol future. All you need is a relatively liquid option surface. Not only can you backtest back to the 80s, but you can start touching all kinds of assets. Single-stocks, Nikkei, oil, treasuries, gold, ags, even vol-of-vol. Index trading strategy Posted by EspressoLover on 2018-03-28 12:24 Parameter (over)fitting I think I've mentioned this to you before, but I'd suggest using hold-one-out cross-validation. Sub-divide into months, then fit your parameters using all the data *except* that month. Then use those parameters to generate signals for that month. Stitch together that signal series and you are now working with pure out-sample with regard to parameter selection. (The caveat being that the subjective model selection is still subject to a certain degree of selection bias.) Asynchronous Close Times Contra the others, I don't see this being a huge issue. I guess the biggest question is whether you're using 1-day lagged deltas on the VX curve or not. (You can still use 1-day deltas on S&P, because that's synchronized.) Like, are you looking at total contango in the curve, or the change in contango from yesterday? If the latter, then I'd be a lot more concerned about lookahead bias, and really try to get synchronized data. But if you're just looking at absolute levels, then the differences between 4:00 PM curves and 4:15 curves are pretty minuscule. If anything the lookahead bias would probably make the backtests underestimate performance in this case. As a toy model, I'm assuming the rough idea is to be more long the S&P 500 when the VIX curve is backwardated. The idea being that during periods of market distress, the time-varying equity risk premium is temporarily elevated. If VX becomes backwardated from 4:00->4:15, then the market probably sold off over that time. Hence if you're using 4:15 curves, your backtest is probably biasing long SPY on 4:00->4:15 selloffs and vice versa.
(Whereas if you're using 1-day deltas in the curve, then this effect goes the other way, but much stronger.) Benchmarking I tend to agree with other posters. I think this is much more of an absolute return product than an enhanced beta product. (Which overall is a good thing - absolute return managers get paid more). Just because you're trading the S&P 500 doesn't mean that it's the proper benchmark. Unless the correlation varies a lot year to year, long-run correlation is pretty indicative. Being +/- 100% at certain times doesn't really matter, as long as it washes out over the accounting cycle. To take this reasoning to its limit, nobody cares about the beta exposure of an HFT strategy on ES. Risk and Tails Despite what I said about the S&P500 not being the appropriate benchmark, the strategy would have a pretty similar risk profile to the index. From a risk perspective you're basically taking daily SPY returns and randomly flipping the signs. Now hopefully for expected returns, you're doing that in the right way. But from the perspective of higher-order moments, it'd be pretty unlikely that this would produce any serious divergence from SPY's distribution. You're not doing anything crazy like doubling down into losses, or holding positions until they hit some preset gain. Any kind of autocorrelation in SPY daily returns is going to be pretty minimal. Skew's almost always negative in SPY, so the fact that you're short a significant percent of the time should mean your strat has better skew than SPY. It'd be really hard to believe that this has any sort of dangerous hidden tail risk (or at least not any that the S&P500 doesn't already suffer from). IMO the biggest risk is if the signal is 0, either because it reverted or was never truly there to begin with. In which case you're eating SPY volatility without any tailwind. That scenario will make your long-run drawdowns significantly higher than SPY's, which benefits from its long-run positive return. For example going back to 1995, the biggest drawdown in compounded SPY returns was 50%. But if you re-normalize daily returns to zero EV, that increases to 75%. Index trading strategy Posted by EspressoLover on 2018-03-31 03:54 @energetic > I didn't understand the very last paragraph about the risk of staying in cash. Especially the last sentence. Sorry. I think my wording was ambiguous. I said "IMO the biggest risk is if the signal is 0". What I meant was "the biggest risk is if the signal has 0 predictive value". (Not "if the signal has 0 magnitude and stays in cash".) My point was the biggest thing that would keep me up at night about this investment is whether the backtests were overfit, whether the regime permanently changed, or if the production implementation diverges from the simulation in some subtle way. There's two ways to think about investment risk. One is in terms of higher-order moments of the probability distribution of returns. I.e. volatility, skew, tails, etc. The other is in terms of how much money I'm putting at risk. Things like max drawdown, shortfall risk, VaR. With the former metrics, expected returns don't matter. Whether a strategy has alpha of 0% or 50% a year, ipso facto that doesn't affect volatility or kurtosis. However it may significantly change its max drawdown. Having high returns buffers against the latter type of risk, because it acts as a tailwind against cumulative losses. Like I mentioned before, I wouldn't worry too much about the former type of risk metrics for this strategy.
Index trading strategy Posted by EspressoLover on 2018-03-31 03:54 @energetic > I didn't understand the very last paragraph about the risk of staying in cash. Especially the last sentence.
Sorry. I think my wording was ambiguous. I said "IMO the biggest risk is if the signal is 0". What I meant was "the biggest risk is if the signal has 0 predictive value". (Not "if the signal has 0 magnitude and stays in cash".) My point was the biggest thing that would keep me up at night about this investment is whether the backtests were overfit, whether the regime permanently changed, or if the production implementation diverges from the simulation in some subtle way. There's two ways to think about investment risk. One is in terms of higher order moments of the probability distribution of returns. I.e. volatility, skew, tails, etc. The other is in terms of how much money am I putting at risk. Things like max drawdown, shortfall risk, VaR. With the former metrics, expected returns don't matter. Whether a strategy has alpha of 0% or 50% a year, ipso facto that doesn't affect volatility or kurtosis. However it may significantly change its max drawdown. Having high returns buffers against the latter type of risk, because it acts as a tailwind against cumulative losses. Like I mentioned before I wouldn't worry too much about the former type of risk metrics for this strategy. (At least any more than I would worry about those things for my S&P 500 investment.) But I would try to get a very strong handle on the model's generalization error, have tight pre-defined criteria for rejecting the null hypothesis, carefully reconcile live trading against simulations, and understand where exactly my live results fall in the predicted distribution. The strategy in backtest appears to have very good risk metrics: no down years, half the drawdown of SPY, drawdown only equal to one year's average return. But that's largely an artifact of very high returns relative to the underlying. If, in live trading, the returns fall short of backtest expectations, the above risk metrics will deteriorate significantly. In short, my opinion is the biggest risk is not delivering the forecasted returns.
> But local correlation could be very high b/c the strategy sometimes stays long for weeks. Could it be a concern for someone who needs to report monthly (or even quarterly)?
Depends on the investor. I'd maybe take a look at the distribution of month-to-month, quarter-to-quarter, and year-to-year betas. Just to quantify the magnitude of the effect. Different investors have different mandates regarding market neutrality, and how that's defined. Staying long (or short) for too long may get you re-classified from Equity Market Neutral to Systematic Macro. (Which is a worse space to raise money in if you're not already a big name.) I'd focus on making the system as strong as possible, shop it around at places like Millennium, then see what kind of feedback you get. If push comes to shove, and you're getting too much grief for being long-beta, you can always make tweaks to the program to keep it neutral over more granular time periods. For example you could bias in the opposite direction of the mean beta from the past 90 days. However, my intuition would be that this type of adjustment doesn't affect performance significantly. It seems like the essence of the system is to exploit the time-varying component of the equity risk premium. If the VX curve exhibits some particular shape over a very long period, I'd suspect that's more reflective of persistent vol-specific dynamics. Whereas a temporarily dislocated VX curve is more indicative of market-wide stress. It's nothing more than a wild-ass guess, but I'd bet that the performance heavily concentrates around the points when the signal flips, not when the signal sits stagnant in one direction.
Index trading strategy Posted by EspressoLover on 2018-04-12 18:08 I'm gonna interject here, and say that this discussion (IMO) is way overemphasizing relative performance to the S&P. The strategy only has a beta of 0.3, it's much much closer to an absolute return product than a smart beta product. Of course, in some scenarios it's going to have significant underperformance relative to the S&P. Simply because it's barely correlated with the index. Just because it trades the S&P doesn't mean that it's the right benchmark, any more than one should focus on relative performance to the Nikkei, oil, bitcoin, or Seattle real estate.
Scalability
> The main issue with your strategy is that it's a daily strategy with the sharpe of a weekly/monthly strategy. This implies that your return per trade is very low, and that the strategy will not scale well.
This is a fair criticism assuming this was trading single-name equities. However, the strategy would be trading the most liquid instrument in the world. The estimated market impact cost of $100 million in ES is 1.25 bps.
Let's make some rough assumptions: the signal flips every week on average, future returns roughly match the backtest, and market impact scales with the square root of size. That implies the strategy turns over 100x AUM per annum. At 20% CAGR, that's 20 bps of return per unit of notional traded. The strategy could pay 5 bps per trade before returns degraded to less than 75% of zero-impact performance. That gives the fund the ability to do $1.6 billion per clip. Since each clip is 2X AUM (because it flips), that gives a capacity estimate of $800 million AUM.
Theoretical justification
> I have never found a leading indicator in options (and I have looked), and I could never find a non-hand-waving explanation for why there would be a leading indicator in options.
I think you're framing this in terms of alpha or information. In reality, I think the results are driven by time-variation in the equity risk premium (ERP). Pretty much everyone sane accepts that the ERP is real. Even totally uninformed investors can earn excess returns relative to the risk-free rate, by taking on equity beta risk. Almost as broadly accepted is that the expected ERP varies over time. It would be silly to expect that the ERP stays static forever. Equilibrium shifts, and whatever process makes the ERP non-zero is almost certainly going to vary over time. Academic finance has done a pretty good job of showing that the sizable majority of day-to-day market variance is driven by changes to the discount rate, not changes to expected cash flows. The ERP isn't directly observable, but there's no reason to believe that it's adversarially hidden, like alpha. Alpha's extremely rivalrous, and is quickly consumed by the first traders to access it. But the sizable majority of investors are highly inelastic to index prices. If they weren't, market volatility would be significantly lower, since most market movements aren't driven by revisions to expected cash flows. Consider that market-wide CAPE is a pretty good predictor of 10-year forward returns. This isn't a hard predictor to generate. There's certainly no way you could call it "alpha". Yet the market is highly inelastic to this measure, CAPE does vary substantially over time, and there's no sign of this relationship being arb'd away any time soon. It seems pretty likely to me that the VIX curve has a pretty deep relationship with the ERP. The curve could be backwardated because of genuine vol expectations, i.e. maybe 30-day realized volatility will be higher than 120-day volatility. In which case it shouldn't have any predictive value for equity returns. But nearly every well-studied term structure invalidates the expectations hypothesis. This includes VX, where future realized vols have only a very weak relationship to term structure prices. More likely the VX curve is proxying for market-wide risk aversion. This isn't an endorsement, and I don't know if this is actually the case. But I am saying that there does seem to be a viable justification for why this thing might work. The one thing that does "smell" to me is that the strat makes money on its short positions (rather than just conserving capital and reducing long-run beta). I do believe that there are periods where the ERP significantly compresses to near-zero, but it's a much more extraordinary claim to believe that the ERP regularly goes negative.
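To spell out the capacity arithmetic from the Scalability section above, a back-of-the-envelope sketch (all inputs are the rough assumptions stated in the post, not measured values):

```python
# Assumptions from the post (rough, illustrative only)
base_clip = 100e6        # $100M reference clip in ES
base_impact_bps = 1.25   # estimated impact cost for that clip
annual_turnover = 100    # strategy turns over ~100x AUM per year
cagr_bps = 2000          # 20% CAGR = 2000 bps per unit of AUM

# Return earned per unit of notional traded
ret_per_trade_bps = cagr_bps / annual_turnover      # 20 bps

# Allow impact to eat 25% of returns -> 5 bps per trade budget
impact_budget_bps = 0.25 * ret_per_trade_bps        # 5 bps

# Square-root impact: cost(bps) = base_impact_bps * sqrt(clip / base_clip)
max_clip = base_clip * (impact_budget_bps / base_impact_bps) ** 2
print(f"max clip: ${max_clip / 1e9:.1f}B")          # ~$1.6B

# Each flip trades roughly 2x AUM (exit the old position, enter the new one)
capacity_aum = max_clip / 2
print(f"capacity: ${capacity_aum / 1e6:.0f}M AUM")  # ~$800M
```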
Index trading strategy Posted by EspressoLover on 2018-04-12 19:08 @tradeking That's 100% a fair criticism of my estimate. But I believe @energetic mentioned that the signal flips about once per week. That implies a half-life of 2-3 days. Which seems about in line with how fast the VIX term structure changes. It would seem that an actual fund would have much longer than an hour to scale into positions. The convention of using the data just prior to the close was to shoehorn it into a straightforward interday backtest. My gut feeling is that if you tested this thing intraday, trading at the time-of-day the signal actually changes, and using reasonable VWAP assumptions, the performance would look very similar to the interday backtest with quite a bit of capacity. But again, just a gut feeling. At the end of the day it's an empirical question.
Index trading strategy Posted by EspressoLover on 2018-04-25 21:22 > Do you have a reference for that? I would expect it to be the complete opposite, that the majority of day-to-day market variance is driven by uncertainty in expected future cash flow.
This is basically what Shiller won the Nobel Prize for. Lecture: Using this decomposition and a vector-autoregressive model in difference form, with post World War II stock market returns, Campbell and Ammer found that excess returns innovations have a standard deviation that is two or three times greater than the standard deviation of innovations in future dividend growth. Aggregate stock market fluctuations have therefore been dominated by fluctuations in predicted future returns, not by news about future dividends paid to investors. (Figure 2 in those notes also really drives the point home)
> arbing the difference between the variance of daily price movements and variance of realized cash flow... any movements on average resulted in a deviation from the expected value.
Well, you can exploit it but it's certainly not arbitrage. The duration of equity cash flows is somewhere between 10 and 30 years depending on the year, business cycle, methodology, etc. So you are looking at buying and holding around a decade or longer. However you can do something kind of like this. An investor can use the CAPE valuation of the market index at the current time to scale her exposure. She can target a consistent Kelly fraction, in which case she'd lever up her equity exposure when CAPE is cheap and deleverage when CAPE gets expensive. (And possibly, if CAPE gets extremely high like in 1999, short equities.) This strategy would have increased historical returns relative to plain-old-indexing by 30-50%. That being said the percentage of real money sophisticated enough to do this is a drop in the bucket against all the equity exposure in the world. If any phenomenon demonstrates the limits to arbitrage and EMH this would certainly be it.
#Reverse-engineering
I'm going to go out on a limb here, and say that the risk that some malicious party comes in and steals your strategy, then scales it up so much that it soaks up all the alpha, is pretty low. You should be much more concerned about the risk that this thing never gets off the ground, because no one really takes an interest. I'd say that scenario is about 100 times more likely than the former. First off, if the returns are anything like historical, 90% of investors are going to be more than happy to pay you 2/20 and deal with the day-to-day headaches. Second, this market is so deep and liquid that it would take an extraordinary amount of money to dissipate the alpha.
Maybe someone might steal your system and throw a few million here and there, but unless they're quickly willing to scale up to billions on some system they reverse-engineered off a website, it ain't moving prices. Furthermore, it's not like this is some super-duper secret sauce. People have been publishing on the relation between the VX curve and the ERP for at least several years now. I do believe that you have specific techniques and features, such that your particular implementation is superior to the published research. But don't think that AQR doesn't have a dozen quants right now working on something pretty damn similar. Publish the damn code on Github if you need to. If $100 million AUM steals your strategy, and you get $10 million in legit investors, you're still way ahead.
Q on estimating market impact as function of strategy order sizing Posted by EspressoLover on 2018-06-13 16:37 Never traded STIR or pro-rata, and this is pretty much just a random musing, but here's my 2 bps... Gonna assume you have some sort of decent order book simulator that accounts for previously consumed liquidity. When you simulate with increasingly larger portfolio sizes, at what point do you observe the "elbow", where PnL starts leveling off? That's probably not too far off from your actual scalability limits. It doesn't account for how your displayed size will affect the other participants. But pro-rata makes the game theory fairly straightforward. Assuming the other liquidity providers are roughly as informed and rational as you, then the point where you get decreasing marginal returns from adding more liquidity to the level is also about the same point for them. In a FIFO queue this logic doesn't really hold, because every order's in a different position in the queue. But with pro-rata your behavior is mostly fungible with the other participants' behavior. (Unless there are huge discrepancies between participants' models, infrastructure or costs.)
Optimal market making in multiple instruments Posted by EspressoLover on 2018-06-21 09:42 Keep in mind that there are three separate reasons to avoid risk, each with their own motivations and logic. 1) Because you're risk-averse, i.e. you want less variance in PnL. 2) Because of adverse selection, i.e. when you take on risk the market tends to move in the opposite direction. So more risk equals less PnL. 3) Because it consumes working capital. At some point, if you take on enough risk, you will not be allowed to take on more with the money on hand. A lot of market makers run 10+ Sharpe returns. In which case, 1) is effectively irrelevant. It's also most related to what you're asking about. But many market makers simply do not care about risk management, in the sense of lowering day-to-day volatility in returns. Their concern is much more focused on scalability and capacity, so they want to do every ex-ante profitable trade possible, regardless of its marginal contribution to risk. From the perspective of 3), things are relatively simple. You have total portfolio size/beta/notional limits. When you quote, it consumes working capital and pushes you further towards those limits. At certain times you may want to quote more than the room you have left in your portfolio limits. In which case, whether it's single- or multi-instrument, the logic's simple. Prioritize the quotes you expect to have the highest profitability. For 2), in the context of your question, adverse selection can happen in two flavors. The first is instrument-specific adversity.
In this case the market-making sizing can be considered in isolation, instrument by instrument. The second is adversity in common factors between instruments. Like maybe you're worried that all your bids might get hit, just as the entire market is tanking. But this isn't that much more complex than single-instrument sizing. You identify the common factor. Then, just like single-instrument sizing, you derive the amount you're willing to quote in the face of that adversity. That gives you a total sizing to budget across the universe. You limit your individual instrument quote sizes accordingly. Like 3), you prioritize the most ex-ante profitable opportunities first. How much this common-factor adverse selection hurts you depends a lot on your alphas and latency. A big driver for equities market makers is price discovery on index futures. If ES ticks down, a whole bunch of bids on single-names will get swiped all at the same time. But if Alice has a microwave link to Chicago, she can likely cancel her quotes before that happens. In which case, Alice would have much less to worry about from market-wide adverse selection than Bob.
Optimal market making in multiple instruments Posted by EspressoLover on 2018-06-22 19:39 @gaj A firm like Virtu has positive trading PnL 99% of its trading days. Virtu's biggest risk isn't daily trading volatility, it's having insufficient trading revenue to cover its fixed expenses. (Well, and also blowup risk, but more on that below.) If Virtu was offered the ability to double their trading revenue, keep their fixed expenses constant, but lose money 20% of days, they'd take it in a heartbeat. The reason a market maker like this tries to stay flat isn't to minimize PnL volatility, it's because of adverse selection. When you're a market maker you can basically only interact with the natural incoming flow. If you're at capacity, then doubling a portfolio's position sizing also means doubling the holding times. And because you're taking the other side of the direction that people want to trade, the market tends to drift against your positions. The more inventory you hold, the worse this hurts. At some point a marginal increase in risk costs you more in drift than you make in spread+alpha. Hence being proactive about keeping flat or near flat. A secondary consideration is minimizing the possible damage of a blowup. You can have all the circuit breakers and risk checks you want, but the best risk management has always been a thinly capitalized LLC. If you keep positions small, then that also means you can get away with keeping the account small. If shit hits the fan, you can't lose more money than you put up. (Absent your counterparties deciding to pursue expensive and onerous litigation.)
Pca-based portfolio properties and trading implications Posted by EspressoLover on 2018-06-26 22:56 What you want is a factor model. You want to isolate out the variance into orthogonal sub-components. No matter how you approach this, you're going to divide the variance into common factors and single-stock idiosyncratic variance. Then use those orthogonally deconstructed returns for signal and risk models. The PCA approach works and produces something usable. Like @NeroTulip said, the most difficult part of the problem is deciding how many eigenvalues N to retain. You can either do this through random matrix theory or cross-validation. However PCA falls well short of the gold standard. From a Bayesian perspective you're making a few major assumptions by using PCA.
One is that you know nothing about the stocks themselves or how they correlate. This throws out a lot of information. Obviously we have priors about certain stocks being more likely to cluster with other stocks. The second is that you're assuming non-sparsity. The common factors are not penalized for non-zero weights. This makes you very likely to throw out important factors like industry grouping. Another is that volatility and correlation are constant across the fitted period, i.e. no heteroskedasticity. Rather than reinventing the wheel, I suggest you just use an off-the-shelf commercial factor model that's ready to go. These models already incorporate market beta, country and regional factors, style factors (like momentum or value), and industry factors among other things. The work is based off decades of research in academic finance, and incorporates much more information than you'll get from a black-box approach like PCA. Barra is one of the major providers, and it's worth exploring their options if you have the funds. For example here's their brochure for their US equity factor model.
Pca-based portfolio properties and trading implications Posted by EspressoLover on 2018-06-28 00:10 @Zoho Ah, I see. If the intention is speculative research, let me suggest a more interesting approach. Maybe first start with a Barra-like factor model. Then strip out the factors to get idiosyncratic returns, and try the PCA approach on that dataset. It would be enlightening to see what, if anything, isn't already being captured by Barra. Plus, you're not wasting statistical power explaining effects that you already know exist, e.g. market beta, HML, SMB, dollar exposure, etc. I'd also suggest trying some of the various flavors of sparse PCA. The problem with regular PCA is that you're biased towards market-spanning factors. Because of the inherent noise in 1000+ columns, you rarely can use any more than a handful of recovered factors. But with sparse PCA, by limiting to smaller subsets, you're more likely to recover a larger number of stable factors. Oil exposure is a good example. 90% of US stocks have de minimis exposure to oil prices, outside the general impact on market beta. But oil driller stocks have a strong positive correlation, while airlines have a strong negative correlation. It's unlikely that this would be anywhere near the top of your eigenvectors, because it's getting penalized for not explaining any variance in the majority of names. So most likely it would get overwhelmed by the noisy eigenvalues and cut off by your threshold.
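On the earlier question of how many eigenvalues to retain, here's a minimal random-matrix-theory sketch using the Marchenko-Pastur noise edge. It assumes i.i.d. noise, which real returns violate, so treat it as a rough starting point rather than a substitute for cross-validation:

```python
import numpy as np

def mp_factor_count(returns):
    """Count eigenvalues above the Marchenko-Pastur noise edge.

    returns: (T, N) matrix of daily stock returns (T days, N names).
    """
    T, N = returns.shape
    # Standardize so a pure-noise spectrum follows Marchenko-Pastur
    X = (returns - returns.mean(0)) / returns.std(0)
    corr = X.T @ X / T
    eigvals = np.linalg.eigvalsh(corr)
    lambda_max = (1 + np.sqrt(N / T)) ** 2   # MP upper edge for noise
    return int((eigvals > lambda_max).sum())

# Example with synthetic data: 1000 days, 300 names of pure noise
rng = np.random.default_rng(0)
fake_returns = rng.normal(size=(1000, 300))
print(mp_factor_count(fake_returns))   # typically ~0 factors for pure noise
```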
Intraday mean reversion vs momentum Posted by EspressoLover on 2018-07-02 10:44 > mean reversion. Examples: market making, pairs trading / stat arb, index arb.
At the risk of getting into a debate over semantics... I don't think these things are all necessarily inherently mean-reversion. Let's say I'm market making but only joining the side with the largest orders. Or let's say I'm trading based on a pairs signal. But I only enter the side that hasn't moved, because I think the active side is leading and the stagnant side will catch up. My general intuition is that mean-reversion strategies tend to work because they're providing liquidity in some sense. That is, they're smoothing the natural noise-driven imbalance in the order flow. Momentum strategies tend to work because they're enhancing the process of price discovery. Often because some order flow is having insufficient market impact relative to its information content.
Execution on Thick-Book/Wide-Tick Equities/ETFs Posted by EspressoLover on 2018-07-09 23:08 Thanks, @gaj. Those are all great suggestions. Agree with all the points. Never really found a silver bullet. There seems to be a long tail of optimizations you can make, where you keep doubling time and effort for increasingly marginal gains. But there's not really a clear point where you can say "okay, looks like all the low-hanging fruit is exhausted". As you said, it's also hard to disentangle execution strategy from the underlying signal. So, the "right" behavior is pretty dependent on the specific strategy and resources available. Beyond what you mention, I've also found it worthwhile to keep trying different execution algos from different brokers. There does seem to be a pretty wide discrepancy in quality. Another thing that helped, especially if you're not large size, is utilizing taker-maker exchanges. Either to get the rebate if you're aggressing. Or to get fast fills if you're passive, avoiding the need to manage long-lived orders. Finally, I've found that it helps if the parent strategy gives the execution system some flexibility. It's conceptually easier to keep execution modularized, and to mandate it to complete orders in X amount of time. However those times that it has to fill at bad prices to meet its last-minute obligation can contribute a seriously disproportionate amount of costs. Again, this doesn't seem to be an easy problem. So I'd love to hear anything else you find.
Execution on Thick-Book/Wide-Tick Equities/ETFs Posted by EspressoLover on 2018-08-14 21:55 @anonq Thanks! Will definitely look into that.
Automated Options Market Making Posted by EspressoLover on 2018-08-15 21:07 While this thread is bumped, I'm going to quasi-hijack for a semi-related tangent of discussion... What's the feasibility of running an electronic option market maker the way HFT is done in the equity space? The impression I get is that having souped-up vol pricing models is a lot of the secret sauce at most OMMs. But what happens if you throw out the vol models completely, and just use order book dynamics to set quotes? You keep inventory small and turnover high, focus on collecting small but consistent profits per contract, run at low latency to get ahead of the market, avoid overnight risk, and stay confined to liquid strikes on liquid names. You're basically piggy-backing off the information content of other OMMs' vol models by swimming with the order book. Join queues when they start to grow, and exit levels when they crumble. Maybe a less extreme version would be using vanilla Black-Scholes to back out deltas, and adjusting quotes when the underlying moves. But still, nothing required other than simple off-the-shelf vol models. You can easily drop that on top of an existing HFT tech stack. I have very little experience in the options space. So I'm thinking the answer is either A) yeah, tons of people already do that, or B) that clearly doesn't work for [obvious reasons]. But still from talking to OMM people I get the sense that the process is qualitatively different than HFT in equities/futures/FX. I think not every OMM could do this, because there's a lot of money in the less liquid strikes, and in taking on outsized, hard-to-manage risks from mis-priced vol surfaces. But I would expect a fairly large segment of the OMM space would run this way, in which case that subset would be dominated by the existing HFT shops. Yet, you generally don't hear about those firms having any significant toehold in options. Maybe the answer is C) sure, you can do this, but the profit potential compared to ordinary OMM is de minimis...
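Purely as a toy illustration of the "vanilla Black-Scholes to back out deltas" variant mentioned above (static vol guess, made-up numbers, not how any actual OMM prices):

```python
from math import erf, log, sqrt

def bs_delta(S, K, T, r, sigma, call=True):
    """Plain Black-Scholes delta, no dividends."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    n_d1 = 0.5 * (1.0 + erf(d1 / sqrt(2.0)))
    return n_d1 if call else n_d1 - 1.0

# Toy re-quoting rule: keep the option quote centered on the last fitted
# mid, shifted by delta times the underlying's move since that fit.
last_fit_mid, last_fit_underlying = 3.20, 100.0   # made-up numbers
underlying_now = 100.40
delta = bs_delta(S=underlying_now, K=100.0, T=30 / 365, r=0.02, sigma=0.25)

new_mid = last_fit_mid + delta * (underlying_now - last_fit_underlying)
half_spread = 0.05                                # placeholder edge
print(f"quote {new_mid - half_spread:.2f} x {new_mid + half_spread:.2f}")
```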
VIX options expirations Posted by EspressoLover on 2018-08-28 19:15 Expiration Date: The Expiration Date (usually a Wednesday) will be identified explicitly in the expiration date of the product. If that Wednesday or the Friday that is 30 days following that Wednesday is an Exchange holiday, the Expiration Date will be on the business day immediately preceding that Wednesday. http://www.cboe.com/products/vix-index-volatility/vix-options-and-futures/vix-options/vix-options-specs
Speed bumps on exchanges Posted by EspressoLover on 2018-09-01 22:22 > "Cboe wouldn't seek protected-quote status for orders displayed on EDGA, the people said. Such a stance could help Cboe's plan win SEC approval."
Mea Culpa: Did not see that at the end of the article. It seems like CBOE is behaving honestly here.
> Could you elaborate on the equivalence of the delay and last look for a liquidity provider?
Yeah, echoing bullero, my understanding is that the delay only applies to liquidity-taking orders. Basically, MMs get to see 4ms into the future. It's like a free option, which is basically the same thing that last-look provides. E.g. if the index futures tick down, liquidity providers at other exchanges will try to pull their bids right away. But with a taker-only delay, the providers can keep their bids alive for 3.9ms. If the futures tick back, then keep the order alive and retain priority. If they don't, then cancel. You'll still get out faster than the market sells get through the speed bump.
Speed bumps on exchanges Posted by EspressoLover on 2018-09-01 01:56 This is pretty much just a naked attempt to garner market share by gaming Reg NMS. A delay for liquidity takers but not providers is pretty much just last-look by another name. The last-look functionality gives the liquidity providers an implicit option. Therefore a quote with last-look is not equivalent to a quote at a normal exchange. Market makers will tend to keep quotes alive at EDGA, after the NBBO crumbles at other exchanges, because they get 4 ms to wait out price discovery. Since EDGA is <2% of ADV, the relative market share gain could be pretty substantial. Hopefully the SEC is smart enough to see through this ruse. I'm all for venues experimenting with new market structure. But only traditional limit orders, without speed bumps or other provider privileges, should be regulatorily eligible for the NBBO. Edit: Apparently EDGA is not trying to make its speed-bumped quotes NMS protected. So, this does seem like an honest attempt to experiment, rather than a way to game the NBBO.
Automated Options Market Making Posted by EspressoLover on 2018-12-17 23:32 Not an options guy, but my understanding is that most option exchanges support Mass Cancel functionality. That skews the latency game in favor of the market maker. The aggressor has to get dozens of messages into the order gateway before the OMM lands one. Plus exchanges tend to have credit/risk checks inside the order pathway, which among other things cancel all open orders when a limit's breached. These systems are usually pretty tightly integrated inside the core exchange systems. Even if the cancel dispatch is asynchronous with the matching engine, I doubt the order gateway will process that many incoming messages before the risk controls do their job. It's kind of unfair to liquidity takers, and somewhat analogous to last look.
But from the exchange's perspective they'd much rather bias in favor of OMMs not blowing up beyond their margin.
How do I set up an HFT shop? Posted by EspressoLover on 2018-12-25 19:00 I have a pre-existing software stack that checks all your boxes. It's set up for market data and order entry for a number of DMA protocols, and shouldn't be hard to port to any major exchange. I'm just a prop trader, not a tech vendor, so it won't be a polished product. But on the flip-side there's a lot more flexibility to work out a licensing arrangement. Depending on what you're looking at, I might even be able to sub-lease alphas or other strategy components. Anyway, drop me a line if you're interested.
How do I set up an HFT shop? Posted by EspressoLover on 2019-01-06 16:25 @strange Hmmm, must have given the wrong impression in a previous post. I have pretty limited direct experience with options, definitely nothing in a DMA/HFT context. Apologize for the confusion.
Volume participation for HFT liquidity providing Posted by EspressoLover on 2019-01-08 18:42 Looking for general advice about capturing more volume in a liquidity-providing HFT strategy. One thing I'm working on is a system doing this at a relatively niche venue. The operation is doing well in terms of performance - high Sharpe, consistent daily profitability, high turnover, etc. But because the volumes are small, achieving high market share is important. Right now we're trading about 0.5% of the ADV, and I'd like to be hitting 10% or more. Currently using an ensemble of pretty good unconditional directional alphas. But the toxicity/adversity/impact model is basic. The only feature is queue position. Obviously that biases to small clips on new levels. Which perform great, but are a limited subset of the volume. The market's also pretty thick-book, wide-tick, so a high proportion of executions occur when price levels stabilize. If the adversity model relies on queue alone, you mostly miss out on these fills. You never get to the front of the queue, because working your way up from the back of an already long queue gets overly penalized. Even if you join when alpha's in your favor, you have to rest for a while to get to the front. Over that long lifetime, alpha's pretty much guaranteed to revert back, and the model cancels. My kneejerk is that what's required is a more feature-rich adversity model. Particularly some metrics indicative of a stable market, like rolling volatility, recent trade intensity, age of the orders in the level, etc. But with some (admittedly shallow) investigation, it seems like queue position has so far been overwhelmingly predictive compared to other indicators. Maybe I'm barking up the wrong tree, and effort would be better spent on some other avenue. Like just improving the alphas, or using longer-biased alphas when evaluating long queues. Or just experimenting with different quoting mechanics. Obviously every market and strategy is different. Not expecting to hear any easy answers. But if any other NPers who've faced similar dilemmas can point me in the general direction of something that worked, offer advice, or even just randomly speculate, I'd be very appreciative.
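For what it's worth, a hypothetical sketch of the kind of feature-rich adversity model described above — a logistic regression on fill toxicity, where every feature and label below is made-up placeholder data rather than anything from a real fill log:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: one row per resting-order fill. In practice these come
# from your own fill logs. Label = 1 if the fill was toxic (price traded
# through your level shortly after the fill), 0 otherwise.
n = 10_000
features = np.column_stack([
    rng.uniform(0, 1, n),       # queue position at fill time (0 = front)
    rng.exponential(1.0, n),    # rolling short-term volatility
    rng.poisson(5, n),          # recent trade intensity
    rng.exponential(30.0, n),   # age of the orders in the level (seconds)
])
toxic = (rng.uniform(0, 1, n) < 0.3).astype(int)   # fake labels

# Adversity model: probability a fill is toxic, conditional on the features.
model = LogisticRegression(max_iter=1000).fit(features, toxic)
print(model.coef_.round(3))
```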
Volume participation for HFT liquidity providing Posted by EspressoLover on 2019-01-14 20:18 @gaj > Is your alpha ever large enough to cross the spread? The easiest way to get queue priority (and increase ADV) is to trigger a new price level yourself.
Sometimes, but the spreads are very wide. Also should have mentioned the fee structure is taker-taker, so colonizing a new level isn't a free lunch like in maker-taker. Overall though, you're right, this is low-hanging fruit and I should be utilizing it.
> it sounds like you're saying that a short queue means high adversity, so you wait for the queue to be long enough before you join.
Sorry for not being clear in the original post. Definitely meant the opposite: shorter queues have much lower adversity (at least in this market). So much so that I'm heavily biased towards only participating when touch size is small. The problem is that I'm missing out on all the volume that occurs when queues are long. Some of that can be mitigated by getting in when queues are small, settling into a good position, and staying in as the queues grow large. But the market has pretty large spreads and doesn't tick that often. So a high percentage of the time the queues are long, and even the orders at the front joined an already large touch.
@ronin Thanks, that's a good perspective. You may very well be right about the targets being unrealistic. It's good to keep this hypothesis on the table. No point trying to squeeze more juice from a spent lemon. But there are some reasons why I think 10% ADV is feasible. That's basically what Virtu does as a market maker on lit US equities. It's also about the collective market share of passive HFTs in ES on the CME. Obviously Virtu has a lot more resources, but this market's also a lot less competitive than US equities. As of now I have a pretty substantial infrastructure and research advantage over the other participants. To clarify, I'm not trying to build up positions with successive trades in a single direction. Just trying to opportunistically trade on fast alpha, with O(100X) daily turnover and capturing a few tenths of a tick per trade. Not overly concerned with the impact from my trading activity, since the autocorrelation of the trades is effectively zero.
Volume participation for HFT liquidity providing Posted by EspressoLover on 2019-01-14 20:17 Guys, thanks so much to each one of you for the thoughtful and thought-provoking replies! This definitely helped clarify my thinking and gave me a lot of momentum this past week to tackle the problem.
@anonq > I think what you mean by an adversity model is effectively short term alpha
That's a fair point. I guess I'd call "alpha" the unconditional directional forecast, whereas adverse selection is the forecast conditional on a given order being filled. (I.e. at a given time open orders all have the same alpha, but different adverse selection scores.) This is really just semantics, but wanted to clarify any ambiguity from the original post.
> unless you're extremely fast
Should have clarified. I basically have at or near the lowest latency in the market for the time being.
> to have a signal to send the order earlier, ideally as the queue is forming and so gotta be fast.
You're definitely right. I have the alpha and the latency, so should be leaning on this more. It's pretty low-hanging fruit. The challenge is that the market has very wide ticks, and queue formation doesn't occur that often. Even if you're always ahead of the curve on queue formation, there's still a lot of missing ADV where, from resting order entry to execution, the book never ticks.
> So I think what you're looking for is different horizon alphas from the ultra short term to whatever horizon you're currently trading
Great point.
Started mining a little bit in this direction, and it already looks promising.
@prikolno Yeah, I think you hit the nail on the head regarding over-sensitivity. The state-space model is an interesting point. Currently I'm just training with supervised learning, which doesn't do a good job penalizing position loss on cancels. Looking into re-fitting some of the models with reinforcement learning.
> Are you not participating inside-spread when the book dislocates?
Yes, I am participating in these conditions. And those trades tend to be consistently profitable. But this particular market has very wide ticks, so the book doesn't dislocate that frequently. A high proportion of the ADV in the market comes from orders that join an already long queue on a stable book, and sit for a while before being filled. It's on that volume where participation is a challenge.
> I'm assuming your strategy isn't parameterized in a way that order size ("small clips") is what's limiting the capacity… It's definitely possible to hit 10% ADV with small clips.
Thanks. It's very helpful to hear this perspective in terms of focusing the direction of the research.
Kalman filters Posted by EspressoLover on 2019-05-06 19:44 EMAs are recoverable basis functions for any linear model. And that extends to any continuous non-linear model, if you feed them into a universal approximator like a neural network or kernel machine. Therefore, I'd say your default should just be OLS on a whole bunch of EMAs spanning the full range of potentially relevant time horizons. OLS is still an unbiased linear estimator in the time-series domain. It's true you need time-series specific regressions if you want to rely on the p-values for coefficient significance or point estimation. But nowadays you're probably cross-validating anyways, and the computation cost of regression is basically free. IMO Kalman filters, or any other state-space model, should probably only be used if you have very strong inductive priors or a small/high-variance dataset. Otherwise just get a shit-ton of data and throw it at a regression of EMAs.
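A minimal sketch of that "OLS on a ladder of EMAs" default, here using EMAs of past returns on placeholder data (in practice you'd cross-validate the fit and pick half-lives spanning whatever horizons plausibly matter):

```python
import numpy as np

def ema(x, halflife):
    """Exponential moving average with the given half-life (in samples)."""
    alpha = 1 - 0.5 ** (1.0 / halflife)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(1)
rets = rng.normal(0, 1e-3, 50_000)              # placeholder return series
halflives = [2, 5, 10, 30, 100, 300, 1000]      # ladder of time horizons

X = np.column_stack([ema(rets, h) for h in halflives])
y = np.roll(rets, -1)                            # next-period return
X, y = X[:-1], y[:-1]                            # drop the wrapped-around sample

beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # plain OLS fit
print(beta)
```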
Einar Aas watch Posted by EspressoLover on 2019-05-07 16:56 Not directly related to Einar Aas... but recently listened to this podcast from some guys who build trading models for the electricity market. Their take is that renewables and battery storage have completely changed the nature of the market. Basically all the old rules about trading electricity have gone out the window. Everyone's pretty much feeling around blind with these new market dynamics. Thought about this thread. Maybe Einar Aas went belly-up because he lost his edge in a changing market.
The Wash Sale Rule is Decadent and Depraved Posted by EspressoLover on 2019-10-16 19:12 Another unrelated pernicious effect of the wash sale rule is that it has the potential to effectively convert all your unrelated long-term gains into high-tax short-term gains. Like a virus across a tax entity's entire portfolio. That's because for purposes of determining the holding period for long vs. short-term, the period includes all chained wash sales. If the wash chain extends past 365 days, then all those losses are counted as long-term. Meaning if you have any realized long-term gains, they're offset with priority before short-term gains. Assume that the prior strategy was run inside a much larger multi-strat fund. And say the $42.5 million capital loss on the losing symbol X is washed for over 365 days. When that capital loss is realized it counts as a long-term capital loss, whereas the $47.5 million capital gains in the strategy are counted as short-term. Say the fund had $100 million in long-term capital gains from unrelated strategies. The $42.5 million loss gets deducted from that, rather than from the short-term gains on the other leg of the strategy. In effect the algo strategy has pumped $42.5 million of gains from the long-term rate to the much higher short-term rate. Since the difference is about 20%, it's increased the fund's total tax liability by $8 million. Which has more than offset the actual income of $5 million generated by the trading strategy.
The Wash Sale Rule is Decadent and Depraved Posted by EspressoLover on 2019-10-16 17:49 Public Service Announcement. If you're a US tax-domiciled algo trader, then you need to be aware of the wash sale rule. Not just in the sense of "make sure you talk to a CPA to do tax planning, so you can shave a few percent off your bill". But literally in the sense that the wash rule can theoretically generate unlimited tax liability for an algo trader. There are scenarios where you can end up owing more than 100% of your income to the IRS. Consider a thought experiment. Alice runs a high-frequency pairs trading strategy between stock X and stock Y. Each trade averages a profit of 1 bp. X tends to lead Y, so the average per-leg profit is +5 bps for Y and -4 bps for X. The transaction costs run 0.25 bps per leg. Let's say Alice trades $100 billion notional in 2019, for a net profit of $5 million. Now let's look at tax liability. Once trade costs are taken into account, Alice will have $47.5 million of realized short-term capital gains on her trades in Y. And a $42.5 million capital loss on X. However (assuming she continues through January 2020), it's likely that almost all of the latter will be disallowed under wash sale rules. That leaves Alice with $47.5 million of taxable income. And a tax bill around $20 million. On a real income of $5 million... Let's look at why the capital losses are actually disallowed. The wash sale rule says that every time you exit a position with a loss, you can only deduct the loss if you don't re-enter the position within 30 days. It's a well-intentioned rule designed to prevent something like SoftBank harvesting its tax losses on WeWork by selling all its shares on December 31, then buying them back on January 1. However an algo trader that buys and sells a stock for 1 second, then does it again anytime within the next month, is equally affected. (The common-sense reform would be to say the wash sale rule only applies to positions held longer than 30 days, but tax law, much like bird law, is not governed by reason.) Needless to say, virtually all trading done by high-turnover algo strategies is wash-sale eligible. What happens to the tax loss on a wash sale? It gets added to the cost basis of the next trade on the same symbol in per-share FIFO order. So won't it stochastically fall out in an algo trading context? Even if Alice on average loses money in X, nearly 50% of her trades will be profitable and thus not wash sales. So most of the wash sale losses will end up harvested, right? Nope. Wash sale determination is based on the tax accounting basis, not the trade prices. And losses from prior wash sales get rolled into the cost basis of subsequent trades. Once you've accumulated enough wash sale losses, your cost basis becomes sufficiently inflated that every trade is a tax loss, even if it's individually profitable. Think of it as a biased random walk. Say Alice is profitable on 49% of her trades in X. Once her cumulative profits dip low enough, the cost basis offset becomes too large and every single trade, regardless of real profit, becomes a tax/wash loss.
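A toy simulation of that biased random walk (grossly simplified: dollar-sized trades, no 30-day windows or share lots, and definitely not tax advice — it just shows how taxable gains and economic PnL diverge once losses start chaining):

```python
import random

random.seed(7)

# Toy model of Alice's losing leg X: 49% winners, $1 win/loss per trade.
# Simplification: every realized loss is a wash sale (she re-trades within
# 30 days), so the disallowed loss gets rolled into the next trade's basis.
n_trades = 100_000
economic_pnl = 0.0      # what actually hits her account
taxable_gains = 0.0     # what the IRS sees as realized gains this year
carried_loss = 0.0      # disallowed losses riding along in the cost basis

for _ in range(n_trades):
    pnl = 1.0 if random.random() < 0.49 else -1.0
    economic_pnl += pnl
    tax_pnl = pnl - carried_loss    # inflated basis lowers the taxable gain
    if tax_pnl > 0:
        taxable_gains += tax_pnl
        carried_loss = 0.0
    else:
        carried_loss = -tax_pnl     # loss disallowed again, rolls forward

print(f"economic PnL: {economic_pnl:+,.0f}")
print(f"taxable gains: {taxable_gains:+,.0f}")
print(f"deferred wash losses: {carried_loss:,.0f}")
```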
But don't the losses roll off after 30 days? Not if you continue to actively trade the symbol. Say you lose $1 trading X on day 1. You have $1 in washable losses. Then on day 30 you break even on a trade in X. However the $1 wash loss has been rolled into the cost basis, so you have another $1 tax loss. Which means that the wash has been laundered into the new trade, and wash-eligibility is re-pegged at day 30. If you trade again on day 60, it refreshes yet again. A wash loss can theoretically be rolled forward forever. The only way to break the chain is to either stop trading the symbol completely for 30 days or make enough profitable trades on the symbol to cancel out your prior cumulative losses. But won't Alice's tax bill just cancel out next year? First, the IRS isn't going to sit on its hands for a year when someone owes $20 million. Second, as long as Alice continues to trade the strategy, the wash sales will keep chaining forward. But let's say she does stop trading X for 30 days. What happens is the final non-washed trades will have a hugely inflated cost basis. So Alice will finally realize her $42.5 million capital loss on X. If that's in the same tax year, then things are fine. She can just cancel out the capital gains on her profitable trades when rolling up her income. But if it's in the next tax year, then Alice is truly fucked. Individuals and LLCs can't carry back capital losses. So Alice is stuck with $42.5 million in capital loss carryforwards. Even assuming she continues to generate $5 million in annual trading income, it would take nearly a decade to become solvent. And if not, capital loss carryforwards only offset $3,000 in ordinary income per year. It would take 14,000 years to become solvent.
Practical takeaways
Tax planning is really boring. And most of the time it's fine to let the CPAs handle it asynchronously, and not pay attention until April 15 rolls around. At worst, poor planning usually just means paying 40% instead of 20%. But for algo traders, it's very important to evaluate wash sale impact before December 31. That's because the only way to make sure that wash sale offsets don't roll over into the next year is to stop trading the symbol completely for 30 days, starting no later than the beginning of January. In particular, pay close attention to any symbol that you primarily use for hedging purposes. Wash sale tsunamis, like Alice's, happen on symbols where cumulative profits are consistently negative. And remember tax accounting incorporates all the costs related to sales (i.e. exchange fees, cancel fees, commissions, pro-rated colocation fees, etc.). Also remember that wash sale logic is extremely bespoke. Don't necessarily count on your broker to get it right. And don't just spot-check or guesstimate it. If you're preemptive about it, you should also take a Section 475 mark-to-market election. Then the wash sale rule no longer applies. But that has to already be done by the beginning of the relevant year. I.e. you can't do it retroactively. Also, at least as an individual, it's predicated on trader tax status, so if the IRS challenges that, then MTM is also in jeopardy. It's probably not a major issue for you. But remember there's a small corner case of scenarios where the impact of the wash sale rule becomes truly pathological.
You Know Who - Renaissance Watch Posted by EspressoLover on 2019-12-17 21:27 C'mon guys. Information geometry? I'm sure there's maybe one small application there using information geometry for some corner case that gives a 2% improvement over regular PCA. But no, RennTech isn't a bunch of wizards crushing the market with secret math recovered from the Roswell crash site. Here's an interview with Nick Patterson, a former senior researcher at RennTech. He confirms that the vast majority of their strategies just use simple regression. The difference is that smart people use those basic techniques with a lot more elegance and accuracy. In the same way that Gordon Ramsay, despite following the same basic process, grills a much better steak than your local Applebee's. Don't underestimate the power of craftsmanship. The factors that make RennTech a league above Two Sigma are pretty much the same factors that make Two Sigma a league above DE Shaw. No magic. No secret sauce. No mathematical gnosticism. It's things like expansive reliable curated datasets, deep expertise on market structure, good execution systems, powerful research and backtesting software, good access to markets, economies of scale, talented practitioners, and excellent organizational management. EDIT: Thanks @rod for pointing out an error.
You Know Who - Renaissance Watch Posted by EspressoLover on 2019-12-18 15:22 @rod. Thanks for pointing that out. You're right.
good DMA futures brokers? Posted by EspressoLover on 2019-12-28 18:58 I've generally had good experience with Advantage. If you're doing DMA, the broker is mostly a commoditized product. The only relevant dimensions are the commission schedule, fixed costs for colocation, and intraday margin requirements. Account size doesn't matter, just volume. My general experience is that the off-brand discount brokers tend to give much better pricing than the bulge bracket firms.
good DMA futures brokers? Posted by EspressoLover on 2019-12-29 06:35 On a day-to-day basis, you're pretty much never interacting with the broker in a proper DMA setup. At the beginning of the trading day, your co-located quoter machine subscribes to the market data multicast feed, then opens a direct TCP socket with the exchange gateway(s). Orders are sent directly from your box to the exchange gateway, and never pass through the broker. (Same for responses from the exchange.) At the end of each trading day, the clearing broker "settles up" with the exchange on your behalf. Crediting/debiting any trading PnL, exchange fees, monthly market data and access fees, etc. As well as the broker's own commissions and fixed fees. Unless this process breaks, your only interaction is reconciling the end-of-day account statement. On modern exchanges the pre-trade risk checks are done inside the exchange gateway. This makes it so that all participants go through exactly the same checks and no one's disadvantaged latency-wise. At CME this system is called ICC. At NASDAQ-based exchanges it's called PTRM. Same principles. The clearing broker sets limits for each client account, and the exchange software enforces those limits. Changes to these risk limits are only supposed to be made by a human functionary at the broker. So, this is the only channel in the relationship where customer service matters. Particularly if you're frequently re-allocating working capital between instruments. The broker can also run post-trade checks on their own systems.
They'll get a near real-time duplicate of all your activity at the exchange through "drop-copy" functionality. It lets them monitor each client's positions, open orders, and trades in near real-time. If they don't like what they see for whatever reason, they can flip a "kill-switch" with the exchange, which instantly disconnects your session. Obviously this is all asynchronous and irrelevant to latency concerns. Between the in-exchange pre-trade checks and the drop-copy post-trade checks, that's more than enough risk control for nearly any setup. Any broker that demands they intermediate your orders through their own risk control system is full of shit. Drop them like a bad habit, since the extra hop will obliterate the DMA latency advantage. By colo fees, I meant how most DMA-oriented brokers offer their clients sub-leased cabinet space in the datacenter. It's kind of nice, especially when starting out, to keep fixed expenses lower. Instead of directly leasing an entire half-cabinet from the exchange, you can sub-lease 1U or 2U of rack space for a lot less. Along the same lines most of the brokers in the space can arrange a seat lease/sale. But if you want to bring your own colo cabinet or exchange membership, that works too. In that case the only thing you pay the broker is the clearing commissions. In terms of margin, I can't speak directly to options. But for futures, only overnight margin is determined by the exchange. Intraday margin is at the discretion of the broker, and typically a lot lower than overnight. Also, if you're market making, the margin treatment of open orders vs. actual positions is a relevant consideration. You could have very large "worst-case" positions, but tend to keep small portfolio positions. Sometimes if the broker's chill and you're doing a lot of business, you can get them to basically ignore the unfilled open orders for purposes of leverage determination.
Recovery shape Posted by EspressoLover on 2020-04-09 14:12 I wish there was more literature on the economic impact of the 1957 flu. It killed about 100,000 Americans out of 170 million. Which seems to be where most models predict we'll land with Covid-19. The 1958 recession was the deepest in post-war history until 2008, but the recovery was really fast. It only lasted three quarters. The thing is that most of the economic histories I've read on it only seem to make passing mention of the 1957 flu pandemic.
I have built a long-term algotrading strategy, how to bring out the most? Posted by EspressoLover on 2020-05-21 21:25 Have you tried neutralizing the beta (and possibly the industry and factor exposure as well) to raise the Sharpe ratio? Long-only trading of orthogonal signals on single-names in the S&P 500 is inefficient from a mean-variance standpoint. Without knowing more about your strat, I'd guess the sizable bulk of your vol is probably beta. Whereas it's pretty unlikely that you're deriving any serious alpha from market exposure. (Or at the very least, if your backtest tells you that you are, it's probably spurious.) Try looking down this route. To echo Tradeking, it's *way* easier to sell a 2+ Sharpe strategy than a 0.9 one. Distilling more alpha's always arduous, but oftentimes there's a (nearly) free lunch when it comes to reducing vol.
I have built a long-term algotrading strategy, how to bring out the most? Posted by EspressoLover on 2020-05-22 17:22 @tralek (Sorry, previous comment had a typo: should have said "reducing vol", not "reducing lunch".) Effectively you don't have a 0.9 Sharpe strat.
You have a 0.6 Sharpe strat (alpha/volatility). The additional 0.3 (and lower drawdown vis-a-vis the market) comes from diversifying its exposure with the market portfolio. But that doesn't "count", because nobody is going to pay you to provide beta. If a big investor wants long exposure to the S&P 500, she's going to do it through Vanguard at 5 basis points. Or Bridgewater at 100 basis points. But definitely not you at any cost. I really want to emphasize this, because the only chance you have of selling this to a serious investor is as an absolute return product. And a 0.6 Sharpe absolute return product may be sellable, but not unless you have a pedigree. It is absolutely essential that you boost the Sharpe to at least 1.0, if not higher. You mention that the strategy only seems to do well during bullish expansions? Are you sure that's not just because of the long-only nature of your positions? If you think of what you've been trading as a mixture of S&P 500 + [your unique signals], then maybe it's the S&P 500 that's giving it the bullish bias. Try backtesting again, but beta-neutralizing every position. (I.e. short an equivalent amount of SPY against every long trade.) Beyond this, I see three likely avenues to boost the Sharpe. The first is turning over your positions more frequently. You say you hold a few days, but why? How frequently are you rebalancing the portfolio? If you're just using end-of-day data, it's possible that intraday rebalancing may reveal more trading opportunities. Have you evaluated what the alpha realization curve looks like? It's frequently the case that 50% of the alpha realizes in a few hours. If you can exit trades much faster and still get most of the profit, that means less exposure and/or more capital available to do other trades that would otherwise be tied up in long-term positions. The second option is to simply diversify across a larger basket of trades. I assume that your signals give you something like a ranking of stocks in your universe? And then you go long the best stock? What about buying the top 5 names? Or the top 100? Or all the stocks with positive signals in proportion to the signal magnitude? If you're long-only that doesn't help that much, because you still have market exposure regardless. But if you're beta-neutral, then diversifying the single-name exposure is a big win. Also, have you evaluated the signal in terms of the short side? What if you hedge the beta by shorting the bottom-ranked stocks (instead of shorting SPY)? Now you're getting alpha from both the short and long side. In most anomalies the short side has larger alpha than the long side. That would naively double the Sharpe. (Not completely, because single names have higher t-costs than index hedges, cost of borrow, etc.) Third, can you expand the size of the universe? Right now you look at S&P 500 single names. Can you do the Russell 3000? Can you add ETFs? Can you expand internationally to Europe, Asia or emerging markets? Commodities? Currencies? Bonds? If you can find 4 different markets or countries with orthogonal performance, you've just doubled the Sharpe ratio. Finally I want to second @gaj. I'm not convinced that the strategy's performance isn't spurious. I don't want to be negative, but any potential investor will ask the same thing. At 0.6 Sharpe with 12 years of history, you barely clear statistical significance against the null hypothesis with a 2.0 t-stat. And that doesn't account for any lookback bias. Was this the very first thesis you tried? Is any of that performance using in-sample trained parameters? Did you tweak the strategy parameters based on historical performance?
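A minimal sketch of the beta-neutralization check plus the back-of-the-envelope significance test (a rough t-stat is the Sharpe times the square root of the years of history). The return series here are random placeholders standing in for the strategy's and SPY's dailies:

```python
import numpy as np

# Placeholder daily return series for the strategy and SPY
rng = np.random.default_rng(0)
n_days, years = 12 * 252, 12
spy = rng.normal(0.0003, 0.012, n_days)
strat = 0.5 * spy + rng.normal(0.0002, 0.006, n_days)

# Hedge out beta: subtract beta * SPY from every daily return
beta = np.cov(strat, spy)[0, 1] / np.var(spy)
hedged = strat - beta * spy

def sharpe(r):
    return r.mean() / r.std() * np.sqrt(252)

for name, r in [("raw", strat), ("beta-neutral", hedged)]:
    s = sharpe(r)
    # Rough t-stat of the Sharpe estimate: sharpe * sqrt(years of history)
    print(f"{name}: Sharpe {s:.2f}, t-stat ~ {s * np.sqrt(years):.1f}")
```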
This is a secondary, but very convenient, benefit of higher-Sharpe strategies. They can be statistically validated with less historical data. The upshot is that if performance is spurious, then all of the previous suggestions will tend to make the strategy look worse. If the strategy is real, then diversifying and hedging will distill the signal. But if you've just overfit noise, then removing any of the vol will just dampen the noise. This is one of the biggest pitfalls even seasoned practitioners fall into. It's tough to call it when a strategy you've worked hard on ends up being a dead end. But it happens all the time. A tell-tale sign is when performance seems to evaporate when you make any changes. Real signals are usually robust. It's way too easy to be defensive and declare "my particular strategy just doesn't work in a beta-neutral context" or "it has to be rebalanced at market close" or "it just doesn't work outside the US". In reality, you should probably interpret it as a sign that there's just nothing there.
Use of modelling intraday seasonality in volatility. Posted by EspressoLover on 2020-05-27 19:37 This approach doesn't make any intuitive sense to me. You're scaling the forecasts from an ARIMA model by the volatility forecast, right? So let's say the general model is that 70% of the price move mean-reverts over a 1-minute horizon. And let's say your seasonal volatility forecasts 200% of baseline vol. So now you expect 140% mean reversion? That sounds like nonsense. It's hard for me to imagine any sensible reason we'd expect mean reversion to exceed 100%. This approach is unnecessarily adding a level of uncalibrated and untested indirection. Why even forecast volatility at all? All you ultimately care about are returns. Why not take those seasonal variables that you're using in the volatility model, and drop them as interaction terms directly into the return regression? In general, I'd caution against putting too much stock in the implementation details of academic papers. The academic's incentives are different than the practitioner's. His goal isn't to make money, it's to impress the tenure committee. That's often done by adding a lot of unnecessary mathematical wizardry into published papers.
Use of modelling intraday seasonality in volatility. Posted by EspressoLover on 2020-05-28 02:56 I see where the concern is coming from, but the approach still doesn't make sense. If there's heteroskedasticity in the dataset, then the correct approach is to use weighted regression, with weights in inverse proportion to the dependent variable's volatility. (Well, technically the residuals', but R^2 is so small in return forecasting that it's essentially the same.) If I'm understanding you correctly, the paper's approach is to simply linearly scale the forecast by the heteroskedasticity. I've never heard of such a thing either in finance or anywhere else. It's hard for me to even imagine a scenario where that *would* work. In practice, you can probably just ignore the seasonal heteroskedasticity entirely. Try regressing with both weighted and unweighted OLS and compare the in- and out-sample forecasts. My guess is the two models are 90%+ correlated. And even if not, do you really want to underweight your model at 3:30 PM, just because prices are more volatile around that time? It may help from an MLE perspective, but remember trading is ultimately about making money.
Arguably it's more, not less, important to get your alpha correct at 3:30 PM, because you're probably trading more than you are at noon. And if noon and 3:30 PM really behave that differently, then you should probably fit separate regressions for the two regimes anyway. If you're using 1-15 minute horizons, there's no shortage of data. You Know Who - Renaissance Watch Posted by EspressoLover on 2020-06-15 23:39 One thing to consider is that RIDA may simply be less risk-averse to these types of selloffs than the average equity market neutral fund. Medallion, by trading at much shorter horizons, probably outperforms during periods of market turmoil. RIDA and Medallion would act like natural hedges against each other. In the same way that bonds hedge stocks in a 60/40 portfolio. That probably encourages the insiders to over-leverage RIDA, at least relative to what's optimal from the perspective of a standalone investor. Understanding Fixed Income Relative Value Posted by EspressoLover on 2020-07-11 18:30 > From what I've read Relative Value trading is usually a subset of statistical arbitrage. I don't know if I really agree with this. With the caveat that these are ambiguous terms, one of the hallmarks of stat-arb* is that there's a specific forecast over a fixed horizon. Whereas this isn't always the case with RV. Consider the classic RV trade of buying off-the-run treasuries at a discount to on-the-run. It doesn't involve any directional view on rates, just the belief that the off-the-run bonds are trading too cheap relative to their on-the-run counterparts. The RV trader believes that the spread will inevitably narrow but doesn't necessarily have a specific timeframe. It could be tomorrow, it could be six months. (With regard to fixed income, it certainly helps that RV trades almost always have positive carry.) In contrast a stat-arb strategy generates a signal at a fixed horizon. If you traded OTR stat-arb style, you'd want to forecast something like "we expect the excess spread to decay by 25% per month". This may seem like a distinction without a difference but there's an important subtlety. With forecasts you can systematically blend disparate signals and instruments using a standardized metric. Portfolio construction becomes a purely mechanical function of maximizing return relative to risk. Otherwise you're comparing apples to oranges. Let's say you're an RV portfolio manager. You see that OTR spreads are two times wider than normal, bond-CDS basis is only 50% wider but carry is very high, cash-futures basis isn't that wide but you expect an upcoming event to catalyze narrowing. How do you decide to allocate capital? How much powder should you keep dry for future opportunities? The only solution is to inject human judgement and intuition. This is why you wind up with a discretionary manager in the loop. A natural question is why don't RV funds all take the stat-arb approach? And I think the answer is because the latter requires much more powerful analytics and sophistication. Discretionarily trading the OTR spread doesn't demand much more than a souped-up Excel sheet, a Bloomberg terminal to pull prices, and a couple years of experience to get a gut feeling for what's "wide" given today's conditions.
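(To make the standardized-forecast point concrete: once every trade is expressed as an expected return plus a covariance estimate, the allocation step is mechanical mean-variance. A toy sketch, with completely invented numbers for the three trades above:)

# Toy mean-variance allocation across three RV trades, numbers are made up
mu = c(otr = 0.8, cds.basis = 0.5, cash.futs = 0.3);   # forecast excess returns, arbitrary units
sigma = matrix(c(1.0, 0.2, 0.1,
                 0.2, 1.0, 0.3,
                 0.1, 0.3, 1.0), nrow = 3);            # guessed covariance between the trades
weights = solve(sigma, mu);                            # unconstrained mean-variance weights ~ Sigma^-1 mu
weights / sum(abs(weights));                           # normalize to gross exposure

The optimizer is the trivial part; the hard part is producing a defensible mu and sigma.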
In contrast going full systematic stat-arb would require years of pristine historical data (tough in OTC markets), accurate models of liquidity and execution costs (also tough for OTC), and careful training of historical models against a backdrop of sharp regime changes triggered by breaks in banking, regulatory and macroeconomic conditions. To be honest, I think a lot of times this approach isn't really possible for many classic FI RV strategies. AFAIK, RennTech and similar firms, who should be well positioned, don't really swim in those waters. (*As a postscript, I'll add that "stat-arb" is an overloaded term. It can refer to a generalized set of techniques like signals fitted on historical returns, mean-variance portfolio optimization and orthogonal factor decomposition. It's also commonly used to describe a cluster of equity market neutral strategies that focus on sources of alpha around return reversals, earnings events, analyst forecasts, etc. The two often go together, but not always. For this post, when I say "stat-arb" I'm referring to the former.) Pegged Order Posted by EspressoLover on 2020-08-03 16:28 My guess would be that the abusive scenario is using it as a free option on new level formation. Say the market's quoting at 100x101. You submit a buy order pegged to one level below the ask price. Say within the window a large order comes and swipes the ask, so the market's quoting at 100x102. Now the pegged order arrives and prices at 101. You've successfully grabbed first queue position on new level formation. But if the market doesn't move, the order just joins the back of the pre-existing queue at 100. Very low fill risk, you'll almost certainly be able to cancel. The difference with vanilla limit orders is that speculating on new level formation requires taking real risk. If the market's quoting 100x101, and you're anticipating that it's about to move to 101x102, you better be right. Otherwise you pay to cross the spread and have nothing to show for it. In contrast with the pegged order, you can just spam it all day long without incurring any real risk. And even if you only get a hook once every thousand times, you'll still turn a profit. (Slight exaggeration: there may be order-to-trade ratio penalties, small risk you could get filled on the back queue, etc. But in general very small amortized costs for a lot of upside.) Market making models Posted by EspressoLover on 2020-09-01 13:53 @prikolno Thanks for the elucidation! I always learn something from your posts. I've never heard of the "exhaust" paradigm before, but can definitely see how it'd add value for certain categories of strategies. @strange At nearly 20 years old, it's starting to get a little long in the tooth. But I'd still recommend Larry Harris' Trading and Exchanges. To drop the link again, this presentation covers things from basics to fairly advanced concepts pretty well. Most of the papers by Kirilenko or Brogaard are pretty good, especially because they have access to de-anonymized datasets. Hasbrouck's a good author too. Market making models Posted by EspressoLover on 2020-09-01 03:08 I could be wrong, I often am. But I'd say that HJB and the like have essentially no use for practitioners of HFT-style market making. I'd freely admit that I'm taking a provincial view here, if not for the fact that the original A&S paper was titled High-frequency trading in a limit order book. (With most subsequent papers using similar terminology.)
As far as I can tell, A&S don't seem to have talked to any actual high-frequency traders. Or for that matter anybody who even regularly uses a limit order book. The word "data" only appears once (in case you're wondering, it's available, but the authors definitely won't be using it). Anytime I see these types of quant finance papers (you know the kind, with pages of differential equations but zero actual datasets) my eyes glaze over. It makes me feel like I'm reading an anthropology paper on "Childrearing Habits in Papua New Guinea". Except the author's never left Cleveland, is only vaguely familiar with New Guinea, and barely spends any time even with his own kids. The armchair anthropologist reaches speculative conclusions by long chains of logic derived from arbitrary first principles. Maybe he does wind up getting things right. And if so it'd be super-impressive. Still, if I was trying to crack the emerging Papuan toy market, I wouldn't take his advice without booking a flight to Port Moresby. Complex PDE models like HJB usually wind up baking in a whole bunch of unvalidated assumptions. Which basically forces the data to fit the pre-determined model. In practice, it's almost always better to let the empirical evidence drive the model rather than vice versa. The upshot tends to be "flat" models with more degrees of freedom that allow for parameterizing over a less constrained universe of behavior. Among the ways that HJB misses the mark here for real practitioners (IMO):
Over-focus on volatility and risk: The vast majority of major market makers don't care about optimizing for portfolio volatility. Virtu had one losing day in like a decade of trading. If your Sharpe ratio's 30, you don't care about boosting it to 40.
Ignoring adverse selection: Unlike risk, this *is* the binding constraint that limits market maker sizing. Market makers don't stop quoting larger because they're afraid of portfolio risk, they do so because the expected PnL on the marginal liquidity becomes negative. It's the first, not the second, moment. Isn't this just effectively the same way that HJB penalizes inventory? No. Inventory that you've been sitting on has way less toxicity than a quote that just got lifted. Liquidity sourced from retail brokerages has less adversity than lit exchange flow. Order flow at market open is more toxic than flow at the close. From a vol perspective, inventory-reducing trades are always good. From an adversity perspective, they can still be very toxic. (A quick way to measure this is sketched below.)
Ignoring queue position: HJB assumes that there's no competition. If I had no competition, my biggest concern would be what color to paint the helipad on my yacht. In practice market making is viciously competitive. Both in terms of price and queue position. The former can fit into the HJB worldview. The latter most definitely does not. Every market maker in the world would tell you about the necessity of quoting at prices that are instantaneously unprofitable in order to build queue position. If you listen to HJB, no one would ever try to seize first queue position on new level formation. Yet that's consistently the most profitable trade in the market making universe.
Impractical in low-latency environments: HJB requires numerically solving a complex PDE. I left this point last, because I suppose you could approximate or pre-compute in a way to get response time under 10 microseconds. I wouldn't want to be the guy responsible for it though.
Papers like Avellaneda and Stoikov certainly are impressive intellectual achievements.
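(On the adverse-selection point: you don't need a PDE to see it, just markouts. A rough sketch, assuming a hypothetical fills table with the fill side, fill price, a forward mid price, and a venue tag:)

# Post-fill markout: how far the mid moves against you after a fill.
# fills is a made-up data frame: side (+1 buy / -1 sell), px (fill price),
# mid.5s (mid price 5 seconds after the fill), venue (flow source).
fills$markout = fills$side * (fills$mid.5s - fills$px);
mean(fills$markout);                          # average adverse selection per fill
tapply(fills$markout, fills$venue, mean);     # sliced by venue / flow source
# If this number is worse than the spread plus rebate you capture, quoting
# bigger doesn't add risk, it just adds negative expected PnL.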
I certainly always feel smart when I finally work my way through the pages of stochastic calculus. And then after that feeling wears off, I realize there's nothing useful here. @prikolno is smarter and definitely better informed than me. So, I'd trust him when he says it has practical applications for some players. But I don't see it for anyone in my corner of the world. On a tangent, one piece that I would recommend for background reading that does a pretty good job of tying together academic theory with real-world practice is this presentation. The author uses a fair bit of stochastic calculus, but he always keeps it grounded in the empirical data before straying too far into the clouds. I won't endorse every statement in there, but I think more academics in quant finance would do well to follow his example. Performance Metrics for HFT Strat Posted by EspressoLover on 2020-10-08 19:09 Just pick some reasonable maximum position limit, then grid search for max PnL. There's really no need to optimize around Sharpe, because most working HFT strategies are constrained by liquidity, not risk. Your intuition is right, optimizing on Sharpe will produce very small PnLs. The first marginal unit of risk is almost always going to be the most profitable. Max(Sharpe) collapses into trading one lot at a time. The same is true for Max(PnL/Volume). As long as you have a solid HFT alpha/strategy, you'll rarely find yourself with high-risk/high-PnL parameters. At high turnovers, most of the portfolio variance becomes dominated by the trades, not the positions. What makes or breaks a strategy is short-run returns post-fill, not how the inventory drifts over time. In contrast low-frequency strategies tend to hover near zero-Sharpe, because EMH implies that the ex-ante return of any random position is about zero. Buy-and-hold stocks based off a dartboard, and you pretty much match the index. But on a trade-dominated strategy, the returns come from microstructure alpha minus t-costs. There's no reason to expect this number to cluster around zero. And indeed if you just keep making a bunch of random trades, you'll very quickly lose all your money. Almost all high-turnover parameterizations are going to be either straight up or straight down. In HFT-world, high-PnL/high-risk parameterizations are a rare coincidence that require a lot of stars to align. If you pre-set a small maximum position limit, it forces the optimizer into staying within high-turnover strategies. And therefore avoids the problem of high-PnL/high-risk parameterizations. A tight position limit caps the variance contribution from long-run drift. Unless you literally have no other profitable options (in which case it doesn't matter), high-turnover parameterizations will always dominate the PnL metric under a pos-limit constraint. All that being said, sometimes it's practical to set a high Sharpe floor on a strategy. Especially when you're starting out. Not necessarily from a risk perspective, but a validation one. One of the hardest challenges is verifying if/when live trading is not in line with simulated backtests. If you start with a 30-Sharpe parameterization, then even a single losing trading day lets you reject the null hypothesis at p<0.05. It can be useful to start with Max(PnL) subject to Sharpe > [X], then gradually relax that constraint over time as you build up confidence in the live implementation. Why is native CME iceberg algo still used when MBO data helps designate?
Posted by EspressoLover on 2020-10-16 19:05 Caveat, I haven't really done much at CME for a while. Also I never really played with iceberg orders of any type. But I would think that the biggest reason to use native ice is to gain atomic matching on the hidden liquidity. This is important if you either don't have latency supremacy or want to interact with oversized IOC orders. For the former, the book can change state before your synthetic has time to insert its next tranche. So HFTs will tend to swipe the best liquidity before it interacts with your hidden liquidity. For IOCs even latency supremacy is insufficient, since you'll never get a chance to interact with the cancel quantity outside of the atomic matching event. Again, I can't really tell you how *big* of an effect these things have. Certainly they have to be balanced against the very real cost of the obvious visibility that comes with native icebergs. Definitely don't throw out the Occam hypothesis that most native ice users are irrational/lazy/uninformed. Why is native CME iceberg algo still used when MBO data helps designate? Posted by EspressoLover on 2020-10-21 17:18 @prikolno Thanks for this. Extremely informed and informative as always. Any chance you can share more on the FPGA consideration? I don't have any FPGA background, but am working on an FPGA project at the moment. It'd be helpful as a learning opportunity, even if the issue was already patched. @ESMaestro My gut sense is that your problems are mostly driven by the high price and high volatility of the S&P 500 in recent years. The "effective tick size" is much smaller. $0.25 becomes relatively smaller against larger absolute price moves. I haven't looked directly into it, but I'd be virtually certain that top-of-book touch sizes are much smaller than they were five years ago. Take this with a grain of salt, but I'd guess that the easiest remedy would be to re-optimize for a thinner book regime. I say this because market makers tend to size impact based on some concave function of queue size. If you're trying to passively fill with 20 lot child orders, you'll eat a lot more market impact in a regime where touch sizes are 100 contracts, compared to one with 1000. Starting from this point, I think you can pursue one or more of the following three solutions. The most obvious is to scale down the size of the child orders. Maybe only display 10 lots at a time instead of 20. With a thinner book/smaller tick size, touch sizes are smaller but the market moves faster. So you have the freedom of slicing orders into finer granularities. Expected fill time on resting limits should be shorter compared to five years ago. As an aside, I don't know if TT supports something like this, but you may want to adaptively size the child orders depending on something like the rolling touch size or average limit order fill time (rough sketch of what I mean below). The second option is to place your limit orders further down the book. Similar reasoning applies. In a thick regime, deep book orders can take unacceptably long to fill. But with large price moves, away quoting becomes comparatively more viable. Market impact tends to be smaller because you'll mostly be joined at large queues. Plus HFTs are less likely to profile liquidity that originates in the deep book. The third option is to make hay while the sun's shining and take advantage of cheaper liquidity. That entails being more willing to cross the spread.
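(The adaptive-sizing sketch mentioned above, with made-up names; quotes is a hypothetical table of top-of-book snapshots, and the 2% fraction is arbitrary:)

# Scale displayed child size off a rolling estimate of top-of-book touch size
touch.size = (quotes$bid.size + quotes$ask.size) / 2;
rolling.touch = stats::filter(touch.size, rep(1/500, 500), sides = 1);  # trailing 500-snapshot mean
child.frac = 0.02;                                   # quote ~2% of the typical touch
child.size = pmax(1, round(child.frac * rolling.touch));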
Of course, aggressing will create higher impact, so you want to restrict this behavior to near the end of a parent order. If you use a marketable limit, you can both swipe resting liquidity and colonize first position in the queue. This approach can be particularly compelling if you have microstructure alphas to incorporate into the execution algorithm. Not only does that shrink the expected cost of crossing the spread, but it tends to diminish the long-run impact since your flow will profile like an HFT instead of real demand. Weird Trading Signals Posted by EspressoLover on 2020-11-27 15:53 I don't think it's that weird of a signal. The weekend effect has been debated in academic finance for decades. It's also nearly indisputable that equity returns are disproportionately concentrated in the overnight period. There are pretty good economic justifications for daily/weekly seasonality effects. Due to psychological and institutional reasons, participants tend to demand different risk premia at different periods. A lot of people just plain prefer to de-risk to enjoy a less stressful weekend. I think these types of signals tend to be underexploited. They're kind of an awkward fit for most of the major desks and strategy categories. A while back, I used to trade around the VIX term structure, and there were a lot of easily predictable daily seasonalities that persisted year after year. And that's one of the most studied/traded assets in the world. All that being said, I'd still bet money that whatever you found is spurious correlation. Not because the hypothesis is infeasible. But just because almost any time a researcher stumbles upon an accidental hypothesis, it ends up being spurious. IMO there's a huge latent state space of these potential "accidental hypotheses" that your mind is unconsciously and continuously scanning for when doing exploratory data analysis. You're not really even aware of the process, so even strong intuitive statisticians tend not to correctly down-weight the multiple hypotheses effect. I'd tend to be skeptical unless the t-stat is very large, or it's validated on totally out-sample data. Barring that, I'd maybe try to test it by systematizing the multiple hypothesis selection in a CV framework. Build a program to start from scratch and look at all the factors that you did in the exploratory data analysis. At the very least other seasonality periods. Make sure not to privilege the Friday->Monday conclusion in any way. Then check to see how the cross-validated results look out-sample. DeFi and decentralized market making Posted by EspressoLover on 2020-12-01 17:15 I've been toying around with DeFi stuff as a side project for the past couple months. I never knew much about crypto or crypto trading before, so thought it would be a fun learning opportunity. So far, I've replicated the pre-existing strats that front-run Uniswap trades in the Ethereum mempool. It's pretty amazing how easily exploitable the ecosystem is. Anyone can see your transactions before they print, then execute ahead of you by bumping the gas fee. Front-running trades like this is kinda bush league though. The money's easy, but relatively small. The crazy high taker fees (30 bps each way) mean that the only real opportunities are illiquid micro-cap tokens, which get regularly dislocated by ordinary trades. Against that you're fighting Ethereum's super-high gas fees. Plus the half dozen active players frequently get in vicious gas auction battles.
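(If you want to see why those thin pools dislocate so easily, the constant-product math is all there is to it. A quick sketch of the Uniswap v2 output formula with the 30 bps fee baked in; the reserve numbers are made up:)

# Uniswap v2 swap output for a constant product (x*y=k) pool
uni.v2.out = function(amount.in, reserve.in, reserve.out) {
  in.after.fee = amount.in * 997;
  (in.after.fee * reserve.out) / (reserve.in * 1000 + in.after.fee);
}
# Same ~$10k swap, thin micro-cap pool vs. a deep pool (reserves in matching units):
uni.v2.out(1e4, 2.5e5, 2.5e5);   # thin pool: ~9587 out, roughly 4% all-in slippage
uni.v2.out(1e4, 1e8, 1e8);       # deep pool: ~9969 out, roughly 30 bps, i.e. just the fee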
A lot of the returns come from optimizations in EVM byte code. (Particularly SSTORE gas rebate tricks.) If you could come up with a consistent strategy on the super-liquid pools (e.g. USDC-ETH), then I think you could really print money. Gas fees are invariant to transaction size. If you trade millions instead of thousands, gas costs amortize to epsilon. You could do some really cool stuff by strategically flashing liquidity, either to avoid impermanent loss (the term DeFi zoomers use for adverse selection) or to kludge cheap passive directional execution (impossible through naive liquidity providing since Uniswap doesn't use a limit order book). It also seems like there are a lot of opportunities to arb the DeFi exchanges against the centralized ones. My hunch would be that most price discovery occurs at the centralized exchanges. If you have the ability to dominate gas auctions, then it's basically the equivalent of latency supremacy. Going the other way, if you have the infrastructure/software to preemptively scan the mempool, then that probably gives you an advantage over regular centralized exchange traders. If they're watching Uniswap at all, they're probably only seeing market-moving trades after they mine to the blockchain. My two bps. As a neophyte, still figuring things out. It'll be interesting to see how Eth 2.0 changes the landscape. Anyone in the space or curious about it, feel free to drop a line if you want to shoot the shit offline. GME Posted by EspressoLover on 2021-01-29 05:15 > According to this survey (which is probably very noisy) https://www.reddit.com/r/wallstreetbets/comments/k6jjm3/wsb_owns_58_of_gme_gme_survey_update/ wsb owns 5.8% of GME. I wouldn't be surprised if that's a big under-estimate of the aggregate retail flow. The whole thing has replaced sports and TV as the primary topic in all my group chats over the past 48 hours. The zeitgeist feels a lot like when there's a huge lottery jackpot. Off the top of my head, I'd guesstimate that 20-30% of the people in my social circle have mentioned buying it. I'd say even 5% have bought AMC, BB, or NOK. I don't think any of them regularly post on /r/WSB, but they're all buying for the same meme reasons. Nostalgia for the early 2000s, and some general need to feel like a part of a movement is shutting down higher order rationality and caution. The median position seems tiny, but the tail is skewed. The huge runup in wealth in the past year has definitely increased the risk appetite of the young-ish upper-middle class to exuberant levels. For example, I know someone who recently got paid $10 million after his SaaS was acquired. He claims to have thrown a half million into GME stock and calls. Because it's all imaginary money anyway. Similar story for a crypto millionaire acquaintance. Heck, there's a shit ton of middle-class Joes who feel like Baron Rothschild after cashing out $100k in home equity at 2%, then watching that turn into $250k in TSLA. They've been cooped up at home, no place to spend money. Americans do not have the temperament to sustain a positive savings rate. Money is for spending or gambling, not collecting dividends in boring-ass blue chip stocks. The tinder is dry. These bros are ready to YOLO. Take it for what you will. This is all personal anecdote extrapolated into a rough aggregate estimate. But I'd guesstimate you had 2 million Americans, and maybe another half million foreigners, buy in at $3-4k average size. GME Posted by EspressoLover on 2021-01-29 17:34 > That is just 10 yard.
> And it's been 10 yard per day, for 10 days
Ahh. I see. Understood. My guess is that a small chunk of participants are trading at crazy high turnover on top of the vol. /r/WSB keeps repeating "Hold the line", but I'm sure every self-styled day trader in America is watching this stock, and pushing decent volume as the whipsaws light up their Fibonacci-MACD-BS indicators like a Christmas tree. You also have a lot of shorts in large multistrats with sophisticated portfolio construction teams. Say a fund allocates $500 million short. Rebalanced hourly, you're looking at O(10%) moves. That'd be close to $400 million volume per day. If half of the 65 million shares shorted are rebalancing intraday, that could account for $5+ billion ADV. Finally, I'd expect that the HFTs and HFT-ish stat arb desks are trading between each other like crazy. There's a lot of money to be made for everyone. This isn't like normal market conditions, where the low-latency kings scrape rebates at the front of the queue. A lot of chaos in the microstructure will cause a lot of disagreement between even slightly different models and parameterizations. Disagreement creates volume. Even on liquid blue chips, HFTs only hold a sliver of inventory. Yet the order flow magnitudes are humongous. Intermediating stochastic imbalances is the name of the game for short-horizon stat arb. I would bet Medallion or PDT or Teza is printing huge volumes, but basically neutral at any scale longer than an hour. That could result in an unusually high multiple of tactical flow to real flow. GME Posted by EspressoLover on 2021-02-01 04:03 [Warning ahead of time. This post is a long tangent only loosely related to the topic. Please skip if not of interest] > Why should I ruin anything by taking the risk trading in a bat-shit environment where all my models are probably worth shit? That's definitely what I originally thought. But experience has largely proved otherwise. (Caveat, other people's experience may be different than mine, HFT can mean different things to different people, not all strategies are the same, yada, yada, yada.) A disproportionately high amount of HFT PnL comes from a sliver of highly turbulent periods. Like surprisingly so. If you back out the flash crashes and the volatility spikes and the halts and so on, my guess is a lot of desks would barely be profitable. (Another caveat. A lot of my experience is in trading unusually volatile and unstable markets. Again YMMV.) The biggest risk for most HFT operations is failing to generate enough to cover the fixed costs. For every one HFT that blows up from market losses, at least ten fail because they can't pay the bills. Going dark introduces a risk all of its own. Intraday PnL drawdowns are pretty small. HFTs keep really tiny inventories relative to their revenues. So even if the vol on the underlying goes haywire, do you really care if your 30 Sharpe strategy goes to 15 Sharpe? (That being said, the one reason to avoid turbulent conditions is lack of confidence in the ability of the IT systems to operate correctly. Knight-style operational failure is much more of a real risk than actual market exposure on a correctly working strategy.) I can't speak for GME, but in similar conditions, you'd really be surprised just how non-toxic the order flow turns out to be. Especially on a short-term basis. A lot of the real flow winds up being near random and totally insensitive to the spread, unaware of its own impact, and uncorrelated with microstructure alpha.
Obviously, that's all great for market makers. My theory is that the vast majority of non-HFT participants simply can't keep up with the market. Either their normal refresh times are too slow or they can't handle the volume of the data. In orderly markets the execution algos used by big institutions are pretty good about managing their realized t-costs. But I think they break down under stress. A lot of participants are shooting orders off stale or even corrupt prices. For a market maker that can keep up in realtime it's almost like having a last-look advantage. GME Posted by EspressoLover on 2021-02-01 17:44 > They can go back to normal trading and they're holding shares in a stonk which has 20b in cash value. Raising $20 billion seems optimistic. The entire capitalization of the stock is only $20 billion right now. They'd have to attract significantly more than the cumulative inflows that have already arrived from WSB. A lot of the WSB holders bought in at much cheaper prices at the beginning, so we're actually talking about even more than the cumulative inflows. Doubling the float overnight without impacting the price seems unlikely. I could see maybe raising $5 billion in a secondary. Assuming the fundamental value of GME is about $1 billion, shareholders would wind up paying a buck for twenty cents of assets. To be honest, that might actually still be a better value than TSLA, or even AMZN. But at the very least, Tesla and Amazon may be overvalued, but they're ridiculously dynamic and innovative companies. It's a long shot but they may actually grow into their valuations. However, George Sherman is never going to be Jeff Bezos no matter how much cash WSB throws at him. It's probably all moot though, because the SEC is likely to block them from selling new shares at the current price. The same way they did with Hertz. Index trading strategy, part II Posted by EspressoLover on 2021-03-25 05:02 > That means somebody could use your signal to trade at 3:50 and exit at 4:00 to make 15% p.a. with only 10-minute volatility. I believe what's happening is that @energetic is still using the 4 PM signal, but executing SPY at the 3:50 price. That would explain why there's such a sharp drop-off in returns. If that's the case, then the 4-3:50 delta probably represents lookahead bias, and can't actually be captured. If this mostly represents mean reversion in SPY (i.e. buying the dips that are driven by risk-off sell offs), then that might explain why there's such a sharp drop-off when trading a T+0 signal at T-1. > Who in their right mind doesn't want to make 20%/year? Do many people really have better options? I really really doubt it. I understand your frustration. I'm actually a lot less skeptical than most other posters here. (Largely because I looked at something analogous years ago, and found pretty similar results.) If you can actually deliver 20%/year, investors should be beating down your door. Certainly that'd crush most hedge funds out there. But you have a chicken-and-egg problem. Without a track record nobody will give you capital. But without capital, you can't build a track record. Unfortunately, when it comes to credibility, backtests are pretty much worthless. 20%/year backtests are a dime a dozen. The vast majority of capital allocators are not anywhere near as sophisticated as this forum. All the technical statistical talk about careful hyper-parameter tuning and cross validation will just make their eyes glaze over.
I'll second the point made by most of the other posters. Try to boost the Sharpe, either by diversifying across orthogonal return streams or by trading at higher frequencies. For example, I think you could extend this to single-name equities by reconstructing the VIX equivalent index on specific stocks. Even if the alpha is weaker per name, that would give you 100+ orthogonal streams, which should easily double the Sharpe. When it comes to bootstrapping a new strategy, there's a gigantic difference between a 2 Sharpe strategy and a 6 Sharpe strategy. For one, the higher the Sharpe the easier and faster it is to validate statistically. Even at 2 Sharpe, it will take years before any investor is sure you're actually generating alpha. In contrast, a 6 Sharpe strategy can be turned on with very tight drawdown limits, and you'll pretty much know in a month or two whether it's working or not. But more importantly, a high Sharpe strategy compounds at a way faster rate. That means you can start off with a much smaller capital base and quickly grow into a decent size. At one tenth Kelly sizing, a 6 Sharpe strategy will double invested capital every 3 months. Start with $100k and in three years, you'll have $400 million. (Or more realistically, you'll hit capacity constraints. But the point is, you'll no longer have a shortage of capital or track record.) The higher the Sharpe, the easier it becomes to solve the chicken-and-egg. Index trading strategy, part II Posted by EspressoLover on 2021-03-25 17:57 > I don't quite follow your point about super-fast growth of Sharpe 6 strategy. You mean by leveraging? For a fixed Kelly fraction or drawdown tolerance, the return to a strategy scales quadratically with the Sharpe. E.g. doubling the Sharpe quadruples the return. This happens for two reasons. First, higher Sharpe on the same return stream linearly increases unleveraged returns. Compare S&P 500 returns to a timing strategy that calls the direction, long or short, correctly every day. Second, higher Sharpe linearly decreases the drawdown on unleveraged returns. An intuitive way to think about this is that as the distribution shifts to the right, a smaller proportion of the variance falls below the zero-bound. For a fixed drawdown tolerance, leverage scales linearly with Sharpe. Increasing Sharpe is like double-dipping: you increase both the returns and the leverage at the same time. For example with a 6 Sharpe daily strategy, and a 10-year max DD tolerance of 30%, you'd target 377 bps of daily volatility. That earns you 137 bps of average daily returns. At that clip, you're doubling your capital every quarter. > E.g. I am using VIX(3:55) and futures prices on the same minute as model inputs to generate positions. The very same parameters may give different direction just 5 min later because the inputs are different. Hmm... That's pretty interesting. In this case, I'd definitely take @gaj's advice and try shorting the signal between 3:55 and close. At the very least it should be pretty enlightening. The sizable drop-off can only be explained two ways. One, the market moves strongly opposite to the signal in the tail-end of the intraday session. Two, the signal tends to move sharply in the last N minutes, and this move is highly correlated with overnight performance. In the former case, you can short the signal intraday to capture a high-Sharpe high-frequency alpha. In the latter case, you should be able to improve the performance by conditioning on whether the signal formed in the last N minutes. I.e.
signals that suddenly switch just before close have much stronger overnight alpha. Fitting Model with Latency Posted by EspressoLover on 2021-05-12 18:19 IME mid-price delta is kosher for maker-taker thick books (i.e. sits at one-tick wide spreads most of the day). The convenient part is that it blends pretty seamlessly into mid-frequency where mid-price is de rigueur. You can always validate by fitting the same regression on mid, bid, and ask deltas. My guess is you'll get near identical coefficients. In these environments, 90% of the time the bid will immediately follow the ask and vice versa. There will be a handful of datapoints that are two ticks wide, but these are pretty rare and exert minimal influence. You should generally either fit a separate model for these points, or just set your HFT alphas to 0 when spread > 1 tick. W.r.t. multi-venue latency, are you trading across venues? Or can you assume that you're only actively trading on a single exchange? Even if it's not the latter, is there a way to structure it so that each exchange runs as its own independent strategy with its own independent inventory? This isn't feasible if you need to do something like buy cheap at X then sell dear at Y. But if you can manage it, it simplifies modeling. Assuming you're under that framework, what I would do is train the strategy on its "home exchange". Everything is translated into the home exchange's frame of reference. Market data from foreign sources has its timestamp shifted based on the data delay latency. Home exchange data uses native timestamps. (You still assume roundtrip execution latency when backtesting, but this is just a matter of delaying simulated fills.) This minimizes the messy business of timestamp shifting. At high frequencies the bulk of your alpha will come from native, unshifted data. Of course, you'll want to rerun the training across a wide range of plausible foreign data delays. That'll give you an idea of how sensitive your strategy is to market data latency. Hyperparam tuning of large models for timeseries Posted by EspressoLover on 2021-08-02 15:15 I think there are two related, but ultimately different factors at play here. One is training error. How much fit decay you'd expect even if the underlying probability distributions remain exactly the same. Two is regime change. How much the probability distributions drift over time. Cross validation is ultimately about the former, not the latter. Regime change is a lot messier of a beast to tackle. The way I like to get a handle on it is to fit a rolling model, then quantify the rate that it decays from month to month as you move away from the training period. You also want to make some effort to optimize for an ideal fit period. You're trading off regime recency from shorter training periods against lower training error from the larger datasets that come with longer periods. Somewhere in between is the sweet spot. Finally, one thing to keep in mind in regime-driven environments: you want to bias towards robust local minima in a neighborhood of flat gradients. Think of a fitness landscape. This is the difference between gently rolling hills and narrow spikes. Regime drift means you're living in a perturbation of your fitted point. So you want to find a point with a good neighborhood. Similarly, when regime drift is prominent, saddle points can be particularly nasty. This becomes a bigger problem on higher dimensional models. Ostensibly they'd seem okay, because perturbation can actually improve the performance.
But practically, they make a model behave unpredictably. A saddle point can look regime-robust for a long time, until the perturbation hits the wrong dimension, and all of a sudden you see performance collapse seemingly out of nowhere. DeFi and decentralized market making Posted by EspressoLover on 2021-08-03 20:59 The problem is that since the advent of Flashbots, the miners are reliably extracting 90%+ of the profits in those trades. Prior to widespread Flashbots adoption, you'd win those opportunities in priority gas auctions (PGAs). Most times, blocks would print well before the PGA price hit economic breakeven. There was a lot of room for technically sophisticated players to net well above their gas spend. Now with Flashbots, competition is a lot flatter. It's a sealed auction that goes through an easy-to-onramp API instead of the standard p2p mempool. There's a lot more competition, and most people bid somewhere close to the breakeven price. Consequently, it's not anywhere near as profitable as it used to be. I haven't looked into it directly, but I think there are still opportunities in BSC and Polygon, where Flashbots (AFAIK) hasn't been adopted by the validators. The other thing is, almost all the shitcoin buys are front-run, but virtually none of the sells are. That's because nobody actually wants to carry inventory. I think there is room for quants who know how to manage portfolio risk on this side of the equation. Next generation of traders Posted by EspressoLover on 2021-10-06 20:49 Selection bias. You're not seeing all the underperforming hamsters who were fed to the lab Boa. Recs for options backtesting Posted by EspressoLover on 2021-10-06 20:51 OLTP databases very rarely make sense for research. Data is WORM; there's no reason to pay the operational and computational overhead for ACID compliance. I think (gzip) flat files are a pretty good starting point. No reason to go fancier unless you can identify a specific advantage or already have the infra set up. If you need to index over multiple dimensions just duplicate the data sliced multiple ways. Storage is pretty cheap. Hiring woes for a small shop Posted by EspressoLover on 2015-01-11 19:53 One thing I haven't seen mentioned in this thread: how much are you reaching out, post-interview, pre-acceptance? As someone who was in the position of an applicant coming out of school not too long ago, that made a big difference. I can guarantee that recruiters from PDT, HRT, and Citadel are not just sending the offer letter and waiting for a response. Typically they're posing as the candidate's friend, having long conversations about their offer options, taking people out to lunch, getting high-up people in the company to get on phone calls with the candidate, and flying people back out to visit for the day. These types of approaches may be doubly important if you're a small shop. From the perspective of a new graduate, they have very little understanding of the industry landscape. At least they've heard of HRT. It may take effort getting them comfortable with your firm. There's also going to be a lot of questions, ones which most people will be too intimidated to ask during the interview process. Even if you're willing to just give honest advice about their various options, people appreciate that. My two-cents: Aggressively court candidates after you've extended the offer. Maybe even hire a full-time recruiter from one of your rival firms. Managed Futures Programs: RORs at 10% - is that the limit?
Posted by EspressoLover on 2015-11-07 00:49 CTAs which consistently put in over 1 Sharpe either raise assets so quickly that performance falls, or the principals' personal wealth is more than enough to fill the capacity and they convert to prop. Medallion is the archetypical +2 Sharpe CTA. Gold? Posted by EspressoLover on 2015-11-15 23:22 > Capital needs to be allocated to enterprise; that's the nature of capitalism. Is it your considered opinion that everyone should switch to tick whoring instead?! How does the price of gold affect the allocation of capital to any enterprise? Besides gold miners of course. Whose only purpose is to make more gold to idle in vaults. Gold? Posted by EspressoLover on 2015-11-16 09:25 Sure, that makes complete sense in a world where TIPS don't exist. But we already have assets that directly track inflation in a much more straightforward way. Nobody cares about the gold market, besides people trading or mining gold. I doubt there's a single major banker or treasurer in the world that's getting an inflation forecast from the price of gold. By definition, there are people willing to wager at a given price on the Patriots winning the Super Bowl. That doesn't mean those odds have anything to do with capital allocation. The marginal buyer of gold isn't an asset manager hedging the risk of inflation. It's Peter Schiff-Glenn Beck paranoids who don't trust the gub'mint, think the Federal Reserve is run by the Elders of Zion, and have successfully predicted 157 of the past 2 cases of hyperinflation in OECD economies. Or it's third-world gangster oligarchs who are trying to hide and smuggle their wealth, or just plain don't trust financialized assets. Gold may have some loading on inflation expectations, but the vast majority of the variance is a bet on whether the aforementioned lunacy increases or decreases in the near future. Which is why you get spikes in the price of gold during scary financial crises like 2008, even though they precipitate frozen credit markets and deflation. After TIPS were created almost all the adults at the table abandoned the gold markets. There's almost certainly no mutual information to be found in gold prices that isn't found in TIPS spreads. Gold? Posted by EspressoLover on 2015-11-18 08:16 > But the vix related party seems crowded FWIW, if you hedge out the short beta exposure, long VIX has had consistently positive returns since the curve normalized at the end of 2012. Gold? Posted by EspressoLover on 2015-11-20 01:03 There's pretty strong evidence that contango in commodity markets predicts abnormal returns, even when naively implemented without any fundamental flows. See figure 4. Data Snooping Posted by EspressoLover on 2015-12-01 01:41 > I saw methodology for measure data mining bias by comparing the real performance to performance on bootstrap time series(X1000). I read the blog links. It seems like a mighty painful way to sub-optimally re-implement cross-validation. Data Snooping Posted by EspressoLover on 2015-12-01 08:47 Both approaches are estimating the magnitude of generalization error from test-set variance. Cross-validation is doing it by training with real data, then comparing that to error found in real, but unfitted, data. The random-bootstrap approach is doing it by training with real data, then comparing to training with random data. The former approach is superior for a number of reasons:
1) You need a way to reliably produce random data that's sufficiently similar to real data.
The source blog suggests bootstrap, but that's not robust. There may be inter-temporal structure to the data that isn't due to training set variance, yet still causes overfitting. E.g. imagine at short time horizons returns tend to mean-revert in real data. You could overfit on this data by finding a signal that buys securities with consistently >X% mean-reversion. This could purely be noise in the training data, producing positive returns in sample. Yet the random bootstrapping could still show much lower performance because at any given time there'd be a much different distribution for recent mean-reversion in randomly shuffled data.
2) Comparing real to random only tells you if your system did better than one based on totally pure randomness. That's all well and good, but there's a world of difference between knowing that your system isn't 100% data-mining bias, and knowing how much of your backtested performance is attributable to it. That's important not only for setting risk-tolerances and transaction cost thresholds, but also for comparing different parameterizations of a strategy. If Strategy A uses fewer free-parameters than Strategy B, but has lower performance, knowing that they both out-performed random doesn't help you pick. With cross-validation you can directly compare in out-sample space.
3) If you're dealing with any sort of non-convexity in training, then the random-real comparison is vulnerable to multiple equilibria. Say it's completely data-mined: how do you know the real parameterization didn't just stumble into a better than average optimization basin? In that case it will look a lot better than most of the random comparisons. Yet the system is still junk. For K>=10 CV, all the in-sample parameterizations are highly likely to be very similar. That makes it easy to reason about their out-sample performance. Moreover, even if they're not, cross-validation will still reveal data-mining bias even if there are highly-biased but infrequent basins.
4) NFL theorem tells us that we have to be giving up something for cross-validation over real-random comparison. It's true that CV requires us to "sacrifice" 1/K of the training data. In this case CV overestimates generalization error because it trains on less data than we would when training on the entire set. But if you're using K>=10, it's most likely that your learning curve is basically flat at 0.9N samples for any sensible system.
Data Snooping Posted by EspressoLover on 2015-12-02 21:46 The problem with block bootstrap comes when you think about how big to make the blocks. The limiting case would be blocks the size of the total number of observations. In that case you'd just have a single block, and "randomly reshuffling" would simply mean keeping the same single block in the same place. So all the random sets would be identical to the real set. In that case the performance of the real and random sets would always be identical, and you'd reject the system even if it was actually good. Now let's say you have two blocks: your random datasets would simply be the original series and the original series with the first and second half reversed. Most trading systems would probably perform nearly identically on both scenarios, so again your random tests are likely to match or even slightly outperform your real set. What if you have ten years of data and you break your blocks down by year?
Say it's a system using 1-minute bars: the fitted trading signals will be nearly identical aside from a handful of discontinuities around the (re-shuffled) New Years. Again you're very likely to falsely reject a good system. In fact this is a potential problem for any block-size above 1 observation. The random-reshuffling of points tests for data-mining bias when you're using a price series' history to predict the future returns. If Y's your dependent variable (next period return), and X are your independent predictors (derived from the price series up until the present), then reshuffling every observation assures that the joint-distribution of (Y,X) becomes independent. If a fitted model successfully predicts a variable using statistically independent variables, you know it's due to data-mining bias. But if you use block-size above 1, then (Y,X) isn't guaranteed to be joint-independent. At least for some observations it's possible that the independent predictors are still attached to their original dependent variables. E.g. say the "right" system is to always buy when the last period has fallen by 1% or more. Even for just a 2-block, half the observations remain identical even after re-shuffle. While blocking may help fix the issue of falsely accepting a system due to differences in training set variance between random and real, it introduces the potential to falsely reject systems due to retained dependence in the comparison sets. Again my recommendation would be cross-validation. To test if a system is 100% data-mined, compare the mean squared error of the out-sample predictions to the variance of the out-sample returns. If the difference between the two is statistically significantly negative, you can reject the null hypothesis of 100% data-mining bias. On the flip side, you can also compare the in-sample mean-squared error to the out-sample mean-squared error. If the difference between the two is statistically significantly negative, you can reject the null hypothesis of 0% data-mining bias. Bluecrest Posted by EspressoLover on 2015-12-03 01:51 > When managers decide to make money for themselves, all of a sudden this 10% volatility objective is... gone. Principal-agent problem? Without transparency into the strategy an outside investor can't be sure if a sizable drawdown is due to normal variance or *super bad stuff*. When the manager's getting paid on OPM, he's not going to be so forthcoming admitting that the strategy broke. Lesson from Bluecrest: Be very wary of investing in hedge funds with internal prop groups. Would you eat in a restaurant knowing the chef makes completely different dishes for the owners? Bluecrest Posted by EspressoLover on 2015-12-03 23:54 @goldorack > A sizable drawdown depends on volatility. If target volatility is fixed at 30%, drawdowns will just be proportionally higher, nothing else. I completely agree with this, but you assume that outside investors can transparently observe the fund's target volatility. You could ask the manager, but there's nothing guaranteeing a truthful answer. If investors set drawdown targets as X% of targeted volatility, managers would be highly incentivized to inflate the number. Investors might penalize biased vol targets, but in most cases it would take years of realized returns to reveal manipulation. Plus you're using the vol target to derive a p-value of catastrophic failure. Conditional on that, it's highly likely the manager is an idiot or charlatan.
In that state of the world his reported vol target isn't worth the glossy paper it's printed on. You could observe historical returns, but how do you know the current strategy is consistent with history? How do you evaluate regime change? If it's a managed account, you could look at the positions and calc your own portfolio vol estimate. But that's beyond the capacity of most FoFs/pensions/HNWs. Plus it can always be gamed by the fund. Absent the ability to reliably ascertain a fund's ex-ante volatility, a simple and prudent heuristic is to assume the average of the universe of all funds. Funds that exceed average vol operate with a higher probability of stop-loss, because they have no way to reliably authenticate a non-catastrophic drawdown. Funds that undershoot average vol are leaving incentive fees on the table. Pretty much every fund converges to the same vol target. As more funds target the same level, there's even less alpha to be reaped from evaluating the few strange ducks that don't conform. The heuristic becomes an iron rule. There's nothing magical about 10%, it's just a path-dependent Schelling point. Machine Learning when train/test split is not random Posted by EspressoLover on 2016-01-14 00:57 > What can you do if you suspect that you have train and test data that come from different distributions. Cross-validate the training set. Compare the distribution of CV-error to the test set error rate. If you reject the null hypothesis that test error is equal to CV-error, then your suspicion is correct. If it is, you'll also get an idea of the magnitude of the test/train difference. E.g. maybe CV-error is 0.01 and test set error is significantly different, but only at 0.0101. Sure the test set is from a slightly different distribution, but the overall impact on your model is trivial. And you can probably just ignore it. > What techniques can you use to find the differences and to build a model that will generalize to the test set? I'm making the following assumptions here:
1) The test set draws from a different but related distribution.
2) The test set is small relative to the training set, so throwing out the training set and just training/testing with a sub-divided test set is infeasible.
You want to exploit the statistical power of the training set, while still accounting for differences. The solution is to train a model in the training distribution, then incrementally update it using the test distribution. For simplicity let's say you use regularized SGD:
1) Divide your datasets into the following 3 sets: the original training set (X), a sub-set of the original testing set used for training (W), and a sub-set of the original testing set used for testing (V).
2) Select regularization and training hyper-parameters for X using cross-validation; call this Lambda-X.
3) Using Lambda-X optimize parameters on the entire X set: Theta-X.
4) Using CV on W, select hyper-parameters for SGD when initialized at the point Theta-X. Call this Lambda-W.
5) Using Lambda-W, and initializing at Theta-X, optimize parameters on the entire W set: Theta-W.
6) Apply Theta-W to V to evaluate true test error.
In short the above works because the optimal parameters for the training and testing distributions aren't identical, but probably do live in the same neighborhood. Just training with the small test set doesn't give us much discriminatory power in finding optimal parameters. But if we train on the training distribution we can get somewhere close to the optimal testing distribution parameters.
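A bare-bones sketch of the warm-start idea, hand-rolled L2-penalized SGD so it stays self-contained (x.train/y.train, x.w/y.w, x.v/y.v and the lambdas are placeholders you'd pick via the CV steps above):

# Linear model fit by SGD with an L2 penalty; theta0 lets us warm-start
sgd.fit = function(x, y, lambda, lr = 0.01, epochs = 10, theta0 = rep(0, ncol(x))) {
  theta = theta0;
  for (e in 1:epochs) {
    for (i in sample(nrow(x))) {
      grad = (sum(x[i, ] * theta) - y[i]) * x[i, ] + lambda * theta;
      theta = theta - lr * grad;
    }
  }
  theta;
}
theta.x = sgd.fit(x.train, y.train, lambda = lambda.x);             # fit on the training distribution
theta.w = sgd.fit(x.w, y.w, lambda = lambda.w, theta0 = theta.x);   # warm-start update on test-distribution points
mse.v = mean((x.v %*% theta.w - y.v)^2);                            # honest error on held-out test points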
If we start near that point, we don't need a large dataset to find it. Machine Learning when train/test split is not random Posted by EspressoLover on 2016-01-18 20:30 The problem with your current approach is you're using unsupervised learning to determine the reason for model performance disparity. If it's like nearly any other financial return model, your R-squareds are well below 5%. By definition the vast majority of statistical structure in the feature distribution is irrelevant to the model. The unsupervised approach is like trying to find a needle in a haystack. I think you'll get better results with a direct supervised method. Apply the training set random forest to predict the test-set. Take the residual between the predicted dependent variable and the actual dependent variable. Call that ResidualTest. Now, train a new random forest using the original prediction as well as the independent variables on ResidualTest. That new random forest will tell you which of your features are driving the difference between your original model's performance on training and test data. By including the original predictions you'll also quickly discover if the results are driven by specific interactions. It may be as simple as the model sucks when feature Z < 0, and the test set happens to have lots of points with Z<0. Machine Learning when train/test split is not random Posted by EspressoLover on 2016-01-19 06:02 Ah, sorry I misunderstood the situation. So, you have full features for a large test set, but accessing the dependent variable on any given point in the test set is expensive? Consequently you have a much smaller uncensored test set? Throwing some ideas out there (in order of least pain in the ass to most):
1) Are you sure that test set performance is actually worse, not just an artifact of noise from only having a handful of test-points? I'd make sure that you can actually reject the null with statistical confidence. If you take the uncensored test-set points and regress Y on Y-hat, does the t-stat indicate a significantly different coefficient than training set out-sample/bag? (As an aside, are you un-censoring random test set points? If not, are you sure the way you're picking points isn't introducing bias?)
2) Instead of comparing the means/variance/covar of the underlying feature set, you should do that for the predictions of the individual trees in the forest. That's a better projection onto what's directly relevant in the model. It could be that certain trees are behaving much differently in test. If so, you can isolate a sub-forest that represents the screwiest trees. From there, compare the relative importance of variables in the sub-forest to the entire forest. This approach might be easier if you re-train the original model with boosting, since the individual trees will tend to be more independent of each other.
3) My impression is that you can select additional points to uncensor, but can only do so at a slow or limited rate. If your goal is to ultimately adjust the model to work on the test set, then you should think of a smart way to pick subsequent test set points. I'd try to employ some type of online algorithm that attempts to predict ResidualTest. Initialize with the limited test set points you already have. Based on the current state of the model identify the censored points that are predicted to have the largest discrepancies. Un-censor those points which are likely to have the most influence. Then iteratively train on the next batch, re-rank, and repeat.
Off the top of my head I'd suggest SGD with X being a combination of the original feature set and individual tree predictions. how shall I confirm which strategy is better according to their capital curves? Posted by EspressoLover on 2016-01-26 04:47 They look nearly identical. Use the strategy that's least likely to be overfit, i.e. fewest free parameters or simplest logic. prime broker for a start-up? Posted by EspressoLover on 2016-02-16 19:34 Since we're on the topic of IB as a quant prime broker... Does anyone know why (or a way around) they cutoff MOC order submission for NYSE ARCA at 15:45? NYSE ARCA allows MOC submission until 15:59, and most decent brokers seem to support that. This is a major-ish pain in the ass if you're doing any sort of overnight stat arb on US ETFs. Forex Limit Orders Posted by EspressoLover on 2016-02-24 20:31 > For limit orders, I am assume there is no slippage and I get my fill at the price I set. It doesn't work that way. Say the market's bidding 100 and offering 101. You're trying to buy and join the bid at 100. If the market suddenly goes up the price moves away, and you'll have to re-adjust. If the market suddenly drop, the price moves through you and you get filled. If you used a market order, you'd pay the spread, but avoid missing out on upside. With limit orders you participate in downside moves and miss out on some upside moves. Limit orders are definitely not a free lunch. They also depend on how sophisticated your execution algorithm is. Joining a large bid has a higher expected cost than joining a small bid. Being quick about adjusting your limit orders on price changes gets you executed sooner. Things of that nature. In practice passive execution isn't significantly cheaper than aggressive execution. If it was liquidity providers would tighten their quotes until equilibrium was restored. Forex Limit Orders Posted by EspressoLover on 2016-02-25 22:49 Yes, that's true. Fair point about the mean reversion. Forex Limit Orders Posted by EspressoLover on 2016-02-26 02:46 The random walk model still qualifies as "adverse selection" in the philosophical sense. It's true that In a random walk, no trader is informed about the long-term drift of the price. When I put in a limit order, I'm uncertain about whether the next trade is going to be a buy or a sell. In contrast, the trader who submits the next market order does know this piece of information. He knows it by definition, since his own actions determine it. (And he is, again by definition, positioning himself to capitalize on it). The nature of a continious double-sided auction is that passives wait for crosses, and aggressors create those events. Since the timing and direction of crosses is itself relevant information, the participants that precipitate them have an innate informational advantage. This is particularly true in a random walk model, where only the endogenous activity of the book influences the price. Forex Limit Orders Posted by EspressoLover on 2016-02-26 13:27 Yes. Exchange-supported order types are internally processed inside the gateway and should guarantee atomicity. (Actually this isn't always the case, depending on exchange and order type, but it *should* be the case. If it's not, it's because the exchange is incompetent and/or dishonest in some regard.). You can submit an IOC, and the matching engine will process the cancel immediately after evaluating the order for matches. You could always use a "synthetic IOC" by handling the cancel logic client-side. 
But then you're sending two distinct messages. It's possible that a third-party order could cross your resting quote in between. In practice this is really not an issue unless you're doing high-frequency trading. Most of the time the time that it takes for an order to move from the back of the queue to execution is orders of magnitude longer than execution latency. This is an issue for HFTs because they tend to cluster their activity around events, like new level formation, when markets tend to move very quickly. Vol arb or short vol CTAs Posted by EspressoLover on 2016-02-05 01:45 I don't think 20 sigma drawdowns are characteristic of a well-designed volarb program. A lot of people may sell naked puts and call it "volarb", but a lot of people also call Michelob Ultra "premium beer". Forex Limit Orders Posted by EspressoLover on 2016-02-25 21:43 @momoop I think your next step is to break down the average return to trades by horizon. You're measuring that your trades are making X pips when held to the next trading event. But you should be aware of what their average returns are between when you enter and the next {1 second, 30 seconds, 2 minutes, 10 minutes, 1 hour, 4 hours, 1 day}. Do the trades pretty much realize those 2 pips in a very short period, and then you just sit on the position for a long time until the next tradable event? Or do the trades move in that direction very slowly over a long time? If it's the former, then you just have a high-frequency signal that happens to trigger very infrequently. That means you're competing against high-frequency traders, and that's a tough game. If you want to go down that route, you're going to need to push a lot more volume to pay for the infrastructure and overhead that you'll need. You'll have to figure out how to get your trading signal to trigger a lot more often. If it's the latter, then you have a long-term signal, that just happens to be very weak. If this is the case it's a lot more likely that the signal is simply an artifact of statistical noise or overfitting. You're going to have to get more rigorous actually demonstrating statistical significance. Once you're rejected that null hypothesis, you need to figure out how to take the underlying economic logic of the tradable signal and refine into something with a larger magnitude. @ronin I'll add a caveat. A random walk does involve adverse selection, but in the absence of informed trading the cost of that adversity can never exceed the expected benefit of capturing the spread. In a pure random walk a limit order always has non-negative expected value @svisstack, try it out for yourself with this R-script: #!/usr/bin/Rscript # Price is a random walk of $0.01 uptick/downticks price.tick = ((runif(5000) > 0.5) - 0.5) * 2; price.series = 100 + cumsum(price.tick)*.01; is.next.up = c(price.tick[-1],F) > 0; # Buy limit orders always fill if price moves down through the order. # If price ticks up assume limit order fills for some finite odds prob.fill.for.price.away = 1/10 prob.buy.limit.fill = ifelse(is.next.up, prob.fill.for.price.away, 1) did.buy.limit.fill = (runif(5000) # Compare unconditional forward returns vs. returns when limit order # is filled. 
price.series.fwd = c(price.series[-(1:10)], rep(tail(price.series, 1), 10)); price.fwd.return = price.series.fwd - price.series; avg.return = mean(price.fwd.return) avg.return.if.filled = mean(price.fwd.return[did.buy.limit.fill]) slippage.cost = avg.return - avg.return.if.filled print(slippage.cost) *Edit: Thank you il_vitorio, for pointing out a script error that I made in the original post. numerai Posted by EspressoLover on 2016-05-06 22:19 I'm pretty skeptical. I'd give this fund less than a 5% chance of having over $500mn AUM in 5 years. The business proposition is have a bunch of quants independently work on signals then combine them together. Except that's what Two Sigma already does. And it doesn't need to encrypt its data because its has non-competes and deferred comp. Lack of domain knowledge is still a major handicap for almost any problem. Even with bleeding edge machine learning, handing some smart people a bunch of unlabeled features is not a recipe for success. I'm sure in some sense Numerai's signals will have some marginal information to the rest of the market. But in quant trading the relationship between signal strength and profit is extremely non-linear. If you can achieve the same R-squared as the best participant, you have a multi-billion business. At 90% you have a multi-million dollar business. At 75% you have nothing. Numeral may be able to harvest some unique signals. But there's no way that some smart hackers with no market knowledge will independently develop anything close to the bread-and-butter alphas which have basically been refined and passed back and forth between quant shops for the better part of two decades. Which I think speaks to why RennTech is interested. As a standalone trading vehicle, they're untenable. But invest a couple million in a managed account where you can track their positions in real-time. (Or maybe even trick them into revealing the raw signals). You've just leased a potentially orthogonal alpha stream for Medallion. At 2/20 this costs you less than the annual comp of the receptionist. numerai Posted by EspressoLover on 2016-05-09 04:05 > Your sentence is totally unintelligible to me. What do you mean by "signal strength"? R-squared on what? What is a "best participant"? Let's say you have a backdoor into RennTech's net alpha stream on some strategy for some stock. However the data comes over a noisy channel. Half signal, half noise. You have 50% of RennTech's R-squared (with no marginal information) on whatever the hell dependent variable they fit. (1 minute mid-price returns wouldn't be unreasonable). After shrinkage, your signal averages root(1/2)'s RennTech's average magnitude. How much money can you make? To a first approximation: zero. Definitely not half what RennTech makes. Why? Because you have significant adverse selection. Every trading opportunity where ||Alpha|| > TCosts, falls into one of three categories: 1) RennTech's alpha does not exceed TCosts, and yours only does because of the noise. In which case the trade is negative EV. 2) Both of you and RennTech want to trade. It's positive EV if you get filled. But even if you have the same non-deterministic latency (which you don't), you only get filled 50% of the time. And these types of trades just don't happen that often. RennTech doesn't need to shrink their alphas, so it's likely that they trigger well before you. Your only real opportunities are big discontinuous jumps. That game's about latency, not learning. 3) Both you and RennTech have positive EV alphas. 
But RennTech doesn't want to trade for inventory related reasons. Exceedingly rare. RennTech has for all intents and purposes infinite money relative to Medallion's capacity. In this toy model, the vast majority of your trades are negative EV. Medallion's alphas are worth billions, but 50% of Medallion's alphas are worth nothing. And this toy model is very close to what Numeral is dealing with. There only hope is to data mine marginal information that others don't have, while keeping non-marginal information out of their models. That's just not going to happen. Almost all the major sources of alpha are pretty much solved. Discovering something that RennTech, Citadel, Two Sigma, KCG, HRT, et al. don't know is nigh impossible. (And whatever is new to discover, is almost certainly not a well-defined feature in a "curated dataset"). Collectively the major quant shops have near perfect LOB alpha, near perfect residual reversion, near perfect order flow models, etc. Even being wildly optimistic Numeral may get 2-3% marginal information relative to any major traders, and capture 80% of the non-marginal information. But without anything special on the monetization or infrastructure side, that simply doesn't mean squat. Brexit Posted by EspressoLover on 2016-06-23 10:46 How to pretend like you give a shit about the election. Brexit Posted by EspressoLover on 2016-06-24 12:40 I feel like these things almost always overreact. A lot of window dressing. Nobody wants to be seen holding GBPUSD, FTSE or short vol tomorrow. Either by their boss or investors. "You've lost a lot of money, now do something for gods sake!" For some reason it looks worse to be down 7% with an open position in GBPUSD than it is to be down 10% and flat. Brexit Posted by EspressoLover on 2016-06-28 00:43 @nomaly Two weeks ago the VIX had an unprecedented move up relative to very little action. That was driven by huge volume putting on downside hedges to prepare Brexit. Today was an unprecedented down relative to the S&P falling so much. I think today was the inverse move of those hedges being unwound. In general, this month the VIX has been behaving very weird. Of course there's plenty of vol and uncertainty in the historical record, but I don't think the market has ever been so hinged around a single pre-scheduled binary event. Hopefully things are kind of back to normal. Brexit Posted by EspressoLover on 2016-06-28 01:17 @chiral3 Makes sense. A lot of these weird moves seem to unwind after-hours. Even today, VX futs shot up like a sling-shot between NYSE and CBOE close. Over the past couple weeks market irrationality seems to peak everyday between 3:45-4:00. That's pretty consistent with the hypothesis of structural and program flows driving strange behavior. Why do option MMs misprice options intraday?! Posted by EspressoLover on 2016-08-19 17:19 W.r.t. overnight returns: The effect is highly entangled with the earnings announcement premium. The majority of earnings surprises are to the upside. Management mostly guides expectations to under-promise and over-deliver. Since earnings are announced after-hours, the premium primarily accrues overnight. If you restrict the index to non-announcing firms, the overnight premium is significantly reduced. Why do option MMs misprice options intraday?! Posted by EspressoLover on 2016-08-18 23:22 It's not unusual for risk factors' to have their returns concentrated either intraday or overnight. For example the market factor and momentum primarily accuse overnight. Size and value: intraday. 
(See here and here). The reasons involve a variety of structural issues. Both microstructure and order flows behave qualitatively different in the morning versus afternoon. For a lot stat-arb desks this kind of stuff is their bread-and-butter. Right off the bat with your findings, we know that the returns to equity vol are inversely correlated with the returns to the market factor. Since the market is significantly more positive overnight it's not surprising that vol positions decay more during that period. I'd test for that first, your findings simply may be a proxy for that effect. Second, I wouldn't assume that this means that OMMs are unaware of the fact, or even exposed to it. Market makers everywhere set prices, that doesn't mean that all market anomalies are the consequence of their ignorance. I guarantee you Virtu and KCG have heard of equity momentum. Yet the effect still persists. Electronic market makers keep very small inventories relative to flows. They may adjust prices on the margin to bias their inventory with a tailwind. But they're not taking large enough positions to materially arb away anything longer lived than a few minutes. OMM's portfolios are generating significantly higher risk-adjusted returns then your findings. It wouldn't be worth it to waste capital chasing the overnight effect. Instead prices reflect the equilibrium that keeps OMM inventories nearly perfectly flat. If the anomaly does exist beyond statistical noise, it's almost certainly due to real flows, not OMM mis-pricing. Why do option MMs misprice options intraday?! Posted by EspressoLover on 2016-08-19 23:14 In practice, it's pretty common to use 10:00 AM continuous trading prices. Liquidity is atrocious at opening auctions. Most of the common overnight effects still hold up if you measure this way. Closing auction's pretty fair though. Liquidity's thick, adversity's low and market impact tends to be smaller than continuous trading. Not many studies on auction impact (best I know of is here). But in general, estimating market impact at auctions is easy compared to continuous trading. If you have the order data, you can easily derive the change in auction price from adding an additional order. The only empirical question is the market response to the imbalance broadcast. But that's straightforward to regress. Why do option MMs misprice options intraday?! Posted by EspressoLover on 2016-08-23 22:34 @murfury I finally got a chance to read the entire paper. My overall opinion: Two thumbs up! I'm impressed about how thoroughly you guys investigated the various angles. Couple quick points that stick out to me: 1) FTA: "A delta-hedged option portfolio has zero delta and thus zero beta". This statement is not true ipso facto. Because of the leverage effect, equities IVs have negative correlation to the underlying. Your paper actually even directly refutes this in Table 11. The delta hedged index-option portfolio still exhibits statistically significantly negative beta exposure. Not to be nit-picky, since you still do find a significant anomaly, even with the beta adjustment. But generally be careful about making assumptions about delta-neutrality implying beta-neutrality. The other consideration may be to evaluate the index-options using 16:00 mid-quotes as an implied close. The results probably are not substantially different, but it could have an impact. This sticks out, because you're not finding an overnight equity premium in the futures, whereas others are in the ETFs. 
The difference in closing time may be the issue. Also a non-negligible proportion of earnings announcements occur between 16:00-16:15. 2) W.r.t. to client flows, I may be missing in the paper where you tested. But as far as I can tell, your measures of order flow suggest that it *does* drives the effect. In Table 7, the pre-close 5th period seems to have the most long-option order flows. In 3 of the 4 categories (besides equity puts), order flow imbalance is strongest towards long-vega during this period. Similarly the beginning of the day seems to be the most imbalanced towards selling options (i.e. selling vega). Assuming that OMMs are mostly sitting on the other side of client-initiated trades, this is consistent with a flow driven explanation. "Real" traders seem to bias going long vega near the end of the day. This bids up IVs and options prices at the close. Options become systematically too expensive to buy at the close. Vice versa for the open. Overnight returns become a repeated game of buying high and selling low. 3) W.r.t. to earnings announcement. You may be able to get a rough estimate of its effect by using a proxy variable in your robustness regression. E.g. number of companies in the index announcing earnings that night. Why I trade & the future Posted by EspressoLover on 2016-08-28 18:59 Neutrino beams are old hat. Infinite improbability drives, powered by social media block-chains, are now de rigueur. numerai Posted by EspressoLover on 2016-09-13 00:03 > data work can be done cheaper abroad. so, this model has legs. Maybe. But if that's the value proposition, Numeral is a hell of a way to get to that point. Opening offices and hiring researchers in Bangalore, Buenos Aires and Belgrade seems like a lot less complex way to take advantage of that. Throwing in all the SV buzzword crap like "distributed research", "cryptographic anonymity" and Bitcoin is just introducing a lot of untested points of failure into the core business model. For that matter, there's no barrier stopping Akuna or AQR from already shipping quant work overseas. For the most part they don't. To the extent these firms have overseas offices, they're almost always either for back-office software or front-office work exclusively dealing with local markets. So I'd be pretty suspect about how much untapped global supply of cheap talented researchers there actually are. Alpha and trading research tends to be way more complex and have a way higher quality threshold than most technical work. But third-world off-shoring struggles to produce acceptable results on even basic software projects. Again, I think there's a reason that Bangalore isn't a quant powerhouse. And I don't think Numerai will be changing this anytime soon. Stripping down the robo-advisors: sparrow-brains inside Posted by EspressoLover on 2016-09-18 01:12 W.r.t. condition correlation, in most markets this is almost always an artifact of high market volatility relative to idiosyncratic vol. Correlation increases monotonically sub-linearly with both beta and the ratio of market vol to idio vol. Market volatility is much more regime sensitive and varies to a much greater degree than average beta or average idio vol. I feel like most of the time the intuition behind the phrase "correlations go to one" is wrong. It really doesn't have anything to do with stocks moving more in response to the market (beta). It also isn't stock-specific information becoming irrelevant (idio vol). 
Rather the noise from market volatility rises to the point where it's overwhelming the non-perturbed stock-specific information. Saying that diversification stops working is mis-leading. Diversification still reduces risk by the same magnitude both in high and low correlation environments. But in the former high market volatility makes this risk reduction seem small in comparison. It's a little like saying that aspirin stops working when you're shot in the knee cap. Aspirin still delivers the same amount of pain-relief, but you have bigger problems than just a headache. Prop firm expected RoR on margin capital Posted by EspressoLover on 2016-09-23 23:38 (1) This is a pretty typical example of an equity market neutral fund. 4:1 leverage for an overnight beta neutral strategy in liquid developed market securities. Obviously depends on the specific risk characteristics. If you or a friend have an IB account with over 100k, you can check their portfolio margin calculator on some sample portfolios. Prime brokers probably will give you better margin, but IB should be around the neighborhood. Depending on your investor though, they might have more stringent leverage restrictions than your prime broker. I wouldn't really recommend advocating for anything much north of 4:1 on an untested strategy, it's not necessarily a good look. LENR anyone? Posted by EspressoLover on 2016-09-27 16:19 I like to pretend that somewhere on the Internet there's a particle physics forum, with a 5 page thread of random speculation about Fibonacci charts and dollar cost averaging. Why is the quant community skeptical about sentiment models? Posted by EspressoLover on 2016-10-02 22:39 Because talk is cheap. We already have a very good signal to gauge sentiment, it's called market activity. And unlike twitter, it involves a pretty credible commitment to your beliefs. If Alice is betting millions of her own money against thesis X, and Bob writes a nice little story defending X, who do you trust? When sentiment works, it's usually because it just re-confirms existing market activity. Sentiment without market activity usually means nothing. E.g. Facebook made a cool robot that everyone's talking about, but isn't going to move the bottom line. In contrast market activity, without visible sentiment, is usually still very important. There's plenty of very well-informed participants who trade without uttering a word. At the end of the day, sentiment-funds are competing in basically the same space as traditional stat-arb. I do think sentiment does have some minor amount of mutual information to a good stat-arb system. Maybe 5% optimistically. Most of that coming from interactors rather than direct impact. In contrast the traditional complex of stat-arb alphas probably have 100%+ mutual information to the best sentiment signals. All that being said, sentiment probably does have a place as a small sub-division at major, mature stat-arb desks. But a fund trades purely on sentiment alone has a snowball's chance in hell at competing in anything other than sexy buzz-word marketing. Why is the quant community skeptical about sentiment models? Posted by EspressoLover on 2016-10-06 01:01 > Sentiment is something you feel, not something you count. Disagree. Although I'm skeptical of the ability for (public) sentiment to generate alpha in markets, sentiment analysis itself is pretty proven. State of the art models exceed 80% accuracy. As humans, we're naturally biased to think that our utterances are oh-so-deep and sophisticated. 
In reality, most language isn't really much more than the glorified grunting of great apes decorated with prepositions. There's a hell of a lot more EL James's than George Orwells. The vast majority of actual communication, particularly the spoken or impromptu variety, pretty closely resembles "word salad". Hence why even pretty simple bag-of-word models still achieve high accuracy. Making software like THOR by RBC Posted by EspressoLover on 2016-10-20 02:38 @a* IEX has 50+ employees and $100 million in funding. If we insist on evaluating them like a tech company: that's as well capitalized as Google when it was already serving 10%+ of all search. Surely with these type of resources we'd expect "decent talent" to have produced *something* of merit. Where is all this vaunted technology? Can anyone actually identify any of this supposed technological innovation. At least some contribution, that's an objective improvement, on the prior art. Maybe, I'm missing something here. Because all I see is some seriously broken market data, obsolete protocols, a shitty order router, ludicrously over-priced fees, a bare-bones selection of order types, and some awful performance metrics. Somehow all these bugs get marketed as "visionary" features. What's astounding is that Knight and Citadel, as a side project using the technology of ten years ago, built an objectively superior exchange* at less than 1/10th the cost of IEX. The only tech the pioneering talent at IEX seems to be good at is the reality distortion field. Edit Addendum: *DirectEdge Making software like THOR by RBC Posted by EspressoLover on 2016-10-21 00:12 @a* Fair enough. I really respect your knowledge of the industry and judgement of competence, so I'm happy to concede the point about their tech talent. But damn, if it doesn't seem like a waste. It reminds me of Microsoft's Windows team, so many brilliant people wasting so much effort on an irredeemably broken product. Isn't there something tragic and absurd about building a great tech stack, then just sitting it behind 60,000 meters of cat5 cable? Maybe you are right that it's unfair to criticize with inflammatory language, and normally I wouldn't. But I think IEX as an organization has major ethical culpability. (Which also, you are right doesn't necessarily extend to the rank-and-file devs). I think it's unquestionable that only a small minority of their "real flow" (i.e. not the liquidity provider counter-parties) comes from informed, rational investors. The vast majority is driven by fear-mongering, irrationality and misinformation. Jim Clark made a big investment, then got his buddy Michael Lewis to whore out any semblance of journalistic integrity he had, and scare the bejeezus out of middle America. Now an otherwise unviable exchange exists, because the retail sub-segment of Zero-hedge reading paranoids instinctually click "Route to IEX" on every order. The whole affair reminds me of the anti-vaxxer movement. Just hysteria and propaganda, completely detached from any evidence or research. And like anti-vaxxers, ignorance wouldn't be so deplorable if it didn't have actual human costs. And not just the wasted millions in exchange fees from retail flow that otherwise would have been internalized. But there's a sizable growth in the segment of the population that refuses to invest in equities, the most effective long-term savings vehicle in history. Why? Because 60 Minutes and Michael Lewis "proved" that the market is "rigged". 
Thousands will be condemned to unnecessarily live their elderly years in poverty, thanks to a disinformation campaign orchestrated to buy Jim Clarke a slightly bigger yacht. At least I can take consolation in that IEX as a business seems doomed. Any upward trajectory they had is long gone. Their market share seems to have hit a hard ceiling at 2%, and they've actually been declining YTD. Not surprisingly selling a broken product based on fear-mongering seems to relegate you to serious niche status. If they're stalled at low-single digits, the whole business model is unviable. I see no way at their size how they will continue to afford top-tier talent. (How many rock stars does the Chicago Stock Exchange employ?). No growth means their valuation multipliers have to fall back to Earth. Low and falling valuations mean that stock options aren't a viable form of compensation. Declining compensation means a big exodus of talent. For a company that's built on being the next revolutionary financial "disruptor" , the clock is running out to deliver. But don't worry, Katsuyama has one final "fuck-you" up his sleeve. He's doing his best to ruin the National Market System, by turning NBBO into garbage. If he wants to run the world's dumbest ATS, fine, whatever, go for it. But the 98% of the market that doesn't want anything to do with this absurdity, shouldn't be forced to buy tickets to the freak show. Prices that aren't disseminated with reasonable best effort and efficiency, should never be eligible for protected quotation status. Thanks to IEX's marketing and PR trumping intelligence and common sense, this is no longer the case. Speed bumped are now eligible for NBBO, despite completely contradicting the point of NMS. A price that's subject to last-look, which effectively speed-bump'd quotes are, is in no way directly comparable to a price without the embedded option. NBBO doesn't account for the expected value of being short the embedded option. The whole thing is doomed to play out like Gresham's Law, bad quotes driving out good ones. Exchanges will just keep giving market makers more and more free embedded options, to "improve" NBBO, which will drive flow (and fees) to said exchange. NMS with speed bumps is even worse than repealing NMS. And even if IEX goes the way of the dodo, the genie's out of the bottle. CHX (another low-quality, low market-share exchange) is adding an ever dumber speed bump than IEX. Whatever the problem's with Reg NMS are, we're on the precipice of transitioning to a far worse regime, driven by a tiny, 2% sliver of the market throwing a totally uninformed hissy fit. /rant Making software like THOR by RBC Posted by EspressoLover on 2016-10-21 21:06 "The United States stock market, the most iconic market in global capitalism, is rigged." -Michael Lewis on 60 Minutes Reasonable people can disagree about the effect of HFT or optimal market microstructure. But no reasonable person could conclude A) That Lewis' statement is supported by any more peer-reviewed evidence than "Polio vaccines cause autism" B) That the world's preeminent financial writer, broadcasting that message to 10 million viewers, doesn't broadly discourage stock investment. C) That statement, the accompanying book, and the bulk of Lewis' mindshare in 2014, didn't effectively act as the pre-release marketing campaign for IEX. Making software like THOR by RBC Posted by EspressoLover on 2016-10-20 00:52 Sure, staging orders at co-located exchanges seems like the simple, easy and obvious solution. 
But remember IEX is the "Navy Seals" of financial markets! And remember it ain't easy to be a Navy Seal. Taking on HFT, just like taking out Osama bin Laden, requires some serious tough guy shit. Hardcore methods, like using a really, *really* long network cable. Just using a deterministic software timer would be way too simple. Why only worry about the nanoseconds of jitter in a CPU clock, when we could deal with milliseconds of jitter from inter-site networking. Do you think Navy Seals take shortcuts?! Obviously we know IEX does some seriously advanced stuff, otherwise why else would they charge three times the fees of other exchanges. Brexit Posted by EspressoLover on 2016-11-05 01:09 Well, since I think the US election's market impact is in the spirit of this thread I'll leave this here. Some back of the envelope math. Trump's betting market probability rose from ~14% ten days ago to 26% today. The VIX rose from 13.5 then to 22.5 today. So to a first approximation, the market is pricing VIX at 67.5 in the case of a Trump victory. Even if you shrink the interpolation by 30%, that still implies a VIX close above 47 in a Trump win. Implied vols at that level were only ever hit during the financial crisis and at the very peak of the US sovereign rating downgrade. The market's fear of the election seems... overwrought. At the very least there's there's clear discord between vol market, betting markets, and equity market valuations. At least one of them has to be wrong. Brexit Posted by EspressoLover on 2016-11-09 22:44 The cynic in me is wondering if the forecasts about Trump's market impact weren't largely politically coordinated. Either directly or implicitly. How hard would it be for a campaign to approach ideologically sympathetic sell-side analysts and financial journalists? Encourage the former to publish research forecasting steep declines and volatility if your opponent wins. Then have the latter hype stories on Bloomberg, WSJ, etc. highlighting that research. Those forecasts themselves become more plausible, which increases their coverage, and so on. Like a feedback loop to manufacture consensus. If you could pull it off, it would seem to have a high political ROI. Even Trump's most ardent proponents had come to accept that market turmoil was inevitable, at least for a little bit. In 2012 almost every Fortune 100 CEO who endorsed, endorsed Romney. Trump wasn't endorsed by a single one. At least some of that shift is probably attributable to the expectation of financial chaos. That's a lot of fundraising and "respectable" support to pivot away from the traditionally pro-business party. All of this would sound crazy even a few years ago. But the thing is we already have good evidence of campaigns back-channeling to journalists to influence favorable coverage. I'm not a fan of Donald Trump, but was it really rationally plausible to forecast a totally unprecedented response to the election? Admittedly hindsight is 20/20, but the analysts predicting double-digit declines seem either hysterically irrational or ideologically corrupted. Brexit Posted by EspressoLover on 2016-11-10 23:22 Sector rotation is insane today. Especially considering that index vol is so muted. Don't understand the tech selloff. Utilities, staples and REITs make sense on higher expected rates. But I haven't seen a plausible justification for why GOOGL, NFLX, AMZN, et al. are worth so much less today than yesterday. Teza leaving market making Posted by EspressoLover on 2016-11-11 05:05 Interesting. 
Prima facie I don't understand why they'd completely abandon prop. FTA, it's still clearing $80 mn/yr in revenue. Let's say even if you have to pare that back to $40 million to keep expenses contained at 50%, that's still $20 mn/yr in profit. Let's also say Misha just keeps it as a secondary business and hires some top talent to handle the day-to-day. That'd cost at most 60% in comp, Misha'd still clear $8 million a year from the operation. In contrast a $1bn quant fund that puts up 10% ROI at 2/20 "only" brings in $40mn/yr in revenue. Even with modest expenses and comp, you're not clearing much more than two or three times the prop estimate. And that constitutes much more volatile cash flow. Anyone have color on how Headlands and/or Radix are doing? Brexit Posted by EspressoLover on 2016-11-22 00:55 @rowdy I know some people in the ad-tech business. I get the impression it's a pretty dirty business, and that FB and Google have a fair bit of mud on their boots. Things like clearly looking the other way on black-hat ads (banned keywords, 1-pixel display ads, bot impressions, etc.). Especially right before the close of a fiscal quarter where they need to beef up the numbers. (Actually I hear FB is the least bad of the bunch, ironic given Google's "do no evil" mantra) I think right now ad-buyers love digital because it's got the halo of futuristic tech-utopa Silicon Valley culture around it. And yes, eventually nearly all advertising will be digital. But I can't help but expect, that all the ad-tech bullshit is really going to start becoming more obvious. Ad-buyers are going to migrate away from digital, or at least the existing tainted players, until it cleans up its act. http://fortune.com/2015/07/01/online-advertising-fraud/ Climate change Posted by EspressoLover on 2016-11-25 08:26 What about geo-engineering? I'm pretty sure a desperate world would at least try stratospheric sulfate aerosols. It's cheap and simple enough to be within the capability of even a single mid-size nation acting unilaterally. Then, if you get unpredictable catastrophic side effects from global dimming, the Rochester/Miami pairs trade could runaway in the opposite direction. Brexit Posted by EspressoLover on 2016-12-08 20:46 Anyone have any color on what's going on with vol markets? Vol's rising substantially despite market hitting new highs. Depending how you're measuring it, the relative move over the past two days is +4 sigma. I don't see any real news or factors that are driving this. Besides maybe everyone's scared shitless that things are over-valued, but managers can't liquidate because their benchmarks are rising too fast. So they stay fully invested and just buy protection. I don't know why everyone would have came to that realization in the past 48 hours though. Anyone have any insights? Brexit Posted by EspressoLover on 2016-12-09 20:58 @sigma Interesting... Adding to vol peculiarities, Russell skew is way lower than S&P skew right now. I guess one could read more into this weirdness, but Occam would suggest that maybe it's just a deluge of dumb money. Amateur hour at the CBOE. level vs change in systematic strategies Posted by EspressoLover on 2016-12-14 19:53 Sounds like the second strategy is just a proxy for trend. The back month futures almost always has smaller beta to spot price or curve average than the front month. So if carry has increased over the past month, it's most likely because prices are rising, with the front months rising faster than back months. 
Trend is usually considered a separate factor to carry, and historically correlations are quite low. FPGAs for a small prop group? Posted by EspressoLover on 2016-12-22 20:34 Not an expert in this area by any means. But here's my two cents. Putting logic in hardware sucks. Think about doing as much as you can in software, then just asynchronously handing over flat values to the FPGA hot path. E.g. having the FPGA do the order and inventory management is really a lot of unnecessary work. Keep all that in software, then just asynchronously update the FPGA with a maxQty scalar or hotCancelId set. Sure, sometimes the value will be out-of-sync, and occasionally you may try to cancel an already canceled order or miss a trade because you're still counting that order against your potential inventory. But slightly bumping out position-limits is probably worth it to save hundreds to thousands of man-hours on FPGA development and debugging. This keeps developing against the FPGA pretty simple. It minimizes the surface area of the interface, and keeps the underlying behavior straightforward to simulate or test against. As for FPGA developers, I really don't think trading experience matters at all. Especially if you're keeping as much as you can in asynchronous software. The FPGA specific logic should be pretty close to: pull X out of the last datafeed packet, if (X > Y) write Z to this pre-formatted order packet, then copy to network DMA. Most people with an academic CS background should probably have some Verillog or VHDL experience from their architecture courses. Even if not, it's really not that hard for an experienced C developer to pick up. Short-Term Cap Gains Taxes and Compounding Posted by EspressoLover on 2017-01-17 06:22 (Pre-emptive apologies for the Amero-centric thread. International discussion obviously welcome, but my perspective's American.) Short-term trading is obviously tax-disadvantaged because it's taxed as ordinary income, which roughly works out to ~40% vs ~20%. But the more subtle disadvantage is that taxes are harvested every year. For high return portfolios this significantly impairs the gains from compounding. Over long periods, the compound drag can actually be substantially worse than the higher rates. A toy example: Let's say you're running a short-term trading strategy with 50% annual ROI. After a 40% tax rate this nets out to 30% a year. Compounded over 15 years, a $1 million initial investment would grow to $51 million. Now let's say you had a tax-shield which keeps the same rate, but defers the tax bill until the end of the period. The $1mn compounds to $437mn pre-tax in fifteen years. After paying 40% of the realized profits, you'd net out $262 million. More than 5 times the final capital of the base case. Some imperfect ideas to mitigate this, and their major limitations. (Not a tax lawyer, so obviously any of the below certainly doesn't constitute *advice* and may be (very) incorrect): - Charitable Remainder Trusts: This seems like the best option. Allows the investor to defer taxes until trust distribution. Biggest disadvantage here is having to preset a fixed distribution schedule, which substantially limits flexibility and liquidity. Also you lose all the money if you die before the trust expires (meaning you need to keep upping life insurance coverage as the trust grows). - IRAs: This seemed to work for Romney, Thiel and Dustin Moskowitz who all compounded $5,000 contributions into $100+ million portfolios. 
But there's pretty substantial limitations for traders: no margin, no short-sale, no trading more frequently than every three days. - Puerto Rico Act 22: US citizens who spend 180 days in Puerto Rico are totally exempt from cap gains on trading. Obviously this not only limits the compounding drag, but any tax liability. (In the toy example, you'd wind up with the entire $437mn). Your wife and kids may be slightly less than enthused. - Reinsurance Reserves: Convert the strategy capital into reserves at a reinsurer. Capital gains is timed based on when the reserves are sold, not when the underlying securities making up the reserves are sold. Don't know how viable this is unless you're already a huge investor like Paulson. Also not sure how extensive the risk limits are for insurance reserves. - Basket options on the portfolio: I know RennTech was doing this for a while, but my understanding is the IRS has pretty much shut this down. - Find a tax-exempt entity to run the strategy. Donate the original capital, have your management company run the strategy, and retain a high incentive fee. Even if you end up paying out 50% of the final capital, you're still way ahead of the base case. If you stop trading earlier rather than later, you're out the entire original principal though. What else am I missing here? I'd figure a quant community would have a lot of experience here. Mainly since they tend to mostly engage in short-term trading, and have high return portfolios (and hence high tax-compounding drag). Short-Term Cap Gains Taxes and Compounding Posted by EspressoLover on 2017-01-17 10:23 Keeping the same rate, but deferring the realization actually means that the US Treasury comes out ahead. Consider the original toy example. Assume the government's long-term discount rate is 3% (current 30 year yield). In the base case (taxes paid every year), the IRS collects $33 million over the 15 year life-time. Discounting the cash flows yields an NPV of $23 million in tax liability. In contrast in the deferred scenario the IRS is collecting a single tax bill at the end of the life time. But it's substantially larger than the base case: $174 million in 15 years. That's an NPV of $111 million in tax liability. The way the math works, Uncle Sam actually comes out ahead in deferring tax realization for any investor with an ROI higher than [discount rate] / [capital gains tax rate]. For 3% treasury yields and 40% short-term cap gains rate, that's a 7.5% hurdle rate. If you can consistently out-perform 7.5% annualized returns, it's actually patriotic to defer tax realization. Short-Term Cap Gains Taxes and Compounding Posted by EspressoLover on 2017-01-17 10:47 Sorry. To be clear, I was referencing the toy example from my original post. In this case, the arbitrarily assumed lifetime of the strategy was 15 years. After which presumably the fabled investor retires from active management, cashes out into a low-fee Vanguard fund, moves to Boca (in the low money scenario) or Palm Beach (in the high money scenario), and takes up golf and watching NCIS re-runs. Obviously this is just a make-believe scenario. And as far as I know, no exact tax structure with the described properties exist. (Plus can the strategy actually continue to compound without hitting capacity constraints? What about generalized performance decay, drawdown risk, etc.? What if the investor dies before realizing, stepping up the cost basis through inheritance?). But the basic principles are all the same. 
Tax deferment can actually be a net benefit for the IRS, if the investor re-invests and generates high return. Please explain prop trading deals to me... Posted by EspressoLover on 2017-03-13 12:05 Like FDAX said, if your Sharpe ratio is high, then the drawdown risk is de minims. 33% > 20%, so why wouldn't you want to take home more money. The other dimension is that usually prop groups have exchange memberships, clearing arrangements or trading infrastructure better suited for HFT-type strategies than hedge fund seeders. Low fees and fast execution can literally make or break the viability of many strats. spanning risk factors Posted by EspressoLover on 2017-03-29 21:04 I don't think I understand the question. Why would it be desirable to span all risk factors: "Johnson, could you tell us a little bit more about the risk profile of your new systematic strategy. What would you say are the major risk factors that it's exposed to?" "I'm proud to say, all of them" I think it's certainly desirable to be exposed to all major risk *premiums*. But not all risk factors have associated premiums. And even if they do, the size of those premiums may not justify the risk or cost of capital. To borrow from your list, what's the premium associated with inflation? Sometimes it's better to be long inflation, sometimes short. But there's no consistent historical evidence that one side consistently out-performs the other over the long-term. I'd rather have a strategy that, while harvesting maximum returns, is as exposed to as few risk factors as possible. TCA Posted by EspressoLover on 2017-04-19 23:47 ITG sells TCost curves for most major markets. That's pretty much the standard for at least a large segment of US equities. Sensible? Markit: "The Use & Abuse of Implementation Shortfall" Posted by EspressoLover on 2017-04-20 00:05 Re: "Adverse momentum" The simplest case here would be a trade based off some alpha signal with roughly the same horizon as the execution schedule. E.g. some stat-arb portfolio trading on intraday alphas is going to be a lot more aggressive than an index fund rebalancing. Even before touching a single share the former expects the price to be moving against the implementation cost. But probably the more common case would be a large portfolio who's already dumped a lot of order flow into the market. By the end of a large meta-order they've already revealed a lot of information about their size and intentions. That's introduced an adverse drift into the price, mainly because now they're competing against their previous counter-parties, who are trying to offload inventory. Trading arrangement Posted by EspressoLover on 2017-04-27 22:53 I don't think those numbers add up. At 8% returns and 1% daily vol, the Sharpe is 0.5. Yet max drawdown is only 5X daily vol. Unless the arrangement's drawdown resets at the beginning of the year (which doesn't make sense), I don't think this is consistently feasible. Intuitively the S&P has a similar Sharpe. In the past twenty-five years, MaxDD has only been <5.0*DailyVol during four years. The median ratio is 11.3, and in six of those years the ratio was north of 30. Yes, obviously the return distributions aren't the same. But the differences would need to be pretty extreme. The strategy's daily returns have to either be heavily right-tailed, or extremely mean-reverting. Otherwise you may get lucky a year, but getting stopped out in two is nearly a foregone conclusion. 
If you take the numbers from the original post (with half the daily vol, but same returns and DD tolerance), it's a little more workable. Then it's 1.0 Sharpe with MaxDD of 10.0*DailyVol. So, let's hack this by bumping S&P's daily returns 4 bps to make it a 1 Sharpe process. It still gets stopped out nine years out of twenty-five. You probably survive a year, about 50/50 you survive two. But in five years there's a 90% chance you're dead. I think the best approach to making this workable for both parties, would be to negotiate successively rising DD tolerance for every month the strategy's profitable. Tight stop losses make sense for an untested trader, but the longer he consistently delivers the more they should be relaxed. It's in both parties interest. The trader gets to stop dealing with the stress of playing Russian roulette, and the investor reaps larger returns by allowing proven traders to step up risk. Trading arrangement Posted by EspressoLover on 2017-05-01 12:23 Ahh, misunderstood the meaning of "var"... Value-at-risk, not variance... So, taking that into account puts to rest the drawdown issues, but the shoe gets put on the other foot. It makes the returns target really hard to hit. VaR is obviously defined different than vol, but to a first approximation it's still O(vol). At a minimum daily VaR is at least 2.5x average realized daily volatility. Realistically the ratio's probably even higher. Taking the numbers from the first post, the trader has to make $8M a year while keeping daily volatility below $200K. That's 2.5+ Sharpe as a minimum target, which is really aggressive for a discretionary trader. Quantum Fund at its peak was still below 2. SAC since inception is about 2.5. John Arnold was maybe 3.0 at his best. Besides for world-famous traders, the only discretionary traders I can think of who hit targets like that are A) interacting with proprietary flows (e.g. sell-side desks or traders at oil companies). Or B) intraday click-traders who are quickly opening and closing positions (which doesn't work in this context because of the pre-approval). The cynic in me suspects that there's a clause which says the trader doesn't get paid (or only paid a pittance) unless the fund makes it's target for the year. In which case the "swindle" seems clear: Find talented traders. Offer them a deceptively attractive deal. But set sufficiently low risk limits and high hurdle rates, such that it's nearly impossible to win. Bonus points for overly inflating the AUM, to make the target seem easier by comparison. ("All you have to do is make 8% ROI"). Harvest alpha from skilled traders while avoiding paying any incentive fees... Phase 2: ?... Phase 3: Profit. But who knows, I'm just speculating and obviously don't know anything in particular here. PA thread Posted by EspressoLover on 2017-05-02 22:49 Generating alpha's hard enough. Reliably generating it in your spare time is probably not worth it besides for the preternaturally gifted. That being said, I don't see why anyone with this phorum's average savvy and sophistication would ignore the major "alternative risk premia". Things like HML, MOM, TSMOM, carry, roll yield, PEAD, BAB, SMB RMW, etc. The type of things that have been known for decades, documented as working across a variety of markets, and have relatively simple implementations freely available at SSRN. Yeah, going forward they probably won't work as well as they did from 1970-2015. But it'd be pretty shocking if they just disappeared completely. 
No, they won't turn your PA into Medallion, but even a modest boost to returns has a pretty huge impact over a O(25 year) retirement horizon. PA thread Posted by EspressoLover on 2017-05-05 20:37 > I am not sure what you mean by relatively simple implementation Obviously I'm not giving this advice to my dentist. But I think if you can 1) program medium-sized scripts, 2) read and understand quant finance papers, and 3) understand the markets with some level of professional intuition, it's not that hard. So, pretty much every NP poster. If your aim is just to replicate well-known premia, there's a lot of things working in favor of simple implementation: * You don't have to backtest all. That's already been done for you by the researchers. You just have to construct present-day portfolios. * That also means you don't have to worry about historical data, and all the associated cost and issues. You only need contemporaneous data, or very recent historical data. * If a factor's so well known, chances are that it's input data is widely available for cheap or free. Book value, market cap, market beta, earnings calendars, are widely disseminated. Biggest pain in the ass is maybe writing a web-scraper. * Reconciliation is pretty easy, because the major factors usually have published returns which are frequently updated somewhere (Ken French, AQR, etc.). Just make sure your recent live returns are near in line with some source of truth. * Turnover on most of the classical premia tends to be pretty low. Most only rebalance monthly, and even then there's usually not much turnover. You don't need a spiffy automated system that continuously trades. Just dump trades at the end of the month, eyeball for correctness and send a batch of orders. * T-Costs in major developed markets are basically nothing at this horizon. Unless you're very fortunate, your PA doesn't have to worry about market impact. * In most developed markets, short selling is easy and pretty cheap. * A typical Decile[1 Minus 10] factor on the Russell 3000, takes positions in 600 names. Obviously that's not feasible. But taking an unbiased subset of 10-15 long and short symbols is going to approximate the portfolio with pretty low variance. Unless you're a major portfolio, 10-15 names is more than enough liquidity. * Historical performance is well-known. Just apply a sensible shrinkage estimator to expected long-run forward returns. All of the major premia are pretty much orthogonal to each other and the market. So portfolio allocation is dead-simple. * The exact details of most factors are pretty meaningless. They're pretty much just Schelling points, decided by the arbitrary decisions of the first academic to publish on the topic. These aren't ultra-fragile strategies that require precise execution. So even if your implementation gets a few details wrong by accident or necessity, it's unlikely to have a significant impact. The biggest challenge is just being disciplined. When a factor under-performs for five years, that's sixty rebalances of bad vibes. There's a strong psychological impulse to abandon it (which is a large part of why these things tend to keep working). There's also the temptation to tinker. "Oh, well it seems like [X] doesn't work when [Y], so I'm going to add this modifier or filter". But once you do start doing that, you're now trying to generate alpha. All those simplifying conditions go out the window, and unless it's your full time job, it's probably not going to work out well for you. 
PA thread Posted by EspressoLover on 2017-05-09 03:50 @darkmatters Good questions. This paper has a deeper dive into the tax efficiency of the major equity anomalies. The major "fundamental anomalies" have pretty low turnover. On the order of 17% per year, which doesn't put it that far away from the S&P 500. On the short leg, you pay ordinary rate, even if you hold for over 12 months, so they're not quite as tax efficient as S&P500. OTOH, most of these anomalies accrue almost all their gains on the long leg (which makes sense, as the market rises over time). The "trading anomalies" also aren't as tax-bad as they seem at first glance. The turnover's high. But most tend to generate substantial tax-loss realizations. E.g. MOM is repeatedly selling losers and buying winners. Not only does this tend to defer tax realization and concentrate gains in 20% long-terms cap-gains, but it generates very valuable short-term tax loss carry-forwards which can be used in other parts of your portfolio. Futures and forex are taxed at a blended rate of 28% regardless of holding period. Implementing an anomaly in commodities, rates or FX space (e.g. TSMOM or carry) means that even a tax-naive strategy is still relatively efficient. Futures, even shorts, can be utilized in an IRA. So, for example if you want HML in an IRA, you can buy the long equity leg, then beta-neutralize using ES. The research indicates that the single leg still generates sizable returns for most anomalies. The low-turnover anomalies would only be minimally affected by the 3-day waiting period. Finally, if you're not liquidity constrained, and using random subsets as approximating portfolios, there's potential tax efficiencies with modest bias-free modifications. Tax losses can be realized and gains deferred by re-sampling on the losing leg. E.g. if the market is up for the year, pick a new subset on the short leg, and keep the subset for the long leg. Company buying back its own debt Posted by EspressoLover on 2017-05-23 13:48 Noble has $2bn+ in bank loans. Loan covenants almost always require being paid back at par before any bond buybacks. Otherwise it defeats the purpose of seniority in the capital structure. CME's Market by Order rollout Posted by EspressoLover on 2017-07-28 07:37 Think of it from the perspective of a market maker. With an order based feed you know your position with certainty. (Just tag your order with it's MDOrderPriority tag at arrival on the feed). In a level based feed you have to estimate your queue position. That's difficult, messy and involves a fair degree of uncertainty. If you're trying to manage the adverse selection of your quotes, knowing queue position is really important. Especially in thick-book instruments (like most of the liquid CME futs). Less uncertainty means less adversity. Less adversity means that liquidity providers can quote larger, quote more often and be trigger-shy about cancelling. In practice, most level-based market makers usually assume something close to worst-case queue position. Being overly optimistic is a quick recipe for being picked off. Most cancels occur in the back of the queue anyway. So worst-case is pretty much the only tenable approach. But the thing is, even if worst-case is the modal case, some non-negligible proportion of the time you actually do better. So a lot of orders end up cancelled, that otherwise wouldn't be given transparent information. Giving market participants knowledge of their queue positions results in less cancels. 
Which means more orders alive at any given time. As a second-order effect, fewer spurious cancels increase the chance that an entered order gets executed at an advantageous position. That increases the expected profitability of the marginal order. That should push the break-even back-of-queue size out further.

The Great Moderation Posted by EspressoLover on 2017-08-18 20:12

Volatility and trading activity seem to be stuck near all-time lows. Not that there aren't some sparks here or there, but the market seems enormously resilient. Nuclear war, terrorism, monetary tightening, nosebleed valuations, insane political drama that would be too implausible for a Tom Clancy novel. At most it drives one or two days of risk-off, then the market shrugs it off and hits new highs. I would guess that for most people here, myself included, this state of affairs kinda sucks. So time for rampant speculation and bullshitting. Because what else are you gonna do right now.

First question: what's causing all of this? It seems like there's a million and one explanations for what's happening: animal spirits, decline of active investing, shortage of safe assets, smart beta, Chinese capital outflows, shifting demographics, the Yellen put, DM stocks being the only decent investment left, algo trading smoothing out disruptions, sector rotation replacing risk-on/risk-off, etc. What are your favorite explanations?

Second question: is this time different? History says vol may be low now, but markets will eventually revert to a very stable long-term average. Are we just in a temporary cycle, or is there a permanent shift? This probably ties in closely with your answer to the first question. If you believe that this is due to passive indexing or smart beta, these are technologies that weren't widely utilized until recently. In which case this time may very well be different. The equity risk premium (along with its contributions to volatility) may very well be a thing of the past. Erased by widespread adoption of rational, disciplined and systematic investment strategies.

Nobody's going to have a definitive answer, but there's a lot of insightful people here. So, what say you?

The Great Moderation Posted by EspressoLover on 2017-08-24 23:38

It's a seductively compelling theory, but the empirical support is pretty weak. I decided to look at daily S&P 500 returns by decade going back to 1950. Here's standardized excess kurtosis by decade:

1950s: 9.3
1960s: 11.9
1970s: 5.3
1980s: 58.7
1990s: 7.8
2000s: 10.8
2010s: 7.3

And the same for monthly returns, which is a little cleaner and not so dependent on the single largest days:

1950s: -0.6
1960s: -0.2
1970s: 1.2
1980s: 2.6
1990s: 0.5
2000s: 1.6
2010s: 0.5

There's not really any clear evidence of a secular trend towards fatter tails. At least not in a statistical sense. The peak of tail-driven volatility is still 1987. Even 2008 pales in comparison for leptokurtosis. If the story is that financial innovation fattens the tails, surely three decades of market evolution should have pushed us to new heights. Looking at the distant past: yes, the fifties and sixties are relatively thin-tailed. But my guess is this is mostly an artifact of when the time series starts. I'd really doubt that the two world wars, the 1929 crash, and the Great Depression would be platykurtic.

The Great Moderation Posted by EspressoLover on 2017-08-22 02:18

Very insightful replies all around. Threads like this are what makes this phorum great!
Like most here, I'm leaning hard against the "this time is different" hypothesis. That being said, I don't think it can be ruled out off the bat. Academic finance is pretty clear about two things: 1) the equity risk premium is too large to be rational, and 2) only a small proportion of the variance comes from changes to future cash flow expectations. Most of it can be attributed to fluctuations in the discount rate. It seems plausible that at some point in the (maybe, quite far) future, investors will get over their equity-averse superstitions and permanently bid the equity risk premium to near zero. In which case, a major sub-component of market volatility will simply disappear. Who knows when, or if, that will ever happen. But if it does, the cultural sea change will probably look a lot like a massive market share grab by passive indexers. Homo economicus forgets about trying to pick the hot manager of the hour, socks away his investments in a low-fee index fund, and only checks his statements once a decade.

(For the record, I'll entertain this theory, but don't really believe it. I think we're just in a massive bull market. Money tends to chase five-year performance. Most active managers, particularly in alts-space, are <1.0 beta. Index funds, being 1.0 beta, are going to be where hot money lives for now. Until, as ronin mentioned, we see a big pullback.)

On the other point, I would contend that actives still create more index vol than passives. Even for funds with full-investment mandates. Actives can still make trades that are dollar neutral, but beta directional. For example, a defensively positioned manager rotating out of consumer cyclicals into utilities. It doesn't directly affect a cap-weighted index, but there are still second-order effects at play. That's a pretty clear signal that's going to be internalized during index price discovery. Neither passives nor fully-mandated actives participate in index price discovery. They're perfectly price inelastic, and have to remain 100% invested regardless of valuation. (Notwithstanding their investor commitments/redemptions.) However, unless all stocks have a 1.0 beta, there must exist arbitrageurs keeping individual stock returns in line with [beta*market-return]. But that process also works in the other direction. The same arbitrage activity will push market returns in line with [sum(beta*stock-return)]. I'd argue that active rotation can still generate index volatility through this channel.

Bitcoin Posted by EspressoLover on 2017-09-06 23:17

Seems like cash-settled futures are going to open a whole Pandora's box of problems. Which pricing source are they going to use? If you get a Bitfinex tether-like issue, then a single exchange's BTC/USD price can be significantly distorted. Also, because Bitcoin trading is anonymous, illiquid and volatile, banging the close is going to be super-easy to get away with. Then what about another fork scenario? Do you settle based on the price of BTC alone, or BTC + fork? What if both forks claim to be the successor to Bitcoin, and there's no consensus opinion? What if some miner coalition captures 51% of the pool? So many things can go wrong, and that's just off the top of my head. Let's bring all the mind-bending corner cases of crypto problems into the mainstream financial system, where they can spread contagion all over the place. Considering how easy and seamless it is to transfer BTC (isn't that the whole point?), why the hell didn't they just make the futures physically settled?
Bitcoin Posted by EspressoLover on 2017-09-07 21:00

Correct me if I'm wrong, but don't most of the major exchanges allow BTC deposits/withdrawals without KYC verification? At least up to a certain limit, in which case it's just a matter of creating a whole bunch of sock puppet accounts. I'm almost positive this is true for Bitfinex, which is where BTC/USD price discovery mostly occurs anyway. Even if GDAX is secure, their price is going to follow Bitfinex's lead.

If this is the case, it seems like anonymously banging the close is pretty straightforward. Convert USD to BTC in a clean wallet through a KYC institution. Tumble through Helix, and transfer into a dark net wallet. Deposit BTC from the dark wallet into a Bitfinex slush fund under a fake name. Bang the close in the spot market. Withdraw to your dark net wallet. Tumble back to the clearnet wallet. Convert the clean BTC into USD through a KYC institution.

This shouldn't even raise laundering flags. The spot manipulation side very likely loses money. So you'll be withdrawing less BTC than you put in the clean wallet to begin with. I'm pretty sure that's not a transaction pattern consistent with laundering/AML red flags. Remember, the money-making side of the equation comes from the CBOE, which never goes within 100 miles of the wallet shenanigans. The whole scheme seems pretty anonymous to me, unless the NSA gives the CFTC its backdoor to Tor.

Bitcoin Posted by EspressoLover on 2017-09-08 23:41

@jslade I believe you're drastically overestimating how traceable BTC is. If the US banking system is a 1, and Monero's a 10, BTC is at least a 7. You're claiming that BTC is just as "clean" as the traditional banking system. That's ridiculous; BTC is fundamentally more amenable to criminal activity by design. That shouldn't really come as a surprise, it's pretty much the point. Cryptocurrency's core constituency is hardcore ancaps who read Snow Crash as a how-to guide.

Afaik, there's only been a single arrest based on blockchain analysis, and the user made no attempt to conceal or mix his coins. As long as transactions and timing are randomized, BTC mixers are provably secure. No major mixer has ever been compromised. Helix has openly operated a money laundering service for three-plus years and has anonymized billions of dollars. Not one single user has ever been arrested. Nor is this a thin channel: a Helix user can tumble 21 BTC per hour. That's $68 million per month. Even if somehow all mixers get compromised, you can still launder by hopping blockchains. Convert BTC to Monero, obscure the Monero, then convert back to BTC. Absent preventing every single BTC institution from touching Monero/Zcash/etc., there's no way to prevent this.

If BTC anonymity were compromised, then it wouldn't be the case that every single DNM uses it. If there were so many arrests, why haven't they all moved to Monero? As we speak, thousands of coke dealers are moving millions in profit over the BTC blockchain. How can they keep operating with impunity? AlphaBay processed a billion in transactions over four years. It was only shut down because the admin used his LinkedIn email on the password reset message. Why can't 75%+ of Ross Ulbricht's coins be attributed despite the FBI having root access to his devices? Why haven't the WannaCry hackers been arrested?

Bitcoin Posted by EspressoLover on 2017-09-09 08:11

@jslade Sorry. Did not mean to offend you or come across abruptly.
Obviously you have much more expertise than me: you're in the industry and I'm just a guy that browses /r/bitcoin from time to time. Also, I thought it was implied that the points I didn't respond to were implicitly conceded. Notably, I significantly underestimated the extent of regulation on the major BTC exchanges. Again, you would know much more about the ins and outs of the extant major exchanges, so I wouldn't even bother disputing that. Mea culpa, if that wasn't clear.

That being said, even though you've persuaded me on a fair number of points, there are still a few things I disagree on. Sometimes expertise can be a liability; in certain cases that kind of exposure seduces people into groupthink. (E.g. structured credit experts circa 2006 certainly turned out to have some major blind spots.) In fairness, I could easily be wrong about any of these points, and am definitely open to revising my priors based on further argument or evidence.

> Hopping blockchains? How you gonna do that without using an exchange (where you will be KYCed up the ying yang)?

Non-US crypto-only exchanges have no legal obligation to KYC. This isn't just hypothetical: Shapeshift (based in Switzerland) requires no personal information whatsoever. And even if every legal jurisdiction in the world outlawed anonymous crypto exchanges, there's no reason they couldn't run purely over the dark net. If you don't touch fiat, you don't need any legal or banking presence.

> it has no bearing on futures based on BTC-USD spot price.

So let's assume that the wall between BTC and USD is hermetically sealed. Just consider actors who stay purely within crypto space. There's certainly enough endemic wealth available (how many bitcoin millionaires are there now?). Unless crypto transactions can be de-anonymized (more thoughts on this below), then avoiding any conversion to/from fiat puts you completely beyond the reach of regulators. You might disagree with the accuracy of the antecedents, but I don't think you'd deny the logic of the syllogism. So let's try to make a meaningful assertion restricted to pure crypto actors. (You've convinced me that dealing with BTC/USD directly is not an easy route.) I'd contend that this group can *still* monkey with BTC/USD spot. This is pretty speculative (and you probably have a better informed opinion than me), but I'd expect that pumping BTC/ETH hard would have strong spillover effects on BTC/USD. I would expect a similar story for some of the major tertiary coins such as BCH, DASH, and LTC. Although to be fair, this is definitely a lot less effective than just directly banging the close at Gemini.

> https://news.bitcoin.com/chainalysis-ransomware-arrests-coming/

There's no actual arrest mentioned here. All I see is a CEO raising a Series A, talking up his product. The article is nothing more than a vague promise that one day in the unspecified future Chainalysis' product is definitely going to lead to the arrest of some hackers, who may or may not be the WannaCry guys. Just wait and see, it's definitely coming! So far, there's been nothing to indicate that blockchain analysis is anything more than vaporware.

On the other hand, there's a confluence of vested interests pushing the hype train. There's the Monero/Zcash shills who want to scare all the criminals away from bitcoin. On the other side are suits like the Winklevii who need to clean up the public's Wild West-like perception of bitcoin.
Then there's the standard SV bros who realize that "big data for cryptocurrency" is an easy sell because it includes not one, but two red-hot buzzwords. Finally you've got CS researchers who are using this to bilk the Feds for grant money and consulting fees. Of course blockchain analysis works under constrained theoretical cases. But as of now there's nothing to show that it can defeat even garden-variety anonymizers.

Again, I could be wrong. Maybe blockchain analysis really is ready for primetime. I think a good test is to see if Chainalysis delivers, and we see the WannaCry hackers (or any major BTC-based hackers) get arrested in the next few weeks. (And not for some idiotic opsec error, like attaching a clearnet email to a PGP key.) If so, I will come back here and eat crow.

> If you disagree with me, feel free to attempt to fix the spot price of Bitcoin and make bank on the futures markets.

At the risk of coming across as inflammatory again... If you disagree with me, feel free to use the blockchain to dox the identities of DNM operators/vendors and make bank blackmailing them.

Bitcoin Posted by EspressoLover on 2017-09-13 01:50

@rowdy Assuming this is directed at me... I'll give my $0.02. Caveat emptor: I'm no expert in cryptocurrencies, never claimed to be and likely will never become one. I have an armchair interest, and sometimes like to speculatively apply generalized principles from other areas where I do have some experience.

1&2) I think I get the gist, but I'm embarrassed to admit not being exactly sure what "social clip" means... hopefully this gets to your question: I quickly sampled the consolidated order book of the major exchanges at data.bitcoinity.org. Right now there's about 1650 BTC of resting liquidity within 1% of the mid-price/last-trade. At $4500 a coin, that's $7.4 million in capital to spike/crash the price across all the major exchanges. I have no idea of the size of the OTC market, but I doubt it's larger than the exchange liquidity. So $15 million would be a good amount of notional capital to really move the market, at least for a short period.

To move the book -1%, let's say your VWAP basis is -0.65% below the original mid-price. Let's also assume you're terrible at liquidating post-settlement, and you cover your short exactly back at the mid-price. Add another 0.1% in exchange fees, and generously 0.05% in various other costs, risk compensation, etc. You end up taking a 0.8% loss on the spot side of the scheme. Let's also be generous again and say you need 2X the capital, because you're banging ETH/BTC instead of BTC/USD, the market's thicker at settlement, settlement samples a time range, etc. That's an expected loss of $240 thousand to manipulate the settlement value by 1%.

On the futures side, let's say Bitcoin contracts are only as liquid as the least liquid FX major at the CME. I think if BTC futures take off, it's safe to say there will be at least the same amount of speculative mania as there is in the New Zealand Dollar. That trades $2.3 billion in notional a day. I think it's reasonable, on settlement day, to slowly build a position at 5% of ADV at a net t-cost of 0.2%. That's a $117 million notional position, with an incurred $235 thousand in t-costs. Manipulating spot gives you a 1% expected return on the futures settlement. Altogether you net $695 thousand for taking a little bit of intraday BTC risk. Assuming 2.5:1 margin on spot and 6:1 margin on futures, that's a 2.7% one-day return.
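Collecting that arithmetic in one place, since it's spread over two paragraphs. Every input below is just one of the rough assumptions above, not market data:

# back-of-envelope settlement manipulation, using only the assumptions from this post
spot_notional = 15e6               # notional needed to push consolidated spot roughly 1%
spot_loss_rate = 0.0080            # 0.65% VWAP slippage + 0.10% fees + 0.05% other costs
spot_capital_mult = 2              # doubled to be conservative (ETH/BTC leg, thicker book at settlement, etc.)
spot_loss = spot_notional * spot_capital_mult * spot_loss_rate        # ~ $240k

futures_position = 117e6           # ~5% of an NZD-like $2.3bn ADV, built slowly on settlement day
futures_tcost = futures_position * 0.002                              # ~ $235k
futures_gain = futures_position * 0.01                                # settlement price pushed 1%

net_pnl = futures_gain - futures_tcost - spot_loss                    # ~ $695k
# against roughly $25MM of margin capital (2.5:1 on the base spot notional, 6:1 on futures),
# that's in the ballpark of the 2.7% one-day return quoted above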
If you use OTM options at the futures exchange, you could probably do 4%+. And this is all making very conservative assumptions about the liquidity in the futures market. This works because exchange-listed crypto derivatives are inevitably going to be much more liquid than spot crypto markets. Very few institutional funds are going to have the legal ability to directly trade crypto, let alone the technical acumen to do so without major security risks (imagine hacking the private key to CalPERS' wallet). So if bitcoin does become an "Asset Class", all that money will go through exchange-listed derivatives.

2a) "I think you'd find it hard to continue to trade with the OTC guys taking the other side of your winning trades as the number of counterparties that can take size are relatively small."

Actually, on the spot side of the trade, your counter-parties would be making significant profit. The patsies when banging the close are in the derivatives markets. In the above example, the spot market counter-parties net out $195k in profit. It works out that way because the manipulator has to close out his spot market position, almost always at a worse price than his cost basis.

3) Don't really know many details about Monero vs ZCash, or have a strong opinion about one vs the other. From a 12,000-foot perspective, I'm aware Monero is based on obfuscation whereas ZCash is based on ZKP. So ZCash is provably secure, whereas Monero is not. OTOH, with ZCash you have to trust that the original private key was actually destroyed. (BTW, there's a Radiolab episode about the key-generation ceremony that's very entertaining.) Otherwise a malicious actor could keep counterfeiting new coins and there'd be no trace. Depends on your personal preferences, YMMV.

4) Any mixer is cryptographically deniable assuming:

-A: The operation itself is trusted and doesn't keep logs
-B: The client's coins make up <50% of the tumbled coins
-C: A malicious attacker doesn't make up >50% of the tumbled coins
-D: Tumbled coins are paid out in random transactions with random fees
-E: Tumbled coins are returned with random timing

It's trivially easy to show that this results in untraceable transactions. Suffice to say these are sufficient conditions, but not necessary. Some of the constraints can be tightened or even eliminated using fancier schemes. Unless Helix is the world's stealthiest honeypot, and/or the government is willingly funding half the revenue of the world's largest money laundering operation, all of these criteria apply to the most widely used commercial bitcoin mixer.

5) Is it possible? Certainly so. And like I said, I will come back and eat crow if we do see major arrests from deanonymizing blockchain analysis. Doubly so if it's Chainalysis that delivers it. But I was debating the assertion that publicly available blockchain analysis can easily defeat commercial-grade tumbling. *If* that were the case, it wouldn't just be law enforcement. The doxxing hypothetical wasn't just for trolling. If currently published blockchain analysis produced practical results, every two-bit hacker in the world would be blackmailing every DNM operator, ransomware distributor and any party who previously hacked an exchange.

The flush-and-catch scenario is only possible if LE has access to commercial-grade blockchain analysis that the public doesn't. They're sitting on it, silently collecting a massive pile of evidence, and are patient enough to wait years without any arrests. (Hansa was only a honeypot for a few months.)
And also no one in the operation is leaking the software or commandeering it for himself to doxx DNMs on the side. Possible? Yes. Plausible? Ehh... Anyway, this is just saying that blockchain analysis is easy enough for the NSA to figure out, but no one else. That's quite different from the original assertion that practical blockchain analysis only requires graph theory 101.

Hope that all makes sense. Doubly hope that I didn't make an embarrassing error somewhere that completely ruins all my arguments... I'm open to changing my mind given sufficient evidence or well-reasoned logic, just not argument from authority. Some or all of this could be glaringly wrong. But if that's so, could you please do me the courtesy of explaining *why* it's wrong.

Bitcoin Posted by EspressoLover on 2017-09-13 07:38

Btw, another way to launder USD into a BTC wallet without traceability, which doesn't even require mixing or any wallet shell games: buy some AntMiners. Just make sure to connect to a pool with a Tor hidden service. Even if you're running a negative 15% ROI on your mining operation, that's much less than the typical cost criminals pay for fiat money laundering. Point being, even if blockchain analysis were flawless against tumblers, there'd still be a very reliable (albeit more expensive and inconvenient) way to acquire anonymous coins. Bitcoin will always be an attractive playground for those looking to hide and shuffle dirty money.

Bitcoin Posted by EspressoLover on 2017-09-16 06:46

@rowdy Thanks for the detailed response. Enjoyed the paper, learned some new things. I think at this point, most of our disagreements have to do with facts on the ground, rather than general principles. To help crystallize the boundaries between our opinions I've tried to lay out very concrete beliefs at the end of every point.

2a) @patrick's response had a pretty good explanation about how these things work. Well, I was just trying to construct a conservative lower bound. Assuming that BTC futures become much more liquid than the baseline scenario, that gives way more futures liquidity to push around the spot. You could easily see 10%+ returns. But even if not, 2.7% is a pretty juicy return considering the minimal market risk. (My uninformed understanding of BTC lending is that it comes with substantial risk of wipeout, especially when the rates are 100%+.) You're talking about maybe 30 minutes of pre-settlement price risk while you build up your futures position. And if you're a market maker or directional trader with an already large position going into settlement, virtually no incremental risk. Assuming monthly expiry, you're making a 37% annual return, with maybe 5% drawdowns, and only showing up to the office twelve days a year. I don't think there'll be any shortage of enthusiasts.

This type of manipulation acts as a persistent tax on the non-manipulating derivative investors (who on net are getting shitty settlements). But the real risk to market stability is when things go pear-shaped. Normally if the spot market makes a big move, the futures market acts as a dampener. Arbitrageurs can buy the spot and hedge by selling futures until the two come back into line. But that relationship breaks immediately prior to settlement. Plus all the extreme gamma hedges, assuming options settle at the same time. Markets tend to become very jump-y, particularly when futures markets are much more liquid. It's really easy for this type of situation to trigger a cascade of margin calls and stop losses.
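(For what it's worth, the 37% above is just the single-settlement 2.7% compounded across twelve monthly expiries, ignoring financing costs and the occasional blow-up: (1 + 0.027)^12 - 1 ≈ 0.377, i.e. roughly 37% a year.)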
Empirical prediction: Conditional on US-listed Bitcoin futures that trade over 100 million notional ADV, the correlation between [pre-settlement return] and [post-settlement return] will be less than -0.1. Here [pre-settlement return] means the percentage return from [spot price 15 minutes prior to start of expiry settlement procedure] to [settlement price]. [post-settlement return] means the percentage return from [settlement price] to [spot price 15 minutes after end of settlement procedure]. Sample size will be however many points are needed to reject either the null hypothesis of less than -0.1 or the null of greater than -0.1 at p<0.05.

3) That makes sense about Zcash. Didn't realize that it sucks so bad. Definitely changed my mind, Monero all the way.

> I just got from your comment that you thought Monero / Zcash somehow didn't provide any incremental anonymity or privacy relative to BTC.

Didn't mean to convey this. Definitely agree that Monero is more anonymous than BTC. Especially when you take into account the chance of making a mistake (forgetting to mix, connecting to a wallet over clearnet, etc.). If I had to do something I wanted to hide from the authorities, I definitely would use Monero. But then again, I also wouldn't be breaching credit bureau databases or sending sheets of acid through the mail.

Which gets to the more relevant point: will the existence of Monero crowd out BTC's usage by criminals? A lot of the BTC suits seem to be relying on this prediction. It would clean up the BTC ecosystem, which in turn would make regulators ease up, improve the public image, make strait-laced investors comfortable, etc. But I just don't see it happening. Monero will definitely take some black market share, but BTC will be king of the underworld for a long time. Even though BTC has cryptographic risks that Monero doesn't, these risks are orders of magnitude less than the OPSEC risks that even well-run criminal operations take. For every drug dealer caught by blockchain analysis there's a thousand caught by the postal inspector. Switching cryptocurrencies is just not a priority.

Plus there are path dependencies that lock in BTC because it was there first. A lot of DNM users only deal with BTC, technically illiterate ransomware victims won't be able to deal with a tertiary altcoin, hackers will continue to prefer the exchanges' bitcoin wallets because they're more valuable, etc. Then on top of all this, criminals are not just evaluating cryptos as a medium of exchange, but also as a store of value. The most effective laundering scheme is just to sit on your coins until the statute of limitations expires. Most criminals aren't pulling all their profits out in real-time. Most of it just quietly accrues in untouched wallets. Ross Ulbricht had virtually nothing in fiat assets. BTC is a much more established asset than XMR, and hence safer for long-run savings.

Empirical predictions: One year from today, of the top four most active DNMs, at least three will accept BTC. Of the top ten most active DNM vendors across these markets, at least seven will still accept BTC. Of the four largest ransomware attacks over the next year, at least two will accept BTC.

5) [out of order, I know] Everything I know about law enforcement I learned from watching The Wire. First, LE pretty much has the resources to solve any particular crime, but falls well short of the ability to solve every crime.
Second, criminals may be dumb and impulsive, but prosecutors and police chiefs are Machiavellian, ambitious and image-conscious. Given the choice between easy busts that put drugs and money on the table for the six o'clock news, or slow-boil, hard-slog investigations that might turn up empty but could make a meaningful impact, they will choose the former nearly 100% of the time. When it comes to skirting the law, security by obscurity is absolutely a valid approach. Blockchain analysis may lead to a small percentage more arrests on the margin. But it's highly unlikely to hurt any of the largest or most sophisticated criminals. It certainly won't be effective enough to shut down a significant percentage of BTC black market activity. The War on Bitcoin Laundering will be no more effective than the War on Drugs.

You're right that blockchain analysis combined with careful detective work certainly could lead to some serious busts. But that won't happen; following up on some statistical clustering analysis is too long, too expensive and too uncertain for career-focused US Attorneys. Bureaucrats don't have grit. The original Silk Road investigation took more than two years, even after Ross Ulbricht posted his real name on WordPress.

Following the money works in fiat-space because (outside cash) the only way to move money is to get a bank to sign off on it. Banks require intense capital commitment, are notoriously difficult to start, number very few relative to end-users, can be arbitrarily seized by the authorities, and require extensive legal presence. In crypto world the atomic entity is the wallet. Wallets require virtually zero capital, are trivially easy to create, number far more than end-users, cannot be seized unless the private keys are captured, and require no legal presence or documentation. Laundering through the banking system is tough, because corrupting a bank is very difficult. This leads to a virtuous circle where very few banks are corrupt, so policing the few bad ones is easy. Also the state can make arbitrary, onerous demands on banks, like KYC/AML. Corrupting wallets is trivially easy. Arbitrarily long chains, cycles and shell games of wallets can be made with little effort and no legal risk. Transfers are completely undocumented. If wallet A is two hops away from AlphaBay, is it laundering money, or does it belong to a coffee shop that a drug dealer regularly shops at? A dirty wallet can transfer small random amounts to thousands of random clean wallets to muddy the waters. Security by obscurity is a very effective approach with wallets, whereas it just doesn't work with banks.

Empirical prediction: Take the four highest-profile bitcoin-related hacks in recent news: WannaCry, Bithumb, Nayana, and Equifax. No more than two will have major arrests within one year's time. No more than one will be discovered due to an investigative chain starting with blockchain analysis.

4) Maybe it's possible that mixers will prove to be the Achilles' heel of black market bitcoin activity. But I doubt it. There's zero indication that any major mixer has been compromised, either internally or externally. I say that because there are still zero arrests related to mixers. Criminals aren't worried about the hypothetical possibility of getting snagged by their mixer, because they take larger tangible risks all the time. You are absolutely right, Helix could be a silent honeypot, or may be storing detailed logs to plea bargain with later. But so far it hasn't failed yet.
Even if it did, the effective risk for a random user is de minimis. Security by obscurity. Worst case, maybe we see two dozen low-hanging arrests out of thousands of users, but that's it. The DOJ ain't the Cheka. All they care about is money and drugs on the table, grabbing some headlines and moving one step closer to GS-15. Criminals may be dumb, but they are Darwinian. No one's making AlphaBay's mistakes again, because AlphaBay's admin is now in Federal Pound-Me-in-the-Ass prison. If one mixer gets compromised, people will layer multiple mixers. If most of the trusted mixers get compromised, the black market will move to something trustless like CoinJoin. If a CoinJoin backdoor is discovered, some new implementation will patch it. Hackers can innovate much faster than LE can catch up. Yes, some criminals will go to jail in the learning process, but unless you sweep a major percentage of the criminals all at once, it won't disrupt existing use patterns. Citation: 100 years of history from the plain old analog War on Drugs.

Regarding volume capacity, the major mixers have more than enough for anyone's needs. You don't need to launder your entire revenue, just your living expenses. BTC makes a fine retirement nest egg. A Helix user can process 21 BTC an hour. That's more than enough to allow a dozen-person crew to live on $1 million a month. Even if Helix is compromised, that traffic and mixing capacity will move to whatever the new Schelling point is.

Empirical predictions: Within one year's time, Helix will still be operational with no publicly known compromises or associated arrests. If Helix is compromised, no more than 25 user arrests will directly result from it. The largest mixer (trusted or trustless) will have capacity of at least $250,000 per month per user. At least one major bitcoin mixer will exist with no publicly known compromises or associated arrests.

Bitcoin Posted by EspressoLover on 2017-09-17 03:32

@rowdy You are right about AlphaBay, I was wrong. Tried too hard to shoehorn in an Office Space reference. Makes sense. You could easily be proved right. Not really sure how much I disagree with you anymore. My last retort is that I think the BTC ecosystem is an entirely different beast than the traditional banking system. I think the AML/CTF investigators are going to find themselves ill-equipped for their new roles. Outside the NSA (who don't really give a shit about anything besides terrorism and maybe El Chapo-level drug lords), I don't really think any Federal agency is technically competent enough. But this is just ungrounded speculation, so we'll see what happens...

#smartcontracts

Whether they're practical is another question. A lot of it sounds like science fiction, and maybe it is. But the possibilities are cool to think about.

Polycentric law

A smart contract may still have to defer to some authority, but the participants at least can freely choose their authority. With fiat-law you're bound to a pre-determined jurisdiction. In principle you can use an arbitration clause, but the fiat-law judge always has final say and can override it. For example, many fiat authorities refuse to enforce contracts betting on sports. If we wanted to bet, we could find a trusted crypto-authority who will honestly enforce those contracts. It's possible that entity could be corrupted, in which case it would lose clients, and other crypto-authorities would take market share.
Legal corruption is deterred because the loss of a trusted franchise hurts more than the gains from defecting on a single iteration of a repeated game.

Self-consistent law

You can have a single crypto-judgement projected onto multiple contracts. The crypto-authority can still be corrupted, but it has to be consistent across every subscribed contract. Fiat-law can arbitrarily make unconstrained judgements. Say the authority is biased towards Alice and Dave. Alice correctly bets Bob that the Patriots will win the Super Bowl. Carol bets Dave the same thing. Fiat-law can just declare that Alice's contract is enforceable, but trump up some BS reason that Carol's is not. The crypto-authority can of course lie and declare that the Patriots didn't win. But in this case it can't simultaneously screw Bob and Carol. If you're talking about multiple contracts, crypto-law can constrain the results to a single hyperplane. Fiat-law can pick any point in the space of all outcomes. That constraint may substantially lessen the returns to corruption, and deter bad behavior.

Digital assets

Obviously if you're talking about real assets, you can't override fiat law. But there are a lot of purely digital assets where smart contracts can execute without needing any external authority whatsoever. One example would be if DNS moves to the blockchain. You could hold a binding auction for a given domain name, which would automatically enforce its rules and transfer. Another example might be a software company. They could take a crypto-loan that's collateralized by their own code. They might hand their creditors an encrypted copy of their git repos. If the loan isn't paid back under certain terms, then the smart-contract might release the private keys to the creditors, who in turn would have access to the source code.

Even with real assets, you can denote crypto-title with colored coins. If the broader crypto-economy agrees on that scheme, it starts to exert real influence. Maybe a real estate developer shirks his crypto-liabilities. In that case he may still hold fiat-title, but lose crypto-title to the property. By doing so he would basically lock the property out of access to the crypto financial system. Can't borrow against it in the crypto-markets, can't lease it to tenants who use smart contracts, can't sell it to DAOs, etc. If a substantial portion of the economy becomes crypto-denominated, then crypto-title starts to represent a very serious proportion of the value of even real assets.

The Great Moderation Posted by EspressoLover on 2017-09-27 14:04

Maybe. But 1929 probably falls on the tech side of that spectrum. The roaring 20s were mostly driven by mania for petrochemicals, electrification and automobiles. Then 1873, which precipitated the worst depression of the 19th century, was also pretty tech'ish. At least if you consider railroads tech. The relatively contained crises of 1884 (over-leveraged bank loans) and 1890 (LatAm debt) were pretty close to the leveraged-asset end of the spectrum.

Bitcoin Posted by EspressoLover on 2017-10-16 00:36

The segwit2x hard fork is almost here. Last time there was an event like this (the Bitcoin Cash fork), there was a pure arbitrage opportunity. At least at Bitfinex, longs received forked BCH, but shorts didn't owe any BCH at the time of the split. The free lunch was to take offsetting positions, and get BCH for free. (More details from Matt Levine here.) I'm looking for a similar exploit with the segwit2x fork.
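To spell out why that was a free lunch, here's a rough sketch with made-up numbers. The only real assumption is the BCH treatment described above (longs credited the forked coin, shorts owing nothing); fees and funding are lumped into one cost term.

def fork_arb_pnl(btc_move, fork_value, carry_cost):
    # long 1 BTC spot at the venue crediting the fork, short 1 BTC at the same venue
    long_leg = btc_move + fork_value    # spot P&L plus the forked coin credited to the long
    short_leg = -btc_move               # short P&L; no forked coin owed by the short
    return long_leg + short_leg - carry_cost

# whatever BTC does over the split, the net is just the forked coin minus costs (all in BTC terms):
# fork_arb_pnl(+0.20, 0.1, 0.01) == fork_arb_pnl(-0.20, 0.1, 0.01) == 0.09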
As far as I can tell, Bitfinex has improved their accounting in a way that doesn't allow for the arb on the upcoming fork. For any crypto maestros out there: 1) Am I missing anything with Bitfinex, is there any arb left open by their procedure? 2) Are any other exchanges handling the situation in an exploitable way like Bitfinex did with the BCH fork?

Edit Addendum: Somewhat related, didn't realize there was also another fork coming Oct 25 - Bitcoin Gold (BTG). It appears that Bitfinex isn't supporting this fork. So the simple arb is to short BTC at Bitfinex, then hold an offsetting position in a wallet. After the fork, split the wallet, use the BTC to cover the short, and end with free BTG. (Minus the 1-day lending fee.) Though I don't know how much the forked coin will actually be worth, given that basically no one's supporting it.

As for Segwit2x (B2X), from what I can tell Poloniex isn't supporting it. The same story holds here: short at Poloniex, hold in a wallet and split your own coins. A lot more counterparty risk with Poloniex, but the rewards seem pretty high. The B2X futures at Bitfinex are forecasting a price of 0.15 BTC. As far as I can tell, this also seems to be the case with BitMex. And maybe GDAX, though it was unclear if they're going to unconditionally credit B2X, or just credit it in the scenario that it "takes over" BTC.

Bitcoin Posted by EspressoLover on 2017-11-04 00:23

Yeah, actually. There appear to be two tenable approaches.

Option 1 is some trade that looks like this. Short BTC futures at Bitmex (which explicitly is not supporting Segwit2X). Buy spot BTC and hold at Bitfinex, GDAX or in your own wallet. Lock in the B2X price with the BT2 tokens at Bitfinex (which are basically pre-fork forwards for B2X). At the fork you should receive B2X from the long spot position, which is then used to pay the BT2 position. Unwind the futures and spot positions, which should hedge against any market movements in BTC.

Right now the futures and spot prices are trading at a significant basis, which they weren't at the time of my post. So to really be risk-free you have to lock in B2X with BT2. Otherwise, if B2X ends up pricing below the basis you paid at entry, you will lose. (The basis probably will revert to zero almost immediately after the fork.) As of now the basis is priced at approximately 0.06 BTC, whereas the BT2 tokens are trading at 0.15 BTC. So the above arb should guarantee 0.09 BTC per unit of notional BTC. (Minus transaction and funding costs, which should be comparatively minor.)

The annoying thing about this trade is that even though your position is hedged, it's at two separate exchanges. So if you trade it with leverage, you'll have to monitor BTC's price. If it jumps up or down, you'll have to rebalance funds between exchanges to avoid margin calls. Given the leverage available at Bitfinex/GDAX and Bitmex, you can probably lever this trade up 2:1. So you're looking at a 15-18% return on capital.

Option 2 is a little less of a pure arb, but offers substantially higher potential returns. Rather than explicitly holding over the fork, you can bet that the spot/futures market will eventually converge to rationality, at least immediately prior to the fork. The solution is to trade the basis directly, betting that it will converge to the implied B2X price. You can even drop BT2 from the equation, because it would have to fall by over 50% before the basis was overpriced. So far it's been pretty stable around 0.15 BTC.
The nice part about this is you can run the whole trade at Bitmex, by using the swap to trade spot. That's an option because you're betting on the basis converging prior to the fork, so you don't have to worry about Bitmex's lack of B2X distribution. Worst case, if it's 24 hours prior to the fork and the basis hasn't converged, then you can still put on the trade from option 1. This property allows you to deploy a lot more leverage. Both because Bitmex allows for 100:1 leverage, and because you can cross-margin the futures against the spot. Even if BTC jumps up or down, you're not at risk of a margin call. Your only mark-to-market losses come from a change in the futures-swap basis. It'd be highly unlikely that the basis moves to less than zero before the fork. Since it's trading at 0.06 BTC now, you can use 15:1 leverage or more and still be pretty safe. Let's say you exit if the basis converges to within 10% of the implied B2X price. In that case you'd be looking at a PnL of 0.075 BTC on the notional. That'd be a return of 112% on capital. But unlike the pure arb, you do have risk if the basis widens and doesn't converge before the fork.

Bitcoin Posted by EspressoLover on 2017-11-15 19:41

> Right now it feels that there is a lot of appetite for institutional investing in crypto, but then technology on which this should rest isn't really there.

Agree. The best business plan in a gold rush is to sell shovels. I have no idea why no one is trying to build a serious spot exchange. It's not like the technology doesn't exist at actual exchanges. Even third-world stock exchanges are pretty sophisticated now, because the major players license their systems out. These existing exchanges are just total garbage. Even GDAX, which is supposed to be the "professional exchange". I think it'd be really easy to build something with the following features using off-the-shelf components:

- Actual co-location, not this AWS horseshit. Put it in 350 E Cermak, so it's easy to trade against the CME futures. (Plus this lets you sell microwave feeds from the CME for $$$.)
- Full order book and trade data, on a real protocol like ITCH. No more web-socket snapshots.
- Matching engine with decent and deterministic latency and behavior.
- DMA only available to institutional, certified clients. Everyone else should have to use execution brokers. The exchange shouldn't be at risk of crashing because their frontend Node.js server gets hit with a DDoS.
- A real clearing system, where you're not constantly worried about the exchange losing your entire deposit. Stop having end-users directly hold balances on the exchange. Use well-capitalized institutional clearing brokers.
- And on that note, all crypto deposits should be in segregated multi-sig wallets. One key held by the exchange, one by the clearing broker, and one by a designated Big 4 accounting firm which keeps the keys in cold storage. The latter should never get involved or take their keys out except in the event of a hack or dispute.
- I think given the above, you could probably get an insureCo to underwrite a reasonably priced policy against a hack. Having someone like Lloyds guarantee the crypto deposits is a huge step to getting the institutional money comfortable.

Anyone who sets this up would easily garner nearly 100% of institutional spot market share.

EDIT Addendum: Not worth making a new post about. But CME just announced that BTC futs data is coming over the Equity channel, not the FX channel. Wonder if there's any rhyme or reason to this.
(Besides just low-level networking issues.)

Bitcoin Posted by EspressoLover on 2017-11-25 17:47

> Clearing became unbelievably slow and transaction fees surged. How would that play out when big money comes in.

What's the status of the Lightning Network? I thought this technology was supposed to fix the issue, hence the big push for SegWit. Did Lightning turn out to be vaporware?

Bitcoin Posted by EspressoLover on 2017-11-30 22:08

@rrp Thanks for the insight and link. Intuitively that makes a lot of sense. It did seem like the capital commitment for payment channels would be a non-starter. As a neophyte, it's hard to evaluate things like this in a vacuum. It's always very helpful to get this kind of no-bullshit opinion from an expert.

While you're here, let me float another half-baked idea. People want Litecoin's speed, but Bitcoin's price exposure. Why not create a "shadowcoin" backed by BTC? Set up the blockchain in the vein of Litecoin or Ethereum so that transactions are way faster and cheaper. But keep the price of the shadowcoin pegged to a very narrow window around 1 BTC. Kind of like what Tether is to USD, but not totally shady. Use the cryptographic properties of bitcoin to do it in a trustless, decentralized and provable way. Kind of analogous to ETFs, with a pre-defined creation/redemption mechanism to keep the price in line. The "vault" would just be public on a blockchain explorer, so that anyone can verify that the shadowcoin is honest. The founders could even get paid by setting up the shadow blockchain so they skim a certain percentage of the creation/redemption fees.

The fact that this doesn't already exist tells me there's something faulty with my reasoning. Either there's no real demand (people (irrationally) only want to hold pure BTC, even if a shadowcoin is provably backed), or the details to fill in the handwaving are way harder than they look.

Bitcoin Posted by EspressoLover on 2017-12-01 20:54

Thanks for the color, rowdy. Was not even aware of those terms until your post. Seems like people have already thought about this a lot before it occurred to me. One of the depressing things about a planet of 7 billion people is how few original ideas actually exist. To confess, I've been kind of obsessed with this idea the past few days. I've been doodling Alice/Bob diagrams nonstop. But hearing your opinion on the topic, I'm officially shuttering work on EspressoChain. :)

Bitcoin Posted by EspressoLover on 2017-12-08 01:33

> I assume that there would be some kind of traffic into the network, but maybe that's only the case when transferring in and out of the exchange.

The other factor is inter-exchange arbitrage. That definitely clogs up the blockchain. Say you're market making on Bitfinex, and hedging with GDAX. You're short Bitfinex and levered long GDAX. BTC jumps 20%. Now you get a margin call at Bitfinex, and have to route funds from your GDAX account. This kind of activity scales up super-linearly with volume and volatility. The transaction network is so fucked once the futures list. Even if all the CME/CBOE activity is dollar denominated, there's going to be huge knock-on effects in the spot markets. Every time these spot exchanges get DDoSed it only makes things worse. Everyone moves BTC to other exchanges to hedge.

I think the smart trade is to buy LTC ahead of Dec 10. Sophisticated crypto arbers already use Litecoin to move money between exchanges. BTC, LTC and ETH are the only coins that all major exchanges support deposits/withdrawals on.
BTC transaction times and fees are an order of magnitude higher. And Ethereum is currently clogged up with pictures of cats. (This is not a joke.) Goes to show the flaws with supporting Turing-complete transactions inside the blockchain. When it comes to a practical medium of exchange, Litecoin's coming out the clear winner. When you're staring down a million dollar margin call, you don't want to wait four hours for an official confirmation. Taking residual LTC/USD price risk on your margin capital is much less worrisome. We're in the middle of tulip mania, and the price may suddenly spike up or down. But I want to be the guy investing in pots and soil.

Bitcoin Posted by EspressoLover on 2017-12-09 02:12

> The blockchain technology itself is a separate thing which probably has some valuable applications

How do you have blockchains without some implied monetary asset? The incentives behind Nakamoto consensus only work because a 51% attack would debase the currency that miners collect. (s/miners/stakeholders/ for PoS chains.) If you control that much hashing power, then it's better to just keep collecting coins. But if the coins aren't worth anything, what's the incentive for new blocks to come out honest?

Okay, so maybe we're talking about some walled garden, where miners are more or less trustworthy? But in that case, why go through the hassle of having a blockchain? They're only needed if you literally can't trust any other party in the game. Even if only O(1/N) parties are trustworthy, there are better schemes with better security guarantees, higher performance and easier maintainability. One simple solution is just to require every transaction to be signed by a near-unanimous quorum of O(sqrt(N)) randomly selected peers. The birthday collision paradox will assure that double spends are detected with 1-epsilon probability. Unlike a blockchain this 1) doesn't require extravagantly wasteful proof-of-work and 2) only requires clients to keep a sublinear sampling of the full history.

Bitcoin Posted by EspressoLover on 2017-12-08 04:45

> CryptoKitties, an online game that debuted on Nov. 28, is now the most popular smart contract -- essentially, an application that runs itself -- on ethereum, accounting for 11 percent of all transactions on the network, according to ETH Gas Station. That's up from 4 percent on Dec. 2 for the network, which uses the distributed-ledger technology known as blockchain. The game is actually clogging the ethereum network, leading to slower transaction times for all users of the blockchain, which is a digital ledger for recording transactions. "The pending transactions on the ethereum blockchain have spiked in the last 24 hours, mostly from CryptoKitties traffic," Nolan Bauerle, director of research at CoinDesk, said in an email. (Bloomberg)

Pending transactions in the queue have exploded from 5,000 four days ago to over 20,000 today. Average transaction fees have spiked by 500%. All because of friggin' cats. This does not feel like the money of the future. Ether's looking less scalable than even Bitcoin right now. Turing-complete smart contracts on the blockchain just don't seem to work. It's as if every piece of code in the world had to share time on the same shitty 386. The marginal consumer of Gas is probably always going to be whatever Ponzi scheme is currently hot. Everything that can be done on the EVM can be done with a dumb blockchain. It just takes a little clever application of functional encryption, ZK-SNARKs and one-time trusted key-generation.
Most of the computation can be done offline, and contract vaults just need to periodically sign blocks with a hash of their internal state. The difference is that the offline approach only consumes O(1) network resources.

Bitcoin Posted by EspressoLover on 2017-12-12 02:32

Okay, sure. But at least give me credit for that monster of a Litecoin call. In other news, these CBOE futures blow. CFE always shoots itself in the foot by setting its tick sizes way too large. Who the hell wants to pay a $10 bid-ask spread when GDAX quotes at 0.01 most of the time? No wonder their volume sucks.

Bitcoin Posted by EspressoLover on 2017-12-12 19:59

That's an interesting point. For all their faults, the crypto exchanges really got two things right.

First, the tick sizes are wonderfully small. If an order book's sitting at a 1-tick spread with fat touch sizes, that's a price floor on the cost of liquidity. The market almost always has >1.0 demand elasticity for liquidity. Long term it's bad for everyone - even providers. The past 10 years have been an interruption to a century-long trend of exponentially rising trading volume. And I really think it's because we stalled at decimalization two decades ago. Sub-penny pricing on the lit market is way overdue. And futures are even worse. (I know this is formally set by the regulators, but the exchanges could easily lobby for a change.)

The second is that the crypto exchanges give away the market data and ancillaries, but make money on the trading fees. The real exchanges have an inverted business model: break even on trading fees but milk everyone on the ancillaries (data feeds, co-location, DMA registration, etc.). It's a great way to squeeze quarterly earnings when markets are tepid. But ultimately the market mostly consolidates to a few big players who can afford the fixed costs. Fewer participants = fewer opinions on market prices = less trading = less volatility = even higher market data fees. It's a vicious cycle that just kills long term growth.

Anyway, these two issues are almost large enough to make up for the terrible Node.js and ridiculous outages. Let's be real: a good amount (maybe most) of the flow in the "real markets" has no fundamental economic justification behind it. It's just noise traders and risk-lovers seeking speculative action. If FX stops moving they'll trade index futures. If index futures stop moving they'll trade tech stocks. If those stop moving, they'll trade VIX. If VIX stops moving, they'll trade crypto. Hopefully crypto mania's enough to give the real exchanges the kick in the ass they need to get back to the kind of pro-growth outlook they had circa 2000.

Is anything interesting happening in quant finance right now? Posted by EspressoLover on 2018-01-30 11:50

Like the title question says... Is any group/desk out there doing anything cool and interesting right now? Something like convertible bond arbitrage in the 90s, stat-arb circa 2001, structured credit circa 2006, HFT circa 2008-2011 or vol-arb circa 2014. Or has low-vol, long-beta mania and trading firm consolidation taken the wind out of everyone's sails? My situation's such that I'm kind of out of the mainstream gossip networks, but afaik things look pretty boring right now. Crypto's maybe the only thing I can think of. (But c'mon, do blockchains have to be the hype vehicle for literally every single field on Earth?) Anything you guys are excited about right now? Random speculation, reckless gossip, and uncorroborated hearsay are definitely welcome.
Even if you have to pepper it with some BS, get me pumped up about something!

Is anything interesting happening in quant finance right now? Posted by EspressoLover on 2018-01-30 18:02

> Alternative Data
> Machine Learning
> Cryptocurrencies

At the risk of moving my own goal posts... Do you think anyone's making real money on these fads? (Besides collecting AUM fees from hyped-up investors)

Is anything interesting happening in quant finance right now? Posted by EspressoLover on 2018-02-01 12:14

@chiral Thanks very much for the very well-thought-out response. There are so many nuggets of wisdom there that I kind of want to unpack everything. I'll try to restrain myself to the highlights...

> The decade from 1998-2008 was really the only time where a pure quant wasn't completely subjugated.

If I'm grokking you right, you seem to be saying that 1998-2008 was the aberration and post-2008 is probably closer to long-run ergodic normalcy. This viewpoint (which is definitely somewhat of a bummer) never occurred to me. I always thought of quant-ishness' rise as driven by secular trends: cheaper computing power and increasing market transparency. And its post-2008 decline as a cyclical trend: QE, high regulation, deleveraging, etc. Hence the expectation that long-run normal will look a lot more like 1998-2008 - secular trends march on, and cyclical effects fade away. But your counter-narrative is pretty compelling, so I'm reconsidering...

> Sentiment continues to drive a ton of absolute movement in asset prices. The noise floor is just too low still for anything else to rise above it.

You mentioned something like this in another thread, and I'll just remark that it's a great insight on the current market regime. It never occurred to me before, but after hearing it, it seems so obvious and descriptive.

> While regulation has been a pain, the pattern recognition problem of identifying patterns in regulatory filings, balance sheets, and rules has become a huge textual analysis problem on the regulatory corpus

Another great insight. Lemons to lemonade.

> Disintermediation via fintech is only just beginning...

Don't disagree, but at least as an outsider, it doesn't really seem like quants have any comparative advantage in this space. Is a company like Stripe or SoFi really doing anything more complex than what MBAs do in Excel every day? Maybe they're doing it with more scale than the regional consumer bank. But that's more an engineering problem ("do these simple calculations, but at 100,000 transactions per second") than a quant problem ("how the hell should we even go about pricing this thing"). If anything, adtech seems like a more natural home for an emigre from quant world.

> However, it's kind of a fun thought experiment to try and solve classical problems quantum mechanically.

I'm getting off on a tangent here, but I'd recommend Quantum Computing since Democritus. Scott Aaronson's approach to quantum mechanics reminds me a lot of what you're saying. (Well, more with a CS perspective than a physics perspective.) But it's a good take on QM qua QM. What can we say about QM as a pure mathematical system? What are the most fundamental differences between quantum and classical systems? Why probability amplitudes instead of probabilities? How big are quantum states relative to classical states? Compared to the standard textbook treatment, which is more about how to use QM to solve problems, this book tries to posit why QM might have these properties in the first place.
Aaronson makes a pretty convincing case that QM isn't just some weird set of arbitrary rules, but there might be pretty deep philosophical reasons to expect reality to be quantum even a priori. @mtsm Yeah, I'm definitely coming from a buy-side/trading biased background. It probably seeps through, but definitely still interested to hear about stuff in the sell-side/pricing world. Thanks for posting that list. It's pretty interesting, and some of the points hadn't occurred to me... * It will be interesting if systematic macro really gets off the ground. Especially given how poorly most regular macro managers have done the past few years. What's relatively unique about that space is how small the data sets are. If you're forecasting monthly moves in major FX pairs, you're not getting much more than 5,000 data points even if you go back 40 years. So it's easy to see why starting with good priors is really important. At least compared to some large data set where you have enough degrees of freedom to mine for signal in a black box. * Alt-data could definitely be the next big thing. But I wouldn't bet the farm on those signals having high R-squared. It's kind of tough, because even if you have something like credit card transaction data, you don't necessarily know what's a good vs. bad number. Earnings come with analyst forecasts, so it's easy to divide bearish from bullish results. But most alt-data doesn't have any equivalent baseline for comparison. Like yeah, maybe credit card transactions indicate a 4% uptick in sales this quarter. But how do we know that wasn't already baked into market expectations? It seems like regular old meatspace analysts will be able to contextualize that data better than algorithms. * Distributed ledgers are the only thing on that list I'm unequivocally bearish on. (Which sorta sucks, because back office deserves to get a cool project once in a while.) Distributed ledgers certainly have specific uses and unique advantages. But I think 95% of the demand for the tech is simply coming from senior execs who've been sold on the idea that it eliminates the cost and headaches of running a persistent service. This seems plausible because bitcoin operates pretty seamlessly without a Bitcoin LLC paying huge AWS hosting fees and waking up their CTO at 3 AM. But that's basically because the miners get paid assloads of money to make that happen, like orders of magnitude more assloads than a DBA team. Combinations Posted by EspressoLover on 2018-02-15 16:50 Just for future Googling reference, this is called a partial sum of binomial coefficients. I'm assuming: > members of group A can appear 4 times overall, while members of group B only twice Means: The combo can include 0,1,2,3 or 4 distinct A elements, and 0, 1 or 2 distinct B elements. Please note the 0s, meaning that the empty set is a valid combination. (If that's not the case adjust the below math accordingly.) Every valid A combination can be paired with every valid B combination. So the answer is just [A-partial-sum] * [B-partial-sum]. A is a special case, since it's actually just the full sum. A's total sum is just 2^||A|| = 2^4 = 16. B is a true partial sum, and there's no shortcut beyond just adding up the individual binomial coefficients. 4-Choose-0 = 1 4-Choose-1 = 4 4-Choose-2 = 6 B's partial sum = 1+4+6 = 11. Therefore the size of the total combinatorial set is 11*16 = 176.
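If you want to sanity-check that arithmetic, here's a quick Python one-off (nothing beyond the standard library; math.comb needs Python 3.8+, and I'm assuming both groups have 4 elements, as above):

from math import comb

def partial_sum(n, k_max):
    # Sum of n-choose-k for k = 0..k_max: the number of ways to pick at most k_max distinct elements.
    return sum(comb(n, k) for k in range(k_max + 1))

a_total = partial_sum(4, 4)   # full sum: 2^4 = 16
b_total = partial_sum(4, 2)   # 1 + 4 + 6 = 11
print(a_total * b_total)              # 176, empty selections allowed
print((a_total - 1) * (b_total - 1))  # 150, empty selections excluded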
(If zero selections aren't valid, then the answer is 15*10 = 150) https://en.wikipedia.org/wiki/Binomial_coefficient Combinations Posted by EspressoLover on 2018-02-15 17:38 Understood. Your terminology was fine, the misunderstanding was mine. I guess I'm being tripped up on this: > no element can be repeated in the same pair (so no 1,1), order does not matter (so 1,2 is identical to 2,1) Just to clarify: each time we pick, we pick a pair of elements - one from group A and one from group B. (Otherwise does that mean we can pick a pair with two elements from group A?) I'm assuming this means it's possible for elements to appear in both group A and group B. (Otherwise why do repeated elements need to be explicitly excluded?) I.e. something like: Group A = (1,2,3,4) Group B = (2,4,6,8) If that's the case, what side do we "count" towards when we pick an element from the intersection? In the above example if we pick (2,4) does the use of the 2 count towards group A's budget or group B's budget? Or both? If 2's in the first position of the pair, does it always count toward A's budget? (And if that's the case, then order does matter.) Or do we have leeway when selecting intersected elements to apply it to the most advantageous budget? Sorry for being so pedantic... Just trying to make the problem as precise as possible. This basically sounds like a variant on the knapsack problem. Combinations Posted by EspressoLover on 2018-02-16 19:04 Thanks for the clarification. I think I have the answer that you're looking for. There's good news and bad news... Let P be the set of all the elements from any group. For now, we'll just treat everything as part of one huge set. Let Q be the set of all 2-combinations from P. 2-combination means any pair of elements, with no repeats and ordering doesn't matter. (Also you can trivially extend this to N-combinations if you want triplets, etc.). ||Q|| will be about ||P||^2/2 in size. Let S be some enumeration of all the elements in Q. Doesn't matter how we enumerate the 2-combos, we just have to put them in some ordering. Let's just say alphabetical ordering. Now we basically have a Linear Programming problem (more precisely, a 0/1 integer program). We want to solve for some 1/0 vector X of length ||S||. X corresponds to whether we include the specific element from S in the final set or not. I.e. if the fifth entry in X is 1, then we include the fifth 2-combination from the enumeration, S. If it's 0, we exclude it. The objective function is to maximize sum(X). We want as many entries to be 1 as possible, because that means we've included the maximum possible number of 2-combinations and hence have selected the largest possible subset. Finally we introduce some constraint functions which correspond to the per-group limitations. Let's say we want to restrict elements from group A from appearing more than M times in total. We define a new constraint vector, C. C is also a 1/0 vector of length ||S||. Each entry in C is 1 if the equivalent 2-combo from the S enumeration contains an element from group A. 0 if it doesn't. E.g. if group A = {1,2,3,4} and the fifth element of S is (1,3), then the fifth entry in C would be 1. If the seventh entry of S was (5,6), then the seventh entry of C would be 0. Using this we add a constraint to the LP problem. X * C <= M. (Where M is the number of times we want to restrict elements from that group from appearing. * just means the vector dot product.) Candidate solutions that contain a group's elements too many times will be rejected by this constraint.
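To make the construction concrete, here's a minimal sketch using the open-source PuLP modeler (assumed installed; the group contents and budgets are placeholder toy values, and I'm only writing out the group-A constraint):

import itertools
import pulp

# Made-up example data: two groups of elements and a per-group appearance budget.
group_a, budget_a = {1, 2, 3, 4}, 4
group_b, budget_b = {5, 6, 7, 8}, 2

elements = sorted(group_a | group_b)                 # P
pairs = list(itertools.combinations(elements, 2))    # S, an enumeration of Q

prob = pulp.LpProblem("max_2combos", pulp.LpMaximize)

# X: one binary variable per 2-combination (1 = include it, 0 = exclude it).
x = {p: pulp.LpVariable(f"x_{p[0]}_{p[1]}", cat="Binary") for p in pairs}

# Objective: include as many 2-combinations as possible.
prob += pulp.lpSum(x.values())

# Group-A constraint: only pairs touching group A count against A's budget
# (this is the X * C <= M constraint, with C built implicitly by the filter).
prob += pulp.lpSum(x[p] for p in pairs if group_a & set(p)) <= budget_a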
You can add similar constraints for however many groups you have. And that's it, just solve the LP and you have your solution. More specifically this is just a multidimensional knapsack problem. The bad news is there's no closed-form solution. In fact even worse, this problem is NP-complete, so it's not even computationally cheap. Particularly if your set sizes are large. The good news is that multidimensional knapsack problems have been around a long time. We know how to solve them, and there's plenty of free software that will do it for you. The second good news is that, even if the problem is NP-complete, I strongly suspect that your set sizes aren't that large relative to what a modern Xeon can crank out in a few minutes. The final good news is that even if this isn't the case, there are a lot of good approximation algorithms that run in polynomial time, and probably will give you a "good-enough" solution even if it isn't optimal. Short-Term Cap Gains Taxes and Compounding Posted by EspressoLover on 2018-02-19 10:07 Bump... 1) Anyone aware of new opportunities from Trump's plan? From what I've seen, the lower corporate rate might offer a petit opportunity. When corporate rates were 35%, it almost always made sense to structure investments as a passthrough. But at 21%, you might get more juice out of compounding than you pay in double taxation. In the toy example from my original post, structuring the strategy as a non-passthrough corporate would produce a post-(corporate-)tax annual ROI of 39.5%. If the initial $1 million grew inside the corporate structure for fifteen years, it would hold $147 million. Assuming dividends are taxed at a personal rate of 23%, you'd get paid out $113 million. More than double the straight pass-through option. The passthrough deductions don't seem like anything great for traders. Even if you fully utilize the 20% deduction, it lowers the top marginal rate from 40% to 32%. And getting that deduction is no easy feat. You either have to 50X the deduction in real capital expenditures, or 4X the deduction in payroll. Obviously neither of those really applies to a trading operation. Best I can think of is finding some corporate with a lot of people on payroll, that's not utilizing the deduction. Then cut some deal to "employ" all their employees as a sort of middleman "service provider". Finally there's the 100% CapEx deduction. Maybe there's some interesting games to play here. What I can think of is deferring trading-related taxes with offsetting CapEx. E.g. if you make $1 million in 2018 from trading, spend $1 million on some long-lived asset with steady cash flow. Like a commercial building or something, which you can lever up. That sets your tax liability to zero for the year. Assuming you can borrow 90% against the asset, you're only taking $100 thousand out of the trading operation. It defers most of the trading related taxes, so you avoid most of the tax-compounding drag. Anything else interesting on this front? 2) Related to plain old pre-Trump tax law... Anyone have any color on using the Active Financing Exception to Subpart F? Normally a majority-American owned offshore company is "collapsed" into a passthrough entity for tax treatment purposes. But as I recently learned, the law requiring this, Subpart F, has an exception related to "active financing". The classic example is a bank engaged in genuinely offshore activities. But at the very least it seems like a securities brokerage is also covered (as long as it only touches non-American securities).
Assume you're running a quant trading operation that covers international markets. I would think you segment off the non-domestic trading into an offshore company, and use this to shield income under the active financing exception. Particularly if you're closer to the HFT end of the spectrum, where the operations do seem quite similar to what a securities brokerage does. However I couldn't find any specific guidance on this topic, so curious if anyone has looked into this? Short-Term Cap Gains Taxes and Compounding Posted by EspressoLover on 2018-02-27 19:13 @nerotulip That's a great idea, and you piqued my curiosity. The good news is that it seems like Roth IRAs are exempt from Subpart F. So you setup an offshore C-corp in a jurisdiction without corporate taxes, and avoid both US income and corporate taxes. This way you also get around the leverage and settlement restrictions. As long as the corporate entity is limited liability, you're not pledging IRA assets as collateral. The hitch comes from avoiding "prohibited transaction". Basically the IRA cannot engage in any transactions with the IRA holder, immediate family or their fiduciaries. The logic here is to prevent someone from transferring value into/out of the IRA by doing something like selling an expensive asset to the IRA for $1 or having the IRA pay themselves a salary. So you definitely cannot receive management or incentive fees on the investments. And even leasing the software from yourself or an associated entity also doesn't smell kosher. However, I think there may be a couple of workarounds. One is to "release" the software under the MIT license or some other copyleft. Just don't publish it anywhere. In this way, there's no actual transaction. The IRA entity is simply utilizing an open-source piece of software, that just so happened to be created by the disqualified party. Another would be to do something akin to Romney. He put his interest in the partnership inside the IRA. Since there was no actual revenue streams associated with the partnership at the time of formation, the tax rules allowed him to value his interest at $1, well under the contribution limit. I think you could also do something with the software license itself. As long as it's not being actively used at the time. The downside is that it limits you from using the software in any context outside the IRA, because that would be a prohibited transaction. However I should say once you get this far out, with this or any of the other more out there proposals, there isn't really so much "law" in the classical sense. The statutes are ambiguous, case precedence is sparse, and the underlying concepts don't really map on to these scenarios. (It's not even really clear if capital gains from quant trading qualifies as "active" or "passive" incomes.) You can run through the wording of the law umpteen times with tax accountants and lawyers, but it's going to basically come down to the subjective opinion of the tax court judge. Visualizing the Data on 6356 American Stocks – with R source code Posted by EspressoLover on 2018-03-28 09:09 But if the "experts" are right, it should be impossible to find such stocks since all they are allegedly driven by the same macroeconomic factors. This is a common misconception of factor pricing models, like CAPM. They don't make any claims about the total magnitude of idiosyncratic risk. Only that diversifiable risk shouldn't be priced. 
Regardless of how uncorrelated single stocks are to the index, it doesn't impact the portfolio of a well-diversified investor. In fact I'd expect a CAPM-driven world to have idiosyncratic risk. If it isn't priced, then it doesn't impact valuation, and there's not incentive for corporate management to keep it contained. I'm not endorsing the CAPM, or saying that it's absolutely true. But I think it's fair that if you're going to assail the "experts", that you at least give them a fair shake. Bitcoin Posted by EspressoLover on 2018-04-03 00:10 Bump. I have a colleague who's setting up a regional crypto exchange. All the competitors are pretty shitty, even by crypto-exchange standards. So the goal is to beat all the competition in terms of technology. He's looking for a white box solution. Ideally something that supports a real limit order book matching engine, and something with a decent push-based datafeed. It'd have to support localization in the frontend, both for the language and the exchange brand. Anyone have any recommendations here? Feel free to either respond here, or drop me an email. Thanks. Bitcoin Posted by EspressoLover on 2018-04-04 13:00 @kloc Thanks for the recommendation. A lot of factors behind why GDAX, et al. isn't competitive in this region. Poor localization. No support for the local fiat currencies. The banking systems in this part of the world are often subject to capital controls and/or poorly integrated. GDAX doesn't work with the dominant regional payment systems. Plus the would-be exchange already had a pre-existing forex brokerage, so it already has a well known brand. Seeking alpha out of the Casino/Gambling Websites... Posted by EspressoLover on 2018-05-21 19:29 Ethics aside, one of the interesting takeaways from the horse racing story is that you're better off attacking parimutuel games. All things being equal you want to beat lotteries, poker tournaments and lit exchanges rather than blackjack, bookies and internalizing dark pools. In the latter, you're taking money from the house, and their incentive is to shut you down and cut you off. Even if you have a great system, you end up spending most of your effort fighting the house, avoiding detection and standing up sock puppets. If you're hitting a parimutuel system, you're only ripping off the other players. In all likelihood, the house will try to do everything they can to help you, since you're generating volume. That being said, this is one of my favorite "Beat the Dealer" stories. Some Russians bought the source code powering a large percentage of American slot machines, and found an exploit in the RNG. I have a feeling that as software continues to eat the world, gambling included, that we'll see a lot more cases like this. Mathematical Success in Quantitative Trading Posted by EspressoLover on 2018-05-31 21:22 First, I think it's important to distinguish "pricing quants" from "trading quants". By and large the two often rely on pretty different techniques, so it's hard to make sweeping statements about both. That being said I'm a trading quant, have very little experience with pricing, and the OP seems to be asking purely from the standpoint of a trading quant. So the rest of my post comes from a trading perspective. Abstract algebra, real analysis, and pure math have virtually no practical use in this field. Yes, a lot of people with this background get hired. But that's basically because skill in these fields is highly correlated with raw cognitive ability. 
It's the same reason why some firms recruit chess grandmasters. If you want to learn fields you might actually use, I'd suggest (in rough order of importance, depending on your area) statistics, linear algebra, time series analysis, statistical learning theory, classical finance, machine learning, convex optimization, signal processing, metaheuristics, game theory, algorithms, information theory, microeconomics, probability theory, econometrics, numerical analysis, and NLP. If you read one and only one textbook, make it The Elements of Statistical Learning. If you grok everything in that book on an intuitive level, you will be ahead of the sizable majority of trading quants. The vast majority of successful strategies are using basic techniques. Most of the actual math could probably be taught to a bright high school senior. That being said, and echoing @ronin, the importance of understanding the fundamentals is to know why a certain technique is appropriate and what its specific shortcomings are. From a philosophical standpoint, I see a few interrelated reasons why pure math is mostly useless for quant trading. The first is because off-the-shelf techniques are almost always sufficient. An innovative strategy may use a new data source, or an innovative approach to frame the data. But by and large the actual modeling techniques have probably already been worked out in some other field. For example, we take for granted that least squares is the MLE for normally distributed error terms. That originally required some rigorous abstract math to derive and prove, but it was done well before anyone in finance arrived on the scene. In contrast string theorists or cryptographers often find themselves needing to push the boundaries of previously known math. The second is that empirical validation is more important than formalized validation. Most of the time our historical data is more "reliable" than our underlying assumptions. I'd rather trade a strategy that backtests well than a strategy derived from first principles. Whatever axioms we start with are at best a rough characterization of reality. Most of the time we want the data to drive the model, rather than vice versa. That means fewer parameters, simpler structures, and weaker pre-existing assumptions. All of which usually entails simple mathematical models. The third is that financial markets are "wet" and noisy. In comparison, something like General Relativity is completely described by 10 field equations. Those alone are all you need to completely model the theory with arbitrary accuracy. In finance there are "stylized facts" that describe a lot of behavior. But beyond those you get all kinds of kooky shit happening. Like people doubling some unrelated stock's price because its name sounds like yesterday's hot IPO. With a "dry" problem like GR, you can very heavily lean on your assumptions. You can derive results using arbitrarily long and complex chains of reasoning. But in a wet domain, even well established stylized facts break down if you dissect them too much. It's like trying to take the derivative of a non-smooth function. Careful mathematical rigor takes a backseat to intuition, analogy and empiricism. Similarly, image recognition is another very "wet" problem. And by and large, we see very little mathematical rigor. The state of the art and most successful teams pretty much stick to open-ended non-parametric methods. Then throw a shit ton of computing power and training data at it.
Hardly anyone wastes time trying to formally prove anything. Very few, if anyone, can definitely say why certain techniques work better than others. At least not from a theoretical perspective. Ask why one neural network architecture is used over another, and you pretty much just hear that it "did better on ImageNet". Mathematical Success in Quantitative Trading Posted by EspressoLover on 2018-06-12 17:04 @chicagoHFT You seem to be making a logical fallacy, where either everyone knows A) nothing technical, or B) the deepest and most pure fundamental mathematics. That's bullshit, you don't need ring theory to understand and use tensors. General relativity was discovered before ring theory even existed. I guarantee you that Yann LeCun knows less abstract algebra than the average math grad student. Myron Scholes probably doesn't know any abstract algebra. Andrew Ng might not even know what abstract algebra is. And I'm not even sure if John Merriweather can count past 20. You don't have to take my word for it. Here's a podcast with Nick Patterson, who was one of the most senior people at Renaissance. Surely, if abstract algebra was important for quant trading, we'd certainly see it used at the most successful quant fund of all time. Especially because that fund was founded by algebraists. And yet... (answer starts at 30:00): The most important type of data analysis is to do the simple things right. Here's a non-secret about the things we did at Renaissance. In my opinion the most important statistical tool was single regression. One target, one variable... We have the smartest people around, string theorists from Harvard, and they're doing single regression. Is this stupid? Should we be hiring stupider people and paying them less. The answer is no. And the reason is no one tells you what variables you should be regressing. What's the target? Should you be doing a non-linear transform before you regress? What's the source? Should you clean your data? Do you notice when your results are obviously rubbish?... The smarter you are, the less likely you are to make stupid mistakes. And that's why we need smart people doing something that appears to be doing something technically easy, but actually isn't so easy. Best “unknown” quant funds/shops Posted by EspressoLover on 2018-07-01 18:49 Scalability is a major consideration. Many many groups produce Medallion+ returns but on tiny capital bases. Doing 100% a year on $1mn isn't exactly legendary-worthy. Also regarding Two Sigma and the like, you have to remember that many hedge funds keep separate insider-only funds. They put their best strategies in those structures, and don't report results. This is the same story as Renaissance, but the major difference is that because of its history Medallion is more famous than RIFF. > (not pure HF/market-making) Medallion's fairly high-frequency. Last I heard it was doing something like 3% of the ADV on US futures and equities. Given its capitalization, and reasonable leverage assumptions, that would imply sub-hourly turnover. As for market making, I would guess a substantial amount of its order flow adds liquidity. Both in the mechanical and economic sense. Overall I don't think Medallion's that different than a firm like Jump, besides operating at larger scale and pushing the horizons out a bit further. Best “unknown” quant funds/shops Posted by EspressoLover on 2018-07-09 20:46 @u1234 Thanks! for the color. @sigma Sharpe of 27 isn't really unusual for HFT. 
It's quite common for shops like Virtu to report never having a down day. That implies a Sharpe of 16 or more. At some point it doesn't really matter, since more of the standard deviation is driven by day-to-day variation in volume rather than by position risk. @levkly It's feasible, but difficult. You'd need to diversify across a large set of orthogonal factors/exposures, and ruthlessly hedge out any exposures that you don't have signal on. Some back-of-the-envelope math... Let's start by saying you have the equivalent of 225 orthogonal factors. Obviously there's a lot more than this number of securities in the world. But with a $4 billion fund levered 5X, to get 1/225 of your portfolio requires $90 million in exposure, so most of your returns are driven by blue chip stocks, liquid futures, and major FX pairs. 225 would be the upper-limit even for a globe-spanning operation. Let's say the typical factor has a daily volatility of 50 basis points. This is roughly the idio vol for a typical US stock. This number doesn't matter too much, the logic mostly holds if you want to scale it up or down. At a three week horizon, that's 193 basis points of volatility. 3.0 annualized Sharpe means a Sharpe of 0.72 at the three week horizon. Across 225 uncorrelated factors that's a per-factor Sharpe of 0.048. Using the above volatility, the average trade needs to net 9.2 basis points of return after costs. That doesn't seem huge, however using the three-week horizon window the net signal needs to achieve an R-squared of about 0.2% on the orthogonal factor returns. That's pretty substantial at a multi-week horizon. Obviously this is a toy model, but I think the numbers don't look substantially different for reasonable extensions or modifications. That being said, I'm relatively skeptical that the bulk of Medallion's returns come from strategies at these horizons. For one, there's the ITG lawsuit from 2009. Basically they were trying to arb out discrepancies between the Posit dark pool and lit exchanges. That certainly looks very HFT-like to me. For another, before merging into Medallion, Nova was doing 10% or more of the volume on the NASDAQ back in the 90s. It seems unlikely given the shifts favoring electronic trading that they would have substantially retreated. Another factor is the amount of computing power they supposedly have. You don't need that many FLOPs to evaluate interday strategies. I do believe that there's probably some definition of "average holding time" for which the statement is true. Especially if they're counting very long-lived portfolio hedges or cash instruments. Longer-term strategies tend to be hedged and levered, whereas short-term strats bias flat and don't hedge nearly as much. At any given time the median position in the portfolio, weighted by notional, could be held for a long time. But I'm fairly confident that a substantial proportion of Medallion's profits come from intraday signals and strategies. Best “unknown” quant funds/shops Posted by EspressoLover on 2018-07-10 16:51 @u1234 and @wpdupjuj Those are all fair points. I've updated my priors. > My advisor has sent many students to Setauket. All of them must be using techniques very close to what we picked up in our research I'd caution against leaning too heavily on this. RennTech is a huge operation, and its current systems are the cumulative result of more than a thousand researchers. A single person's experience, or even a group of people's experience, is likely to fall short of the whole picture.
It's a little like the fable of the blind men each touching a different part of the elephant. I certainly believe that Medallion has some strategy that utilizes whatever this technique is. However it'd be difficult to say how substantial a proportion of the fund's profits are attributable to this technique. And I'd think it'd be outlandish to say that the answer is all or even most of them. Medallion does a whole bunch of things very well, and it's been doing so across a huge span of time in terms of market regimes and structure. The secret sauce definitely has more than one ingredient. ML Methods: finding the right hammer Posted by EspressoLover on 2018-08-20 21:30 I've had a fair bit of luck with ML approaches and techniques. One thing though is that quant trading tends to be a very different beast than most problem domains. In particular when it comes to predicting forward returns the signal-to-noise ratio is very very low. So in general, most techniques don't work out of the box. At the very least you need to have a pretty deep familiarity with the model, and have an intuitive understanding of what kind of inductive biases you're assuming. Often you need to get under the hood and actually tweak the implementation. Some random thoughts making up my 0.02 bps on this topic. As always YMMV. - Always start with the simplest methods first. 99% of the time an OLS is more appropriate than a deep convolutional network. - Start with the cleanest datasets possible. The more complex the model, the cleaner the data needs to be. Since the actual signal is so small, even a small error can make it so that your technique spends all its explanatory power on the bad data instead of the actual alpha. - Always keep some data as pure out-of-sample. Don't even look at it until you have a final product. Sometimes overfitting can happen on the level of the researcher picking models, rather than inside the model itself. - Be careful using techniques designed for classification. Remember in classification there is no penalty for overconfidence. In trading, there's a tremendous penalty for overconfidence. - Bias towards fitting shorter horizons. Shorter horizons are less noisy, less prone to overfitting, and exhibit more stable behavior over time. - Random forests are the closest thing to a free lunch in all of ML. - Unsupervised learning tends to not be that useful. Markets are not only noisy, but they're efficient. Whereas in most problem domains the most visible features tend to be the most predictive, in trading obvious things get arbitraged away quickly. For example let's say we're looking at a bunch of order flow features. If we do PCA, the biggest eigenvectors are probably things telling us what the total volume is in the market. That doesn't give us much useful alpha. Whereas the relative balance of buy/sell order flow, which is critically predictive, probably has comparatively small eigenvalues. - For that reason deep learning also tends not to be that effective. A lot of deep learning is using some form of unsupervised learning to compress the data, before training on the extracted features. - Ensemble methods are not only powerful, but also robust against changing market regimes. The original Netflix Prize paper is a great example. Fit a lot of different models with different techniques, hyperparameters and feature sets, then combine them together. - Along the same lines stacking models is also very powerful. Start with simple, linear techniques, then fit more complex models on the residuals, as in the quick sketch below.
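A minimal sketch of that stacking idea on synthetic data (scikit-learn assumed; in real life the two stages and the evaluation would of course be fit on properly separated time windows):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))                       # made-up features
y = X[:, 0] - 0.5 * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=3.0, size=5000)

# Stage 1: a cheap linear fit soaks up the linear component of the signal.
lin = LinearRegression().fit(X, y)
residuals = y - lin.predict(X)

# Stage 2: a slow, regularized tree model only has to chase what's left over.
gbm = GradientBoostingRegressor(n_estimators=200, max_depth=2, learning_rate=0.05)
gbm.fit(X, residuals)

# The final prediction is just the sum of the two stages.
y_hat = lin.predict(X) + gbm.predict(X)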
If 50% of your variance can be explained by a three-dimensional hyperplane, a tree-based implementation is going to waste a tremendous amount of degrees of freedom doing something that's dead simple for least squares. - In a lot of ways ML is often just a substitute for feature engineering. Many complex models could probably be replaced with linear fits of better features. (Not saying that the tradeoff isn't worth it, I'd rather spend 100 hours of compute time than 10 hours of research time.) But it helps to be aware of the tradeoff between modeling and feature engineering and direct your time towards where returns are highest. - If you're doing anything latency critical, be aware that most common ML representations are wasteful and computationally expensive to evaluate. If you must have it, you need to be clever about avoiding touching these things in the hotpath. - [Placeholder for further additions if I think of any...] Technology Advances Posted by EspressoLover on 2018-10-04 20:34 > I still think there's potential in growth hormone secretagogues and things like this for anti-aging The current market structure, regulatory regime and even broader culture is really ill-suited for anti-aging R&D. Metformin has pretty substantial evidence of slowing human aging. It's also dirt-cheap, super-safe, approved by every regulator, been used for decades, and has basically no side effects. If we can't even get the medical system to push metformin, why would anyone spend the resources and risk to develop some next-gen speculative compound. > admittedly i read this fast but none of these possible breakthroughs seems to have happened Over the past few decades most of the major hype has pretty much disappointed. There really hasn't been many breakthroughs, but a lot of things have improved so incrementally that they're barely noticeable. Computers, networks and consumer electronics are the headline. But even outside this there's been a lot of improvement. Cars are way safer and more reliable. Manufactured products are dirt cheap and made better. Cancer survival rates have steadily improved. Food is cheaper, fresher, more varied, available year-round and much less likely to make you sick. Air pollution and water pollution has gone down a lot in most developed countries. LED's have improved lighting along every dimension. If I had to guess the near future probably looks a lot like the recent past. I doubt we'll live to see a period of breakthrough after breakthrough, like 1890-1960. But most things will probably just keep slowly and continuously improving. For example, I think fully autonomous cars are decades away. But we'll probably get widespread "super cruise control", and each year it will probably be able to do an increasingly higher percentage of driving. However getting from 95% autonomous to 100% will be a total bitch. Quantum computers won't have much an impact anytime soon, but Moore's law will keep delivering way longer than anyone expects. Sell the hype, buy the quiet engineers in the backroom who are too busy and diligent to bother with PR. Are we in a quant meltdown akin to August 2007? Posted by EspressoLover on 2018-10-31 17:54 Some of the numbers coming out of equity hedge funds are looking really ugly this month. Obviously we've suddenly fallen into a global equities correction. I'd expect the long/short funds to get clobbered as they're mostly going to be long biased in a late bull market. 
But relative to their historical range, it looks as if the equity market neutral funds are doing even worse. So it's not just a general issue of losses on beta. I know earnings responses have been really wonky, and that's probably fucking up a lot of interday quant strategies. Also sector rotation volatility has spiked faster than index vol. At least from the 10,000 foot perspective, things are kind of looking like August 2007, or at least August 2011. That is, normally placid quant factors going through violent moves because of synchronized de-risking by the major multistrats. But this is totally just supposition, and I currently have very little transparency into the day-to-day of this space. Was hoping other phorumers with insight could add some color, or at least join in the unfounded speculation. Are we in a quant meltdown akin to August 2007? Posted by EspressoLover on 2018-12-10 19:16 Just going off Barclays indices, November wasn't as bloody as October, but still seems pretty bad. Especially considering the S&P was up nearly 2%. Most of the aggregate hedge fund performance seemed saved by emerging markets and event-driven, which are pretty long-biased. Equity market neutral was down 1.9%, the worst month in over five years. And that's already after a pretty bad October. Another bad sign is that the "smart money" funds like Citadel, SAC, and Millennium did worse than the median fund in November. Seems like a continuation of systematic deleveraging by sophisticated investors. Since US consumer sentiment is still buoyant, and institutional investors are skittish, we're likely rotating towards retail positions. At least in the short-term the market's probably going to keep "dumbing down", kind of like price discovery in reverse. Are we in a quant meltdown akin to August 2007? Posted by EspressoLover on 2018-12-14 17:04 @maggette Sorry for being opaque. All I meant was that institutions seem to be deleveraging to a much greater degree than retail. So, we'd expect the latter to be the marginal buyer, and the former the marginal seller. That would entail a tendency for positions relatively favored by retail to outperform those relatively favored by institutions. The upshot is that institutions tend to be better informed than retail. That creates a paradox where higher alpha participants are likelier to face losses than their dumb money counterparts. At least in cases where the fund flows are moving faster than the alpha realization horizon (so lower-frequency stuff). Aggressive HFTs in ES Posted by EspressoLover on 2019-01-14 21:06 @gaj Shit, you're right. Bad example... > Could an event in a single stock be a trigger event for ES? That sounds very unlikely to me. What's more likely is that a series of events accumulate until the signal is strong enough to fire in ES. Here's kind of my mental model. You have some sort of net alpha, which is the sum of two sub-signals: X and Y. X is a smoothly diffusing stochastic process, whereas Y is a discontinuous jump process. Most of the time when net alpha crosses a threshold, it's the direct result of a jump in Y. Not always (sometimes you diffuse through the threshold) but usually. Most participants have basically the same Y signal, but somewhat orthogonal X signals. If Alice is the only one to trade because her private X is stronger than Bob's, it's probably still in response to a public Y jump. Never really used the cash equity data that much, so take this with a grain of salt.
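Just to illustrate that X/Y point, here's a toy simulation (every parameter is invented) showing that when a diffusion-plus-jumps signal crosses a trading threshold, it's usually a jump that pushed it over:

import numpy as np

# Toy model: net alpha follows an AR(1)-style smooth diffusion X plus a sparse jump process Y.
rng = np.random.default_rng(1)
n, phi, threshold = 200_000, 0.99, 1.0

dx = rng.normal(scale=0.05, size=n)                                      # X increments
jumps = rng.binomial(1, 0.001, size=n) * rng.normal(scale=1.5, size=n)   # Y jumps

alpha = np.zeros(n)
for t in range(1, n):
    alpha[t] = phi * alpha[t - 1] + dx[t] + jumps[t]

crossed = (alpha[1:] >= threshold) & (alpha[:-1] < threshold)
jumped = np.abs(jumps[1:]) > 0
print("fraction of upward threshold crossings coinciding with a jump:",
      crossed[jumped].sum() / max(crossed.sum(), 1))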
But, like you said, price evolution in the single name equity space is probably closer to a smooth stochastic signal. To the extent that price discovery in single name is independent, then even discontinuous ticks in stocks are going to look pretty smooth aggregated over 500 names. This type of trading would rarely trigger an ES move by itself. It might adjust traders' threshold of what size jump they respond to. But even if traders are looking at different signals, you're almost always racing someone. Sometimes the size of the jump is so large that the cash basis signal is irrelevant. Everyone wants in. Or maybe not, and a jump only triggers a minority of the HFTs. But even if only 20% of the HFTs jump in, that's still a hell of a race to try to win. Almost always, you gotta fight with latency. But in terms of what @anonq said, there can be action in the cash space that isn't smooth/independent. Sometimes price discovery across single names can wind up being pretty coordinated. You can get a cascade, where some of the names in the index tick down, then more start ticking down, then quickly almost all of them move. This is the result of equity HFTs sweeping the market once price discovery telegraphs that the entire market's moving down. A coordinated cash move like that could definitely trigger an ES move. Aggressive HFTs in ES Posted by EspressoLover on 2019-01-14 18:35 Used to play this game in ES (though was never very successful). Like others have mentioned you can still derive a tremendous amount of signal from the evolution of the order book and flow just within a single instrument. In addition, ES does predominately lead, but it’s far short of 100%. On the CME alone, you can derive a fair bit of signal from the relative basis with NQ, YM, ZN. Even 6E or CL depending on the current macro regime. The thing is ES has a 1 bp spread, plus taker fees are relatively small. So you don’t need a very large magnitude signal to cross the spread. (Edit: Removed dumb example) You can dive deeper into interactors to isolate periods where price discovery is occurring on other instruments. Very simple example: cash tends to relatively lead the index around the opening auction when participants are digesting overnight events in single-names. More sophisticated is to look at the comparative order book evolution between the pairs. Effectively what happens a lot of the time, is that there are “trigger events”, where it becomes obvious that the remaining liquidity at the touch is easy money. The typical example would be a bid showing 1000+ contracts then getting smashed down by a giant sell to <10. Grabbing the remaining liquidity is pretty close to a sure shot in terms of profitability. ES is a pretty thick book. Even if you’re using more sophisticated and subtle signals, they mostly contribute by titrating the threshold to trade trigger events. In the above example you may normally lift the bid if size falls to 25 or under, but go up to 100 contracts if NQ is cheap and order flow’s trending short. Latency becomes not just an issue in terms of getting fills, but also adverse selection. If you’re slow, your fills mostly happen when others don’t want to trade on an event. Decreasing forum activities Posted by EspressoLover on 2019-04-26 17:25 I'll push back and say, I don't know how much of it is finance specific, as much as it is the changing face of Internet culture. The independent bulletin board is a dying medium. In 2003 it was the predominant mode of discussion on the web. 
Young people nowadays don't like curating their bookmarks to keep track of forum websites. They prefer dealing with only one or two mega networks like Reddit, Twitter or StackExchange. Just login once, then let the algorithm do most of the filtering work. Also users like the constant dopamine hits you get from upvoting and gamification. Rather than long-lived threads with protracted discussion, the front page on these sites turns over *fast*. You get constant blue links by hitting F5. And if you agree/disagree, rather than putting in the effort to write a reply, you just click a button to feel a visceral satisfaction. I don't really think its necessarily a bad thing that this site missed out on that transition. I could just be old and cantankerous, but the quality of Internet culture is atrocious compared to what it looked like pre-social media. I'd rather tradeoff quantity for quality. Would it be better if the forum was overrun with memes, culture war topics, recycled reposts and self-righteous pity parties? It's nice to have a little niche of the web that hasn't succumbed to the 2010s' version of Eternal September. how much does membership in SPY raise a stock’s price? Posted by EspressoLover on 2019-05-06 19:54 This paper estimates ~9%. I will say, this topic is hard. Especially in modern times, when many sophisticated traders are trying to capture this premium ahead of time. It's not like the premium's going to fully realize on the announcement date. It's fairly predictable which names will end up getting added/deleted. Anticipating this, traders will buy up positions for stocks at the precipice. So you have to measure far enough back to capture the anticipatory run-up. But if you look at the returns from longer horizon, you start running into selection bias. Do companies experience high returns because they were about to be added to the index? Or is it that a company with high-returns become large/visible enough to get added at the next reconstitution. Detrending using CAPM Posted by EspressoLover on 2019-08-02 17:45 I'd try a variety of approaches, then use cross-validation to see which produces the best fit and how big the magnitude of difference is. FWIW, usually a simple approach like just rolling the last three month's of linear-regressed beta works pretty well. You're not going to get very different beta estimates even with much fancier state-space models. At least for single-name stocks. The covariance structure tends to be pretty stable over time. For weird pairs like VXX/SPY you do need to be a little more clever. Trade Wars etc... Posted by EspressoLover on 2019-08-15 15:54 I don't know. It just doesn't seem like the total magnitude of bilateral US-China trade is large enough to have a serious economic impact. Particularly on the US side. * US exports to China are only 0.6% of GDP. (0.8% of Chinese GDP) * US imports from China are 2.7% of GDP. (3.7% of Chinese GDP) * I can't find a China specific breakdown, but Asia as a whole only accounts for 8% of S&P 500 revenue. Even with 100% tariffs and minimal substitution, these numbers alone aren't large enough to cause a recession or bear market when US growth is consistently running at 3%. I guess there could be second-order effects, and those are what's really spooking the market. In some sense I feel like the back and forth on China is a canary in the coalmine. There's this factional war within the Trump White House between the Navarro/Miller populists and the Mnuchin/Kushner pro-business free-marketers. 
I think that investors feel like the harder Trump pushes on trade, the more likely he is to spend a potential second term focusing on immigration and antitrust instead of deregulation and tax reform. The other side of it is that while US-China is only a tiny fraction of global trade, it has the risk of a sort of policy contagion. Like if the US shuts down Chinese imports, then China responds by devaluing the yuan hard. So now other manufacturing exporters have to devalue to compete. Now Europe gets pressured to raise trade barriers because its domestic manufacturers are getting clobbered. And the US raises tariffs on even more countries in response. And so on, until it erupts into a global conflagration. Quant equity's future Posted by EspressoLover on 2019-08-22 15:38 We're in the middle of the longest bull market in modern history. It's not surprising that a market-neutral strategy isn't doing great. Analogously during the Internet bubble, many absolute return strategies had mediocre performance. By 1999, people were declaring long-short to be dead for good... And then after the bubble popped, many of the funds that stuck around ended up having their best five years ever. The lesson is that when you get very late into a bull cycle, animal spirits take over and market rationality often takes a back seat. In particular the value/quality/balance sheet signals used by longer-horizon quant strats are pretty powerless in a world filled with WeWork or Pets.com. Anyway, my point is let's wait to see at least one full bull/bear cycle before jumping to any conclusions. Opinions on index fund overcrowding Posted by EspressoLover on 2019-09-06 01:44 To be honest, I don't think the concern Burry raises can be dismissed out of hand. The danger zone is specifically on the small illiquid stocks that are in the index. Whereas the stocks that are too small/illiquid to make the index cutoff should be fine. So for example the 400th biggest stock in the US is in a much worse position than the 600th largest. The former is still big enough to be in the S&P 500. In a hypothetical unwind scenario, it's gonna get slammed because its liquidity is tiny compared to the massive tsunami of capital in index funds. That's not a concern for the 600th largest stock. Even though it's illiquid, it's not part of the S&P 500, so doesn't get directly affected by the unwind. Just some back of the envelope math... SPY/IVV/VOO alone represent $500 billion in assets. Take a sleepy mid-cap name like DHR. It's 0.36% of the index. So, just the aforementioned ETFs are collectively holding nearly $2 billion in DHR stock. What happens if there's a sudden unwind event or flight-to-quality out of ETFs? The ADV on DHR is only about $225 million. So if 25% of ETF holders head for the exit, you're talking about effectively liquidating 200% of the ADV in what? A day, a week, an hour, ten minutes? As far as I'm aware the APs can create/redeem shares continuously in real-time. But even if not, market makers can go naked short for intraday positions. And ETF investors have become quite comfortable relying on what presumably looks like massive, cheap, reliable, continuous liquidity. In my opinion it's mostly an empirical question. How many constituents have serious disconnects between their index weight and their liquidity? How "flighty" are the major segments of ETF investors? And how fast are their stop-losses/margin-calls/capital requirements likely to be triggered? How much stat-arb capital is available to absorb single-name dislocations?
I don't know what the answers to these questions are. And my gut sense is that it probably doesn't crack the top 10 in terms of market risks to lose sleep over. In particular many (most?) ETF investors are individuals in long-term retirement accounts. That capital base is pretty much as stable as you can get. But I can see a potential channel, where if the stars align it might be a major problem. At the very least, it's an interesting question that probably deserves a closer look. Equity factors Sep 9-10 Posted by EspressoLover on 2019-09-11 05:13 This was a pretty good summary... The equity factors have been moving very contemporaneously with the yield curve on an intraday basis. Just eyeballing it, the strength of the cointegration is too strong to be coincidental. I don't know why steepening in rates would be rotating market neutral factors. But given the respective depths of the market, it seems almost certain that the causality runs from the former to the latter. Also important to keep in mind that despite being a huge two-day selloff, the drawdown's pretty much just last month's returns. Edit Addendum: This article seems to suggest that it's being driven by CTAs reversing the positions that they piled into in August. That makes some sense to me. QuantFinance is dead Posted by EspressoLover on 2019-09-18 17:23 > I would also argue that traditional risk-shifting and financing strategies have been disintermediated by VC and private equity, This is a very under-appreciated point. It's not obvious, but I think in the past ten years private equity has really eaten our lunch. Much of the raison d'être of high finance comes down to the stylized fact that investors, for both institutional and psychological reasons, tend to be much more risk-averse than is rational. To the extent that you can "fool" them into ignoring their fear of risk, you can create real value. That's the main insight of the classic noise trader paper. Quant alchemy's typically accomplished this by transmuting risk into some form that's obfuscated from the constraints. The pension mandate says you have to be 50% in investment grade bonds? Okay we'll just pool and tranche this bucket of high-yield junk until it technically meets the rating agency requirements. You feel uncomfortable buying stocks on margin? No problem, we'll just build this weird option that's effectively 200% leveraged with this esoteric barrier cliquet that's essentially meaningless but lets you sleep better at night. Private equity's found a simpler solution to this problem: just eliminate liquidity and transparency altogether. For the irrationally risk-averse, those features are actually bugs. In the long run, private equity returns are pretty identical to small-cap stocks. But on a month-to-month basis the portfolio valuation can be massaged to smooth out the volatility. Look at how much VCs bend over backwards to make sure their companies never have a down round. If you look at an entity like Softbank it looks pretty similar to a Bush-era SPV. A sprawling nexus of contingent claims, obfuscated exposures, and opaque ownership backed by the aura of geniuses behind the curtain. Basically a way to transmute underlying investments in very risky ventures into a respectable-looking package that a pension manager would be comfortable with. When someone like PIF makes a $45 billion allocation to Vision Fund, that's pretty likely taking food out of quant mouths.
Mar 2020 Performance Posted by EspressoLover on 2020-03-18 19:47 The challenge for quant strats is that the co-movements are no longer being driven by risk-on/risk-off, but rather the desperate need for liquidity from the real economy. Financial sentiment is taking a back seat to distressed actors liquidating financial assets to cover operating cash flow shortfalls. Retail investors have always buffered selloffs in the past decade by buying the dip. Now many of them have real businesses on the verge of missing payroll. Safe-haven assets are getting liquidated right along with everyone else. That's why you're seeing selloffs in both treasuries and equities. Betas are getting compressed to 1. Liquidating utility stocks frees up just as much cash as semiconductors. That's probably fucking up most stat-arb models, because without context beta compression looks a lot like idiosyncratic dispersion. Mar 2020 Performance Posted by EspressoLover on 2020-03-23 18:38 Second that. I can say from personal ancedata, that at least in one relatively inconsequential corner of the global markets, HFT has been a rocket ship the entire month. Both market making and liquidity-taking strats have done very well. HFT tends not to take large positions, even on a gross basis. So large adverse moves don't really cause significant damage. In a similar vein Sharpe ratios tend to be very high. For most strats, if alpha doubles but volatility quintuples, that's a recipe for forced deleveraging. In HFT space that just means your 20 Sharpe strategy becomes 8 Sharpe with twice the PnL. Another point from an ecosystem level: whereas quant portfolios tend to crowd into the same positions, HFT market makers, if anything, have anti-correlated positions. Only one algo can capture the front of the queue on any given book at any given time. So, even if one HFT portfolio is forced into liquidation, it's unlikely to trigger systematic deleveraging. It could be my personal biases, but one thing that seems to distinguish the stat-arb desks that are weathering this relatively better is the ability to "speed up". Stealing a quote from Lenin: there are decades where nothing happens, and weeks where decades happen. You may get used to rebalancing once a day, because that's close to optimal in normal conditions. But in this regime, if you don't trade faster to match the market's cadence, it means that you're eating all the extra volatility, but missing most of the alpha opportunities in the whipsaws. That said, what I'm really curious about is Medallion's numbers. I think it'll shed a lot of light into the debate about what really drives the fund. Mar 2020 Performance Posted by EspressoLover on 2020-03-27 19:49 I feel like if I was a firm the size of Winton, I'd always keep a tiny fund seeded that's the exact inverse of my main fund. That way, any time you sustain big losses, you can always say "some of our funds did poorly, but others had very good performance over the period." And of course don't point out that the latter has about 1/1000 the AUM as the former. Seems like a cheap hedge in terms of PR, marketing, and branding. Crude Oil Posted by EspressoLover on 2020-04-23 14:52 @gaj That's funny. That was exactly my first thought when I first heard about it. In addition to the denominator issues, a lot of low-latency C-style systems will use 0 or -1 to mark a field as N/A, since it saves cache lines by avoiding a std::pair. 
The one redeeming thing is that since this is commodities space, most participating firms probably trade calendar spreads, so their systems might be equipped for negative prices. I would hope any systems not designed to deal with zero or negative prices had their circuit breakers puke as soon as they saw a bad entry on the MBO data feed, but circuit breakers are rarely a well-considered part of the ATS tech stack. Mar 2020 Performance Posted by EspressoLover on 2020-04-23 18:14 I don't really think it makes any sense to believe that the bulk of their alpha comes from providing liquidity in the sense of being an HFT market maker. For one thing the PnL is too large. Virtu is one of the largest market makers, and its annual trading revenue is around $1 billion. Medallion's already made $4 billion in the first three months of the year. Second, they were highly profitable with similar characteristics well before the era of electronic trading. Finally you don't see RennTech running the business lines that you'd expect from a market maker. They don't have an internalization pool, they don't clear their own trades, you usually don't see them as exchange members, they use external brokers. But I think it's worth asking what it actually means to "provide liquidity". I'd characterize a market as liquid when it has the ability to absorb imbalances in order flow without significant dislocations to the price discovery process. Therefore I'd say that in an economic sense a liquidity provider is a participant whose demand schedule is elastic relative to non-informative changes in the price. A market maker is the extreme example of this. A one penny difference in price can literally determine the direction of their next trade. But there are still ways to provide liquidity outside this narrow definition. For example vanilla pairs trading is mostly about providing liquidity. If XOM rises 1.0%, the pairs trader will buy BP at 0.9% and sell it at 1.1%. The pairs traders' demand schedule is not as elastic as the market maker's, but still very elastic relative to the market. There's also the wrinkle that liquidity provision can vary based on time horizon. Consider someone rebalancing a large portfolio using a very patient passive execution algo. Those resting limit orders provide liquidity at short horizons. On a 60-second basis, the fills are mostly occurring in the opposite direction of the market's order flow. Yet on longer horizons, they provide no liquidity. Past 24 hours, the demand schedule is completely pre-determined and perfectly inelastic to price. I'd guess that the bulk of RennTech's strategies generate alpha by providing liquidity at horizons/imbalances that are just outside the purview of market makers and HFTs. You have to remember how little inventory HFTs actually hold. If there's a 2-sigma deviation to the rolling hourly order flow, there's no way the HFTs can absorb that. There are some characteristics you'd expect from that type of trader. First, Sharpe ratios would still be much higher than the market, but unlike HFTs they have to take meaningful risk. So they'd still have losing days, weeks or even months. Second the margins per dollar traded wouldn't be as tight as market makers', so they'd expend less effort on things like trying to clear their own trades to scrape fractions of pennies. Third the magnitude of their profits, though more volatile, would be larger than market makers'. Larger positions over longer horizons leave more alpha to collect.
Finally, they'd need to combine both statistical acumen and the technical agility to efficiently process huge and complex datasets. HFTs can largely ignore the former. It's pretty easy to verify that a 20-Sharpe strategy either works or doesn't work. For the latter, if you're Winton and targeting monthly alpha, the technical challenges are pretty minimal. Competing in this space requires execution excellence in two relatively disparate cultures. In most firms, usually either the statisticians or the engineers wrest control and relegate the other side to a neglected cost center. If you develop a corporate culture that threads this needle, I think that's a real moat that might explain how a single fund's dominated its competitors for three decades. Bitcoin Posted by EspressoLover on 2020-05-12 20:45 Can the resident crypto experts weigh in? Is there really any fundamental justification for why halving is bullish for BTC (besides simply the fact that people believe it's bullish)? My understanding is that the current rate of seigniorage is basically de minimis at this point. At least relative to the exchange rate volatility. Bitcoin Posted by EspressoLover on 2020-05-15 17:00 Thanks, Elain. That makes a lot of sense. But some back-of-the-envelope math makes it seem like it wouldn't have any serious impact. (And please correct any misunderstandings I make in the following.) There's 18 million BTC in the current supply. The halving reduced the seigniorage rate from 1800 BTC/day to 900 BTC/day. At monetarist equilibrium, that implies the annual inflation rate fell from roughly 3.7% to 1.8% a year. A reduction of under two percentage points in annual dilution would not seem to meaningfully change the market value of BTC as a store of value. Annualized volatility on BTC/USD is on the order of 50% or more. As an investment vehicle, halving increases the long-run expected Sharpe ratio by maybe 0.04. As an analogy, it'd be like Vanguard trimming the fees on its index fund and for some reason people expecting that to double the value of the S&P 500. Am I missing something here? Decreasing forum activities Posted by EspressoLover on 2020-05-25 16:29 There's a subreddit now at /r/nuclearphynance. It's inactive at the moment, but in case the forum hosting fails at some point in the near future, we can use it as an ark. Figure eventually it can also be a funnel to bring young blood into the community. momentum crash june 2020 Posted by EspressoLover on 2020-06-09 14:48 This was basically Moskowitz's hypothesis back from his paper on momentum crashes in 2013. The idea being that the short leg of (cross-sectional) momentum does particularly badly during the recovery phase of crises. I'm not sure if this applies to June 2020, but just eye-balling the market it seems to be the case. The general idea is that following market upheaval, the worst losers tend to embed a "crisis risk premium". It took a lot of balls to buy CCL in mid-April, in a way that's not fully reflected by its CAPM beta. In normal times, MOMO is primarily capturing an idiosyncratic anomaly related to price discovery in single-names. (Either under-reaction or over-reaction or whatever the story of the month is.) But during a crisis, particularly on the loser-side, that effect gets overwhelmed by general exposure to a non-linear crisis factor. In normal times most of the losers have their own unique, stock-specific stories. In 2020, most of the losers were losers primarily because they were the names with the highest exposure to Covid.
The upshot is twofold. One is that the basket becomes a lot less diversified, and therefore more volatile. Two is that you're effectively reversing Baron Rothschild's advice by selling when there's blood in the street. One thing that I wish existed, which would be super-interesting to observe, would be a liquid options market on style factors. It'd be pretty interesting to see how the "Momentum-VIX" changes over time. momentum crash june 2020 Posted by EspressoLover on 2020-06-30 20:43 IMO, there's a lot of window-dressing effects at play. Nobody wants to be the fund manager holding airline stocks in the middle of a pandemic lockdown. Especially when your performance is already deep in the red YTD. Your investors have had enough of your shit, and now's not the time to act cute and make bold bets. At best you look reckless, at worst you look clueless. Secondarily, I think there are also behavioral dynamics at play. Especially fragmented-self and regret minimization. Someone who behaves in a socially proscribed manner, such as buying CCL in the middle of a pandemic, will feel especially remorseful if/when they get burned. People who lose money, but in conventional and socially approved vehicles, typically feel much less regret. At its core, human decision making exists to serve tribal apes. Is anything interesting happening in quant finance right now? Posted by EspressoLover on 2020-09-14 15:32 Bump. In the past two years how has this discussion changed? This thread brought up a number of emerging areas. Crypto, alt data, consumer-facing fintech, systematic macro, Bayesian DCF, AI/ML everywhere all the time. What can we say about this stuff with a couple years of hindsight? What panned out? What disappointed? What worked, but only in an unpredictable way? What managed to stand the test of Covid volatility? But it'd also be great to hear about anything new that *wasn't* on the radar back in 2018. Surely the past six months must have catalyzed some innovations and shifts. Alpha from exploiting Robinhood muppets? Structured credit's response to rising defaults? Financial innovations to deal with the extreme unpredictability of real estate? DeFi as the next frontier in crypto? Is anything interesting happening in quant finance right now? Posted by EspressoLover on 2020-09-14 21:07 @Sharpe It was brought up by @mtsm as a brief point on the first page. Don't know anything about it besides that. It's funny, because I had the exact same question as you back then. Stock market and election Posted by EspressoLover on 2020-08-28 15:19 In 2016 there was a strong correlation between the Mexican Peso and Trump's odds in the betting markets. I doubt it still holds to the same degree this election cycle. Trump seems to get along okay-ish with AMLO. But you could start with historical Betfair data, then try to find the best proxy in the liquid financial markets. (I bet post-IPO Palantir will work.) The challenge with this election is the big confounder in the form of Covid. It makes it hard to disentangle the direction of causation. If Covid gets worse (better), that's bad (good) both for the market and for Trump's prospects. If you don't take that into account, you'd over-attribute the election's impact on the market. This is always kind of a baseline issue in any re-election. The economy affects both the incumbent's chances and the market. But there's never been this huge an amount of economic uncertainty on a two-month horizon. Modeling 2020 poses a particularly nasty three-body problem.
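To make the over-attribution point concrete, here's a hedged sketch of the kind of regression I mean, in Python with statsmodels. The data file and column names (spx_ret, d_trump_odds, d_covid_proxy) are hypothetical placeholders, not a real dataset:

```python
# Hedged sketch: regress daily S&P returns on changes in the incumbent's
# betting-market odds, with and without a Covid proxy as a control.
# All column names and the CSV are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("election_panel.csv")  # hypothetical daily panel

# Naive regression: any Covid-driven co-movement gets attributed to the election.
naive = sm.OLS(df["spx_ret"], sm.add_constant(df[["d_trump_odds"]])).fit()

# Adding a Covid proxy (e.g. change in new-case counts or a lockdown index)
# soaks up the common driver, so the election coefficient is less over-stated.
controlled = sm.OLS(
    df["spx_ret"],
    sm.add_constant(df[["d_trump_odds", "d_covid_proxy"]]),
).fit()

print(naive.params["d_trump_odds"], controlled.params["d_trump_odds"])
```

Even the controlled coefficient is mostly noise with only one cycle of data, but at least it isn't mechanically polluted by the common Covid driver.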
From what I've been reading, the election the market's watching isn't the White House but the Senate. As long as Congress is gridlocked, there probably won't be much difference between a Trump and a Biden admin. Unlike Bernie or Warren, Biden's unlikely to aggressively use executive power to push economic policy. If Democrats control both chambers and the White House, then it's almost certain that capital gains and corporate taxes will see significant hikes. Depending on how big the blue wave is, maybe even tech antitrust, FTTs, and some sort of watered-down M4A or GND. A lot of analysts seem to be saying that the best case is Biden with a Republican Senate, because you avoid both progressive legislation and Trump's twitter trade tantrums. Since that's the most likely outcome, in some sense the market may be short "election gamma". A big disruption in either side's favor could impair returns. All this being said, in 2016 the pre-election behavior of the market totally contradicted the post-election realization. The S&P 500 was pretty strongly negatively correlated with Trump's odds. For example, you see a big dip the week before the election when Comey announced the re-opening of the email probe. And even on the night of the election, as Trump's odds kept rising the index futures started crashing, until suddenly they reversed, rallied hard and basically kept going up and up for the next two years. I just don't know how efficient markets are for these types of one-off events. Elections don't happen frequently enough to enforce discipline. A lot of it is just traders projecting their pre-existing biases. And they rarely pay a big enough penalty when they get it wrong. Stock market and election Posted by EspressoLover on 2020-11-08 02:37 > I bet post-IPO Palantir will work. Well... I was super-wrong about this. Know Your Counterparty: Ethics and Market Making Posted by EspressoLover on 2021-04-16 20:21 Hi @riskparity. Welcome to the forums. Your blog is probably of interest to some of the users here, but you're being a little bit spam-y. It's not necessary to repost the same topic on multiple boards. Virtually everyone here who reads one board reads all of them. Also, instead of posting a new topic every day with a separate link, please consolidate into a single post. Especially if it's just a naked link without any discussion. The forum doesn't have that much traffic, and your links will definitely not fall off the front page in the span of a day. Decaying phorum activity - last parting gifts for the next generation Posted by EspressoLover on 2021-06-09 23:58 Just spitballing here, but I wonder if it's feasible to create a proxy site that sits as a thin wrapper around the forum with a modern spam filter. Basically pass through the session and content, but filter out any posts or threads that code as spam. That way, we wouldn't actually need to update nuclearphynance.com itself. Create nuclearphynance.[tld] and filter at L2. No need to move over users. Same conversations, just two different frontends. 100&Change Posted by EspressoLover on 2016-06-08 18:26 Anything other than undoing antibiotic resistance is the wrong answer. Hotels in NY Posted by EspressoLover on 2017-01-03 04:45 Wyndham Midtown is good. New York Palace is very nice and frequently has good rates. The W Downtown is also worth considering. Even though it's far downtown (right across from the Freedom Tower), it's like the point at infinity in affine geometry. All the major MTA lines converge within walking distance.
Even though you travel far to go uptown, you never have to worry about the crosstown pain in the ass. Plus this is 2017, you're going to want to spend a good amount of time downtown anyway. Honestly though, the best deal is probably AirBnb. You can get a newish, luxury-ish condo with twice the space at nearly half the cost of a four-star hotel. Best Movie One Liners Posted by EspressoLover on 2017-01-14 06:45 TIL: The line in the book was "I want to have your abortion". The studio wanted to change it for being too offensive. Fincher caved, but only on the condition that they had to unconditionally accept the replacement. Very Interesting paper on how we don't know shit about how deep ANNs are working Posted by EspressoLover on 2017-10-02 17:44 Really interesting paper. Thanks for posting. Intuitively, I'd speculate that this is due to the greedy layer-wise training in deep nets. The more incrementally piecewise a model is trained, the less applicable VC dimension analysis is. The full parameter space of the model can shatter the set, but deep nets aren't trained by arg-max'ing everything at once. I'd imagine that if you think of each layer training step as a standalone problem, then the generalization error is well behaved. The "signal gradient" at each step is much larger than the "noise gradient", so continuously repeating training steps takes us in the right direction. Signal-driven variations in the fitness landscape are usually much more stable and smooth than noise-driven variations. However if the training set is pure noise, the "signal gradient" disappears and eventually the model converges to fitting the noise. I don't think the underlying mechanics are too different from vanilla boosting like AdaBoost. Similarly, you can use very large models, large enough to shatter the training set. And in the case of pure random data, the model will eventually completely fit the training data. Yet in most real-world applications, AdaBoost is surprisingly resilient to overfitting. Training incrementally and greedily seems to siphon off signal from noise. All of this is just random speculation, of course. So, who knows. Spam Bots Posted by EspressoLover on 2018-04-26 09:48 The phorum seems to be getting gradually invaded by more and more spam bots. They just jumped from the General to Trading, and pretty much render a board unusable since their posts obscure most of the first page. I'd hate to see such a great community get destroyed by some bots. But I suspect that if they continue, enough people will stop bothering to sift through the spam posts and the phorum will just die. Is there anything we can do? For my part, I check the board fairly regularly and am willing to do cleanup if given the privilege to delete threads. What does Peter Thiel try to achieve by attacking google/alphabet Posted by EspressoLover on 2019-08-02 17:40 Like Arnold, Thiel is constitutionally prohibited from holding executive office because he's not a natural-born citizen. Absent a constitutional amendment, which is basically impossible at the current juncture in American politics, he can't be president. Solution for Spambots Posted by EspressoLover on 2019-09-01 19:34 Saw an interesting suggestion on HN about dealing with spambots. You just add a hidden "link" or "URL" field to the message page. If the field is non-empty, then just send the post to /dev/null. It won't affect humans, because the field's hidden in the HTML. But most spam bots will just blindly spam their campaign links into any relevant field.
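To make the idea concrete, here's a rough sketch of the honeypot check, shown in Python/Flask purely for illustration (the actual forum would do the equivalent in PHP, and the route and field names here are made up):

```python
# Rough honeypot sketch. The hidden form field ("website") is invisible to
# humans but gets auto-filled by naive spam bots; if it's populated, we
# quietly discard the post. Route and field names are made up.
from flask import Flask, request

app = Flask(__name__)
posts = []  # stand-in for the forum's real storage

@app.route("/post_message", methods=["POST"])
def post_message():
    # The message form also contains something like:
    #   <input name="website" style="display:none">
    if request.form.get("website", "").strip():
        # Honeypot tripped: pretend success, send the post to /dev/null.
        return "ok"
    posts.append(request.form.get("body", ""))
    return "ok"

if __name__ == "__main__":
    app.run()
```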
Best of all, the change can be made with nothing more than a few lines of PHP with no external service dependency. A lot of the HN posters said that if you're running a mid-sized web forum, the trick stops 90%+ of spam. COVID19 Posted by EspressoLover on 2020-03-15 22:46 To preface, I think the virus is potentially really horrific in terms of human costs. Very likely the worst global public health crisis since the 1918 flu. But it's hard for me to imagine the S&P being below 2500 in two years. A V-shaped recovery, while not guaranteed, seems like by far the most likely outcome. I'm curious what others see as the specific drags on the economy after the quarantine and first wave are over. AFAIK the 1918 flu did not produce any long-lasting economic or financial malaise. Generally I believe that the current selloff is driven by short-term liquidity demands (e.g. a lot of business owners need to make payroll on zero revenue), rather than true revisions to valuations. 2008 doesn't seem analogous. In that case, there was genuinely unsustainable economic activity: over-construction of residential housing and hyper-exponential consumer spending powered by rising home prices. The financial crisis was the proximate cause of the recession, but ultimately growth couldn't bounce back because those construction jobs and sub-prime home equity loans never came back. I don't really see any equivalently massive distortion in today's economy. (Though I could be wrong...) Corporate debt's probably the weakest pillar in the system. If I had more time to dig into stock selection, my metric would be which names can best survive having zero operating cash flow and zero access to re-financing over the next four quarters. But my sense is that most of the excesses are confined to the energy sector or private markets. Large-cap US equities, I think, are actually pretty stable. And banks are *much* better capitalized than they were in 2008. The entities that would take serious losses aren't as interconnected as the culprits in 2008, and therefore less likely to spread contagion (of the financial sort). If Softbank or Blackstone blow up, I don't think that's going to ripple through Main Street in the same way that Citigroup did. (Life insurers could be an overlooked systematic risk, because they'd get hit both from obvious mortality spikes as well as generally high exposure to low-quality credit and CLOs.) To add a caveat, I'm not a macro guy. I don't even play one on TV. So take this opinion with a grain of salt. But for now, I'm shrugging off my naturally bearish disposition to buy while there's blood in the street. COVID19 Posted by EspressoLover on 2020-03-16 19:35 > Yeah, because all that Fed repo money is going to mom & pop stores, local bar & grills, and the gyms that are gonna have to be closed. I disagree. First, I think we're all suffering from a visibility bias. Industries like restaurants and stores are very visible, but don't make up that significant a chunk of economic activity. "Food away from home" only makes up 5% of US consumer expenditures (and nearly half of that is already not in-restaurant sales). The entire fitness industry only makes up 0.2% of GDP. The "Fees and admissions" sub-category of entertainment only makes up 1.3% of household expenditures. Only 3% of household expenditures are spent on travel. At the end of the day, big impersonal business is a much more important part of the economy than small retail businesses.
Construction, healthcare, and information technology account for much more economic activity than restaurants, cafes or gyms. Second, there's definitely a clear transmission mechanism between monetary stimulus and small business. It's called mortgage rates. Most restaurants and brick & mortar retail businesses are essentially real estate companies. If the cost of mortgage capital falls by 25%, that's an enormous stimulus to the average restaurateur or shopkeeper. Particularly when they're liquidity-constrained during a period of diminished revenue. Less cash flowing to the mortgage or lease means more cash available to make payroll. COVID19 Posted by EspressoLover on 2020-03-18 23:17 Research conducted independently seems to keep reaching the same conclusion: chloroquine's pretty damn close to a cure, as long as it's administered early enough. It's also a simple molecule that should be easy to ramp up production on. I think that means the pandemic's probably winding down in 6-10 weeks. The main bottleneck is either getting enough testing capacity to quickly intervene at the first sign of symptoms, or producing enough chloroquine that we can just have the entire vulnerable population take it prophylactically. (Also, FWIW results from WHO show that as little as 30% alcohol is effective at sanitizing against Covid-19. So if anybody's short on hand sanitizer, feel free to throw some cheap booze in a spray bottle.) COVID19 Posted by EspressoLover on 2020-03-18 23:44 The Chinese have had a number of successful trials. The French have had good results (especially combined with Azithromycin), both in terms of reducing symptoms and infectiousness. There's also a lot of in vitro results that are pretty indisputable. (Which of course is no guarantee in actual treatment.) And of course, there's the large body of pre-existing research showing very high effectiveness against related coronaviruses like SARS and MERS. COVID19 Posted by EspressoLover on 2020-03-19 17:48 @kitno You left off Remdesivir. COVID19 Posted by EspressoLover on 2020-03-20 06:00 We had two bags of lopinavir, seventy-five pellets of tamiflu, five sheets of high-powered chloroquine, a salt shaker half full of azithromycin, and a whole galaxy of multi-colored antibiotics, antiemetics, antitussives, antihistamines... and also a quart of pepto-bismol, a quart of paracetamol, a case of Sudafed, a pint of ibuprofen and two dozen oral rehydration packets. The only thing that really worried me was the ibuprofen. There is nothing in the world more helpless and irresponsible and depraved than a man in the depths of a cytokine storm. COVID19 Posted by EspressoLover on 2020-05-25 00:15 One problem with simply looking before and after a change in lockdown orders is that it doesn't seem like the timing or details of lockdown orders actually changed people's behavior that much. When you actually look at the cell phone tracking data, most people started staying home at some point in March. But there were no discontinuities on the actual dates that local or national lockdown orders were put in place. Facebook's tracking data seems to confirm a roughly similar result. We can also compare across cities: for example, comparing Stockholm to Nashville, Apple's data show almost identical stay-at-home behavior, despite the former having one of the most permissive lockdown orders in the Western world and the latter one of the strictest.
It seems pretty much like you have a population of risk-averse people who were going to stay at home even if the government didn't explicitly tell them to, and a population of risk-tolerant people who probably aren't scared of legal repercussions anyway. There aren't that many people on the margin who aren't worried about infection but do fear getting arrested. The point is that it doesn't seem plausible that lockdowns explain any more than a small fraction of the variance, either cross-sectional or time-series, in transmission rates. With that in mind, it's pretty implausible to just compare transmission rates between two points in time or space, and ascribe the differences to lockdown status. Almost certainly there are other factors (including just randomness) that overwhelm the impact of lockdown policy. COVID19 Posted by EspressoLover on 2020-06-03 06:39 Here are my amateur thoughts on the bull case. The duration of major equity indices is 25+ years. Only a small fraction of most large-caps' NPV comes from earnings over the next 4-8 quarters. Let's assume a hard lockdown continues until the end of 2021, and corporate earnings are zero until Q1 2022. How much should that impact valuations? How much should we discount stocks relative to their January highs? Based on DCF analysis, I think the answer is about 5-15%, which implies that current prices are about fair. And that's not even taking into account the higher P/E multiples that come with the reduction in the risk-free rate that almost assuredly accompanies that scenario. The logical question is why doesn't this same reasoning apply to all recessionary bear markets. And I think the answer is that the typical recession is also accompanied by a clearing of mal-investment accumulated during the expansionary phase. The hangover not only comprises the loss of earnings during the recession, but the necessary trimming of fat from all the poor decisions that were made during the ebullience of the late-stage boom. In 2007, the economy was allocating way too much capital to the finance sector, particularly mortgages, home building, raw materials, and unsustainable consumer spending powered by home equity. Citigroup, even today, is still 90% below its 2007 highs. In 1999 the equity market was in retrospect obviously throwing away capital into the giant money pit of the Internet bubble. In 1980, there were many business models and capital structures that were wiped out because they made sense at 8%, but not 16% yields. This current recession seems very different. At some point the lockdowns will be over, and there's not really any reason the economy and asset markets can't return to the status quo ante. The recession wasn't caused by irrational exuberance, but by an exogenous and temporary supply shock. Business models that worked in Q4 2019 should still work in Q1 2022. There might be some exceptions, where the culture kind of shifts permanently. Like air travel or commercial real estate. But by and large these sectors don't constitute a large share of large-cap US equities. I don't really see any 2020 equivalent to Citigroup or Cisco or Continental Illinois. COVID19 Posted by EspressoLover on 2020-06-04 21:11 @gaj That's probably my biggest bearish risk as well. That being said, if we're specifically talking about the impact of Covid, I don't think the debt picture changes that much. CBO projections say that COVID adds $5 trillion to US public debt by 2025 (with most of that occurring in 2020). So public debt to GDP goes from ~90% to ~105%.
That's a big one-time jump, but it doesn't change the long-term picture that much. Even before Covid, the CBO projected that number to be 180% by 2050. What I'm saying is there are two separate questions. One is whether the market should be trading at a more significant discount relative to the February highs. Two is whether the price in February was too high irrespective of Covid. I don't really know the answer to the latter question. There are a lot of arguments on both sides. But I don't see many compelling reasons why the Covid discount to the fair price (whatever that may be) should be more than 10-15%. Maybe, maybe, maybe, if one thinks there's underlying rot in the system and Covid will catalyze a crisis that otherwise would stay dormant for a long time. To tangent off on the debt/GDP question in general, the most compelling reason not to worry is Japan. At least in terms of forced deleveraging. Despite having more than double our debt levels for several decades, interest rates are still zero, inflation is de minimis, the currency is stable, and there's no fiscal or monetary crisis in sight. There's pretty good macroeconomic evidence that the decline in real interest rates is primarily driven by aging demographics. The demographic picture isn't changing anytime this century. That augurs that real rates will stay zero-ish for a very long time. What really matters in determining the debt burden isn't the nominal value but the coverage ratio. Permanent zero rates would imply that the economically sustainable debt burden is much higher than the historical 100-150% upper bound that we've seen in earlier eras. Advances in linear algebra, information theory, signal processing, optimal control and statistics Posted by EspressoLover on 2020-09-09 14:34 My vote would be for compressive sensing. In terms of finance applications, I think there's a lot you can do with sparse factor decomposition. When it comes to equity models, we know that traditional PCAs crap out after four or five eigenvectors. Most of the remaining covariance structure is probably sparse, which isn't well suited to traditional decomposition techniques. So we hand-encode factors like industry and country exposure. I think compressive sensing allows you to recover much more of this structure in a fully systematic way. Along these lines, another cool thing is the ability to recover sparse factors from much smaller datasets. You could potentially build a trading strategy around detecting short-lived cointegrations caused by large-scale portfolios rebalancing their positions.
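For anyone who wants to play with the idea, here's a toy sketch of sparse loading recovery on simulated returns using scikit-learn's SparsePCA. It's a stand-in for the broader sparse-decomposition family rather than compressive sensing proper, and every dimension and penalty below is an arbitrary choice for illustration:

```python
# Toy illustration of recovering sparse factor structure from simulated
# returns. SparsePCA is used as a stand-in for sparse/compressive
# decomposition methods; all sizes and penalties are arbitrary choices.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
n_days, n_stocks, n_factors = 500, 200, 6

# Sparse loadings: each factor only touches ~10% of names (think industry
# or country blocks), on top of a dense "market" factor.
loadings = np.zeros((n_factors, n_stocks))
for k in range(n_factors):
    members = rng.choice(n_stocks, size=20, replace=False)
    loadings[k, members] = rng.normal(1.0, 0.3, size=20)
market_beta = rng.normal(1.0, 0.2, size=n_stocks)

# Simulated daily returns in percent units.
factor_rets = rng.normal(0.0, 1.0, size=(n_days, n_factors))
market_ret = rng.normal(0.0, 1.0, size=(n_days, 1))
idio = rng.normal(0.0, 2.0, size=(n_days, n_stocks))
returns = market_ret * market_beta + factor_rets @ loadings + idio

# A dense PCA smears the block factors across all names; the l1-penalized
# decomposition recovers loadings that are mostly exactly zero.
spca = SparsePCA(n_components=n_factors, alpha=1.0, random_state=0)
spca.fit(returns - returns.mean(axis=0))
print("fraction of exactly-zero loadings:", np.mean(spca.components_ == 0))
```

The interesting comparison is against a plain PCA on the same panel: the dense eigenvectors put small weights on everything, while the penalized version hands back block-like loadings you could plausibly interpret as industries or countries.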