Research on a stock-matching trading strategy based on bi-objective optimization

In recent years, with strict domestic financial supervision and other policy-oriented factors, some products are becoming increasingly restricted, including nonstandard products, bank-guaranteed wealth management products, and other products that can provide investors with a more stable income. Pairs trading, a type of stable strategy that has proved efficient in many financial markets worldwide, has become the focus of investors. Based on the traditional Gatev–Goetzmann–Rouwenhorst (GGR, Gatev et al., 2006) strategy, this paper proposes a stock-matching strategy based on a bi-objective quadratic programming with quadratic constraints (BQQ) model. While ensuring a long-term equilibrium between paired-stock prices, the model increases the volatility of stock spreads as much as possible, improving the profitability of the strategy. To verify the effectiveness of the strategy, we use the natural logs of the daily stock market indices in Shanghai. The GGR model and the BQQ model proposed in this paper are back-tested and compared. The results show that the BQQ model can achieve a higher rate of return.


Introduction
Since the A-share margin trading system opened in 2010, there has been a gradual improvement in short sales of stock index futures (Wang and Wang 2013) and investors are again favoring prudent investment strategies, which include pairs-trading strategies. As a kind of statistical arbitrage strategy (Bondarenko 2003), the essence of pairs trading (Gatev et al. 2006) is to discover wrongly priced securities in the market, and to correct the pricing through trading means to earn a profit from the spreads. However, with the increase in statistical trading strategies and the gradual improvement of market efficiency (Hu et al. 2017), profit opportunities using existing trading strategies have become more scarce, driving investors to seek new trading strategies. At present, academic research on pairs trading has mainly concentrated on the construction of pairing models and the optimization design of trading parameters, with a greater focus on the latter. However, merely improving trading parameters does not guarantee a high return for the strategy, and this drives researchers back to the foundations of the pairs-trading model.
There are three main methods for screening stocks: the minimum distance method, the cointegration pairing method, and the stochastic spread method. The minimum distance method was proposed by Gatev et al. (2006); hence its common name, the GGR model. Gatev et al. (2006) used the distance of a price series to measure the correlation between the price movements of two stocks. When making a specific transaction, the strategy user determines the trading signal by observing the magnitude of the change in the Euclidean distance between the normalized price series of two stocks (the sum of the squared deviations, or SSD). Perlin (2007) promoted GGR as a unitary method rather than a pluralistic one; testing it in the Brazilian financial market, he found that risk can be lessened by increasing the number of pairs and stocks. Do and Faff (2010) found that the length of a trading period can affect strategy returns; their study laid the foundation for later research. Jacobs and Weber (2011) found that the GGR model's revenue comes from the difference in the speed of paired-stock information diffusion. Chen et al. (2017) revised the measurement method of the GGR model, changing the original measure (SSD) to the correlation coefficient, and increased the reliability of the multi-pairing strategy. Wu and Cui (2011) first applied the GGR model to the A-share market; conducting a back-test on the stock markets in Shanghai, they found that the GGR model can generate considerable returns, and that its profits come from market inefficiency. Wang and Mai (2014) measured the return on the stock markets in Shanghai, Shenzhen, and Hong Kong respectively, and found that improvements to the original approach can bring portfolio construction strategic benefits but can also increase the risk of exploitation of the GGR model.
The cointegration pairing method was first used by Vidyamurthy (2004) to find stock pairs with a cointegrating relationship. He used cointegrating vectors as the weights of the pairs when trading. To solve the problem of single-stock pairing risks, Dunis and Ho (2005) extended the cointegration method from the univariate to the multivariate case and proposed an enhanced index strategy based on cointegration. By extracting sparse mean-reverting portfolios from multiple time series, d'Aspremont (2007) found that small portfolios had lower transaction costs and higher portfolio interpretability than the original dense portfolios. Peters et al. (2010) and Gatarek et al. (2014) applied the Bayesian process to the cointegration test and found that the pairing method can be applied to high-frequency data.
The stochastic spread method first appeared in a paper by Elliott et al. (2005), who used the continuous Gauss-Markov model to describe the mean-reverting process of paired-stock spreads, thus theoretically predicting stock spreads. Based on the research by Elliott et al. (2005), Do et al. (2006) first linked the capital asset pricing model (CAPM) with the pairs-trading strategy and achieved a higher strategic benefit than when using the traditional stochastic spread method. Bertram (2010), assuming that the stock spread follows the Ornstein-Uhlenbeck process, derived expressions for the mean and variance of the strategic return on the position and found the parameter value at which the expected return is maximized.
Based on the above approaches, many scholars have begun to study mixed multistage pairs-trading strategies. Miao (2014) added a correlation test to the traditional cointegration method and found that screening stocks by correlation analysis improved the profitability of the strategy. Xu et al. (2012) combined cointegration pairing with the stochastic spread model and conducted a back-test on the stock markets in both Shanghai and Shenzhen; they found that higher returns could be obtained. Following Bertram's (2010) research, Zhang and Liu (2017) examined a pairs-trading strategy based on cointegration and the Ornstein-Uhlenbeck process and found the strategy to be robust and profitable.
In recent years, most scholars have focused on continuously improving the long-term equilibrium of paired-stock prices in the stock-matching process. Few studies have considered the short-term fluctuations of paired-stock spreads, which has led to poor profitability of the strategy. Therefore, this paper focuses on the stock matching of pairs trading and constructs a bi-objective optimized stock-matching strategy based on the traditional GGR model. The strategy introduces a weight parameter that balances the long-term equilibrium of paired-stock prices against the volatility of the stock spread; adjusting this weight to match investors' preferences enhances the flexibility and practicality of the strategy.
The remainder of this paper is organized as follows. The Basic theory and model section provides the basic theories and models of pairs-trading strategies and bi-objective optimization. The Optimized pairing model section establishes an optimized pairing model. The Pairing strategy empirical analysis section provides an empirical analysis of the optimal matching strategy proposed in this paper. Finally, the Conclusions section presents conclusions and suggests future research directions.

Basic theory and model
Based on theories of pairs trading, stock-pairing rules in the minimum distance method, and multi-objective programming, we propose a strategy to improve profits based on the minimum distance method.

Pairs-trading parameters
Using a pairs-trading strategy requires attention to the following trading parameters:
Formation period: the time interval used to screen stock pairs with the stock-matching strategy.
Trading period: the time interval in which the selected stock pairs are actually traded.
Opening threshold: the value that triggers portfolio construction. For example, a transaction can be started when the following conditions are satisfied: (1) the user holds no position; (2) the degree to which the paired-stock spread deviates from its mean changes; and (3) that degree rises from below a given number of standard deviations to above it.
Closing threshold: the value that triggers closing the position; for example, when the strategy user holds a position and the paired-stock spread returns to its mean.
Stop-loss threshold: the value that triggers a stop-loss; that is, the rule for exiting an investment once a maximum acceptable loss is reached, or for re-entering after a specified level of gains is achieved.
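The interaction of the opening, closing, and stop-loss thresholds can be illustrated with a small state machine. This is a minimal sketch under stated assumptions, not the rule set used in the back-tests: the input is assumed to be the spread expressed in standard deviations from its mean, the opening threshold of 2.0 is a hypothetical value, and the function name `generate_signals` is introduced here only for illustration.

```python
def generate_signals(z_scores, open_k=2.0, stop_k=3.0):
    """Turn a standardized spread series into trading actions.

    z_scores: spread deviation from its mean, in standard deviations (assumed input).
    open_k:   opening threshold (hypothetical value, not taken from the paper).
    stop_k:   stop-loss threshold, assumed to be in the same units.
    """
    position = 0        # 0 = no position, +1 = long the spread, -1 = short the spread
    prev_abs = 0.0      # previous absolute deviation, used to detect an upward crossing
    actions = []
    for z in z_scores:
        action = "hold"
        if position == 0:
            # Open only when the deviation rises from below to above the opening threshold.
            if prev_abs < open_k <= abs(z) < stop_k:
                position = -1 if z > 0 else 1
                action = "open_short_spread" if z > 0 else "open_long_spread"
        elif abs(z) >= stop_k:
            # The spread keeps diverging: exit at a loss.
            position = 0
            action = "stop_loss"
        elif (position == -1 and z <= 0) or (position == 1 and z >= 0):
            # The spread has returned to its mean: take the profit.
            position = 0
            action = "close"
        actions.append(action)
        prev_abs = abs(z)
    return actions
```

For example, `generate_signals([0.5, 2.1, 1.0, -0.1])` opens a short-spread position on the second observation and closes it on the fourth, when the spread crosses its mean.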

Minimum distance method
When using the minimum distance method to screen stocks, it is necessary to standardize the stock price series first. Suppose the price sequence of stock A in period T is $P_i^A$ ($i = 1, 2, 3, \dots, T$), and $r_t^A$ is the daily rate of return of stock A. By compounding $r$, we can get the cumulative rate of return of stock A in period T, which is recorded as:
$$CR_t^A = \prod_{i=1}^{t}\left(1 + r_i^A\right) - 1$$
where $t = 1, 2, 3, \dots, T$. When we record the standardized stock price series as $SP_t^A$, the distance SSD between the normalized price series of any two stocks A and B can be calculated as follows (Krauss 2016):
$$SSD_{A,B} = \sum_{t=1}^{T}\left(SP_t^A - SP_t^B\right)^2$$

Multi-objective programming
The multi-objective optimization problem was first proposed by economist Vilfredo Pareto (Deb and Sundar 2006). It means that in an actual problem, there are several objective functions that need to be optimized, and they often conflict with each other. In general, the multi-objective optimization problem consists of several objective functions together with equality and inequality constraints, and can be expressed as follows:
$$\min_{x}\ \left(f_1(x), f_2(x), \dots, f_n(x)\right) \quad \text{s.t.} \quad g_i(x) \le 0,\ \ h_i(x) = 0$$
where $x \in R^n$; $f_i : R^n \to R$ ($i = 1, 2, \dots, n$) are the objective functions; and $g_i : R^n \to R$ and $h_i : R^n \to R$ are constraint functions. The feasible domain is given as follows:
$$X = \left\{x \in R^n \mid g_i(x) \le 0,\ h_i(x) = 0\right\}$$
If there is no $x \in X$ such that
$$f_i(x) \le f_i(x^*)\ \text{for all } i, \quad \text{with strict inequality for at least one } i,$$
then $x^* \in X$ is called an effective solution (Bazaraa et al. 2008) to the multi-objective optimization problem.
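The stock screening of the minimum distance method described above can be sketched in a few lines. The sketch assumes that the standardized price is the price divided by its first value in the formation period (one common normalization, not necessarily the one used in this paper); the function names and the `price_table` layout are hypothetical.

```python
from itertools import combinations

def normalize(prices):
    """Standardize a price series so that it starts at 1 (assumed normalization)."""
    first = prices[0]
    return [p / first for p in prices]

def ssd(prices_a, prices_b):
    """Sum of squared deviations between two normalized price series."""
    norm_a, norm_b = normalize(prices_a), normalize(prices_b)
    return sum((x - y) ** 2 for x, y in zip(norm_a, norm_b))

def rank_pairs(price_table, top_k=5):
    """Return the top_k stock pairs with the smallest SSD as (ssd, name_a, name_b)."""
    scored = [
        (ssd(price_table[a], price_table[b]), a, b)
        for a, b in combinations(sorted(price_table), 2)
    ]
    scored.sort()
    return scored[:top_k]
```

With `top_k=5`, `rank_pairs` returns five candidate pairs per formation period, which mirrors the number of GGR pairs used later in the empirical analysis.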

Optimized pairing model
Previous studies on the GGR model have mostly focused on similarities in stock trends and have paid less attention to the volatility of stock spreads, which limits the returns such strategies can achieve. Building on the traditional GGR model, this paper proposes a new pairs-trading model, namely the bi-objective quadratic programming with quadratic constraints (BQQ) model. By adjusting the weights between maintaining a long-term equilibrium of paired-stock prices and increasing the volatility of stock spreads (Whistler 2004), we can strike a balance between the two objectives.

Mean-variance minimization distance model
Assume that there are m stocks in the alternative stock pool, and the formation period of the stock pairing is n days. Take the daily closing prices of the stocks as the original price series, recorded as $P_1, P_2, \dots, P_m$. To make the price sequences smoother, we use the average price series over the past 30 days, $\bar{P}_1, \bar{P}_2, \dots, \bar{P}_m$ (instead of the original price series), to eliminate short-term fluctuations in stock prices. At time t, the smoothed price of stock i can then be expressed as follows:
$$\bar{P}_{i,t} = \frac{1}{30}\sum_{j=0}^{29} P_{i,t-j}$$
First we consider the weighted combination $\sum_{i=1}^{m} \alpha_i \bar{P}_i$.
Let $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_m)^T$ be the weights of the stocks in the stock pool. We divide the stocks into two groups according to the positive and negative weights, and then let
$$\sum_{i=1}^{m} \alpha_i \bar{P}_{i,t} = P_t^{+} - P_t^{-}, \qquad P_t^{+} = \sum_{\alpha_i > 0} \alpha_i \bar{P}_{i,t}, \qquad P_t^{-} = -\sum_{\alpha_i < 0} \alpha_i \bar{P}_{i,t}$$
The stock combination with a positive weight is called $P_t^{+}$, while the stock combination with a negative weight is called $P_t^{-}$. According to the GGR method, the prices of the two groups should exhibit a long-term equilibrium relationship over the formation period n. Therefore, the first objective of the bi-objective optimization model is as follows:
$$\min_{\alpha}\ \sum_{t=1}^{n}\left(P_t^{+} - P_t^{-}\right)^2$$
The volatility of the paired-stock spread is a source of revenue for the pairs-trading strategy. Variances are used to describe the volatility of a time series. Therefore, we use the formula below to measure the volatility of the stock spread, which is the second objective:
$$\max_{\alpha}\ \operatorname{Var}\left(P_t^{+} - P_t^{-}\right)$$
To avoid the case that $\alpha = 0$, we add a regularity constraint; that is, the second-order modulus of $\alpha$ is 1, so we can obtain the BQQ model as:
$$\min_{\alpha}\ \sum_{t=1}^{n}\left(P_t^{+} - P_t^{-}\right)^2, \qquad \max_{\alpha}\ \operatorname{Var}\left(P_t^{+} - P_t^{-}\right)$$
$$\text{s.t.} \quad \lVert \alpha \rVert_2 = 1$$
This paper uses a linear weighting method by introducing a weight $\lambda$ ($\lambda > 0$), transforming the bi-objective optimization problem into a single-objective optimization problem. The model is denoted as revised quadratic programming with quadratic constraints (RQQ):
$$\min_{\alpha}\ \sum_{t=1}^{n}\left(P_t^{+} - P_t^{-}\right)^2 - \lambda \operatorname{Var}\left(P_t^{+} - P_t^{-}\right)$$
$$\text{s.t.} \quad \lVert \alpha \rVert_2 = 1$$
Since users of the matching strategy have different risk preferences, λ can be seen as an important indicator of strategic risk. When λ is large, the model magnifies the volatility of the paired-stock spread sequence, and the strategy may obtain higher returns, but it also raises the risk of divergence in the stock spread. Therefore, users can adjust λ to match their risk preferences, which increases the usefulness of the pairing strategy.
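The trade-off controlled by λ can be made concrete by evaluating the weighted objective for a candidate weight vector. The sketch below assumes that the single-objective form is the sum of squared spreads minus λ times the spread variance, with the variance normalized by the number of observations; both the normalization and the function names are assumptions made for illustration.

```python
def spread_series(prices, alpha):
    """Weighted combination of stock prices for each day.

    prices: list of rows, one row per day, one (smoothed) price per stock.
    alpha:  one weight per stock; positive and negative weights form the two groups.
    """
    return [sum(a * p for a, p in zip(alpha, row)) for row in prices]

def rqq_objective(prices, alpha, lam):
    """Distance term minus lam times the variance of the spread (assumed form)."""
    spread = spread_series(prices, alpha)
    n = len(spread)
    distance = sum(v * v for v in spread)                 # long-term equilibrium term
    mean = sum(spread) / n
    variance = sum((v - mean) ** 2 for v in spread) / n   # volatility term (1/n normalization assumed)
    return distance - lam * variance

def split_by_sign(names, alpha):
    """Group stock names by the sign of their weight."""
    positive = [name for name, a in zip(names, alpha) if a > 0]
    negative = [name for name, a in zip(names, alpha) if a < 0]
    return positive, negative
```

A larger `lam` rewards weight vectors whose spread fluctuates more, at the cost of a looser equilibrium between the two groups.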
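To facilitate the model solution, we perform a matrix transformation as follows. Let $\bar{P}$ denote the $n \times m$ matrix whose t-th row contains the smoothed prices of the m stocks on day t, so that the spread series is $\bar{P}\alpha$, and let $\mathbf{1}$ be the n-dimensional vector of ones. The distance objective and the variance objective can then be written as:
$$\sum_{t=1}^{n}\left(P_t^{+} - P_t^{-}\right)^2 = \alpha^T \bar{P}^T \bar{P}\, \alpha, \qquad \operatorname{Var}\left(P_t^{+} - P_t^{-}\right) = \frac{1}{n}\, \alpha^T \bar{P}^T \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) \bar{P}\, \alpha$$
That is,
$$\min_{\alpha}\ f(\alpha) = \alpha^T Q\, \alpha \quad \text{s.t.} \quad c(\alpha) = \alpha^T \alpha - 1 = 0, \qquad Q = \bar{P}^T \bar{P} - \frac{\lambda}{n}\, \bar{P}^T \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) \bar{P}$$
For a given $\alpha_k$, we get the sub-problem of the model, sub($\alpha_k$), as this:
$$\min_{d}\ \nabla f(\alpha_k)^T d + \frac{1}{2}\, d^T B_k\, d \quad \text{s.t.} \quad c(\alpha_k) + \nabla c(\alpha_k)^T d = 0$$

The sequential quadratic programming algorithm
Since the objective and constraints of RQQ are quadratic functions, RQQ is a typical nonlinear programming problem. The sequential quadratic programming algorithm solves the original problem by solving a series of quadratic programming sub-problems (Jacobs and Weber 2011; Zhang and Liu 2017). The solution process is as follows:
Step 1: Give $\alpha_1 \in R^m$, $\varepsilon > 0$, $\mu > 0$, $\delta > 0$, $k = 1$, $B_1 \in R^{m \times m}$.
Step 2: Solve the sub-problem sub($\alpha_k$) to obtain its solution $d_k$ and the Lagrange multiplier $\mu_k$. If $\lVert d_k \rVert \le \varepsilon$, terminate the iteration; otherwise, let $\mu = \max(\mu, \mu_k)$ and find $s_k \in [0, \delta]$ by solving this:
$$P_{\mu}(\alpha_k + s_k d_k) \le \min_{0 \le s \le \delta} P_{\mu}(\alpha_k + s\, d_k) + \varepsilon_k \tag{21}$$
where $\varepsilon_k$ ($k = 1, 2, \cdots$) satisfies the non-negative condition and the function $P_{\mu}(\alpha) = f(\alpha) + \mu\, \lvert c(\alpha) \rvert$ in Equation (21) is the exact penalty function.
Thus, we find the optimal sub-solution $d_k$. Taking $d_k$ as the search direction, we perform a one-dimensional search along $d_k$ on the exact penalty function of the original problem and obtain the next iteration point of the original problem, $\alpha_{k+1} = \alpha_k + s_k d_k$. The iteration is terminated when the iteration point satisfies the given accuracy, yielding the optimal solution of the original problem.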
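To give a feel for the kind of problem being solved, the sketch below minimizes a quadratic form over the unit sphere. It is not an implementation of the sequential quadratic programming algorithm described above; it swaps in a simpler projected-gradient iteration, and it assumes that the single-objective problem can be written as minimizing $\alpha^T Q \alpha$ subject to $\lVert \alpha \rVert_2 = 1$ for some symmetric, non-zero matrix `Q` (a hypothetical input here).

```python
import math

def mat_vec(Q, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def unit(v):
    """Project a non-zero vector back onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def minimize_on_sphere(Q, alpha0, iters=2000, tol=1e-12):
    """Projected gradient descent for min a^T Q a subject to ||a||_2 = 1.

    Q must be symmetric and non-zero. The step size uses the largest absolute
    row sum of Q, which bounds the magnitude of its eigenvalues.
    """
    bound = max(sum(abs(q) for q in row) for row in Q)
    step = 1.0 / (2.0 * bound)
    alpha = unit(alpha0)
    for _ in range(iters):
        grad = [2.0 * g for g in mat_vec(Q, alpha)]          # gradient of a^T Q a
        new_alpha = unit([a - step * g for a, g in zip(alpha, grad)])
        shift = max(abs(x - y) for x, y in zip(new_alpha, alpha))
        alpha = new_alpha
        if shift <= tol:                                      # iterate has stopped moving
            break
    value = sum(a * qa for a, qa in zip(alpha, mat_vec(Q, alpha)))
    return alpha, value
```

The sign pattern of the returned vector then determines which stocks fall into the positive-weight and negative-weight groups. The starting point matters: a start that is orthogonal to the minimizing direction will not converge to it.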

Pairing strategy empirical analysis
To verify the profitability of the BQQ strategy, this paper compares the empirical investment effects of the BQQ strategy and the GGR strategy with the same transaction parameters and applies a profit-risk test for the arbitrage results of the two strategies.

Data selection and preprocessing
We use the SSE 50 Index constituent stocks in the Shanghai stock market as the sample set for this study. We choose this sample set for its high circulating market value and large market capitalization. Since the stock-pairing method proposed in this paper is an improvement of the traditional minimum distance method, we follow the GGR model in selecting the time intervals of the sample: the paired stocks for trading are selected during a formation period of 12 months, and the stocks are traded in the next 6 months. To verify the effectiveness of the strategy, the paper conducts a strategic back-test from January 2016 to December 2018. Within this period, the broader market experienced a complete cycle of ups and downs.
Because listed companies carry out share allotments and share issues, and because the suspension of stocks also leads to gaps in market data, the raw data need to be preprocessed. Stock prices are forward-adjusted to eliminate the price changes caused by allotments and stock offerings. In addition, we exclude stocks that have been suspended for more than 10 days in the formation period. For the remaining stocks, missing data are replaced by the closing price of the nearest trading day.
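The suspension filter and gap filling described above can be sketched as follows. The sketch assumes that suspended days are marked as `None` and that "nearest trading day" means the most recent earlier close (with the first available close used for gaps at the start of the series); the function name and data layout are hypothetical, and price adjustment for allotments is not covered.

```python
def preprocess(closes, max_missing=10):
    """Drop heavily suspended stocks and fill the remaining gaps.

    closes: dict mapping stock name -> list of closing prices, None on suspended days.
    max_missing: stocks with more missing days than this are excluded.
    """
    cleaned = {}
    for name, series in closes.items():
        missing = sum(1 for price in series if price is None)
        if missing > max_missing:
            continue  # suspended for too long in the formation period
        first_known = next((p for p in series if p is not None), None)
        if first_known is None:
            continue  # no usable data at all
        filled, last = [], first_known
        for price in series:
            if price is not None:
                last = price
            filled.append(last)  # carry the most recent close forward (assumed rule)
        cleaned[name] = filled
    return cleaned
```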

Transaction parameters setting
The implementation of a pairs-trading strategy relies on setting trading parameters. To compare this strategy with the traditional minimum distance method and verify the validity of the BQQ strategy, this paper uses the same trading parameters as the GGR model. We set the stop-loss threshold to 3 to prevent excessive strategy losses and transaction costs. We set the number of paired stocks to 10. For convenience, we divide the stocks into two groups according to the sign of their weights, positive and negative.

Performance evaluation
To compare the GGR model with the proposed BQQ model and verify the effectiveness of the proposed optimized pairing strategy, this paper selects the return coefficient α, the risk coefficient β, and the Sharpe ratio as evaluation indicators. The two strategies are back-tested and compared on the JoinQuant platform.
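The three evaluation indicators can be computed from return series in a few lines. The sketch below uses the textbook per-period definitions (β as covariance with the market over market variance, α as the return not explained by β, and the Sharpe ratio as excess return per unit of volatility); it is not the JoinQuant implementation, whose annualization conventions are not described here, and the function name is hypothetical.

```python
import math

def evaluate(strategy_returns, market_returns, risk_free=0.0):
    """Return (alpha, beta, sharpe) computed per period, without annualization."""
    n = len(strategy_returns)
    mean_s = sum(strategy_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(strategy_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    var_s = sum((s - mean_s) ** 2 for s in strategy_returns) / (n - 1)
    beta = cov / var_m                                        # sensitivity to the market
    alpha = (mean_s - risk_free) - beta * (mean_m - risk_free)  # return not explained by beta
    sharpe = (mean_s - risk_free) / math.sqrt(var_s)          # excess return per unit of risk
    return alpha, beta, sharpe
```

A β close to zero together with a positive α is what a market-neutral pairs-trading strategy aims for.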

Stock-matching stage
When adopting the GGR model, we select the five groups of stocks with the smallest SSD (two in each group) from each formation period; these are the pairs with the smallest distance between their normalized price series. The stocks are selected from the 50 constituent stocks. The matching results are shown in Table 1. When adopting the BQQ model, since the trend of the stock was screened beforehand, we select two sets of stocks (five in each group) for pairing. To explore the impact of λ on strategy performance, we perform a back-test on the optimal matching strategy under different values of λ. When λ is greater than 0.7, the paired-stock spread behaves relatively poorly, resulting in a strategy failure; therefore, this paper is limited to a λ range from 0 to 0.7. The pairing results are shown in Table 2.
As can be seen in Table 2, when λ changes from 0 to 0.4, the selected stock pairs change dramatically; when λ changes from 0.4 to 0.6, the selected stock pairs are almost identical, so in that range the change of λ cannot significantly affect the return; when λ changes from 0.6 to 0.7, the selected stock pairs change little, but the signs of the paired-stock weights change. Therefore, compared with the GGR model, the optimized pairing strategy makes better use of stock price information and is more flexible.

Stock trading stage
The GGR model and the BQQ model use the same parameter settings in the back-test. The trading period is 2016.01-2016.12. The results obtained are shown in Table 3. By comparing the back-test performance of the BQQ strategy with that of the GGR model, we arrive at five findings: (1) The ability of the BQQ strategy to obtain revenue is significantly stronger than that of the GGR model, which shows that the BQQ strategy is effective in increasing the volatility of the spread to improve the profitability of the pairs-trading strategy. Figure 1 shows the average annualized rate of return of the BQQ strategy and the GGR strategy for different λ values (for in-sample and out-of-sample data, respectively). For the in-sample rate of return, both strategies were carried out for a total of 32 back-tests, with a total of 31 positive gains. The return of the BQQ strategy is better than that of the GGR strategy in 87.5% of the cases. For the out-of-sample rate of return, the return of the BQQ strategy is better than that of the GGR strategy in 68.8% of the cases. To rule out the deviation of income caused by the different ways of opening a position, we also need to examine the coefficient α of the two strategies and the Sharpe ratio.
As shown in Figs. 2 and 3, the BQQ model performs significantly better than the GGR model in terms of both the coefficient α and the Sharpe ratio. This result indicates that, over the four trading periods, the BQQ model earns a higher average return from non-market risk and a higher average return per unit of risk than the GGR model. Therefore, the better performance of the BQQ strategy does not come from the strategy taking more market risk, and it is independent of the way positions are opened. (2) The BQQ strategy has a strong ability to hedge the market. Table 4 shows the average value of the coefficient β of the BQQ strategy under different values of λ. The absolute value of β is below 0.1, which indicates that the performance of the strategy is largely unaffected by market fluctuations and that the pairs-trading strategy based on the minimum distance method can hedge market risk well. Compared with the GGR model, the coefficient β of the BQQ strategy is larger because the GGR model uses a capital-neutral approach when opening a position, while the BQQ strategy uses a coefficient-neutral approach. Because of the spread, the BQQ strategy cannot guarantee that the market value of the stock bought equals the market value of the stock sold when the position is opened; some net position therefore follows the market's ups and downs, and the coefficient β increases.
(3) Similar to the GGR model, the BQQ strategy performs poorly in out-of-sample data. In the 32 out-of-sample back-tests, the annualized return of the BQQ strategy was positive only six times, and the coefficient α was positive only eight times. The main reason for this phenomenon is that the length of the formation period used at the stock-matching stage and the trading parameters used in the stock-trading stage were not chosen optimally. The yield of the GGR model is affected in many cases by trading parameters such as the formation period, trading period, and opening threshold. Since this article presents only a methodological improvement for the stock-pairing trading model, it does not provide a more in-depth study of trading parameters. (4) The performance of the BQQ strategy is very sensitive to the value of λ, and an adjustable λ enhances the practicality of the strategy. In the same trading period, the return of the BQQ strategy does not change monotonically with λ. As λ increases, the volatility of the paired-stock spread increases, which means that the strategy is likely to obtain higher returns. Conversely, a larger λ raises the risk of divergence in the spread, making it easier for the strategy to trigger a stop-loss signal and incur losses; when λ is too large, the stock-matching strategy becomes invalid. Therefore, λ is a significant parameter for adjusting the risk of the strategy, and the strategy user can adjust λ to match risk preferences. (5) The optimal λ value is time dependent. The return of the BQQ strategy is non-monotonic in λ, and an excessively large λ invalidates the stock-matching strategy, which means that for a specific trading period there is an optimal λ that maximizes the strategy's return. From the perspective of revenue indicators and risk indicators, there is no obvious rule relating the performance of the strategy to the change of λ.
That is, the optimal λ value varies with the trading period and is time dependent. Table 5 shows the values of coefficient α and the Sharpe ratio from four out-of-sample back-tests. When λ is 0.5, coefficient α and the Sharpe ratio take their maximum values at the same time.
The results show that when λ is 0.5, both the average return from non-market risk and the average return per unit of risk of the optimized matching strategy over the four trading periods are the largest, but this value needs to be verified with large-scale data.
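The back-test counts reported in the findings above can be cross-checked with simple arithmetic. The sketch assumes that λ was varied from 0 to 0.7 in steps of 0.1, which is not stated explicitly but is consistent with 32 back-tests over four trading periods; the variable and function names are hypothetical.

```python
# Assumed grid: lambda = 0.0, 0.1, ..., 0.7 (the step of 0.1 is an assumption).
lambdas = [k / 10 for k in range(8)]
trading_periods = 4
total_backtests = len(lambdas) * trading_periods   # 8 values x 4 periods

def cases_from_share(share, total):
    """Convert a reported percentage share into a whole number of back-tests."""
    return round(share * total)

in_sample_wins = cases_from_share(0.875, total_backtests)      # BQQ beats GGR in-sample
out_of_sample_wins = cases_from_share(0.688, total_backtests)  # BQQ beats GGR out-of-sample
```

Under this assumption the reported shares of 87.5% and 68.8% correspond to whole numbers of back-tests out of 32.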

Conclusions
By introducing multi-objective optimization to the GGR model, this paper considers the long-term equilibrium of stock prices and the volatility of spreads and establishes a BQQ model. This novel pairs-trading model provides a new perspective for pairs-trading strategy research. At the same time, it provides investors with a stock-matching method that effectively improves the profitability of the trading strategy. This paper introduces the weight λ to transform the bi-objective optimization problem into a single-objective optimization problem, which is solved by a sequential quadratic programming algorithm. To verify the effectiveness of the optimized pairing strategy, this paper selects the traditional GGR model as the model for comparison and conducts back-testing over multiple time intervals on the SSE 50 constituents. We find that the BQQ strategy was able to obtain significantly higher revenue than the GGR model, and that adjusting the weight λ increases the flexibility and practicality of the strategy. This paper has some limitations. We used the SSE 50 Index as the research target in our empirical analysis. However, this choice was subject to the limitations of financing and securities lending, and the small number of stocks may have affected the performance of the trading strategy. Additionally, when we performed the validity check of the optimized pairing strategy, there was little in-depth research available on the trading parameters and optimal values of the strategy, and this may have affected the profitability of the strategy to some extent. Therefore, subsequent research work should include these aspects. In the future, we will expand the stock pool. In addition, the screening method for the transaction parameters of the pairs-trading strategy requires in-depth research to find the right trading parameters for the BQQ strategy.
Finally, we will try to establish an optimized pairing strategy by attaining the function of risk indicator λ through extended empirical analysis.