In this article, I continue exploring solutions to the matching supply with demand challenge. Thanks to Luca, Kamil Kosno, Daniel Brown, Brian Walker, Joe Obbish, Rainer Hoffmann, Paul White, Charlie, Ian, and Peter Larsson, for sending your solutions.
Last month I covered solutions based on a revised interval intersections approach compared to the classic one. The fastest of those solutions combined ideas from Kamil, Luca, and Daniel. It unified two queries with disjoint sargable predicates. It took the solution 1.34 seconds to complete against a 400K-row input. That’s not too shabby considering the solution based on the classic interval intersections approach took 931 seconds to complete against the same input. Also recall Joe came up with a brilliant solution that relies on the classic interval intersection approach but optimizes the matching logic by bucketizing intervals based on the largest interval length. With the same 400K-row input, it took Joe’s solution 0.9 seconds to complete. The tricky part about this solution is its performance degrades as the largest interval length increases.
This month I explore fascinating solutions that are faster than the Kamil/Luca/Daniel Revised Intersections solution and are neutral to interval length. The solutions in this article were created by Brian Walker, Ian, Peter Larsson, Paul White, and me.
I tested all solutions in this article against the Auctions input table with 100K, 200K, 300K, and 400K rows, and with the following indexes:
-- Index to support solution CREATE UNIQUE NONCLUSTERED INDEX idx_Code_ID_i_Quantity ON dbo.Auctions(Code, ID) INCLUDE(Quantity); -- Enable batch-mode Window Aggregate CREATE NONCLUSTERED COLUMNSTORE INDEX idx_cs ON dbo.Auctions(ID) WHERE ID = -1 AND ID = -2;
When describing the logic behind the solutions, I’ll assume the Auctions table is populated with the following small set of sample data:
ID Code Quantity ----------- ---- --------- 1 D 5.000000 2 D 3.000000 3 D 8.000000 5 D 2.000000 6 D 8.000000 7 D 4.000000 8 D 2.000000 1000 S 8.000000 2000 S 6.000000 3000 S 2.000000 4000 S 2.000000 5000 S 4.000000 6000 S 3.000000 7000 S 2.000000
Following is the desired result for this sample data:
DemandID SupplyID TradeQuantity ----------- ----------- -------------- 1 1000 5.000000 2 1000 3.000000 3 2000 6.000000 3 3000 2.000000 5 4000 2.000000 6 5000 4.000000 6 6000 3.000000 6 7000 1.000000 7 7000 1.000000
Brian Walker’s Solution
Outer joins are fairly commonly used in SQL querying solutions, but by far when you use those, you use single-sided ones. When teaching about outer joins, I often get questions asking for examples for practical use cases of full outer joins, and there aren’t that many. Brian’s solution is a beautiful example of a practical use case of full outer joins.
Here’s Brian’s complete solution code:
DROP TABLE IF EXISTS #MyPairings; CREATE TABLE #MyPairings ( DemandID INT NOT NULL, SupplyID INT NOT NULL, TradeQuantity DECIMAL(19,06) NOT NULL ); WITH D AS ( SELECT A.ID, SUM(A.Quantity) OVER (PARTITION BY A.Code ORDER BY A.ID ROWS UNBOUNDED PRECEDING) AS Running FROM dbo.Auctions AS A WHERE A.Code = 'D' ), S AS ( SELECT A.ID, SUM(A.Quantity) OVER (PARTITION BY A.Code ORDER BY A.ID ROWS UNBOUNDED PRECEDING) AS Running FROM dbo.Auctions AS A WHERE A.Code = 'S' ), W AS ( SELECT D.ID AS DemandID, S.ID AS SupplyID, ISNULL(D.Running, S.Running) AS Running FROM D FULL JOIN S ON D.Running = S.Running ), Z AS ( SELECT CASE WHEN W.DemandID IS NOT NULL THEN W.DemandID ELSE MIN(W.DemandID) OVER (ORDER BY W.Running ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) END AS DemandID, CASE WHEN W.SupplyID IS NOT NULL THEN W.SupplyID ELSE MIN(W.SupplyID) OVER (ORDER BY W.Running ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) END AS SupplyID, W.Running FROM W ) INSERT #MyPairings( DemandID, SupplyID, TradeQuantity ) SELECT Z.DemandID, Z.SupplyID, Z.Running - ISNULL(LAG(Z.Running) OVER (ORDER BY Z.Running), 0.0) AS TradeQuantity FROM Z WHERE Z.DemandID IS NOT NULL AND Z.SupplyID IS NOT NULL;
I revised Brian’s original use of derived tables with CTEs for clarity.
The CTE D computes running total demand quantities in a result column called D.Running, and the CTE S computes running total supply quantities in a result column called S.Running. The CTE W then performs a full outer join between D and S, matching D.Running with S.Running, and returns the first non-NULL among D.Running and S.Running as W.Running. Here’s the result you get if you query all rows from W ordered by Running:
DemandID SupplyID Running ----------- ----------- ---------- 1 NULL 5.000000 2 1000 8.000000 NULL 2000 14.000000 3 3000 16.000000 5 4000 18.000000 NULL 5000 22.000000 NULL 6000 25.000000 6 NULL 26.000000 NULL 7000 27.000000 7 NULL 30.000000 8 NULL 32.000000
The idea to use a full outer join based on a predicate that compares the demand and supply running totals is a stroke of genius! Most rows in this result—the first 9 in our case—represent result pairings with a bit of extra computations missing. Rows with trailing NULL IDs of one of the kinds represent entries that cannot be matched. In our case, the last two rows represent demand entries that cannot be matched with supply entries. So, what’s left here is to compute the DemandID, SupplyID and TradeQuantity of the result pairings, and to filter out the entries that cannot be matched.
The logic that computes the result DemandID and SupplyID is done in the CTE Z as follows (assuming ordering in W by Running):
- DemandID: if DemandID is not NULL then DemandID, otherwise the minimum DemandID starting with the current row
- SupplyID: if SupplyID is not NULL then SupplyID, otherwise the minimum SupplyID starting with the current row
Here’s the result you get if you query Z and order the rows by Running:
DemandID SupplyID Running ----------- ----------- ---------- 1 1000 5.000000 2 1000 8.000000 3 2000 14.000000 3 3000 16.000000 5 4000 18.000000 6 5000 22.000000 6 6000 25.000000 6 7000 26.000000 7 7000 27.000000 7 NULL 30.000000 8 NULL 32.000000
The outer query filters out rows from Z representing entries of one kind that cannot be matched by entries of the other kind by ensuring both DemandID and SupplyID are not NULL. The result pairings’ trade quantity is computed as the current Running value minus the previous Running value using a LAG function.
Here’s what gets written to the #MyPairings table, which is the desired result:
DemandID SupplyID TradeQuantity ----------- ----------- -------------- 1 1000 5.000000 2 1000 3.000000 3 2000 6.000000 3 3000 2.000000 5 4000 2.000000 6 5000 4.000000 6 6000 3.000000 6 7000 1.000000 7 7000 1.000000
The plan for this solution is shown in Figure 1.
Figure 1: Query plan for Brian’s solution
The top two branches of the plan compute the demand and supply running totals using a batch-mode Window Aggregate operator, each after retrieving the respective entries from the supporting index. Like I explained in earlier articles in the series, since the index already has the rows ordered like the Window Aggregate operators need them to be, you would think explicit Sort operators shouldn’t be required. But SQL Server doesn’t currently support an efficient combination of parallel order-preserving index operation prior to a parallel batch-mode Window Aggregate operator, so as a result, an explicit parallel Sort operator precedes each of the parallel Window Aggregate operators.
The plan uses a batch-mode hash join to handle the full outer join. The plan also uses two additional batch-mode Window Aggregate operators preceded by explicit Sort operators to compute the MIN and LAG window functions.
That’s a pretty efficient plan!
Here are the run times I got for this solution against input sizes ranging from 100K to 400K rows, in seconds:
100K: 0.368 200K: 0.845 300K: 1.255 400K: 1.315
Itzik’s Solution
The next solution for the challenge is one of my attempts at solving it. Here’s the complete solution code:
DROP TABLE IF EXISTS #MyPairings; WITH C1 AS ( SELECT *, SUM(Quantity) OVER(PARTITION BY Code ORDER BY ID ROWS UNBOUNDED PRECEDING) AS SumQty FROM dbo.Auctions ), C2 AS ( SELECT *, SUM(Quantity * CASE Code WHEN 'D' THEN -1 WHEN 'S' THEN 1 END) OVER(ORDER BY SumQty, Code ROWS UNBOUNDED PRECEDING) AS StockLevel, LAG(SumQty, 1, 0.0) OVER(ORDER BY SumQty, Code) AS PrevSumQty, MAX(CASE WHEN Code = 'D' THEN ID END) OVER(ORDER BY SumQty, Code ROWS UNBOUNDED PRECEDING) AS PrevDemandID, MAX(CASE WHEN Code = 'S' THEN ID END) OVER(ORDER BY SumQty, Code ROWS UNBOUNDED PRECEDING) AS PrevSupplyID, MIN(CASE WHEN Code = 'D' THEN ID END) OVER(ORDER BY SumQty, Code ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS NextDemandID, MIN(CASE WHEN Code = 'S' THEN ID END) OVER(ORDER BY SumQty, Code ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS NextSupplyID FROM C1 ), C3 AS ( SELECT *, CASE Code WHEN 'D' THEN ID WHEN 'S' THEN CASE WHEN StockLevel > 0 THEN NextDemandID ELSE PrevDemandID END END AS DemandID, CASE Code WHEN 'S' THEN ID WHEN 'D' THEN CASE WHEN StockLevel <= 0 THEN NextSupplyID ELSE PrevSupplyID END END AS SupplyID, SumQty - PrevSumQty AS TradeQuantity FROM C2 ) SELECT DemandID, SupplyID, TradeQuantity INTO #MyPairings FROM C3 WHERE TradeQuantity > 0.0 AND DemandID IS NOT NULL AND SupplyID IS NOT NULL;
The CTE C1 queries the Auctions table and uses a window function to compute running total demand and supply quantities, calling the result column SumQty.
The CTE C2 queries C1, and computes a number of attributes using window functions based on SumQty and Code ordering:
- StockLevel: The current stock level after processing each entry. This is computed by assigning a negative sign to demand quantities and a positive sign to supply quantities and summing those quantities up to and including the current entry.
- PrevSumQty: Previous row’s SumQty value.
- PrevDemandID: Last non-NULL demand ID.
- PrevSupplyID: Last non-NULL supply ID.
- NextDemandID: Next non-NULL demand ID.
- NextSupplyID: Next non-NULL supply ID.
Here’s the contents of C2 ordered by SumQty and Code:
ID Code Quantity SumQty StockLevel PrevSumQty PrevDemandID PrevSupplyID NextDemandID NextSupplyID ----- ---- --------- ---------- ----------- ----------- ------------ ------------ ------------ ------------ 1 D 5.000000 5.000000 -5.000000 0.000000 1 NULL 1 1000 2 D 3.000000 8.000000 -8.000000 5.000000 2 NULL 2 1000 1000 S 8.000000 8.000000 0.000000 8.000000 2 1000 3 1000 2000 S 6.000000 14.000000 6.000000 8.000000 2 2000 3 2000 3 D 8.000000 16.000000 -2.000000 14.000000 3 2000 3 3000 3000 S 2.000000 16.000000 0.000000 16.000000 3 3000 5 3000 5 D 2.000000 18.000000 -2.000000 16.000000 5 3000 5 4000 4000 S 2.000000 18.000000 0.000000 18.000000 5 4000 6 4000 5000 S 4.000000 22.000000 4.000000 18.000000 5 5000 6 5000 6000 S 3.000000 25.000000 7.000000 22.000000 5 6000 6 6000 6 D 8.000000 26.000000 -1.000000 25.000000 6 6000 6 7000 7000 S 2.000000 27.000000 1.000000 26.000000 6 7000 7 7000 7 D 4.000000 30.000000 -3.000000 27.000000 7 7000 7 NULL 8 D 2.000000 32.000000 -5.000000 30.000000 8 7000 8 NULL
The CTE C3 queries C2 and computes the result pairings’ DemandID, SupplyID and TradeQuantity, before removing some superfluous rows.
The result C3.DemandID column is computed like so:
- If the current entry is a demand entry, return DemandID.
- If the current entry is a supply entry and the current stock level is positive, return NextDemandID.
- If the current entry is a supply entry and the current stock level is nonpositive, return PrevDemandID.
The result C3.SupplyID column is computed like so:
- If the current entry is a supply entry, return SupplyID.
- If the current entry is a demand entry and the current stock level is nonpositive, return NextSupplyID.
- If the current entry is a demand entry and the current stock level is positive, return PrevSupplyID.
The result TradeQuantity is computed as the current row’s SumQty minus PrevSumQty.
Here are the contents of the columns relevant to the result from C3:
DemandID SupplyID TradeQuantity ----------- ----------- -------------- 1 1000 5.000000 2 1000 3.000000 2 1000 0.000000 3 2000 6.000000 3 3000 2.000000 3 3000 0.000000 5 4000 2.000000 5 4000 0.000000 6 5000 4.000000 6 6000 3.000000 6 7000 1.000000 7 7000 1.000000 7 NULL 3.000000 8 NULL 2.000000
What’s left for the outer query to do is to filter out superfluous rows from C3. Those include two cases:
- When the running totals of both kinds are the same, the supply entry has a zero trading quantity. Remember the ordering is based on SumQty and Code, so when the running totals are the same, the demand entry appears before the supply entry.
- Trailing entries of one kind that cannot be matched with entries of the other kind. Such entries are represented by rows in C3 where either the DemandID is NULL or the SupplyID is NULL.
The plan for this solution is shown in Figure 2.
Figure 2: Query plan for Itzik’s solution
The plan applies one pass over the input data and uses four batch-mode Window Aggregate operators to handle the various windowed computations. All Window Aggregate operators are preceded by explicit Sort operators, although only two of those are unavoidable here. The other two have to do with the current implementation of the parallel batch-mode Window Aggregate operator, which cannot rely on a parallel order-preserving input. A simple way to see which Sort operators are due to this reason is to force a serial plan and see which Sort operators disappear. When I force a serial plan with this solution, the first and third Sort operators disappear.
Here are the run times in seconds that I got for this solution:
100K: 0.246 200K: 0.427 300K: 0.616 400K: 0.841>
Ian’s Solution
Ian’s solution is short and efficient. Here’s the complete solution code:
DROP TABLE IF EXISTS #MyPairings; WITH A AS ( SELECT *, SUM(Quantity) OVER (PARTITION BY Code ORDER BY ID ROWS UNBOUNDED PRECEDING) AS CumulativeQuantity FROM dbo.Auctions ), B AS ( SELECT CumulativeQuantity, CumulativeQuantity - LAG(CumulativeQuantity, 1, 0) OVER (ORDER BY CumulativeQuantity) AS TradeQuantity, MAX(CASE WHEN Code = 'D' THEN ID END) AS DemandID, MAX(CASE WHEN Code = 'S' THEN ID END) AS SupplyID FROM A GROUP BY CumulativeQuantity, ID/ID -- bogus grouping to improve row estimate -- (rows count of Auctions instead of 2 rows) ), C AS ( -- fill in NULLs with next supply / demand -- FIRST_VALUE(ID) IGNORE NULLS OVER ... -- would be better, but this will work because the IDs are in CumulativeQuantity order SELECT MIN(DemandID) OVER (ORDER BY CumulativeQuantity ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS DemandID, MIN(SupplyID) OVER (ORDER BY CumulativeQuantity ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS SupplyID, TradeQuantity FROM B ) SELECT DemandID, SupplyID, TradeQuantity INTO #MyPairings FROM C WHERE SupplyID IS NOT NULL -- trim unfulfilled demands AND DemandID IS NOT NULL; -- trim unused supplies
The code in the CTE A queries the Auctions table and computes running total demand and supply quantities using a window function, naming the result column CumulativeQuantity.
The code in the CTE B queries CTE A, and groups the rows by CumulativeQuantity. This grouping achieves a similar effect to Brian’s outer join based on the demand and supply running totals. Ian also added the dummy expression ID/ID to the grouping set to improve the grouping’s original cardinality under estimation. On my machine, this also resulted in using a parallel plan instead of a serial one.
In the SELECT list, the code computes DemandID and SupplyID by retrieving the ID of the respective entry type in the group using the MAX aggregate and a CASE expression. If the ID isn’t present in the group, the result is NULL. The code also computes a result column called TradeQuantity as the current cumulative quantity minus the previous one, retrieved using the LAG window function.
Here are the contents of B:
CumulativeQuantity TradeQuantity DemandId SupplyId ------------------- -------------- --------- --------- 5.000000 5.000000 1 NULL 8.000000 3.000000 2 1000 14.000000 6.000000 NULL 2000 16.000000 2.000000 3 3000 18.000000 2.000000 5 4000 22.000000 4.000000 NULL 5000 25.000000 3.000000 NULL 6000 26.000000 1.000000 6 NULL 27.000000 1.000000 NULL 7000 30.000000 3.000000 7 NULL 32.000000 2.000000 8 NULL
The code in the CTE C then queries the CTE B and fills in NULL demand and supply IDs with the next non-NULL demand and supply IDs, using the MIN window function with the frame ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.
Here are the contents of C:
DemandID SupplyID TradeQuantity --------- --------- -------------- 1 1000 5.000000 2 1000 3.000000 3 2000 6.000000 3 3000 2.000000 5 4000 2.000000 6 5000 4.000000 6 6000 3.000000 6 7000 1.000000 7 7000 1.000000 7 NULL 3.000000 8 NULL 2.000000
The last step handled by the outer query against C is to remove entries of one kind that cannot be matched with entries of the other kind. That’s done by filtering out rows where either SupplyID is NULL or DemandID is NULL.
The plan for this solution is shown in Figure 3.
Figure 3: Query plan for Ian’s solution
This plan performs one scan of the input data and uses three parallel batch-mode window aggregate operators to compute the various window functions, all preceded by parallel Sort operators. Two of those are unavoidable as you can verify by forcing a serial plan. The plan also uses a Hash Aggregate operator to handle the grouping and aggregation in the CTE B.
Here are the run times in seconds that I got for this solution:
100K: 0.214 200K: 0.363 300K: 0.546 400K: 0.701
Peter Larsson’s Solution
Peter Larsson’s solution is amazingly short, sweet, and highly efficient. Here’s Peter’s complete solution code:
DROP TABLE IF EXISTS #MyPairings; WITH cteSource(ID, Code, RunningQuantity) AS ( SELECT ID, Code, SUM(Quantity) OVER (PARTITION BY Code ORDER BY ID) AS RunningQuantity FROM dbo.Auctions ) SELECT DemandID, SupplyID, TradeQuantity INTO #MyPairings FROM ( SELECT MIN(CASE WHEN Code = 'D' THEN ID ELSE 2147483648 END) OVER (ORDER BY RunningQuantity, Code ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS DemandID, MIN(CASE WHEN Code = 'S' THEN ID ELSE 2147483648 END) OVER (ORDER BY RunningQuantity, Code ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS SupplyID, RunningQuantity - COALESCE(LAG(RunningQuantity) OVER (ORDER BY RunningQuantity, Code), 0.0) AS TradeQuantity FROM cteSource ) AS d WHERE DemandID < 2147483648 AND SupplyID < 2147483648 AND TradeQuantity > 0.0;
The CTE cteSource queries the Auctions table and uses a window function to compute running total demand and supply quantities, calling the result column RunningQuantity.
The code defining the derived table d queries cteSource and computes the result pairings’ DemandID, SupplyID, and TradeQuantity, before removing some superfluous rows. All window functions used in those calculations are based on RunningQuantity and Code ordering.
The result column d.DemandID is computed as the minimum demand ID starting with the current or 2147483648 if none is found.
The result column d.SupplyID is computed as the minimum supply ID starting with the current or 2147483648 if none is found.
The result TradeQuantity is computed as the current row’s RunningQuantity value minus the previous row’s RunningQuantity value.
Here are the contents of d:
DemandID SupplyID TradeQuantity --------- ----------- -------------- 1 1000 5.000000 2 1000 3.000000 3 1000 0.000000 3 2000 6.000000 3 3000 2.000000 5 3000 0.000000 5 4000 2.000000 6 4000 0.000000 6 5000 4.000000 6 6000 3.000000 6 7000 1.000000 7 7000 1.000000 7 2147483648 3.000000 8 2147483648 2.000000
What’s left for the outer query to do is to filter out superfluous rows from d. Those are cases where the trading quantity is zero, or entries of one kind that cannot be matched with entries from the other kind (with ID 2147483648).
The plan for this solution is shown in Figure 4.
Figure 4: Query plan for Peter’s solution
Like Ian’s plan, Peter’s plan relies on one scan of the input data and uses three parallel batch-mode window aggregate operators to compute the various window functions, all preceded by parallel Sort operators. Two of those are unavoidable as you can verify by forcing a serial plan. In Peter’s plan, there’s no need for a grouped aggregate operator like in Ian’s plan.
Peter’s critical insight that allowed for such a short solution was the realization that for a relevant entry of either of the kinds, the matching ID of the other kind will always appear later (based on RunningQuantity and Code ordering). After seeing Peter’s solution, it sure felt like I overcomplicated things in mine!
Here are the run times in seconds I got for this solution:
100K: 0.197 200K: 0.412 300K: 0.558 400K: 0.696
Paul White’s Solution
Our last solution was created by Paul White. Here’s the complete solution code:
DROP TABLE IF EXISTS #MyPairings; CREATE TABLE #MyPairings ( DemandID integer NOT NULL, SupplyID integer NOT NULL, TradeQuantity decimal(19, 6) NOT NULL ); GO INSERT #MyPairings WITH (TABLOCK) ( DemandID, SupplyID, TradeQuantity ) SELECT Q3.DemandID, Q3.SupplyID, Q3.TradeQuantity FROM ( SELECT Q2.DemandID, Q2.SupplyID, TradeQuantity = -- Interval overlap CASE WHEN Q2.Code = 'S' THEN CASE WHEN Q2.CumDemand >= Q2.IntEnd THEN Q2.IntLength WHEN Q2.CumDemand > Q2.IntStart THEN Q2.CumDemand - Q2.IntStart ELSE 0.0 END WHEN Q2.Code = 'D' THEN CASE WHEN Q2.CumSupply >= Q2.IntEnd THEN Q2.IntLength WHEN Q2.CumSupply > Q2.IntStart THEN Q2.CumSupply - Q2.IntStart ELSE 0.0 END END FROM ( SELECT Q1.Code, Q1.IntStart, Q1.IntEnd, Q1.IntLength, DemandID = MAX(IIF(Q1.Code = 'D', Q1.ID, 0)) OVER ( ORDER BY Q1.IntStart, Q1.ID ROWS UNBOUNDED PRECEDING), SupplyID = MAX(IIF(Q1.Code = 'S', Q1.ID, 0)) OVER ( ORDER BY Q1.IntStart, Q1.ID ROWS UNBOUNDED PRECEDING), CumSupply = SUM(IIF(Q1.Code = 'S', Q1.IntLength, 0)) OVER ( ORDER BY Q1.IntStart, Q1.ID ROWS UNBOUNDED PRECEDING), CumDemand = SUM(IIF(Q1.Code = 'D', Q1.IntLength, 0)) OVER ( ORDER BY Q1.IntStart, Q1.ID ROWS UNBOUNDED PRECEDING) FROM ( -- Demand intervals SELECT A.ID, A.Code, IntStart = SUM(A.Quantity) OVER ( ORDER BY A.ID ROWS UNBOUNDED PRECEDING) - A.Quantity, IntEnd = SUM(A.Quantity) OVER ( ORDER BY A.ID ROWS UNBOUNDED PRECEDING), IntLength = A.Quantity FROM dbo.Auctions AS A WHERE A.Code = 'D' UNION ALL -- Supply intervals SELECT A.ID, A.Code, IntStart = SUM(A.Quantity) OVER ( ORDER BY A.ID ROWS UNBOUNDED PRECEDING) - A.Quantity, IntEnd = SUM(A.Quantity) OVER ( ORDER BY A.ID ROWS UNBOUNDED PRECEDING), IntLength = A.Quantity FROM dbo.Auctions AS A WHERE A.Code = 'S' ) AS Q1 ) AS Q2 ) AS Q3 WHERE Q3.TradeQuantity > 0;
The code defining the derived table Q1 uses two separate queries to compute demand and supply intervals based on running totals and unifies the two. For each interval, the code computes its start (IntStart), end (IntEnd), and length (IntLength). Here are the contents of Q1 ordered by IntStart and ID:
ID Code IntStart IntEnd IntLength ----- ---- ---------- ---------- ---------- 1 D 0.000000 5.000000 5.000000 1000 S 0.000000 8.000000 8.000000 2 D 5.000000 8.000000 3.000000 3 D 8.000000 16.000000 8.000000 2000 S 8.000000 14.000000 6.000000 3000 S 14.000000 16.000000 2.000000 5 D 16.000000 18.000000 2.000000 4000 S 16.000000 18.000000 2.000000 6 D 18.000000 26.000000 8.000000 5000 S 18.000000 22.000000 4.000000 6000 S 22.000000 25.000000 3.000000 7000 S 25.000000 27.000000 2.000000 7 D 26.000000 30.000000 4.000000 8 D 30.000000 32.000000 2.000000
The code defining the derived table Q2 queries Q1 and computes result columns called DemandID, SupplyID, CumSupply, and CumDemand. All window functions used by this code are based on IntStart and ID ordering and the frame ROWS UNBOUNDED PRECEDING (all rows up to the current).
DemandID is the maximum demand ID up to the current row, or 0 if none exists.
SupplyID is the maximum supply ID up to the current row, or 0 if none exists.
CumSupply is the cumulative supply quantities (supply interval lengths) up to the current row.
CumDemand is the cumulative demand quantities (demand interval lengths) up to the current row.
Here are the contents of Q2:
Code IntStart IntEnd IntLength DemandID SupplyID CumSupply CumDemand ---- ---------- ---------- ---------- --------- --------- ---------- ---------- D 0.000000 5.000000 5.000000 1 0 0.000000 5.000000 S 0.000000 8.000000 8.000000 1 1000 8.000000 5.000000 D 5.000000 8.000000 3.000000 2 1000 8.000000 8.000000 D 8.000000 16.000000 8.000000 3 1000 8.000000 16.000000 S 8.000000 14.000000 6.000000 3 2000 14.000000 16.000000 S 14.000000 16.000000 2.000000 3 3000 16.000000 16.000000 D 16.000000 18.000000 2.000000 5 3000 16.000000 18.000000 S 16.000000 18.000000 2.000000 5 4000 18.000000 18.000000 D 18.000000 26.000000 8.000000 6 4000 18.000000 26.000000 S 18.000000 22.000000 4.000000 6 5000 22.000000 26.000000 S 22.000000 25.000000 3.000000 6 6000 25.000000 26.000000 S 25.000000 27.000000 2.000000 6 7000 27.000000 26.000000 D 26.000000 30.000000 4.000000 7 7000 27.000000 30.000000 D 30.000000 32.000000 2.000000 8 7000 27.000000 32.000000
Q2 already has the final result pairings’ correct DemandID and SupplyID values. The code defining the derived table Q3 queries Q2 and computes the result pairings’ TradeQuantity values as the overlapping segments of the demand and supply intervals. Finally, the outer query against Q3 filters only the relevant pairings where TradeQuantity is positive.
The plan for this solution is shown in Figure 5.
Figure 5: Query plan for Paul’s solution
The top two branches of the plan are responsible for computing the demand and supply intervals. Both rely on Index Seek operators to get the relevant rows based on index order, and then use parallel batch-mode Window Aggregate operators, preceded by Sort operators that theoretically could have been avoided. The plan concatenates the two inputs, sorts the rows by IntStart and ID to support the subsequent remaining Window Aggregate operator. Only this Sort operator is unavoidable in this plan. The rest of the plan handles the needed scalar computations and the final filter. That’s a very efficient plan!
Here are the run times in seconds I got for this solution:
100K: 0.187 200K: 0.331 300K: 0.369 400K: 0.425
These numbers are pretty impressive!
Performance Comparison
Figure 6 has a performance comparison between all solutions covered in this article.
Figure 6: Performance comparison
At this point, we can add the fastest solutions I covered in previous articles. Those are Joe’s and Kamil/Luca/Daniel’s solutions. The complete comparison is shown in Figure 7.
Figure 7: Performance comparison including earlier solutions
These are all impressively fast solutions, with the fastest being Paul’s and Peter’s.
Conclusion
When I originally introduced Peter’s puzzle, I showed a straightforward cursor-based solution that took 11.81 seconds to complete against a 400K-row input. The challenge was to come up with an efficient set-based solution. It’s so inspiring to see all the solutions people sent. I always learn so much from such puzzles both from my own attempts and by analyzing others’ solutions. It’s impressive to see several sub-second solutions, with Paul’s being less than half a second!
It's great to have multiple efficient techniques to handle such a classic need of matching supply with demand. Well done everyone!
The post Matching Supply With Demand — Solutions, Part 3 appeared first on SQLPerformance.com.