Advent of Code: Day 2


It’s the 2nd of December, ergo day two of Advent of Code. Read this if you want more context about the competition or this series of blog posts.

The challenge

Today’s challenge was an interesting problem from a SQL perspective, because it involved some data wrangling, but after that was actually really easy to solve. This is reminiscent of many of the problems we face in the data business on a day-to-day basis. The difficulty often lies in solving the problem, not actually writing the code.

Data preparation and wrangling is an essential part of any data professional’s skill set. I often find that data prep takes 70% or more of total project time, so it really is worth understanding how to best model your data for the problem you’re trying to solve.

The solution

Essentially, part one of the challenge today was to determine the highest and lowest values in a row of data, and then subtract the lowest from the highest to create a checksum value per row. Add up the checksums for each row to complete the challenge.

First task: getting the data into a table that I can then manipulate. I had a couple of thoughts (the SQL Server Import and Export Wizard, or SSIS), but in the end, mainly because the file was pretty small, I just imported it as 16 text strings and stored it in a table variable. One of the best tips I’ve ever learned in SSMS is using the ALT key in combination with the arrow keys for block selection. This technique has saved me countless hours in my job, and today I created 16 rows of (''), and then block-copied the data inside those quotes.

declare @table table (id int identity(1,1),str varchar(500))
insert into @table
values
('3751 3769 2769 2039 2794 240 3579 1228 4291 220 324 3960 211 1346 237 1586'),
('550 589 538 110 167 567 99 203 524 288 500 111 118 185 505 74'),
('2127 1904 199 221 1201 250 1119 377 1633 1801 2011 1794 394 238 206 680'),
('435 1703 1385 1461 213 1211 192 1553 1580 197 571 195 326 1491 869 1282'),
('109 104 3033 120 652 2752 1822 2518 1289 1053 1397 951 3015 3016 125 1782'),
('2025 1920 1891 99 1057 1909 2237 106 97 920 603 1841 2150 1980 1970 88'),
('1870 170 167 176 306 1909 1825 1709 168 1400 359 817 1678 1718 1594 1552'),
('98 81 216 677 572 295 38 574 403 74 91 534 662 588 511 51'),
('453 1153 666 695 63 69 68 58 524 1088 75 1117 1192 1232 1046 443'),
('3893 441 1825 3730 3660 115 4503 4105 3495 4092 48 3852 132 156 150 4229'),
('867 44 571 40 884 922 418 328 901 845 42 860 932 53 432 569'),
('905 717 162 4536 4219 179 990 374 4409 4821 393 4181 4054 4958 186 193'),
('2610 2936 218 2552 3281 761 204 3433 3699 2727 3065 3624 193 926 1866 236'),
('2602 216 495 3733 183 4688 2893 4042 3066 3810 189 4392 3900 4321 2814 159'),
('166 136 80 185 135 78 177 123 82 150 121 145 115 63 68 24'),
('214 221 265 766 959 1038 226 1188 1122 117 458 1105 1285 1017 274 281')

So now I have 16 rows of data, but each row contains 16 tab-delimited values, and I have to turn them into values I can compare to each other. From a SQL perspective, functions like MIN and MAX are designed to work on columns of data, not rows. So the appropriate data structure for my data is not 16 rows with 16 columns in each, but rather 256 rows, each one recording the number of the string, the position of the value within it (1-16) and the value itself.

So the following subset:

3751 3769 2769 2039
550  589  538  110
2127 1904 199  221
435  1703 1385 1461

becomes data in a usable table, one row per value:

id  rowid  item
--  -----  ----
1   1      3751
1   2      3769
1   3      2769
1   4      2039
2   1      550
2   2      589
...

Getting to this state is fairly straightforward, and something we do so often that I keep a split-list user-defined table-valued function with me at all times. For the sake of brevity I’ve stripped out the checks and error handling, but the function looks like this.

create function Utility.SplitListXML
(
 @str as varchar(max),
 @delimiter as varchar(10)
)
RETURNS @splitlist table
(
 RowID int identity(1,1),
 Item varchar(max)
)
BEGIN

 DECLARE @xml as xml

 -- Wrap each item in <X> tags, then shred the XML back into rows.
 -- Note: this assumes the input contains no XML special characters (& < >).
 SET @xml = cast(('<X>' + replace(@str, @delimiter, '</X><X>') + '</X>') as xml)

 insert into @splitlist
 SELECT N.value('.', 'varchar(max)') as Item
 FROM @xml.nodes('X') as T(N)

 RETURN

END

The previous version of this function used a cursor to loop through the data, but it was fairly unwieldy and it never felt comfortable. One day I found this XML technique online and I adapted my function accordingly. It really is a useful technique to have in your locker.
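As an aside, newer versions of SQL Server ship a built-in alternative. This is just a sketch, assuming SQL Server 2022 or later for the third argument (STRING_SPLIT itself exists from SQL Server 2016, but without a guaranteed item position until the enable_ordinal flag arrived):

```sql
-- STRING_SPLIT with enable_ordinal = 1 returns each item together with its
-- 1-based position; the XML technique above provides the position via the
-- identity column instead.
select t.id,
       s.ordinal as rowid,
       cast(s.value as int) as item
from @table t
cross apply string_split(t.str, char(9), 1) as s
```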

So we have a “Split List” function, but we now have to apply it to every row, so we use a CROSS APPLY.

select id,
       rowid,
       cast(item as int) item
from @table
cross apply Utility.SplitListXML(str,char(9))

-- char(9) is the ASCII character for TAB

Anyway, back to the problem at hand. With the unpivoted result set (which I’ll refer to as data, e.g. the query above wrapped in a CTE), it becomes very simple to find the MIN and MAX values per row with a simple GROUP BY, and then sum up the differences by using a windowing function.

select id,
       sum(max(item) - min(item)) over () as Total -- grand total, repeated on every row
from data
group by id

Pretty straightforward, isn’t it?!
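For completeness, here’s how the pieces fit together in one statement; a minimal sketch, assuming the table variable and function defined above:

```sql
;with data as
(
    -- unpivot: one row per (string, position, value)
    select t.id,
           s.RowID as rowid,
           cast(s.Item as int) as item
    from @table t
    cross apply Utility.SplitListXML(t.str, char(9)) as s
)
select sum(diff) as Checksum
from (
    -- per-row difference between the largest and smallest value
    select max(item) - min(item) as diff
    from data
    group by id
) as per_row
```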

The second part of the challenge required you to find the pair of values on each row where one value evenly divides the other, and to sum the results of those divisions across the rows. Thankfully the data didn’t require any more wrangling, and I could query it in the format I already had.

My method for finding the pair was to join each value on a row to every other value on the same row, and find the one pair where the remainder was 0. I then summed the results of the divisions and voila! Be wary of this though: creating a near-Cartesian join can seriously hinder performance, especially on large data sets.

select sum(a.item / b.item)
from data a
inner join data b
    on a.id = b.id
   and a.rowid <> b.rowid
where a.item % b.item = 0
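One small trim, assuming the two values in each pair are distinct: since the dividend is always the larger of the pair, you can join in one direction only and roughly halve the work.

```sql
select sum(a.item / b.item)
from data a
inner join data b
    on a.id = b.id
   and a.item > b.item  -- only pair the larger value with the smaller
where a.item % b.item = 0
```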

Right, 2 days down, 23 to go. Dunno if I’ll be able to blog every day, but I’m going to give it a go.

Until tomorrow.
