It’s day 4 of the Advent Of Code!
Yesterday wasn’t a good day for the set-based solution series, but today turned out to be MUCH better. In fact, I managed to finish both problems within 16 minutes of their release, and, I was one of the top 800 quickest respondents on each part of the question.
The Challenge
The challenge today was to; first split a string into a series of “words”, and then check whether or not there were duplicates in the string. A string with duplicates is viewed as invalid.
In part one, we had to count the number of valid strings in our input.
The solution (download)
As I mentioned on day 2, I have a utility function which is designed for splitting strings. So I used that function to split each string into a series of words.
DECLARE @xml as xml SET @xml = cast(('<X>'+replace(@str,@delimiter ,'</X><X>')+'</X>') as xml) SELECT N.value('.', 'varchar(10)') as value FROM @xml.nodes('X') as T(N)
After that, it was a simple grouping query, filtered by comparing the total number of words in a row count(words) with the distinct number of words in a row count(distinct words).
select a.RowID from @t a cross apply [Utility].[SplitListXML](a,' ') group by a.RowID having count(item) = count(distinct Item)
So, 3 minutes and 28 seconds after the challenge started, I had a solution for part one, and I was ready for part 2. I was also quite optimistic that it would fit the first part of the solution better than yesterday.
And I was right to be optimistic.
The second part of the challenge required that words which were anagrams of each other also “invalidate” the entire string.
I already had the words split out, now I just had to find a way to compare anagrams… So I used my tally table and the substring function to split the words into individual letters, similarly to the technique I used on day 1.
select a.RowID as RowNumber, b.RowID as WordNumber, t.Number as LIndex, substring(b.Item,Number,1) as Item from @t a cross apply Utility.Utility.SplitListXML(a,' ') b inner join Utility.dbo.TallyTable t on Number between 1 and len(b.Item)
This gives me an individual table of the letters in the words. To check for anagrams, I decided to reform each word but with the letters in alphabetical order.
Sorting the letters is easy. Reforming the words… that took me a minute or two to figure out.
And then I remembered. The technique of using FOR XML PATH to split a string works just as well to CONCATENATE a string.
You can actually play around with adding delimiters and all sorts (see this blog post), but for this question, we just had to make a concatenate a bunch of letters with no frills.
SELECT o.RowNumber, o.WordNumber, ( SELECT Item + '' AS [text()] FROM @tmp [i] where [i].RowNumber = [o].[RowNumber] and [i].WordNumber = [o].[WordNumber] order by Item FOR XML PATH('') ) as newword from @tmp [o] group by o.RowNumber, o.WordNumber
The important part of this solution is the order by clause inside the subquery. This ensures that our output is in beautifully alphabetised .
Once I have a table of newly reformed words, I can use the exact same query as in part one, and that’s it, another day done!
Once again, thanks to @ericwastl for the concept and all the work that goes into the challenges. Really enjoying it!
One comment