Posted by
ads' corner
on
Monday, 2020-12-07 Posted in [Sql]
They really trust you a lot … Now all flights are delayed, that’s quite usual in advent times. Except it’s 2020 and no one is traveling anyway. But they also impose more regulations, and now bags need to have colours, and bags need to go into other bags. And someone has to figure it out. That’s right, it’s you! I hope the stopover is long enough to at least grab some ice cream!
Task 1: There is a huge list of rules which colores bag can hold how many other colored bags. You have a shiny golden bag, but apparently you are not allowed to carry it around, it has to go into another bag. Find out which ones are allowed, and how many of them. Oh, and bags can go into bags, which means: recursion.
Task 2: As usual someone made a mistake. You are, after all, allowed to carry your shiny golden bag, but now you also have to have other bags in it. Calculate the possibilities, based on the rules. Yes, recursion again, and tree climbing.
And if your head is swirling: so is mine. Why on North Pole Earth does anyone even remotely care about your shiny golden bag?
Preparation
Let’s start with data loading. The method of just loading the raw data into a single TEXT field becomes quite useful. So far all the processing can be done in the database.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
DROPTABLEIFEXISTS day7;
CREATETABLE day7 (
id SERIALNOTNULLPRIMARYKEY,
valueTEXT
);
COPY day7 (value) FROMSTDIN;
bright olive bags contain 5 dotted white bags, 2 wavy lavender bags.
plaid brown bags contain 3 bright lime bags, 5 plaid coral bags.
dim coral bags contain 1 shiny maroon bag, 2 bright orange bags, 3 clear lavender bags.
dull olive bags contain 5 wavy purple bags, 2 plaid white bags, 4 vibrant magenta bags, 1 light brown bag.
dim yellow bags contain 1 bright cyan bag, 3 striped green bags, 5 dim white bags....\.
I haveĀ 594 rows in the table:
1
2
3
4
5
SELECT COUNT(*) FROM day7;
count
-------
594
(1row)
This time this means it’s 594 rules. Why on earth are there 594 rules to follow for the luggage? From here it gets a bit more complicated. First I want to verify that all rules follow the same pattern:
1
2
3
4
5
6
SELECT (SELECT COUNT(*) FROM day7 WHEREvalueLIKE'% bags contain %')
- (SELECT COUNT(*) FROM day7) AS diff;
diff
------
0
(1row)
Good, at least all rules follow the bags contain pattern, and I don’t need to care about different formats. Then the format is color name of bag, followed by bags contain, followed by one or multiple rules. That’s easy to split with a regexp.
1
2
3
4
5
6
7
8
9
10
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
)
SELECT*FROM day7_split_data;
This query uses two CTE, although I could have wrapped this into one little bit more complicated one. However this way it is more descriptive what the query is doing. The first part _day7_split_data just splits by left and right from bags contain. The second part day7_split_data (no underscore) selects the array fields into columns. This results in a data set which looks like:
1
2
3
4
5
6
7
8
9
c1 | c2
---------------------+-------------------------------------------------------------------------------------------------
bright olive |5 dotted white bags, 2 wavy lavender bags.
plaid brown |3 bright lime bags, 5 plaid coral bags.
dim coral |1 shiny maroon bag, 2 bright orange bags, 3 clear lavender bags.
dull olive |5 wavy purple bags, 2 plaid white bags, 4 vibrant magenta bags, 1 light brown bag.
dim yellow |1 bright cyan bag, 3 striped green bags, 5 dim white bags.
drab chartreuse |3 shiny lime bags, 2 bright indigo bags, 2 muted yellow bags, 5 dim tan bags.
dim cyan |4 drab tomato bags, 3 dotted teal bags, 1 posh purple bag, 2 faded brown bags.
c1 is the bag colour, c2 is the rule(s). Let’s run another check:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
)
SELECT (SELECT COUNT(DISTINCT(c1)) FROM day7_split_data)
- (SELECT COUNT(*) FROM day7) AS diff;
diff
------
0
(1row)
The number of distinct bag colours is equal the number of rules. This means there rules do not contain multiple lines for a single colour.
Next step is breaking up the rule(s) into multiple lines, one per rule. A “regexp_split_to_table()” is quite useful to get one line per rule:
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
),
day7_splitup AS (
SELECT c1 wherein,
split[2] AS what,
CAST(split[1] ASINT4) AS howmany
FROM day7_split_data,
LATERAL (
SELECT regexp_match(rstt, '^(\d+) ([\w ]+) bag[s]?\.?$') split
FROM regexp_split_to_table(c2, ', ') rstt
) split
)
SELECT*FROM day7_splitup;
This query returns the following data:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
wherein | what
---------------------+--------------------
bright olive | dotted white
bright olive | wavy lavender
plaid brown | bright lime
plaid brown | plaid coral
dim coral | shiny maroon
dim coral | bright orange
dim coral | clear lavender
dull olive | wavy purple
dull olive | plaid white
dull olive | vibrant magenta
dull olive | light brown
dim yellow | bright cyan
dim yellow | striped green
dim yellow | dim white
Task 1
To find out how many rules apply for which colored bag, and how many bags can be in that bag, it needs a recursive tree over the rules. This works by joining day7_build_tree with itself, and with day7_splitup.
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
),
day7_splitup AS (
SELECT c1 wherein,
split[2] AS what,
CAST(split[1] ASINT4) AS howmany
FROM day7_split_data,
LATERAL (
SELECT regexp_match(rstt, '^(\d+) ([\w ]+) bag[s]?\.?$') split
FROM regexp_split_to_table(c2, ', ') rstt
) split
),
day7_build_tree AS (
SELECT s.wherein AS wherein,
s.what AS what
FROM day7_splitup s
UNIONSELECT s.wherein AS wherein,
bt.what AS what
FROM day7_build_tree bt,
day7_splitup s
WHERE bt.wherein = s.what
)
SELECT*FROM day7_build_tree;
From here the answer for the first task is to find out how manydistinctwherein values are in the result set for the golden shiny bag:
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
),
day7_splitup AS (
SELECT c1 wherein,
split[2] AS what,
CAST(split[1] ASINT4) AS howmany
FROM day7_split_data,
LATERAL (
SELECT regexp_match(rstt, '^(\d+) ([\w ]+) bag[s]?\.?$') split
FROM regexp_split_to_table(c2, ', ') rstt
) split
),
day7_build_tree AS (
SELECT s.wherein AS wherein,
s.what AS what
FROM day7_splitup s
UNIONSELECT s.wherein AS wherein,
bt.what AS what
FROM day7_build_tree bt,
day7_splitup s
WHERE bt.wherein = s.what
)
SELECT COUNT(DISTINCT(wherein)) AS count
FROM day7_build_tree
WHERE what ='shiny gold';
The result is:
1
2
3
4
count
-------
235
(1row)
235 different color combinations where I can hide the golden cow bag … Good luck enforcing that.
Task 2
Task 2 is only slightly different, as the calculation climbs the same recursive tree, but now since the golden bag can contain any of the other bags, we need the product of those. This is calculated with:
1
CAST(bt.howmany ASINT4) *CAST(s.howmany ASINT4) AS howmany
in day7_build_tree. Much to my surprise this ended up in an error message:
ERROR: integer out of range
Hopefully airport security got a good calculator to figure out the endless bag fashion options. To make the calcuation fit I changed the initial [INT4](https://www.postgresql.org/docs/current/datatype-numeric.html) into an INT8.
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
),
day7_splitup AS (
SELECT c1 wherein,
split[2] AS what,
CAST(split[1] ASINT4) AS howmany
FROM day7_split_data,
LATERAL (
SELECT regexp_match(rstt, '^(\d+) ([\w ]+) bag[s]?\.?$') split
FROM regexp_split_to_table(c2, ', ') rstt
) split
),
day7_build_tree AS (
SELECT s.wherein AS wherein,
s.what AS what,
CAST(s.howmany ASINT8) AS howmany
FROM day7_splitup s
UNIONALLSELECT s.wherein AS wherein,
bt.what AS what,
CAST(bt.howmany ASINT8) *CAST(s.howmany ASINT8) AS howmany
FROM day7_build_tree bt,
day7_splitup s
WHERE bt.wherein = s.what
)
SELECT*FROM day7_build_tree;
After all this producesĀ 787602 fashion rows. All I need is the sum() of howmany for the golden shiny bag:
WITHRECURSIVE _day7_split_data(split) AS (
SELECT REGEXP_MATCHES(value, '^([\w ]+) bags contain (.*)')
FROM day7
),
day7_split_data AS (
SELECT split[1] AS c1,
split[2] AS c2
FROM _day7_split_data
),
day7_splitup AS (
SELECT c1 wherein,
split[2] AS what,
CAST(split[1] ASINT4) AS howmany
FROM day7_split_data,
LATERAL (
SELECT regexp_match(rstt, '^(\d+) ([\w ]+) bag[s]?\.?$') split
FROM regexp_split_to_table(c2, ', ') rstt
) split
),
day7_build_tree AS (
SELECT s.wherein AS wherein,
s.what AS what,
CAST(s.howmany ASINT8) AS howmany
FROM day7_splitup s
UNIONALLSELECT s.wherein AS wherein,
bt.what AS what,
CAST(bt.howmany ASINT8) *CAST(s.howmany ASINT8) AS howmany
FROM day7_build_tree bt,
day7_splitup s
WHERE bt.wherein = s.what
)
SELECT SUM(howmany) AS count
FROM day7_build_tree
WHERE wherein ='shiny gold';
Oh, and we are looking at wherein now, because we want to know what can be in the bag, no longer where the bag can be inside.