Advent of Code 2020: "Password Philosophy" - Day 2

Posted by ads' corner on Wednesday, 2020-12-02
Posted in [Sql]

Day 2, another challenge: fix a broken password database. In order to do that, the passwords which violate the policy must be identified.

Task 1: I get a string, consisting of 2 numbers, a letter, and a password string. Check how many passwords have count(letter) which is between number 1 and number 2.

Task 2: The numbers are positions in the password string (beginning by 1, not 0). Exactly one of the two letters in the string must match the letter.

Preparation

Let’s start by creating a table in PostgreSQL, and load the data into it. Since the input data comes as a single string, I decided to take this challenge and not pre-process the data. Hence everything goes into a single TEXT field. I also briefly checked the two numbers, if the first one is always smaller than the second one - that seems to be the case.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
DROP TABLE IF EXISTS day2;

CREATE TABLE day2 (
    value   TEXT
);

COPY day2 FROM STDIN;
1-14 b: bbbbbbbbbbbbbbbbbbb
3-14 v: vvpvvvmvvvvvvvv
2-5 m: mfvxmmm
15-20 z: zdzzzrjzzzdpzzdzzzzz
6-8 g: tggjggggrg
2-3 l: nlllw
1-5 j: jjjjj
4-5 t: prttt
2-4 v: vvrqzp
4-6 v: vvvvvqvvv
7-8 d: mpntdwkd
...
\.

This time I have 1000 lines in the table:

1
2
3
4
5
SELECT COUNT(*) FROM day2;
 count
-------
  1000
(1 row)

Since the data is a single string, I need to split, slice and dice this into pieces. Like yesterday I’m using a CTE for this which breaks the string into the 4 separate fields:

1
2
3
4
5
6
7
8
9
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT * FROM l1;

The result is 4 columns which I can access:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
 num1 | num2 | letter |       password       |             value
------+------+--------+----------------------+-------------------------------
    1 |   14 | b      | bbbbbbbbbbbbbbbbbbb  | 1-14 b: bbbbbbbbbbbbbbbbbbb
    3 |   14 | v      | vvpvvvmvvvvvvvv      | 3-14 v: vvpvvvmvvvvvvvv
    2 |    5 | m      | mfvxmmm              | 2-5 m: mfvxmmm
   15 |   20 | z      | zdzzzrjzzzdpzzdzzzzz | 15-20 z: zdzzzrjzzzdpzzdzzzzz
    6 |    8 | g      | tggjggggrg           | 6-8 g: tggjggggrg
    2 |    3 | l      | nlllw                | 2-3 l: nlllw
    1 |    5 | j      | jjjjj                | 1-5 j: jjjjj
    4 |    5 | t      | prttt                | 4-5 t: prttt
    2 |    4 | v      | vvrqzp               | 2-4 v: vvrqzp
    4 |    6 | v      | vvvvvqvvv            | 4-6 v: vvvvvqvvv
    7 |    8 | d      | mpntdwkd             | 7-8 d: mpntdwkd

Task 1

From here I have a couple different options. I could move the string manipulations which are necessary to find the passwords into the CTE. That will make the overall query easier, as I get the manipulated string as a column. Otherwise if I do that in the “main” query, I have to repeat the manipulation in the WHERE clause (at least if I also want to see the extracted data). I decided to keep things separated, leave the CTE as it is and do the heavy lifting in the outer query.

The task is to identify the number of letters in the password string matching the one letter I have in the task. I solve this by replacing all other characters with an empty string, just leaving this one letter. And since the letter is not static, but changes per row, I dynamically build the string inside REGEXP_REPLACE:

1
REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')

This leaves a string with only the letters, which I then use to count the LENGTH.

The full query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT *,
       REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g') AS found,
       LENGTH(REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')) AS length
  FROM l1
 WHERE LENGTH(REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')) >= l1.num1
   AND LENGTH(REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')) <= l1.num2;

The resulting data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
 num1 | num2 | letter |       password       |             value             |      found      | length
------+------+--------+----------------------+-------------------------------+-----------------+--------
    3 |   14 | v      | vvpvvvmvvvvvvvv      | 3-14 v: vvpvvvmvvvvvvvv       | vvvvvvvvvvvvv   |     13
    2 |    5 | m      | mfvxmmm              | 2-5 m: mfvxmmm                | mmmm            |      4
    6 |    8 | g      | tggjggggrg           | 6-8 g: tggjggggrg             | ggggggg         |      7
    2 |    3 | l      | nlllw                | 2-3 l: nlllw                  | lll             |      3
    1 |    5 | j      | jjjjj                | 1-5 j: jjjjj                  | jjjjj           |      5
    2 |    4 | v      | vvrqzp               | 2-4 v: vvrqzp                 | vv              |      2
    2 |   12 | w      | jwkfktkbthcwvfrkwgz  | 2-12 w: jwkfktkbthcwvfrkwgz   | www             |      3
   12 |   16 | j      | jjjjpjjmdhjjdjjjjjjj | 12-16 j: jjjjpjjmdhjjdjjjjjjj | jjjjjjjjjjjjjjj |     15
   10 |   11 | f      | fffffffffff          | 10-11 f: fffffffffff          | fffffffffff     |     11
    5 |    7 | m      | kmmgxgmnnpzrzmgsbm   | 5-7 m: kmmgxgmnnpzrzmgsbm     | mmmmm           |      5
    8 |   16 | q      | fqtqqqqxqqzgzfqq     | 8-16 q: fqtqqqqxqqzgzfqq      | qqqqqqqqq       |      9

This is already limited to the matching passwords, as I repeated the LENGTH() into the WHERE clause. psql also tells me how many final rows I have:

1
(548 rows)

For completeness I run a count as well:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT COUNT(*) AS count
  FROM l1
 WHERE LENGTH(REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')) >= l1.num1
   AND LENGTH(REGEXP_REPLACE(l1.password, '[^' || letter || ']', '', 'g')) <= l1.num2;

And get the same result back:

1
2
3
4
 count
-------
   548
(1 row)

Task 2

This is easier than the first task, I only have to pick two specific characters from the password, the position is known and the length for the pick is always 1. Since SQL starts SUBSTRING() with position 1 and not 0, I don’t even have to calculate a different starting position either.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT *,
       SUBSTRING(l1.password FROM l1.num1 FOR 1) AS c1,
       SUBSTRING(l1.password FROM l1.num2 FOR 1) AS c2
  FROM l1;

The resulting data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
 num1 | num2 | letter |       password       |             value             | c1 | c2
------+------+--------+----------------------+-------------------------------+----+----
    1 |   14 | b      | bbbbbbbbbbbbbbbbbbb  | 1-14 b: bbbbbbbbbbbbbbbbbbb   | b  | b
    3 |   14 | v      | vvpvvvmvvvvvvvv      | 3-14 v: vvpvvvmvvvvvvvv       | p  | v
    2 |    5 | m      | mfvxmmm              | 2-5 m: mfvxmmm                | f  | m
   15 |   20 | z      | zdzzzrjzzzdpzzdzzzzz | 15-20 z: zdzzzrjzzzdpzzdzzzzz | d  | z
    6 |    8 | g      | tggjggggrg           | 6-8 g: tggjggggrg             | g  | g
    2 |    3 | l      | nlllw                | 2-3 l: nlllw                  | l  | l
    1 |    5 | j      | jjjjj                | 1-5 j: jjjjj                  | j  | j
    4 |    5 | t      | prttt                | 4-5 t: prttt                  | t  | t
    2 |    4 | v      | vvrqzp               | 2-4 v: vvrqzp                 | v  | q
    4 |    6 | v      | vvvvvqvvv            | 4-6 v: vvvvvqvvv              | v  | q
    7 |    8 | d      | mpntdwkd             | 7-8 d: mpntdwkd               | k  | d

From here all I have to do is make sure that either c1 or c2 matches the letter, but not both:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT *
  FROM l1
 WHERE (SUBSTRING(l1.password FROM l1.num1 FOR 1) = l1.letter AND
        SUBSTRING(l1.password FROM l1.num2 FOR 1) != l1.letter) OR
       (SUBSTRING(l1.password FROM l1.num1 FOR 1) != l1.letter AND
        SUBSTRING(l1.password FROM l1.num2 FOR 1) = l1.letter);

Oh, and of course count the result set:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH l1 AS (
    SELECT SUBSTRING(value, '^([0-9]+)')::INT AS num1,
           SUBSTRING(value, '^[0-9]+-([0-9]+)')::INT AS num2,
           SUBSTRING(value, '.*? ([a-z]):') AS letter,
           SUBSTRING(value, '.*? [a-z]: (.*)$') AS password,
           value
      FROM day2
)
SELECT COUNT(*) AS count
  FROM l1
 WHERE (SUBSTRING(l1.password FROM l1.num1 FOR 1) = l1.letter AND
        SUBSTRING(l1.password FROM l1.num2 FOR 1) != l1.letter) OR
       (SUBSTRING(l1.password FROM l1.num1 FOR 1) != l1.letter AND
        SUBSTRING(l1.password FROM l1.num2 FOR 1) = l1.letter);

The result:

1
2
3
4
 count
-------
   502
(1 row)

Categories: [Sql]