The Run-Length of Matchsticks (PWC 296)

Challenge 296 solutions in Perl by Matthias Muth

These are my Challenge 296 Task 1 and 2 solutions in Perl
for The Weekly Challenge – Perl and Raku.

Summary

Task 1: String Compression
Detecting runs of same characters? Regular expression!
Replacing it by its run-length encoding? In the same statement!
Reversing the compression as a BONUS? Even simpler!

Task 2: Matchstick Square
A short solution based on a CPAN module works well for the examples and is short, but it doesn’t really scale for larger sets of matchsticks.
In my own implementation, a ‘target sum’ iterator selects groups of matchsticks that have the needed length of one side of the square, leaving the rest of matchsticks for two more calls for the second and third side.
Blindingly fast, even for larger sets (try with 15, 20 or 25 matchsticks, for example!).

Code:
Find the complete source code for both tasks, including more tests, on Github.

Task 1: String Compression

You are given a string of alphabetic characters, $chars.
Write a script to compress the string with run-length encoding, as shown in the examples.
A compressed unit can be either a single character or a count followed by a character.
BONUS: Write a decompression function.

Example 1
Input: $chars = "abbc"
Output: "a2bc"
Example 2
Input: $chars = "aaabccc"
Output: "3ab3c"
Example 3
Input: $chars = "abcc"
Output: "ab2c"

This is an easy task for Perl’s built-in regular expressions:

We capture any character as our first character of the run. Then we use a backreference to that captured first character to accept only the same character again, 1 or more times.

For replacing that sequence of characters by their run-length-encoded form, we use a s/PATTERN/REPLACEMENT/ substitution.
We add three options:

/e
causes the REPLACEMENT to be evaluated as an expression. In our case, this will be a string with the run-length-encoded form of the matched sequence.
/g
for doing a global substitution, for all sequences we can find.
/r
to return the result string instead of the number of substitutions done.

All of that is just one statement:

use v5.36;

sub string_compression( $chars ) {
    return $chars =~ s/(.)g1+/ length( $& ) . $1 /egr;
}

The BONUS implementation is even easier, because we don’t need any backreferences. We simply capture an integer number and the (non-number) character that follows it, and replace it by the character repeated times. The Perl x operator is probably the easiest and best way to do that:

sub string_uncompress( $chars ) {
    return $chars =~ s/(d+)(D)/ $2 x $1 /egr;
}

Task 2: Matchstick Square

You are given an array of integers, @ints.
Write a script to find if it is possible to make one square using the sticks as in the given array @ints where $ints[ì] is the length of ith stick.

Example 1
Input: @ints = (1, 2, 2, 2, 1)
Output: true
Top: $ints[1] = 2
Bottom: $ints[2] = 2
Left: $ints[3] = 2
Right: $ints[0] and $ints[4] = 2
Example 2
Input: @ints = (2, 2, 2, 4)
Output: false
Example 3
Input: @ints = (2, 2, 2, 2, 4)
Output: false
Example 4
Input: @ints = (3, 4, 1, 4, 3, 1)
Output: true

The simple solution reaches the limits too soon

My first, simple solution uses the CPAN Algorithm::Combinatorics module. It contains a partitions function, which generates all possible subsets of a given set of numbers (our matchsticks). It can be parameterized to return exactly four subsets (for the four sides of our square).

use v5.36;
use builtin qw( true false );
use List::Util qw( sum all );
use Algorithm::Combinatorics qw( partitions );

sub matchstick_square_AC( @ints ) {
    my $total = sum( @ints );
    return false
        unless $total % 4 == 0;
    my $side_length = $total / 4;
    for my $p ( partitions( @ints, 4 ) ) {
        my @sums = map sum( $_->@* ), $p->@*;
        return true
            if all { $_ == $side_length } @sums;
    }
    return false;
}

This works for the examples, and the solution is short.

But generating all possible sets, we have to throw most of them away because already the first one doesn’t give us the needed side length of the square.
For bigger numbers of matchsticks, the memory for keeping all the partitions gets really high.
For 15 matchsticks, on my server the process runs out of memory already.

The partitions function can also return an iterator (when used in scalar context), so that we don’t need the memory to store all possible partitions at the same time.
For the same number of matchsticks, the memory footprint then stays low, but the run time is at around four minutes already.

This is an easy solution, but it doesn’t make my programmer’s heart happy.

My own ‘target sum’ iterator

I therefore did a second solution, based on these thoughts:

The sum of the matchstick lengths has to be divisible by 4. If not, it will be impossible to distribute them to four equal sides.
The length of one side then is that sum divided by four.
For the first side of the square, we select groups of matchsticks that have the correct length for that side. This is much less complex than finding all combinations of four groups that all have the correct length.
It removes a lot of useless tries much earlier.
For the second and third side, we do the same, based on the matchsticks remaining from the previous step.
For the fourth side, the remaining matchsticks automatically sum up to the right length.

So this solution is based on finding groups of numbers that sum up exactly to a given target number (our side length).

We can implement a function to do that any way we want. I will be using a simple tree traversal algorithm.

My implementation will be an iterator, in order to avoid memory problems for storing all the combinations (even there will be much less combinations than in the previous solution!).

The skeleton of that function and its usage can look like this:

sub get_exactly( $target_sum, $ints ) {
    # Setup the closure data.
    ...;
    # Return the iterator function.
    return sub() {
        ...;
        return ( $used, $rest );    # array_refs
    }
}

my $side_length = sum( $ints->@* );
my $iterator = get_exactly( $side_length, $ints );
while ( my ( $used, $rest ) = $iterator->() ) {
    # Here, the sum of the numbers in $used->@* equal $side_length,
    # and $rest->@* are the remaining (unused) numbers.
    ...;
}

That makes this main function possible for solving the task:

use v5.36;
use builtin qw( true false );

sub matchstick_square( @ints ) {

    # The total sum of the matchstick lengths must be divisible by 4.
    my $total_sum = sum( @ints );
    if ( $total_sum % 4 != 0 ) {
        vsay "total sum( $total ) is not divisible by 4";
        return false;
    }
    my $side_length = $total_sum / 4;

    # Get a combination for the first side of the square.
    my $iterator_1 = get_exactly( $side_length, @ints, );
    while ( my ( $used_1, $rest_1 ) = $iterator_1->() ) {
        # We got a combination for the first side.
        # Use the remaining matchsticks to get a combination
        # for the second side.
        my $iterator_2 = get_exactly( $side_length, $rest_1 );
        while ( my ( $used_2, $rest_2 ) = $iterator_2->() ) {
            # ... and the same for the third side.
            my $iterator_3 = get_exactly( $side_length, $rest_2 );
            while ( my ( $used_3, $rest_3 ) = $iterator_3->() ) {
                # Here, we have found combinations for three sides.
                # The remaining matchsticks automatically sum up to
                # the correct length of the fourth side.
                return true;
            }
        }
    }
    # No combination found.
    return false;
}

Now it still remains to implement the get_exactly iterator.

As I said before, I am using a tree traversal algorithm. In fact, it is a ‘breadth first search’ (BFS). It is based on a queue of possible continuations, that are processed within a loop until the queue runs empty. Every entry in the queue carries some context data, to be able to process that queue entry when it its turn. The context consists of:

    ( $used, $sum, $rest, $next_index )

In the loop, if the end condition is met ($sum == $target_sum), we return that combination (the $used and $rest array_refs).
If not, we loop through the remaining numbers in the array that $rest references, starting from the number at $next_index (to not repeat number that we have already tried). We add a candidate combination to the queue that includes that number, by creating a new context for it. If a number would exceed the target sum, we just skip it.

That means that all candidate combination in the queue are either smaller than or exactly equal to the target sum.

So here it is:

sub get_exactly( $target_sum, $ints ) {
    # Initialize the closure data for the iterator
    # ( $used, $sum, $rest, $next_index ).
    my @queue = ( [ [], 0, $ints, 0 ] );

    # Return the iterator function.
    return sub() {
        # Check whether we have reached the end before.
        return ()
            unless @queue;

        # Find the next matching combination.
        while ( @queue ) {
            my ( $used, $sum, $rest, $next_index ) = ( shift @queue )->@*;
            # dsay ":queue", pp ( $used, $sum, $rest, $start_index );

            # Check the success criteria and return if it is met.
            return ( $used, $rest )
                if $sum == $target_sum;

            # If not, add more candidates.
            for my $index ( $next_index .. $rest->$#* ) {
                my $value = $rest->[$index];
                my $new_used = [ $used->@*, $value ];
                my $new_sum = $sum + $value;
                my $new_rest = [
                    $rest->@[ 0 .. $index - 1 ],
                    $rest->@[ $index + 1 .. $rest->$#* ],
                ];
                push @queue, [ $new_used, $new_sum, $new_rest, $index ]
                    if $new_sum <= $target_sum;
            }
        }
        # End of the list.
        return ();
    };
}

For testing, I create random lists of 15 or 20 or 25 numbers (choosing quite big numbers, so that they don’t combine well, and there will be no solution).

The first solution worked for around 4 minutes for a set of 15 matchsticks. I didn’t try more then.

My ‘target sum’ solution does that in less than 0.1 seconds!
It takes less than 3 seconds for a set of 20, and less than 5 seconds for 25.

Now that makes me happier…!

Thank you for the challenge!

Source link
lol