Project Euler on a Microcontroller

victor82 · on July 29, 2022

You can avoid using abstractions and write it directly on the silicon, it's not really difficult, it turns into a bunch of muxes, registers and adders (as shown in the link below) and solves the problem in just 333 clock cycles, using a minimum of power

https://gist.github.com/vmunoz82/49de4c63bee1768283162ec5406...

  class Euler(Elaboratable):
    def __init__(self):
        self.output = Signal(19)

    def elaborate(self, platform):
        m = Module()

        count5 = Signal(3)
        c3, c5 = Signal(17), Signal(18)

        cond3, cond5 = (c3 < 1000) & (count5 != 0), (c5 < 1000) & (count5 < 3)

        m.d.sync += count5.eq(Mux(count5 == 4, 0, count5+1))
        m.d.sync += [
            c3.eq(c3+3),
            c5.eq(Mux(cond5, c5+5, c5))
        ]
        m.d.sync += self.output.eq(self.output +
                                   Mux(cond3, c3, 0)+Mux(cond5, c5, 0))
        return m

stefanpie · on July 29, 2022

Amaranth (previously nmigen) is super awesome declarative ways to build circuits. I like to think of it as a superior verilog alternative that lets you do RTL design in Python. Being able to use python constructs to dynamically build logic like in the example is super useful. You can integrate it with other things like scikit-learn, numpy, and script to build things like filters and hardware accelerators.

shawnnapora · on July 29, 2022

This is pretty neat! Is there a way to have arbitrary input for the number of cycles and actually emit the answer? It looks like the answer is being read in software from a register, if I understand the .vcd output correctly.

victor82 · on July 30, 2022

Well, that code is module without any context (ie: connected to nothig), it just have 2 inputs (clk and rst), and an "output" with the current sum, after it reach the desired value it keep stuck there (until reset).

So as the simulation code shows, it just tick the clock (with the "yield" statement) and read the "output" register and print it.

Of course you could connect this "Project Euler 1 accelerator" to a CPU, a SoC or whatever, or just connect LEDs directly to "output"

rowanG077 · on July 30, 2022

I'm not sure if I think this superiour to Verilog. It's way less readable but it does have the advantage of having insane metaprogramming ability. It's basically a circuit templating language. Really sucks the syntax is so unreadable.

bjourne · on July 29, 2022

Pretty cool. Maybe there are some pop count/bit-twiddling tricks for checking divisibility with constants? Like this: https://stackoverflow.com/a/39386483/189247

Jtsummers · on July 29, 2022

The program bypasses the need to actually check the particular number against 3 and 5 using any kind of arithmetic instructions, instead it's keeping separate values for 3-count and 5-count which will be either 3 or 5 when the number is a multiple of either. It then performs a clear on it. So it has this state when it gets to the checks:

  n    3-count   5-count
  1    1         1
  2    2         2
  3    3         3
  4    1         4
  ...

So the check is just a cpi (compare immediate) instruction for whether the particular counter is 3 or 5. Which is probably faster than bit twiddling, especially since they don't have a pop count instruction available in the instruction set. On an instruction set with that, I imagine it would be a useful trick though.

bjourne · on July 29, 2022

The point of bit-twiddling is to produce a number n that is 1 or -1 if counter is divisible by 3 or 5 and 0 otherwise, and then add either mul(n, counter) or and(n, counter) to the result. It avoids branches which could shave off a few bytes depending on how branch instructions are encoded. Since we're golfing, we're not interested in speed, only program size.

Jtsummers · on July 29, 2022

Still not an option on the platform in the article, though. You'd have to make your own pop count and implement it likely with a loop and bit shifting. It's almost certainly going to take more instructions than this and end up with as many branches as a result. The instruction set is very limited. Though quoting myself:

>> On an instruction set with that [pop count], I imagine it would be a useful trick though.

Prcmaker · on July 29, 2022

I learned real Matlab using project Euler in my first year of uni. I'd been taught it previously as 'just a calculator', but working through the problems opened up a world of skills I knew nothing about. Those learnings pushed me in to microcontrollers and got me thinking outside the box in a way I still use regularly. I can't recommend it highly enough.

kcexn · on July 30, 2022

MATLAB is such a peculiar language. Using it often reminds me of using Prolog, although MATLAB does a good job of hiding its esoteric nature.

Some of my early post-grad work was done in MATLAB and I remember having to find ways to collapse loops into matrix operations so that MATLAB could finish executing the programs in a reasonable amount of time.

I actually think somewhere in the MATLAB documentation it explains that all loop constructs in MATLAB are compiled into matrix operations by the interpreter before MATLAB executed anything.

Which is why MATLAB is so often orders of magnitude slower than Python even though they're supposed to be able to solve similar problems. The idiomatic MATLAB way of writing programs just requires a very matrix-y view of the world.

Prcmaker · on July 31, 2022

I rarely found the bottleneck of Matlabs inefficiency to be a major issue with my work, and found the speed to get analysis started and finished a great benefit. I've used a lot of more 'real' programming languages when the need has arisen, but for day to day quickly crunching numbers, matlab (or octave, now) has been indispensable. I'd love to have the time to develop a good basis for numerical analysis in python, but the learning curve is a hard sell for myself when I've got a tool I'm comfortable with.

cozzyd · on July 30, 2022

I mean numpy is basically MATLAB for Python...

Evidlo · on July 29, 2022

Alternatively you could compile to/write for a virtual machine.

shawnnapora · on July 29, 2022

A virtual machine is interesting to me when it corresponds to a hardware target, since then that isn't hiding complexity in software. I suppose, though, that an enterprising individual could make a custom hardware target that implements an overly complex, overspecific instruction set and win at hardware golf but I'd honestly be very impressed.

The positive part about virtual machines in general would be the ease of debugging and sharing/validating answers, though.

softbuilder · on July 29, 2022

I am reminded of Txtzyme: https://github.com/WardCunningham/Txtzyme

dmitrygr · on July 29, 2022

134 bytes is a lot

shawnnapora · on July 29, 2022

A lot relative to what? :) These 134 bytes will emit an answer without requiring a kernel or a runtime, either of which generally will be orders of magnitude larger than 134 bytes alone.

dmonitor · on July 29, 2022

If we keep down this path, we will some day have a code-golf optimized CPU architecture.

Used transistor count as the blog suggests is clearly the best metric

dmitrygr · on July 29, 2022

i do a lot of uC work, i meant that for this task on that uC, it is a lot of bytes, less than half that many can easily accomplish this task. possibly less, if i could be bothered to spend time

cinntaile · on July 30, 2022

Someone demonstrates something he accomplished on a microcontroller and your reply is "134 bytes is a lot". Ok that can be the case, but the poster mentions this in the article that it's not very optimized. It would probably be better if you framed your comment like "Cool project, if you're interested you can bring down the number of bytes by doing this (insert code sample)"