Tuesday, 19 August 2014

Making the most of the Pi's limits

With maximum overclock ('Turbo') I can reliably synthesize 8 notes of polyphony, with 3 or 4 of these filter-enabled. I think an 8-note polysynth with the filter on will stutter, sadly. These 8 notes (or monosynths, however you want to think of them) all have to be in 'cheap' mode (44.1kHz rendering, no oversample) or the Pi will run out of grunt, but 'cheap' mode sounds pretty good. Not as good as 'expensive' mode, but the up-front aliasing minimization makes it sound sweet enough. The two recent posts with audio featured all synths in 'cheap' mode.

This 8 note limit isn't bad to be honest, because the synth has 128 rapidly-accessible presets, and for the cost of 2 audio packets worth of rendering a preset can be swapped in. So mid-song, if you issue a 'Program Change' on MIDI channel 4, synth 4 will transplant its personality for a whole new one, within just 3ms, with no interruption of audio rendering. The 2 audio packets is a block on further incoming commands, not a block on rendering. So worst case the next Note On might trigger 3ms late.

Basically you can swap synths in and out between musical phrases. This is computationally less expensive than leaving an extra synth running and hardly using it, so within a fixed polyphony it gives you access to a much wider audio palette. Of course any reverb tail from the old synth will continue to sound after you swap it out, so the transitions sound effective. If you are careful with what is sounding when, you can potentially get a big, complex arrangement out of these 8 notes.

Which is nice.

Being Boiled on a Raspberry Pi - excerpt

Those of you with long memories may recollect that I had 2 sign-off tests for my little baby synthcluster. One was that it should be able to perform a 'passable' glamsynth version of Mama Weer All Crazee Now. Whatever 'passable' means in that context - that was always sort of the comedy goal, driven by my exercise music loop. 

But this - THIS - was the serious one - can PIANA running on a 'humble' (read 'feeble'!) Raspberry Pi do a reasonable emulation of one of the most iconic and classic analog synth tracks from 1978? And most important of all - hence the Slade thing - can it pass the glam rock handclap test?!?

What do you think? Listen to the voice of Buddha and judge - just 5 synths here, just 2 of them pitched, 3 percussive. Not bad is it, for £20 of hardware? No samples, all computed, made with code indeed. Synth music totally beats a plastic 3D printed bracelet ... 

Monday, 18 August 2014

Beware denormalized floats on the Pi folks

Word of warning folks. My project had what I thought were all the right settings, i.e. -Ofast -ffast-math - but still I was being absolutely hammered by a performance problem associated with denormalized floats. When injecting the Popcorn MIDI into the Pi, I print out an occasional 'last packet rendered at effective %f samples/sec', and most of the time everything was comfortable, running between 60-75k, bottoming out mid-50s. But as soon as I hit stop, performance fell through the floor - 40.5k, 30k, suddenly a remarkably horrible 2.3k samples/sec as ALSA also got annoyed with me for failing to deliver packets in a timely manner. As soon as I introduced a denorm fixer into 2 key places - the reverb unit and the stereo delay - it all sorted itself. And obviously, reverting to my previous fixed-point delay and reverb implementations also worked great.
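Why this only bites once the input goes quiet can be illustrated with a toy feedback loop. This is a minimal sketch, not the PIANA reverb or delay: the function name, the starting value and the feedback gain are all made up for illustration. It counts how many samples a decaying value stays a normal float before it drops below FLT_MIN, the smallest normal single.

```c
#include <float.h>

/* Count how many feedback passes a positive value y survives as a
   normal float. Hypothetical helper, for illustration only.
   Returns -1 if y is still normal after max_steps (e.g. feedback >= 1). */
static int samples_until_denormal(float y, float feedback, int max_steps)
{
    int n = 0;
    while (y >= FLT_MIN) {          /* FLT_MIN is the smallest normal float */
        if (n >= max_steps) return -1;
        y *= feedback;              /* one pass round the feedback path */
        n++;
    }
    return n;                       /* y is now denormal (or zero) */
}
```

With a gain of 0.5 the value leaves the normal range after 127 samples. A real tail with a gain close to 1 takes far longer, but with no new input arriving it gets there eventually, which would be consistent with the trouble starting only after the notes stop.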

So, buyer beware - I don't know how to set compiler flags to stop this happening in other projects, but at least I have an emergency drop-in denorm fixer, which looks like this -


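#include <stdint.h>
#include <string.h>

/* Zero *s if its exponent field is 0 (denormal) or 1 (smallest normals). */
static inline void denormFloat ( float *s )
{
    uint32_t bits;
    memcpy(&bits, s, sizeof bits);       /* type-pun without an aliasing cast */
    uint32_t exp = bits & 0x7f800000u;   /* isolate the exponent field */
    if (exp < 0x1000000u) (*s) = 0;
}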

and which, if my exponent head is working correctly, has a whole power of two guard band in it before going denormalized.
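The guard band claim can be checked by looking at the exponent field directly. A small sketch, assuming IEEE 754 single precision: the helper names biased_exponent and would_flush are hypothetical, and would_flush just repeats the comparison from the fixer above.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Biased 8-bit exponent of an IEEE 754 single:
   0 = zero or denormal, 1 = the smallest band of normal values. */
static unsigned biased_exponent(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7f800000u) >> 23;
}

/* Same test as the fixer: masked exponent below 0x1000000,
   which is a biased exponent of 0 or 1. */
static int would_flush(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7f800000u) < 0x1000000u;
}
```

FLT_MIN, the smallest normal float, has a biased exponent of 1 and is flushed, while 2 * FLT_MIN has a biased exponent of 2 and is kept. So everything in the lowest power-of-two band of normal values is zeroed along with the denormals, which is the guard band described.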

p.s. this isn't a Pi 'problem' apart from the question 'are compiler settings not being honoured' - my Mac does exactly the same, so the consistent IEEE 754 implementation between Intel and ARM is to be commended. But it is a massive Pi problem if it hits you, because you have absolutely no performance headroom. On the Mac I can afford to drop 20x and will never miss a packet deadline. So be pure, be vigilant, behave.

Sunday, 17 August 2014

RESULT!!!

And it is done.

May I present to you, one Raspberry Pi Model B, one $5 USB MIDI interface, one £20(ish) Behringer USB audio interface, 7 Virtual Analog synthesizers, 9 notes of polyphony, a bunch (4 or 5?) of stereo delays, a global reverb straight out of the upcoming Jordantron, and ladies and gentlemen - Popcorn!

Recorded straight out of the phono outputs of the Behringer into my Mac, no processing, exactly the bytes emitted by the Pi. Here we go ...




No glitches - amazing. Worst-case micropacket of 64 samples rendered at 47,300 samples per second. I instrumented every 64-sample chunk, the worst one took 93% of the Pi to calculate, leaving very little room for the other 8 threads ... thank heavens for a bit of elastic buffering, eh? But the point is, there is no headroom AT ALL here. But it worked. Hell, the whole thing works - all those synths, all those delays, reverb, all on a teeny tiny Pi.
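For anyone wanting to check the sums, the relationship between the effective render rate and CPU load can be sketched as follows. This assumes the output runs at 44.1kHz (the 'cheap' mode rate); the function names and constants are mine, not from the PIANA source.

```c
#include <assert.h>

#define OUTPUT_RATE_HZ 44100.0   /* assumption: 44.1kHz output */
#define PACKET_FRAMES  64        /* micropacket size quoted above */

/* Time available to render one micropacket, in milliseconds. */
static double packet_deadline_ms(void)
{
    return 1000.0 * PACKET_FRAMES / OUTPUT_RATE_HZ;
}

/* Fraction of real time spent rendering at a given effective rate
   (samples rendered per second of wall-clock time). Above 1.0 means
   the renderer cannot keep up with the output. */
static double cpu_load(double effective_rate)
{
    assert(effective_rate > 0.0);
    return OUTPUT_RATE_HZ / effective_rate;
}
```

Under that assumption each 64-sample packet has about 1.45ms to render, and 47,300 samples per second works out to a load of 44100 / 47300, roughly 0.93, which matches the 93% figure.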

Two years on, my work here is almost done.

UPDATE : for those who care, the difference between this version and the last one is that I went back to a fixed-point reverb and fixed-point delay. The performance crash was happening - bizarrely - after all notes stopped being voiced. I could see by inspection that all the oscillators were idle, so the only thing consuming any compute was the reverb unit, and running Instruments on the Mac yielded the same result - suddenly, terrible performance after all the notes stopped. So I am thankful for consistent floating-point implementations between Intel and ARM! Floating-point was entering some bogosity via denormals and causing a performance plummet, even though I believed I had all the 'force denormals to zero' flags set. So, a quick revert to fixed-point made it all work (single compiler flag, in the makefile hoorah!) and once it was clear that this was causing the problem I dug in and I have ended up having to manually flush denormals at the input side of reverb and the delay. And now I can build with floating-point on and it still works, without the sudden drop to 20% of performance. Mighty bogus though - I have -ffast-math on and -Ofast, which according to my cursory reading around should deliver minimum checking, minimum adherence to spec and maximum performance. No such luck.

p.s. the 'Tau' platform plays this sequence with 80% of both cores free. As opposed to 7% of the single core free. It is pretty immense. And I don't need to manually flush denormals. This whole thing with denormalization / compiler settings / performance in the toilet remains a puzzle, and I hate having makefile variations like this, but there you go.

Friday, 15 August 2014

Popcorn

This tune was the first time I ever heard a Moog. Fitting that I should try to get my Pi to play it. Almost successful ... but I'm assuming it's a momentary CPU load spike that lets it down, then RtAudio and ALSA get their knickers into a right old twist and it all goes to hell in a handbasket.


It's all in the video description, but for those who can't be bothered to click through, Logic is sequencing and recording the Pi audio output. The Pi is being fed USB MIDI and is delivering USB audio. 6 synths are active, one of them is three note polyphonic for total 8 note polyphony, and 2 other synths are configured but doing nothing (as I miscounted when I set it up!). On screen are Logic, and a terminal with 2 shells open, one to launch piana and one to run top -H to keep an eye on CPU burden. And again, no samples, everything is computed live, everything has a delay but these are turned way down, reverb is also down low, the BPCVOs are doing their alias management thing, doing wild and crazy Phase Distortion, but only the two percussive instruments have a filter turned on. 

Enjoy!

UPDATE : I'm onto the track of something quite strange here. It seems to not be ALSA's problem, nor RtMidi (although USB MIDI still eats my cycles like they are going out of fashion), there is some state being entered inside the synthesis loop that is consuming way too many cycles, and the problem only manifests on the Raspberry Pi, because it has so few cycles to spare. Testing and developing on a woefully underpowered platform can be a bloody good thing ... 

A bit more in-the-loop diagnostics indicate that the thing runs fine apart from a couple of sections of this song, where rendering performance suddenly plummets, even though no oscillators are sounding. Something is going subtly wrong internally, and it may take time to find, but I'll track it down. And once I do, there will be more popcorn, without hiccups. 

Thursday, 14 August 2014

USB MIDI is still ugly on the Pi

Here's a snapshot of PIANA running, playing the usual Fox on the Run, but this time configured with a 'Plump Lady' configuration, eliminating a synth ('Eerie Noise' has gone) and making Phil Collins monophonic rather than duophonic, to give the thing more headroom. Basically, can I get this tune to play in a recognisable way on the Pi?

The top snapshot in yellow is the steady state, bottom one in green is a couple of seconds after I power off the attached USB MIDI keyboard. It's not sending anything - no timing, no active sensing - but now it's not connected I'm consuming an entire 9% less CPU. 9% of the CPU being burned just because a device is attached. Yikes ...


I'm starting to wonder if the USB issues I'm seeing with both audio and video are brown-outs caused by the Pi not being able to get enough power into the USB devices. I'll run it through a hub tomorrow morning before putting it away for another month. But the most recent PIANA / POLYANA mission is successfully accomplished, which was to get the codebase back onto linux, get it running, get a handle on performance. So now it can go to sleep again quite happily for four more weeks. 

Wednesday, 13 August 2014

Performance snapshot

This is from the unPi / Tau - on the Pi this workload (11 synths, 2 active playing left hand and right hand parts of Fox on the Run) consumes 80% of the CPU, a chunk of that I'm sure down to the poor USB implementation. Here 80% of each core on the dual-core machine is free. So instead of 20% of an ARM11 free, I have 160% of a Cortex A9. Big, big difference.

Brilliant. This will make the most A-FLIPPING-MAZING multisynth / LeagueStation. And it has enough performance to throw a number of mellotron-like sample players on it at the same time, for Linn drums plus Sopranos plus quality sampled piano plus tons and tons of synths, everything with a private delay and with a global reverb chucked in. Should be good for at least 30 notes of full-on synth polyphony, probably 200 notes of sampled polyphony, or mix'n'match.

Totally, totally, totally flipping brilliant.



For the nerdcurious : there are 2 dummy workloads 'thrash1' and 'thrash2', to ensure a dual-core machine is kept busy and that percentage measurements are truly capturing percentage of peak. 

Three synth threads - all called 'synththread' - are there to loadbalance the synthesis work. They do all the oscillator, EG, LFO and mod matrix stuff. The load balancing algorithm is far from ideal - I think it distributes work across threads per-synth rather than per-oscillator - but it does at least scatter chunks of synthesizing across cores, as evidenced by the dump above. These threads also do the per-synth delay unit. The thread called 'piana' is mainly a high hit because it grabs all the outputs from the synth threads, which are separated out into a 'raw' and a 'reverb send', and reverbs them before combining them for output. I'm not entirely sure where the ALSA callbacks are accounted for, probably in 'nativeaudio' (which is the RtAudio / RtMidi launcher) as I can't imagine any other reason for that to show up high in the list. 'feedcallback' is a dumb little thing that isolates the reverb from the elastic decoupler in the ALSA callback, buffering 64 sample pairs at a time, so adds minimal latency. And 'synthMIDI' handles all MIDI packets, routing them to the right synth, and actually - not sure why I did it this way! - goes grubbing around inside the oscillators to hand over notes and controllers. There must be a good reason ... 

But it all hangs together nicely and sounds glorious. Audio to follow soon. 

UPDATE : it plays the Fat Lady with pretty much the same load - I'm seeing 80% free on each core consistently. The Pi just dies. 

Tuesday, 12 August 2014

So damn close - but not quite

Well, PIANA's up and running again, building natively on the Pi, and making noises. And sounding divine. RtAudio for sound, RtMidi for MIDI, both MIDI and audio over USB, some 10 lines of code that are wrappered in #ifdef iOS, so the portification was a giant success, and - dammit dammit dammit!!!! - the Pi doesn't quite have enough performance to do the Fat Lady.

The workload shown by top -H quickly gets up into the 90s, then audio starts to stutter and break up. And this with the full-on 1 GHz turbo mode. I suspect a big pile of effort optimizing the linux image may squeeze a bit of reliability back, but really, I've been in the world of diminishing returns for a long time, this will just have to do. No Fat Lady. Should be comfy for 8 notes of 'no filter' polyphony, maybe 8 of 'low quality filter', some less ambitious Human League-lite performances will still be achievable from a single Pi, but at this point, it will do what it does. There shall be no further tuning.

HOWEVER - my Tangerine Tau is TOTALLY ROCKING. It doesn't have a snd_seq module in the kernel so RtMidi can't initialize a MIDI input port for me, but my test sequence (yes, it's still Fox On The Run) runs with the Fat Lady configuration, and is storming along.

A reminder - the Fat Lady configures 11 synths, with IIRC 14 notes of polyphony, so 14 oscillators are trying to sound. All but 2 of them are silent, but they still burn cycles as they do their thing, because they are never quite certain whether they are silent or not (envelope generators keep ticking until the amplitude gets down to 1 part in 64k, delays run forever), so even with no notes playing it's a big workload. Playing Fox on the Run the Pi burns about 70% of itself. The Tau (no, that isn't its real name!) is showing around 24% utilization. So the Tau is turning out to be about an iPad 2, as expected.
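The 'never quite certain whether they are silent' point can be sketched with a toy envelope. This is illustrative only, not the PIANA envelope generator: the struct, its fields and the plain exponential decay are assumptions. The envelope keeps costing a multiply per sample until its level falls below 1 part in 64k, and only then marks itself idle.

```c
#define SILENCE_THRESHOLD (1.0f / 65536.0f)   /* "1 part in 64k" */

/* Hypothetical toy envelope: exponential decay with an idle flag. */
typedef struct {
    float level;    /* current amplitude, 0..1 */
    float decay;    /* per-sample multiplier, 0..1 */
    int   active;   /* non-zero while the envelope still needs ticking */
} toy_env;

/* Advance one sample and return the new level. Once the level drops
   below the threshold the envelope snaps to zero and goes idle. */
static float toy_env_tick(toy_env *e)
{
    if (!e->active) return 0.0f;          /* idle: nothing to compute */
    e->level *= e->decay;
    if (e->level < SILENCE_THRESHOLD) {
        e->level = 0.0f;
        e->active = 0;
    }
    return e->level;
}
```

With a deliberately fast decay of 0.5 per sample, an envelope starting at 1.0 is still active after 16 ticks and goes idle on the 17th. Realistic release times take thousands of samples to get there, and a delay line with feedback has no equivalent cut-off unless one is added.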

So - big disappointment on the Pi front, it was always going to be a stretch but I had hoped it would make it.

Sunday, 10 August 2014

... it seems the only way

Portified. Now running on command-line OSX, MIDI in via keys, audio out via Behringer USB audio thing. Still on GCD semaphores as OSX semaphores are in a state ...

And now working with named semaphores. Anonymous semaphores simply don't work on OSX - I tend to forget that. But on a positive note, RtAudio and RtMidi are completely and utterly brilliant. It all just worked. Sounds came out, keyboard inputs were recognised, all for a few dozen lines of code.

This should just compile and run on a Pi ... literally, zero source code changes. I should need to just grab all the files into one place and construct a giant CFLAGS line for all the #defines and there it will be - for the first time in over 12 months. Blimey. In fact the Pi is the least interesting platform right now, *really* interested to get it onto my Tangerine Tau!

Update update - still something amiss with my named semaphores, but on linux I shall be using anonymous semaphores anyway, so there's nothing to worry about. But annoying, I will have a whole half dozen lines of code that differ between OSX and linux.
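The half dozen differing lines might look something like the wrapper below. This is a sketch under assumptions, not the code in POLYANA: the port_sem names are hypothetical. It relies on sem_init failing on OSX, so the Apple branch opens a uniquely named semaphore with sem_open and unlinks the name straight away, while everything else uses an anonymous semaphore.

```c
#include <fcntl.h>       /* O_CREAT, O_EXCL */
#include <semaphore.h>
#include <stdio.h>       /* snprintf */
#include <unistd.h>      /* getpid */

/* Hypothetical portable semaphore: named on OSX, anonymous elsewhere. */
typedef struct {
#ifdef __APPLE__
    sem_t *handle;
#else
    sem_t  storage;
#endif
} port_sem;

static int port_sem_init(port_sem *s, unsigned value)
{
#ifdef __APPLE__
    static int counter = 0;          /* not thread-safe; sketch only */
    char name[32];
    snprintf(name, sizeof name, "/psem-%d-%d", (int)getpid(), counter++);
    s->handle = sem_open(name, O_CREAT | O_EXCL, 0600, value);
    if (s->handle == SEM_FAILED) return -1;
    sem_unlink(name);                /* handle stays valid after unlink */
    return 0;
#else
    return sem_init(&s->storage, 0, value);
#endif
}

static sem_t *port_sem_raw(port_sem *s)
{
#ifdef __APPLE__
    return s->handle;
#else
    return &s->storage;
#endif
}

static int port_sem_wait(port_sem *s) { return sem_wait(port_sem_raw(s)); }
static int port_sem_post(port_sem *s) { return sem_post(port_sem_raw(s)); }

static int port_sem_destroy(port_sem *s)
{
#ifdef __APPLE__
    return sem_close(s->handle);
#else
    return sem_destroy(&s->storage);
#endif
}
```

The rest of the code then only ever calls port_sem_wait and port_sem_post, so the platform difference stays inside these few functions.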

Saturday, 9 August 2014

The path of least resistance ...

I have been wondering how grim the process might be of getting this synth back onto a linux platform, what with my absolute, utter aversion to sitting inside linux environments and wrestling with linux toolchains. The unparalleled horror etc.

So, remembering a presentation I gave to the Khronos Group many, many years ago on 'How to port a game from Desktop to Mobile', I thought I would take a leaf out of my own book, and embrace the rather bogus-sounding concept of 'portification'. That is, you pretend to port the code, but don't - just replace as many platform-specific components with platform-agnostic components and abstraction layers as possible, all the while building, running and testing on your preferred, comfortable platform. And by the time you have finished, you are probably no more than 50 lines of code away from a finished port, thus making the final port the most trivial operation possible. I mean, any fool can write and debug 50 lines of platform-specific code, right? Even within an alien horror show of a dev environment.

This is really only possible on POSIX platforms, but that's OK, Symbian died a long time ago and who gives a flying fig about Microsoft OSs, they cannot even be bothered to stick industry-standard OpenGL ES onto their phones, the contrarian troublemakers.

So here I sit, in the process of 'portifying' POLYANA from iPad to linux, via - and this is the key card up my sleeve - command-line Darwin. I should have thought of this sooner ...

I have an app with, as of this moment, no GUI and no touch interactions. It intersects with iOS at a small number of places -

filing system - iOS apps have their own wack file organization
GCD - it uses Apple-specific semaphores
CoreAudio - natch
CoreMIDI - ditto

So, by turning it into a working Darwin command-line tool, I

a) eliminate the filing system issues - it will be linux-happy filing system code when I'm done
b) can swap in and test POSIX semaphores, leaving it working and very linux-happy
c) use RtAudio as an abstraction layer, build, test and
d) use RtMidi as an abstraction layer for MIDI ...

and by the time it generates a sound on the Mac, it should simply be a 'drag the files across and type make' away from working on a linux platform. And once it's on linux, it's on Pi, so POLYANA becomes PIANA once again. All in about 2 hours of focussed work.

OK, it will only be a command-line tool, but actually my current grand plan is to leave it as a command-line tool on linux, and implement a GUI as a completely separate application. I can actually do the same thing on iOS and OSX, but fold the separate applications (GUI, synth) into separate threads within a single application, using tiny variations on the same communications mechanism.

The mile-wide smiley joy joy joy of all this, is that I sit in Xcode throughout, a development and debugging tool that I am 100% happy with. My comfort zone will never be threatened, and at my age the last thing I need is to find myself miles outside my comfort zone.

We shall find out soon enough - watch this space.

And yes, it's a Human League reference. But it's also a rather lovely and funny art piece my wife made last year that finally hangs on a wall in the house now the house has a non-clashing colour scheme with the piece.