Monday, 1 September 2014

Aliasing

Here's a before and after - or rather, after and before. First, as it sounds now, in an 'ensemble string' setting. Then, what it sounds like without all the antialiasing / waveshaping work in place. So, had I attempted to do a Phase Distortion synthesizer naively, this is how it would sound.
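For anyone wondering what 'naive' means here: the textbook Casio CZ-style phase distortion just bends the phase of a cosine and evaluates it once per output sample, with no band-limiting at all. A minimal sketch of that textbook form follows. It is not PIANA's actual oscillator, and the names pd_warp, pd_naive and the breakpoint parameter d are illustrative only.

```c
#include <assert.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Bend a linear phase p in [0,1): the first half of the cosine cycle is
   squeezed into the fraction d of the period, the second half stretched
   over the remaining 1 - d. d = 0.5 leaves the phase unchanged. */
static double pd_warp(double p, double d)
{
    assert(d > 0.0 && d < 1.0);
    if (p < d)
        return 0.5 * p / d;
    return 0.5 + 0.5 * (p - d) / (1.0 - d);
}

/* Naive PD sample: no oversampling and no band-limiting, so harmonics
   above Nyquist are not suppressed and fold back as aliasing. */
static inline double pd_naive(double p, double d)
{
    return -cos(2.0 * M_PI * pd_warp(p, d));
}
```

Step p by freq / sample_rate each sample and wrap it at 1.0. The smaller d is, the brighter and more saw-like the wave becomes, and the more energy lands above Nyquist at high pitches.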

Yuck.

There are artifacts I'm not happy about, and I can't work out exactly what's causing them, but it's in way better shape than some I've heard!


Sunday, 31 August 2014

Strawberry Sopranotron

People want to hear all the voices in these apps, so I figured the easiest way to manage that was to set up a sequence in Logic that pushes out a MIDI Program Change then plays a sequence, and just replicate that as many times as there are voices.
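Each repeat of that sequence starts with a MIDI Program Change, which is only two bytes on the wire. A minimal sketch of building one; midi_program_change is a hypothetical helper, not part of PIANA, Logic or any MIDI library.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 2-byte MIDI Program Change message.
   channel: 1-16 as a sequencer displays it (sent as 0-15 on the wire).
   program: 0-127 (often displayed as 1-128 in UIs). */
static void midi_program_change(uint8_t msg[2], int channel, int program)
{
    assert(channel >= 1 && channel <= 16);
    assert(program >= 0 && program <= 127);
    msg[0] = (uint8_t)(0xC0 | (channel - 1)); /* status: Program Change + channel nibble */
    msg[1] = (uint8_t)program;                /* data byte: 7-bit program number */
}
```

For example, program 0 on channel 4 comes out as 0xC3 0x00: the low nibble of the status byte is the channel minus one.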

And here's the first one - Sopranotron in the Fields with Strawberries.


Friday, 29 August 2014

You Rock Guitar and PIANA

This is short and sweet. Sweet-ish.


Six notes of polyphony and 60% of the CPU.

Thursday, 28 August 2014

Denormals on the Pi, the last word

Thanks to somebody helpful on the Raspberry Pi forums, I have a solution. If your VFP is also hitting denormals, and performance is getting hammered as a result, just do this -

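#include <stdint.h>

/* Set FPSCR bits 24 (FZ, flush-to-zero) and 25 (DN, default NaN)
   on the ARM VFP. Only compiled in for Pi builds. */
static inline
void enable_runfast(void)
{
#ifdef RPI
    uint32_t fpscr;
    __asm__ __volatile__ ("vmrs %0, fpscr" : "=r" (fpscr));  /* read FPSCR */
    fpscr |= 0x03000000;                                      /* set FZ | DN */
    __asm__ __volatile__ ("vmsr fpscr, %0" : : "r" (fpscr));  /* write back; vmsr needs a register operand */
#endif
}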

It was as simple as that. I probably only needed to call it once per process, but I called it at the head of every launched thread just in case. Now it keeps synthesizing fast even during (apparent) silence. That's the most annoying thing about denormalization and audio - by definition the signal is orders of magnitude too quiet to hear, yet the CPU suddenly puts in 10x or more the effort to generate it. I'm not entirely sure why the solution provided sets 0x03000000 in the register, as the documentation I have subsequently dug out implies that 0x01000000 is all I need, and in fact using 0x01000000 seems to deliver the same result. Will ask ...

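In fact, in the process of formulating the question on the forums, I found it - NaN. 0x01000000 is bit 24, FZ (flush-to-zero), which is what cures the denormals; the extra 0x02000000 is bit 25, DN (default NaN mode). There you go.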
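To make the two masks concrete, here is a small sketch that manipulates a captured FPSCR value in plain C (no inline assembly, so it compiles anywhere). The macro and function names are mine, not from the forum answer. As I understand ARM's VFP11 documentation (the VFP in the Pi's ARM1176), 'RunFast' mode means FZ and DN both set with floating-point exception traps disabled, which would explain why a function called enable_runfast sets both bits.

```c
#include <stdint.h>

/* ARM VFP FPSCR mode bits. */
#define FPSCR_FZ (UINT32_C(1) << 24) /* 0x01000000: flush-to-zero, denormals treated as 0 */
#define FPSCR_DN (UINT32_C(1) << 25) /* 0x02000000: default NaN mode */

/* Return fpscr with flush-to-zero set, optionally also default NaN.
   want_default_nan = 1 reproduces the 0x03000000 mask used above;
   want_default_nan = 0 is the 0x01000000 flush-to-zero-only variant. */
static uint32_t fpscr_runfast(uint32_t fpscr, int want_default_nan)
{
    fpscr |= FPSCR_FZ;
    if (want_default_nan)
        fpscr |= FPSCR_DN;
    return fpscr;
}

/* Nonzero if denormals will be flushed to zero under this FPSCR value. */
static int fpscr_flushes_denormals(uint32_t fpscr)
{
    return (fpscr & FPSCR_FZ) != 0;
}
```

Only the FZ bit matters for the denormal slowdown, which fits with 0x01000000 alone delivering the same result.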

Tuesday, 26 August 2014

I just had to wrestle another few percent out of PIANA

It just wasn't fast enough on a 950MHz device, and I really want to target 950 rather than 1GHz. So more tweaking and tuning while watching the non-event that was the Guardian Kate Bush liveblog.

Not yet tested on a Pi, but I think I have got another 6% or so out of it. For sure its 'low load' performance will now be much better, but that's of less interest than the key question - 'will it be able to play Popcorn with 5-10% CPU left over for a GUI, clocked at 950MHz?'

Testing in about 20 minutes I think.

UPDATE : I broke my build system and had to wait overnight to run the tests, which are positive - here's PIANA running Popcorn at 950MHz, all synths active and bashing away - and I have 12% of the CPU free, as evidenced by the two 'thrash' threads. This was captured during one of those 'double time drumming' chunks of the track, where CPU load is highest.


The big question now is, is this 'under 90%' of the un-overclocked state, or the overclocked state, given how the Pi CPU governor works? So now I need to run the same test at 800MHz and see if I run out of puff.

Tried it - 800MHz falls down in a heap when the drums come in, which is where not only do I get 4 more notes of polyphony, but they are all filtered and hence more expensive. So can't do 800MHz. 900MHz - sort of worked, sort of almost. No audio dropouts that I could hear, but that tiny drop in CPU relative to 950MHz is enough for the MIDI timing to go a tiny bit astray, as the CPU load squeezes out MIDI response in favour of synthesis.

So there you have it. We look reliable at 950MHz - but there is no GUI - and very nearly reliable at 900. So as long as the GUI refresh is kept at a lower priority than MIDI input, it should pass the Popcorn test if nothing else. 

Also it's worth pointing out, based on the screen grab above, not only is 'top -H' eating up about 2% of my precious CPU, but I have 2 SSH sessions active - 3 actually, as the file system is SSH-mounted - so the network will be getting in my way. 

And one final thing - I just re-ran at 1GHz / full-on Turbo overclocking, and the performance difference is non-linear, which is to be expected - systems do collapse non-linearly as they hit saturation, so no surprise that a non-linear amount of CPU is freed up by a wee bit more clock rate - but even so, nice to see almost 30% of the CPU free all the time. Sort of all the time - occasional dips to 29% free. 

And this stupid denormal issue has not yet gone away completely ... I see about 4% CPU variance after I play then stop vs no sounds at all, so there is still work to be done there. But when I run in fixed-point I don't have enough performance - maybe the ARM code exiting the compiler isn't quite optimal for fixed-point, but whatever, I need to stay on floats now. 

One more tweak has resulted in more reliable behaviour with slightly fewer explicit exponent checks - I now clamp floats at strategic points in the pipeline to a rather hideous 2^-24, so I get float performance with fixed-point precision, since 2^-24 is the resolution of Apple's 8.24 format that all my old audio work was done in.
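Reading 'clamp to 2^-24' as 'anything smaller in magnitude than 2^-24 becomes exactly zero', a minimal sketch looks like this. flush_tiny and FLUSH_THRESHOLD are illustrative names, not from the original code. 2^-24 is one LSB of 8.24 fixed point, so nothing smaller was representable in the old pipeline anyway.

```c
/* One LSB of Apple's 8.24 fixed-point format: 24 fractional bits. */
#define FLUSH_THRESHOLD (1.0f / 16777216.0f) /* 2^-24 */

/* Flush values below one 8.24 LSB to exactly zero, so feedback paths
   (delays, reverbs) settle at 0.0f instead of drifting down towards the
   denormal range (below 2^-126). Sign is ignored for the comparison. */
static inline float flush_tiny(float x)
{
    return (x < FLUSH_THRESHOLD && x > -FLUSH_THRESHOLD) ? 0.0f : x;
}
```

A plain magnitude compare avoids poking at the float's bit pattern, and because the threshold sits roughly a hundred binades above the denormal range, a decaying tail is zeroed long before it can get there.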

Tuesday, 19 August 2014

Making the most of the Pi's limits

With maximum overclock ('Turbo') I can reliably synthesize 8 notes of polyphony, with 3 or 4 of these filter-enabled. I think an 8-note polysynth with the filter on will stutter, sadly. These 8 notes (or monosynths, however you want to think of them) all have to be in 'cheap' mode (44.1kHz rendering, no oversampling) or the Pi will run out of grunt, but 'cheap' mode sounds pretty good. Not as good as 'expensive' mode, but the up-front aliasing minimization makes it sound sweet enough. The two recent posts with audio featured all synths in 'cheap' mode.

This 8-note limit isn't bad to be honest, because the synth has 128 rapidly-accessible presets, and for the cost of 2 audio packets' worth of rendering a preset can be swapped in. So mid-song, if you issue a 'Program Change' on MIDI channel 4, synth 4 will transplant its personality for a whole new one within just 3ms, with no interruption of audio rendering. The 2-packet swap blocks further incoming commands, not rendering, so worst case the next Note On might trigger 3ms late.
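The '3ms' figure is just packet arithmetic. A quick check, assuming the 64-sample packets mentioned in the Popcorn post and 44.1kHz 'cheap' mode rendering; packets_to_ms is a hypothetical helper.

```c
#include <assert.h>

/* Duration of a run of audio packets, in milliseconds. */
static double packets_to_ms(int packets, int frames_per_packet, double sample_rate)
{
    assert(sample_rate > 0.0);
    return 1000.0 * packets * frames_per_packet / sample_rate;
}
```

packets_to_ms(2, 64, 44100.0) is about 2.9ms, which rounds to the 3ms quoted above.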

Basically you can swap synths in and out between musical phrases. This is computationally less expensive than leaving an extra synth running and hardly using it, so within a fixed polyphony it gives you access to a much wider audio palette. Of course any reverb tail from the old synth will continue to sound after you swap it out, so the transitions sound effective. If you are careful with what is sounding when, you can potentially get a big, complex arrangement out of these 8 notes.

Which is nice.

Being Boiled on a Raspberry Pi - excerpt

Those of you with long memories may recollect that I had 2 sign-off tests for my little baby synthcluster. One was that it should be able to perform a 'passable' glamsynth version of Mama Weer All Crazee Now. Whatever 'passable' means in that context - that was always sort of the comedy goal, driven by my exercise music loop. 

But this - THIS - was the serious one - can PIANA running on a 'humble' (read 'feeble'!) Raspberry Pi do a reasonable emulation of one of the most iconic and classic analog synth tracks from 1978? And most important of all - hence the Slade thing - can it pass the glam rock handclap test?!?

What do you think? Listen to the voice of Buddha and judge - just 5 synths here, just 2 of them pitched, 3 percussive. Not bad is it, for £20 of hardware? No samples, all computed, made with code indeed. Synth music totally beats a plastic 3D printed bracelet ... 

Monday, 18 August 2014

Beware denormalized floats on the Pi folks

Word of warning folks. My project had what I thought were all the right settings, i.e. -Ofast -ffast-math - but still I was being absolutely hammered by a performance problem associated with denormalized floats. When injecting the Popcorn MIDI into the Pi, I print out an occasional 'last packet rendered at effective %f samples/sec', and most of the time everything was comfortable, running between 60-75k, bottoming out mid-50s. But as soon as I hit stop, performance fell through the floor - 40.5k, 30k, suddenly a remarkably horrible 2.3k samples/sec as ALSA also got annoyed with me for failing to deliver packets in a timely manner. As soon as I introduced a denorm fixer into 2 key places - the reverb unit and the stereo delay - it all sorted itself. And obviously, reverting to my previous fixed-point delay and reverb implementations also worked great.
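That 'effective samples/sec' figure is frames rendered divided by the wall-clock time taken to render them. A minimal sketch using POSIX clock_gettime; render_fn, now_seconds and time_packet are hypothetical stand-ins, not PIANA's actual instrumentation.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Frames produced per second of wall-clock time. Above the output rate
   (e.g. 44100) the render kept up; below it, the packet was late. */
static double effective_rate(int frames, double seconds)
{
    return frames / seconds;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical render callback standing in for the synth engine. */
typedef void (*render_fn)(float *out, int frames);

/* Time one packet and report it in the same form as the log line. */
static double time_packet(render_fn render, float *out, int frames)
{
    double t0 = now_seconds();
    render(out, frames);
    double rate = effective_rate(frames, now_seconds() - t0);
    printf("last packet rendered at effective %f samples/sec\n", rate);
    return rate;
}
```

Anything comfortably above 44100 is keeping up; 2.3k means a packet took about 19 times longer to render than the audio it produced.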

So, buyer beware - I don't know how to set compiler flags to stop this happening in other projects, but at least I have an emergency drop-in denorm fixer, which looks like this -


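#include <stdint.h>
#include <string.h>

/* Zero *s if its biased exponent is 0 (zero/denormal) or 1 (smallest normal binade). */
static inline void denormFloat ( float *s )
{
    uint32_t bits;
    memcpy(&bits, s, sizeof bits);   /* read the bit pattern without breaking strict aliasing */
    if ((bits & 0x7f800000u) < 0x01000000u) *s = 0.0f;
}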

and which, if my exponent head is working correctly, has a whole power of two guard band in it before going denormalized.
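Where the fixer goes matters as much as what it does. Here is a minimal sketch of a single feedback delay with the flush applied to the signal written back into the line, roughly mirroring the reverb/delay placement described above. Delay, DELAY_LEN and delay_process are hypothetical names, and denormFloat is repeated (memcpy version) so the block compiles on its own.

```c
#include <stdint.h>
#include <string.h>

/* Same test as denormFloat(): biased exponent 0 or 1 -> zero. */
static inline void denormFloat(float *s)
{
    uint32_t bits;
    memcpy(&bits, s, sizeof bits);
    if ((bits & 0x7f800000u) < 0x01000000u)
        *s = 0.0f;
}

#define DELAY_LEN 64 /* hypothetical delay length in samples */

/* Minimal feedback delay (comb filter). */
typedef struct {
    float buf[DELAY_LEN];
    int pos;
    float feedback; /* 0 <= feedback < 1 */
} Delay;

static float delay_process(Delay *d, float in)
{
    float out = d->buf[d->pos];
    float w = in + out * d->feedback;
    denormFloat(&w);                /* flush at the delay input */
    d->buf[d->pos] = w;
    d->pos = (d->pos + 1) % DELAY_LEN;
    return out;
}
```

With feedback at 0.5 an impulse halves on every pass, so after 126 passes it would reach 2^-126, the smallest normal float; the flush zeroes it there instead of letting it walk down through the denormal range.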

p.s. this isn't a Pi 'problem' apart from the question 'are compiler settings not being honoured?' - my Mac does exactly the same, so the consistent IEEE 754 implementation between Intel and ARM is to be commended. But it is a massive Pi problem if it hits you, because you have absolutely no performance headroom. On the Mac I can afford to drop 20x and will never miss a packet deadline. So be pure, be vigilant, behave.

UPDATE - the VFP does have an 'ignore denormals' (flush-to-zero) mode, and performance is now reliable without having to manually flush. Excellent, and thanks to the Pi forums people for having a quick answer available.

Sunday, 17 August 2014

RESULT!!!

And it is done.

May I present to you one Raspberry Pi Model B, one $5 USB MIDI interface, one £20(ish) Behringer USB audio interface, 7 Virtual Analog synthesizers, 9 notes of polyphony, a bunch (4 or 5?) of stereo delays, a global reverb straight out of the upcoming Jordantron, and ladies and gentlemen - Popcorn!

Recorded straight out of the phono outputs of the Behringer into my Mac, no processing, exactly the bytes emitted by the Pi. Here we go ...




No glitches - amazing. Worst-case micropacket of 64 samples rendered at 47,300 samples per second. I instrumented every 64-sample chunk; the worst one took 93% of the Pi to calculate, leaving very little room for the other 8 threads ... thank heavens for a bit of elastic buffering, eh? But the point is, there is no headroom AT ALL here. But it worked. Hell, the whole thing works - all those synths, all those delays, reverb, all on a teeny tiny Pi.
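The 93% is the ratio of the output rate to the rendering rate. A two-line check, assuming 44.1kHz output; render_load and render_headroom are illustrative names.

```c
#include <assert.h>

/* Fraction of real time spent rendering: output rate / rendering rate. */
static double render_load(double effective_rate, double output_rate)
{
    assert(effective_rate > 0.0);
    return output_rate / effective_rate;
}

/* Spare fraction of the CPU for everything else (negative = late packet). */
static double render_headroom(double effective_rate, double output_rate)
{
    return 1.0 - render_load(effective_rate, output_rate);
}
```

render_load(47300.0, 44100.0) is about 0.93, leaving roughly 7% headroom, which matches the '7% of the single core free' figure in the p.s. below.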

Two years on, my work here is almost done.

UPDATE : for those who care, the difference between this version and the last one is that I went back to a fixed-point reverb and fixed-point delay. The performance crash was happening - bizarrely - after all notes stopped being voiced. I could see by inspection that all the oscillators were idle, so the only thing consuming any compute was the reverb unit, and running Instruments on the Mac yielded the same result - suddenly, terrible performance after all the notes stopped. So I am thankful for consistent floating-point implementations between Intel and ARM! Floating-point was entering some bogosity via denormals and causing a performance plummet, even though I believed I had all the 'force denormals to zero' flags set. So, a quick revert to fixed-point made it all work (single compiler flag, in the makefile, hoorah!) and once it was clear that this was causing the problem I dug in, and I have ended up having to manually flush denormals at the input side of the reverb and the delay. And now I can build with floating-point on and it still works, without the sudden drop to 20% of performance. Mighty bogus though - I have -ffast-math and -Ofast on, which according to my cursory reading around should deliver minimum checking, minimum adherence to spec and maximum performance. No such luck.

p.s. the 'Tau' platform plays this sequence with 80% of both cores free. As opposed to 7% of the single core free. It is pretty immense. And I don't need to manually flush denormals. This whole thing with denormalization / compiler settings / performance in the toilet remains a puzzle, and I hate having makefile variations like this, but there you go.

Friday, 15 August 2014

Popcorn

This tune was the first time I ever heard a Moog. Fitting that I should try to get my Pi to play it. Almost successful ... but I'm assuming it's a momentary CPU load spike that lets it down, then RtAudio and ALSA get their knickers into a right old twist and it all goes to hell in a handbasket.


It's all in the video description, but for those who can't be bothered to click through, Logic is sequencing and recording the Pi audio output. The Pi is being fed USB MIDI and is delivering USB audio. 6 synths are active, one of them three-note polyphonic, for a total of 8-note polyphony, and 2 other synths are configured but doing nothing (as I miscounted when I set it up!). On screen are Logic, and a terminal with 2 shells open, one to launch piana and one to run top -H to keep an eye on CPU burden. And again, no samples: everything is computed live, everything has a delay but these are turned way down, reverb is also down low, and the BPCVOs are doing their alias management thing and wild and crazy Phase Distortion, but only the two percussive instruments have a filter turned on.

Enjoy!

UPDATE : I'm on the track of something quite strange here. It doesn't seem to be ALSA's problem, nor RtMidi's (although USB MIDI still eats my cycles like they are going out of fashion). There is some state being entered inside the synthesis loop that is consuming way too many cycles, and the problem only manifests on the Raspberry Pi, because it has so few cycles to spare. Testing and developing on a woefully underpowered platform can be a bloody good thing ...

A bit more in-the-loop diagnostics indicates that the thing runs fine apart from a couple of sections of this song, where rendering performance suddenly plummets, even though no oscillators are sounding. Something is going subtly wrong internally, and it may take time to find, but I'll track it down. And once I do, there will be more popcorn, without hiccups.