Regression testing & build system

I want to add regression testing to a plugin that I am currently developing. I am planning to use the Hardware-Virtual plugin for this.

I talked with @craigdissel about adding a scripting interface to the plugin, see


My next step will be to add a Python scripting interface that will enable defining complex tests as Python scripts. For my own plugin I am planning to use CMake/CTest as the test driver.

Thinking about it, I came to the point where I decided to add a CMake build system to build the firmware for the tests. Currently, the H.-V. plugin (ab)uses the Arduino build system to build a virtual firmware that runs on x86. This could be done far more transparently using CMake.

Kaleidoscope is currently outgrowing Arduino anyway. There are already ideas about porting to other keyboards that have non-Arduino hardware, e.g. the Planck and ErgoDox. Why not use CMake as an addition to, not a replacement for, Arduino to build the overall firmware? I understand the rationale for using Arduino very well (simplicity, user friendliness, etc.), but it is too limited for more complex build tasks.

I have a lot of experience with CMake. It would be a good choice as it is platform independent and supports cross builds in a straightforward manner.

Opinions?

2 Likes

Sorry to those who read my last post while I edited it. I accidentally slipped on the Reply button while writing it on my smartphone and thus had to re-edit it. Currently I do not have a keyboard available, not even a normal one.

I have a strong, but not necessarily logical antipathy to cmake. In the past, as a user, I’ve found it very, very difficult to work with.

Arduino does support many, many different hardware platforms and provides a fairly nice abstraction that makes dealing with it as a user very straightforward. The Planck and ErgoDox EZ are just as ‘Arduino’-friendly as the Model 01. When we talk about “Arduino”, we mean some subset of the build infrastructure, the bootloader, the IDE, the standardized tool chain, and the ‘arduino core’ that exports the primitives and runloop we use.

I don’t plan for Kaleidoscope to move away from the Arduino runloop, the Arduino core, or being buildable in the Arduino IDE or with the Arduino toolchain.

I believe I saw some discussion on the Arduino developers’ mailing list in the past few weeks about linking to libraries built using other build systems. That may be worth taking a look at.

That being said, I’m very excited to see your first bits of test infrastructure taking shape!

3 Likes

A regression testing system, as I imagine it, would perform the following tasks for each plugin that defines tests:

  1. download Kaleidoscope core and only those plugins necessary (not only stock plugins),
  2. build a firmware simulator,
  3. run tests, each driven by an individual Python script.

The whole process should

  • be invoked via as few commands as possible,
  • only pass or fail,
  • be easy to integrate with Travis,
  • work for stock and user plugins in the same way,
  • allow building with different compilers and compiler flavours,
  • reside in its own git repo.

All this needs a driver. Makefiles and shell scripts are non-portable and Arduino does not come with any scripting facilities.

CMake, liked or not, provides everything that is needed (portable, scriptable, auto-detecting, well documented and maintained, very widely used). The integrated CTest system allows for very simple test definitions. If there are any free, portable alternatives to CMake that I overlooked and that do not require re-inventing the wheel, please let me know. I do not push CMake because I love it. It actually has quite a few deficiencies, e.g. it is not a true programming language in a strict sense (for good reasons) and it lacks namespaces and scoped functions/macros. Apart from that, it appears to be the best tool available as the driver of a regression testing system.
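To illustrate the pass/fail contract: as far as I know, CTest by default treats a test command that exits with status 0 as passed and anything else as failed. A Python test script can satisfy that contract with very little code. The following is only a sketch; `run_checks` and `check_example` are made-up names and not part of any existing plugin.

```python
import sys


def run_checks(checks):
    """Run zero-argument callables; return 0 if all pass, 1 otherwise."""
    failed = [check.__name__ for check in checks if not check()]
    for name in failed:
        # Report failures on stderr so they show up in the driver's log.
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


def check_example():
    """Hypothetical placeholder for a real firmware check."""
    return 1 + 1 == 2


# A real test script would end with:
#     sys.exit(run_checks([check_example]))
# so that the exit status carries the pass/fail result to the driver.
```

Any driver that looks at exit codes (CTest, a Makefile target, a Travis job) could run such a script without knowing anything about its internals.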

BTW, the Hardware-Virtual plugin is already non-portable, as Arduino does not come with auto-detection for the host compiler, which is therefore hardcoded in the build scripts.

However, we could stick with the Arduino build system for step 2, with the caveat that it is not platform independent.

It is highly desirable to build with as many different compilers and compiler versions as possible. Using the different sanitizers provided by gcc and clang is also highly recommended.
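A minimal sketch of how such a build matrix could be enumerated, assuming `g++` and `clang++` are the host compilers of interest. The compiler list, the configuration names and the base flags `-g -O1` are illustrative choices; `-fsanitize=address` and `-fsanitize=undefined` are options that both gcc and clang accept.

```python
from itertools import product

# Assumed compiler executables and sanitizer flag sets; adjust as needed.
COMPILERS = ["g++", "clang++"]
SANITIZERS = {
    "none": [],
    "address": ["-fsanitize=address"],
    "undefined": ["-fsanitize=undefined"],
}


def build_matrix(compilers=COMPILERS, sanitizers=SANITIZERS):
    """Return one build configuration per (compiler, sanitizer) pair."""
    configs = []
    for compiler, (name, flags) in product(compilers, sanitizers.items()):
        configs.append({
            "name": f"{compiler}-{name}",
            "compiler": compiler,
            "flags": ["-g", "-O1"] + flags,
        })
    return configs
```

Each entry would then be handed to whatever builds the firmware simulator, so that every test runs once per configuration.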

Allow me to present an alternate vision for regression testing.

The regression testing framework lives in its own git repo, but it is not responsible for downloading anything (Kaleidoscope core, plugins, whatever). It’s also not responsible for defining tests. It is simply a Python script / shell script / TBD (perhaps a Python script is most portable, but I’m open to suggestions) meant to be invoked from the “plugin directory” (root of the Arduino-Boards repo).

Kaleidoscope core, and each plugin, will each have a folder ‘tests’ as part of their plugin. This will contain subfolders “sketches”, “tests”, and “outputs”. Each plugin is responsible for maintaining its own regression tests and adding new ones as appropriate.

Kaleidoscope-Test will visit each directory in libraries and look for a tests folder. If found, it will run that plugin’s regression tests. (And core’s tests when it finds Kaleidoscope core.) Each individual test is defined by a file (eventually written in the scripting frontend of Hardware-Virtual) that lives in the tests subfolder; it has a descriptive name, and specifies both (a) a sketch file to run the test on - for which Kaleidoscope-Test will search in the tests/sketches folder as well as the examples folder - and (b) an input sequence for the test. Kaleidoscope-Test will build the sketch (if it hasn’t already), run the test, and compare the output to the expected output as defined in the file(s) in tests/outputs of the same name as the test. (Alternately, if it’s easier, the comparison to expected output could be scripted directly in the test definition file.)
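The discovery and comparison steps described here could be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up, and the exact layout (`<plugin>/tests/tests/<name>` paired with `<plugin>/tests/outputs/<name>`) is one reading of the description, not an agreed format. Building the sketch and running the simulator are left out.

```python
from pathlib import Path


def discover_tests(libraries_dir):
    """Yield (plugin_name, test_file, expected_file) for every test found."""
    for plugin_dir in sorted(Path(libraries_dir).iterdir()):
        tests_dir = plugin_dir / "tests" / "tests"
        if not tests_dir.is_dir():
            continue  # this plugin defines no regression tests
        for test_file in sorted(tests_dir.iterdir()):
            if not test_file.is_file():
                continue
            # The expected output shares the test's file name.
            expected = plugin_dir / "tests" / "outputs" / test_file.name
            yield plugin_dir.name, test_file, expected


def compare_output(actual, expected_file):
    """Pass only if the output matches the stored expectation exactly."""
    return expected_file.is_file() and actual == expected_file.read_text()
```

Because discovery only depends on the directory layout, stock and user plugins would be treated identically.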

Like the proposal above, this is designed to be invoked via as few commands as possible, give a pass/fail output, integrate with Travis, work for stock and user plugins in the same way (it simply searches all subdirectories of libraries), and live in its own git repo. (Although, it’s lightweight enough - perhaps just a single Python script? - that it could just as easily live in Kaleidoscope core or Arduino-Boards directly, alongside the build infrastructure.)

Unlike the proposal above, it doesn’t rely on CMake, partly since Jesse indicates antipathy toward CMake, and partly because it simply seems unnecessary for the relatively lightweight task assigned to Kaleidoscope-Test itself in this vision. However, I admit I’m inexperienced with both CMake and regression testing in general, so I defer to those with more experience in this area.

1 Like

Since I am currently restricted to a smartphone, I could not express my thoughts as thoroughly as @craigdissel did. But that’s not necessary as I mostly agree with his vision.

When I wrote in my last post about a testing repo, I also only meant it to contain the driver infrastructure. His description of the structure of tests, as well as of the input and output data, matches exactly the way I imagine it.

A test would be defined by

  • the required components (plugins & core)
  • the sketch
  • keyboard input
  • keyboard report output

The latter two would best be defined within a dedicated Python script that would reside in the plugins’ testing directories, as would the definition of the required plugins by their URLs and the sketch. A plugin could have many such scripts.
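As a sketch of what such a definition might look like in Python, the four ingredients listed above map onto a small data class. All field names, the event tuples and the report representation are assumptions made for illustration; the URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RegressionTest:
    """One test definition; every field name here is illustrative."""
    name: str
    components: list   # URLs of the core and the plugins required
    sketch: str        # path of the sketch to build
    key_input: list = field(default_factory=list)         # events fed in
    expected_reports: list = field(default_factory=list)  # reports expected

    def check(self, actual_reports):
        """Pass only if the observed reports match the expectation."""
        return list(actual_reports) == self.expected_reports


example = RegressionTest(
    name="press_and_release",
    components=["<core-url>", "<plugin-url>"],  # placeholders, not real URLs
    sketch="tests/sketches/basic/basic.ino",    # hypothetical path
    key_input=[("press", 0, 1), ("release", 0, 1)],
    expected_reports=[["1"], []],
)
```

The driver, whatever it ends up being, would only need to build `sketch`, replay `key_input` and pass the observed reports to `check`.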

Using a hand-made Python script to drive all this is exactly what I’d call reinventing the wheel. After having spent almost 15 years writing simulation software with over 400 000 lines of C++ code and a test bench with hundreds of tests, I am pretty sure that I wouldn’t want that.

Shared testing infrastructure with simple, straightforward test scripts in each repository is something I’m 100% on board with. It’s been on our list since the beginning, but we’re still playing catch up. Thank you for pushing this discussion and research forward, @noseglasses.

What you’re describing is similar to what we’re already doing for testing (though our tests don’t test the functionality.)

The current test driver for the ā€˜whole systemā€™ lives here:

The per-plugin version is here:

As of now, what we test is:

  • cpplint, to check a bunch of heuristics about coding style
  • smoke testing compilation of examples. While this doesn’t catch behavior regressions, it catches numerous issues already
  • astyle to make sure code formatting doesnā€™t stray from our autoformatting guidelines.

We also already run the same tests on every pull request, automatically.

I would be very, very, very happy to also start running test scripts that test the behavior of the code.

I’m a big, big fan of TAP, the “test anything protocol” that originated in Perl’s testing culture, but has been ported… pretty widely.
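For readers who have not seen TAP: the output is a plan line (`1..N`) followed by one `ok` or `not ok` line per test. A minimal sketch of producing it from Python, where `tap_report` is a made-up helper name and the test descriptions are invented:

```python
def tap_report(results):
    """Format (description, passed) pairs as TAP output."""
    lines = [f"1..{len(results)}"]  # the plan: how many tests follow
    for number, (description, passed) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return "\n".join(lines)
```

Calling `tap_report([("key press is reported", True), ("key release is reported", False)])` yields the three lines `1..2`, `ok 1 - key press is reported` and `not ok 2 - key release is reported`, which any TAP consumer can summarise.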

For tests that are explicitly just testing “input” and “output”, that sounds a lot like TCL’s old expect, for which there are many, many drivers. What I think we’d want for those kinds of tests is for the test script to, essentially, be “data” rather than a program in a specific programming language.
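One way to picture “test as data”: the test is a plain list of send/expect pairs, and a small generic runner interprets it. The sketch below is an assumption-laden illustration. The command strings, the `send`/`expect` keys and the toy `fake_firmware` function are all invented; a real runner would talk to the simulated firmware instead, and the step list could just as well be loaded from a JSON or similar file.

```python
# The test itself is pure data: what to send and what to expect back.
TEST_STEPS = [
    {"send": "press 0 1", "expect": "report 1"},
    {"send": "release 0 1", "expect": "report"},
]


def run_steps(steps, send):
    """Feed each step's input to `send`; return a list of mismatches."""
    failures = []
    for index, step in enumerate(steps):
        actual = send(step["send"])
        if actual != step["expect"]:
            failures.append((index, step["expect"], actual))
    return failures


def fake_firmware(command):
    """Toy stand-in for a simulator: reports key '1' while it is pressed."""
    action, _row, _col = command.split()
    return "report 1" if action == "press" else "report"
```

An empty list returned by `run_steps(TEST_STEPS, fake_firmware)` means the test passed; the runner never needs to change when new tests are added.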

Of course, what I’d like best is for us to find an existing solution that’s designed for running test scripts on embedded projects like this.

Right now, I think the best next step is for us to get a single, hacky test script for the Kaleidoscope core and somewhat-reproducible instructions for running it by hand.

I think the minimal test is “simulate pressing key (row 0, col 1) on a keyboard” and “look for a HID key event sending press and release of the ‘1’ key.”
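In Python, the shape of that minimal test might look like the sketch below. `FakeKeyboard` is a toy stand-in written for this example and is NOT the Hardware-Virtual API; it only shows what the test would assert. The keymap entry mapping (row 0, col 1) to ‘1’ and the representation of a HID report as a list of held keys are assumptions.

```python
class FakeKeyboard:
    """Toy stand-in for a simulated firmware (not the Hardware-Virtual API)."""

    def __init__(self, keymap):
        self.keymap = keymap  # maps (row, col) to a key name
        self.reports = []     # every HID report "sent" so far

    def press(self, row, col):
        # A press produces a report containing the pressed key.
        self.reports.append([self.keymap[(row, col)]])

    def release(self, row, col):
        # A release produces an empty report: no keys are held.
        self.reports.append([])


def test_minimal():
    """Press and release (row 0, col 1); expect press and release of '1'."""
    keyboard = FakeKeyboard({(0, 1): "1"})  # assumed keymap entry
    keyboard.press(0, 1)
    keyboard.release(0, 1)
    return keyboard.reports == [["1"], []]
```

With a real simulator behind the same two calls, the assertion at the end would stay the same.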

Right now, I don’t care whether this is on a ‘virtual AVR’ or compiled x86 code. I don’t need it to run in the cloud and I don’t think we want much build automation for it.

Once we have that, then we can start to compare different test infrastructure options to find something that’s going to be easy for developers to use and maintain, easy to automate and easy for end-users to understand.

Does that feel like a reasonable next step?

2 Likes

Sorry if that sounded harsh, but I have already spent way too much of my life writing stuff that I later threw away after I discovered that there’s a better off-the-shelf solution, one that I had often ignored beforehand because I thought myself to be smart and creative.
I simply try to learn from mistakes made earlier.

Sure. I will start working on this as soon as I am back in civilisation next week.

1 Like

It’s clear to me that you have strong feelings about this. It’s also clear to me that you have experience and that you’re volunteering your time to try to make Kaleidoscope better.

I really appreciate that.

You and I come from different technology backgrounds, so we have different preferences and different experiences.

I have no problem at all with you (or anybody) advocating strongly for a design or technology you think might be a good fit for a problem we have. What’s important to me is that everybody be friendly and respectful of each other. To be clear, I don’t think you’ve said anything that goes against that. :slight_smile:

For what it’s worth, writing and throwing away an implementation when you discover something better often only works because you wrote something yourself and, through that act, learned more about what you wanted or needed.

I’m totally happy to chat about this in this thread this week, but you should by no means feel obligated to spend your time away from civilization chatting on keyboard forums unless you want to :wink:

-jesse

3 Likes

Wisely said. Makes me regret those many hours lost a little less. :sweat_smile:
Of course I learned a lot from this. The most important lesson was: Think carefully before you touch the keyboard.

3 Likes

And thanks for the overview of what’s been done so far.

1 Like

Oh, if I’d followed THAT lesson, I never would have tried to make a custom keyboard and we wouldn’t be having this discussion :wink:

3 Likes

No worries. I… realize that wasn’t documented anywhere. That’s a mistake. I’ve opened a ticket for myself.

1 Like

At least as a conversation starter, I have an example test script and some very simple tests. To try them out, check out my forks of Arduino-Boards and Kaleidoscope at their regression-testing branches; I’ve added a regression-test target to the main Makefile, so simply run make regression-test.

I did remove or modify some of the example sketches for Kaleidoscope core so that they build right now with Hardware-Virtual. (Among other things, sketches that include Model01-TestMode are currently not supported for virtual builds, see issue #3 on that repo.)

The implementation itself certainly has a long way to go, I’m sure, but this is basically a proof of concept / conversation starter for what we would want in a testing framework, and how that might or might not resemble what I have here.

Also, this is very probably the largest Bash script I’ve written in my life, so if it’s crazy inefficient or breaks Bash conventions or something, please teach me something new :slight_smile:

1 Like

I am currently working on a Python-based test system. I’ve already created a plugin that allows wrapping any Kaleidoscope module (core/plugin) for access from Python, see

Unfortunately, I am facing problems that are related to my plugin relying on @craigdissel’s Hardware-Virtual plugin.

To not reinvent the wheel, my plugin uses the HV plugin for the generation of host (x86) builds. Unfortunately, since its creation, the Kaleidoscope core has already evolved in a way that invalidated Kaleidoscope-Hardware-Virtual’s implementation.

Due to its great potential and the likelihood of it becoming a core part of any future host-based testing setup, I would love to see the HV plugin moved to the Kaleidoscope core. If this is, for any reason, currently not possible, it would at least be desirable to add a test to the core’s existing test bench that detects breakage of the HV plugin builds after incompatible modifications to the firmware core.

Craig cannot automatically react to API changes that break the plugin. On the other hand, it would be quite simple to inform him automatically when the respective Travis test fails, e.g. via an auto-generated email. Of course all this is for him and the core maintainers to decide; I am just wondering how it could be done.

Let me know if I can help in any way, e.g. by setting up such tests.

1 Like

@craigdissel - Would you be up for me cloning Hardware-Virtual into the keyboardio org and granting you admin perms to the repo as a first step to getting it into Arduino-Boards?

It’s my preference that we work toward Hardware-Virtual being defined in boards.txt/platform.txt and having the actual hardware specific backend be in the form of an arduino ‘core’. In addition to being more generally useful to other projects, that helps keep it well encapsulated and means that as we add other variant hardware, we can use the same system to support that, without needing changes to other bits of our infrastructure.

Sure, feel free to add to the keyboardio org. I’m all for this being mainstreamed.

I’m not exactly sure what you mean in the second paragraph above. Hardware-Virtual does contain an arduino ‘core’, and the installation process in Hardware-Virtual’s README results in installing this ‘core’ in exactly the place Arduino expects ‘cores’. If you want Hardware-Virtual’s Arduino core pulled out and made a separate thing (not living in the Hardware-Virtual repo), that would be possible, but at least at this time, it includes a number of features that really only make sense if you also have the Hardware-Virtual plugin. Right now, the virtual Arduino core is not that independently useful.

Also, Hardware-Virtual necessarily contains code outside of its Arduino core - for instance, a “virtual” HID library mirroring KeyboardioHID, and hardware stubs implementing Kaleidoscope-Hardware - so I don’t think it’s possible to refactor it into purely an Arduino core plus some configuration in boards.txt and platform.txt. It will always have to be a plugin in the same sense that the hardware plugins or KeyboardioHID are.

Just curious, do you have feedback on the test system I posted above? Is it missing some features that you’re planning to include in your Python one? Is there a way we could combine efforts rather than building an entirely new Python test system? If there are advantages to building a Python one from scratch, I’m all for it, just curious what they are, for my own sake.

1 Like

:+1: I hope to get to it later today, but have some non-virtual hardware stuff that takes priority.

Oh, that’s easy. What I meant was “I didn’t do my research before telling you that someday I hope we can redo your work to match the way you already built it.”

Understood. My gut feeling is that this will be a win eventually.

The Arduino infrastructure has provisions for ‘platform-specific’ libraries tied to an individual core. What you’re talking about sounds like it’s what that support was designed for.

Note that none of this is a precondition to getting Hardware-Virtual upstreamed, just my goals for the future.

2 Likes