The importance of a distribution
Happy New Year! We were able to deliver yet another release on time. Version 1.4.4 is available for JavaCPP, JavaCPP Presets, JavaCV, ProCamCalib, and ProCamTracker. Many thanks to all contributors! Since the previous post, according to
git shortlog 1.4.2..1.4.4 --summary, they were:
Alex Merritt, Aman Gupta, Ao Qi, bitstormGER, deinhofer, Deividi, eguid, EmergentOrder, Gertjan Al, HGuillemet, Jarek Sacha, Jeremy Apthorp, kigkrazy, lloydmeta, louxiu, nahojjjen, Nico Hezel, renderdude, Samuel Audet, Taha Emara, vimalaguti, vincent-grosbois, wumo, Yuta Okamoto, Zayin Krige
Many thanks as well to those not on this list who helped fix CI settings, test builds, debug code, file suggestions, make corrections, report bugs, propose ideas, update wiki pages, etc. As usual, this release fixes many issues and contains a lot of updates; all the details can be found in the
CHANGELOG.md files, but one important change concerns MXNet, whose Scala API is now partially usable from Java. Consequently, the JavaCPP Presets for MXNet now also bundle the official Scala API, just as with the official Java APIs of OpenCV and TensorFlow. Given these developments, there is one question that keeps coming up over and over again: Why do we need a distribution like Bytedeco even if everyone shows up with Java APIs on their own?
Well, for one thing, the feature set exposed by these official APIs for Java is very limited, so many users still require access to the C/C++, Scala, or even Python APIs. Moreover, PyTorch is slated to become the first deep learning framework with a full-featured and easy-to-use C++ API, but with no plans in sight for Java. Once the C++ API becomes usable, we intend to integrate it into Bytedeco as part of the JavaCPP Presets; see issue bytedeco/javacpp-presets#623.
Another reason could be that JavaCPP provides a set of basic classes as foundation for native libraries, such as
PointerScope introduced in the previous post. Thanks to Hervé Guillemet and Sam Carlberg, they will also soon support the Java Platform Module System (JPMS), including
jlink, something that the upstream projects have no plans to provide for their Java APIs. (A preview version of OpenCV is already available on the
jpms branch with corresponding snapshot artifacts.) Project Panama promises to offer most of the features that JavaCPP already provides today. However, delivery is expected only in a few more years, numbers showing faster-than-JNI performance for both JIT and AOT compilers are still lacking, development for C++ has yet to start, and it does not aspire to support platforms such as Android and iOS. Even in the case of plain Java SE, a loader for native libraries is required. Community members such as Johan Vos understand this, but there are no plans to integrate such functionality into OpenJDK. Unless a fork like Corretto happens to stimulate evolution towards the demands of contemporary software development, I am afraid the need for a third-party tool like JavaCPP is here to stay for the foreseeable future.
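To make the point about a loader concrete, here is a minimal sketch of what such a component has to do at the very least: pick the binary matching the current platform, copy it out of the JAR, and hand it to the JVM. This is not JavaCPP's actual Loader; the class name, the `platform()` helper, and the `/<platform>/<file>` resource layout are all assumptions made up for this illustration, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public class NativeLoaderSketch {
    // Hypothetical helper: normalizes os.name/os.arch into a directory name
    // such as "linux-x86_64". The naming scheme is an assumption of this sketch.
    static String platform(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (os.startsWith("mac")) {
            os = "macosx";
        } else if (os.startsWith("windows")) {
            os = "windows";
        } else if (os.startsWith("linux")) {
            os = "linux";
        }
        if (arch.equals("amd64")) {
            arch = "x86_64";
        }
        return os + "-" + arch;
    }

    // Extracts a library bundled as a classpath resource (assumed layout:
    // /<platform>/<mapped library file name>) to a temporary file and loads it.
    static void load(Class<?> owner, String libName) throws IOException {
        String file = System.mapLibraryName(libName); // e.g. "libfoo.so" on Linux
        String resource = "/" + platform(System.getProperty("os.name"),
                                         System.getProperty("os.arch")) + "/" + file;
        try (InputStream in = owner.getResourceAsStream(resource)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No bundled binary at " + resource);
            }
            Path tmp = Files.createTempFile("native-", "-" + file);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }

    public static void main(String[] args) {
        // Only prints the detected platform; load() needs a JAR that actually
        // bundles a binary under the assumed layout.
        System.out.println(platform(System.getProperty("os.name"),
                                    System.getProperty("os.arch")));
    }
}
```

A real loader also has to deal with dependencies between libraries, caching across runs, and platforms where extraction to a temporary file is not allowed, which is exactly why it makes sense to share one rather than have every project write its own.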
Nevertheless, assuming that Project Panama succeeds and renders JavaCPP obsolete (which would be awesome, although doubtful based on the discussion above, but just for the sake of the argument), the need for a distribution like Bytedeco will remain. The redistributable binaries for CUDA are over 2 GB for all supported platforms, compressed, and similarly for MKL, which is about 0.5 GB for all supported platforms. These include aggressively optimized implementations of BLAS, LAPACK, and FFTW, among many other things, achieving at least 10× speedups over anything one could possibly write in pure Java today. (Project Panama also promises to fix this for CPUs, eventually, but not for GPUs.) Their files are currently bundled by Bytedeco and shared by, for instance, Deeplearning4j, and the JavaCPP Presets for OpenCV, Caffe, MXNet, and TensorFlow. If each of these projects were to start bundling CUDA and MKL for all platforms on their own to offer a user-friendly experience, an application depending on all those libraries might very well end up bundling over 10 GB of duplicate code! Projects competing against each other either downstream or upstream cannot offer such redistributables as shared resources: That is the role of a distribution. The goal is to make all those libraries, including the JDK itself, work well together.
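For a rough sense of where the "over 10 GB" figure comes from, the arithmetic can be sketched as below. It takes the sizes quoted above as approximate lower bounds (2 GB for CUDA, 0.5 GB for MKL) and assumes, purely for illustration, that each of the five projects named above would ship its own full copy of both; the class and method names are made up for this example.

```java
import java.util.List;

public class BundleSize {
    // Approximate compressed sizes for all platforms, in GB, as quoted in the post.
    static final double CUDA_GB = 2.0;
    static final double MKL_GB = 0.5;

    // Total size if every project bundles its own copy of CUDA and MKL.
    static double duplicatedGb(int projects) {
        return projects * (CUDA_GB + MKL_GB);
    }

    // Total size if a distribution ships one copy shared by all projects.
    static double sharedGb() {
        return CUDA_GB + MKL_GB;
    }

    public static void main(String[] args) {
        List<String> projects =
                List.of("Deeplearning4j", "OpenCV", "Caffe", "MXNet", "TensorFlow");
        System.out.println("Each bundling its own copy: "
                + duplicatedGb(projects.size()) + " GB");
        System.out.println("Shared through a distribution: " + sharedGb() + " GB");
    }
}
```

Under these assumptions the duplicated case comes to 12.5 GB against 2.5 GB when shared, and the gap only widens as more libraries join.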
Right now, unless we are misinformed, Bytedeco is the only such distribution available for Java, however small it may be. That is indeed quite sad, but I hope this post helps spur more cooperation or, at the very least, some competition. There is still much to be done before the community at large begins to recognize the clear necessity for Java distributions of native libraries that work not only with Java SE on Linux, Mac, and Windows, but also on mobile and embedded platforms such as Android, iOS, and Raspberry Pi.
More concretely, to attain these goals, we will have to start prioritizing resources for the following efforts:
- Obtain more visibility and gain a wider sense of community, by
  - Writing more blog posts, on a fancier web site,
  - Meeting people in person at conferences, workshops, etc., and
  - Participating in upstream projects, driving acceptance by their developers and users, by
    - Reporting bugs and helping to fix them, and even by
    - Contributing and maintaining JavaCPP-based wrappers upstream, among other things;
- Aim for fully automated bindings generation for the features of C++ most widely used by native libraries;
- Keep creating new presets for additional native libraries, and keep them up to date with upstream projects.
A lot of help is going to be required to realize all this, so please spread the word! To contribute to one of the items listed above, please get in touch via the mailing list on Google Groups, issues on GitHub, or the chat room on Gitter. We look forward to continuing to work with everyone on all these projects this year as well!