Targeted Grasping of the Right Object
10.07.2025 - 2D Images Combined with Deep Learning Enable Robust Detection Rates
3D-based machine vision methods help identify and inspect objects in three-dimensional space. New bin picking software goes one step further by leveraging advanced deep learning algorithms. The result: exceptionally robust detection rates, easy training, and the ability for robots to grasp various objects reliably, quickly, and efficiently, even under challenging conditions.
In machine vision, 3D vision technologies capture, evaluate, and process 3D data in order to control mechanical systems or processes. These capabilities enable applications that cannot be realized with 2D approaches at the same level of robustness. For instance, 3D vision can detect defects, dents, or deformations on surfaces with precision, perform high-accuracy measurements of components at the micrometer level, or reliably inspect weld seams.
MVTec’s machine vision software products, Halcon and Merlic, offer a broad range of tools and operators for implementing 3D applications. These include 3D matching techniques and various methods for 3D object localization to determine precise positions. The products stand out in two respects: in addition to advanced technologies, the software is hardware-independent, so it can be integrated with a wide variety of hardware components. This gives users a high degree of flexibility.
Bin Picking: A Challenge for Robotics
A common application involves accurately determining the position and orientation of objects in 3D space. This is critical for automated handling processes where robots interact with complex and variable parts, such as bin picking or standard pick-and-place tasks. Bin picking is particularly challenging: objects lie in disordered piles, sometimes stacked on top of each other, making automated grasping more difficult.
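To make "position and orientation" concrete: a pose is commonly written as a rigid transform, that is, a rotation plus a translation. The following minimal sketch is a generic illustration, not MVTec's implementation. It assumes a hypothetical camera-to-robot calibration and a hypothetical detected object pose, and chains the two to obtain the grasp position in the robot's base frame. All function names and numbers are made up for illustration.

```python
import math

def rot_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    rows = [rotation[i] + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product of two 4x4 transforms (apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical calibration: pose of the camera in the robot base frame.
base_T_cam = make_pose(rot_z(math.pi / 2), [0.5, 0.0, 1.0])
# Hypothetical detection result: pose of the object in the camera frame.
cam_T_obj = make_pose(rot_z(0.0), [0.1, 0.2, 0.8])

# Chaining both transforms gives the object pose in the robot base frame.
base_T_obj = compose(base_T_cam, cam_T_obj)
grasp_position = [row[3] for row in base_T_obj[:3]]
```

In a disordered pile, every part has its own such pose, which is why each object has to be localized individually before the robot can plan a grasp.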
To address such complex scenarios, MVTec developed the Deep 3D Matching technology, which is included in the current version 25.05 of the Halcon software. The technology combines deep learning algorithms with rule-based methods, making it one of the first solutions of its kind. This hybrid approach enables robust detection rates that conventional matching technologies cannot achieve. Deep 3D Matching determines object positions using only 2D images.
Synthetic Training Data and Low Labeling Effort
Another key benefit: Deep 3D Matching requires only synthetic image data for training. These data sets can be generated fully automatically from CAD models of the target objects. This eliminates the need for manual image labeling, saving significant time and costs. The technology also needs only minimal parameter configuration while still achieving robust results. This makes it well suited to implementing industrial bin picking and pick-and-place applications in a simple, efficient, and cost-effective manner.
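How such synthetic data is generated inside Halcon is not described here. As a rough illustration of the general idea, the sketch below assumes that a renderer (not shown) would draw the CAD model from many virtual camera positions, and only samples those positions on the upper hemisphere around the object. The function name, the radius, and the sampling scheme are assumptions for illustration and do not reflect Halcon's API.

```python
import math
import random

def sample_viewpoints(count, radius, seed=0):
    """Sample camera positions uniformly on the upper hemisphere around the origin."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    viewpoints = []
    for _ in range(count):
        z = rng.uniform(0.0, 1.0)             # uniform height gives uniform area coverage
        phi = rng.uniform(0.0, 2.0 * math.pi)  # angle around the vertical axis
        r = math.sqrt(1.0 - z * z)
        viewpoints.append((radius * r * math.cos(phi),
                           radius * r * math.sin(phi),
                           radius * z))
    return viewpoints

# Hypothetical setup: 5 virtual cameras, 0.8 m away from the object.
viewpoints = sample_viewpoints(count=5, radius=0.8)
```

Because every rendered image comes with the exact pose that produced it, the labels are obtained for free, which is the reason no manual annotation is needed.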
Affordable 2D Cameras Reduce Costs
The required images are captured using simple 2D cameras positioned at various angles. Due to their low cost, these cameras offer a significant economic advantage and can be used flexibly thanks to MVTec’s hardware-independent software. Moreover, the system architecture allows additional cameras to be integrated easily, without major setup changes. This flexible camera configuration helps minimize ambiguities and false positives in detection results.
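Why do several viewpoints reduce ambiguity? A single 2D image fixes only a viewing ray toward an object point, not its depth. The sketch below shows a textbook two-view technique (midpoint triangulation of two rays), not the method used by Deep 3D Matching. Camera positions and ray directions are hypothetical and assumed to be given in one common coordinate frame.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Return (midpoint, gap) for the closest approach of two viewing rays."""
    w = [p - q for p, q in zip(origin1, origin2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; no depth can be recovered")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [o + s * k for o, k in zip(origin1, dir1)]
    p2 = [o + t * k for o, k in zip(origin2, dir2)]
    midpoint = [(x + y) / 2.0 for x, y in zip(p1, p2)]
    return midpoint, math.dist(p1, p2)

# Two hypothetical cameras looking at the same point from different sides.
point, gap = triangulate_midpoint([-1.0, 0.0, 0.0], [1.0, 0.0, 1.0],
                                  [1.0, 0.0, 0.0], [-1.0, 0.0, 1.0])
```

The returned gap is the distance by which the two rays miss each other. A large gap suggests that the two detections do not belong to the same object, which is one simple way an extra camera can help reject false positives.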
Machine Vision Software as Easy to Implement as Possible
Interview with John Campbell, Key Account Manager at MVTec
The new Deep 3D Matching function in MVTec’s Halcon shows that bin picking also works with 2D images. This underlines the essential role that software plays in machine vision. But what if the user is not a software expert? And how complex is it to learn new image processing software? John Campbell, Key Account Manager, answers these and other questions.
inspect: What role does software play in modern machine vision systems?
John Campbell: Simply put, software is the intelligence of a vision system. It is responsible for using the available compute power of the system to acquire, process, and analyze data from the imaging hardware to produce outputs that can, for example, control a manufacturing process. In today’s increasingly automated world, modern vision systems are being used in more and more demanding ways. As a result, greater intelligence is required from the vision systems. They must be flexible enough to acquire data from a variety of sources, equipped to solve a wide range of complex tasks, and easily adaptable to efficiently run on a variety of compute platforms and under a variety of challenging conditions.
inspect: What is the most important aspect of machine vision systems for your customers?
Campbell: As a pure software manufacturer for machine vision, we have a wide range of customers, each with different priorities for using machine vision. However, I think it’s fair to say that value is the most important aspect of a machine vision system. Customers expect their investment to produce an attractive return, be it with improved quality, reduced costs, increased efficiency, etc. MVTec’s hardware-agnostic software provides value by offering high performance for even the most challenging applications, being easily scalable, extendable, future-proof, and easy to support.
inspect: Why are easy implementation and accessibility so important?
Campbell: They are important because they mean that implementing and supporting a vision system doesn’t require highly skilled or trained resources. Experts are often difficult to find and typically come at a much higher cost. So having software that can be used without extensive training leads to savings in terms of the human capital needed.
inspect: What are the disadvantages of a system that is easy to install and operate?
Campbell: Ease-of-use and accessibility often come with frustrating limitations and costly tradeoffs in performance, capability, and flexibility. That’s why it’s important to consider how critical machine vision is to your long-term automation strategy. I work with companies that are constantly challenging the limitations of machine vision technology. For them, easy, off-the-shelf vision systems are too performance-limited and too expensive to scale. To keep up with their innovation requirements, they need highly optimizable vision software that supports the entire vision hardware market. They need to easily adapt their existing software applications to add new inspection tasks or leverage newly available acceleration hardware.
inspect: How does MVTec resolve this contradiction?
Campbell: At MVTec we are extremely focused on finding the balance between ease-of-use and the probability of success in solving all types of highly challenging applications. For example, in our latest Halcon release, we have introduced support for large language models to simplify the development of applications with our software. Additionally, our rapid prototyping is being modernized to improve the user experience and further shorten development times.
inspect: How do users benefit from the hardware independence of Halcon and Merlic?
Campbell: I’ll use a real-life example to illustrate why hardware independence can drastically reduce capital expenditures: A customer I work with was using an off-the-shelf 3D laser profilometer system to inspect and measure the height of a dispensed material across a large surface. Due to hardware limitations of the system, they were unable to achieve the throughput needed with a single station. So, the customer ended up doubling the number of inspection stations on each line. Eventually they decided to try a profilometer from another vendor. The new sensor had a larger field of view and higher scan rate than the previous system. They used MVTec’s software to interface with this sensor and process the data. With this configuration, they got more accurate inspection results at the required throughput with a single system. At scale they saved hundreds of thousands of dollars by using the best hardware for their application with a powerful standard software that could support it.
inspect: What can users do if their use case, and therefore their system, changes?
Campbell: One of the benefits of having hardware-agnostic standard vision software as the foundation of a machine vision strategy is the flexibility to make adjustments as requirements change. Instead of installing additional hardware into a machine to add new inspection capabilities, you can often use the existing hardware and simply program the new inspection functions into the existing software. Similarly, upgrading performance doesn’t always mean upgrading the whole system. A faster computer or new AI accelerator hardware or even a software upgrade can be a lower-cost way to unlock next-generation performance improvements.
inspect: What is required on the part of the user to implement a suitable machine vision system with Halcon?
Campbell: Experience is a good place to start. Having a clear, measurable objective and performance requirement, as well as the budget, timeline, resources, and technology available to achieve the desired outcome, will make design and implementation much more efficient. Understanding the limitations of technology is also important. These days we hear so much about the ability of deep learning models to solve tasks once thought impossible. While deep learning is a very powerful tool in image processing and analysis, it comes with its own set of trade-offs, prerequisites, and performance limitations. Understanding the implications of those, and of any other technology for that matter, is critical to developing a successful system or strategy.