Transformer Net

Assembly Line

Generative AI for Process Systems Engineering

🧠⏳ Multi-layer parallel transformer model for detecting product quality issues and locating anomalies based on multiple time‑series process data in Industry 4.0

📅 Date:

✍️ Authors: Jiewu Leng, Zisheng Lin, Man Zhou, Qiang Liu, Pai Zheng, Zhihong Liu, Xin Chen

🔖 Topics: Transformer Net, Anomaly Detection, Quality Assurance

🏢 Organizations: Guangdong University of Technology, The Hong Kong Polytechnic University, China South Industries Group


Smart manufacturing systems typically consist of multiple machines with different processing durations. The continuous monitoring of these machines produces multiple time-series process data (MTPD), which have four characteristics: low data value density, diverse data dimensions, transmissible processing states, and complex coupling relationships. Using MTPD for product quality issue detection and rapid anomaly location can help dynamically adjust the control of smart manufacturing systems and improve manufacturing yield. This study proposes a multi-layer parallel transformer (MLPT) model for product quality issue detection and rapid anomaly location in Industry 4.0, based on proper modeling of the MTPD of smart manufacturing systems. The MLPT consists of multiple customized encoder models that correspond to the machines, each using a customized partition strategy to determine the token size. All encoders are integrated in parallel and output to a global multi-layer perceptron (MLP) layer, which improves the accuracy of product quality issue detection and simultaneously locates anomalies (including key time steps and key sensor parameters) in smart manufacturing systems. An empirical study was conducted on a fan-out panel-level packaging (FOPLP) production line. The experimental results show that the MLPT model can detect product quality issues more accurately than other methods. It can also rapidly locate anomalies in smart manufacturing systems.
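
The parallel-encoder design can be pictured concretely. Below is a minimal PyTorch sketch of that idea: one encoder per machine, whose token size reflects that machine's partition of its time series, with the encoder outputs concatenated into a shared MLP head for quality classification. The machine configurations, dimensions, and layer sizes are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class MachineEncoder(nn.Module):
    """One customized encoder per machine: its own token size (partition of
    that machine's time series) and its own Transformer encoder stack."""
    def __init__(self, n_sensors, n_steps, token_steps, d_model=64, n_layers=2):
        super().__init__()
        assert n_steps % token_steps == 0
        self.n_tokens = n_steps // token_steps
        self.embed = nn.Linear(n_sensors * token_steps, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                       # x: (batch, n_steps, n_sensors)
        # Partition the series into tokens of `token_steps` consecutive steps.
        x = x.reshape(x.shape[0], self.n_tokens, -1)
        return self.encoder(self.embed(x))      # (batch, n_tokens, d_model)

class MLPTSketch(nn.Module):
    """Parallel machine encoders feeding one global MLP head."""
    def __init__(self, machine_cfgs, n_classes=2, d_model=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [MachineEncoder(**cfg, d_model=d_model) for cfg in machine_cfgs])
        total_tokens = sum(c["n_steps"] // c["token_steps"] for c in machine_cfgs)
        self.head = nn.Sequential(nn.Linear(total_tokens * d_model, 128),
                                  nn.ReLU(),
                                  nn.Linear(128, n_classes))

    def forward(self, per_machine_series):      # list of (batch, steps, sensors)
        feats = [enc(x) for enc, x in zip(self.encoders, per_machine_series)]
        return self.head(torch.cat(feats, dim=1).flatten(1))   # quality logits

# Example: two machines with different sensor counts, durations, and token sizes.
cfgs = [dict(n_sensors=8, n_steps=120, token_steps=10),
        dict(n_sensors=5, n_steps=60, token_steps=6)]
model = MLPTSketch(cfgs)
logits = model([torch.randn(4, 120, 8), torch.randn(4, 60, 5)])   # (4, 2)
```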

Read more at Journal of Manufacturing Systems

🧠🦾 Google’s Robotic Transformer 2: More Than Meets the Eye

📅 Date:

✍️ Author: Michael Levanduski

🔖 Topics: Transformer Net, Machine Vision, Vision-language-action Model

🏢 Organizations: Google


Google DeepMind’s Robotic Transformer 2 (RT-2) is an evolution of vision language model (VLM) software. Trained on images from the web, RT-2 also employs robotics datasets to manage low-level robotic control. Traditionally, VLMs have been used to combine inputs from visual and natural language text datasets to accomplish more complex tasks; ChatGPT is at the forefront of this trend.

Google researchers identified a gap in how current VLMs are applied in the robotics space. They note that current methods tend to focus on high-level robotic theory, such as strategic state-machine models, leaving a void in the lower-level execution of robotic action, where most control engineers do their work. Thus, Google is attempting to bring the power and benefits of VLMs down into the control engineers’ domain of programming robotics.

Read more at Control Automation

🧠🦾 RT-2: New model translates vision and language into action

📅 Date:

🔖 Topics: Robot Arm, Transformer Net, Machine Vision, Vision-language-action Model

🏢 Organizations: Google


Robotic Transformer 2 (RT-2) is a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control.

High-capacity vision-language models (VLMs) are trained on web-scale datasets, making these systems remarkably good at recognising visual or language patterns and operating across different languages. But for robots to achieve a similar level of competency, they would need to collect robot data, first-hand, across every object, environment, task, and situation.

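As a rough illustration of the vision-language-action idea, the sketch below shows how robot commands could be written and parsed as plain token strings, so the same model that describes web images can also emit actions. The string format and field layout are assumptions for illustration, not RT-2's actual scheme.

```python
def decode_action(action_text: str) -> dict:
    """Parse a model-emitted action string into a structured robot command."""
    vals = [int(v) for v in action_text.split()]
    terminate, *deltas = vals
    return {"terminate": bool(terminate),
            "position_delta": deltas[:3],   # binned x, y, z translation
            "rotation_delta": deltas[3:6],  # binned roll, pitch, yaw
            "gripper": deltas[6]}           # binned gripper opening

# A VLA model prompted with a camera image and an instruction such as
# "pick up the bag about to fall off the table" would emit a token string
# like the one below, which a low-level controller turns into motion.
print(decode_action("0 132 114 128 5 25 156 255"))
```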

Read more at Deepmind Blog

🧠🦾 RoboCat: A self-improving robotic agent

📅 Date:

🔖 Topics: Robot Arm, Transformer Net

🏢 Organizations: Google


RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.

RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images, and actions in both simulated and physical environments. We combined Gato’s architecture with a large training dataset of sequences of images and actions of various robot arms solving hundreds of different tasks.

The combination of all this training means the latest RoboCat is based on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.
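
The self-improvement cycle implied here can be sketched schematically: fine-tune the generalist on a small set of demonstrations, let the fine-tuned specialist generate more trajectories on the new task, and fold that self-generated data back into the training pool. The classes, counts, and function names below are stand-ins for illustration, not DeepMind's training code.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    n_trajectories: int = 0           # trajectories the agent was trained on

    def finetune(self, trajectories):
        # Return a new agent trained on the additional trajectories.
        return Agent(self.n_trajectories + len(trajectories))

    def collect(self, n_episodes):
        # A fine-tuned specialist practices the new task on its own.
        return [f"self_generated_{i}" for i in range(n_episodes)]

def self_improvement_round(generalist, demos, n_selfgen=10_000):
    specialist = generalist.finetune(demos)        # ~100 human demonstrations
    self_data = specialist.collect(n_selfgen)      # self-generated trajectories
    return generalist.finetune(demos + self_data)  # grow the generalist's dataset

generalist = Agent(n_trajectories=1_000_000)
demos = [f"human_demo_{i}" for i in range(100)]
generalist = self_improvement_round(generalist, demos)
```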

Read more at Deepmind Blog

RT-1: Robotics Transformer for Real-World Control at Scale

📅 Date:

✍️ Authors: Keerthana Gopalakrishnan, Kanishka Rao

🔖 Topics: Industrial Robot, Transformer Net, Open Source

🏢 Organizations: Google


Major recent advances in multiple subfields of machine learning (ML) research, such as computer vision and natural language processing, have been enabled by a shared approach that leverages large, diverse datasets and expressive models that can absorb all of the data effectively. Although there have been various attempts to apply this approach to robotics, robots have not yet leveraged highly capable models as well as other subfields have.

Several factors contribute to this challenge. First, there is a lack of large-scale, diverse robotic data, which limits a model’s ability to absorb a broad set of robotic experiences. Data collection is particularly expensive and challenging for robotics because dataset curation requires either engineering-heavy autonomous operation or demonstrations collected through human teleoperation. A second factor is the lack of expressive, scalable models that can learn from such datasets, generalize effectively, and still run fast enough for real-time inference.

To address these challenges, we propose the Robotics Transformer 1 (RT-1), a multi-task model that tokenizes robot inputs and output actions (e.g., camera images, task instructions, and motor commands) to enable efficient inference at runtime, which makes real-time control feasible. This model is trained on a large-scale, real-world robotics dataset of 130k episodes that cover 700+ tasks, collected using a fleet of 13 robots from Everyday Robots (EDR) over 17 months. We demonstrate that RT-1 can exhibit significantly improved zero-shot generalization to new tasks, environments, and objects compared to prior techniques. Moreover, we carefully evaluate and ablate many of the design choices in the model and training set, analyzing the effects of tokenization, action representation, and dataset composition. Finally, we’re open-sourcing the RT-1 code and hope it will provide a valuable resource for future research on scaling up robot learning.
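
As a minimal sketch of the tokenization idea described here, the snippet below discretizes each motor-command dimension into a fixed number of bins, so a transformer can predict actions as classification over bins, and inverts the mapping when running the policy. The bin count, action ranges, and dimension names are assumptions for illustration, not the open-sourced RT-1 configuration.

```python
import numpy as np

N_BINS = 256                          # assumed bins per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def discretize_action(action):
    """Continuous motor command -> one bin index per dimension."""
    scaled = ((np.clip(action, ACTION_LOW, ACTION_HIGH) - ACTION_LOW)
              / (ACTION_HIGH - ACTION_LOW))
    return np.round(scaled * (N_BINS - 1)).astype(int)

def undiscretize_action(bins):
    """Bin indices -> continuous command, used when running the policy."""
    return ACTION_LOW + bins / (N_BINS - 1) * (ACTION_HIGH - ACTION_LOW)

# e.g. arm deltas (x, y, z, roll, pitch, yaw) plus a gripper opening value
command = np.array([0.12, -0.30, 0.05, 0.0, 0.2, -0.1, 1.0])
bins = discretize_action(command)
recovered = undiscretize_action(bins)
assert np.allclose(recovered, command,
                   atol=(ACTION_HIGH - ACTION_LOW) / (N_BINS - 1))
```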

Read more at Google AI Blog