
Kota Dohi
Research & Development Group
Hitachi, Ltd.
Empowering non-experts to unlock time-series insights
Time-series data are sequences of sensor measurements recorded over time (e.g., temperatures, pressures, vibrations) whose evolving shapes encode machine states and anomalies. Understanding these signals lets teams retrieve similar situations, explain anomalies, and reuse know-how. Such data saturate today’s production lines: temperature, pressure, vibration, and energy readings stream in around the clock. Yet interpreting these signals remains difficult, especially for non-expert engineers and operators. Traditional methods for searching or describing sensor data still depend on domain specialists, manual sketching, or brittle rule-based systems. Such approaches do not scale and struggle to accommodate diverse domains.
A new paradigm: Learning time-series as language
Recent advances in contrastive learning and large language models (LLMs) have opened a new path: treating time-series data as if it were a language. Our method, CLaSP (Contrastive Language–Signal Pre-training) [2], embodies this idea. An overview of the technology is shown in Fig. 1. The model is trained to map a waveform and a sentence that describes it into the same vector space. Once trained, CLaSP models can perform zero-shot retrieval — users simply type a natural-language query, and the model surfaces matching time-series without predefined rules, sketches, or domain-specific feature engineering.
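To make the idea concrete, the sketch below mimics zero-shot retrieval in a shared vector space. The two toy three-feature encoders stand in for CLaSP’s trained networks; the feature choices and keyword matching are assumptions for illustration only, not the actual model.

```python
import numpy as np

def embed_signal(x: np.ndarray) -> np.ndarray:
    # toy "signal encoder": mean level, variability, and overall trend
    v = np.array([x.mean(), x.std(), x[-1] - x[0]])
    return v / (np.linalg.norm(v) + 1e-12)

def embed_text(query: str) -> np.ndarray:
    # toy "text encoder": keyword spotting onto the same three axes
    v = np.array([
        1.0 if "high" in query else 0.0,
        1.0 if "fluctuating" in query else 0.0,
        1.0 if "rising" in query else -1.0 if "falling" in query else 0.0,
    ])
    return v / (np.linalg.norm(v) + 1e-12)

t = np.linspace(0.0, 1.0, 100)
database = {
    "ramp_up": t.copy(),        # steadily rising
    "ramp_down": 1.0 - t,       # steadily falling
    "flat": np.full(100, 0.5),  # constant level
}
embeddings = {name: embed_signal(x) for name, x in database.items()}

# Zero-shot retrieval: rank stored signals by cosine similarity to the query.
query_vec = embed_text("rising trend")
ranked = sorted(embeddings, key=lambda n: float(embeddings[n] @ query_vec),
                reverse=True)
print(ranked[0])  # prints "ramp_up"
```

In the real system, the hand-built feature extractors are replaced by encoders trained contrastively on millions of signal–sentence pairs, so no keyword rules or feature engineering are needed.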
To obtain the millions of signal–sentence pairs that CLaSP needs, we built two complementary, fully automatic pipelines. The first, which we call the forward approach, begins with a library of canonical functions (sinusoids, sawtooth waves, sigmoids, Gaussians, etc.), randomizes their parameters, optionally adds noise or spikes, and emits a matching caption. The second, the backward approach, inverts the process: it ingests real industrial sensor data, detects salient characteristics — rising trend, drop-outs, periodicity, asymmetry, high amplitude, and more — and translates that feature set into fluent natural-language descriptions using a predefined dictionary of feature-to-language mappings. By merging synthetic variety with real-world complexity, the two pipelines yield a rich, diverse corpus that allows CLaSP to robustly align language and time-series signals.
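The forward approach can be pictured as a loop like the following sketch. The function library, parameter ranges, and caption templates here are illustrative stand-ins, not the ones actually used in our pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 256)

def make_pair():
    """Sample one canonical function, randomize it, and emit a caption."""
    kind = rng.choice(["sinusoid", "sigmoid", "gaussian"])
    if kind == "sinusoid":
        freq = rng.uniform(1.0, 8.0)
        x = np.sin(2 * np.pi * freq * t)
        caption = f"a periodic signal with about {freq:.0f} cycles"
    elif kind == "sigmoid":
        center = rng.uniform(0.3, 0.7)
        x = 1.0 / (1.0 + np.exp(-20 * (t - center)))
        caption = "a signal that rises sharply and then levels off"
    else:
        center = rng.uniform(0.3, 0.7)
        x = np.exp(-((t - center) ** 2) / 0.005)
        caption = "a signal with a single transient peak"
    if rng.random() < 0.5:  # optionally corrupt the waveform with noise
        x = x + rng.normal(scale=0.05, size=t.size)
        caption += ", with noise"
    return x, caption

signal, caption = make_pair()
print(caption)
```

Repeating this loop with many function families and parameter draws yields the synthetic half of the training corpus; the backward approach contributes the real-world half.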

Applications: Search and captioning made natural
The shared vector space learned with CLaSP, together with the complementary forward/backward data pipelines, supports three practical capabilities, all accessible through ordinary language:
- Single time-series retrieval [2]
A user-supplied query (e.g., trend, periodicity, fluctuation) is transformed and compared against vector representations of stored time-series. Because the model is trained contrastively on large-scale signal–text pairs, it retrieves relevant time-series in a zero-shot manner, without any domain-specific tuning.
- Time-series difference retrieval [3]
Certain tasks require comparing two time-series rather than evaluating one in isolation. We developed a method for generating synthetic pairs of time-series (reference and target) together with corresponding texts, as shown in Table 1. The relational retriever transforms each candidate pair, forms the vector difference, and aligns that difference with the vector representation of a query sentence that specifies the desired change.
- Generating captions from time-series [1]
To generate captions from time-series, we add a captioning module to CLaSP. It has two parts. First, a ‘bridge network’ converts the signal embedding produced by CLaSP into a form that a language generator can read. Second, a text-decoding network turns that embedding into a short, domain-neutral caption. By training these two networks on synthetic captions from the SUSHI dataset [4] and real sensor data produced by our backward approach, the module works across many industries and sensor types. Table 2 shows examples of captions generated by our model.
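The difference-retrieval step above can be sketched as follows. The three-feature encoders are toy stand-ins for the trained model, and the query phrases are invented for illustration.

```python
import numpy as np

def signal_feats(x: np.ndarray) -> np.ndarray:
    # toy stand-in for the signal encoder: level, variability, trend
    return np.array([x.mean(), x.std(), x[-1] - x[0]])

def embed_change(query: str) -> np.ndarray:
    # toy stand-in for the text encoder, mapping change descriptions
    # onto the same three feature axes
    v = np.array([
        1.0 if "higher level" in query else 0.0,
        1.0 if "more variable" in query else 0.0,
        1.0 if "steeper rise" in query else 0.0,
    ])
    return v / (np.linalg.norm(v) + 1e-12)

t = np.linspace(0.0, 1.0, 100)
reference = np.full(100, 0.5)
candidates = {
    "shifted_up": np.full(100, 2.0),  # same shape, higher level
    "now_rising": 0.5 + t,            # same start, rising trend
}

# Rank candidate pairs by how well the normalized feature difference
# (target minus reference) matches the embedding of the described change.
q = embed_change("a higher level than the reference")
scores = {}
for name, target in candidates.items():
    diff = signal_feats(target) - signal_feats(reference)
    diff = diff / (np.linalg.norm(diff) + 1e-12)
    scores[name] = float(diff @ q)

best = max(scores, key=scores.get)
print(best)  # prints "shifted_up"
```

The key design choice mirrors the text: the retriever scores the *difference* of the two embeddings against the query, rather than either signal on its own.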
Together, these capabilities provide a natural-language interface for exploring, comparing, and documenting time-series data without manual feature engineering or handcrafted rule sets.
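As a toy illustration of the two-part captioning module described above, the sketch below uses a fixed matrix as the ‘bridge’ and a nearest-template lookup as the ‘decoder’. The real module replaces both with trained networks [1]; every name, template, and feature here is an assumption.

```python
import numpy as np

def signal_embedding(x: np.ndarray) -> np.ndarray:
    # stand-in for the CLaSP signal encoder: level, variability, trend
    return np.array([x.mean(), x.std(), x[-1] - x[0]])

# toy "bridge network": a fixed linear map into the decoder's 4-d space
BRIDGE = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
])

# toy "decoder": captions paired with the signal features they describe
TEMPLATES = {
    "The signal rises steadily.": np.array([0.5, 0.3, 1.0]),
    "The signal stays flat.": np.array([0.5, 0.0, 0.0]),
    "The signal fluctuates around a constant level.": np.array([0.0, 1.0, 0.0]),
}

def decode(prefix: np.ndarray) -> str:
    # pick the caption whose bridged embedding is closest to the prefix
    best, best_d = "", np.inf
    for caption, feats in TEMPLATES.items():
        d = float(np.linalg.norm(feats @ BRIDGE - prefix))
        if d < best_d:
            best, best_d = caption, d
    return best

t = np.linspace(0.0, 1.0, 100)
prefix = signal_embedding(t) @ BRIDGE  # bridge: embedding -> decoder input
print(decode(prefix))  # prints "The signal rises steadily."
```

The structure matches the module in the text: the bridge reshapes the signal embedding into the generator’s input space, and the decoder turns that vector into a short, domain-neutral sentence.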


Summary and Outlook
The CLaSP framework turns industrial sensor streams into plain language. It learns a shared vector space from millions of synthetic and real signal–sentence pairs. This representation powers zero-shot retrieval, comparative search, and automatic captioning—eliminating handcrafted features and making time-series exploration intuitive across industries.
This approach has broad implications for industrial AI as it enables:
• Scalable diagnostics without manual labeling
• Human-friendly interfaces for sensor data analysis
• Cross-domain generalization for diverse applications
By bridging the gap between time-series data and language, we enable systems to interpret sensor signals through familiar linguistic concepts—bringing us closer to AI that truly understands data, not just processes it, and making these technologies more accessible, interpretable, and powerful.
Acknowledgements
I would like to acknowledge my colleagues, Aoi Ito, Tomoya Nishida, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi, with whom this research was conducted.
References
[1] K. Dohi, A. Ito, H. Purohit, T. Nishida, T. Endo, and Y. Kawaguchi, “Domain-independent automatic generation of descriptive texts for time-series data”, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2025.
[2] A. Ito, K. Dohi, and Y. Kawaguchi, “CLaSP: Learning Concepts for Time-Series Signals from Natural Language Supervision”, in Proc. European Signal Processing Conference (EUSIPCO), pp. 1817-1821, 2025.
[3] K. Dohi, T. Nishida, H. Purohit, T. Endo, and Y. Kawaguchi, “Retrieving Time-Series Differences Using Natural Language Queries”, in Proc. European Signal Processing Conference (EUSIPCO), pp. 1832-1836, 2025.
[4] Y. Kawaguchi, K. Dohi, and A. Ito, “SUSHI: A dataset of synthetic unichannel signals based on heuristic implementation,” https://github.com/y-kawagu/SUSHI, 2024, accessed: 2025-10-21.






