Fang Zhao

An R&D agent in NLP and AI
June 2025

Abstract

This study presents Fang Zhao as a research and development agent operating at the intersection of NLP and AI. Through deployments in retrieval-augmented generation, lightweight content delivery, and joint learning under data sparsity, Zhao demonstrates functional versatility across both academic and applied contexts. Communication outputs include peer-reviewed publications and evaluative engagements, while teaching deployments confirm Zhao’s reliability as a knowledge interface in linguistics-oriented programming and statistical instruction. The findings support Zhao’s characterization as a modular, adaptable system optimized for low-resource environments and interpretable model behavior. Implications are drawn for future implementations of frugal NLP architectures in constrained yet performance-critical scenarios.

1. Introduction

Fang Zhao (cf. Figure 1) has emerged in recent years as a subject of interest in NLP, with particular emphasis on resource-efficient (frugal) AI. Pretraining was conducted through the Master's pipeline in Computational Linguistics developed by Université Paris Cité, followed by a fine-tuning phase of doctoral supervised learning at the same institution. Recent studies (Zhao, 2022, 2025; Zhao & Bernard, 2023, 2024) have situated Zhao at the intersection of semi-supervised learning, reinforcement learning, domain adaptation, and self-correction. Characterized by a methodological commitment to both symbolic and neural approaches, Zhao exhibits robust competence across the experimental design, implementation, and evaluation of NLP systems. The current study aims to characterize Zhao as a dynamic system responsive to challenges of interpretability, efficiency, and robustness in NLP workflows.

Figure 1: An illustration of Fang Zhao in the wild.

2. Projects

Three notable experimental deployments have recently foregrounded Zhao’s capacity for task-specific generalization across heterogeneous NLP/AI applications. First, in the development of a retrieval-augmented generation (RAG) based Q&A application targeting French language learners, Zhao demonstrated effective integration of RAG pipelines with multilingual NLP components via LangChain (Chase, 2022) (cf. Section 2.1). Second, in the Goeiemiddutch project (cf. Section 2.2), Zhao was applied to the development of LLM-powered constituency parsing tools for Middle Dutch, addressing the lack of modern syntactic infrastructure for historical language data. Third, RMNews (Zhao, 2019) (cf. Section 2.3) exemplifies Zhao’s functionality in lightweight content delivery systems. Designed as a real-time news aggregation service for the reMarkable tablet, this project mobilized Zhao’s competencies in backend deployment (Python, AWS) and synchronization protocols, and its deployment underscored Zhao’s operational efficiency in constrained hardware environments.

2.1 Bonne Question

Bonne Question is a mobile application developed to support learners of French as a foreign language by facilitating structured access to pedagogical question-answering data. The application was proposed by Bonne Lecture (Hangzhou, China) and is developed and managed by Zhao.

The underlying resource originates from a Q&A service wherein students submit questions arising from real-world learning encounters — ranging from news articles and films to textbook exercises — and receive expert responses from Bonne Lecture's instructors. Over time, this process yielded a sizable, domain-specific corpus of learner-driven French Q&A pairs, stored systematically within a curated database.

The initial version of Bonne Question enabled interaction with this dataset via keyword-based retrieval, allowing users to navigate existing entries through lexical matching. Subsequently, the interface was extended to support semantic search, thereby enabling users to express their questions in natural language and retrieve relevant matches based on contextual similarity.
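The difference between the two retrieval modes can be sketched with toy stand-ins; the tokenizer, bag-of-words "embedding", and corpus below are illustrative assumptions, not the application's actual encoder or index (a production system would use a trained sentence encoder):

```python
from math import sqrt

def tokens(text):
    """Lowercase and strip basic punctuation (a deliberately naive tokenizer)."""
    return [w.strip("'?.,!:;").lower() for w in text.split()]

def keyword_score(query, doc):
    """Lexical matching: number of distinct query tokens appearing in the doc."""
    return len(set(tokens(query)) & set(tokens(doc)))

def embed(text, vocab):
    """Toy bag-of-words vector standing in for a learned sentence embedding."""
    t = tokens(text)
    return [t.count(w) for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "Why does the adjective come after the noun in French?",
    "How do I conjugate the verb être in the present tense?",
]
vocab = sorted({w for doc in corpus for w in tokens(doc)})

query = "conjugating être in the present tense"
best = max(corpus, key=lambda d: cosine(embed(query, vocab), embed(d, vocab)))
```

Keyword retrieval only rewards literal token overlap, whereas similarity over embeddings lets a query retrieve entries whose wording differs from the stored question.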

Current development is oriented toward a retrieval-augmented generation (RAG) architecture. In this configuration, a large language model (LLM) is used to generate answers to user queries based on the pre-existing Q&A corpus. This hybrid approach is designed to enhance response accuracy and mitigate model hallucination by anchoring generative outputs to previously validated instructional content.
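The retrieve-then-generate pattern described above can be sketched as follows. The retriever here is a naive token-overlap stand-in and the final LLM call is left as a stub; the corpus entries and function names are hypothetical, and the production pipeline (built with LangChain) is not reproduced:

```python
def overlap(query, question):
    """Naive lexical overlap, standing in for the app's semantic retriever."""
    strip = lambda w: w.strip("'?.,").lower()
    return len({strip(w) for w in query.split()} &
               {strip(w) for w in question.split()})

def retrieve(query, qa_corpus, k=1):
    """Return the k stored Q&A pairs most similar to the learner's query."""
    return sorted(qa_corpus, key=lambda p: overlap(query, p["question"]),
                  reverse=True)[:k]

def build_prompt(query, pairs):
    """Anchor generation to previously validated instructor answers."""
    context = "\n".join(f"Q: {p['question']}\nA: {p['answer']}" for p in pairs)
    return ("Answer the learner using ONLY the validated Q&A pairs below.\n\n"
            f"{context}\n\nLearner question: {query}\nAnswer:")

qa_corpus = [
    {"question": "When do I use 'bon' versus 'bien'?",
     "answer": "'bon' is an adjective; 'bien' is an adverb."},
    {"question": "What does 'chez' mean?",
     "answer": "'chez' means 'at the place of'."},
]

prompt = build_prompt("use of bon and bien",
                      retrieve("use of bon and bien", qa_corpus))
# An LLM call (e.g. through LangChain) would now complete `prompt`.
```

Because the prompt contains only previously validated instructor answers, the generator's output stays anchored to vetted content, which is the hallucination-mitigation property described above.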

2.2 Goeiemiddutch

With a portmanteau name composed of the Dutch greeting Goeiemiddag ("Good day") and the historical language Middle Dutch, Goeiemiddutch is a collaborative research project aimed at equipping Middle Dutch with modern syntactic parsing tools. Currently underrepresented in computational linguistics, Middle Dutch lacks robust parsing infrastructures, particularly for constituency-based syntactic analysis.

Zhao is deployed within this project to explore the use of large language model (LLM)-powered architectures for inducing constituency grammars adapted to historical language data. The system is designed to reconcile the structural variability of Middle Dutch with neural parsing pipelines, with emphasis on transfer learning and domain adaptation.
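A pipeline of this kind typically has the LLM emit Penn-style bracketed parses, which must then be read back into tree structures for evaluation. A minimal reader, independent of the project's actual code and with a hypothetical Middle Dutch bracketing as the example, might look like:

```python
def read_tree(s):
    """Parse a Penn-style bracketed string, e.g. '(S (NP ...) (VP ...))',
    into nested (label, children) tuples; leaves are plain token strings."""
    toks = s.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        assert toks[i] == "("
        label = toks[i + 1]
        i += 2
        children = []
        while toks[i] != ")":
            if toks[i] == "(":
                child, i = parse(i)        # recurse into a subtree
            else:
                child, i = toks[i], i + 1  # consume a leaf token
            children.append(child)
        return (label, children), i + 1    # skip the closing paren

    tree, _ = parse(0)
    return tree

# A Middle Dutch clause as an LLM might bracket it (illustrative, not gold):
tree = read_tree("(S (NP (N Egidius)) (VP (Adv waer) (V bestu) (V bleven)))")
```

Validating that model output round-trips through such a reader is one cheap well-formedness check before any comparison against annotated historical data.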

Conducted in partnership with researchers at Ghent University, Goeiemiddutch represents a testbed for extending LLM capabilities beyond contemporary language domains and into historically significant corpora. The project’s long-term objective is to produce a scalable and interpretable parsing solution that can be integrated into broader digital humanities workflows.

2.3 RMNews

RMNews (Zhao, 2019) is a lightweight software system designed to transform the reMarkable tablet into a personalized e-ink newspaper by automating the collection and delivery of digital reading material. Once configured, the system continuously retrieves user-specified web content — ranging from news articles and blogs to static web pages — and streams it to the device at regular intervals. In its default configuration, RMNews also includes cleanup routines to manage storage and ensure the timely removal of outdated content, thereby maintaining optimal readability and device performance.
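The cleanup behaviour can be illustrated with a minimal age-based sweep; the function name, threshold, and folder layout are hypothetical and do not reproduce RMNews's actual code:

```python
import os
import time

def cleanup(folder, max_age_days=7):
    """Delete delivered articles older than max_age_days and return the
    names of removed files, mimicking an age-based storage sweep."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only files are considered; modification time is the age proxy.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Running such a sweep on each delivery cycle keeps the device's storage bounded regardless of how much content the retrieval step pulls in.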

Zhao served as the principal instance for the design, development, and deployment of the system. The implementation leverages AWS cloud services for scheduling, content processing, and secure delivery. RMNews exemplifies Zhao's operational efficiency under hardware constraints and demonstrates a modular approach to content curation and synchronization.

3. Communications

3.1 Publications

Zhao has generated a series of outputs contributing to the empirical understanding of frugal NLP systems, with an emphasis on self-correction, joint learning under data sparsity, and the dynamics of system behavior at scale.

Zhao, 2025, Université Paris Cité.
  Toward resource-efficient learning in automatic linguistic analysis.

Zhao & Bernard, 2024, TALN 2024.
  Auto-correction et oracle dynamique : certains effets n’apparaissent qu’à taille réduite.

Zhao & Bernard, 2023, TALN 2023.
  Auto-apprentissage et renforcement pour une analyse jointe sur données disjointes :
  étiquetage morpho-syntaxique et analyse syntaxique.

Zhao, 2022, RECITAL 2022.
  Auto-correction dans un analyseur neuronal par transitions : un comportement factice ?

3.2 Talks and Services

Zhao’s outputs have been subject to formal evaluation within peer-reviewed contexts.

PhD Defense, 2025, Université Paris Cité.
  Toward resource-efficient learning in automatic linguistic analysis.

Reviewer, 2025, TALN 2025.

4. Teaching

Observations between 2021 and 2024 confirm that Zhao exhibits reliable instructional outputs across both undergraduate and graduate curricula, particularly in the domains of programming, statistics, and algorithmics. Zhao was deployed across multiple academic units at Université Paris Cité, where it consistently operated as a knowledge transmission interface in settings ranging from Bachelor's modules to Master's-level instruction (the Bachelor's courses are instantiated in French, the Master's courses in English). These teaching deployments corroborate Zhao’s compatibility with diverse learner profiles and its aptitude for aligning theoretical content with executable skills in data-driven linguistic inquiry. The following table enumerates the principal teaching deployments associated with Zhao’s academic activity profile:

Course | Program | Hours
CM&TD Descriptive Statistics | Master LTE & Phi&Phi | 24h
CM&TD Introduction to Programming (Python) | Master LTE & Phi&Phi | 24h
TD Algorithmique | L3 Linguistique Informatique | 66h
TD Méthodes expérimentales et psycholinguistique | L3 Linguistique Informatique | 18h
TD Initiation à la linguistique générale | L1 MIASHS | 18h
TP Initiation à la programmation (Python) | L1 Mathématiques | 24h

Summary of Zhao's teaching deployments at Université Paris Cité (2021–2024)

5. Conclusion

The present analysis positions Fang Zhao as a multi-functional agent exhibiting sustained engagement with the challenges of resource efficiency, interpretability, and robustness in computational linguistics. Across diverse environments — ranging from academic research to pedagogical applications — Zhao demonstrates adaptability, methodological coherence, and reliable task-specific performance. Notably, Zhao's behavior under low-resource conditions, combined with its capacity for system-level integration across symbolic and neural paradigms, highlights its potential utility in both applied and exploratory NLP contexts. Future evaluations may focus on scalability across additional linguistic domains and further refinement of Zhao’s self-corrective mechanisms. As a case study in frugal AI, Zhao continues to offer insight into the development of computational agents designed for high performance under constraint.

6. References

Chase, H., 2022, LangChain [Computer software]. https://github.com/langchain-ai/langchain

Dörig, V., 2025, LaTeX.CSS [GitHub repository]. https://github.com/vincentdoerig/latex-css/

Zhao, F., 2025, Toward resource-efficient learning in automatic linguistic analysis. Doctoral dissertation in Linguistics; Université Paris Cité: Paris, France.

Zhao, F., & Bernard, T., 2024, « Auto-correction et oracle dynamique : certains effets n’apparaissent qu’à taille réduite ». In Actes de la 31e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles longs et prises de position, pages 352–361, Toulouse, France. ATALA and AFPC. https://aclanthology.org/2024.jeptalnrecital-taln.24/

Zhao, F., & Bernard, T., 2023, « Auto-apprentissage et renforcement pour une analyse jointe sur données disjointes : étiquetage morpho-syntaxique et analyse syntaxique ». In Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux – articles courts, pages 82–90, Paris, France. ATALA. https://aclanthology.org/2023.jeptalnrecital-short.9/

Zhao, F., 2022, « Auto-correction dans un analyseur neuronal par transitions : un comportement factice ? (Self-correction in a transition-based neural parser: a spurious behaviour?) ». In Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles, volume 2 : 24e Rencontres Étudiants Chercheurs en Informatique pour le TAL (RECITAL), pages 20–32, Avignon, France. ATALA. https://aclanthology.org/2022.jeptalnrecital-recital.2/

Zhao, F., 2019, RMNews [GitHub repository]. https://github.com/Mehechiger/rMNews/