﻿<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='/ui/templates/normal/view.xsl'?>
<html RelativePath="/" class="103002" site="103002">
	<head>
		<title>Text mining boosts knowledge system construction _Sass English Web-RESEARCH-Commentary</title>
		<meta id="5191" name="description" content="Text mining boosts knowledge system construction " />
	</head>
	<body>
		<span>
			<a href="../103002/default.aspx">Commentary</a>
			<a href="../103/default.aspx">RESEARCH</a>
		</span>
		<div>
			<h1>Text mining boosts knowledge system construction</h1>
			<h2 />
			<h3>Aria</h3>
			<h4>2025-04-28 03:13</h4>
			<h5>WANG SHUNYU</h5>
			<a href="http://www.csstoday.com/Item/13749.aspx">Chinese Social Sciences Today</a>
			<p><![CDATA[<form><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">With the arrival of the digital age, knowledge is expanding, intersecting, and spreading at an unprecedented pace. The widespread adoption of the internet and the improvement of academic databases have created an immense repository of textual data for knowledge discovery. Academic literature, professional books, research reports, and course materials in electronic form are now abundant and easily accessible. Traditional methods of organizing knowledge—reading and summarizing texts manually—are increasingly inadequate when faced with such volume. They are prone to sampling bias and may overlook key information due to the subjectivity of human judgment. Tasks such as extracting discipline-specific concepts, classifying ideas, constructing relational networks, and analyzing paradigm shifts now call for more precise, efficient, and intelligent methods.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Text mining, also known as text data mining, refers to the process of extracting high-value, latent information from large volumes of unstructured or semi-structured text—information that conventional methods cannot readily obtain. This technology plays a crucial role in building concept, theory, method, and application systems. Conceptual systems aim to map out core disciplinary concepts and their interconnections. 
Theoretical systems focus on integrating and refining theoretical knowledge. Methodological systems seek to optimize and innovate research techniques. Application systems emphasize putting academic knowledge into practice. In navigating the complexities of the digital era, text mining emerges as a vital tool for advancing the construction of comprehensive knowledge systems.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; font-weight:bold; text-transform:none">Technical foundation</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; font-weight:bold; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Text mining combines natural language processing (NLP), machine learning, statistics, and data mining. Core tasks include text preprocessing, feature extraction, classification, clustering, sentiment analysis, entity recognition, relationship extraction, and topic modeling. 
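To make the preprocessing step concrete, the sketch below (with a hypothetical toy sentence and a deliberately tiny stopword list) lowercases the text, tokenizes it, and removes stopwords; real pipelines rely on curated, language-specific resources.

```python
import re

# Deliberately tiny stopword list; production pipelines use curated,
# language-specific lists.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "from"}

def preprocess(text):
    """Lowercase, tokenize on letter runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("Text mining extracts latent information from unstructured text."))
# → ['text', 'mining', 'extracts', 'latent', 'information', 'unstructured', 'text']
```

Downstream tasks such as feature extraction and topic modeling all operate on token lists of this kind.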
</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">For the purpose of knowledge system construction, key techniques include feature engineering, dimensionality reduction, semantic networks, topic modeling, and time series analysis. Used in concert, these tools can extract core concepts from academic literature, establish conceptual relationships, support theoretical development, and facilitate multi-level analysis of research methods and subjects—at the macro, meso, and micro levels—while tracking the trajectory of academic inquiry.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Feature engineering plays a foundational role in identifying core academic concepts. It includes techniques such as the bag-of-words model, term frequency–inverse document frequency (TF-IDF), topic modeling, and word embeddings. The bag-of-words model treats a text as a collection of words and identifies key terms by analyzing frequency. TF-IDF weighs both the frequency of terms within a document and their rarity across documents to highlight key concepts. Topic models like Latent Dirichlet Allocation (LDA) uncover underlying themes through word co-occurrence patterns. 
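A minimal, self-contained sketch of the TF-IDF weighting just described may be useful (the three-document corpus is hypothetical, and established libraries offer smoothed and tuned variants of the formula):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for tokenized documents.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["text", "mining", "knowledge"],
        ["knowledge", "system", "construction"],
        ["text", "mining", "methods"]]
w = tf_idf(docs)
# "knowledge" appears in 2 of 3 documents, so it is down-weighted
# relative to "system", which appears in only 1.
```

Terms with high weights across a corpus are candidates for core disciplinary concepts.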
Word embeddings, such as Word2Vec and GloVe, map words into low-dimensional vector spaces, allowing for the clustering of semantically similar terms to identify conceptual groupings.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Dimensionality reduction is necessary due to the high complexity of textual data. Techniques such as correspondence analysis and t-distributed stochastic neighbor embedding (t-SNE) project high-dimensional data into lower dimensions, making patterns and relationships easier to detect. Correspondence analysis processes word-frequency matrices, computes silhouette coefficients, and visualizes the associations between terms and documents, helping to scaffold a knowledge system. T-SNE clusters data points in reduced-dimensional space; the resulting groupings and their spatial relationships clarify key topics, their interconnections, and hierarchical structures, offering a foundation for organizing knowledge.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Topic modeling includes techniques such as Latent Semantic Analysis, LDA, Dynamic Topic Models, Structural Topic Models, and Biterm Topic Models, which are particularly well-suited to large-scale text analysis. Implementation involves preprocessing, selecting appropriate topic numbers and algorithms, and then interpreting and labeling the extracted topics in line with academic standards. Topic relationships are determined by analyzing similarity and co-occurrence frequencies, which helps clarify their logical connections. 
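One simple way to quantify the similarity between two extracted topics, each represented by its top words, is set overlap. A minimal sketch (the topic word lists below are hypothetical):

```python
def jaccard(a, b):
    """Overlap of two topics' top-word sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical top words from two extracted topics.
topic_a = ["network", "node", "centrality", "community"]
topic_b = ["network", "graph", "node", "path"]
print(round(jaccard(topic_a, topic_b), 3))  # 2 shared words / 6 total → 0.333
```

In practice, distribution-based measures over the full topic-word probabilities are also common; the word-overlap version shown here is the simplest case.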
The final step involves distilling key knowledge elements to form structured knowledge units, thereby supporting the construction of a cohesive knowledge system framework.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Semantic networks play a critical role in revealing knowledge structures. They encompass knowledge representation, association mining, structural analysis, inference, and visualization. In knowledge representation and modeling, academic terms and concepts are abstracted as nodes, and their relationships as edges, forming a network. Co-occurrence analysis and semantic similarity measures are used to uncover hidden associations. Structural analysis employs metrics like node centrality and community detection to assess the importance of concepts and define subfields. Inference techniques trace indirect connections and potential knowledge by searching network paths, providing support for academic research.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Time series analysis treats academic data as sequences that change over time, enabling the identification of patterns, trends, and models. 
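The co-occurrence-based semantic network described above can be sketched minimally as follows: terms become nodes, shared-document co-occurrence becomes an edge, and a simple degree count stands in for centrality (the two-document corpus is hypothetical; real systems weight edges and distinguish relation types):

```python
from collections import defaultdict
from itertools import combinations

def build_network(docs):
    """Link every pair of terms that co-occur in the same document."""
    graph = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Simple importance measure: number of neighbors per term."""
    return {term: len(nbrs) for term, nbrs in graph.items()}

docs = [["text", "mining", "knowledge"],
        ["knowledge", "graph", "ontology"]]
g = build_network(docs)
print(degree_centrality(g)["knowledge"])  # bridges both documents → 4
```

Even in this toy network, "knowledge" emerges as the most central node because it bridges the two documents, which is exactly the kind of structural signal centrality metrics are meant to surface.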
By extracting time-related features, plotting trends, applying spectral analysis, and examining sequence correlations, researchers can track knowledge development and forecast emerging research directions.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; font-weight:bold; text-transform:none">Application prospects</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; font-weight:bold; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">To date, text mining has already yielded progress in the construction of knowledge systems, particularly in three areas: bibliometric research; the derivation and tracking of academic concepts; and the development and application of new tools in ontology engineering.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Bibliometric research applies quantitative methods to analyze academic literature, citation networks, and keyword co-occurrence, helping to reveal the internal logic and structure of scholarly development. 

Citation analysis, for instance, can trace the evolution of core literature, identify key scholars and institutions, and provide empirical support for the formation of disciplinary systems. High-frequency and emergent keyword analysis helps capture research frontiers and trending topics, offering guidance for the dynamic updating of knowledge systems. Analyses of international collaboration networks also shed light on academic globalization and foster interdisciplinary integration and innovation.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">The derivation and tracking of academic concepts is particularly valuable in the humanities and social sciences, where topic modeling can help analyze historical texts to identify core issues and shifts in thought. For example, Arjuna Tuzzi has used correspondence analysis and topic analysis to examine the historical development of academic literature; Giuseppe Giordan and colleagues have applied topic models to abstracts from leading American sociology journals to trace the discipline’s evolution; and Wang Shunyu and Chen Ruizhe have used structural topic models to analyze the abstracts of papers related to the Belt and Road Initiative, highlighting regional differences in scholarly focus. 
Text mining has also been applied to terminology standardization and academic influence assessment, offering technical support for the normalization and continual refinement of knowledge systems.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Finally, the development and application of new tools in ontology engineering has greatly enhanced the systematization and intelligent construction of knowledge systems. These tools automatically extract key concepts, terms, and their semantic relationships to build structured knowledge networks that expose the internal logic and evolution of disciplinary knowledge. In computer science, platforms such as Protégé have been used to develop the Web Ontology Language (OWL), providing a standardized framework for knowledge representation and reasoning in artificial intelligence. In the social sciences, semantic analysis tools are used to mine policy texts and construct knowledge graphs, supplying a scientific basis for policymaking and evaluation. These tools address persistent challenges such as vague concepts and unclear relationships, and—through dynamic updating and cross-domain integration—help drive the ongoing evolution and innovation of knowledge systems.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Nevertheless, building knowledge systems through text mining still presents challenges. 
First, natural language is inherently complex, marked by ambiguity, polysemy, metaphor, and flexible syntax, all of which complicate the identification and extraction of key concepts. Second, text quality also varies, with common issues such as misspellings, grammatical errors, and inconsistent abbreviations. Noise from irrelevant or redundant content, including advertisements, can further hinder accurate concept extraction and raise processing costs. Third, some disciplines are highly specialized, with unique terminologies, while others—especially emerging interdisciplinary fields—lack standardized conceptual boundaries. These conditions require researchers to possess deep domain knowledge and analytical precision, making knowledge extraction more demanding. Finally, many semantic relationships—such as causality, hyponymy or coordination—are implicit and require advanced analysis and reasoning to be accurately captured and represented in semantic networks.</span><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">&nbsp;</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; text-transform:none">Text mining presents new opportunities for building knowledge systems and offers significant academic value. By automating the processing of vast academic corpora, it enables efficient extraction of key concepts, terms, and their semantic relationships, supporting the systematization and structuring of disciplinary knowledge. It can also detect research frontiers and emerging trends, revealing the inner dynamics of knowledge evolution and informing strategic planning. 
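The trend detection mentioned above can be sketched in its simplest form: fit an ordinary least-squares slope to yearly counts of a keyword, and treat a clearly positive slope as a sign of an emerging topic (the annual counts below are hypothetical):

```python
def trend_slope(counts):
    """Ordinary least-squares slope of yearly counts over time."""
    n = len(counts)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(counts) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, counts))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical annual paper counts for a keyword over six years.
counts = [3, 5, 8, 13, 21, 30]
print(trend_slope(counts) > 0)  # a rising slope flags an emerging topic
```

Real frontier-detection methods add burst detection and significance testing on top of this basic idea, but the underlying signal is the same.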
Furthermore, text mining fosters interdisciplinary integration and innovation, offering methodological tools for the growth of new, hybrid fields. Leveraging these capabilities will advance the refinement and expansion of knowledge systems and support the development of an independent knowledge system.</span></p><p style="background-color:#ffffff; line-height:20pt; margin:0pt 0pt 11.25pt; text-align:justify"><span style="background-color:#ffffff; color:#333333; font-family:&#39;Times New Roman&#39;; font-size:12pt; font-style:normal; font-weight:bold; text-transform:none">Wang Shunyu is a professor from the School of Foreign Languages at Xijing University.</span></p><p style="line-height:20pt; margin:0pt; orphans:0; text-align:justify; widows:0"><span style="font-family:&#39;Times New Roman&#39;; font-size:12pt">&nbsp;</span></p><p><br/></p></form>]]></p>
			<b>2025-04-28 11:14</b>
			<dd>541</dd>
			<ul>
				<li />
				<li />
				<li />
			</ul>
		</div>
		<a id="perv" href="../103002/5190.aspx">Climate leadership at a critical time</a>
		<a id="next" href="../103002/5197.aspx">Newly revised law contributes to urban heritage preservation</a>
		<ul id="new">
			<li>
				<a href="../103002/5557.aspx">Chinese sociology needs to renew holistic perspectives</a>
			</li>
			<li>
				<a href="../103002/5556.aspx">Davos: When Actions Speak Louder than Words</a>
			</li>
			<li>
				<a href="../103002/5554.aspx">Collective Prosperity: Zhejiang's Rural Cluster Development Model</a>
			</li>
			<li>
				<a href="../103002/5553.aspx">Investment in people a key policy imperative</a>
			</li>
			<li>
				<a href="../103002/5549.aspx">Smart Cities in China: Lessons for Data-Driven Urban Governance</a>
			</li>
			<li>
				<a href="../103002/5548.aspx">Adaptive co-evolution China’s five-year plans promote the symbiotic development of technology and institutions</a>
			</li>
			<li>
				<a href="../103002/5547.aspx">New quality productive forces to promote green lifestyles</a>
			</li>
			<li>
				<a href="../103002/5546.aspx">Innovation anchors China's high-quality growth in new five-year plan</a>
			</li>
			<li>
				<a href="../103002/5543.aspx">Rural elderly care entails multistakeholder participation</a>
			</li>
			<li>
				<a href="../103002/5542.aspx">Bold leap toward the future</a>
			</li>
		</ul>
		<ul id="hot">
			<li>
				<a id="472" href="../103002/5525.aspx">Pioneering path</a>
			</li>
			<li>
				<a id="371" href="../103002/5503.aspx">Stabilizing engine for the world</a>
			</li>
			<li>
				<a id="323" href="../103002/5500.aspx">Agreement must move on up</a>
			</li>
			<li>
				<a id="265" href="../103002/5480.aspx">World looks to new engines for growth in 2026</a>
			</li>
			<li>
				<a id="240" href="../103002/5517.aspx">2025 a year of global health milestones, challenges</a>
			</li>
			<li>
				<a id="233" href="../103002/5476.aspx">World looks to new engines for growth in 2026</a>
			</li>
			<li>
				<a id="219" href="../103002/5499.aspx">Transition empowerment</a>
			</li>
			<li>
				<a id="193" href="../103002/5498.aspx">Stability, openness and development</a>
			</li>
			<li>
				<a id="176" href="../103002/5488.aspx">How Livestream E-Commerce Connects Digital Commerce with Cultural Heritage</a>
			</li>
			<li>
				<a id="152" href="../103002/5490.aspx">Investing in People: China’s New Growth Priority</a>
			</li>
		</ul>
	</body>
</html>