中文分词
- Chinese word segmentation
-
基于双数组Trie树中文分词研究
Research of Chinese Word Segmentation Based on Double-Array Trie
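The double-array Trie named in this title compresses a trie's transitions into two parallel integer arrays (conventionally `base` and `check`), so dictionary lookup costs a few array probes per character. As a simplified illustration only (a plain nested-dict trie, not the double-array encoding itself), the sketch below shows the prefix lookup that the double-array structure accelerates; all names and the tiny dictionary are illustrative:

```python
END = object()  # sentinel key marking the end of a dictionary word

def build_trie(words):
    """Build a nested-dict trie from a word list."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def prefixes_in_trie(trie, text):
    """Return every dictionary word that is a prefix of `text`."""
    hits, node = [], trie
    for i, ch in enumerate(text):
        node = node.get(ch)
        if node is None:
            break
        if END in node:
            hits.append(text[:i + 1])
    return hits

trie = build_trie(["中", "中文", "中文分词"])
print(prefixes_in_trie(trie, "中文分词研究"))  # → ['中', '中文', '中文分词']
```

Segmenters call this prefix lookup at every character position, which is why the double-array encoding's constant-time transitions matter in practice.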
-
近五年我国中文分词研究论文计量分析
Bibliometric Study on Chinese Word Segmentation Papers of China in the Past Five Years
-
WEB文本挖掘的中文分词系统的设计与实现
Design and Implementation of a Chinese Word Segmentation System for Web Text Mining
-
基于中文分词的OWL-S/UDDI语义Web服务检索模型
An OWL-S/UDDI Semantic Web Service Retrieval Model Based on Chinese Word Segmentation
-
Web文本挖掘中的一种中文分词算法研究及其实现
Research and Implementation of a Chinese Automatic Word-Segmentation Algorithm in Web Text Mining
-
基于K最短路径的中文分词算法研究与实现
Research and Implementation of Chinese Word Segmentation Algorithm Based on K Shortest Paths
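The K-shortest-paths approach in this title builds a word lattice over the sentence and keeps the K candidate segmentations with the lowest cost (often fewest words) for later disambiguation. As a simplified sketch, assuming a toy dictionary, the code below finds the single shortest path (K = 1) by dynamic programming; keeping the K best partial paths per position instead of one yields the full algorithm:

```python
def shortest_path_segment(text, dictionary):
    """Segment `text` into the fewest words, where every dictionary word
    (and, as a fallback, every single character) is an edge in the lattice."""
    n = len(text)
    best = [None] * (n + 1)   # best[i] = (word count, segmentation) for text[:i]
    best[0] = (0, [])
    for i in range(n):
        if best[i] is None:
            continue
        count, seg = best[i]
        for j in range(i + 1, n + 1):
            word = text[i:j]
            # Single characters are always allowed as fallback edges.
            if j - i == 1 or word in dictionary:
                if best[j] is None or count + 1 < best[j][0]:
                    best[j] = (count + 1, seg + [word])
    return best[n][1]

dictionary = {"研究生", "研究", "生命", "命", "的", "起源"}
print(shortest_path_segment("研究生命的起源", dictionary))
# → ['研究', '生命', '的', '起源']
```

The classic ambiguous sentence above has two plausible lattice paths (研究生/命 vs. 研究/生命); the fewest-words criterion alone cannot always pick correctly, which is exactly why the paper keeps K candidates for a later disambiguation stage.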
-
主要从Web端的用户查询接口和中文分词技术等两个方面进行了阐述。
This paper mainly discusses two aspects: the Web-side user query interface and Chinese word segmentation technology.
-
主要研究了基于隐马尔科夫模型的中文分词以及全文检索引擎,系统的主体是使用C语言开发完成,是一个简单易用,功能强大的舆情监控系统。
It mainly studies hidden-Markov-model-based Chinese word segmentation and a full-text search engine. The main body of the system is developed in C, resulting in an easy-to-use yet powerful public opinion monitoring system.
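HMM-based segmentation, as mentioned above, is commonly cast as tagging each character B/M/E/S (begin, middle, end of a word, or single-character word) and decoding with Viterbi. A minimal sketch follows; the probabilities are toy, hand-picked values purely for illustration, whereas a real system estimates them from a segmented corpus:

```python
import math

STATES = "BMES"          # Begin / Middle / End / Single-character word
NEG = -1e9               # stand-in for log(0)

# Toy log-probabilities chosen only to make the example run.
start = {"B": math.log(0.6), "S": math.log(0.4), "M": NEG, "E": NEG}
trans = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.6), "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "S": math.log(0.4)},
}

def viterbi(chars, emit):
    """emit[state].get(char) -> log P(char | state); returns best BMES tag string."""
    v = [{s: start[s] + emit[s].get(chars[0], NEG) for s in STATES}]
    back = []
    for ch in chars[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + trans[p].get(s, NEG))
            row[s] = v[-1][prev] + trans[prev].get(s, NEG) + emit[s].get(ch, NEG)
            ptr[s] = prev
        v.append(row)
        back.append(ptr)
    state = max(("E", "S"), key=lambda s: v[-1][s])  # a word must end here
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

emit = {
    "B": {"中": math.log(0.8), "分": math.log(0.8)},
    "M": {},
    "E": {"文": math.log(0.8), "词": math.log(0.8)},
    "S": {},
}
print(viterbi("中文分词", emit))  # → BEBE, i.e. two two-character words
```

Cutting the character string wherever a tag is E or S turns the tag path BEBE into the segmentation 中文 / 分词.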
-
然后对网络蜘蛛、移动远程信息收集Agent、中文分词和全文索引建立、个人偏好处理以及搜索引擎信息检索等相关技术进行了分析研究;
It then analyzes related technologies: the web spider, the mobile remote information-gathering agent, Chinese word segmentation and full-text index construction, personal preference processing, and search engine information retrieval.
-
本文首次使用SVM方法来完成中文分词的任务,使用上下文窗体属性和基于规则的属性对样本进行刻画。
This paper is the first to apply SVM to the Chinese word segmentation task, using context-window attributes and rule-based attributes as features to characterize each sample.
-
通过Spider自动抓取页面技术、中文分词等技术方法,设计了Web文本挖掘原型,对实用的Web挖掘系统的开发具有较好的参考价值。
Using Spider-based automatic page crawling and Chinese word segmentation techniques, a Web text mining prototype is designed, providing a useful reference for the development of practical Web mining systems.
-
Katz平滑算法在中文分词系统中的应用
Application of Katz Smoothing in Chinese Word Segmentation System
-
基于中文分词和全文检索技术的OPAC资源整合探讨
The Integration of OPAC System Based on Chinese Word Segmentation and Full-text Retrieval Technology
-
根据web文本环境的特点,研究重点在于中文分词中的未登录词识别问题,同时兼顾切分歧义消解、整体切分准确率和高效处理海量文本的能力。
According to the characteristics of web text, we focus on the out-of-vocabulary (OOV) word identification problem in Chinese word segmentation, while also addressing segmentation ambiguity resolution, overall segmentation accuracy, and the ability to process massive amounts of text efficiently.
-
但是Lucene仍有许多不足的地方需要进行改进,特别是在中文分词的处理上。
However, Lucene still has many shortcomings that need improvement, particularly in its handling of Chinese word segmentation.
-
DI索引方法采用倒排文件索引机制及中文分词技术,建立了绝对索引模型和相对索引模型,能有效支持各种形式的路径表达式,又不会占用过大的空间。
Applying inverted-file indexing and Chinese word segmentation techniques, the DI indexing method builds an absolute index model and a relative index model. It can efficiently support path expressions in various forms without occupying excessive space.
-
搜索引擎作为传统IR技术在Web上的扩展,涉及至数据收集、中文分词技术、倒排索引、隐含数据获取、分布式结构、海量数据存储、用户行为分析等关键技术。
As an extension of traditional IR techniques to the Web, search engines involve key technologies such as data collection, Chinese word segmentation, inverted indexing, hidden data acquisition, distributed architecture, massive data storage, and user behavior analysis.
-
在保证分词速度的同时,也提高了结果的准确率。再次,在基于词典和统计的中文分词算法的基础上,设计并运用JAVAWeb技术实现了中文分词系统。
While guaranteeing segmentation speed, the accuracy of the results is also improved. Furthermore, on the basis of the dictionary- and statistics-based Chinese word segmentation algorithm, a Chinese word segmentation system is designed and implemented using Java Web technology.
-
此Ftp搜索引擎不仅能够自动生成标准格式的XML资源文档,而且采用基于字典的前向最大匹配中文分词法在Lucene中动态更新全文索引。
This FTP search engine can not only automatically generate XML resource documents in a standard format, but also applies dictionary-based forward maximum matching for Chinese word segmentation to dynamically update the full-text index in Lucene.
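Forward maximum matching, as used above, scans left to right and greedily takes the longest dictionary word at each position. A minimal sketch, assuming a toy dictionary (a real system loads a large lexicon and sets `max_len` to its longest entry):

```python
def fmm_segment(text, dictionary, max_len=4):
    """Greedily match the longest dictionary word starting at each position."""
    words = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest candidate first, shrinking until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                matched = candidate
                break
        if matched is None:
            matched = text[i]  # fall back to a single character
        words.append(matched)
        i += len(matched)
    return words

dictionary = {"中文", "分词", "中文分词", "搜索", "引擎", "搜索引擎"}
print(fmm_segment("中文分词搜索引擎", dictionary))
# → ['中文分词', '搜索引擎']
```

Greedy longest-match is fast and simple, which suits index-building workloads like the FTP engine above, at the cost of occasional ambiguity errors that statistical methods handle better.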
-
其中标准检索模块采用Lucene搜索引擎技术,利用ICTCLAS中文分词组件,来实现有效的主题词检索。
The standard retrieval module uses Lucene search engine technology together with the ICTCLAS Chinese word segmentation component to achieve effective subject-term retrieval.
-
针对第一部分的工作,作者对系统中所涉及到的关键技术数据挖掘技术、LDA模型、中文分词进行了研究,给出了精准广告系统的设计方案并对各模块的功能进行了说明。
For the first part of the work, the author studies the key technologies involved in the system (data mining, the LDA model, and Chinese word segmentation), presents the design scheme of the precision advertising system, and explains the function of each module.
-
中文分词采用的中国科学院计算技术研究所汉语词法分析系统ICTCLAS的开源代码。
Chinese word segmentation uses the open-source code of ICTCLAS, the Chinese lexical analysis system developed by the Institute of Computing Technology, Chinese Academy of Sciences.
-
2003年在日本札幌举行了第一届ACL-SIGHAN国际中文分词竞赛。
The first ACL-SIGHAN International Chinese Word Segmentation Bakeoff was held in Sapporo, Japan, in July 2003.
-
接着阐述利用ICTCLAS分词工具和旅游领域词汇相结合进行的中文分词处理,停用词过滤的分析。
It then describes Chinese word segmentation performed by combining the ICTCLAS segmentation tool with a tourism-domain vocabulary, and analyzes stop-word filtering.
-
中科院计算所采用基于层次隐马尔科夫模型,自主研发出来的ICTCLAS中文分词系统对本文提供了莫大的帮助。
The ICTCLAS Chinese word segmentation system, independently developed by the Institute of Computing Technology, Chinese Academy of Sciences, on the basis of a hierarchical hidden Markov model, was of great help to this paper.
-
在分词和索引部分,详细地探讨中文分词和Lucene的索引机制,最后概述了检索的原理。紧接着,本文开始着手基于Lucene的音乐资讯垂直搜索引擎的分析和设计。
In the section on word segmentation and indexing, Chinese word segmentation and Lucene's indexing mechanism are discussed in detail, followed by an overview of the principles of retrieval. The thesis then turns to the analysis and design of a Lucene-based vertical search engine for music news.
-
本文针对词法分析中的中文分词、词性标注和动词细分类进行了深入的研究并实现了一个实用化的词法分析系统IRLAS。
This paper makes an intensive study of Chinese word segmentation, part-of-speech tagging, and verb sub-classification in lexical analysis, and implements a practical lexical analysis system named IRLAS.
-
最初，它是以开源项目Lucene为应用主体的，结合词典分词和文法分析算法的中文分词组件。
Initially, it was a Chinese word segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis algorithms.
-
对检索技术、排序算法和中文分词技术进行了重点研究和总结,并针对词典分词法的不足,使用了改进的基于三数组Trie索引树匹配算法,充分实现了智能分词的原则。
Retrieval technology, ranking algorithms, and Chinese word segmentation techniques are studied and summarized with emphasis. To address the shortcomings of dictionary-based segmentation, an improved matching algorithm based on a triple-array Trie index tree is used, fully realizing the principle of intelligent segmentation.
-
提出了一种基于专有名词优先的中文分词方法:利用专业词典、通用词典和同义词词典相结合的词典机制,优先切分专有名词,对粗分结果利用Trigram模型进行消歧而获取最终结果。
A Chinese word segmentation method giving priority to proper nouns is proposed: using a dictionary mechanism that combines a specialized dictionary, a general dictionary, and a synonym dictionary, proper nouns are segmented first; the coarse segmentation result is then disambiguated with a Trigram model to obtain the final result.
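The final disambiguation step described here scores candidate coarse segmentations with a trigram model and keeps the best one. A minimal sketch with add-one smoothing; the counts, vocabulary size, and candidate segmentations below are all toy values chosen only to make the example run:

```python
import math

def trigram_logprob(words, trigram_counts, bigram_counts, vocab_size=10):
    """Score a segmentation as the sum of log P(w3 | w1, w2), add-one smoothed."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    lp = 0.0
    for i in range(2, len(padded)):
        tri = tuple(padded[i - 2:i + 1])
        bi = tuple(padded[i - 2:i])
        lp += math.log((trigram_counts.get(tri, 0) + 1) /
                       (bigram_counts.get(bi, 0) + vocab_size))
    return lp

def disambiguate(candidates, trigram_counts, bigram_counts):
    """Pick the candidate segmentation with the highest trigram score."""
    return max(candidates,
               key=lambda c: trigram_logprob(c, trigram_counts, bigram_counts))

# Toy counts favouring the reading 北京 / 大学生 over 北京大学 / 生
tri = {("<s>", "<s>", "北京"): 5, ("<s>", "北京", "大学生"): 4}
bi = {("<s>", "<s>"): 10, ("<s>", "北京"): 5}
cands = [["北京", "大学生"], ["北京大学", "生"]]
print(disambiguate(cands, tri, bi))  # → ['北京', '大学生']
```

The sentence 北京大学生 is a standard ambiguity example (Beijing / university students vs. Peking University / student); with counts estimated from a real corpus rather than the toy values above, the trigram score resolves such cases from context.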