yǔ liào
  • corpus; data
  1. Space到底该怎么定名?——一项基于真实语料的调查研究

    The Denomination of Space: A Corpus-Based Study

  2. 还给出了新的算法在2001文本检索会议(TREC10)的Web语料上的实验数据。

    Experimental data for the new algorithm on the Web corpus of the 2001 Text Retrieval Conference (TREC 10) are also given.

  3. 基于Ontology的Web语料的挖掘

    Ontology-Based Mining of Web Corpora

  4. 提出了Web语料库的概念,并且通过讨论Web信息的抽取、分类及语料的标注等来讲述语料库的构建。

    This thesis introduces the concept of a Web corpus and describes corpus construction by discussing Web information extraction, classification, and corpus annotation.

  5. 因特网语料自动下载分析软件的设计利用P2P网络传播技术上传和下载文件行为到底是侵权还是合法?

    The Design of Software for Automatically Downloading and Analysing Internet Corpora. Is Uploading and Downloading Files with P2P Technology an Infringement or Legal?

  6. 翻译过程中翻译策略和翻译单位的TAPs语料研究

    Investigating Translation Strategies and Units in the Translation Process: A Think-Aloud Protocol Study

  7. 本论文以英语和汉语的真实语料为基础,拟对空语类PRO在英、汉语中的照应性约束特征进行研究。

    Based on authentic English and Chinese data, this paper investigates the anaphoric binding properties of PRO, one type of empty category, in English and Chinese.

  8. 然后使用信息增益(IG)对得到的特征进行特征选择。最后使用SVM和NB分类器对语料的特征向量进行分类。

    We then used information gain (IG) to select among the extracted features. Finally, SVM and NB classifiers were used to classify the feature vectors of the corpus.

  9. 第一章介绍了bilεn语法化研究的意义、现状、方法、语料来源等内容。

    Chapter 1 introduces the significance, current state, methods, and corpus sources of the study of the grammaticalization of bilεn.

  10. 本文基于大量真实的WTO语料,考察WTO文本的语言现象,分析其特有的句法特征,并探讨其汉译的一些策略。

    Based on a large corpus of authentic WTO texts, this paper examines their linguistic phenomena, analyzes their distinctive syntactic features, and explores strategies for translating them into Chinese.

  11. 一般认为,V+感觉N结构中定语时态ル形居多,但我们利用语料库得出了不同的结论。

    It is generally held that the attributive clause in the "V + sense noun" structure mostly takes the ru-form (ル形) tense. However, our corpus-based study arrived at a different conclusion.

  12. 通过对语料进行潜在语义索引和增加FAQ反馈,不断增强系统的回答能力。

    The system's answering ability is continually improved by applying latent semantic indexing (LSI) to the corpus and adding FAQ feedback.

  13. 我们还开发了一个标注辅助工具,收集了200篇突发事件领域的新闻报道作为生语料并对其进行了标注,制作了一个中文事件语料库(Chinese Event Corpus, CEC)。

    Furthermore, we developed an annotation tool, collected 200 news reports on emergencies as a raw corpus, and annotated them to build a Chinese Event Corpus (CEC).

  14. 对于语者辨识,语者特定模型直接用语者的语料借助于期望值最大化算法(EM)来训练,辨识算法采用了最大事后概率法则(MAP);

    For speaker identification, speaker-specific models are trained directly on each speaker's data with the expectation-maximization (EM) algorithm, and identification follows the maximum a posteriori (MAP) criterion;

  15. 本文中所用的双语语料是LDC的关于香港的双语新闻报道。

    The bilingual corpus used in this paper consists of the LDC's bilingual news reports on Hong Kong.

  16. 针对大规模单语语料资源,提出了采用B-tree结构的二级索引机制;

    For large-scale monolingual corpus resources, a two-level indexing mechanism based on a B-tree structure is proposed;

  17. 以此理论框架为指导,本文作者根据收集的语料探讨了ESP教师提问中常用的提问策略,旨在研究教师提问的顺应过程。

    Guided by this theoretical framework, the author draws on the collected data to discuss the questioning strategies commonly used in ESP classes, aiming to examine the adaptation process of ESP teachers' questioning in the Chinese context.

  18. 作为一种古老的英语修辞格,pun通常被定义为通过运用一词多义或同音异义手段来使一段语料同时表达多种含义。

    The pun, an old rhetorical device in English, is commonly defined as the use of polysemy or homophony to make a stretch of language express several meanings at once.

  19. 传统的统计方法基于贪婪原则,常以语料的似然函数或困惑度(perplexity)作为评价标准。

    Conventional statistical clustering methods are usually based on the greedy principle, and commonly take the likelihood function or the perplexity of the corpus as the evaluation criterion.

  20. 搭建了平行语料联合训练条件下基于GMM模型的语音转换平台作为基准系统,并具体分析了传统语音转换方法存在的问题。

    A GMM-based voice conversion platform, jointly trained on parallel corpora, was built as the baseline system, and the problems of traditional voice conversion methods were analyzed in detail.

  21. 应用于以BNC等新闻语料和COSE校园搜索中的实体关系网络搭建的两个系统中。

    The visualization system has been applied in two systems: one built on news corpora such as the BNC, and one built on the entity relationship network in COSE campus search.

  22. 评测结果对比表明基于tf-idf词权重信息的余弦相似度方法改善了缺少评测语料的多文档文摘自动评测的质量。

    A comparison of evaluation results shows that cosine similarity with tf-idf term weighting improves the quality of automatic multi-document summarization evaluation when evaluation data are scarce.

  23. 采用中国学习者语料库(CLEC)和本族语语料库(Brown和LOB语料库)中的真实语料,对中国大学生的动词/名词搭配使用行为进行研究。

    The verb/noun collocation behaviour of Chinese college students is investigated using authentic language data from the Chinese Learner Corpus (CLEC) and from native-speaker corpora (Brown and LOB).

  24. 实验表明,在仅有几千条标准汉英双语语料的情况下,两个系统开放测试的BLEU评分分别为0.7167和0.5531,基本达到了实用化的翻译水平。

    Experiments show that, with only a few thousand standard Chinese-English sentence pairs, the two systems achieve BLEU scores of 0.7167 and 0.5531 respectively in the open test, essentially reaching a practical level of translation quality.

  25. 本文以BBS会话内容为语料来源,以语言顺应论为理论基础,运用定性和定量的研究方法,从语用学角度研究BBS语篇中中英语码转换现象。

    Taking BBS conversations as its data source and Linguistic Adaptation Theory as its theoretical basis, this thesis uses qualitative and quantitative methods to study Chinese/English code-switching in BBS discourse from a pragmatic perspective.

  26. 本论文基于2-gram统计模型而实现一种能很好适应语料信息的分词算法,且时间和精度都能满足文本知识管理系统的应用需要。

    This dissertation implements a word segmentation algorithm, based on a 2-gram statistical model, that adapts well to corpus information and whose speed and accuracy meet the needs of a text knowledge management system.

  27. 本论文选择流传广泛而普及的现代英语小说 Pride and Prejudice(译作:《傲慢与偏见》)作为语料来源,将英语原著及汉语译本里的if-假设句作为对比材料。

    Moreover, not all Chinese grammar textbooks analyze this issue systematically. This thesis takes the widely read modern English novel Pride and Prejudice as its data source and compares the if-conditional sentences in the English original and its Chinese translation.

  28. 基于该算法的系统在TDT4中文语料上进行了测试,结果表明该算法属于目前结果最好的算法之一,并显著降低了算法的时间和空间复杂度。

    A system based on this algorithm was tested on the TDT4 Chinese corpus. The results show that the algorithm is among the best currently available and that it significantly reduces time and space complexity.

  29. 近20年来大规模的真实语料统计分析表明,自然话语中的70%是由单词和固定短语之间的一种半固定的板块结构来实现的(Altenberg,1991)。

    Large-scale statistical analyses of authentic language data over the past 20 years have shown that 70 percent of natural utterances are realized by semi-fixed "chunks" that lie between single words and fixed phrases (Altenberg, 1991).

  30. 在将来的研究中,还要进一步扩充语料。

    In future research, the corpus should be further expanded.