ICTCLAS Word Segmentation API (C++)

Uploaded by: 第*** | Document ID: 34037600 | Uploaded: 2018-02-20 | Format: DOC | Pages: 6 | Size: 55.50 KB

1 Character encoding

    enum eCodeType {
        CODE_TYPE_UNKNOWN, // type unknown, system will automatically detect
        CODE_TYPE_ASCII,   // ASCII
        CODE_TYPE_GB,      // GB2312, GBK, GB18030
        CODE_TYPE_UTF8,    // UTF-8
        CODE_TYPE_BIG5     // BIG5
    };

In JNI the type is defined as int, with the following values:
0: encoding unknown, the system detects it automatically
1: ASCII
2: GB2312, GBK, GB18030
3: UTF-8
4: BIG5

2 ICTCLAS_Init

    bool ICTCLAS_Init(const char* pszInitDir=NULL);

Parameters
pszInitDir: The initialization path. It should contain the configuration file (Configure.xml), the dictionary directory (Data), and the license file (user.lic). If these files and the directory are in the current working directory, this parameter may be NULL.

Example:

    if (!ICTCLAS_Init()) {
        printf("Init fails\n");
        return -1;
    } else {
        printf("ok\n");
    }

3 ICTCLAS_Exit

    bool ICTCLAS_Exit();

Return Value
Returns true on success; otherwise returns false.

Example:

    ICTCLAS_Exit();

4 ICTCLAS_ImportUserDict

    unsigned int ICTCLAS_ImportUserDict(const char *sFilename, eCodeType eCT);

Return Value
The number of lexical entries imported successfully.

Parameters
sFilename: Text filename of the user dictionary
eCT: Character encoding type

Remarks
ICTCLAS_ImportUserDict works properly only if ICTCLAS_Init has succeeded. For the format of the text dictionary file, see User-defined Lexicon. You only need to call this function the first time you use the lexicon or after you change your customized lexicon. After one import with no further changes, ICTCLAS loads the lexicon automatically if UserDict is set to on in the configuration file. When UserDict is off, the user-defined lexicon is not applied.

Example:

    unsigned int nItems = ICTCLAS_ImportUserDict("userdict.txt", CODE_TYPE_GB);

5 ICTCLAS_ParagraphProcess

    int ICTCLAS_ParagraphProcess(const char *sParagraph, int nPaLen, eCodeType eCt, int bPOStagged, char* sResult);

Return Value
The length of the result. The result itself is written to the buffer sResult.

Parameters
sParagraph: The source paragraph
nPaLen: The length of the paragraph
eCt: The character encoding type of the string
bPOStagged: Whether POS tagging is needed: 0 for no tagging, 1 for tagging; default: 1
sResult: The buffer that receives the processing result

Example:

    const char* sSentence = "当西班牙人捧起大力神杯时，西班牙首相萨帕特罗也激动不已";
    int nPaLen = strlen(sSentence);
    char* sRst = 0;                    // the caller allocates the buffer that holds the result
    sRst = (char *)malloc(nPaLen * 6); // 6 times the input string length is recommended
    int nRstLen = 0;
    nRstLen = ICTCLAS_ParagraphProcess(sSentence, nPaLen, CODE_TYPE_GB, 1, sRst);
    printf("The result is:\n%s\n", sRst);
    free(sRst);

6 ICTCLAS_ParagraphProcessA

    t_pstRstVec ICTCLAS_ParagraphProcessA(const char *sParagraph, int PaLen, eCodeType eCodeType, int bPOStagged, int& nRstcnt);

Return Value
The pointer to the result vector. It is managed by the system; the user must not allocate it and must release it only through ICTCLAS_ResultFree.

    struct stResult {
        int start;            // start position
        int length;           // length
    #ifdef POS_TAGGER
        int iPOS;             // POS
        char sPOS[POS_SIZE];  // word type
    #endif
        int word_ID;          // word_ID
        int word_type;        // Is the word from the user's dictionary? (0: no, 1: yes)
        int weight;           // word weight
    };

Parameters
sParagraph: The source paragraph
nPaLen: The length of the paragraph
eCodeType: The character encoding type of the string
bPOStagged: Whether POS tagging is needed: 0 for no tagging, 1 for tagging; default: 1
nRstcnt: The number of entries in the processing result

Example:

    const char* sSentence = "当西班牙人捧起大力神杯时，西班牙首相萨帕特罗也激动不已";
    int nPaLen = strlen(sSentence);
    int rstCount = 0;
    // g_bPOSTagged: the caller's POS-tagging flag (0 or 1)
    t_pstRstVec rstVec = ICTCLAS_ParagraphProcessA(sSentence, nPaLen, CODE_TYPE_GB, g_bPOSTagged, rstCount);
    for (int i = 0; i < rstCount; i++) { // print the segmentation result
        printf("start=%d,length=%d\r\n", rstVec[i].start, rstVec[i].length);
    }
    ICTCLAS_ResultFree(rstVec); // release the memory through the API

Output

    start=0,length=2
    start=2,length=6
    start=8,length=2
    start=10,length=2
    start=12,length=2
    start=14,length=6
    start=20,length=2
    start=22,length=2
    start=24,length=2
    start=26,length=6
    start=32,length=4
    start=36,length=8
    start=44,length=2
    start=46,length=4
    start=50,length=4

7 ICTCLAS_FileProcess

    bool ICTCLAS_FileProcess(const char *sSrcFilename, eCodeType eCt, const char *sDsnFilename, int bPOStagged);

Return Value
Returns true if processing succeeds; otherwise returns false.

Parameters
sSrcFilename: The path of the source file to be analyzed
eCt: The character encoding type of the source file
sDsnFilename: The name of the file that stores the results
bPOStagged: Whether POS tagging is needed: 0 for no tagging, 1 for tagging; default: 1

Example:

    ICTCLAS_FileProcess("Test.txt", CODE_TYPE_GB, "Test_result.txt", 1);

8 ICTCLAS_SetPOSmap

Selects which POS map to use.

    int ICTCLAS_SetPOSmap(int nPOSmap);

Return Value
Returns 1 if execution succeeds; otherwise returns 0.

Parameters
nPOSmap: one of the following
ICT_POS_MAP_FIRST: ICT (Institute of Computing Technology) first-level tag set
ICT_POS_MAP_SECOND: ICT second-level tag set
PKU_POS_MAP_SECOND: PKU (Peking University) second-level tag set
PKU_POS_MAP_FIRST: PKU first-level tag set

Example:

    ICTCLAS_SetPOSmap(ICT_POS_MAP_FIRST);

9 ICTCLAS_GetWordId

    int ICTCLAS_GetWordId(const char *sWord, int nWrdLen, eCodeType eCT);

Return Value
The value of the WordID.

Parameters
sWord: The target word
nWrdLen: The length of the word
eCT: The character encoding type

Example:

    const char* sWord = "我们";
    int Rest = 0;
    int nWrdLen = strlen(sWord);
    Rest = ICTCLAS_GetWordId(sWord, nWrdLen, CODE_TYPE_GB);
    printf("The WordID is: %d", Rest);

Output

    The WordID is: 63889

10 ICTCLAS_ResultFree

    bool ICTCLAS_ResultFree(t_pstRstVec pRetVec);

Return Value
Returns true if execution succeeds; otherwise returns false.

Parameters
t_pstRstVec: the point
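The ParagraphProcessA output in section 6 reports each word only as a start/length pair, and in that GB-encoded example the pairs appear to be byte offsets (each Chinese character takes 2 bytes). A minimal sketch of turning such pairs back into word strings follows. `Span` and `sliceWords` are hypothetical names introduced here for illustration and are not part of ICTCLAS; `Span` mirrors only the `start` and `length` fields of `stResult` so that the sketch compiles without the library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the start/length fields of stResult.
// Real code would read these fields from the vector returned by the library.
struct Span {
    int start;   // byte offset of the word within the paragraph
    int length;  // byte length of the word
};

// Copy each word out of the paragraph buffer using its byte offsets.
// Spans that are negative or run past the end of the buffer are skipped.
std::vector<std::string> sliceWords(const std::string& paragraph,
                                    const std::vector<Span>& spans) {
    std::vector<std::string> words;
    for (const Span& s : spans) {
        if (s.start < 0 || s.length < 0) {
            continue;
        }
        const std::size_t start = static_cast<std::size_t>(s.start);
        const std::size_t length = static_cast<std::size_t>(s.length);
        if (start > paragraph.size() || length > paragraph.size() - start) {
            continue;
        }
        words.push_back(paragraph.substr(start, length));
    }
    return words;
}
```

With the real library, the spans would be copied out of the result vector before calling ICTCLAS_ResultFree, because the vector is managed by the system.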

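Section 1 states that the JNI side passes the encoding as a plain int (0 to 4). The sketch below shows one way that int might be converted back to `eCodeType` on the C++ side. The enum is repeated here only to keep the sketch self-contained, and `codeTypeFromInt` is a hypothetical helper, not an ICTCLAS function. Falling back to `CODE_TYPE_UNKNOWN` for out-of-range values is an assumption made for this sketch.

```cpp
// Copy of the eCodeType enum from section 1, repeated so this compiles alone;
// real code would include the ICTCLAS header instead.
enum eCodeType {
    CODE_TYPE_UNKNOWN, // 0: type unknown, system will automatically detect
    CODE_TYPE_ASCII,   // 1: ASCII
    CODE_TYPE_GB,      // 2: GB2312, GBK, GB18030
    CODE_TYPE_UTF8,    // 3: UTF-8
    CODE_TYPE_BIG5     // 4: BIG5
};

// Hypothetical helper: map a JNI-side int to eCodeType.
// Any value outside 1..4 (including 0) maps to CODE_TYPE_UNKNOWN,
// which asks the system to detect the encoding itself.
eCodeType codeTypeFromInt(int value) {
    switch (value) {
        case 1:  return CODE_TYPE_ASCII;
        case 2:  return CODE_TYPE_GB;
        case 3:  return CODE_TYPE_UTF8;
        case 4:  return CODE_TYPE_BIG5;
        default: return CODE_TYPE_UNKNOWN;
    }
}
```

An explicit switch is used instead of a bare cast so that an unexpected int never produces an enum value outside the documented range.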