
Sequence to ID, ID to sequence

tsungjung411 edited this page Dec 1, 2017 · 6 revisions

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/label.py

Common NLU utilities / helpers used when computing the f1-score

  • Sequence to ID
  • ID to sequence
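As a rough illustration of how the encoded ids might feed into an f1-score computation, the sketch below encodes two tag sequences and passes the ids to `sklearn.metrics.f1_score`. The tag names (`O`, `B-city`, `B-weather`) and the choice of micro averaging are assumptions made for this example; they do not come from the original page.

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical slot tags for one utterance (illustrative only).
y_true = ['O', 'B-city', 'O', 'B-weather', 'O']
y_pred = ['O', 'B-city', 'O', 'O', 'O']

# Fit on every tag that appears in either sequence so that
# both sequences share a single tag-to-id mapping.
le = LabelEncoder()
le.fit(y_true + y_pred)

true_ids = le.transform(y_true)
pred_ids = le.transform(y_pred)

# Micro-averaged F1 over single-label multiclass predictions.
micro_f1 = f1_score(true_ids, pred_ids, average='micro')
print("micro f1 =", micro_f1)
```

`f1_score` also accepts string labels directly, so the encoding step matters mostly when the ids are needed elsewhere, for example as model targets.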

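LabelEncoder provides the following:

  • sequence2id (convert a sequence of labels to ids)
  • id2sequence (convert ids back to a sequence of labels)
  • Here an id is a unique, non-repeating serial number: each distinct label gets exactly one integer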
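The mapping described above can be sketched in plain Python, without scikit-learn. This is only an approximation of the idea, assuming ids are assigned by sorting the distinct tokens (which is how `LabelEncoder` orders `classes_`). The helper name `build_vocab` is hypothetical, and `sequence2id` / `id2sequence` simply reuse the names from the list above.

```python
def build_vocab(tokens):
    """Return (classes, token2id); duplicate tokens collapse to one id."""
    classes = sorted(set(tokens))
    token2id = {tok: i for i, tok in enumerate(classes)}
    return classes, token2id


def sequence2id(seq, token2id):
    """Map each token in seq to its integer id."""
    return [token2id[tok] for tok in seq]


def id2sequence(ids, classes):
    """Map each integer id back to its token."""
    return [classes[i] for i in ids]


# '的' appears twice on purpose: it still receives a single id.
classes, token2id = build_vocab(
    ['<BOS>', '台南', '的', '霾害', '揮之不去', '的', '<EOS>'])

ids = sequence2id(['<BOS>', '台南', '<EOS>'], token2id)
tokens = id2sequence(ids, classes)
print(ids)
print(tokens)
```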

Python code:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Fit on the vocabulary; the tokens must match exactly what is looked up later
# (the angle brackets are part of '<BOS>' and '<EOS>').
le.fit(['<BOS>', '台南', '的', '霾害', '揮之不去', '<EOS>'])

# sequence2id (sequence to id)
print("Sequence to ID:")
print("ID('<BOS>') = ",     le.transform(['<BOS>']) )
print("ID('台南') = ",     le.transform(['台南']) )
print("ID('的') = ",      le.transform(['的']) )
print("ID('霾害') = ",     le.transform(['霾害']) )
print("ID('揮之不去') = ", le.transform(['揮之不去']) )
print("ID('<EOS>') = ",     le.transform(['<EOS>']) )
print("ID('<BOS>', '台南', '<EOS>') = ", le.transform(['<BOS>', '台南', '<EOS>']) )

# id2sequence (id to sequence)
print()
print("ID to sequence:")
print("Sequence([0,2,1]) = {}".format(le.inverse_transform([0,2,1])))

# classes_ holds the sorted unique labels; the index of each label is its id
print()
print("Sequence list:")
print(le.classes_)

Output:

Sequence to ID:
ID('<BOS>') =  [0]
ID('台南') =  [2]
ID('的') =  [4]
ID('霾害') =  [5]
ID('揮之不去') =  [3]
ID('<EOS>') =  [1]
ID('<BOS>', '台南', '<EOS>') =  [0 2 1]

ID to sequence:
Sequence([0,2,1]) = ['<BOS>' '台南' '<EOS>']

Sequence list:
['<BOS>' '<EOS>' '台南' '揮之不去' '的' '霾害']
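One behavior worth keeping in mind: `transform` only accepts labels that were seen during `fit`, and the match is exact. The sketch below (the three-token vocabulary is chosen for brevity, not taken from a real dataset) shows that looking up `'BOS'` without the angle brackets raises a `ValueError`.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['<BOS>', '台南', '<EOS>'])

# 'BOS' (no angle brackets) was never fitted, so it has no id.
try:
    le.transform(['BOS'])
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print("unseen label rejected:", err)

# The exact token that was fitted works as expected.
bos_id = le.transform(['<BOS>'])[0]
print("ID('<BOS>') =", bos_id)
```

If unknown tokens are expected at prediction time, a common workaround is to reserve a placeholder token in the fitted vocabulary and map unseen tokens to it before calling `transform`.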

#NLU, #tool, #LabelEncoder
