sequence to id, id to sequence
tsungjung411 edited this page Dec 1, 2017 · 6 revisions
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/label.py
- Sequence to ID
- ID to sequence
- sequence2id: converts a sequence of tokens to their ids
- id2sequence: converts ids back to the sequence of tokens
- An id usually means a unique, non-repeating serial number
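Before turning to scikit-learn, the two conversions can be sketched in plain Python with a dictionary. This is only a minimal illustration of the idea: `build_vocab`, `sequence2id`, and `id2sequence` are hypothetical helper names chosen to match the bullets above, not library functions, and assigning ids in sorted order is an assumption made so that the ids are deterministic.

```python
# Minimal sketch: map tokens to unique integer ids and back.
# build_vocab, sequence2id and id2sequence are illustrative names,
# not part of scikit-learn.

def build_vocab(tokens):
    """Assign each distinct token an id, following sorted order."""
    return {token: idx for idx, token in enumerate(sorted(set(tokens)))}

def sequence2id(vocab, sequence):
    """Convert a list of tokens to a list of ids."""
    return [vocab[token] for token in sequence]

def id2sequence(vocab, ids):
    """Convert a list of ids back to a list of tokens."""
    reverse = {idx: token for token, idx in vocab.items()}
    return [reverse[idx] for idx in ids]

vocab = build_vocab(['<BOS>', '台南', '的', '霾害', '揮之不去', '<EOS>'])
ids = sequence2id(vocab, ['<BOS>', '台南', '<EOS>'])
print(ids)                      # [0, 2, 1]
print(id2sequence(vocab, ids))  # ['<BOS>', '台南', '<EOS>']
```

A token that is not in the vocabulary raises `KeyError` in this sketch, so the vocabulary should cover every token you plan to convert.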
Python code:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Fit on the vocabulary: a tokenized Chinese sentence wrapped in <BOS>/<EOS> markers.
# The markers must be spelled exactly as they are looked up later ('<BOS>', not 'BOS').
le.fit(['<BOS>', '台南', '的', '霾害', '揮之不去', '<EOS>'])
# sequence2id (sequence to id)
print("Sequence to ID:")
print("ID('<BOS>') =", le.transform(['<BOS>']))
print("ID('台南') =", le.transform(['台南']))
print("ID('的') =", le.transform(['的']))
print("ID('霾害') =", le.transform(['霾害']))
print("ID('揮之不去') =", le.transform(['揮之不去']))
print("ID('<EOS>') =", le.transform(['<EOS>']))
print("ID('<BOS>', '台南', '<EOS>') =", le.transform(['<BOS>', '台南', '<EOS>']))
# id2sequence (id to sequence)
print()
print("ID to sequence:")
print("Sequence([0,2,1]) = {}".format(le.inverse_transform([0,2,1])))
print()
print("Sequence list:")
print(le.classes_)
Output:
Sequence to ID:
ID('<BOS>') = [0]
ID('台南') = [2]
ID('的') = [4]
ID('霾害') = [5]
ID('揮之不去') = [3]
ID('<EOS>') = [1]
ID('<BOS>', '台南', '<EOS>') = [0 2 1]

ID to sequence:
Sequence([0,2,1]) = ['<BOS>' '台南' '<EOS>']

Sequence list:
['<BOS>' '<EOS>' '台南' '揮之不去' '的' '霾害']
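The ids in the output follow the order of `le.classes_`: `LabelEncoder` stores the distinct labels in sorted order and uses each label's position as its id. For strings that is code-point order, which is why the ASCII markers `<BOS>` and `<EOS>` come before the Chinese tokens. The sketch below reproduces that assignment with plain Python; it assumes that `sorted()` orders these particular tokens the same way `LabelEncoder` does, and the variable names are illustrative.

```python
# Sketch: reproduce the id assignment with plain Python sorting.
# Assumption: for these tokens, sorted() yields the same order as le.classes_.
tokens = ['<BOS>', '台南', '的', '霾害', '揮之不去', '<EOS>']
classes = sorted(set(tokens))

for idx, token in enumerate(classes):
    # Print the id, the token, and the code point of its first character,
    # which is what decides the ordering between these tokens.
    print(idx, token, hex(ord(token[0])))
```

One consequence is that ids depend on the whole fitted vocabulary: refitting with extra tokens can shift existing ids, so the fitted encoder (or its `classes_`) should be saved alongside any encoded data.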
#NLU, #tool, #LabelEncoder
tj_tsai / tsungjung411@yahoo.com.tw / tsungjung411@gmail.com