IEEE International Conference on Computer, Communication, and Signal Processing (ICCCSP-2017)
A Survey on Extractive Text Summarization
N. Moratanch*, S. Chitrakala†
*Research Scholar, †Associate Professor
*†Department of CSE, Anna University, CEG, Chennai
*tancyanbil@gmail.com, †au.chitras@gmail.com
Abstract: Text summarization is the process of extracting salient information from a source text document and presenting that information to the user in the form of a concise summary. It is very hard for humans to manually summarize large documents of text. Text summarization techniques are classified into abstractive and extractive summarization. The extractive summarization technique focuses on choosing the important sentences, paragraphs, etc. of the original document and reproducing them in precise form. The importance of sentences is determined based on linguistic and statistical features. In this work, a comprehensive review of the extractive text summarization process is undertaken: the various techniques, popular benchmarking datasets, and challenges of extractive summarization are reviewed. The paper surveys extractive text summarization methods that aim at summaries that are less redundant, cohesive, coherent, and information rich.
Index Terms: Text Summarization, Unsupervised Learning, Supervised Learning, Sentence Fusion, Extraction Scheme, Sentence Revision, Extractive Summary
I. INTRODUCTION
In recent years, the significance of text summarization [1] has attracted more attention due to data inundation on the web. This information overload creates a strong requirement for more reliable and capable text summarizers. Text summarization owes its importance to its many applications: summaries of books, digests (summaries of stories), stock market and news reports, highlights (meetings, events, sport), abstracts of scientific papers, newspaper articles, magazines, and so on. Owing to this tremendous growth, many leading groups, such as the Faculty of Informatics at Masaryk University (Czech Republic), the Semantic Software Lab at Concordia University (Montreal, Canada), the IHR Nexus Lab at Arizona State University (Arizona, USA), and the Lab of Topic Maps at Leipzig University (Germany), have been persistently working on its rapid enhancement.
Text summarization has grown into a crucial tool for supporting and condensing text content in today's rapidly growing information age. It is very laborious for humans to manually summarize oversized text documents. There is a wealth of textual content available on the internet, but the internet usually offers more data than is desired. A twin problem therefore arises: searching for relevant documents among an overwhelming number of documents on offer, and absorbing a high volume of important information from them. The objective of automatic text summarization is to condense the source text into a precise version that preserves its information content and overall meaning. The main advantage of text summarization is that the reading time of the user is reduced. A good text summarization system should reproduce the diverse themes of the document while keeping redundancy to a minimum. Text summarization methods are broadly divided into abstractive and extractive summarization.
An extractive summarization technique consists of selecting important sentences, paragraphs, etc. from the original document and concatenating them into a shorter form. The importance of sentences is determined based on statistical and linguistic features of the sentences. This paper summarizes the major methodologies applied, the issues raised, and future directions in extractive text summarization. The paper is organized as follows. Section II describes the features used for extractive text summarization, Section III describes extractive text summarization methods, Section IV presents the inferences made, Section V details the evaluation metrics, Section VI presents challenges and future research directions, and the final section concludes the paper.
II. FEATURES FOR EXTRACTIVE TEXT SUMMARIZATION
Earlier techniques assign a score to each sentence based on features that are predefined by the methodology applied. Both word-level and sentence-level features are employed in the text summarization literature. The features discussed below [2] [3] [4] are used to select the sentences to be included in the summary.
1. WORD LEVEL FEATURES
1.1 Content word feature
Keywords are essential in identifying the importance of a sentence. A sentence that contains the main keywords is most likely to be included in the final summary. The content (keyword) words are nouns, verbs, adjectives, and adverbs, commonly identified based on the tf x idf measure.
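As an illustration, the following minimal sketch scores sentences by summing the tf x idf weights of their words, using scikit-learn's TfidfVectorizer over the document's own sentences; the stop-word filter stands in for the part-of-speech selection described above, an assumption on our part since the paper does not prescribe an implementation.
from sklearn.feature_extraction.text import TfidfVectorizer

def content_word_scores(sentences):
    """Return one tf-idf based importance score per sentence."""
    vectorizer = TfidfVectorizer(stop_words="english")  # drop function words
    tfidf = vectorizer.fit_transform(sentences)         # sentence x term matrix
    # A sentence rich in high tf-idf (content) words gets a high score.
    return tfidf.sum(axis=1).A1.tolist()

sents = [
    "The cheetah is the fastest land animal.",
    "It can reach speeds of over 100 km per hour.",
    "Cheetah populations are declining worldwide.",
]
print(content_word_scores(sents))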
1.2 Title word feature
Sentences in the original document that contain words mentioned in the title have a greater chance of contributing to the final summary, since such words serve as indicators of the theme of the document.
1.3 Cue phrase feature
Cue phrases are words and phrases that indicate the structure of the document flow, and they are used as a feature in sentence selection. Sentences that contain cue phrases (e.g. "denouement", "because", "this information", "summary", "develop", "desire", etc.) are most likely to be included in the final summary.
1.4 Biased word feature
Sentences that contain biased words are more likely to be important. The biased words come from a predefined list, which may be domain specific; they are relatively important words that describe the theme of the document.
1.5 Upper case word feature
The words which are in uppercase such as "UNICEF" are
considered to be important words and those sentences that
consist of these words are termed important in the context of
sentence selection for the final summary.
2. SENTENCE LEVEL FEATURES
2.1 Sentence location feature
Sentences that occur at the beginning and in the concluding part of the document are most likely important, since most documents are hierarchically structured, with the important information at the beginning and end of paragraphs.
2.2 Sentence length feature
Sentence length plays an important role in identifying key sentences. Very short sentences often do not convey essential information, while very long sentences need not be included in the summary either. The normalized length of a sentence is calculated as the ratio between the number of words in the sentence and the number of words in the longest sentence of the document.
2.3 Paragraph location feature
Similar to sentence location, paragraph location also plays a crucial role in selecting key sentences. A higher score is assigned to paragraphs in the peripheral sections (the beginning and end paragraphs of the document).
2.4 Sentence-to-sentence cohesion
For every sentence s, the similarity between s and each other sentence is calculated; these similarities are summed to obtain a raw value of this feature for s. The feature values are normalized to [0, 1], where a value closer to 1.0 indicates a higher degree of cohesion between sentences.
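The following sketch computes three of the sentence-level features above: position (favoring both edges of the document, per 2.1), normalized length (per 2.2), and sentence-to-sentence cohesion via cosine similarity over bag-of-words vectors (per 2.4). Function names and the choice of cosine similarity are our assumptions; the paper does not fix a similarity measure.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sentence_features(sentences):
    bags = [Counter(s.lower().split()) for s in sentences]
    max_len = max(len(s.split()) for s in sentences)
    n = len(sentences)
    half = max((n - 1) / 2, 1)
    feats = []
    for i, s in enumerate(sentences):
        position = 1.0 - min(i, n - 1 - i) / half   # 1.0 at either edge, ~0 mid-document
        length = len(s.split()) / max_len           # ratio to the longest sentence
        raw = sum(cosine(bags[i], bags[j]) for j in range(n) if j != i)
        feats.append({"position": position, "length": length, "cohesion_raw": raw})
    # Normalize cohesion to [0, 1] as described in 2.4.
    m = max(f["cohesion_raw"] for f in feats) or 1.0
    for f in feats:
        f["cohesion"] = f["cohesion_raw"] / m
    return feats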
III. EXTRACTIVE TEXT SUMMARIZATION METHODS
Extractive text summarization methods can be broadly classified into unsupervised learning and supervised learning methods. Recent works rely mainly on unsupervised learning methods for text summarization.
Fig. 1. Overview of Extractive Summarization
A. UNSUPERVISED LEARNING METHODS
In this section, unsupervised techniques for the sentence extraction task are discussed. Unsupervised approaches do not need human summaries (user input) to decide the important features of the document; instead, they require more sophisticated algorithms to compensate for the lack of human knowledge. Unsupervised summarizers provide a higher level of automation than supervised models and are more suitable for processing big data. Unsupervised learning models have proved successful in the text summarization task.
Fig. 2. Overview of Unsupervised Learning Methods
1. Graph based approach
Graph-based models are extensively used in document summarization, since graphs can efficiently represent the document structure. Sankarasubramaniam et al. [4] perform extractive text summarization using external knowledge from Wikipedia within a bipartite graph framework. They propose an iterative ranking algorithm (a variation of the HITS algorithm [5]) which is efficient in selecting important sentences and also ensures coherency in the final summary. The distinguishing feature of that work is that it combines a graph-based and a concept-based approach to the summarization task. Another graph-based approach is LexRank [6], where the salience of a sentence is determined by the concept of eigenvector centrality. The sentences of the document are represented as nodes of a graph, and the edges between sentences carry weighted cosine similarity values. The sentences are clustered into groups based on their similarity measures and then ranked by their LexRank scores, similarly to the PageRank algorithm [7], except that the similarity graph in LexRank is undirected. The method outperforms earlier lead-based and centroid-based approaches. The performance of the system is evaluated on the DUC dataset [8].
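A minimal LexRank-style sketch follows, assuming tf-idf cosine similarity and a PageRank-style power iteration over the undirected similarity graph. The threshold and damping values are illustrative defaults, not values prescribed by the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)
    adj = (sim >= threshold).astype(float)        # keep edges above the threshold
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    transition = adj / row_sums                   # row-stochastic transition matrix
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):                        # power iteration (PageRank-style)
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores

# Usage: rank the document's sentences by centrality, highest first.
# ranked = sorted(zip(lexrank(doc_sentences), doc_sentences), reverse=True)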
2. Fuzzy logic based approach
The fuzzy logic approach mainly contains four components: a fuzzifier, a fuzzy knowledge base, an inference engine, and a defuzzifier. The textual-characteristic inputs to the fuzzy logic approach are sentence length, sentence similarity, etc., which are then fed to the fuzzy system [9] [10].
TABLE I
SUPERVISED AND UNSUPERVISED LEARNING METHODS FOR TEXT SUMMARIZATION

1) Supervised learning: machine learning approach (Bayes rule).
   Concept: the summarization task is modelled as a classification problem.
   Advantages: a large set of training data improves sentence selection for the summary.
   Limitations: human intervention is required to generate the manual summaries.

2) Supervised learning: artificial neural network.
   Concept: trainable summarizer; a neural network is trained, pruned, and generalized to filter sentences and classify them as "summary" or "non-summary" sentences.
   Advantages: the network can be trained according to the style of the human reader, and the set of features can be altered to reflect the user's needs and requirements.
   Limitations: 1) the neural network is slow both in the training phase and in the application phase; 2) it is difficult to determine how the network makes its decisions; 3) human intervention is required for the training data.

3) Supervised learning: conditional random fields (CRF).
   Concept: statistical modelling approach which uses a CRF for a sequence labelling problem.
   Advantages: identifies correct features and provides a better representation of sentences, grouping terms appropriately into segments.
   Limitations: 1) domain specific, requiring an external domain-specific corpus for the training step; 2) linguistic features are not considered.

4) Unsupervised learning: graph based approach.
   Concept: construction of a graph to capture the relationships between sentences.
   Advantages: 1) captures redundant information; 2) improves coherency.
   Limitations: does not address issues such as the dangling anaphora problem.

5) Unsupervised learning: concept oriented approach.
   Concept: importance of sentences calculated based on concepts retrieved from an external knowledge base (Wikipedia, HowNet).
   Advantages: incorporates similarity measures to reduce redundancy.
   Limitations: dangling anaphora and verb referents are not considered.

6) Unsupervised learning: fuzzy logic based approach.
   Concept: summarization based on fuzzy rules over various sets of features.
   Advantages: improved summary quality by maintaining coherency.
   Limitations: depends on the membership functions and the workings of the fuzzy logic system.
Fig. 3. Example of Sentence concept bipartite graph proposed in [4]
Ladda Suanmali et al. [11] proposed a fuzzy logic approach for automatic text summarization. In the initial step, the text document is pre-processed, followed by feature extraction (title words, sentence length, sentence position, sentence-to-sentence similarity, term weight, proper nouns, and numerical data). The summary is generated by ordering the ranked sentences in the order in which they occur in the original document, to maintain coherency. The proposed method improves the quality of the summaries, but issues such as dangling anaphora are not handled.
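The following toy sketch shows the fuzzy pipeline just described: feature values are fuzzified with triangular membership functions, one illustrative rule fires, and its strength is defuzzified into a sentence score. Real systems such as [11] use a full rule base; the rule and the membership breakpoints here are our assumptions.
def triangular(x, a, b, c):
    """Membership degree of x in a triangle with peak at b on [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_sentence_score(title_overlap, position, length):
    # Fuzzification: degree to which each [0, 1] feature is "high" / "good".
    high_title = triangular(title_overlap, 0.2, 1.0, 1.8)
    high_pos   = triangular(position,      0.2, 1.0, 1.8)
    good_len   = triangular(length,        0.1, 0.5, 0.9)
    # One rule: IF title overlap is high AND position is high AND length is
    # good THEN importance is high (min is used as the fuzzy AND).
    importance_high = min(high_title, high_pos, good_len)
    # Trivial defuzzification: the rule's firing strength is the score.
    return importance_high

print(fuzzy_sentence_score(title_overlap=0.8, position=0.9, length=0.5))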
3. Concept-based approach
In the concept-based approach, concepts are extracted from a piece of text using an external knowledge base such as HowNet [12] or Wikipedia [4]. In the methodology proposed in [12], the importance of sentences is calculated based on the concepts retrieved from HowNet instead of the words themselves. A conceptual vector model is built to obtain a rough summarization, and similarity measures between the sentences are calculated to reduce redundancy in the final summary. A good summarizer aims for higher coverage and lower redundancy.
Fig. 4. Overall architecture of text summarization based on the fuzzy logic approach proposed in [10]
Fig. 5. Example of concepts retrieved for a sentence from Wikipedia as proposed in [4]
Ramanathan et al. [4] proposed a Wikipedia-based summarization which utilizes a graph structure to produce summaries. The method uses Wikipedia to obtain the concepts of each sentence and builds a sentence-concept bipartite graph, as already mentioned under the graph-based approach. The basic steps in concept-based summarization are: i) retrieve the concepts of a text from an external knowledge base (HowNet, WordNet, Wikipedia); ii) build a conceptual vector or graph model to depict the relationships between concepts and sentences; iii) apply a ranking algorithm to score the sentences; iv) generate the summary based on the ranking scores of the sentences. A sketch of steps ii) and iii) follows.
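Given sentence-to-concept links (however the concepts were retrieved in step i), a HITS-style mutual reinforcement on the bipartite graph can rank the sentences. The normalization scheme and function names below are our assumptions; [4] uses a refined variant of this idea.
import math

def bipartite_rank(sent_concepts, iters=50):
    """sent_concepts: list mapping sentence index -> set of concept names."""
    concepts = set().union(*sent_concepts)
    s_score = [1.0] * len(sent_concepts)
    c_score = {c: 1.0 for c in concepts}
    for _ in range(iters):
        # A concept is important if important sentences mention it...
        for c in concepts:
            c_score[c] = sum(s_score[i] for i, cs in enumerate(sent_concepts) if c in cs)
        # ...and a sentence is important if it mentions important concepts.
        s_score = [sum(c_score[c] for c in cs) for cs in sent_concepts]
        # Normalize both score vectors so the iteration converges.
        norm_s = math.sqrt(sum(v * v for v in s_score)) or 1.0
        s_score = [v / norm_s for v in s_score]
        norm_c = math.sqrt(sum(v * v for v in c_score.values())) or 1.0
        c_score = {c: v / norm_c for c, v in c_score.items()}
    return s_score

sents = [{"dog", "park"}, {"dog", "walk"}, {"park", "evening"}]
print(bipartite_rank(sents))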
4. Latent Semantic Analysis (LSA) method
Latent Semantic Analysis (LSA) [13] [14] is a method which extracts the hidden semantic structure of sentences and words, and it is popularly used in the text summarization task. It is an unsupervised learning approach that does not demand any sort of external or training knowledge. LSA takes the text of the input document and extracts information such as which words frequently occur together and which words commonly appear across different sentences. A high number of common words among sentences indicates that the sentences are semantically related. Singular Value Decomposition (SVD) [13] is used to find the interrelations between words and sentences; it also has the capability of noise reduction, which helps to improve accuracy. SVD [15], when applied to document-word matrices, can group documents that are semantically related to one another even when they share no common words: sets of words that occur in connected text are placed close together in the same latent dimensional space. The LSA technique is applied to extract the topic-related words and the important content-conveying sentences from a document. The advantage of adopting LSA vectors for summarization over plain word vectors is that conceptual relations, as represented in the human brain, are naturally captured. Choosing a representative sentence from each latent dimension ensures both the relevance of the sentence to the document and non-redundancy. LSA works directly on the text data, and the principal dimensions corresponding to the collection of topics can be located.
Consider an example to illustrate the LSA representation. Example 1: the following three sentences are given to an LSA-based system. d0: 'The man walked the dog.' d1: 'The man took the dog to the park in the evening.' d2: 'The dog went to the park in the evening.' From Fig. 6 [13] it is to be noted that d1 is more closely associated with d2 than with d0, and that the word 'walked' is linked to the word 'man' but is not significant to the word 'park'. These kinds of interpretations can be built from the input data and LSA alone, without the need for any external knowledge.
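A minimal LSA sketch for the example above: build a term-sentence matrix, take the SVD, and pick the top-loading sentence for each of the k strongest latent topics. That selection rule is one common variant [13] [21]; the details vary across LSA summarizers.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "The man walked the dog.",
    "The man took the dog to the park in the evening.",
    "The dog went to the park in the evening.",
]
# Term x sentence matrix (stop words removed), then SVD: A = U S V^T.
A = CountVectorizer(stop_words="english").fit_transform(docs).T.toarray()
U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
# Rows of Vt are latent topics; columns index sentences. For each of the
# k strongest topics, choose the sentence with the largest loading.
k = 2
chosen = {int(np.argmax(np.abs(Vt[i]))) for i in range(k)}
print([docs[i] for i in sorted(chosen)])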
B. SUPERVISED LEARNING METHODS
Supervised extractive summarization techniques are based on a classification approach at the sentence level, where the system learns from examples to classify sentences as summary or non-summary sentences. The major drawback of the supervised approach is that it requires manually created summaries, produced by humans, to label the sentences of the original training documents as "summary sentence" or "non-summary sentence", and it also requires a large amount of labelled training data for classification.
Fig. 6. Representation of LSA for Example [13]
Fig. 7. Overview of Supervised Learning Methods
1. Machine learning approach based on Bayes rule
A set of training documents, along with their extractive summaries, is fed as input to the training stage. The machine learning approach views text summarization as a classification problem: sentences are classified as summary or non-summary sentences based on the features they possess. The classification probabilities are learned from the training data using the following Bayes rule [16]:

P(s ∈ S | f_1, f_2, ..., f_N) = P(f_1, f_2, ..., f_N | s ∈ S) · P(s ∈ S) / P(f_1, f_2, ..., f_N)

where s represents a sentence of the document, f_1, ..., f_N represent the features used in the classification stage, and S represents the set of sentences in the summary. P(s ∈ S | f_1, f_2, ..., f_N) is the probability that sentence s is included in the summary, given the features it possesses. Assuming the features are independent given the class, the likelihood factorizes into a product of per-feature probabilities (the naive Bayes assumption).
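A sketch of this classifier with scikit-learn's Bernoulli naive Bayes follows. The binary features and the tiny training set are placeholders; in practice the labels would come from human-made extracts, as the section above describes.
from sklearn.naive_bayes import BernoulliNB

# Feature vector per sentence: [has_title_word, has_cue_phrase, is_early, is_long]
X_train = [[1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [0, 1, 0, 1]]
y_train = [1, 0, 1, 0]          # 1 = summary sentence, 0 = non-summary

clf = BernoulliNB().fit(X_train, y_train)
X_new = [[1, 1, 0, 1]]
# Columns follow clf.classes_, i.e. [P(non-summary), P(summary)].
print(clf.predict_proba(X_new))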
2. Neural Network based approach
Fig. 8. Neural network after training (a) and after pruning (b) [17]
The approach proposed in [18] uses the RankNet algorithm with neural nets to automatically identify the important sentences in a document. It uses a two-layer neural network with back-propagation, trained using the RankNet algorithm. The first step labels the training data using a machine learning approach; features of the sentences in both the test and training sets are then extracted and fed to the neural network, which ranks the sentences of the document. Another approach [17] uses a three-layered feed-forward neural network which learns, in the training stage, the characteristics of summary and non-summary sentences. The major phase is the feature fusion phase, in which the relationships between the features are identified in two stages: 1) eliminating infrequent features and 2) collapsing frequent features; sentence ranking is then performed to identify the important summary sentences. The neural network of [17] after feature fusion is depicted in Fig. 8. Dharmendra Hingu, Deep Shah, and Sandeep S. Udmale proposed an extractive approach [19] for summarizing Wikipedia articles by identifying text features and scoring the sentences with a neural network model [5]. The system takes a Wikipedia article as input, followed by tokenisation and stemming. The pre-processed passage is sent to the feature extraction step, which is based on multiple features of sentences and words. The scores obtained after feature extraction are fed to the neural network, which produces a single output score signifying the importance of each sentence. Usage of the words and sentences is not considered while assigning the weights, which results in lower accuracy.
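As a toy illustration of the feed-forward scorers described above ([17], [19]): a feature vector goes in, a single importance score comes out. The random weights below merely stand in for the training (and pruning) phases those papers describe.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # 4 features -> 8 hidden units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> 1 output score

def score(features):                            # features: array of shape (n, 4)
    h = np.tanh(features @ W1 + b1)             # hidden layer
    return 1 / (1 + np.exp(-(h @ W2 + b2)))     # sigmoid importance score

feats = np.array([[0.9, 1.0, 0.6, 0.3], [0.1, 0.0, 0.2, 0.8]])
print(score(feats).ravel())                     # one score per sentence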
3. Conditional random fields
Conditional random fields (CRFs) are a statistical modelling approach, rooted in machine learning, that provides structured prediction. The proposed system overcomes issues faced by non-negative matrix factorization (NMF) methods by incorporating a CRF to identify and extract the correct features for determining the important sentences of a given text. The main advantage of the method is that it identifies the correct features and provides a better representation of sentences, grouping terms appropriately into segments. The major drawback is that it is domain specific, requiring an external domain-specific corpus for the training step; the method therefore cannot be applied generically to any document without building a domain corpus, which is a time-consuming task. The approach specified in [20] treats summarization with a CRF as a sequence labelling problem, capturing the interactions between sentences through the features extracted for each sentence; it also incorporates complex features such as LSA scores [21] and HITS scores [22], but its limitation is that linguistic features are not considered.
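A sketch of summarization as sequence labelling, in the spirit of [20]: each document is a sequence of sentences, each sentence a feature dict, and the labels are "S"/"N" (summary / non-summary). It assumes the third-party sklearn-crfsuite package; the features and tiny dataset are placeholders.
import sklearn_crfsuite

def sent_feats(doc, i):
    """Per-sentence features for the linear-chain CRF (illustrative only)."""
    return {"position": i / max(len(doc) - 1, 1),
            "length": len(doc[i].split()) / 40.0,
            "has_title_word": "summary" in doc[i].lower()}

doc = ["Text summarization condenses documents.",
       "It has many applications.",
       "This paper surveys extractive methods."]
X = [[sent_feats(doc, i) for i in range(len(doc))]]   # one sequence per document
y = [["S", "N", "S"]]                                  # gold labels from human extracts

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))                                  # predicted label sequence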
IV. INFERENCES MADE
• Abundant variations of the extractive approach [15] have been explored over the past ten years. However, it is difficult to say how much each analytical improvement (at the sentence or text level) contributes to the final result.
• Without NLP, the produced summary may suffer from a lack of semantics and cohesion. For texts consisting of numerous topics, the produced summary may not cover them fairly. Determining proper weights for the respective features is vital, since the quality of the final summary depends on them.
• Feature weights should be given particular attention, because they play a major role in choosing the key sentences. In text summarization, the most challenging task is to summarize the content from a number of textual and semi-structured sources, including web pages and databases, in the proper way (size, format, time, language) for a specific user.
• Text summarization software should produce an effective summary with little redundancy and in little time. Summaries are evaluated using extrinsic or intrinsic measures.
• Intrinsic measures assess summary quality through human evaluation, whereas extrinsic measures assess it through a task-based metric, such as an information retrieval oriented task.
V. EVALUATION METRICS
Numerous benchmarking datasets [1] are used for the experimental evaluation of extractive summarization. The Document Understanding Conference (DUC) datasets are the most common benchmarks for text summarization; other datasets include TIPSTER, TREC, TAC, and CNN. These corpora contain documents along with summaries that are created automatically, created manually, or submitted by participants [20]. From the papers surveyed in the previous sections, it has been found that agreement between human summarizers is quite low, both for evaluating and for generating summaries; beyond the form of the summary, it is difficult to judge summary content.
i) Human evaluation
Human judgment usually has wide variance on what is considered a "good" summary, which means that making the evaluation process automatic is especially difficult. Manual evaluation can be used; however, it is both time- and labor-intensive, because it requires humans to read not only the summaries but also the source documents. Other issues concern coherence and coverage.
ii) ROUGE
Formally, ROUGE-N is an n-gram recall between a candidate summary and a set of reference summaries. ROUGE-N is computed as follows:

ROUGE-N = ( Σ_{S ∈ ReferenceSummaries} Σ_{N-gram ∈ S} Count_match(N-gram) ) / ( Σ_{S ∈ ReferenceSummaries} Σ_{N-gram ∈ S} Count(N-gram) )

where N stands for the length of the n-gram, Count_match(N-gram) is the maximum number of n-grams co-occurring in the candidate summary and the set of reference summaries, and Count(N-gram) is the number of n-grams in the set of reference summaries.

iii) Recall

R = |S_ref ∩ S_cand| / |S_ref|

where |S_ref ∩ S_cand| is the number of sentences that occur in both the reference and candidate summaries.

iv) Precision

P = |S_ref ∩ S_cand| / |S_cand|

v) F-measure

F = 2 · Precision · Recall / (Precision + Recall)

vi) Compression ratio

C_r = S_len / I_len

where S_len and I_len are the lengths of the summary and the source text, respectively.
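The sketch below transcribes the metrics above directly: ROUGE-N recall over n-grams (with the clipped co-occurrence count playing the role of Count_match), plus sentence-level recall, precision, and F-measure. The inputs are toy examples.
from collections import Counter

def ngrams(text, n):
    toks = text.lower().split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def rouge_n(candidate, references, n=1):
    cand = ngrams(candidate, n)
    match = total = 0
    for ref in references:
        r = ngrams(ref, n)
        match += sum(min(cnt, cand[g]) for g, cnt in r.items())  # clipped co-occurrence
        total += sum(r.values())                                 # n-grams in references
    return match / total if total else 0.0

def prf(ref_sents, cand_sents):
    """Sentence-level recall, precision, and F-measure."""
    overlap = len(set(ref_sents) & set(cand_sents))
    recall = overlap / len(ref_sents)
    precision = overlap / len(cand_sents)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f

print(rouge_n("the dog went to the park", ["the dog walked to the park"], n=1))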
VI. CHALLENGES AND FUTURE RESEARCH DIRECTIONS
Evaluating summaries, whether automatically or manually, is a difficult task. The main problem in evaluation is the impossibility of building a standard against which the results of the systems can be compared. Further, it is very hard to establish what a correct summary is, because the system may generate a good summary that is nevertheless different from any human summary used as an approximation of the correct output. Content selection [23] is not a solved problem: people differ, and different authors may well select completely different sentences. Two distinct sentences written with disparate words can express the same meaning; this is known as paraphrasing, and there exists an approach to automatically evaluate summaries using paraphrases (ParaEval). Most text summarization systems perform extractive summarization, selecting and copying important sentences from the original documents. Although humans can also cut and paste relevant information from a text, most of the time they rephrase sentences where necessary, or they may join different related pieces of information into one sentence. The low inter-annotator agreement figures observed during manual evaluations suggest that the future of this research area depends heavily on finding efficient ways to evaluate systems automatically.
VII. CONCLUSION
This review has presented the assorted mechanisms of the extractive text summarization process, whose goal is a summary that is highly coherent, minimally redundant, cohesive, and information rich. The aim has been to give a comprehensive review and comparison of the distinctive approaches and techniques of extractive text summarization. Although research on summarization started long ago, there is still a long way to go. Over time, the focus has drifted from summarizing scientific articles to advertisements, blogs, electronic mail messages, and news articles. Simple extraction of sentences has produced satisfactory results in many applications. Some trends in the automatic evaluation of summarization systems have also been discussed. However, the work so far has not addressed the different challenges of extractive text summarization to their full extent in terms of time and space complexity.
REFERENCES
[1] N. Moratanch and S. Chitrakala, "A survey on abstractive text summarization," in Circuit, Power and Computing Technologies (ICCPCT), 2016 International Conference on. IEEE, 2016, pp. 1-7.
[2] F. Kiyomarsi and F. R. Esfahani, "Optimizing persian text summarization based on fuzzy logic approach," in 2011 International Conference on Intelligent Building and Management, 2011.
[3] F. Chen, K. Han, and G. Chen, "An approach to sentence-selection-based text summarization," in TENCON'02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering, vol. 1. IEEE, 2002, pp. 489-493.
[4] Y. Sankarasubramaniam, K. Ramanathan, and S. Ghosh, "Text summarization using Wikipedia," Information Processing & Management, vol. 50, no. 3, pp. 443-461, 2014.
[5] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM (JACM), vol. 46, no. 5, pp. 604-632, 1999.
[6] G. Erkan and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," Journal of Artificial Intelligence Research, pp. 457-479, 2004.
[7] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Technical report, Stanford University, Stanford, CA, 1998.
[8] (2004) Document Understanding Conferences dataset. [Online]. Available: http://duc.nist.gov/data.html
[9] F. Kyoomarsi, H. Khosravi, E. Eslami, P. K. Dehkordy, and A. Tajoddin, "Optimizing text summarization based on fuzzy logic," in Seventh IEEE/ACIS International Conference on Computer and Information Science. IEEE, 2008, pp. 347-352.
[10] L. Suanmali, M. S. Binwahlan, and N. Salim, "Sentence features fusion for text summarization using fuzzy logic," in Hybrid Intelligent Systems, 2009. HIS'09. Ninth International Conference on, vol. 1. IEEE, 2009, pp. 142-146.
[11] L. Suanmali, N. Salim, and M. S. Binwahlan, "Fuzzy logic based method for improving text summarization," arXiv preprint arXiv:0906.4690, 2009.
[12] M. Wang, X. Wang, and C. Xu, "An approach to concept oriented text summarization," in Proceedings of ISCIT'05, IEEE International Conference, China, 2005, pp. 1290-1293.
[13] M. G. Ozsoy, F. N. Alpaslan, and I. Cicekli, "Text summarization using latent semantic analysis," Journal of Information Science, vol. 37, no. 4, pp. 405-417, 2011.
[14] I. Mashechkin, M. Petrovskiy, D. Popov, and D. V. Tsarev, "Automatic text summarization using latent semantic analysis," Programming and Computer Software, vol. 37, no. 6, pp. 299-305, 2011.
[15] V. Gupta and G. S. Lehal, "A survey of text summarization extractive techniques," Journal of Emerging Technologies in Web Intelligence, vol. 2, no. 3, pp. 258-268, 2010.
[16] J. L. Neto, A. A. Freitas, and C. A. Kaestner, "Automatic text summarization using a machine learning approach," in Advances in Artificial Intelligence. Springer, 2002, pp. 205-215.
[17] K. Kaikhah, "Automatic text summarization with neural networks," 2004.
[18] K. M. Svore, L. Vanderwende, and C. J. Burges, "Enhancing single-document summarization by combining RankNet and third-party sources," in EMNLP-CoNLL, 2007, pp. 448-457.
[19] D. Hingu, D. Shah, and S. S. Udmale, "Automatic text summarization of Wikipedia articles," in Communication, Information & Computing Technology (ICCICT), 2015 International Conference on. IEEE, 2015, pp. 1-4.
[20] D. Shen, J.-T. Sun, H. Li, Q. Yang, and Z. Chen, "Document summarization using conditional random fields," in IJCAI, vol. 7, 2007, pp. 2862-2867.
[21] Y. Gong and X. Liu, "Generic text summarization using relevance measure and latent semantic analysis," in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2001, pp. 19-25.
[22] R. Mihalcea, "Language independent extractive summarization," in Proceedings of the ACL 2005 on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, 2005, pp. 49-52.
[23] N. Lalithamani, R. Sukumaran, K. Alagammai, K. K. Sowmya, V. Divyalakshmi, and S. Shanmugapriya, "A mixed-initiative approach for summarizing discussions coupled with sentimental analysis," in Proceedings of the 2014 International Conference on Interdisciplinary Advances in Applied Computing. ACM, 2014, p. 5.