
pranavghate94/DeepNews

 
 


DeepNews

Generates a headline from a given piece of text.

DeepNews is a high-level headline-generation tool, written in Python and built on Keras with either a TensorFlow or a Theano backend. It was developed for media organizations and writers who need to quickly come up with a headline that is short and informative.


Getting started

Installing

DeepNews is written in Python on top of Keras, TensorFlow and Theano.

Installing Python:

Installing Keras

  • sudo pip install keras
  • Windows-based systems can follow the steps on Stack Overflow

Installing TensorFlow

Amazon AWS (all libraries are preinstalled in the AMI)

Neural networks are computationally heavy, so a GPU configuration is recommended.
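Before training, it can help to confirm that the backend actually sees a GPU. The snippet below is a minimal sketch, not part of DeepNews: the helper name `available_gpus` is hypothetical, and it assumes a TensorFlow backend at version 2.1 or later, where `tf.config.list_physical_devices` is available. On Theano or older TensorFlow releases a different check is needed.

```python
def available_gpus():
    """Return the names of GPUs visible to TensorFlow.

    Hypothetical helper, not part of DeepNews. Assumes TensorFlow >= 2.1.
    Returns an empty list when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # list_physical_devices returns PhysicalDevice objects with a .name field
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = available_gpus()
    if gpus:
        print("GPUs found:", gpus)
    else:
        print("No GPU visible; training will fall back to the CPU.")
```

An empty result does not stop training, but it usually means much longer training times.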


DeepNews

Import model

from deepnews import *

Train model

#Fill code to train new dataset

Test model

#Fill code to test new dataset

Using Pretrained model

#Fill code on how to use pretrained model

Evaluate Model

#Fill code on how to evaluate results

Text Preprocessing

#Fill code on how to use text preprocessing

In the examples folder of the repository, you will find more examples.


Dataset

Word2Vec (Hindi Language)

[Figure: Word2Vec]

Neural Network Model

Input Model

[Figure: input neural network model]

Dataset Statistics

Length of Article histogram

[Figure: histogram of article lengths]

Length of Headline histogram

[Figure: histogram of headline lengths]

FIRE Dataset stats

| Feature | Value |
|---|---|
| Number of articles | 2,97,965 |
| Number of tokens | 85,940,081 (85.94M) |
| Number of unique tokens in articles | 3,88,449 |
| Number of unique tokens in headlines | 58,448 |
| Average article length (words) | 272 |
| Average headline length (words) | 7 |
| Size of dataset | 1.06GB |
| Average of len(article) / len(headline), i.e. about 43 article words for every headline word | 43 |

Crawled Dataset stats

| Feature | Value |
|---|---|
| Number of articles | 5,95,847 |
| Number of tokens | 20,92,32,922 (209M) |
| Number of unique tokens in articles | 10,26,083 |
| Number of unique tokens in headlines | 1,24,965 |
| Average article length (words) | 316 |
| Average headline length (words) | 11 |
| Size of dataset | 3.70GB |
| Average of len(article) / len(headline), i.e. about 34 article words for every headline word | 34 |
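The statistics reported for the FIRE and crawled datasets can be computed along the following lines. This is a minimal sketch, not the script used to produce the numbers above: the function name `dataset_stats` is hypothetical, tokens are obtained by plain whitespace splitting, and the total token count is assumed to include both articles and headlines.

```python
from statistics import mean


def dataset_stats(pairs):
    """Compute corpus statistics from (article, headline) string pairs.

    Hypothetical helper, not part of DeepNews. Tokenization is a simple
    whitespace split, which is an assumption about the original pipeline.
    """
    article_vocab, headline_vocab = set(), set()
    article_lens, headline_lens, ratios = [], [], []

    for article, headline in pairs:
        a_tokens, h_tokens = article.split(), headline.split()
        if not a_tokens or not h_tokens:
            continue  # skip empty entries to avoid division by zero
        article_vocab.update(a_tokens)
        headline_vocab.update(h_tokens)
        article_lens.append(len(a_tokens))
        headline_lens.append(len(h_tokens))
        # per-document ratio; these are averaged below
        ratios.append(len(a_tokens) / len(h_tokens))

    if not ratios:
        raise ValueError("no non-empty (article, headline) pairs given")

    return {
        "num_articles": len(article_lens),
        # assumption: total counts article and headline tokens together
        "num_tokens": sum(article_lens) + sum(headline_lens),
        "unique_article_tokens": len(article_vocab),
        "unique_headline_tokens": len(headline_vocab),
        "avg_article_len": mean(article_lens),
        "avg_headline_len": mean(headline_lens),
        "avg_len_ratio": mean(ratios),
    }
```

Note that `avg_len_ratio` is the mean of the per-document ratios, which generally differs from dividing the average article length by the average headline length. That may be why the FIRE table reports 43 rather than 272 / 7 ≈ 39.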

Number of Crawled Articles per source

| News Website | Number of Articles | URL |
|---|---|---|
| Aaj Tak | 92765 | http://www.aajtak.intoday.in |
| ABP News | 13654 | http://www.abpnews.abplive.in |
| Amar Ujala | 181 | http://www.amarujala.com |
| BBC Hindi | 28861 | http://bbc.com/hindi |
| Deshbandhu | 3174 | http://deshbandhu.co.in |
| Economic Times | 993 | http://hindi.economictimes.indiatimes.com |
| Jagran | 73290 | http://www.jagran.com |
| Navbharat Times | 10329 | http://www.navbharattimes.indiatimes.com |
| NDTV | 92942 | http://www.khabar.ndtv.com/news/ |
| News18 | 38833 | http://www.news18.com |
| Patrika | 68288 | http://www.patrika.com |
| Punjab Kesari | 15494 | http://www.punjabkesari.in |
| Rajasthan Patrika | 89038 | http://www.rajasthanpatrika.patrika.com |
| Zee News | 10463 | http://www.zeenews.india.com/hindi |
