GitHub - ludwigwittgenstein2/supermarket_elasticsearch: We use Django and Elastic search to improve revenues in a supermarket data

Kodai: A Software Architecture and Implementation for Segmentation

by Rick Rejeleene

This software was written for my M.S. dissertation in Computer Science. I benefited immensely from the guidance of Dr. Joan Peckham, Chair of the Department of Computer Science and Statistics.

My Problem Statement:

I was given a supermarket dataset. Using this dataset, the task was to help a small business understand its customers and to provide a method for testing revenue improvement.

Challenges:

a) Large dataset

b) Which tools to use to solve this problem?

c) Applying software engineering methods

d) Time deadline

e) No knowledge of marketing/business

My Immediate questions:

a) How would I use Software Engineering to solve this problem?

link to Dissertation!

Screenshots:

Figure 1: Kodai - Web-application

Figure 2: Kodai -- Retrieving Top Consumers from dataset

Preface:

Kodai is a web application written in Python using the Django framework. We used an a priori method to implement segmentation of the dataset. We also tested a revenue-increase hypothesis on the supermarket data.

Our dataset had information about product details, purchases, demographics, and coupons. Using this, our goal was to improve revenues by targeting segments in the dataset.

We use dunnhumby's supermarket data.

Abstract:

The purpose of this thesis is to design and implement a software architecture for segmentation models to improve revenues for a supermarket. This tool supports analysis of supermarket products and generates results for interpreting consumer behavior, giving businesses deeper insight into targeted consumer markets. The software design developed is named Kodai. Kodai is horizontally reusable and can be adapted across various industries. This software framework allows testing a hypothesis to address the problem of increasing revenues in supermarkets. Kodai has several advantages, such as analyzing and visualizing data, so that businesses can make better decisions. In addition, Kodai is open-source, which means any developer can access the code and adapt it to client requirements. With the described features, it is better than other similar tools such as Gephi, a free visualization and manipulation tool.

This tool is developed to facilitate segmentation of consumer data based on purchase behavior with the goal of allowing the user to test a hypothesis to address the problem of increasing revenues in supermarkets. Most importantly, the software is reusable and can be adapted horizontally across various industries.

Data Set:

This dataset contains household level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are included.

Below is the link to the whole dataset:

The Complete Journey

[CSV-Files] https://www.dunnhumby.com/sourcefiles
YAML-FILES

DATA TABLES:

The following contains the structure of the Dataset, which we used for the whole prototype.

We use (a) to denote that a record is connected to another table.

CAMPAIGN_TABLE

(1584 households mailed 30 Campaigns)

This table lists the campaigns received by each household in the study. Each household received a different set of campaigns.

File: campaign_table.csv

HOUSEHOLD_KEY -- Uniquely identifies each household

(a)CAMPAIGN -- Uniquely identifies each campaign. Ranges 1-30

DESCRIPTION -- Type of campaign (TypeA, TypeB or TypeC)

CAMPAIGN_DESC

This table gives the length of time for which a campaign runs. So, any coupons received as part of a campaign are valid within the dates contained in this table. (30 Campaigns)

File: campaign_desc.csv

(a)CAMPAIGN -- Uniquely identifies each campaign. Ranges 1-30

DESCRIPTION -- Type of campaign (TypeA, TypeB or TypeC)

START_DAY -- Start Date of Campaign

END_DAY -- End Date of Campaign
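The validity rule described for CAMPAIGN_DESC can be sketched as a small lookup. This is a minimal illustration, not code from Kodai: the `CampaignWindow` type, the `coupon_valid_on` function and the sample rows are hypothetical, and it assumes START_DAY and END_DAY are integer day numbers on the same scale as DAY, with both ends inclusive.

```python
from typing import Dict, NamedTuple


class CampaignWindow(NamedTuple):
    description: str  # DESCRIPTION: campaign type, e.g. "TypeA"
    start_day: int    # START_DAY (assumed to be an integer day number)
    end_day: int      # END_DAY (assumed inclusive)


# Hypothetical rows keyed by CAMPAIGN; not copied from campaign_desc.csv.
CAMPAIGN_DESC: Dict[int, CampaignWindow] = {
    1: CampaignWindow("TypeB", 100, 130),
    2: CampaignWindow("TypeA", 120, 180),
}


def coupon_valid_on(campaign: int, day: int,
                    windows: Dict[int, CampaignWindow] = CAMPAIGN_DESC) -> bool:
    """Return True if `day` lies inside the campaign's START_DAY..END_DAY window."""
    window = windows.get(campaign)
    if window is None:
        return False  # unknown campaign: treat the coupon as not valid
    return window.start_day <= day <= window.end_day
```

A redemption row from COUPON_REDEMPT could be checked with `coupon_valid_on(row_campaign, row_day)`.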

COUPON_REDEMPT

(434 households redeemed 556 coupons from 30 Campaigns)

This table identifies the coupons that each household redeemed.

File: coupon_redempt.csv

HOUSEHOLD_KEY -- Uniquely identifies each household

DAY -- Day when transaction occurred

COUPON_UPC -- Uniquely identifies each coupon

CAMPAIGN -- Uniquely identifies each campaign

COUPON

(1135 coupons promoted 44133 products for the 30 campaigns)

This table lists all the coupons sent to customers as a part of a campaign, as well as the products for which each coupon is redeemable. Some coupons are redeemable for multiple products. One example is a coupon for any private label frozen vegetable.

For TypeA campaigns, this table provides the pool of possible coupons. Each customer participating in a TypeA campaign received 16 coupons out of the pool. The 16 coupons were selected based on the customer's prior purchase behavior. Identifying the specific 16 coupons that each customer received is outside the scope of this dataset.

File: coupon.csv

CAMPAIGN -- Uniquely identifies each campaign. Ranges 1-30

COUPON_UPC -- Uniquely identifies each coupon (unique to household and campaign)

PRODUCT_ID -- Uniquely identifies each product
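Because one coupon can be redeemable for several products, the COUPON rows are naturally read as a one-to-many mapping. The sketch below shows one possible way to index them; the function names and the sample rows are hypothetical and are not taken from coupon.csv or from the Kodai code.

```python
from collections import defaultdict

# Hypothetical rows in the column order CAMPAIGN, COUPON_UPC, PRODUCT_ID.
COUPON_ROWS = [
    (4, 111, 27160),
    (4, 111, 27754),  # same coupon, second redeemable product
    (9, 222, 28897),
]


def products_by_coupon(rows):
    """Map each (CAMPAIGN, COUPON_UPC) pair to the set of PRODUCT_IDs it covers."""
    index = defaultdict(set)
    for campaign, coupon_upc, product_id in rows:
        index[(campaign, coupon_upc)].add(product_id)
    return dict(index)


def coupon_applies(index, campaign, coupon_upc, product_id):
    """Return True if the coupon is redeemable for the given product."""
    return product_id in index.get((campaign, coupon_upc), set())
```

Keying on the (CAMPAIGN, COUPON_UPC) pair is a design choice made here so that the same UPC appearing in two campaigns stays separate.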

HH_DEMOGRAPHIC

(801 households)

This table contains demographic information for a portion of the households.

File: hh_demographic.csv

HOUSEHOLD_KEY -- Uniquely identifies each household

AGE_DESC -- Estimated age range

MARITAL_STATUS_CODE -- Marital status (A - Married, B - Single, U - Unknown)

INCOME_DESC -- Household income

HOMEOWNER_DESC -- Homeowner, renter

HH_COMP_DESC -- Household composition

HOUSEHOLD_SIZE_DESC -- Size of household up to 5+

KID_CATEGORY_DESC -- Number of children present up to 3+
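Demographic columns such as these lend themselves to segments that are defined up front. As a rough sketch only (the `segment_by` helper and the sample households are hypothetical, and this is not presented as Kodai's segmentation code), households can be grouped by the value of a single column:

```python
from collections import defaultdict

# Hypothetical households; the values are illustrative, not copied from hh_demographic.csv.
HOUSEHOLDS = [
    {"HOUSEHOLD_KEY": 1, "MARITAL_STATUS_CODE": "A", "HOMEOWNER_DESC": "Homeowner"},
    {"HOUSEHOLD_KEY": 7, "MARITAL_STATUS_CODE": "B", "HOMEOWNER_DESC": "Renter"},
    {"HOUSEHOLD_KEY": 8, "MARITAL_STATUS_CODE": "A", "HOMEOWNER_DESC": "Renter"},
    {"HOUSEHOLD_KEY": 13, "MARITAL_STATUS_CODE": "U", "HOMEOWNER_DESC": "Homeowner"},
]


def segment_by(rows, column):
    """Group HOUSEHOLD_KEY values by the value found in one demographic column."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[column]].append(row["HOUSEHOLD_KEY"])
    return dict(segments)
```

For example, `segment_by(HOUSEHOLDS, "HOMEOWNER_DESC")` separates homeowners from renters, and the resulting household keys can then be matched against HOUSEHOLD_KEY in the other tables.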

TRANSACTION_DATA

(2500 Households shopped 92339 products)

This table contains all products purchased by households within this study. Each line found in this table is essentially the same line that would be found on a store receipt.

File: transaction_data2.csv

HOUSEHOLD_KEY -- Uniquely identifies each household

BASKET_ID -- Uniquely identifies a purchase occasion

DAY -- Day when transaction occurred

PRODUCT_ID -- Uniquely identifies each product

QUANTITY -- Number of the products purchased during the trip

SALES_VALUE -- Amount of dollars retailer receives from sale

STORE_ID -- Identifies unique stores

COUPON_MATCH_DISC -- Discount applied due to retailer's match of manufacturer coupon

COUPON_DISC -- Discount applied due to manufacturer coupon

RETAIL_DISC -- Discount applied due to retailer's loyalty card program

TRANS_TIME -- Time of day when transaction occurred

WEEK_NO -- Week of transaction. Ranges 1-102
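Since every row is a receipt line, ranking households by spend comes down to summing SALES_VALUE per HOUSEHOLD_KEY. The following is a minimal sketch under that assumption; `top_households` and the sample rows are hypothetical, and it does not claim to reproduce how Kodai retrieves its top consumers.

```python
from collections import defaultdict

# Hypothetical receipt lines: HOUSEHOLD_KEY, BASKET_ID, PRODUCT_ID, SALES_VALUE.
TRANSACTIONS = [
    (1, 1001, 27160, 3.50),
    (1, 1001, 27754, 1.25),
    (7, 1002, 27160, 3.50),
    (13, 1003, 28897, 10.00),
    (7, 1004, 28897, 2.00),
]


def top_households(rows, n=2):
    """Return the n (HOUSEHOLD_KEY, total SALES_VALUE) pairs with the highest totals."""
    totals = defaultdict(float)
    for household_key, _basket_id, _product_id, sales_value in rows:
        totals[household_key] += sales_value
    # Sort by total spend, largest first, and keep the first n entries.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

With real data the totals are floating-point sums, so `decimal.Decimal` may be a safer choice if exact cents matter.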

PRODUCT

(92353 products)

This table contains information on each product sold such as type of product, national or private label and a brand identifier.

File: product.csv

PRODUCT_ID -- Number that uniquely identifies each product

DEPARTMENT -- Groups similar products together

COMMODITY_DESC -- Groups similar products together at a lower level

SUB_COMMODITY_DESC -- Groups similar products together at the lowest level

MANUFACTURER -- Code that links products with same manufacturer together

BRAND -- Indicates Private or National label brand

CURR_SIZE_OF_PRODUCT -- Indicates package size
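PRODUCT_ID is the column that links receipt lines to this table, so sales can be rolled up to any product grouping. The sketch below joins hypothetical sales rows to a hypothetical PRODUCT_ID-to-DEPARTMENT lookup; the names and values are assumptions made for illustration, and the "UNKNOWN" bucket is a choice made here for product IDs missing from the lookup.

```python
from collections import defaultdict

# Hypothetical lookup built from product.csv: PRODUCT_ID -> DEPARTMENT.
DEPARTMENTS = {27160: "GROCERY", 27754: "PRODUCE", 28897: "GROCERY"}

# Hypothetical receipt lines reduced to PRODUCT_ID, SALES_VALUE.
SALES = [(27160, 3.50), (27754, 1.25), (28897, 10.00), (99999, 2.00)]


def sales_by_department(sales, departments):
    """Sum SALES_VALUE per DEPARTMENT, joining on PRODUCT_ID."""
    totals = defaultdict(float)
    for product_id, sales_value in sales:
        # Products missing from the lookup are collected under "UNKNOWN".
        department = departments.get(product_id, "UNKNOWN")
        totals[department] += sales_value
    return dict(totals)
```

The same pattern applies to COMMODITY_DESC or SUB_COMMODITY_DESC by swapping the lookup values.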

CAUSAL_DATA

(68377 products)

This table signifies whether a given product was featured in the weekly mailer or was part of an in-store display.

File: causal_data.csv

PRODUCT_ID -- Uniquely identifies each product

STORE_ID -- Identifies unique stores

WEEK_NO -- Week of the transaction

DISPLAY -- Display location

MAILER -- Mailer Location

Release Notes

`v1.0.1`

Contributors

Rick
