Data Preparation for SAP BTP Data-to-Value Bootcamp

Important Note: Bootcamp participants do not need to prepare any data; all data sources have been prepared by SAP. System access information is communicated to participants in the dedicated Microsoft Teams > General Channel > System Access Tab. This data preparation guide is for anyone who would like to go through the data-to-value exercises on their own.

This bookshop dataset is designed for the SAP BTP Data-to-Value Bootcamp. It is based on the British Library dataset on children's literature, which is available under the Creative Commons CC0 1.0 Universal Public Domain Dedication license.

Description of the dataset

The bookshop dataset (CSV format) consists of:

  • Books (10,058 children's books)
    • Book ID, Title, Description, Author ID, ISBN13 and Publisher are extracted from the British Library dataset as ground truth. To simplify the data model, only the first author is kept when a book has multiple authors.
    • Genre ID: defaults to 0 (unknown). Books will be clustered into genres based on title and description with a machine learning algorithm as a bootcamp exercise.
    • Price: randomly generated decimal value with two decimal places, between 10.00 and 100.00.
  • Authors: 2942 authors associated with the books. Schema: Author ID, Name.
  • Genres: 11 genres. Schema: Genre ID, Name (values are unknown and genre1~10 as placeholders). The names will be updated after all books have been clustered based on title and description with a machine learning algorithm as a bootcamp exercise.
  • Book Sales Order Items: 287,906 transaction records for book sales since 2011. To simplify the data model, only the sales order transactions of the Quote-to-Cash process are included. Delivery notes, billing documents, payments etc. are not part of the dataset.
    Schema of the sales order items (excluding the live Book Sales Order Items since 2021): order_ID, order_date, book_ID, quantity, net_amount
  • Book Monthly Sales per Book Genre since 2011: used to forecast the next 12 months of book sales (quantity) per genre with a time-series forecast. Schema: Month (YYYY-MM), cluster (book genre cluster), Book Sales (quantity).
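
As an illustration of how the monthly sales table relates to the sales order items, the following minimal Python sketch aggregates order items into rows of Month, cluster and Book Sales. It is not the script used to build the dataset. The function name `monthly_sales_per_genre`, the `book_to_genre` mapping and the ISO `YYYY-MM-DD` format of `order_date` are assumptions made for this example.

```python
from collections import defaultdict


def monthly_sales_per_genre(order_items, book_to_genre):
    """Sum the ordered quantity per (month, genre cluster).

    order_items: iterable of dicts using the order item schema
        (order_ID, order_date, book_ID, quantity, net_amount).
    book_to_genre: hypothetical mapping of book_ID -> Genre ID.
    Assumes order_date starts with an ISO date (YYYY-MM-DD).
    """
    totals = defaultdict(int)
    for item in order_items:
        month = item["order_date"][:7]                 # "YYYY-MM"
        genre = book_to_genre.get(item["book_ID"], 0)  # 0 = unknown genre
        totals[(month, genre)] += int(item["quantity"])
    # One row per month and genre cluster, sorted by month, then cluster.
    return [
        {"Month": month, "cluster": genre, "Book Sales": quantity}
        for (month, genre), quantity in sorted(totals.items())
    ]
```

The order items can be read from a CSV file with `csv.DictReader` and passed in directly, since the function converts `quantity` from text itself.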

Data Preparation

To simplify data preparation for the bootcamp, we have prepared the data (Book Products, Book Sales Orders since 2021) in SAP S/4HANA Cloud and archived the historic sales order items for 2011~2020 in an AWS S3 bucket. However, if you would like to go through this data-to-value journey on your own, you can also prepare the data in your own SAP S/4HANA Cloud tenant and AWS S3.
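
If you prepare the data yourself, the sales order items have to be divided into the archived part (2011~2020, for AWS S3) and the live part (since 2021, for SAP S/4HANA Cloud). A minimal sketch of that split is shown below. The function name `split_order_items`, the `live_from_year` parameter and the ISO `YYYY-MM-DD` format of `order_date` are assumptions for illustration, and uploading to either system is not covered.

```python
def split_order_items(order_items, live_from_year=2021):
    """Split order items into (archived, live) lists by order year.

    Assumes each item is a dict whose order_date starts with a
    four-digit year, e.g. "2015-03-14".
    """
    archived, live = [], []
    for item in order_items:
        year = int(item["order_date"][:4])
        if year >= live_from_year:
            live.append(item)      # since 2021: SAP S/4HANA Cloud
        else:
            archived.append(item)  # 2011~2020: AWS S3 archive
    return archived, live
```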

#1-Bookshop Solution Data in SAP HANA Cloud

No data preparation is needed for the bootcamp; all data sources have been prepared by SAP.

The online bookshop solution data is stored in the SAP HANA database of SAP HANA Cloud, including Books, Authors, Genres and Book Sales Order Items. The bookshop solution enables the bookshop manager to maintain the book catalog and the bookshop's end customers to place book orders online. Orders are synchronised to SAP S/4HANA Cloud for the order-to-cash process.

Data Preparation Options for the bookshop solution on SAP HANA Cloud.

#2-Book Products and Sales Order since 2021 in SAP S/4HANA Cloud

No data preparation is required for bootcamp participants. All data sources have been prepared by SAP.

#3-Archived Historic Book Sales Order Items (2011~2020) in AWS S3

No data preparation is required for bootcamp participants. All data sources have been prepared by SAP.

License

Copyright (c) 2021 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.