Important Note: Bootcamp participants do not need to prepare any data, because all data sources have been prepared by SAP. System access information is communicated to participants in the dedicated Microsoft Teams > General Channel > System Access Tab. This data preparation guide is for anyone who would like to go through the data-to-value exercises on their own.
This bookshop dataset is designed for the SAP BTP Data-to-Value Bootcamp. It is based on the British Library dataset on children's literature, which is released under the Creative Commons CC0 1.0 Universal Public Domain Dedication license.
The bookshop dataset (CSV format) consists of:
- Books: 10,058 children's books.
  - Book ID, Title, Description, Author ID, ISBN13 and Publisher are extracted from the British Library dataset as ground truth. To simplify the data model, only the first author is extracted when a book has multiple authors.
  - Genre ID: Defaults to 0 (unknown). As a bootcamp exercise, books will be clustered into genres based on their title and description with a machine learning algorithm.
  - Price: Randomly generated decimal value with two decimal places, between 10.00 and 100.00.
- Authors: 2,942 authors associated with the books. Schema: Author ID, Name.
- Genres: 11 genres. Schema: Genre ID, Name. The names are placeholders (unknown, genre1 to genre10) and will be updated once all books have been clustered based on their title and description with a machine learning algorithm as a bootcamp exercise.
- Book Sales Order Items: 287,906 transaction records for book sales since 2011. To simplify the data model, only sales order transactions are taken from the quote-to-cash process. Delivery notes, billing documents, payments and so on are not part of the dataset. Schema of a sales order item (except Live Book Sales Order Items since 2021): order_ID, order_date, book_ID, quantity, net_amount.
  - Live Book Sales Order Items since 2021: Each record contains one or multiple book IDs and an order date. They are to be imported into your SAP S/4HANA Cloud tenant via the csv2s4 tool by @Ralphive. However, a ready-to-use SAP S/4HANA Cloud tenant with this data will be prepared for you during our bootcamp.
  - Archived Historic Book Sales Order Items for 2011 to 2020: Stored in external cloud storage or a data lake. In our bootcamp storyline, we use AWS S3 as the example.
- Book Monthly Sales per Book Genre since 2011: Used to forecast the next 12 months of book sales (quantity) per genre with a time-series forecast. Schema: Month (YYYY-MM), cluster (book genre cluster), Book Sales (quantity).
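The relationship between the sales order items and the monthly sales per genre can be illustrated with a minimal sketch. It assumes that `order_date` is an ISO date string (YYYY-MM-DD), which the dataset description does not state, and that a book-to-genre mapping is already available after clustering. The function name `monthly_sales_per_genre` and the sample rows are hypothetical and not part of the bootcamp material.

```python
import csv
import io
from collections import defaultdict


def monthly_sales_per_genre(order_items, book_genre):
    """Sum the sold quantity per (month, genre cluster).

    order_items: iterable of dicts with order_date, book_ID and quantity.
    book_genre:  dict mapping book_ID to a genre cluster ID (hypothetical input).
    """
    totals = defaultdict(int)
    for item in order_items:
        # Assumption: order_date looks like "2011-01-05", so the first
        # seven characters give the month in YYYY-MM form.
        month = item["order_date"][:7]
        # Books without a known cluster fall back to genre 0 (unknown).
        cluster = book_genre.get(item["book_ID"], 0)
        totals[(month, cluster)] += int(item["quantity"])
    return [
        {"Month": month, "cluster": cluster, "Book Sales": quantity}
        for (month, cluster), quantity in sorted(totals.items())
    ]


# Made-up sample rows that follow the sales order item schema.
sample_csv = """order_ID,order_date,book_ID,quantity,net_amount
1,2011-01-05,100,2,39.98
2,2011-01-20,101,1,12.50
3,2011-02-02,100,3,59.97
"""

order_items = list(csv.DictReader(io.StringIO(sample_csv)))
book_genre = {"100": 3, "101": 3}  # hypothetical clustering result

rows = monthly_sales_per_genre(order_items, book_genre)
for row in rows:
    print(row)
```

The output rows follow the Month, cluster, Book Sales schema described above, which is the shape a time-series forecast per genre would consume.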
To simplify data preparation for the bootcamp, we have prepared the data (Book Products, Book Sales Orders since 2021) in SAP S/4HANA Cloud and archived the historic sales order items for 2011 to 2020 in an AWS S3 bucket. However, if you would like to go through this data-to-value journey on your own, you can also prepare the data in your own SAP S/4HANA Cloud tenant and AWS S3.
No data preparation is needed for the bootcamp, because all data sources have been prepared by SAP.
The online bookshop solution data is stored in the SAP HANA database of SAP HANA Cloud, including the Books, Authors, Genres and Book Sales Order Items. The bookshop solution enables the bookshop manager to maintain the book catalog and the bookshop's end customers to place book orders online. These orders are synchronised to SAP S/4HANA Cloud for the order-to-cash process.
- Data preparation via SQL: Create the table structures and import the data via SQL. This approach will be used in our bootcamp for simplicity. Please follow this document step by step to prepare the bookshop data via SQL.
- Data preparation via CAP (SAP Cloud Application Programming Model) project deployment: This bookshop solution is forked from the bookshop exercise project (part of our BTP Extension Suite bootcamp) prepared by our colleague Jacob Tan. We have updated it with the bookshop dataset for the data-to-value bootcamp. We acknowledge that it was originally forked from the well-known CAP bookshop sample created by a group of SAP CAP experts from the community. To deploy the bookshop solution with data, please follow this document.
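As a rough illustration of the SQL approach, the sketch below creates table structures for books, authors and genres and inserts a few rows. It uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere; the bootcamp itself uses SAP HANA Cloud, whose SQL dialect and data types differ. The table names, column names and sample values are assumptions derived from the dataset description, not the schema used in the linked document.

```python
import sqlite3

# Stand-in database: in-memory SQLite, NOT SAP HANA Cloud.
conn = sqlite3.connect(":memory:")

# Hypothetical table structures based on the dataset description.
conn.executescript(
    """
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE genres (
        genre_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id     INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        description TEXT,
        author_id   INTEGER REFERENCES authors(author_id),
        genre_id    INTEGER NOT NULL DEFAULT 0 REFERENCES genres(genre_id),
        isbn13      TEXT,
        publisher   TEXT,
        price       NUMERIC
    );
    """
)

# Made-up sample rows; genre 0 is the "unknown" placeholder.
conn.execute("INSERT INTO genres VALUES (0, 'unknown')")
conn.execute("INSERT INTO authors VALUES (1, 'Example Author')")
conn.execute(
    "INSERT INTO books (book_id, title, author_id, price) VALUES (?, ?, ?, ?)",
    (1, "Example Title", 1, 19.99),
)

# A book inserted without a genre picks up the default genre_id of 0.
row = conn.execute(
    """
    SELECT b.title, a.name, g.name
    FROM books b
    JOIN authors a ON a.author_id = b.author_id
    JOIN genres g ON g.genre_id = b.genre_id
    """
).fetchone()
print(row)
```

The default of 0 on `genre_id` mirrors the dataset description, where every book starts in the unknown genre until the clustering exercise assigns one.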
Copyright (c) 2021 SAP SE or an SAP affiliate company. All rights reserved. This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.