The task is to create an application that extracts data from the local file system, transforms it and loads it into a remote system via TCP.
Schemas for the input and output data can be found in the common subproject, under com.starcount.common.schema, alongside reader and writer traits, under com.starcount.common.{ readers, writers }.
Read ./data/items.avro (Item) and ./data/user_item_ids.avro (UserItemIds) from the local file system, join each UserItemIds with every Item whose id appears in UserItemIds.itemIds to give a collection of UserItem, and write it to the local TCP socket. Note this is an inner join: an item id with no matching Item yields no UserItem, as with id 2 in the example below.
For example:
// Input
val items = List(
  Item(0, "Item 0"),
  Item(1, "Item 1"))
val userItemIds = List(
  UserItemIds(User(0, "User 0"), List(0, 1, 2)),
  UserItemIds(User(1, "User 1"), List(1)))

// Output
val userItems = List(
  UserItem(User(0, "User 0"), Item(0, "Item 0")),
  UserItem(User(0, "User 0"), Item(1, "Item 1")),
  UserItem(User(1, "User 1"), Item(1, "Item 1")))
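A minimal sketch of the extract and transform steps, assuming the schema case classes take the shape implied by the example above, and using the plain Avro Java library (any other binding, such as avro4s, would do equally well):

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{ GenericDatumReader, GenericRecord }
import scala.collection.JavaConverters._

// Assumed shapes; the real definitions live in com.starcount.common.schema.
case class User(id: Long, name: String)
case class Item(id: Long, name: String)
case class UserItemIds(user: User, itemIds: List[Long])
case class UserItem(user: User, item: Item)

// Extract: read every record from an Avro container file on the local
// filesystem. (Decoding GenericRecord into the case classes is elided.)
def readAvro(path: String): Vector[GenericRecord] = {
  val reader = DataFileReader.openReader(new File(path), new GenericDatumReader[GenericRecord]())
  try reader.iterator().asScala.toVector
  finally reader.close()
}

// Transform: index items by id, then expand each UserItemIds into one
// UserItem per matching id; unmatched ids are dropped (inner join).
def join(items: Seq[Item], userItemIds: Seq[UserItemIds]): Seq[UserItem] = {
  val itemsById = items.map(i => i.id -> i).toMap
  for {
    uii    <- userItemIds
    itemId <- uii.itemIds
    item   <- itemsById.get(itemId).toList
  } yield UserItem(uii.user, item)
}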
- sbt (see ./project/build.properties)
- Java (we use OpenJDK 8)
- pip
- Docker (we use 17.04)
- Docker Compose (see below)
To install Docker Compose, run: pip install -r requirements.txt
To start the remote system, run: docker-compose up
For convenience, a TCP port on the remote system is forwarded to port 9999 on your loopback address.
The server on the remote system accepts up to 10 concurrent connections, each independently rate limited to 10MB/s.
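Writing to the forwarded port is then ordinary java.net socket work. A minimal sketch (the helper name and resource handling are illustrative, not part of the skeleton):

import java.io.BufferedOutputStream
import java.net.Socket

// Illustrative helper: open a socket to the forwarded port, hand the
// caller a buffered output stream, and always close the socket afterwards.
def withLanderSocket[A](f: BufferedOutputStream => A): A = {
  val socket = new Socket("localhost", 9999)
  val out    = new BufferedOutputStream(socket.getOutputStream)
  try {
    val result = f(out)
    out.flush()
    result
  } finally socket.close()
}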
We have provided a skeleton project to be built upon.
./build.sbt | Multi project build configuration
./common/src/main/scala/com/starcount/common | Common traits for reading from and writing to data sources
./data | Sample Avro files to be read from the local filesystem
./lander | Docker context for the remote system
./docker-compose.yml | Docker Compose configuration for running the remote system
./lander-volume | Remote system data directory mount point
Please extend this project with your implementation.
The deliverable should be implemented in Scala, using any libraries you see fit. Supporting scripts should be written in POSIX shell or bash.
The deliverable should not be implemented using the Apache Spark engine.
We should be able to build and run the application in three commands or fewer.
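For instance, a hypothetical sequence (the exact commands depend on your implementation) might be: pip install -r requirements.txt, then docker-compose up -d, then sbt run.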
Beyond a working application, these are the kinds of things we'll be looking for:
- Meaningful logging
- Graceful error handling and shutdown (see the sketch after this list)
- Integration and/or unit tests
- Clean code
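As a small illustration of the shutdown point above (one idiom among many; the resource registry is hypothetical):

import scala.util.control.NonFatal

object Shutdown {
  // Hypothetical registry of resources the pipeline holds open.
  @volatile private var openSockets: List[java.net.Socket] = Nil

  def register(s: java.net.Socket): Unit = synchronized { openSockets ::= s }

  // A JVM shutdown hook runs on SIGTERM/SIGINT as well as normal exit,
  // so sockets are closed even when the process is interrupted mid-load.
  sys.addShutdownHook {
    openSockets.foreach(s => try s.close() catch { case NonFatal(_) => () })
  }
}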
The deliverable should fulfill the requirements of the mid-level applicant deliverable and additionally implement one or both of the following functionalities:
- Error recovery
- Concurrent loading to the remote system, to achieve maximum throughput (see the sketch below).
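A minimal sketch of the concurrent-loading idea (one workable design under stated assumptions, not a prescribed solution): each connection is rate limited to 10MB/s but the server accepts up to 10 of them, so fanning the serialized records out over several sockets multiplies the available throughput.

import java.io.BufferedOutputStream
import java.net.Socket
import scala.concurrent.{ Await, Future }
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Split the serialized records into one batch per connection and stream
// each batch over its own socket; with N sockets the aggregate rate limit
// becomes N x 10MB/s, up to the server's cap of 10 connections.
def loadConcurrently(records: Seq[Array[Byte]], connections: Int = 10): Unit = {
  val batchSize = math.max(1, (records.size + connections - 1) / connections)
  val uploads = records.grouped(batchSize).toSeq.map { batch =>
    Future {
      val socket = new Socket("localhost", 9999)
      val out    = new BufferedOutputStream(socket.getOutputStream)
      try {
        batch.foreach(bytes => out.write(bytes))
        out.flush()
      } finally socket.close()
    }
  }
  Await.result(Future.sequence(uploads), Duration.Inf)
}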