1. Assignment 1¶
Application Overview
Throughout this lab, you will learn many different types of technologies and how to use them to solve problems in a complex distributed enterprise application. The tasks are (loosely) tied together under a common scenario: a ridesharing or Mobility Service Provider (MSP) platform (like Uber or Lyft). Our system manages organizations, drivers, vehicles, riders, and trips; provides a matchmaking system for drivers and riders; allows riders to create multi-stop trips; manages the transactional billing process for trips; and provides a rating system for drivers and riders. Note that we will not implement a fully functional end-to-end application, but rather cherry-pick specific modules of such an application. In this first assignment, we will focus on the domain model and persistence layer of the system. In later stages of the lab, you will solve business problems, implement framework features, and tackle real-time analytics and infrastructure scalability issues.
Assignment 1
In this assignment you will learn how to work with an Object-Relational Mapping (ORM) framework. You will map a given object model to a relational data model using the Java Persistence API (JPA) from Java EE, and use JPA to write queries and transactions. You will use jOOQ to implement additional entities and queries. You will also learn about non-traditional data models, such as the document-oriented model using MongoDB and key-value stores using Redis.
Points
Task | Points
---|---
JPA | 9.25
jOOQ | 5.5
NoSQL: Document Databases | 5.5
NoSQL: Key-Value Stores | 4.75
Assignment Interview | 25
1.1. Object-Relational Mapping¶
In this task, you will create the persistence layer of our ridesharing platform. In particular, you will use JPA to define the object-relational mapping and write queries on the data model. Because JPA is only a specification of interfaces, we need a persistence provider that implements it; this is what we use Hibernate for. Fig. 1.1 shows the domain model of the system as a UML class diagram.
Fig. 1.1 The domain model of our ridesharing platform¶
In the template, interfaces are provided for all model classes and data access objects (DAOs); note that we use the Hungarian notation, where interfaces are prefixed with I. Create an implementation class for each of the interfaces and make sure to correctly instantiate objects using the ModelFactory and the DAOFactory. Put all the implementation classes you add into separate impl sub-packages, i.e., dst.ass1.jpa.model.impl and dst.ass1.jpa.dao.impl.
To test and demonstrate your mapping and implementations, the template provides unit tests that read the Hibernate configuration, create a persistence context, and insert a test fixture into the database. You may extend the pre-defined tests and also include additional test classes to cover special situations and corner cases, if necessary. Make sure that your solution works in its entirety, that is, no unnecessary tables are created, no unintended information is lost when deleting entities (e.g., because of unexpected cascades), and no exceptions are thrown when performing any of the tasks.
Hint
This part of the solution is required for other tasks in later assignments, so we strongly recommend that you solve at least this part of the assignment.
Additionally, have a look at the provided dst.ass1.jpa.util.Constants class, which defines various constants that should be considered when implementing the described domain model. However, note that in most cases the constants simply explicate the default values generated by JPA.
To get started, we suggest you read the Oracle JPA tutorial. You don't have to worry about the configuration and bootstrapping of Hibernate, as this is all pre-configured in the test template, so you can focus on the core functionality of JPA. However, if you are using Windows, you may need to change the target path of the temporary H2 database instance, which is configured in the connection.url field of persistence.xml. Changing it to a relative path, e.g., ./tmp/database/dst, will work.
1.1.1. Basic Mapping¶
JPA provides two ways of declaring object-relational mappings:
Java annotations in the code, and
XML-based configuration files.
You should map the model classes and all associations using the javax.persistence.* annotations in your respective model implementations. For the Trip entity, you should define the mapping using the XML-based configuration. This has no particular practical purpose, but it lets you try both methods: many frameworks provide annotation-based and XML-based configuration, each with its own advantages and disadvantages. Define the XML mapping in the Trip.xml file in the resources directory of the ass1-jpa module.
Mapping details
- Make sure that you implement the navigation in the data model as specified in the diagram (for instance, the Driver–Organization association is bidirectional, so the entities should reflect this). You should be able to explain how navigation affects the underlying relational model (e.g., foreign keys and relation tables).
- The relationship between a Driver and an Organization is unique. Employments are identified via the driver's and the organization's IDs, which should make up a composite key (IEmploymentKey). The employment association object also stores whether the driver is currently active in this organization.
- Geospatial data and details of Locations are stored in the document database (see Section 1.4). In the relational model we only store a reference to the actual location ID, and a name (e.g., the address string or a restaurant name). The relationships between Trip and Location are abbreviated in the diagram, but each trip has a mandatory pickup and destination location, and a (possibly empty) list of additional stops.
- To represent monetary values, we use a Money value type. Money objects do not live as entities on their own, but should be embedded into the entities that use them (TripReceipt and Match). Make sure to use the concrete type of the IMoney entity when using it as a property. Entities that contain multiple embedded properties of the same type require column renaming to avoid duplicate columns; look into AttributeOverrides to rename them. We have defined constant values for use with the AttributeOverrides annotation in Constants.java.
- The partOf association in the Organization entity defines organizational hierarchies. For flexible modeling, we allow an Organization to have both multiple parents and multiple children. For this relation, make sure that the created join table is named as defined in the constant J_ORGANIZATION_PARTS and uses the join columns as defined in I_ORGANIZATION_PART_OF and I_ORGANIZATION_PARTS.
- Passwords of Riders are stored as a cryptographic hash of the password string (never store plain-text passwords in the database!). SHA-1 is used to generate a digest, and the result is stored as binary data. SHA-1 hashes are fixed in size, so you should configure the password column to use the optimal data type and length. You can assume that the byte array passed in is already the digest. Please note that SHA-1 is broken and not recommended for production systems; instead use, for example, the SHA-2 standard.
- Make sure that the Rider's email is unique.
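To illustrate the embedded Money value type described above, here is a minimal, hypothetical sketch. The class names follow the diagram, but the fields and column names are invented for illustration; in your solution, use the constants defined in Constants.java instead of the literal strings shown here.

```java
import java.math.BigDecimal;
import javax.persistence.AttributeOverride;
import javax.persistence.AttributeOverrides;
import javax.persistence.Column;
import javax.persistence.Embeddable;
import javax.persistence.Embedded;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// Money is a value type: it has no identity and no table of its own.
@Embeddable
class Money {
    private String currency;
    private BigDecimal value;
    // getters and setters omitted for brevity
}

// An owning entity embeds Money. If an entity embeds several Money
// properties, the columns must be renamed via @AttributeOverrides to
// avoid duplicate column names (column names below are placeholders).
@Entity
class TripReceipt {
    @Id
    @GeneratedValue
    private Long id;

    @Embedded
    @AttributeOverrides({
        @AttributeOverride(name = "currency", column = @Column(name = "total_currency")),
        @AttributeOverride(name = "value", column = @Column(name = "total_value"))
    })
    private Money total;
}
```

The embedded columns end up in the owning entity's table, so no extra table is created for Money.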
Note
The XML mapping of Trip.xml requires the elements inside the attributes tag to be defined in a specific order. Check out the XML Schema for information about the ordering.
1.1.2. Inheritance Mapping¶
Inheritance is an essential mechanism in object-oriented modeling. However, the relational model does not support the concept of inheritance, making inheritance one of the well-known object-relational impedance mismatches. There are, however, several ways an inheritance hierarchy of classes can be represented in relational tables.
You may have noticed that the Rider and Driver entities have many commonalities. For this task, you should abstract these commonalities into the abstract entity PlatformUser. Choose one of the three well-known inheritance patterns to map the class hierarchy via JPA to the database. Make sure that a Rider's email and name make up a composite unique constraint. The inheritance hierarchy is shown in Fig. 1.2.
Fig. 1.2 Abstracting common properties via inheritance¶
Note
During the discussion sessions you should be able to explain the different inheritance patterns and discuss your specific choice.
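As an illustration, one of the three strategies could be declared as follows. JOINED is chosen here purely as an example; the choice of strategy, and its justification, is up to you, and the fields shown are a simplified subset.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// Common properties are pulled up into an abstract entity. The strategy
// annotation on the root decides how the hierarchy maps to tables:
// SINGLE_TABLE, JOINED, or TABLE_PER_CLASS.
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
abstract class PlatformUser {
    @Id
    @GeneratedValue
    private Long id;
    private String name;
    private String email;
}

@Entity
class Rider extends PlatformUser {
    private byte[] password; // SHA-1 digest, see Section 1.1.1
}

@Entity
class Driver extends PlatformUser {
    // driver-specific properties ...
}
```

With JOINED, each subclass gets its own table holding only its specific columns, joined to the PlatformUser table via the primary key.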
1.2. Querying Entities¶
In this task, you will learn how to query JPA entities using JPQL and the JPA Criteria API.
If you haven’t already, read the respective chapters on querying entities in the Java EE Tutorial.
Implement the DAO interfaces provided in the template, and return your implementations in the DAOFactory. Execute the queries in the respective DAO methods. You can find the correct methods via their names (they should be descriptive) and signatures.
1.2.1. Simple Named Queries¶
Implement the following queries as JPQL named queries (no plain SQL, no inline queries, and no programmatic filtering with Java code!). Declare them in the appropriate entities and execute them in the corresponding DAO methods.
Find all drivers with an average rating above X who are currently active in any organization.
Find all trips that have more than X stops and are in COMPLETED state.
Find organizations that have active drivers with specified vehicle type and minimum rating.
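The general shape of a named query is shown below: it is declared on the entity and executed by name in the DAO. This is a hedged sketch with invented attribute names (employments, active, avgRating) and an invented query name; adapt both to your actual model and the constants from the template.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.NamedQuery;

// The query is declared in JPQL on the entity it primarily concerns.
@Entity
@NamedQuery(
    name = "Driver.findActiveWithMinAvgRating",
    query = "SELECT DISTINCT d FROM Driver d "
          + "JOIN d.employments e "
          + "WHERE e.active = TRUE AND d.avgRating > :minRating"
)
class Driver { /* properties omitted */ }

// In the DAO, the query is looked up by name and parameterized:
class DriverDAO {
    private final EntityManager em;

    DriverDAO(EntityManager em) { this.em = em; }

    List<Driver> findActiveWithMinAvgRating(double minRating) {
        return em.createNamedQuery("Driver.findActiveWithMinAvgRating", Driver.class)
                 .setParameter("minRating", minRating)
                 .getResultList();
    }
}
```

Note that the filtering happens entirely in JPQL; the DAO method only binds parameters and executes the query.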
1.2.2. Complex Named Queries¶
Implement the following query as a combination of one or more named queries and Java code.
Find all drivers who have completed at least X trips with a rating above 5 in a given date range and have never received a rating below 3 stars.
Note
You do not need to find a single query to solve this task (you can use a combination of Java code and named queries), but you have to keep ORM-performance in mind, i.e., make sure that your solution is also reasonably fast if you have many entities. During the discussion session you should be able to explain what types of problems can arise with badly written queries.
1.2.3. JPA Criteria API Queries¶
Implement the following query using the JPA Criteria API (i.e., you should not use JPQL, but write the query using the CriteriaBuilder fluent interface). Again, put the code into your implementation of the respective DAO method. You should implement the method as a single query. Note that all parameters are optional (i.e., if a parameter is null, don't add the corresponding criterion when building the query).
- Using the Criteria API, find all trips where:
  - the total fare is between min and max values (optional parameters),
  - the driver's average rating is above X (optional parameter), and
  - the trip has at least Y stops (optional parameter).

Order the results by creation date, descending. You can ignore the currency and just filter by value.
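One common pattern for optional parameters with the Criteria API is to collect predicates in a list and only add a predicate when the parameter is non-null. The sketch below assumes hypothetical attribute paths (receipt, total, value, driver, avgRating, createdAt) and a stub Trip class; adapt everything to your actual model.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

// Stub standing in for your mapped Trip entity (illustration only).
class Trip { }

class TripDAO {
    private final EntityManager em;

    TripDAO(EntityManager em) { this.em = em; }

    // All parameters are optional: a null parameter simply adds no predicate.
    List<Trip> findTrips(BigDecimal minFare, BigDecimal maxFare, Double minAvgRating) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Trip> query = cb.createQuery(Trip.class);
        Root<Trip> trip = query.from(Trip.class);
        query.select(trip);

        List<Predicate> predicates = new ArrayList<>();
        if (minFare != null) {
            // filter on the embedded Money value, ignoring the currency
            predicates.add(cb.ge(trip.get("receipt").get("total").<BigDecimal>get("value"), minFare));
        }
        if (maxFare != null) {
            predicates.add(cb.le(trip.get("receipt").get("total").<BigDecimal>get("value"), maxFare));
        }
        if (minAvgRating != null) {
            predicates.add(cb.gt(trip.get("driver").<Double>get("avgRating"), minAvgRating));
        }

        query.where(predicates.toArray(new Predicate[0]))
             .orderBy(cb.desc(trip.get("createdAt")));
        return em.createQuery(query).getResultList();
    }
}
```

An empty predicate array simply means no WHERE clause, so the method degrades gracefully when all parameters are null.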
1.3. jOOQ: Object Oriented Querying¶
Until now we have used JPA as an ORM to access our data. We have seen the benefits and disadvantages of using such a technology and the complexity it can add to an application. In contrast, SQL mapper frameworks offer an API that resembles SQL as closely as possible and does not contain as much magic as ORMs do.
The trend of striving for a simple API can also be seen in the microservice application paradigm. This paradigm splits a monolithic application into smaller units (= microservices) that allow flexible and fast development with minimal dependencies between services. It entails a clear separation of database access: each microservice has to implement its own data access layer and use its own database. In this part of the assignment we take on the spirit of microservices and extend the existing entity schema by adding tables to store arbitrary rider preferences. To this end, we use the library jOOQ, which offers a SQL-centric abstraction for Java.
In comparison to JPA, you will have to write your own SQL schema, and jOOQ will generate classes accordingly. We highly recommend going through the following jOOQ tutorial to familiarize yourself with the API: Tutorial.
Implementation
This task is split into three steps, similar to what you have already done with JPA:
Defining the SQL schema
Implementing the models
Implementing the DAOs
In the following, we refer to packages located in the ass1-jooq module and omit most of the package path. For example, when we write about the model package, we specifically mean the dst.ass1.jooq.model package in the ass1-jooq module.
1.3.1. SQL Schema¶
Fill out the SQL schema located in resources/schema.sql based on the shown entity model. Afterwards, in the generate-sources phase, jOOQ will scan the existing tables in your database and generate sources from them. You can find the generated files under target/generated-sources/jooq/dst.ass1.jooq.model/public_/tables; make sure to have a look and verify your solution. The pom.xml of the ass1-jooq module is already configured to generate these classes as implementations of the interfaces located in the entity package.
1.3.2. Models¶
Afterwards, you can implement the IRiderPreference interface located in the model package. jOOQ should generate the implementations for the entities located in the entity package! You must also implement the IModelFactory to create instances of the IRiderPreference interface.
1.3.3. Datasource¶
If you have already familiarized yourself with jOOQ (e.g., by going through the tutorial linked above), you know that jOOQ needs a DSLContext to execute database queries and other SQL statements. We provide this implementation for you; you can create connections using the getConnection method of the DataSource class, located in the connection package.
Note
Please familiarize yourself with the concepts of connection creation in jOOQ. You should know which features/options our implementation activates and what that means for the execution. Specifically, look at the module's pom.xml (e.g., which jOOQ arguments are present) and how the DSLContext from getConnection is created.
1.3.4. jOOQ DAOs¶
In this part of the assignment you will implement the DAO methods to create, retrieve, query, and update the entities. Implement the interfaces IRiderPreferenceDAO and IDAOFactory using jOOQ. You have to implement the following methods in the IRiderPreferenceDAO:

- insert: creates a new Rider Preference and saves all preferences that are included.
- updatePreferences: receives a new Rider Preference instance with which to update the old one. That means you have to update all fields and overwrite the values of existing keys present in the passed object. You can view it as a partial update: it does not delete keys, it only adds new keys and updates existing ones.
- delete: deletes all data for a specified ID.
- findById: retrieves the Rider Preferences for a specific rider.
- getMostPopularVehicleClassPerArea: calculates the most popular vehicle class preference per area; in case of a tie, sort by vehicle type.
Note
Make sure that all write operations are done in a transaction and are therefore executed atomically. You should be able to answer during the group interview how you achieved this and also what other guarantees you can make about the execution (e.g., think about the data source).
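A minimal sketch of an atomic write with jOOQ is shown below. The rider_preference table and its columns are placeholders; in your solution you would use the type-safe generated table and field classes rather than the string-based DSL.table/DSL.field lookups shown here.

```java
import org.jooq.DSLContext;
import org.jooq.impl.DSL;

class RiderPreferenceDao {

    private final DSLContext dsl;

    RiderPreferenceDao(DSLContext dsl) { this.dsl = dsl; }

    void insert(long riderId, String area, String vehicleClass) {
        // transaction() runs everything inside the lambda in a single
        // database transaction; if the lambda throws, all statements
        // are rolled back together.
        dsl.transaction(configuration -> {
            DSLContext tx = DSL.using(configuration);
            tx.insertInto(DSL.table("rider_preference"))
              .set(DSL.field("rider_id"), riderId)
              .set(DSL.field("area"), area)
              .set(DSL.field("vehicle_class"), vehicleClass)
              .execute();
            // ... insert the individual preference key/value rows here,
            // still inside the same transaction ...
        });
    }
}
```

Using the transactional DSLContext (tx) inside the lambda, rather than the outer dsl, is what ties the statements to the surrounding transaction.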
1.4. NoSQL: Document Databases¶
Not all data handled by enterprise applications is well suited to being stored in traditional tabular relations. For example, consider a database of different geospatial locations (e.g., pickup and drop-off locations, popular places, etc.). First, creating relational schemata for all the different types of data we want to store with a location (e.g., whether it is a restaurant, and all the information associated with it) may not be possible, or may make their structure very rigid, which can be undesirable for different reasons. Also, answering questions such as “which other users have requested a pickup within a 5 kilometer radius of the given coordinates” is not possible in traditional SQL. For this task, you will use MongoDB to store and query geospatial locations. MongoDB is a document-oriented NoSQL (non/not-only SQL) database, which can store schemaless, semi-structured JSON data called documents. As in the previous tasks, the unit tests for this task spawn an in-memory MongoDB instance, so you don't have to worry about running a server or any kind of configuration. Note that the embedded MongoDB downloads a library the first time you run a test, which can take several minutes. To access the MongoDB instance, you will use the MongoDB Java Driver.
Implementation
The package dst.ass1.doc in the project template contains an interface for inserting data into MongoDB (IDocumentRepository) and one for running queries against the MongoDB instance (IDocumentQuery). Use these interfaces for the remaining parts of this section and put your implementation classes into a sub-package dst.ass1.doc.impl. Also make sure to properly instantiate and return your implementation classes in the DocumentServiceFactory.
Document structure
Locations stored as documents have four predefined properties and zero or more additional properties. The location_id refers back to the locationId of the Location entity stored in the relational database. The type describes whether the Location is, e.g., a place or a generic address. The name refers back to the name property of the Location entity. The geo property holds a GeoJSON object that specifies the geospatial data. Additional properties depend on the type; for example, a place may fall into a specific category. An example is given below.
[{
"location_id": 9241753,
"type": "place",
"name": "TU Wien",
"openHour": 10,
"closingHour": 18,
"geo": {
"type": "Point",
"coordinates": [48.199, 16.3699]
},
"category": "University"
},
{
"location_id": 9241754,
"type": "address",
"name": "...",
"geo": { /* ... */ }
}]
1.4.1. Documents and Collections¶
Data in MongoDB is organized into collections that belong to a database.
All location data will be stored in one collection in our database instance.
The name of the database and collection can be found in the Constants class. The unit tests insert test data loaded from DocumentTestData. For this task, implement the IDocumentRepository#insert(ILocation, Map) method to insert a new document into the collection. The document's location_id and name properties should be read from the given entity instance. All additional properties are passed via a Map (maps naturally mix well with BSON Documents). Furthermore, as we will often retrieve documents by location_id and geo, you should add appropriate indexes to the collection to speed up these queries.
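A sketch of what the insert and the index creation could look like with the MongoDB Java Driver is shown below. The database and collection names are placeholders (take them from the Constants class), and the index choices are one reasonable option, not the only one.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.Map;

class DocumentRepository {

    private final MongoCollection<Document> collection;

    DocumentRepository(MongoClient client) {
        MongoDatabase db = client.getDatabase("dst");          // placeholder name
        this.collection = db.getCollection("locations");        // placeholder name
        // speed up lookups by location_id and geospatial queries on geo
        collection.createIndex(Indexes.ascending("location_id"));
        collection.createIndex(Indexes.geo2dsphere("geo"));
    }

    void insert(long locationId, String name, Map<String, Object> properties) {
        Document doc = new Document("location_id", locationId)
                .append("name", name);
        doc.putAll(properties); // additional, type-dependent properties
        collection.insertOne(doc);
    }
}
```

The 2dsphere index is what enables the GeoJSON-based geospatial queries used later in this section.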
1.4.2. Query by Example¶
The lack of a structured schema makes it difficult to provide a well-structured query language such as SQL. Instead, many document databases use the query-by-example pattern to query data, i.e., they use documents to express queries. MongoDB also natively supports geospatial queries, which you will use. In this task, you should use MongoDB's Query Documents to implement a query in IDocumentQuery: given a type, find all documents that have this specific type. Provide reasonable logging output with information about the results of your query.
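A minimal sketch of the query-by-example lookup with the MongoDB Java Driver follows; the Filters.eq helper simply builds the filter document (it is equivalent to new Document("type", type)), and the println stands in for whichever logging facility you use.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

class DocumentQuery {

    private final MongoCollection<Document> collection;

    DocumentQuery(MongoCollection<Document> collection) {
        this.collection = collection;
    }

    // Query-by-example: the filter itself is a document.
    void findByType(String type) {
        for (Document doc : collection.find(Filters.eq("type", type))) {
            System.out.println("match: " + doc.toJson()); // basic logging output
        }
    }
}
```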
1.4.3. Aggregation Pipelines¶
For this task, to gain insights about the stored documents, you will implement two more complex queries using the powerful Aggregation Pipeline. An aggregation pipeline consists of one or more stages that process documents:
Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values.
The documents that are output from a stage are passed to the next stage.
An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.
Familiarize yourself with aggregation concepts and refer to the mongodb-java-driver aggregation docs on how to execute an aggregation pipeline.
Implement the query in the method IDocumentQuery#findOpenRestaurantsInRadius(). The output of your aggregation query should be a list of all existing places with the document property category equal to “Restaurant” that can be found within a radius of a certain number of kilometers around a center point and that are open at a specific hour (from 0 to 23). The documents should also include the location of the restaurant, its name, and its opening hours.
Example output:
{
"location_id" : {
"$numberLong" : "2"
},
"type" : "place",
"name" : "Cool Restaurant",
"geo" : {
"type" : "Point",
"coordinates" : [ 16.368528, 48.200939 ]
},
"category" : "Restaurant",
"openHour" : 6,
"closingHour" : 20
}
Assume that one degree of latitude is 111.2 km and one degree of longitude is 74.5 km when calculating the area of the polygon.
https://en.wikipedia.org/wiki/Latitude
Implement the query in the method IDocumentQuery#calculateLocationDensity(). The output of your aggregation query should be a single document which holds the number of locations identified in the polygon in the count property as a long number, the area of the polygon, and the density of locations in the polygon.
Example output:
{
"count" : {
"$numberLong" : "4"
},
"areaKm2" : 4.692970346744005,
"densityPerKm2" : 0.8523386479045644
}
As in the previous task, provide logging output with information about the results of your queries.
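For the restaurant query, one possible (hedged) shape of the pipeline is a single $match stage. This sketch uses $geoWithin with $centerSphere and the mean earth radius (~6371 km) to express the radius filter, which is only one of several valid approaches; all field names follow the example documents above.

```java
import static com.mongodb.client.model.Aggregates.match;
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Filters.geoWithinCenterSphere;
import static com.mongodb.client.model.Filters.gt;
import static com.mongodb.client.model.Filters.lte;

import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Arrays;

class RestaurantQuery {

    static void findOpenRestaurantsInRadius(MongoCollection<Document> collection,
                                            double lon, double lat,
                                            double radiusKm, int hour) {
        // $centerSphere expects the radius in radians: distance / earth radius
        double radiusRadians = radiusKm / 6371.0;
        for (Document doc : collection.aggregate(Arrays.asList(
                match(and(
                    eq("type", "place"),
                    eq("category", "Restaurant"),
                    lte("openHour", hour),      // already open at the given hour
                    gt("closingHour", hour),    // not yet closed
                    geoWithinCenterSphere("geo", lon, lat, radiusRadians)
                ))))) {
            System.out.println("restaurant: " + doc.toJson()); // logging output
        }
    }
}
```

More stages (e.g., $project to strip fields) can be appended to the list as needed; each stage receives the output documents of the previous one.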
1.5. NoSQL: Key-Value Stores¶
Sessions are a fundamental concept of web applications and typically handled by the web server. However, in large-scale scenarios, it is beneficial to implement a dedicated session management component that can be scaled on demand. In this task, you will use the Redis key–value store to create a scalable session management service. Sessions belong to a user ID and are identified by unique session tokens (e.g., UUIDs). Sessions can hold arbitrary data as key–value pairs. When a session’s timeout has elapsed, the session expires and all variables associated with the session should be removed.
Implement the ISessionManager interface and instantiate your implementation in the SessionManagerFactory. To communicate with Redis, use the Jedis client library, which is already provided in the template. All data must be distributed through Redis, meaning your implementation cannot rely on Java in-memory data structures. Also, make sure that all Redis operations that manipulate multiple keys (for example, when creating a session) are atomic. To that end, use Redis' MULTI/EXEC and/or WATCH transaction mechanisms. Furthermore, your implementation should not rely on searching the Redis keyspace with the KEYS command, or on executing Lua scripts.
Unlike in the previous tasks, there is no embedded in-memory variant of Redis for Java, so we need a running Redis server instance to execute the JUnit tests. The provided Docker Compose file starts a Redis instance, so to execute the tests you simply need to start the docker-compose.yml file. The redis.properties file contains the host and port configuration, which defaults to the local Redis instance started via docker-compose.yml.
Hint
To interact with the Redis instance, you can either install the redis-cli locally or connect to the Redis container (docker exec -it <container_id> redis-cli).
1.5.1. Data Structures¶
First, implement the createSession method to create a new session for the given user. To create a random session token you can, for example, use Java's java.util.UUID utility. Use appropriate Redis data structures to store the session variables. If you want, you can hold the user ID of a session in the session variable userId. Don't forget that all session-related keys should be removed once the session expires (hint: use the EXPIRE command).
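A sketch of session creation with Jedis is shown below, assuming one hash per session and an invented "session:<token>" key scheme. A hash is one natural fit here because all session variables hang off a single key, so one EXPIRE removes them all together; your own design may differ and should be justified in the discussion.

```java
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

class SessionManager {

    private final Jedis jedis;

    SessionManager(Jedis jedis) { this.jedis = jedis; }

    String createSession(long userId, int timeoutSeconds) {
        String sessionId = UUID.randomUUID().toString();
        String sessionKey = "session:" + sessionId; // placeholder key scheme

        // MULTI/EXEC: the hash write and the EXPIRE are applied atomically,
        // so the session can never exist without a timeout.
        Transaction t = jedis.multi();
        t.hset(sessionKey, "userId", String.valueOf(userId));
        t.expire(sessionKey, timeoutSeconds);
        t.exec();
        return sessionId;
    }
}
```

Further session variables would become additional fields of the same hash, all removed together when the key expires.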
Note
In the discussion sessions, you should explain the data structures you used, and why you chose them over others that Redis provides.
1.5.2. Check-and-Set Operations¶
The requireSession method is a typical check-and-set operation. A new session should only be created if the given user does not already have an active session (e.g., one created through a previous call of requireSession). Atomicity is particularly important for this method. Consider that the method may be invoked concurrently by multiple distributed clients (i.e., you cannot rely on Java locking mechanisms), so make sure your implementation guarantees that a concurrently created session isn't overwritten (note: you can ignore the case in which a key may expire during a transaction). For requireSession you might also need auxiliary keys that store the user-to-session-ID relation (these may be overwritten by createSession, but not by requireSession).
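The check-and-set logic above can be sketched with WATCH/MULTI/EXEC as follows. The "user:<id>:session" auxiliary key scheme is invented for illustration, and in Jedis, exec() returns null when the watched key was modified by another client between WATCH and EXEC.

```java
import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

class RequireSession {

    static String requireSession(Jedis jedis, long userId, int timeoutSeconds) {
        String userSessionKey = "user:" + userId + ":session"; // placeholder key

        while (true) {
            // WATCH makes the following transaction conditional on this key
            // not changing before EXEC.
            jedis.watch(userSessionKey);
            String existing = jedis.get(userSessionKey);
            if (existing != null) {
                jedis.unwatch();
                return existing; // active session already present
            }
            String sessionId = UUID.randomUUID().toString();
            Transaction t = jedis.multi();
            t.set(userSessionKey, sessionId);
            t.expire(userSessionKey, timeoutSeconds);
            List<Object> result = t.exec();
            if (result != null) {
                return sessionId; // transaction went through, session is ours
            }
            // exec() returned null: another client created a session
            // concurrently, so retry the loop and return that one instead
        }
    }
}
```

This way a concurrently created session is never overwritten; the losing client simply retries and picks up the winner's session.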