
Software Architecture

Document
Author: ziyu
Version: 2
Date: 22.04.2024

1 INTRODUCTION

1.1 Purpose and coverage

This document describes the technical stack of Tukko. For each technology we provide a concise explanation, the reasons for choosing it, and how it is used in the Tukko web application.

1.2 Product and environment

The team develops features for Tukko (Traffic Visualizer), a full-stack web application that shows general data and hotspots of Finnish traffic. Tukko runs on CSC (IT Center for Science) servers, where we have a Linux-based virtual machine instance. As a modern web application, Tukko supports modern browsers on all operating systems.

1.3 Definitions, notations and abbreviations

This section defines the concepts used in this document, which may be helpful for further learning.

CI/CD (Continuous Integration/Continuous Deployment):
CI (Continuous Integration) refers to developers continuously pushing small changes to a central repository. These changes are verified by automated software that runs the tests the programmers or testers have defined, comprehensively enough to ensure no major issues reach customers. CD (Continuous Deployment) extends this by automatically delivering the verified changes to production.

CI/CD terms
  • pipeline: the top-level component used to define a CI/CD process. Stages and jobs are defined in pipelines.
  • jobs: define the actual steps to be executed.
  • stages: define the chronological order of jobs.
  • runners: open-source applications that execute the instructions defined within jobs.
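
As an illustrative sketch of how these terms fit together (the job names, stages and commands below are hypothetical assumptions, not Tukko's actual pipeline), a minimal GitLab CI configuration might look like:

```yaml
# .gitlab-ci.yml -- hypothetical example, not Tukko's real pipeline
stages:            # chronological order of jobs
  - test
  - build

run-tests:         # a job, executed by a runner
  stage: test
  image: node:20
  script:
    - npm ci
    - npm test

build-image:       # runs only after the test stage succeeds
  stage: build
  image: docker:24
  script:
    - docker build -t tukko:latest .
```

Each job runs on a runner; jobs in the same stage run in parallel, and a later stage starts only when the previous one succeeds.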

Code Coverage
It is a methodology that quantitatively measures how comprehensive a code base's tests are. Increasing code coverage often increases stability and reduces bugs.

CSC
CSC is a Finnish non-profit state enterprise with special tasks. As part of the national research system, it develops, integrates and provides high-quality information technology services and helps ensure that Finland remains at the forefront of development.

cPouta
cPouta is CSC's general-purpose cloud service, which can be used for most tasks. With cPouta you can build your own service, run a quick test, host a development platform, build a data processing pipeline, or serve almost any other purpose you can think of.

Container
See the Docker entry below.

DevOps
A methodology that helps engineering teams build products by continuously getting user feedback. It covers eight stages: plan, code, build, test, release, deploy, operate and monitor. In the planning stage, team members and customers agree on the set of features they want to build. In the coding stage, developers implement these features so they can eventually be released. In the build stage, the source code is bundled so that the user interface can run. The workflow then moves to testing, which includes both automated and manual testing; automated testing is colloquially known as continuous integration. Once testing is done and the stakeholders have all given their feedback, the features are released and made publicly accessible on the Internet (deployment). In the operating stage, the team provisions resources for the load, handles configuration, and deals with architecture problems. In the monitoring stage, the application gathers feedback from users and from monitoring processes. This feedback flows back into the planning stage, which takes it all in and restarts the whole cycle.

Docker
Docker is a platform for building, running and shipping applications in a consistent manner, so that an application which works on one development machine also runs and behaves the same way on other machines. A familiar situation: an application works well on my own machine, but it doesn't work on other machines. There are usually three reasons: missing files, mismatched software versions, or different configuration settings on different machines. Any of these can make the app fail elsewhere. Docker solves this by packaging the application together with everything it needs, so it can run anywhere on any machine that has Docker. Docker runs an application consistently across machines, and can safely remove an application with all its dependencies to clean up the development machine. When the app is installed elsewhere, developers do not need to manually install and configure all its dependencies; Docker downloads them and runs the application in an isolated environment called a container. This isolation allows multiple applications to use different versions of the same software. Containers and virtual machines are different: a container is an isolated environment for running an application, while a virtual machine is an abstraction of a whole machine (physical hardware). Docker is therefore lightweight and needs less hardware.
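
As a hedged sketch of the packaging idea (the file layout, base image and commands are illustrative assumptions, not Tukko's actual build), a minimal Dockerfile for a Node.js service could look like:

```dockerfile
# Hypothetical Dockerfile for a Node.js service -- not Tukko's actual build
FROM node:20-alpine          # base image with Node.js preinstalled

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci

# Copy the rest of the source and build it
COPY . .
RUN npm run build

EXPOSE 3000
CMD ["node", "dist/index.js"]
```

Building the image (`docker build -t myapp .`) bundles the code with its exact dependency versions, so `docker run myapp` behaves the same on any machine with Docker.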

FDD (Feature Driven Development)
FDD is an agile framework which organizes software development around making progress on features. Its primary goal is to develop real, working software and meet deadlines systematically. There are five steps in FDD: developing an overall model, building a feature list, planning by feature, designing by feature, and building by feature. Unlike Scrum, where teams typically meet daily, FDD relies on the team's documentation to communicate important information, so teams don't meet as frequently.

Feature development process
  • define a valuable hypothesis to guide the rest of the development process
  • chat with stakeholders to confirm that customers want the feature before implementation
  • propose a feature solution
  • create a prototype
  • user-test with the feature prototype
  • decide whether to implement or not
  • scope the feature
  • write user stories
  • prepare the engineering design
  • implement the feature
  • implement tracking
  • release the feature
  • gather user feedback
  • iterate

Gitlab
GitLab is an open-source software development platform. It brings teams together to shorten cycle times, reduce costs, strengthen security, and increase developer productivity. Git is a version control system, and GitLab is a source management system built around it.

Gitlab Terms
  • Group: manages settings across multiple projects, logically categorizes multiple users or projects, and provides a cross-project view.
  • Merge Request: merges one branch into another.
  • Issue: tracks work related to a GitLab project.
  • Branch: a parallel version of the repository codebase that allows working on different features, bug fixes, or changes without affecting the main codebase. Branches are useful for collaborative development and managing different streams of work.
  • Branching Strategy: a software development workflow within the context of Git. It describes how a development team creates, collaborates on, and merges branches of source code in a repository, and it enables concurrent development on the codebase. Factors to consider when choosing a branching strategy include team requirements, the source code management system, application environments, and codebase structure.

Hypervisor:
A hypervisor is a software layer that creates and manages virtual machines (VMs) on a physical host machine. These virtual machines act as independent instances of operating systems, each running its own applications and processes, isolated from one another.

Linux-distributions:
Linux is open-source software, and many companies or individual experts create their own versions of Linux called distributions. Each distribution is made to fit specialized needs such as running servers, desktop computers, mobile phones and so on.

TDD (Test-Driven Development)
TDD is a coding methodology where tests are written before the code they verify. A failing test first defines the desired behavior; the developer then writes just enough code to make it pass, and refactors. Its goal is to make sure each part of the system is verifiably working correctly.

Virtual Machines:
A Virtual Machine (VM) is a software-based emulation of a physical computer system. It operates as an independent entity, running its own operating system (known as the guest OS) and applications within a virtualized environment. Multiple virtual machines can coexist on a single physical server, each with its own set of resources and configurations.

1.4 References

  • Videos
  • DevOps: https://www.youtube.com/watch?v=j5Zsa_eOXeY
  • CICD: https://www.youtube.com/watch?v=8aV5AxJrHDg
  • Docker: https://www.youtube.com/watch?v=pTFZFxd4hOI
  • Docker Installation: https://www.youtube.com/watch?v=XgRGI0Pw2mM
  • Virtual Machine: https://www.youtube.com/watch?v=mQP0wqNT_DI
  • React.js: https://www.youtube.com/watch?v=SqcY0GlETPk
  • Documentations
  • CSC: https://www.csc.fi/csc
  • cPouta: https://research.csc.fi/-/cpouta
  • Leaflet.js: https://leafletjs.com/reference.html
  • Turf.js: https://turfjs.org/
  • Docker: https://www.docker.com/

1.5 Overview of the document

This section describes the structure, content, and organization of the document: what is covered in each chapter. This is especially important if the reader is not used to reading design documents. If the first chapter is entirely on the same page as this section (1.5 Overview) or is very short, it does not need to be mentioned here; you can start with the things in Chapter 2. The contents of each chapter are described in more detail than just browsing the table of contents. Possible appendices are also described here, e.g., appendices 1-4 contain class diagrams of the main parts of the system.

2. SYSTEM OVERVIEW

The chapter presents an overview of the system to be implemented, and an introduction to the customer's environment and application area.

2.1 Description of the application area

Overall, the environment of Tukko (Traffic Visualizer) revolves around providing users with access to comprehensive traffic information through a modern and accessible web application. By leveraging CSC servers and a Linux-based virtual machine instance, Tukko delivers a reliable and scalable solution for visualizing Finnish traffic data, catering to the needs of users across various platforms and devices.

2.2 The integration of the system into its environment

Tukko (Traffic Visualizer), serves as a web-based application designed to display general data and hotspots of Finnish traffic information. Its primary function is to aggregate and present real-time and historical traffic data in an intuitive and visually engaging manner. Users can access Tukko through modern web browsers on various operating systems to gain insights into traffic conditions, identify congestion points, and plan routes effectively.

2.3 Hardware environment

Tukko operates across four Docker containers, consuming approximately 140MiB of RAM when all containers are running, before extensive logging occurs. Among these containers, MongoDB generates the most logs but is subject to limits that scale in proportion to the available RAM on the hardware running the container.

2.4 Software Environment

As the entire stack is dockerized, the only requirement for the environment is Docker support.

2.5 Key boundary conditions for implementation

Important boundary conditions for the Tukko (Traffic Visualizer) system include:

  • Hardware Requirements: The implementation hardware must meet the specifications necessary for running Docker containers and supporting virtualized software efficiently. This includes adequate RAM, CPU capabilities, and storage capacity to accommodate the Tukko application and its associated containers.

  • Software Dependencies: The system relies on specific software components, including Docker for containerization, MongoDB for data storage, and various libraries and frameworks for web application development. Compatibility with these software dependencies is essential for the proper functioning of Tukko.

  • Response Times: Tukko should provide timely responses to user requests for traffic data visualization. Response times should be optimized to ensure a smooth user experience, with minimal latency in loading and updating traffic information.

  • Accuracy of Data: The accuracy of the traffic data presented by Tukko is critical for providing reliable insights to users. Data sources and algorithms used for data processing must be accurate and up-to-date to ensure the integrity of the information displayed.

  • Security Requirements: Tukko must adhere to stringent security standards to protect user data, prevent unauthorized access, and mitigate potential security vulnerabilities. This includes implementing encryption, access controls, authentication mechanisms, and secure communication protocols to safeguard sensitive information.

  • Compliance with Laws and Regulations: The system must comply with relevant laws and regulations governing data privacy, transportation, and software usage in the jurisdiction where it operates. This includes adherence to GDPR (General Data Protection Regulation) requirements for handling personal data and any other applicable regulations.

  • Criticality and Reliability: Tukko may be considered a critical system for users relying on traffic information for planning and navigation. Therefore, it must demonstrate high reliability and uptime to ensure uninterrupted access to traffic data when needed.

  • Programming Language and Code Standards: Instructions regarding the programming language(s) used in the development of Tukko, as well as guidelines for code comments, variable naming conventions, function definitions, and other coding standards, should be documented to ensure consistency and maintainability of the codebase.

  • Documentation and Maintenance: Comprehensive documentation of the system architecture, configuration settings, APIs, and deployment procedures should be provided to facilitate system maintenance and troubleshooting. Additionally, guidelines for future updates, bug fixes, and enhancements should be outlined to ensure the longevity and scalability of the Tukko application.

3. DESCRIPTION OF THE ARCHITECTURE

This is the most important point in the design document. The chapter contains things that everyone who implements the system needs to know and understand. The chapter describes (with reasons) e.g. design principles, technology choices, and software architecture in general. It is not necessarily advisable to subdivide into the chapters presented here, but to consider what is the most reasonable order of presentation in each case. For example, sections 3.1 and 3.2 should sometimes be combined.

Example of Deployment Diagram

[UML deployment diagram]

3.1 Design principles

Our team is working on two features: a language switch for Tukko, and exporting data from the database as CSV. Our approach to designing a traffic visualizer revolves around three fundamental principles: aesthetics, performance, and usability. Previous iterations of traffic visualizers have been marred by poor user experiences, characterized by unattractive designs and sluggish performance.

This section presents the “basic philosophy” of the implementation of the system to be developed. Philosophy defines the smallest and simplest possible set of basic concepts and rules by which design decisions are made now and in the future. The basic concepts and rules may be so closely related to some of the key modules in the architectural description in section 3.2 that it is worth moving their description here (or even combining sections 3.1 and 3.2). The technology choices made can also be part of the “rules”. Philosophy can be thought of as involving the implementation of a system in things that remain (probably) unchanged throughout its life cycle. The philosophy facilitates communication between implementers and harmonizes design solutions in different parts of the system. Examples:

  • The control system is implemented in a microcontroller without operating system support.
  • The system is divided into the following: hardware abstraction layer, operating system layer, and application module layer.
  • The purpose of the hardware abstraction layer is to hide the properties of the circuit board used, so that the implementation platform can be changed later if necessary.
  • The operating system layer implements process scheduling and interrupt handling for the application module layer.
  • Application module layer modules are either passive (libraries) or active (processes). An example of a code frame for both types is given in Appendix x.
  • Each external interrupt source is handled by a dedicated active application module.

For example, class and event sequence diagrams can be used to clarify the description.

3.2 Software architecture, modules and processes

React.js
React.js, often referred to simply as React, is an open-source JavaScript library developed by Facebook. It is designed for building user interfaces, specifically for single-page applications (SPAs) and reusable UI components.

Vite
Vite is a build tool for modern web development, designed to streamline the process of building, testing, and deploying web applications. Developed by Evan You, the creator of Vue.js, Vite is built on top of esbuild and leverages modern JavaScript features to deliver fast and efficient development workflows.

Leaflet.js
Leaflet.js is an open-source JavaScript library used for creating interactive maps in web applications. It is lightweight, easy to use, and highly customizable, making it a popular choice among developers for displaying geographical data.

TypeScript
TypeScript is a statically typed superset of JavaScript developed by Microsoft. It adds optional static typing to the JavaScript language, enabling developers to catch errors early in the development process and write more maintainable and scalable code.
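
A small sketch of the static typing TypeScript adds (the `StationReading` shape and station IDs are hypothetical examples, not Tukko's actual data model):

```typescript
// Hypothetical traffic-data shape -- illustrative only, not Tukko's real model.
interface StationReading {
  stationId: string;
  vehiclesPerHour: number;
}

// The compiler rejects calls whose arguments do not match the declared types,
// catching mistakes at build time instead of at runtime.
function busiest(readings: StationReading[]): StationReading | undefined {
  return readings.reduce<StationReading | undefined>(
    (max, r) =>
      max === undefined || r.vehiclesPerHour > max.vehiclesPerHour ? r : max,
    undefined,
  );
}

const busiestStation = busiest([
  { stationId: "vt4-001", vehiclesPerHour: 1200 },
  { stationId: "vt9-017", vehiclesPerHour: 1850 },
]);
console.log(busiestStation?.stationId); // prints "vt9-017"
```

Passing, say, `{ stationId: 42 }` would be a compile-time error here, which is the kind of mistake plain JavaScript only surfaces at runtime.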

Node.js with Express
Node.js with Express is a powerful combination for building web applications and APIs. Node.js is a runtime environment that allows developers to run JavaScript on the server-side, while Express is a minimalist web framework for Node.js that simplifies the process of building web applications and APIs.

MongoDB
MongoDB is a popular, open-source, NoSQL database that provides a flexible and scalable solution for storing and managing structured, semi-structured, and unstructured data. It is designed to handle large volumes of data and is widely used in modern web development for its flexibility, performance, and ease of use.

Redis
Redis is an open-source, in-memory data structure store that is widely used as a caching layer, message broker, and database in modern web applications. It is known for its fast performance, scalability, and versatility, making it a popular choice for real-time applications and high-performance systems.
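
The caching-layer role can be sketched with the cache-aside pattern. In this self-contained sketch a plain `Map` stands in for Redis so the example runs anywhere; in a real deployment the same logic would call a Redis client instead (the class and key names are illustrative assumptions):

```typescript
// Cache-aside sketch: a Map stands in for Redis so the example is
// self-contained; in production the store would be a Redis client.
type Fetcher = (key: string) => string;

class Cache {
  private store = new Map<string, string>();
  public hits = 0;
  public misses = 0;

  // Return the cached value, or fetch it, cache it and return it on a miss.
  getOrFetch(key: string, fetch: Fetcher): string {
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const value = fetch(key); // slow path, e.g. a database query
    this.store.set(key, value);
    return value;
  }
}

const cache = new Cache();
const fetchFromDb: Fetcher = (key) => `traffic-data-for-${key}`;

cache.getOrFetch("helsinki", fetchFromDb); // miss: goes to the "database"
cache.getOrFetch("helsinki", fetchFromDb); // hit: served from memory
console.log(cache.hits, cache.misses); // prints "1 1"
```

The second lookup never touches the slow data source, which is exactly the latency win Redis provides for frequently queried traffic data.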

Docker
Docker is a popular platform that enables developers to build, ship, and run applications in containers. Containers are lightweight, portable, and self-sufficient environments that package all the necessary dependencies, libraries, and configuration files required to run an application. Docker provides a consistent environment across different machines, making it easier to develop, deploy, and scale applications.

3.3 Database Architecture

  1. Database Management System (DBMS):
    Tukko utilizes a combination of MongoDB and Redis for data storage and caching respectively, ensuring a flexible and high-performance solution for managing traffic information.

  2. Database Structure:
    a. MongoDB Collections:
      • A users collection: stores user information including usernames, hashed passwords, email addresses, and user preferences.
      • TrafficData: contains general traffic information such as traffic volume, congestion levels, and weather conditions for various locations.
      • Hotspots: stores information about traffic hotspots, including location coordinates, severity, and affected areas.
      • UserPreferences: holds user-specific settings and preferences for customization purposes.
    b. Redis Cache: Redis is used as a caching layer to enhance application performance by storing frequently accessed data, such as user sessions or recently queried traffic information.

  3. Data Relationships:
    MongoDB collections maintain relationships using references or embedded documents as per the requirements. For example, the TrafficData collection may reference Hotspots to associate traffic data with specific hotspot incidents.

  4. Indexing:
    Appropriate indexes are created on key fields of MongoDB collections to optimize data retrieval and querying performance. Redis data structures are optimized for fast data access.

  5. Security Measures:
    MongoDB security features, such as authentication and access control, are configured to ensure data integrity and prevent unauthorized access. Redis security measures, including authentication and network isolation, are implemented to protect cached data.

  6. Backup and Recovery:
    Regular backups of MongoDB data are scheduled to prevent data loss in case of system failures or disasters. Redis data persistence mechanisms are configured to ensure data durability and facilitate quick recovery.

  7. Maintenance and Upkeep:
    Routine maintenance tasks, such as database optimization and performance monitoring, are performed to ensure the smooth operation of the system. MongoDB and Redis instances are regularly updated with the latest patches and security fixes to address vulnerabilities.
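
The collection shapes described above can be sketched as TypeScript interfaces. The field names here are illustrative assumptions derived from the prose, not Tukko's actual schema:

```typescript
// Illustrative document shapes for the MongoDB collections described above.
// Field names are assumptions based on the descriptions, not the real schema.
interface UserDoc {
  username: string;
  passwordHash: string;   // passwords are stored hashed, never in plain text
  email: string;
}

interface HotspotDoc {
  location: { lat: number; lon: number };
  severity: "low" | "medium" | "high";
  affectedArea: string;
}

interface TrafficDataDoc {
  location: { lat: number; lon: number };
  trafficVolume: number;  // e.g. vehicles per hour
  congestionLevel: number;
  weather: string;
  hotspotId?: string;     // optional reference linking to a Hotspot document
}

const sample: TrafficDataDoc = {
  location: { lat: 62.24, lon: 25.75 },
  trafficVolume: 850,
  congestionLevel: 2,
  weather: "clear",
};
console.log(sample.trafficVolume); // prints 850
```

The optional `hotspotId` field illustrates the reference-based relationship mentioned in point 3: a TrafficData document can point at the Hotspot incident it belongs to.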

Example

[UML diagram]

3.4 Error and Exception Procedures

  • Container Restart Mechanism:

The system implements a mechanism where containers automatically restart in the event of unexpected crashes. This ensures continuous availability of services and minimizes downtime. Additionally, to prevent infinite loops caused by multiple rapid restarts, containers will stop attempting to restart after a certain threshold, enhancing system stability and preventing resource exhaustion.
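
A restart policy with a retry threshold can be expressed in Docker Compose roughly like this (the service name, image and limits are illustrative assumptions, not Tukko's actual configuration):

```yaml
# Hypothetical docker-compose excerpt -- not Tukko's actual configuration
services:
  api:
    image: tukko-api:latest
    # Restart only after unexpected crashes; give up after 5 attempts
    # so a crash loop cannot exhaust host resources.
    restart: on-failure:5
```

With `on-failure`, a clean shutdown is left alone, while a crashed container is brought back up until the retry limit is reached.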

  • CSC Server Error Handling:

The CSC (IT Center for Science) servers are equipped with robust error handling mechanisms that ensure graceful handling of errors. In the event of errors or failures, CSC servers redistribute virtual computing resources accordingly, optimizing resource utilization and maintaining system performance.

  • Application Error Handling: While the current application may not encounter many critical errors that disrupt its functionality, it is essential to acknowledge the importance of implementing error handling mechanisms. Although the application may not exhibit frequent application-breaking states at present, anticipating and preparing for potential errors ensures the resilience and reliability of the system in the face of unforeseen challenges.

// These are template requirements. I temporarily leave them here, in case someone might learn from them. Error and exception procedures are described here at the general (architecture) level; Chapter 4 describes them in more detail at the module level. The texts of the error messages should be attached at the latest during design (it would be better to think about them already at the definition stage). The following types of error handling are taken into account:
  • general error handling rules
  • common error handling modules
  • recognition of error messages
  • saving error messages (to memory, to disk)
  • grouping of error messages (severity, user or system)
  • error message texts.
Action in abnormal situations is part of the specification document, but a position must be taken at the latest in the design. For example, how does the system behave in the event of a power outage: does it "get up itself" or "get stuck"?

4. MODULE / CATEGORY / PROCESS DESCRIPTIONS

The reading structure of this chapter is designed according to the architecture of the program: If a one-level breakdown is sufficient, the method presented here is used (4.1 Module X, 4.2 Module Y…). For example, if a program is divided into packages that contain multiple categories, you should make a separate section for each package, with each category described in the subsections. Things common to all classes in a package are described at the beginning of the chapter, and if the package has an interface, it is also described here. In large-scale projects, a separate design document is written for the internal structure of each package or subsystem. Each module describes its function, interfaces with other parts, interface and implementation aspects. The technical details must be explained in such detail that the description can be used to test the module as a black box test.

4.1 Module X (each module has its own section 4.i)

4.1.1 Overview

Module name:
Module type: (class, function, process, package, subsystem, library)
Overview: A brief description of the module - why it exists, what it does.
Customers: which / what type of parts of the system need the services of this part (in the case of a general-purpose component this item is missing).
Dependencies and interfaces to other modules: Briefly describe how the module takes advantage of other modules and services in its environment (can often be combined with the overview).

4.1.2 Interface in general

The services provided by the module and the common features of the interface functions (eg error handling) are described in general terms. In some cases, it is useful to give examples of using the module by describing the communication between the client and the module, for example as an event sequence diagram. Mention is also made here of any Standard and similar definitions that may appear outside the interface, possible capacity restrictions and their modification, status information stored by the module, etc.

4.1.3 Interface Functions

Each interface function is described separately in its own subsection:
  • Function name
  • Function parameters and return value
  • Action: what the function does
  • Prerequisites: describes what the state of the program must be before calling the function.
  • Post-conditions: describes the state of the program after the function call (e.g. side effects).
  • Error situations: exceptions and other error situations, and behavior when preconditions do not hold at call time.

4.1.4 Implementing the module

If necessary, instructions for implementation may be provided, for example:
  • Thoughts on the implementation of the internal data structures of the module.
  • Thoughts on the algorithms used.
  • Known potentially reusable components.
  • If the module is complex, pseudocode, activity diagrams, etc. can be used.
If necessary, a separate module design document can be made.

4.1.5 Error handling

Describes error and exception handling at the module level.

5. FINISHED COMPONENTS AND SPECIAL TECHNICAL SOLUTIONS

If there are finished parts, i.e. external components, the following are described:
  • where they are obtained
  • where they are placed
  • how they are used
  • other essentials (so that someone else can compile or extend the application).
Also describe anything that differs from the usual working methods of the project: "solutions that deviate from standard industry practice" that a person in the industry might not immediately guess. For example, the following, if necessary:
  • security, safety
  • backups
  • recoveries
  • maintainability
  • flexibility
  • portability.
Especially note any special or unusual way of doing something. Implementation tools can also be mentioned here, if it is indeed important to state them already at this (design) stage (rare and not recommended; for example, B compiler version 2.77 which supports D library 4.56). The project plan contains detailed information on the implementation tools. For example, can the program automatically recover from power outages or operating system "crashes"?

6. REJECTED SOLUTIONS

Considered, but rejected, solutions should be recorded with their rationale in an appropriate chapter or section with dates. Thus, the next reader of the document sees that something has been thought about as well. Also, if you are reading a design document yourself in six months, it may be difficult to remember what things have been considered when making the system.

At the end of the project, the rejected solution options are collected at the end of the project plan.

7. FURTHER DEVELOPMENT IDEAS

Gather useful ideas that come to mind along the way, but which are not planned or implemented in this project; for example due to lack of time, lack of money, lack of resources or skills and competences. For example, ideas for further development should be numbered to make it easier to refer to them later. The date and the name (letters) of the proposer will help in the follow-up, especially if the source is outside the project, if after one year the project unexpectedly receives funding for further development. At the end of the project, this chapter is collected at the end of the project plan. The ideas for further development can also be presented as a separate appendix, which can be appended to other project documents if necessary.

8. ITEMS STILL OPEN

This chapter is unofficial and should no longer exist at the end of the project. It can be used to mark issues that are open during the life cycle of the document, i.e. that still need to be resolved, so that they can be clarified before the document is finally completed.

FUTURE ADDITIONS

Software architecture thinking, such as:

  • Zachman Framework and 4+1 View Model
  • Reference Model for Open Distributed Processing (RM-ODP)
  1. Enterprise viewpoint --> Business Case
  2. Information viewpoint -->
  3. Computational viewpoint
  4. Engineering viewpoint
  5. Technology viewpoint
  • Link Jari Suni here :)

Original Source http://www.cs.tut.fi/ohj/dokumenttipohjat/pohjat/suunnittelu/hytt_drsuunnittelu.doc

Thank you to the original authors!