This is the final project for a university Systems Programming course. Per the professor's directions, it is a client-server system for remote file encryption and decryption, modeled on the notorious class of malware known as ransomware. Once the software is installed on the attacked machine, the client steers the attack. The application must support both Linux and Windows and behave identically on both.
The name was not imposed by the professor. It was chosen for two reasons:
- it is a pun on CryptoLocker, one of the most famous ransomware attacks since 2013;
- it gives a playful idea of the project, which was developed for educational purposes at university, and underlines the simplicity of the alphabetic encryption scheme used.
The project contains two main elements:
- `src` (folder): contains the source code for both the Linux and Windows platforms;
- `Makefile` (regular file): contains all the rules needed to automate the build process.
The physical layout of files and folders is designed to keep every module distinct along two axes, communication edge and host platform:
- communication edge:
  - server
  - client
- host platform:
  - Linux
  - Windows
The communication edge distinction is made with the `client` and `server` folders (immediately under `src`); a `common` folder holds the code used by both edges.
Each of these containers (`server` and `client`) in turn reflects the host platform distinction through `linux` and `windows` folders and, where needed, a `common` folder of its own (like the one already mentioned, it holds cross-platform code).
As per the professor's directions, `server` is made of several parts:
- a TCP socket must be created, listening on every network interface on the conventional port `8888` (unless a different port is specified with the `-p` flag);
- the handling of each request is delegated to a thread. The `n` threads (4 by default, or as specified by the `-n` flag) are managed by a threadpool (more details below);
- `server` operations are limited to the elements (recursively) contained in the folder that must be specified with the `-c` flag;
- the supported operations are the following:
  - `LSTF`: lists the files contained in the folder, along with their size in bytes;
  - `LSTR`: recursively lists the files contained in the folder, along with their size in bytes;
  - `ENCR seed file`: encrypts, with a key generated from `seed`, the content of `file` into `file_enc`, then removes `file`;
  - `DECR seed file_enc`: decrypts, with a key generated from `seed`, the content of `file_enc` into `file`, then removes `file_enc`;
- a configuration file can be specified with the `-f` flag; it provides the values corresponding to the `-p` and `-n` flags.
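The four commands above form a small text protocol. As a minimal sketch of how one request line could be split into its parts, the snippet below parses a line with `sscanf`. The `command` structure, the `cmd_kind` enum and `parse_command()` are hypothetical names introduced only for illustration, and the 255-character path limit is an assumption; the project's real parsing code may look quite different.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical command identifiers (not taken from the project). */
typedef enum { CMD_LSTF, CMD_LSTR, CMD_ENCR, CMD_DECR, CMD_INVALID } cmd_kind;

typedef struct {
    cmd_kind kind;
    unsigned int seed;   /* only meaningful for ENCR / DECR */
    char path[256];      /* only meaningful for ENCR / DECR */
} command;

/* Parses one request line such as "LSTF" or "ENCR 42 notes.txt". */
static command parse_command(const char *line) {
    command c = { CMD_INVALID, 0u, "" };
    char verb[8] = "";
    unsigned int seed = 0u;
    char path[256] = "";

    /* Field widths keep sscanf from writing past the buffers. */
    int n = sscanf(line, "%7s %u %255s", verb, &seed, path);

    if (n == 1 && strcmp(verb, "LSTF") == 0) {
        c.kind = CMD_LSTF;
    } else if (n == 1 && strcmp(verb, "LSTR") == 0) {
        c.kind = CMD_LSTR;
    } else if (n == 3 && (strcmp(verb, "ENCR") == 0 || strcmp(verb, "DECR") == 0)) {
        c.kind = (verb[0] == 'E') ? CMD_ENCR : CMD_DECR;
        c.seed = seed;
        strcpy(c.path, path);   /* safe: path holds at most 255 characters */
    }
    return c;
}
```

This sketch does not handle paths that contain spaces, and it does not check that the path stays inside the folder given with `-c`; a real server would need both.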
Its implementation follows a few simple steps, explained below.
The first operation is scanning and parsing the input arguments. This is done mainly with `getopt`, chosen because it makes it simple to associate flags with their values and to report invalid combinations of optional and mandatory inputs. The parameters are then validated:
- the existence of the configuration file: if it exists, its values are parsed;
- the existence of the folder to which the application's execution is confined;
- the validity of the specified port number;
- the validity of the specified maximum number of threads for the threadpool.
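The parsing and validation steps above can be sketched with POSIX `getopt` as follows. The `server_opts` structure, `parse_range()` and `parse_server_args()` are hypothetical names, the upper bound of 1024 threads is an arbitrary assumption, and accepting `-f` in place of `-c` is inferred from the usage examples rather than confirmed by the code. The existence checks on the folder and on the configuration file are left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes getopt() under strict -std= modes */
#endif
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    const char *folder;   /* -c */
    const char *config;   /* -f */
    long port;            /* -p, defaults to 8888 */
    long threads;         /* -n, defaults to 4 */
} server_opts;

/* Converts text to a long in [min, max]; returns 0 on success. */
static int parse_range(const char *text, long min, long max, long *out) {
    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || v < min || v > max) return -1;
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 on an invalid flag, value or combination. */
static int parse_server_args(int argc, char *const argv[], server_opts *o) {
    int opt;
    o->folder = NULL;
    o->config = NULL;
    o->port = 8888;
    o->threads = 4;
    optind = 1;   /* restart the scan so the function can be called again */

    while ((opt = getopt(argc, argv, "c:f:p:n:")) != -1) {
        switch (opt) {
        case 'c': o->folder = optarg; break;
        case 'f': o->config = optarg; break;
        case 'p':
            if (parse_range(optarg, 1, 65535, &o->port) != 0) return -1;
            break;
        case 'n':
            if (parse_range(optarg, 1, 1024, &o->threads) != 0) return -1;
            break;
        default:
            return -1;   /* unknown flag or missing value */
        }
    }
    /* Assumption: either the folder or a configuration file must be given. */
    return (o->folder != NULL || o->config != NULL) ? 0 : -1;
}
```

The colon after each letter in `"c:f:p:n:"` tells `getopt` that the flag takes a value, which is what makes the flag-value association mentioned above so compact.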
Once the user input has been validated and handled, the application configures itself:
- the `threadpool` `init()` function is called to set the maximum number of threads (more details below);
- a `WSADATA` structure is instantiated: it is needed by the socket handling procedures (Windows only);
- a passive TCP socket is created on the specified port and set to listen for incoming connections.
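For the Linux side, creating the passive socket could look like the sketch below. `open_passive_socket()` is a hypothetical helper, and the `SO_REUSEADDR` option is an addition of this example rather than something the project is known to set. On Windows the same calls would additionally need the `WSADATA` initialization mentioned above, which is not shown here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a listening TCP socket bound to every interface, or -1 on error. */
static int open_passive_socket(unsigned short port, int backlog) {
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    /* Lets the server restart quickly while old connections linger. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any network interface */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* ready to be passed to accept() */
}
```

Each descriptor returned by `accept()` on this socket is an active socket, which is what the connection handler described next receives.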
Once an active socket has been created, a pointer to it is passed as a parameter to the `handle_connection()` function, which manages the server-client conversation for its whole duration. Inside a `for` loop, this function reads and reacts to every command requested by the other side of the communication:
- `LSTF`/`LSTR`: these two commands use the same recursive `list(char *ret_out, int recursive)` function (which in turn calls `list_opt(char *ret_out, int recursive, char *folder, char *folder_suffix)`): when calling it, `LSTF` sets the boolean (an integer) `recursive` to 0, while `LSTR` sets it to 1. In both cases the `*folder` parameter matches the `*arg_folder` pointer (the folder in which the application is running) and `*folder_suffix` is set to NULL. While scanning the content of `*folder`, if another folder is found and `recursive` is true, the function calls itself again, setting `*folder_suffix` to the suffix that must be appended to the initial folder to build the path of the folder just found.
- `ENCR`/`DECR`: the implementation of the two commands is unified by exploiting a property of ciphering based on the exclusive disjunction (XOR) operator: given a key k, if a sequence of characters is ciphered by applying bitwise XOR with k, applying the same operation again with the same k yields the original sequence. The two commands therefore run the same procedure (described in more detail below) and differ only in how the input and output files are configured.
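The XOR property that lets `ENCR` and `DECR` share one procedure, (m XOR k) XOR k = m, can be checked with a few lines. `xor_buffer()` is a hypothetical helper written for this illustration and is not part of the project.

```c
#include <stddef.h>

/* XORs every byte of buf with the matching byte of key (same length). */
static void xor_buffer(unsigned char *buf, const unsigned char *key, size_t len) {
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key[i];
    }
}
```

Calling `xor_buffer()` once on a buffer ciphers it; calling it a second time with the same key restores the original bytes, so no separate deciphering routine is needed.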
`threadpool` uses custom data structures to reduce both the complexity of the problems it handles and the differences between platforms. `job_t` is the most atomic structure: it contains all the information about a task to be executed within the threadpool, namely pointers to a function and to its arguments, plus a pointer to the next `job_t`. The `threadpool_t` structure gathers the information needed for the threadpool to work correctly, such as the maximum number of threads, the list of threads itself, and the mutex and condition variable used to regulate access to the structure's internal fields.
There are several ways to interact with the threadpool from the outside:
- `threadpool_init()`: initializes the data structures used by the module and sets the maximum number of threads usable by the threadpool.
- `threadpool_add_job()`: adds a new operation, which is marked as pending, to the threadpool queue.
- `threadpool_bye()`: performs all the memory cleanup before the threadpool is stopped.
Every thread is configured to execute the module's static function `thread_boot`, which does nothing but execute a task, or rather a `job_t`, currently pending in the threadpool queue. It does so only after acquiring the lock on the mutex, so that it can update the information about the next task and the number of pending tasks.
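To make the interplay between `job_t`, `threadpool_t` and `thread_boot` concrete, here is a minimal POSIX-threads sketch. Only the four function names and the two type names come from the description above; every signature, every field name and the `count_job()` example task are assumptions, and the Windows variant is not covered.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct job {
    void (*fn)(void *);      /* function to execute */
    void *arg;               /* its argument */
    struct job *next;        /* next pending job */
} job_t;

typedef struct {
    pthread_t *threads;      /* worker threads */
    int n_threads;           /* how many were started */
    job_t *head, *tail;      /* FIFO queue of pending jobs */
    int stopping;            /* set by threadpool_bye() */
    pthread_mutex_t lock;    /* guards the queue and the stopping flag */
    pthread_cond_t wake;     /* signalled when the queue or flag changes */
} threadpool_t;

/* Worker body: take a job while holding the lock, run it after releasing it. */
static void *thread_boot(void *p) {
    threadpool_t *tp = p;
    for (;;) {
        pthread_mutex_lock(&tp->lock);
        while (tp->head == NULL && !tp->stopping) {
            pthread_cond_wait(&tp->wake, &tp->lock);
        }
        if (tp->head == NULL) {              /* stopping and queue drained */
            pthread_mutex_unlock(&tp->lock);
            return NULL;
        }
        job_t *job = tp->head;
        tp->head = job->next;
        if (tp->head == NULL) tp->tail = NULL;
        pthread_mutex_unlock(&tp->lock);

        job->fn(job->arg);
        free(job);
    }
}

static int threadpool_init(threadpool_t *tp, int max_threads) {
    if (max_threads <= 0) return -1;
    tp->threads = malloc(sizeof *tp->threads * (size_t)max_threads);
    if (tp->threads == NULL) return -1;
    tp->n_threads = 0;
    tp->head = tp->tail = NULL;
    tp->stopping = 0;
    pthread_mutex_init(&tp->lock, NULL);
    pthread_cond_init(&tp->wake, NULL);
    for (int i = 0; i < max_threads; i++) {
        if (pthread_create(&tp->threads[tp->n_threads], NULL, thread_boot, tp) != 0) break;
        tp->n_threads++;
    }
    if (tp->n_threads == 0) {                /* nothing started: undo the setup */
        pthread_cond_destroy(&tp->wake);
        pthread_mutex_destroy(&tp->lock);
        free(tp->threads);
        return -1;
    }
    return 0;
}

static int threadpool_add_job(threadpool_t *tp, void (*fn)(void *), void *arg) {
    job_t *job = malloc(sizeof *job);
    if (job == NULL) return -1;
    job->fn = fn;
    job->arg = arg;
    job->next = NULL;
    pthread_mutex_lock(&tp->lock);
    if (tp->tail != NULL) tp->tail->next = job; else tp->head = job;
    tp->tail = job;
    pthread_cond_signal(&tp->wake);          /* wake one idle worker */
    pthread_mutex_unlock(&tp->lock);
    return 0;
}

/* Lets the workers finish the pending jobs, then frees everything. */
static void threadpool_bye(threadpool_t *tp) {
    pthread_mutex_lock(&tp->lock);
    tp->stopping = 1;
    pthread_cond_broadcast(&tp->wake);
    pthread_mutex_unlock(&tp->lock);
    for (int i = 0; i < tp->n_threads; i++) pthread_join(tp->threads[i], NULL);
    pthread_cond_destroy(&tp->wake);
    pthread_mutex_destroy(&tp->lock);
    free(tp->threads);
}

/* Example task: adds 1 to the int behind arg, under its own mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static void count_job(void *arg) {
    pthread_mutex_lock(&counter_lock);
    ++*(int *)arg;
    pthread_mutex_unlock(&counter_lock);
}
```

In this sketch the job runs outside the pool's lock, so a slow task never blocks other workers from picking up the next pending job. Depending on the toolchain, linking may require the `-pthread` flag.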
The `cipher` module has a very simple structure, consisting of a single `cipher()` function that takes three arguments: two `char` pointers, which are the paths of the input and output files of the ciphering procedure, and an `unsigned int` used as the seed for key generation. The body of the function has three steps:
- initialization of the file descriptors (on Linux) or `HANDLE`s (on Windows) and of the memory maps of the input and output files, after acquiring the lock on the input file;
- the actual ciphering of the first file into the second;
- closing of the file descriptors (on Linux) or `HANDLE`s (on Windows) and of the memory maps of the input and output files, after releasing the lock on the input file.
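The three steps above can be sketched for Linux as follows. To keep the example short, the hypothetical `cipher_with_key()` receives a ready-made key buffer instead of a seed, works byte by byte, and uses `flock()` for the lock; which locking call the project actually uses is not stated, so that part is an assumption.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes flock() and ftruncate() on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* XORs the file at in_path with key[] into out_path; returns 0 on success.
 * key must hold at least as many bytes as the input file. */
static int cipher_with_key(const char *in_path, const char *out_path,
                           const unsigned char *key) {
    int rc = -1;
    struct stat st;
    int in = open(in_path, O_RDONLY);
    if (in < 0) return -1;
    int out = open(out_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (out < 0) { close(in); return -1; }

    /* Step 1: lock the input, size the output, map both files.
     * Empty files are rejected because a zero-length mmap() fails. */
    if (flock(in, LOCK_EX) == 0 && fstat(in, &st) == 0 && st.st_size > 0 &&
        ftruncate(out, st.st_size) == 0) {
        size_t len = (size_t)st.st_size;
        unsigned char *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        unsigned char *dst = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, out, 0);
        if (src != MAP_FAILED && dst != MAP_FAILED) {
            /* Step 2: cipher from the first map into the second. */
            for (size_t i = 0; i < len; i++) dst[i] = src[i] ^ key[i];
            rc = 0;
        }
        /* Step 3: drop the maps before unlocking and closing. */
        if (src != MAP_FAILED) munmap(src, len);
        if (dst != MAP_FAILED) munmap(dst, len);
    }
    flock(in, LOCK_UN);
    close(out);
    close(in);
    return rc;
}
```

The output is mapped with `MAP_SHARED` so that the ciphered bytes reach the file; a private mapping would only change the process's own copy.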
For parallelization, the use of the OpenMP API was allowed. This choice is motivated by two main reasons:
- it is a multi-platform API, so the code does not have to differ between the two systems;
- it is extremely simple to use.
According to the rules set by the professor, parallelization must be applied only if the file to be ciphered is larger than 256 KB. For this reason the process is built on two nested `for` loops. The outer loop is parallelized with OpenMP and iterates over every 256 KB portion of the file. This way, if the file is smaller than 256 KB, the outer `for` performs a single iteration, so the work runs sequentially. The inner `for` loop iterates over every group of 4 bytes (the size of an integer) that makes up the 256 KB block, and the ciphering is computed for each of them.
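The two nested loops can be sketched as below. `cipher_blocks()` and the two macros are hypothetical names, 256 KB is assumed to mean 256 * 1024 bytes, and the buffers are assumed to hold whole 4-byte words with the keys already available; how the project treats a file whose size is not a multiple of 4 is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES (256u * 1024u)                        /* one 256 KB block */
#define WORDS_PER_BLOCK (BLOCK_BYTES / sizeof(uint32_t))  /* 65536 words */

/* XORs n_words 32-bit words of src with keys[] into dst, block by block. */
static void cipher_blocks(const uint32_t *src, uint32_t *dst,
                          const uint32_t *keys, size_t n_words) {
    long n_blocks = (long)((n_words + WORDS_PER_BLOCK - 1) / WORDS_PER_BLOCK);

    /* Outer loop: one iteration per block, shared among the OpenMP threads.
     * An input that fits in one block yields a single iteration. */
    #pragma omp parallel for
    for (long b = 0; b < n_blocks; b++) {
        size_t first = (size_t)b * WORDS_PER_BLOCK;
        size_t last = first + WORDS_PER_BLOCK;
        if (last > n_words) last = n_words;               /* shorter final block */

        /* Inner loop: one 4-byte word at a time. */
        for (size_t i = first; i < last; i++) {
            dst[i] = src[i] ^ keys[i];
        }
    }
}
```

With GCC the pragma takes effect only when the code is compiled with `-fopenmp`; without that flag it is ignored and the same loops run sequentially with identical results.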
One of the problems encountered was finding the simplest way to make the one-time pad ciphering method coexist with parallelization on files larger than 256 KB. Although partitioning the file into 256 KB blocks simplified parallelization, parallelization itself introduced a new problem, because the order of execution is no longer deterministic. In a sequential run, starting from the same seed, each iteration provably receives the same key every time. In a parallel run this no longer holds, as there is no way to predict the order in which the `for` loop iterations execute. To solve this, an additional memory map is used to generate the ciphering keys sequentially and in advance. Before the two nested `for` loops that perform the actual ciphering, a new memory map (of the same size as the input file) is created and populated, with a separate `for` loop, with the keys generated by `rand_r()` (or `cipher_rand()` on Windows). In the subsequent loops, instead of generating the keys dynamically, the key stored in the memory map at the position of the current element is used.
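The fix described above, generating the whole key sequence sequentially before any parallel work starts, can be sketched like this. `generate_keys()` and `cipher_words()` are hypothetical names, plain arrays stand in for the memory maps, and `rand_r()` is the POSIX function named in the text (on Windows the module's `cipher_rand()` would take its place).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes rand_r() under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fills keys[] in order, so key i depends only on the seed and on i. */
static void generate_keys(unsigned int seed, uint32_t *keys, size_t n_words) {
    for (size_t i = 0; i < n_words; i++) {
        keys[i] = (uint32_t)rand_r(&seed);   /* the local seed advances in place */
    }
}

/* Safe in any iteration order: each word only reads its own stored key. */
static void cipher_words(const uint32_t *src, uint32_t *dst,
                         const uint32_t *keys, size_t n_words) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n_words; i++) {
        dst[i] = src[i] ^ keys[i];
    }
}
```

Because the generator's state is advanced by a single thread, two runs with the same seed always produce the same key sequence, no matter how OpenMP later schedules the ciphering loop. The price is an extra buffer as large as the input.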
As can easily be seen in the Windows variant of the module, it contains a function that is absent from the Linux implementation: `cipher_rand()`. It is a pseudo-random number generator used to derive the key from the seed. It exists because Windows lacks a system implementation of such a function. In this case the function, taken from the `rand_r()` implementation offered by MinGW (more details below), is provided by the module itself.
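    #include <stdlib.h> /* RAND_MAX */

    /* Reentrant generator: advances *seed and returns the next value. */
    static int cipher_rand(unsigned int *seed) {
        long k;
        long s = (long)(*seed);

        /* A zero state would stay zero forever, so it is replaced. */
        if (s == 0) {
            s = 0x12345987;
        }
        /* Schrage's method: (16807 * s) mod 2147483647 without overflow. */
        k = s / 127773;
        s = 16807 * (s - k * 127773) - 2836 * k;
        if (s < 0) {
            s += 2147483647;
        }
        (*seed) = (unsigned int)s;
        return (int)(s & RAND_MAX);
    }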
`client` is the simplest part of the project's code. It is made of three parts:
- argument parsing
  Unlike the `server` implementation, no external helper library is used here to parse the arguments; a hand-written approach tailored to the needs of the case was preferred. This choice was driven by the ambiguity between input flags that take no arguments and those that do, and between flags that map to analogous commands or that need arguments on the `server` side.
- creation of the socket needed to connect to a `server`
  The socket management is the most redundant part of the code when compared with the `server` module; the only difference is that the `server` socket waits for connections, whereas the `client` socket requests a connection to one previously allocated by the `server`.
- back-and-forth with `server`
  This phase follows a simple scheme:
  - translation of the `client` input flags into commands supported by `server` (e.g., the `client` flag `-l` is translated into the `server` command `LSTF`);
  - sending the command to `server` via the socket;
  - receiving a reply from `server` and printing it.
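The flag-to-command translation in the first step of the exchange could be sketched as follows. Only the `-l` to `LSTF` mapping is stated explicitly; the other three pairings are inferred from the order of the flags in the usage examples and should be read as assumptions, as should the hypothetical `build_request()` helper itself.

```c
#include <stdio.h>
#include <string.h>

/* Builds the request line for one client flag; returns 0 on success.
 * seed and path are only used by -e and -d and may be NULL otherwise. */
static int build_request(const char *flag, const char *seed, const char *path,
                         char *out, size_t out_size) {
    int n;
    if (strcmp(flag, "-l") == 0) {
        n = snprintf(out, out_size, "LSTF");
    } else if (strcmp(flag, "-R") == 0) {
        n = snprintf(out, out_size, "LSTR");
    } else if ((strcmp(flag, "-e") == 0 || strcmp(flag, "-d") == 0) &&
               seed != NULL && path != NULL) {
        n = snprintf(out, out_size, "%s %s %s",
                     flag[1] == 'e' ? "ENCR" : "DECR", seed, path);
    } else {
        return -1;   /* unknown flag or missing arguments */
    }
    /* snprintf reports the length it needed: reject truncated requests. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string is what would be written to the socket in the second step.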
Setting up the development environment and compiling on Linux is relatively simple, as most Linux distributions provide a base group of development packages. In the environment where the code was written:

    # eopkg it -c system.devel

After that, a plain `make` is enough:

    # make [server|client]

This software has been written and tested on the following environment:
Linux | 4.9.45 x86_64 |
---|---|
Distribution | Solus Project |
RAM | 20 GB |
CPU | Intel Core i7-4770k Haswell |
Type | Physical machine |
Configuring and compiling the project in a Windows environment is a little more complicated. Auxiliary tools were used to simplify the compilation phase and to keep the differences between the build rules listed in the `Makefile` to a minimum: this is why `msys2` and `mingw-w64` were adopted. The configuration proceeds as follows:
- install `msys2` from the official site: http://www.msys2.org
- launch the application and update the package database: `# pacman -Syu`
- complete the package update: `# pacman -Su`
- install `mingw-w64`: `# pacman -S mingw-w64-x86_64-gcc` (on a 32-bit architecture, install `mingw-w64-i686-gcc` instead)
- install the base development dependencies: `# pacman -S base-devel`
- optional: to use the packages installed above from PowerShell as well, add the paths of the binaries to the global Windows `PATH` variable: `C:\path\to\msys2\usr\bin` and `C:\path\to\msys2\mingw64\bin`.
This software has been written and tested on the following environment:
Windows | 10 Pro x86_64 |
---|---|
RAM | 4096 MB |
CPU | Intel Core i7-4770k Haswell |
Type | Virtual machine |
`server` can be used in the following ways (assuming `pwd` is the root folder of the project, once it has been compiled):

    # ./bin/cryptoloackerd -c /path [-n max-threads -p port]

You can specify the `/path` and the `port` number in a configuration file and let `server` load those values from it by passing the file as a parameter:

    # ./bin/cryptoloackerd -f /path/file.txt [-n max-threads]

In this case, the file must follow this template:

    # cat /path/file.txt
    folder = /path
    port = 8888

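As an illustration of how a file in this `key = value` format could be read, the sketch below applies one line at a time to a small structure. `server_config` and `apply_config_line()` are hypothetical names, only the two keys shown in the template are recognized, and the 255-character limit on the folder is an assumption.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char folder[256];
    long port;
} server_config;

/* Applies one "key = value" line to cfg; returns 0 if it was understood. */
static int apply_config_line(const char *line, server_config *cfg) {
    char key[32], value[256];

    /* Blanks in the format skip optional whitespace around '='. */
    if (sscanf(line, " %31[^= \t] = %255s", key, value) != 2) return -1;

    if (strcmp(key, "folder") == 0) {
        strcpy(cfg->folder, value);   /* safe: value holds at most 255 characters */
        return 0;
    }
    if (strcmp(key, "port") == 0) {
        long port;
        char extra;
        /* Exactly one conversion means digits only, no trailing junk. */
        if (sscanf(value, "%ld%c", &port, &extra) != 1) return -1;
        if (port < 1 || port > 65535) return -1;
        cfg->port = port;
        return 0;
    }
    return -1;   /* unknown key */
}
```

A caller would typically read the file with `fgets()` and pass each line to `apply_config_line()`, stopping at the first line that is rejected.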
`client` can be used in the following ways (assuming `pwd` is the root folder of the project, once it has been compiled):
- To execute `LSTF` or `LSTR`: `# ./bin/cryptoloacker -h server-ip -p port [-l|-R]`
- To execute `ENCR` or `DECR`: `# ./bin/cryptoloacker -h server-ip -p port [-e|-d] seed /path/file`