Assignment: Distributed password cracking - the brute force way

In this assignment you are going to program a distributed application to crack passwords! Using more cores or computers should make the program run faster.

The goal is to get a large speed-up: That is to see how much faster you can make the program run, compared to the centralized (non-distributed) version of the program.

Centralized (non-distributed) version of the program

The general idea is that you have a password file with usernames and password_hash_values (nicknamed encrypted passwords) separated by ":" e.g. (username:password_hash_value).

You also have a large dictionary (list of words) that users might have used as passwords. The words of the dectionary are run through a Hash Function to generate a "fingerprint"/hash_value, then this "fingerprint" is compared with each password_hash_value from the password file. If you have a match, you have found a password, now in clear text.

Some users might not use an exact word from the dictionary, but may have made some kind of change to the words (transformations), like

Here are given some examples of the changes from above (Be aware not all these transformations are implemented in the centralized version):

As can be seen from this list http://splashdata.com many users chose passwords which are more or less takes from a dictionary.

In this assignment the general idea can be implemented using a number of distributed architectures.

Each architecture can be implemented using threads (useful if the computer has more than one CPU) and by using processes with socket for communication. You only choose one architecture to work with.

Master/slave (data parallelism)

The master holds the dictionary. Each slave gets a part of the dictionary.

A number of different architectures

  1. Master / Slave where the master and the slaves are running in separate threads. They can use shared memory.
  2. Master / Slave where the master and the slaves are running in separated processes, e.g. on different computers communicating over TCP sockets.
    This comes in two variations:
    1. The master is the server. The slaves are clients. Each client/slave ask the master/server for a part of the job.
    2. The master is a client. The slaves are servers. The master ask each slave/server to do a part of the job.
    The variation a. is the realistic one and variation whereas the architecture in variation b. is more simpel.

Group work

The mandatory assignment must be done in groups of 4-5 students. The students will form the groups under the supervision of the teachers.

Each group must make an experiment of one main architecture.Depending on the groups motivation and time they can chose to extend and implement a supplementary architecture after the main architecture has been finished.
A combined effort between two or more groups connecting their servers together is also possible and interesting.

Getting started

To get you started you must download a centralized C# version of the password cracker or the java version centralized Java version of the password cracker

Run it - and see how much time it uses running on your computer(s).

Dictionaries

The centralized version project includes a dictionary "webster-dictionary". For example inside the bin/debug in the Visual studio project.

Hint: During developement you might use a reduced dictionary "webster-dictionary-reduced.txt" to cut down the executing time.

Password file

The centralized version project also includes a password file. Each line in the file contains username + password_hash_value.
The passwords hash_values are encoded using BASE64 encoding to make them into text strings storeable in a text file.

You can actuall also try a few other dictionaries.

Technical requirements

Experiment

In this assignment the class will experiment with different architectures to solve a known problem (cracking passwords, the brute force way).

Your assignment can be implemented and described according to "Scientific Investigation" http://www2.lv.psu.edu/jxm57/irp/sci_inv1.html

  1. The question
  2. The hypothesis
  3. The experiment (focus on variation e.g. of number of cpu's or number of threads or number of clients)

Requirements to the assignment

  1. The architecture must be visualized.
  2. The communication between the different processes (what is sent between the sockets and/or threads (which parameters do you have) must be documented.
  3. The experiment must include time metrics, and at least one other metric.
  4. You must compare your solution with the centralized solution handed out (see getting started).
  5. The experiment(s) and point 1-4 can be described in a pdf document or power point slides.

Hand in

The final project-solution as a .zip file for each group with the is to be uploaded on Wiseflow not later than 19th October 2018 14.00 o'clock.
There is no written report but the group must demonstrate their solution and present the results as power-point slide show (10 minutes) in week 43 Wednesday.