Knowledge is power, and this is especially true for software. Telemetry puts software engineers in the driver's seat by exposing the underlying performance of a system. Identifying possible problems before they occur and proactively fixing them is better than waiting for users to point them out for you.
Telemetry refers to the instrumentation of a system. When a user takes an action in your web application, the application can emit events. These events can be aggregated to calculate statistics—which we call metrics—or written to logs. Later, they can be decorated with information that links them together.
This article will start by examining the built-in telemetry and metrics that come for free with the Phoenix framework in Elixir. Elixir is built on the BEAM virtual machine, which is cluster-friendly, so distributed telemetry works out of the box.
What is telemetry?
The dictionary definition of telemetry is:
The process or practice of obtaining measurements for recording or display at a point at a distance. The transmission of measurements by the apparatus making them. Also apparatus used for this; information so transmitted.
In short: tele-, at a distance; -metry, of or pertaining to measurement.
At the heart of telemetry is designing software systems to make their internal operations visible externally, so that someone can take action based on that data. Those actions might be backward-looking (how did this happen?) or forward-looking (if we don't address this, then bad things might happen).
The relationship between a telemetry event and metrics
We call the raw actions coming out of a system events. A properly tagged set of continuous events is a critical resource in backward-looking telemetry, helping developers figure out what happened inside a system.
For forward-looking telemetry, an aggregated view of events is more useful. We call these aggregates metrics.
The metrics reported might be operationally related ("the Dilithium Crystals cannae handle it Captain") or they might be business focused ("aggregated sales volume is 20% below trend").
Starting a tech company in Linlithgow, where Star Trek's engineer Scotty will be born in 2222, I obviously have Dilithium Crystals hanging about, but don't worry young 'un, keep at it and the good tech will come your way - if you focus on the business side and get the money in.
Telemetry is a holistic game. The customer doesn't care whether their order failed because of a bug in your application software, because your server ran out of memory, or because rain came through the roof and fried all the electronics in the data center.
Before we start digging into it, let's start looking at what telemetry comes out of the box with Phoenix.
Quick start for telemetry with Phoenix
We'll start by creating a dummy Phoenix application called Brock, a guid Scots word for badger. Be sure to install Elixir and Phoenix before continuing.
For this tutorial, we'll be using Phoenix with Elixir 1.17.3, and we assume that you already have Postgres installed.
Building Brock will show us Phoenix's telemetry tooling. (For readers for whom English is a second language, that's good Scots; for all you native speakers, Saor Alba! [strikes Mel Gibson in Braveheart pose])
`mix phx.new brock`
You will notice, as the terminal scrolls past, that it has created a telemetry module in brock/lib/brock_web. Perhaps not surprisingly, it is called telemetry.ex.
Next, do the rest of the setup:
`cd brock
mix ecto.create`
Depending on your setup, you might need to fiddle with the database passwords and hostnames that Phoenix uses to connect to Postgres. These configuration settings are usually in config/dev.exs.
Looking for the telemetry module
Before we start the application, let's poke around the existing code and see what we see.
In mix.exs we can see that Phoenix is pulling in the dependencies telemetry_poller and telemetry_metrics, and they, in turn, pull in the base Erlang telemetry package telemetry. By default, Phoenix starts the telemetry supervisor, and we are good to go.
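In a freshly generated app, those dependency lines look roughly like this (the exact version requirements vary by Phoenix release, so treat these as illustrative):

```elixir
# mix.exs, as generated by mix phx.new (versions may differ)
defp deps do
  [
    # ...
    {:telemetry_metrics, "~> 1.0"},
    {:telemetry_poller, "~> 1.0"},
    # ...
  ]
end
```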
The poller is a library that collects metrics from the underlying virtual machine - the BEAM. Remember that the customer doesn't care what caused the problem - but for programmers, understanding the behavior of the underlying BEAM, the operating system, and the hardware is also important in telemetry.
Skimming the code in telemetry.ex, we can see that there are a load of metrics defined, for example:
```elixir
summary("phoenix.endpoint.start.system_time",
  unit: {:native, :millisecond}
),
# …
summary("brock.repo.query.total_time",
  unit: {:native, :millisecond},
  description: "The sum of the other measurements"
),
# …
summary("vm.memory.total", unit: {:byte, :kilobyte}),
```
Phoenix is collecting metrics from the Phoenix application itself, the Ecto repo instantiated inside the new Brock app, and the Erlang virtual machine. Later on, we will see how these are presented to the user.
This module is where you would add your custom metrics. In your code, you would write emitters for the telemetry events.
The only metric type already configured is summary, but there are another four available: counter, sum, last_value, and distribution. You can read about them in the telemetry metrics documentation.
One last thing before we fire up Phoenix: let's add operating system monitoring so that we receive OS-related measurements too. We do this by editing the application function in mix.exs to add :os_mon to the extra_applications list:
```elixir
# Type `mix help compile.app` for more information.
def application do
  [
    mod: {Brock.Application, []},
    extra_applications: [:logger, :runtime_tools, :os_mon]
  ]
end
```
Now let's see Phoenix telemetry in action! Start your Phoenix app in a shell:
`iex -S mix phx.server`
If we head over to the browser at http://localhost:4000/dev/dashboard , we can see the dashboard in action:
The built-in telemetry events are aggregated into metrics. The metrics collector is invoked periodically to collect them, and you can see that it does so on a 15-second cycle. It might appear that the telemetry poller is doing this, but it isn't: the poller's job is to sample the OS and the BEAM for you.
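The poller's own sampling interval is configured on its child spec in telemetry.ex; a freshly generated module contains something close to this sketch (the exact period and children depend on your Phoenix version):

```elixir
# brock/lib/brock_web/telemetry.ex (sketch of the generated supervisor)
def init(_arg) do
  children = [
    # Periodically invokes periodic_measurements/0 to sample the BEAM and OS
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
    # Reporters can be added here, e.g.
    # {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
```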
This is a presentation of holistic telemetry. Scan along the menu bar - you have tabs for metrics about the Operating System, the use of memory by the BEAM, the atom table, the supervision trees, details of running applications, and more - and a particular tab called Metrics - which displays the specific telemetry that we have looked at. Let's have a peek:
A side-by-side comparison of the three tabs here (Brock, Phoenix, and VM) will show that there is a graph generated for each metric defined in telemetry.ex. The Brock app is getting metrics from its Ecto repo, which makes sense. Everybody needs to graph query execution time for their database, right?
You might have a single Phoenix app starting multiple web applications, each with its own database and Ecto repo. You'd probably want to see metrics from each one separately.
Phoenix is built on top of Elixir which is built on top of the BEAM and Erlang OTP. OTP is a very mature platform dating back to the 1990s with very strong operational support. You don't build systems with 99.9999999% uptime (31.56 milliseconds downtime per year ) accidentally.
The entire stack is wired for telemetry, with the venerable SASL (System Architecture Support Libraries) underpinning it all. OTP is instrumented, Phoenix is instrumented, and now it's up to you to live up to the great traditions of the BEAM community.
Writing your own telemetry in Phoenix
Let's start by adding new metrics to our stack. We'll first add some new metrics for Brock in the telemetry.ex module. All the existing Phoenix/Repo metrics are summary metrics, so let's mix it up a bit and define metrics of all the other varieties:
```elixir
def metrics do
  [
    # Phoenix Metrics
    # …
    # Database Metrics
    # …
    # VM Metrics
    # …
    summary("vm.total_run_queue_lengths.io"),

    # Brock metrics
    counter("brock.bingobongo"),
    sum("brock.bingobongo"),
    last_value("brock.bingobongo"),
    distribution("brock.bingobongo")
  ]
end
```
Pop over to the dashboard and refresh and we can see that the graphs are all there:
Adding our own telemetry events
Now let's populate those graphs. To do that we need to emit telemetry events. Rather than go through the whole faff of setting up a dummy app, let's just do it from the shell. It's how Joe Armstrong always worked, so you are in good company.
At the prompt, we can just fire off telemetry events:
```elixir
iex(1)> :telemetry.execute([:brock], %{bingobongo: 9})
:ok
iex(2)> :telemetry.execute([:brock], %{bingobongo: 33})
:ok
iex(3)> :telemetry.execute([:brock], %{bingobongo: 5})
:ok
iex(4)> :telemetry.execute([:brock], %{bingobongo: 2.3})
:ok
```
And voilà, they appear in the dashboard:
We are aggregating the same event in four different ways here. The telemetry library we are using is written in Erlang, which is why you specify the module name as a bare atom (:telemetry) - Erlang doesn't have a hierarchical module namespace like Elixir. There is a common telemetry framework so that different monitoring solutions can be attached, and this library is a key part of that.
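You can see the flat atom namespace for yourself in any iex session; an Elixir module name is just an atom with an Elixir. prefix, which is why :telemetry (an Erlang module) and Brock (an Elixir one) can sit side by side:

```elixir
# Elixir module names are plain atoms under the hood
IO.inspect(Brock == :"Elixir.Brock")  # => true
IO.inspect(is_atom(:telemetry))       # => true
# Telemetry event names are simply lists of atoms - no hierarchy involved
IO.inspect(Enum.all?([:brock, :bingo], &is_atom/1))  # => true
```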
Looking at the graphs, they don't make much sense, which itself makes sense, as they are fake metrics. In your real app, you will need to work out what you want to measure and how.
There are various configuration options which you can explore and metrics data can be saved, although that is a bit more work.
Driving multiple metrics with the same telemetry events
A single telemetry event can feed multiple metrics. Let's add a couple of new metrics in the telemetry.ex module:
```elixir
def metrics do
  [
    # Brock metrics
    # ...
    sum("brock.bingo.jammies"),
    sum("brock.bingo.jimmies")
  ]
end
```
Let's get some telemetry events emitted:
```elixir
iex(5)> :telemetry.execute([:brock, :bingo], %{jammies: 53, jimmies: 133})
:ok
iex(6)> :telemetry.execute([:brock, :bingo], %{jammies: 15, jimmies: 15})
:ok
```
We get both graphs populated:
The metrics come in a natural hierarchical namespace which you need to design, so counter("brock.bingo.bongo") would be fed by events like :telemetry.execute([:brock, :bingo], %{bongo: 11}).
Telemetry event metadata
We can extend the model with event metadata. Event metadata allows you to slice and dice your metrics. Imagine you have a common registration process for developers, small companies, and enterprise users. Now imagine you have instrumented it! You want to understand its behavior as a whole, but also be able to compare how different classes of users execute the process. Tagging your events with event metadata is how you would do that.
Adding event metadata is simple, and helps in extracting valuable data from all your metrics:
```elixir
:telemetry.execute(
  [:db, :query],
  %{duration: 198},
  %{table: "users", operation: "insert"}
)
```
When the telemetry event arrives at the metrics engine, it sets about tagging the resulting metrics. Metric definitions can include tags (and a tag_values function) that are used to extract tag values from the event metadata. You can read about tagging in the documentation.
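As a hedged sketch of how a tagged metric definition might look (the metric name and tag keys here are illustrative, chosen to match the event above):

```elixir
# telemetry.ex: split one metric into per-table, per-operation series.
# The :tags keys are looked up in the event metadata by Telemetry.Metrics.
summary("db.query.duration",
  unit: {:native, :millisecond},
  tags: [:table, :operation]
)
```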
We can also build spans that run across multiple telemetry events, providing a holistic view of a process across the HTTP request lifecycle and beyond.
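The telemetry library ships a :telemetry.span/3 helper for exactly this: it emits a :start event, runs your function, then emits a :stop (or :exception) event carrying the measured duration. A minimal sketch, where register_user/1 and params are hypothetical stand-ins for your own code:

```elixir
# Emits [:brock, :registration, :start], then
# [:brock, :registration, :stop] with a :duration measurement,
# or [:brock, :registration, :exception] if the function raises.
:telemetry.span([:brock, :registration], %{class: :enterprise}, fn ->
  result = register_user(params)
  # The span function must return {result, stop_metadata}
  {result, %{class: :enterprise}}
end)
```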
Distributed telemetry with Elixir clusters
So far we have looked at the -metry end of the telescope. Now it's time to look at the tele- bit, getting data out of one system.
Leveraging the power of the BEAM, we can start exploring distributed telemetry. The first step is to build an Elixir cluster:
`cd ./config cp dev.exs dev2.exs`
Let's edit dev2.exs so we can create a cluster. There are two settings we need to change. First up, we rename the database:
```elixir
config :brock, Brock.Repo,
  username: "postgres",
  password: "postgres",
  hostname: "db",
  database: "brock_dev2",
```
Next, we need to bind the webserver to a different port:
```elixir
config :brock, BrockWeb.Endpoint,
  # Binding to loopback ipv4 address prevents access from other machines.
  # Change to `ip: {0, 0, 0, 0}` to allow access from other machines.
  http: [ip: {0, 0, 0, 0}, port: 4001],
```
We also need to enable Phoenix.LiveReload in mix.exs for the new environment; the generated dependency is restricted with only: :dev, so remove that option:
`{:phoenix_live_reload, "~> 1.2"},`
Now we are going to start two sessions in two different terminals. In one type:
`iex --sname dev1 --cookie jerko -S mix phx.server`
And in the other, type:
```shell
MIX_ENV=dev2 mix ecto.create
MIX_ENV=dev2 iex --sname dev2 --cookie jerko -S mix phx.server
```
We should now have two instances of Phoenix running on two Erlang VMs - one bound to port 4000 and the other to 4001.
These two nodes have the same cookie (we can check with :erlang.get_cookie()), and we can make them into a cluster simply by pinging one from the other. My nodes are called dev1@gordon and dev2@gordon ("gordon" is the hostname), so executing :net_adm.ping(:'dev1@gordon') on dev2 will do the trick. You should get a pong back and not a pang (it's Swedish humor, that is).
If we now run :erlang.nodes() in one shell, we should get back a list containing the other node's name.
Clustered Elixir telemetry metrics
Now we'll power over to the browser and look at the dashboard:
In the top right corner we can detach the dashboard and connect it to another node. Pretty neat, huh?
So how does Elixir do that? The best way to understand it is to compare the process list of the same BEAM as seen from the two dashboards.
First up is the view of dev1 from dev1:
And then the same view from dev2:
The process with the PID <0.50.0> on dev1 is <29582.50.0> on dev2, and so on all the way down the list.
The PID (or process identifier) is a fundamental element of the BEAM, and it has three visible parts (and an invisible one). The first number identifies the node the process lives on (0 always means the local node, which is why it changes when viewed from another node), the middle number is the process's number on that node, and the third number is a serial that distinguishes reuses of the same process number.
The BEAM implements an actor model - discrete processes that communicate by sending each other messages. The PID is a process's postal address, valid even if the process is on another VM on another machine.
Thanks to PIDs, the whole core of Erlang is network transparent, and building clusters and working across a cluster is super-easy, like impossibly easy. So the dashboard running on one server can simply swap over and attach to the telemetry supervision tree on the other server and show that instead.
(The invisible number is the creation number: if a process dies and a new process later gets its PID, it counts as a different PID. Creation numbers are a bit obscure, and you needn't normally worry about them, but they're the reason :erlang.list_to_pid/1 carries a warning not to use it in production.)
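You can poke at the local half of this in any iex session, no cluster required; on a node viewing its own processes, the first component of every local PID is 0:

```elixir
# A local PID always renders with 0 as its first component;
# remote PIDs show the peer node's number instead (e.g. <29582.50.0>).
pid = self()
IO.inspect(pid)  # e.g. #PID<0.95.0>
true = String.starts_with?(inspect(pid), "#PID<0.")
# On a non-distributed node, the node name is the placeholder :nonode@nohost
IO.inspect(Node.self())
```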
Free distributed telemetry
But there's more. Let's start a new node with the command:
`iex --sname nophoenix --cookie jerko`
Now do the same ping trick (:net_adm.ping(:'nophoenix@gordon')) on another node. Joining this new node to one of the nodes in the cluster connects it to both; you can check by running :erlang.nodes() in both shells. This third node, which is not running Phoenix, appears in our dashboard's node list, and we can attach the dashboard to it:
Obviously, the dashboard has fewer elements. The node under observation isn't running Phoenix, so the Phoenix-specific bits aren't there. The os_mon application isn't started either, so that tab is empty, and the telemetry metrics are missing too. But you can see how distributed telemetry really can work in Elixir, with or without Phoenix.
Show this to your ops team
Oftentimes a software shop will switch to Elixir and Phoenix for concurrency and performance reasons. The developers will often be excited, but the ops teams less so. This simple cluster trick is a good way to persuade your ops team that working with the BEAM and OTP really is different from working with the JVM or PHP or Ruby or anything else.
It's best practice to take the live dashboard out of your non-development environments, but network transparency lets you build a separate jump server inside your secure DMZ with monitored access. You can connect that to a production machine round the back, securely, and use your dashboard for diagnostics.
Show this to your security team
Once upon a time, when I was a BEAM guru at a big company, I tried and tried to persuade them to manage their cookies properly and have cluster separation across the piece. “Pfft” they said.
We later accidentally clustered the testing and production clusters while deploying a new system that was literally catching money falling from the sky. Then other words were said - words I blush at the memory of.
After that, cookies and clustering were taken seriously. For a bit of fun, show your security team this command and watch them sweat: :rpc.call(SomeNode, System, :cmd, ["rm", ["-rf", "/*"]]). (Hint: is letting somebody on another machine in the cluster wipe your file system a good thing or a bad thing?)
When clustered Elixir is good, it's very, very good - and when it's bad, it's horrid!
Use this trick for developing instrumented libraries
Imagine you are writing a library and you want it to emit telemetry events to be a good citizen. You can add the telemetry library to your application, then in development, you can just fire up an empty Phoenix app and attach it to the node you are running your library in. You'll get a free dashboard!
Take telemetry to the next level in your Elixir and Phoenix apps
Phoenix telemetry out of the box gives you a great start on managing and inspecting your application. Phoenix's telemetry events take you a long way, but your app is unique to you and your company, so you should add custom events relevant to your business model.
The Phoenix dashboard also brings in native BEAM telemetry by reporting the underlying metrics. However, the value of telemetry lies not in any particular measurement but in acting on the measurements: mapping an emitted event to a metric with a threshold, where crossing that threshold alerts a person to take action.
Telemetry without alerts and user actions is just vanity programming. You need to wire your instrumentation up to your operational processes, and Honeybadger can help with that. Honeybadger's full-stack Phoenix monitoring platform allows you to graph, query, and alert on your Phoenix telemetry data.
In addition to Honeybadger's Elixir error tracking, logging, and uptime monitoring tools, it's the best way to gain real-time insights into your application's health and performance.
Throttling and Blocking Bad Requests in Phoenix Web Applications with PlugAttack
Paraxial.io, 2022-02-02
A credential stuffing attack
Web applications that accept username and password pairs for authentication may experience credential stuffing by malicious clients. We use the term "credential stuffing" to refer to the act of using credentials, taken from one website's public data breach, to perform many authentication attempts against victim accounts on a different website. This tutorial will demonstrate how to mitigate credential stuffing against a Phoenix Framework application using PlugAttack.
1. Set up the victim application, Orru
The victim web application for this tutorial is named orru, created with Elixir 1.12 and Phoenix 1.6. You must have both Elixir and the Phoenix Framework installed locally to run orru; see the appropriate guide for your operating system. PostgreSQL is also required; see the official wiki for installation instructions.
To check that your local environment is set up correctly, open the "Up and Running" page from the official Phoenix documentation and follow the instructions to run the hello application locally. Running mix phx.server from the hello directory, then navigating to http://localhost:4000 in your web browser, should produce the "Welcome to Phoenix!" page. This confirms your local environment is configured for this tutorial.
Open a new terminal, change to the directory where you want orru to be located, then use git to get a copy of the orru source code:
`git clone https://github.com/paraxialio/orru.git`
Run the following commands to set up orru:
`cd orru
mix deps.get
mix ecto.setup`
Start the server:
`mix phx.server`
Open http://localhost:4000 in your web browser, and you should see a page similar to the hello application you set up earlier. The difference is the "Register" and "Log in" text in the upper right-hand corner, indicating that orru has login functionality. The authentication code for orru was created with the mix phx.gen.auth mix task.
Create a user to ensure the authentication system is working correctly, with the credentials:
`Email - crow@example.com
Password - crowSecret2022`
After completing registration, you should be taken to http://localhost:4000/ and see the message "User created successfully." In the upper right-hand corner of the page, "crow@example.com" should be visible, indicating you are now logged in. You have now completed the setup of orru, the victim web application for this tutorial.
2. Introducing the credential stuffing script, Envy
The credential stuffing program you'll use in this tutorial, envy, takes a list of username/password pairs as input and attempts to log in with each pair. Sending too many requests in a short period of time may overwhelm the victim server, so the program takes a rate limit argument, which determines how many login requests to send in a period of one minute.
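The tutorial doesn't show envy's internals, but a per-minute rate limit like this is typically implemented by pacing requests: with a budget of N requests per minute, you sleep roughly 60_000 / N milliseconds between requests. A hypothetical sketch (not envy's actual code):

```elixir
# Hypothetical pacing helper for a limit of `limit` requests per minute.
defmodule Pacing do
  def delay_ms(limit) when limit > 0, do: div(60_000, limit)
end

Pacing.delay_ms(60)   # => 1000 (one request per second)
Pacing.delay_ms(100)  # => 600
```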
Open a new terminal, change to the directory where envy will be located, then use git to download envy:
`git clone https://github.com/paraxialio/envy.git`
Run the following commands to set up envy:
`cd envy
mix deps.get`
Run the do_ato.exs script with no arguments:
`envy % mix run do_ato.exs
Account Takeover Tool v1.0
Example usage, default rate limit (60 requests per minute):
mix run do_ato.exs credentials_10.txt
Example usage, custom rate limit (100 requests per minute):
mix run do_ato.exs credentials_10.txt 100`
The do_ato.exs script takes two arguments. The first is a file containing username/password pairs for use in credential stuffing. The second is an integer representing the maximum number of HTTP requests per minute that envy will send.
The command:
`mix run do_ato.exs credentials.txt 60`
means, “Use the username/password pairs in credentials.txt to send login requests at a maximum rate of 60 per minute”.
envy sends login requests to http://localhost:4000 and performs the necessary steps to send a POST request with a CSRF token, so the request will succeed. Now that you have envy and orru installed and running locally, let's continue.
3. How to run Envy
Now you will be able to see an account takeover attack from both sides, as the person performing it and as the web application owner.
Ensure orru is running in one terminal:
`orru % mix phx.server`
Now switch to a different terminal, change to the envy directory, and run:
`mix run do_ato.exs credentials_10.txt`
If everything is set up correctly, the output will be:
`envy % mix run do_ato.exs credentials_10.txt
2022-01-31 21:11:57 crow0@example.com/crowSecret99 Login POST status 200
2022-01-31 21:11:58 crow1@example.com/crowSecret98 Login POST status 200
2022-01-31 21:11:59 crow2@example.com/crowSecret97 Login POST status 200
2022-01-31 21:12:00 crow@example.com/crowSecret2022 Login POST status 302
2022-01-31 21:12:01 crow3@example.com/crowSecret96 Login POST status 200
2022-01-31 21:12:02 crow4@example.com/crowSecret95 Login POST status 200
2022-01-31 21:12:03 crow5@example.com/crowSecret94 Login POST status 200
2022-01-31 21:12:04 crow6@example.com/crowSecret93 Login POST status 200
2022-01-31 21:12:05 crow7@example.com/crowSecret92 Login POST status 200
2022-01-31 21:12:06 crow8@example.com/crowSecret91 Login POST status 200`
Only one pair of credentials resulted in a successful login. The server responds with a 200 status code on a failed login because the HTTP POST request itself is successful, but the HTML response from the server indicates the login failed. When the login request is successful, the server replies with a 302.
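envy distinguishes hits from misses purely by that status code. A trivial sketch of the check (a hypothetical helper, not envy's actual code):

```elixir
# A 302 redirect after POSTing credentials means the login succeeded;
# a 200 means the server re-rendered the login form with an error.
defmodule LoginCheck do
  def success?(status), do: status == 302
end

LoginCheck.success?(200)  # => false
LoginCheck.success?(302)  # => true
```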
The credentials_10.txt file only contains 10 pairs. Let’s increase the maximum number of requests per minute to 500, and use the credentials_100.txt file.
`envy % mix run do_ato.exs credentials_100.txt 500
2022-01-31 21:12:33 crow0@example.com/crowSecret0 Login POST status 200
2022-01-31 21:12:33 crow1@example.com/crowSecret1 Login POST status 200
...
2022-01-31 21:13:26 crow98@example.com/crowSecret98 Login POST status 200
2022-01-31 21:13:26 crow@example.com/crowSecret2022 Login POST status 302`
One client attempting hundreds of logins per minute is a clear sign of malicious intent. We will now add the PlugAttack project to orru and examine different strategies for dealing with bad clients.
4. Adding PlugAttack to Orru
PlugAttack is a "toolkit for blocking and throttling abusive requests". To use it effectively, you should be familiar with the Plug project, the conn struct, and how Phoenix projects use plugs. Open the GitHub repo README and the official documentation, then read through the examples. This should give you an idea of how the project is intended to be used. Now let's add it to the orru project.
To add PlugAttack as a dependency, open orru/mix.exs and add one line:
```elixir
defp deps do
  [
    {:bcrypt_elixir, "~> 2.0"},
    {:phoenix, "~> 1.6.0"},
    ...
    {:plug_cowboy, "~> 2.5"},
    {:plug_attack, "~> 0.4.3"}  # <- Add this line
  ]
end
```
Now add storage to the supervision tree with a child specification. Open orru/lib/orru/application.ex and add:
```elixir
def start(_type, _args) do
  children = [
    # Add this line
    {PlugAttack.Storage.Ets, name: Orru.PlugAttack.Storage, clean_period: 60_000},
    # Start the Ecto repository
    Orru.Repo,
```
Run mix deps.get, then run orru to check everything is working correctly.
5. Throttle login requests per IP address
Now define the PlugAttack plug, with a rule that limits the number of login requests one IP address can make to 10 per minute.
Create a new file, orru/lib/orru/plug_attack.ex, and enter the following:
```elixir
defmodule Orru.PlugAttack do
  use PlugAttack

  rule "throttle login requests", conn do
    if conn.method == "POST" and conn.path_info == ["users", "log_in"] do
      throttle(conn.remote_ip,
        period: 60_000,
        limit: 10,
        storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.Storage}
      )
    end
  end
end
```
The throttle rule is defined, but incoming requests need a way to reach it. Open orru/lib/orru_web/router.ex, define a new pipeline called :plug_attack, and add Orru.PlugAttack to it. Finally, add :plug_attack to the authentication routes in the router.
```elixir
defmodule OrruWeb.Router do
  use OrruWeb, :router

  import OrruWeb.UserAuth

  pipeline :browser do
    ...
  end

  # Add this pipeline to the router
  pipeline :plug_attack do
    plug Orru.PlugAttack
  end

  pipeline :api do
    plug :accepts, ["json"]
  end

  ...

  ## Authentication routes

  scope "/", OrruWeb do
    # Edit this line to include plug_attack
    pipe_through [:browser, :redirect_if_user_is_authenticated, :plug_attack]

    get "/users/register", UserRegistrationController, :new
    ...
```
Use the envy program to test whether this rule works. Run orru with mix phx.server, then open envy in a different terminal and run these commands:
`envy % mix run do_ato.exs credentials_10.txt 500
2022-01-31 21:23:05 crow0@example.com/crowSecret99 Login POST status 200
2022-01-31 21:23:05 crow1@example.com/crowSecret98 Login POST status 200
2022-01-31 21:23:05 crow2@example.com/crowSecret97 Login POST status 200
2022-01-31 21:23:05 crow@example.com/crowSecret2022 Login POST status 302
2022-01-31 21:23:05 crow3@example.com/crowSecret96 Login POST status 200
2022-01-31 21:23:05 crow4@example.com/crowSecret95 Login POST status 200
2022-01-31 21:23:05 crow5@example.com/crowSecret94 Login POST status 200
2022-01-31 21:23:05 crow6@example.com/crowSecret93 Login POST status 200
2022-01-31 21:23:06 crow7@example.com/crowSecret92 Login POST status 200
2022-01-31 21:23:06 crow8@example.com/crowSecret91 Login POST status 200
envy % mix run do_ato.exs credentials_10.txt 500
2022-01-31 21:23:09 crow0@example.com/crowSecret99 Login POST status 403
2022-01-31 21:23:09 crow1@example.com/crowSecret98 Login POST status 403
2022-01-31 21:23:09 crow2@example.com/crowSecret97 Login POST status 403
2022-01-31 21:23:09 crow@example.com/crowSecret2022 Login POST status 403
2022-01-31 21:23:09 crow3@example.com/crowSecret96 Login POST status 403
2022-01-31 21:23:09 crow4@example.com/crowSecret95 Login POST status 403
2022-01-31 21:23:09 crow5@example.com/crowSecret94 Login POST status 403
2022-01-31 21:23:09 crow6@example.com/crowSecret93 Login POST status 403
2022-01-31 21:23:10 crow7@example.com/crowSecret92 Login POST status 403
2022-01-31 21:23:10 crow8@example.com/crowSecret91 Login POST status 403`
You have successfully limited the number of requests one IP address can send to 10 per minute.
6. How PlugAttack uses ETS (Erlang Term Storage)
Earlier you configured storage for PlugAttack with the name Orru.PlugAttack.Storage. When an incoming request matches a rule you defined, a record of it is placed in Erlang Term Storage, or ETS.
Run orru with iex -S mix phx.server. Then run:
```elixir
iex(2)> :ets.tab2list(Orru.PlugAttack.Storage)
[]
```
The output should be empty, because you have not sent any login requests. Now make one failed login request in orru through your web browser, then run tab2list again:
```elixir
iex(5)> :ets.tab2list(Orru.PlugAttack.Storage)
[{{:throttle, {127, 0, 0, 1}, 27391530}, 1, 1643491860000}]
```
PlugAttack stores details about requests that match the rules you define. Now send 100 login requests with envy:
`envy % mix run do_ato.exs credentials_100.txt 500
...`
Then check ETS:
```elixir
iex(8)> :ets.tab2list(Orru.PlugAttack.Storage)
[{{:throttle, {127, 0, 0, 1}, 27391533}, 101, 1643492040000}]
```
This is how PlugAttack keeps track of requests that match the rule you defined. Your output may be different if the 60 second period ended while the requests were running.
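The large integer in the throttle key is a time bucket: the current time in milliseconds divided by the period, so all requests within the same minute share one counter. This is my reading of the key layout from the ETS dumps above, and the arithmetic checks out against them:

```elixir
# Entry shape: {{:throttle, ip, bucket}, count, expires_at}
# where bucket = div(now_ms, period_ms) and expires_at = (bucket + 1) * period_ms
period_ms = 60_000
bucket = 27_391_530                   # from the tab2list output above
IO.inspect(bucket * period_ms)        # => 1643491800000, start of that minute
IO.inspect((bucket + 1) * period_ms)  # => 1643491860000, matches expires_at above
```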
7. Multiple rules and blocking IP addresses
Now that you have completed a basic example of throttling login requests, consider a more complex requirement.
1. Limit login requests from one IP address to 10 per minute.
2. If one IP makes 50 login requests in a minute, ban it for seven days (one week).
Your first idea may be to add the following to plug_attack.ex, after the "throttle login requests" rule:
```elixir
# This is wrong
rule "block login requests if over 50 in 60 seconds", conn do
  if conn.method == "POST" and conn.path_info == ["users", "log_in"] do
    fail2ban(conn.remote_ip,
      period: 60_000,
      limit: 50,
      ban_for: 60_000 * 60 * 24 * 7,
      storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.Storage}
    )
  end
end
```
To demonstrate why this rule is wrong, you may add it to plug_attack.ex, then run orru and send 100 login requests with:
`envy % mix run do_ato.exs credentials_100.txt 500`
Check ETS:
```elixir
iex(2)> :ets.tab2list(Orru.PlugAttack.Storage)
[
  {{:throttle, {127, 0, 0, 1}, 27391548}, 27, 1643492940000},
  {{:throttle, {127, 0, 0, 1}, 27391547}, 73, 1643492880000}
]
```
The counts of 27 and 73 appear because the script was running as the clock rolled over into a new minute. The output also demonstrates that the fail2ban rule was never matched, only the throttle rule: within one PlugAttack module, the first rule that matches a request wins, and the throttle rule matches every login request. If you move the fail2ban rule above the throttle rule, only fail2ban will match, meaning an attacker can send 49 requests per minute without being throttled.
A new plug for the fail2ban rule will fix this problem. Create orru/lib/orru/plug_attack_fail2ban.ex and enter:
```elixir
defmodule Orru.PlugAttackFail2Ban do
  use PlugAttack

  @week 60_000 * 60 * 24 * 7

  rule "fail2ban on login", conn do
    if conn.method == "POST" and conn.path_info == ["users", "log_in"] do
      fail2ban(conn.remote_ip,
        period: 60_000,
        limit: 50,
        ban_for: @week,
        storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.Storage}
      )
    end
  end
end
```
Now we need to add the PlugAttackFail2Ban plug to our router. The order here is important. Remember we want to have two rules in place - only allow 10 login requests per minute and if one IP does 50 requests in a minute, ban for a week. Let’s consider the wrong order first:
```elixir
pipeline :plug_attack do
  # This order is wrong
  plug Orru.PlugAttack
  plug Orru.PlugAttackFail2Ban
end
```
The problem is that after 10 requests in a 60 second period, the conn of all future requests will be marked as throttled, and not count toward the fail2ban rule. The client will be throttled, but never banned for a week.
With orru running, and the plug order wrong, do:
`envy % mix run do_ato.exs credentials_100.txt 500`
Then check ETS:
```elixir
iex(5)> :ets.tab2list(Orru.PlugAttack.Storage)
[
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107200}, 0, 1643495167200},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107325}, 0, 1643495167325},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495106755}, 0, 1643495166755},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495106838}, 0, 1643495166838},
  {{:throttle, {127, 0, 0, 1}, 27391585}, 100, 1643495160000},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495106735}, 0, 1643495166735},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107081}, 0, 1643495167081},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107696}, 0, 1643495167696},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107446}, 0, 1643495167446},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495107569}, 0, 1643495167569},
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495106967}, 0, 1643495166967}
]
```
10 requests matched :fail2ban, then 100 matched :throttle.
Change orru/lib/orru_web/router.ex so the plugs are in the correct order:
```elixir
# Correct order
pipeline :plug_attack do
  plug Orru.PlugAttackFail2Ban
  plug Orru.PlugAttack
end
```
Then send 100 login requests:
`envy % mix run do_ato.exs credentials_100.txt 500`
Check ETS:
```elixir
iex(4)> :ets.tab2list(Orru.PlugAttack.Storage)
[
  {{{:fail2ban, {127, 0, 0, 1}}, 1643495266511}, 0, 1643495326511},
  ...
  {{:fail2ban_banned, {127, 0, 0, 1}}, true, 1644100066630},
  {{:throttle, {127, 0, 0, 1}, 27391587}, 50, 1643495280000},
  ...
]
```
This fulfills our original requirements:
1. Limit login requests from one IP address to 10 per minute
2. If one IP does 50 login requests in a minute, ban it for seven days (one week)
Careful readers of PlugAttack’s documentation may have noticed we used the same storage for both our rules. The documentation says, “Be careful not to use the same key for different rules that use the same storage”.
To demonstrate how reuse of storage can cause problems, let’s add another throttle rule in orru/lib/orru/plug_attack.ex:
```elixir
defmodule Orru.PlugAttack do
  use PlugAttack

  rule "throttle login requests", conn do
    if conn.method == "POST" and conn.path_info == ["users", "log_in"] do
      throttle conn.remote_ip,
        period: 60_000,
        limit: 10,
        storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.Storage}
    end
  end

  # Should not be using this storage
  rule "throttle register page GETs", conn do
    if conn.method == "GET" and conn.path_info == ["users", "register"] do
      throttle conn.remote_ip,
        period: 60_000,
        limit: 20,
        storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.Storage}
    end
  end
end
```
In your web browser, go to http://localhost:4000/users/register, then refresh the page 9 times. Then attempt to log in. Your attempt will fail, because any throttle rule for conn.remote_ip will increment the same :throttle tuple in ETS.
```elixir
iex(8)> :ets.tab2list(Orru.PlugAttack.Storage)
[{{:throttle, {127, 0, 0, 1}, 27391617}, 11, 1643497080000}]
```
When PlugAttack checks Orru.PlugAttack.Storage, the :throttle counter has been incremented to 11, so any throttle rule keyed on conn.remote_ip sees the same count. Defining an additional storage in application.ex allows you to avoid this problem.
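One way to apply that advice, sketched here rather than taken from the article's code: start a second PlugAttack.Storage.Ets child in application.ex and point the register-page rule's storage: option at it, so the two throttle rules no longer share a counter. The Orru.PlugAttack.RegisterStorage name is an assumption.

```elixir
# In application.ex — a sketch; Orru.PlugAttack.RegisterStorage is an assumed name.
children = [
  # existing storage used by the login throttle and fail2ban rules
  {PlugAttack.Storage.Ets, name: Orru.PlugAttack.Storage, clean_period: 60_000},
  # separate storage for the "throttle register page GETs" rule
  {PlugAttack.Storage.Ets, name: Orru.PlugAttack.RegisterStorage, clean_period: 60_000}
]
```

The register rule would then pass `storage: {PlugAttack.Storage.Ets, Orru.PlugAttack.RegisterStorage}`, keeping its counters isolated from the login rules.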
8. Conclusion and Future Work
This article does not cover deployment of a Phoenix application. One important note for anyone using PlugAttack is that conn.remote_ip will likely not reflect the real client IP. Use the remote_ip library to fix this.
ETS is reset when you deploy your application, so the seven day ban example from this article may not work if you do frequent deploys. Writing login history to Postgres, then using that data to make throttling decisions may be a better solution. You may also wish to throttle on failed login requests only, instead of on all logins.
PlugAttack supports callbacks that are triggered when a request is allowed or blocked. This is a powerful feature: for example, if you want to record statistics about how many requests are being blocked, block_action/3 could be used to trigger writes to the database.
Thank you to Michał Muskała for creating PlugAttack.
Paraxial.io stops data breaches by helping developers ship secure applications.
This guide collects some of the most common functions of the Elixir programming language and explains them conceptually and graphically in a simple way.
To request new entries, suggest corrections, or provide translations, go to this project’s repository on GitHub.
Atom
Enum
- all?/2
- any?/2
- at/3
- chunk_by/2
- chunk_every/2
- chunk_every/4
- chunk_while/4
- concat/1
- concat/2
- count/1
- count/2
- dedup/1
- dedup_by/2
- drop/2
- drop_every/2
- drop_while/2
- each/2
- empty?/1
- fetch!/2
- fetch/2
- filter/2
- find/3
- find_index/2
- find_value/3
- flat_map/2
- flat_map_reduce/3
- frequencies/1
- frequencies_by/2
- group_by/3
- intersperse/2
- join/2
- map/2
- map_every/3
- map_intersperse/3
- map_join/3
- map_reduce/3
- max/3
- max_by/4
- member?/2
- min/3
- min_by/4
- min_max/2
- min_max_by/3
- random/1
- reduce/2
- reduce/3
- reduce_while/3
- reject/2
- reverse/1
- reverse/2
- reverse_slice/3
- scan/2
- scan/3
- shuffle/1
- slice/2
- slice/3
- sort/1
- sort/2
- sort_by/3
- split/2
- split_while/2
- split_with/2
- sum/1
- take/2
- take_every/2
- take_random/2
- take_while/2
- uniq/1
- uniq_by/2
- unzip/1
- with_index/2
- zip/1
- zip/2
Integer
Kernel
Keyword
- delete/2
- delete_first/2
- drop/2
- fetch!/2
- fetch/2
- get/3
- get_lazy/3
- get_values/2
- has_key?/2
- keys/1
- keyword?/1
- new/0
- new/1
- new/2
- pop!/2
- pop/3
- pop_first/3
- pop_lazy/3
- pop_values/2
- put/3
- put_new/3
- put_new_lazy/3
- replace!/3
- split/2
- take/2
- to_list/1
- update!/3
- update/4
- values/1
List
- delete/2
- delete_at/2
- duplicate/2
- first/1
- flatten/1
- flatten/2
- foldl/3
- foldr/3
- insert_at/3
- keydelete/3
- keyfind/4
- keymember?/3
- keyreplace/4
- keystore/4
- keytake/3
- last/1
- pop_at/3
- replace_at/3
- starts_with?/2
- to_integer/1
- to_integer/2
- to_tuple/1
- update_at/3
- wrap/1
- zip/1
Map
- delete/2
- drop/2
- fetch!/2
- fetch/2
- get/3
- get_and_update!/3
- get_and_update/3
- get_lazy/3
- has_key?/2
- keys/1
- merge/2
- merge/3
- new/0
- new/1
- new/2
- pop!/2
- pop/3
- pop_lazy/3
- put/3
- put_new/3
- put_new_lazy/3
- replace!/3
- split/2
- take/2
- to_list/1
- update!/3
- update/4
- values/1
Range
Stream
- chunk_by/2
- chunk_every/2
- chunk_every/4
- chunk_while/4
- cycle/1
- dedup/1
- dedup_by/2
- drop/2
- drop_every/2
- drop_while/2
- each/2
- filter/2
- flat_map/2
- intersperse/2
- interval/1
- iterate/2
- map/2
- map_every/3
- reject/2
- repeatedly/1
- scan/2
- scan/3
- take/2
- take_every/2
- take_while/2
- timer/1
- transform/3
- transform/4
- unfold/2
- uniq/1
- uniq_by/2
- with_index/2
- zip/1
- zip/2
Tuple
Building a World of Warcraft server in Elixir
Thistle Tea is my new World of Warcraft private server project. You can log in, create a character, run around, and cast spells to kill mobs, with everything synchronized between players as expected for an MMO. The idea of building this had been floating around in my head for a while, since I have an incurable nostalgia for early WoW. I first mentioned this on May 13th and didn’t expect to get any further than login, character creation, and spawning into the map. Here’s a recap of the first month of development.
Prep Work
Before coding, I did some research and came up with a plan.
- Code this in Elixir, using the actor model.
- MaNGOS has done the hard work on collecting all the data, so use it.
- Use Thousand Island as the socket server.
- Treat this project as an excuse to learn more Elixir.
- Reference existing projects and documentation rather than working from scratch.
- Speed along the happy path rather than try and handle every error.
Day 1 - June 2nd
There are two main parts to a World of Warcraft server: the authentication server and the game server. Up first was authentication, since you can’t do anything without logging in.
To learn more about the requests and responses, I built a quick MITM proxy between the client and a MaNGOS server to log all packets. It wasn’t as useful as expected, since not much was consistent, but it did help me internalize how the requests and responses worked.
The first byte of an authentication packet is the opcode, which indicates which message it is, and the rest is a payload with the relevant data. I was able to extract the fields from the payload by pattern matching on the binary data.
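That opcode-first parsing can be sketched with a binary pattern match. The module and function names are illustrative, and the opcode values (0x00 for CMD_AUTH_LOGON_CHALLENGE, 0x01 for CMD_AUTH_LOGON_PROOF) should be checked against the protocol documentation linked below:

```elixir
defmodule AuthPacket do
  # The first byte of an auth packet is the opcode; the rest is the payload.
  # Opcode values here follow the vanilla auth protocol as documented on
  # wowdev.wiki — verify for your client build.
  def parse(<<0x00, payload::binary>>), do: {:cmd_auth_logon_challenge, payload}
  def parse(<<0x01, payload::binary>>), do: {:cmd_auth_logon_proof, payload}
  def parse(<<opcode, payload::binary>>), do: {:unknown, opcode, payload}
end
```

Pattern matching on binaries like this keeps each handler's field extraction declarative instead of manual offset arithmetic.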
The auth flow can be simplified as:
- Client sends a CMD_AUTH_LOGON_CHALLENGE packet
- Server sends back some data for the client to use in crypto calculations
- Client sends a CMD_AUTH_LOGON_PROOF packet with the client proof
- If the client proof matches what’s expected, server sends over the server_proof
- Client is now authenticated
It uses SRP6, which I hadn’t heard of before this. Seems like the idea is to avoid transmitting an unencrypted password and instead both the client and server independently calculate a proof that only matches if they both know the correct password. If the proof matches, then authentication is successful.
So basically, what I needed to do was:
- Listen for data over the socket
- Once data received, parse what message it is out of the header section
- Handle each one differently
- Send back the appropriate packet
This whole part is well documented, but I still ran into some issues with the cryptography. Luckily, I found a blog post and an accompanying Elixir implementation, so I was able to substitute my broken cryptography with working cryptography. Without that, I would’ve been stuck at this part for a very long time (maybe forever). Wasn’t able to get login working on day 1, but I was close.
Links:
- https://wowdev.wiki/Login
- http://srp.stanford.edu/design.html
- https://shadowburn-project.org/2018/10/17/logging-in-with-vanilla.html
- https://gitlab.com/shadowburn/shadowburn
- https://hexdocs.pm/elixir/main/binaries-strings-and-charlists.html
Day 2 - June 3rd
I spent some time cleaning up the code and found a logic error where I reversed some crypto bytes that weren’t supposed to be reversed. Fixing that made auth work, finally getting a success with hardcoded credentials.
Next up was getting the realm list to work, by handling CMD_REALM_LIST and returning which game server to connect to.
This got me out of the tedious auth bits and I could get to building the game server.
Day 3 - June 4th
The goal for today was to get spawned into the world. But first more tedious auth bits.
The game server auth flow can be simplified as:
- When client first connects, server sends SMSG_AUTH_CHALLENGE, with a random seed
- Client sends back CMSG_AUTH_SESSION, with another client proof
- If client proof matches server proof, server sends a successful SMSG_AUTH_RESPONSE
This negotiates how to encrypt/decrypt future packet headers. Luckily, Shadowburn also had crypto code for this, so I was able to use it here. The server proof requires a value previously calculated by the authentication server, so I used an Agent to store that session value. It worked, but I later refactored it to use ETS for a simpler interface.
After that, it’s something like:
- Client sends message to server
- Server decrypts header, which contains message size (2 bytes) and opcode (4 bytes)
- Server handles message and sends back 0 or more messages to client
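The header parse in step two can be sketched as a binary pattern match. The widths follow the description above (2-byte size, 4-byte opcode); the big-endian size / little-endian opcode split matches how client world-packet headers are usually documented for vanilla, but verify against your client build:

```elixir
defmodule GameHeader do
  # A decrypted client header: size (2 bytes, big-endian, assumed) followed by
  # opcode (4 bytes, little-endian, assumed). Anything after is the payload.
  def parse(<<size::big-16, opcode::little-32, rest::binary>>) do
    {size, opcode, rest}
  end
end
```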
First was handling CMSG_CHAR_CREATE and CMSG_CHAR_ENUM, so I could create and list characters. I originally used an Agent for storage here as well, which made things quick to prototype.
Then I got side-tracked for a bit trying to get equipment to show up, since I had all the equipment display ids hardcoded to 0. I looked through the MaNGOS database and hardcoded a few proper display ids before moving on.
After that was handling CMSG_PLAYER_LOGIN. I found an example minimal SMSG_UPDATE_OBJECT spawn packet, which was supposed to spawn me in Northshire Abbey.
That’s probably the most important packet, since it does everything from:
- Spawning things into the world, like players, mobs, objects, etc.
- Updating their values, like health, position, level, etc.
It has a lot of different forms, can batch multiple object updates into a single packet, and has a compressed variant.
Whoops, had the coordinates a bit off.
After fixing that, I was in the human starting area as expected. No player model yet, though.
I thought movement was broken, but it turns out all keybinds were being unset on every login, so the movement keys just weren’t bound. Manually navigating to the keybinding configuration and resetting them to default allowed me to move around.
Next up was adding more to that spawn packet to use the player race and proper starting area. The starting areas were grabbed from a MaNGOS database that I converted over to SQLite and wired up with Ecto.
Last for the night was to get logout working.
The implementation was something like:
- After receiving a CMSG_LOGOUT_REQUEST, use Process.send_after/3 to queue a :logout_complete message that would send SMSG_LOGOUT_COMPLETE to the client in 20 seconds
- Store a reference to that timer in state
- If a CMSG_LOGOUT_CANCEL is received, cancel the timer and remove it from state
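The steps above can be sketched as a minimal GenServer. The message and field names are assumptions, not the project's actual code, and the real server would send SMSG_LOGOUT_COMPLETE where the comment indicates:

```elixir
defmodule LogoutSession do
  use GenServer

  @logout_delay 20_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  def init(_opts), do: {:ok, %{logout_timer: nil}}

  # CMSG_LOGOUT_REQUEST: queue a :logout_complete message for later
  def handle_cast(:logout_request, state) do
    timer = Process.send_after(self(), :logout_complete, @logout_delay)
    {:noreply, %{state | logout_timer: timer}}
  end

  # CMSG_LOGOUT_CANCEL: cancel the pending timer and clear it from state
  def handle_cast(:logout_cancel, state) do
    if state.logout_timer, do: Process.cancel_timer(state.logout_timer)
    {:noreply, %{state | logout_timer: nil}}
  end

  def handle_info(:logout_complete, state) do
    # Here the real server would send SMSG_LOGOUT_COMPLETE to the client.
    {:noreply, %{state | logout_timer: nil}}
  end
end
```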
This was the first piece that really took advantage of Elixir’s message passing.
The white chat box was weird, but it was nice being able to log in.
Links:
- https://wowdev.wiki/Opcodes
- https://wowdev.wiki/SMSG_UPDATE_OBJECT
- https://gtker.com/wow_messages/docs/smsg_update_object.html
- https://gtker.com/wow_messages/ir/implementing_world.html
- https://github.com/vdechef/mysql2sqlite
Day 4 - June 5th
First up was reorganizing the code, since my game.ex GenServer was getting too large.
My strategy for that was:
- Split out related messages into separate files
  - auth.ex, character.ex, ping.ex, etc.
  - wrapped in the __using__ macro
- Include those back into game.ex with use
It worked, but it messed with line numbers in error messages and made things harder to debug.
After that, I wanted to generate that spawn packet properly rather than hardcoding. The largest piece of this was figuring out the update mask for the update fields.
There are a ton of fields for the different types of objects SMSG_UPDATE_OBJECT handles. Before the raw object fields in the payload, there’s a bit mask with bits set at offsets that correspond to the fields being sent. Without that, the client wouldn’t know what to do with the values.
So, I needed to write a function that would generate this bit mask from the fields I pass in. Luckily it’s all well documented, but it still took a while to get to a working implementation.
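A sketch of generating such a mask from a list of field offsets: set the bit for each field present, then emit a block-count byte followed by little-endian 32-bit blocks. This framing is my reading of the update-mask documentation linked below, not the project's actual code, and the offsets are illustrative:

```elixir
defmodule UpdateMask do
  import Bitwise

  # Build an update mask from field offsets: one bit per field, emitted as
  # a block count byte followed by little-endian 32-bit blocks.
  def build(offsets) when offsets != [] do
    block_count = div(Enum.max(offsets), 32) + 1
    mask = Enum.reduce(offsets, 0, fn offset, acc -> acc ||| 1 <<< offset end)

    blocks =
      for i <- 0..(block_count - 1), into: <<>> do
        block = (mask >>> (i * 32)) &&& 0xFFFFFFFF
        <<block::little-32>>
      end

    <<block_count::8, blocks::binary>>
  end
end
```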
Links:
- https://gtker.com/wow_messages/types/update-mask.html
- https://gtker.com/wow_messages/docs/object.html
Day 5 - June 6th
Referencing MaNGOS, I added some more messages that the server sends to the client after a CMSG_PLAYER_LOGIN. One of these, SMSG_ACCOUNT_DATA_TIMES, fixed the white chat box and keybinds being reset.
I also added SMSG_COMPRESSED_UPDATE_OBJECT, which compresses the SMSG_UPDATE_OBJECT packet with :zlib.compress/1.
This was more straightforward than expected, and I made things use the compressed variant if it’s actually smaller.
I’m expecting this to have even more benefits when I get to batching object updates, but right now I’m only updating objects one by one.
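The "use the compressed variant only if it's actually smaller" check can be sketched like this; packet framing is omitted and the module name is illustrative:

```elixir
defmodule UpdateObject do
  # Compress the payload and only use the compressed variant if it saves bytes.
  def maybe_compress(payload) when is_binary(payload) do
    compressed = :zlib.compress(payload)

    if byte_size(compressed) < byte_size(payload) do
      {:smsg_compressed_update_object, compressed}
    else
      {:smsg_update_object, payload}
    end
  end
end
```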
Movement would come up soon, so I started adding the handlers for those packets.
Day 6 - June 7th
In the update packet, I still had the object guid hardcoded. This is because it wants a packed guid and I needed to write some functions to handle that. Rather than the entire guid, a packed guid is a byte mask followed by all non-zero bytes. The byte mask has bits set that correspond to where the following bytes go in the unpacked guid. This is for optimizing packet size, since a guid is always 8 bytes but a packed guid can be as small as 2 bytes.
This took a while, because the client was crashing when I changed the packed guid from <<1, 4>>
to anything else.
After trying different things and wasting a lot of time, I realized that the guid was in two places in the packet and they needed to match.
A quick fix later and things were working as expected.
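A packed-guid encoder along the lines described above might look like the sketch below, assuming little-endian byte order. Conveniently, packing guid 4 yields the <<1, 4>> value mentioned earlier:

```elixir
defmodule PackedGuid do
  import Bitwise

  # Pack an 8-byte guid: a 1-byte mask whose set bits mark which guid bytes
  # are non-zero, followed by only those bytes (low byte first).
  def pack(guid) when is_integer(guid) do
    {mask, bytes} =
      Enum.reduce(0..7, {0, <<>>}, fn i, {mask, bytes} ->
        case (guid >>> (i * 8)) &&& 0xFF do
          0 -> {mask, bytes}
          byte -> {mask ||| 1 <<< i, bytes <> <<byte>>}
        end
      end)

    <<mask>> <> bytes
  end
end
```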
Day 7 - June 8th
It was about time to start implementing the actual MMO features, starting with seeing other players. To test, I hardcoded another update packet after the player’s with a different guid, to try and spawn something other than the player.
Then I used a Registry to keep track of logged in players and their spawn packets.
After entering the world, I would use Registry.dispatch/3 to:
- spawn all logged in players for that player
- spawn that player for all other players
- both using SMSG_UPDATE_OBJECT
After that, I added a similar dispatch when handling movement packets to broadcast movement to all other players. This is where the choice of Elixir really started to shine, and I quickly had players able to see each other move around the screen.
I tested this approach with multiple windows open and it was very cool to see everything synchronized.
I added a handler for CMSG_NAME_QUERY to get names to stop showing up as Unknown and also despawned players with SMSG_DESTROY_OBJECT when logging out.
This is where I started noticing a bug: occasionally I wouldn’t be able to decrypt a packet successfully, which would lead to all future attempts for that connection failing too, since there’s a counter as part of the decryption function. I couldn’t figure out how to resolve it yet, though, or reliably reproduce it.
Links:
- https://gtker.com/wow_messages/docs/movementinfo.html
- https://gtker.com/wow_messages/docs/msg_move_start_forward_client.html
- https://gtker.com/wow_messages/docs/msg_move_start_backward_server.html
- https://gtker.com/wow_messages/docs/cmsg_name_query.html
- https://gtker.com/wow_messages/docs/smsg_name_query_response.html
- https://gtker.com/wow_messages/docs/smsg_destroy_object.html
Day 8 - June 9th
To get chat working, I handled CMSG_MESSAGECHAT and broadcasted SMSG_MESSAGECHAT to players, using Registry.dispatch/3 here too.
I only focused on /say here and it’s all players rather than nearby.
Something to fix later.
Related to that weird decryption bug, I handled the case where the server received more than one packet at once. This might’ve helped a bit, but didn’t completely resolve the issue.
Links:
- https://gtker.com/wow_messages/docs/cmsg_messagechat.html
- https://gtker.com/wow_messages/docs/smsg_messagechat.html
Day 9 - June 10th
I still had authentication with a hardcoded username, password, and salt, so it was about time to fix that. Rather than go with PostgreSQL or SQLite for the database, I decided to go with Mnesia, since one of my goals was to learn more about Elixir and its ecosystem. I briefly tried plain :mnesia, but decided to use Memento for a cleaner interface.
So, I added models for Account and Character and refactored everything to use them. The character object is kept in process state and only persisted to the database on logout or disconnect. Saving on a CMSG_PING or just periodically could be a good idea too, eventually. Right now data isn’t persisted to disk, since I’m still iterating on the data model, but that should be straightforward to toggle later.
Links:
- https://www.erlang.org/doc/apps/mnesia/mnesia.html
- https://elixirschool.com/en/lessons/storage/mnesia
- https://github.com/sheharyarn/memento
Day 10 - June 11th
Today was standardizing the logging, handling a bit more of chat, and handling an unencrypted CMSG_PING. I was thinking that could be part of the intermittent decryption issues too, but looking back I don’t think I’ve ever had my client send that unencrypted anyways.
Day 11 - June 12th
I wanted equipment working so players weren’t naked all the time, so I started on that. Using the MaNGOS item_template table, I wired things up to set random equipment on character creation. Then I added that to the response to CMSG_CHAR_ENUM so they would show up in the login screen.
Up next was getting it showing in game.
Day 12 - June 13th
It took a bit to figure out the proper offsets for each piece of equipment in the update mask, but I eventually got it working.
Since equipment is part of the update object packet, it just worked for other players, which was nice.
Day 13 - June 14th
I had player movement synchronizing between players properly so I wanted to get sitting working too.
Whoops. Weird things happen when field offsets or sizes are incorrect when building that update mask.
After that, I wanted to play around a bit by randomizing equipment on every jump. Here I learned that you need to send all fields in the update object packet, like health, or they get reset. I was trying to just send the equipment changes but I’d die on every jump.
After making sure to send all fields, it was working as expected.
Day 14 - June 15th
Took a break.
Day 15 - June 16th
Today was refactoring and improvements.
I reworked things into proper modules, since it was getting hard to debug when all the line numbers were wrong.
Now game.ex called the appropriate module’s handle_packet/3 function, rather than combining everything with use.
I also reworked things so players were spawned with their current position instead of the initial position saved in the registry. This included some changes to make building an update packet more straightforward.
Day 16 - June 17th
Today was just playing around and no code changes.
Not sure why the model is messed up here, but it seems like it’s something with my computer rather than anything server related.
Day 17 - June 18th
The world was feeling a bit empty, so I wanted to spawn in mobs. First was hardcoding an update packet that should spawn a mob and having it trigger on /say.
After that, I used the creature table of the MaNGOS database to get proper mobs spawning. I used a GenServer for this so every mob would be a process and keep track of their own state. Communication between mobs and players would happen through message passing. First I hardcoded a few select ids in the starting area to load, and after that worked I loaded them all.
Rather than spawn all ~57k mobs for the player, I wired things up to only spawn mobs within a certain range. This looked like:
- Store mob pids in a Registry, along with their x, y, z position
- Create a within_range/2 function that takes in two x, y, z tuples
- On player login, dispatch on that MobRegistry, using within_range/2 to only build spawn packets for mobs within range
- On player movement, do the same
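The range check itself can be sketched as a squared-distance comparison between two {x, y, z} tuples. The 250-unit cutoff and the module/function names are assumptions, not the project's actual values:

```elixir
defmodule Range3D do
  # Assumed visibility cutoff; the real server may use a different value.
  @max_distance 250.0

  # Compare squared Euclidean distance to avoid a sqrt on every check.
  def within_range?({x1, y1, z1}, {x2, y2, z2}) do
    dx = x1 - x2
    dy = y1 - y2
    dz = z1 - z2
    dx * dx + dy * dy + dz * dz <= @max_distance * @max_distance
  end
end
```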
It worked really well and I could run around and see the mobs.
Next up was optimization and despawning mobs that were now out of range.
Day 18 - June 19th
For optimization, I didn’t want to send duplicate spawn packets for mobs that were already spawned. I also wanted to despawn mobs that were out of range. To do this, I used ETS to track which guids were spawned for a player.
In the dispatch, the logic was:
- if in_range and not spawned, spawn
- if not in_range and spawned, despawn
- otherwise, ignore
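The decision logic above might be sketched with an ETS set tracking which guids are spawned for a player; the table layout and names are illustrative:

```elixir
defmodule SpawnTracker do
  # Decide whether to spawn, despawn, or ignore a mob for a player, using an
  # ETS set of {player, guid} keys to remember what is already spawned.
  def decide(table, player, guid, in_range?) do
    spawned? = :ets.member(table, {player, guid})

    cond do
      in_range? and not spawned? ->
        :ets.insert(table, {{player, guid}})
        :spawn

      not in_range? and spawned? ->
        :ets.delete(table, {player, guid})
        :despawn

      true ->
        :ignore
    end
  end
end
```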
Despawning was done through the same SMSG_DESTROY_OBJECT packet used for despawning a player after logging out.
After getting that working, I ran around the world and explored for a bit.
I noticed something wrong when exploring Westfall. Bugs were spawning in the air and then falling down to the ground. Turns out I wasn’t separating mobs by map, so Westfall had mobs from Silithus mixed in. To fix, I reworked both the mob and player registries to use the map as the key.
Having mobs standing in place was a bit boring and I wanted them to move around. Turns out this is pretty complicated and I’ll actually have to use the map files to generate paths that don’t float or clip through the ground. There are a few projects for this, all a bit difficult to include in an Elixir project. I’m thinking RPC could work, but not sure if it’ll be performant enough yet.
The standard update object packet can be used for mob movement here, since it has a movement block, but there might be some more specialized packets to look into later too.
Without using the map data, I couldn’t get the server movement to line up with what happened in the client. So, I settled with getting mobs to spin at random speeds.
That was a bit silly and used a lot of CPU, so I tweaked it to just randomly change orientation instead.
Links:
- https://gtker.com/wow_messages/docs/movementblock.html
- https://gtker.com/wow_messages/docs/compressedmove.html
- https://github.com/namreeb/namigator
Day 19 - June 20th
Here I got mob names working by implementing CMSG_CREATURE_QUERY. This crashed the client when querying mobs that didn’t have a model, so I removed them from being loaded. I also started loading in mob movement data and optimized the query a bit to speed up startup.
I finally got some people to help me test the networking later that day. It didn’t start very well.
Turns out I hadn’t tested this locally since adding mobs and the player/mob spawn/despawns were conflicting with each other due to guid collisions. Players were being constantly spawned in and out.
I did some emergency patching to make it so players are never despawned, even out of range. I also turned off /say spawning boars since that was getting annoying. That worked for now.
There were still some major issues. My helper had 450 ms latency and would crash when running to areas with a lot of mobs. I couldn’t reproduce, though, with my 60 ms latency.
Links:
- https://gtker.com/wow_messages/docs/cmsg_creature_query.html
- https://gtker.com/wow_messages/docs/smsg_creature_query_response.html
Day 20 - June 21
To reproduce the issue from the previous night, I connected to my local server from my laptop on the same network.
On my laptop, I used tc to simulate a ton of latency and wired things up so equipment would change on any movement instead of just jumps.
This sent a lot of packets when spinning and I was finally able to reproduce.
Turns out the crashing issues were from not receiving a complete packet, but still trying to decrypt and handle it. I was handling if the server got more than one packet, but not if the server got a partial packet.
Referencing Shadowburn’s implementation, the fix for this was to let the packet data accumulate until there’s enough to handle. This finally resolved the weird decryption issue I ran into on day 7.
For the guid collision issue, I added a large offset to creature guids so they won’t conflict with player guids.
Day 21 - June 22
Took a break.
Day 22 - June 23
Worked on CMSG_ITEM_NAME_QUERY a bit, but there’s still something wrong here. It could be that it’s trying to calculate damage using some values I’m not passing to the client yet.
Decided spells would be next, so I started on that. First was sending spells over with SMSG_INITIAL_SPELLS on login, using the initial spells in MaNGOS, so I’d have something in the spellbook. Everything was instant cast though, for some reason.
Turns out I needed to set unit_mod_cast_speed in the player update packet for cast times to show up properly in the client.
I started by handling CMSG_CAST_SPELL, which would send a successful SMSG_CAST_RESULT after the spell cast time, so other spells could be cast. I also handled CMSG_CANCEL_CAST, to cancel that timer. This implementation looked a bit like the logout logic.
The starting animation for casting a spell would play, but no cast bar or anything further.
Links:
- https://gtker.com/wow_messages/docs/smsg_initial_spells.html
- https://gtker.com/wow_messages/docs/cmsg_cast_spell.html
- https://gtker.com/wow_messages/docs/cmsg_cancel_cast.html
- https://gtker.com/wow_messages/docs/smsg_cast_result.html
Days 23 to 26 - June 24 to 27
Took a longer break.
Day 27 - June 28
I was able to get a cast bar showing up by sending SMSG_SPELL_START after receiving the cast spell packet.
The projectile effect took a bit longer to figure out. I needed to send a SMSG_SPELL_GO after the cast was complete, with the proper target guids.
Links:
- https://gtker.com/wow_messages/docs/smsg_spell_start.html
- https://gtker.com/wow_messages/docs/smsg_spell_go.html
Day 28 - June 29
I got self-cast spells working by setting the target guid to the player’s guid.
Day 29 - June 30
Another break.
Day 30 - July 1
Since I had spells somewhat working, next I had to clean up the implementation. I dispatched the SMSG_SPELL_START and SMSG_SPELL_GO packets to nearby players and fixed spell cancelling.
Day 31 - July 2
I added levels to mobs, random from their minimum to maximum level, previously hardcoded to 1. Then I made spells do some hardcoded damage, so mobs could die. Mobs would still randomly change orientation when dead, so I added a check to only move if alive.
That seemed like a good stopping point and was one month since I started writing code for the project.
Future Plans
I’ll slowly work on this, adding more functionality as I go. My goal isn’t a 1:1 Vanilla server, but more something that fits well with Elixir’s capabilities, so I don’t plan on accepting limitations for the sake of accuracy or similar. I’d like to see how many players this approach can handle and how it compares in performance to other implementations eventually too.
Some things on the list:
- proper mob + player stats
- proper damage calculations
- pvp
- quests
- mob movement + combat ai
- loot + inventory management
- more spells + effects
- tons of refactoring
- benchmarking
- gameplay loop, in general
So still plenty more work to do. :)
Thanks to all the projects I’ve referenced for this, most of which I’ve tried to link here.
I wouldn’t have gotten very far without them and their awesome documentation.
Comments
VS Code or Cursor Broken for Elixir and possible fixes



Additionally, be aware that when you change your version of Elixir or Erlang, the language server will need to rebuild the full PLT, so you will encounter this process again.
```
/Users/dood/.vscode/extensions/jakebecker.elixir-ls-0.2.25/elixir-ls-release/language_server.sh: line 18: elixir: command not found
[Info - 1:13:04 PM] Connection to server got closed. Server will restart.
/Users/dood/.vscode/extensions/jakebecker.elixir-ls-0.2.25/elixir-ls-release/language_server.sh: line 18: elixir: command not found
[Info - 1:13:04 PM] Connection to server got closed. Server will restart.
/Users/dood/.vscode/extensions/jakebecker.elixir-ls-0.2.25/elixir-ls-release/language_server.sh: line 18: elixir: command not found
[Info - 1:13:04 PM] Connection to server got closed. Server will restart.
/Users/dood/.vscode/extensions/jakebecker.elixir-ls-0.2.25/elixir-ls-release/language_server.sh: line 18: elixir: command not found
[Info - 1:13:04 PM] Connection to server got closed. Server will restart.
/Users/dood/.vscode/extensions/jakebecker.elixir-ls-0.2.25/elixir-ls-release/language_server.sh: line 18: elixir: command not found
[Error - 1:13:04 PM] Connection to server got closed. Server will not be restarted.
```
asdf reshim elixir <your-version>
Erlang not installed
```
asdf: No version set for command erl
you might want to add one of the following in your .tool-versions file:
erlang 21.0.7
erlang 21.1.3
erlang 22.0.7
erlang 19.3.6.9
erlang 22.1.7
[Info - 6:08:00 AM] Connection to server got closed. Server will restart.
```
Version Mismatch
Started ElixirLS
Elixir version: "1.7.4 (compiled with Erlang/OTP 19)"
Erlang version: "22"
Compiling with Mix env test
[Info - 7:41:33 AM] Compile took 224 milliseconds ...
Elixir umbrella projects
How did you launch VS Code?
code .
This opens the current directory in VS Code with your terminal's environment, which helps isolate environment issues. If ElixirLS works when VS Code is launched from the terminal, consider always launching it that way, or look into fixing your environment setup.
ElixirLS stopped working
What happened?
mix do clean, compile
- https://github.com/srcrip/live_toast A beautiful drop-in replacement for the Phoenix Flash system.
- https://github.com/sezaru/flashy Flashy is a small library that extends LiveView's flash support to function and live components.
UI
- https://surface-ui.org/ (git) - A server-side rendering component library for Phoenix Framework
mix cmd --app child_app_name mix test --color
Specific file/line
mix cmd --app child_app_name mix test test/child_app_name_nice_test.exs:69 --color
We can also define a Mix task to make our life easier!
def aliases do
  [
    child_test: "cmd --app child_app_name mix test --color"
  ]
end
Then you call
mix child_test
mix child_test test/child_app_name_nice_test.exs:69
- Passwords obtained from previous breaches
- Context-specific words, such as the name of the service, the username, and derivatives thereof
- Repetitive or sequential characters
- Dictionary words
Passwords obtained from previous breaches
def deps do
  [
    ...
    {:ex_pwned, "~> 0.1.0"}
  ]
end
defmodule MyApp.Users.User do
  use Ecto.Schema
  use Pow.Ecto.Schema

  # ...

  def changeset(user_or_changeset, attrs) do
    user_or_changeset
    |> pow_changeset(attrs)
    |> validate_password_breach()
  end

  defp validate_password_breach(changeset) do
    Ecto.Changeset.validate_change(changeset, :password, fn :password, password ->
      case password_breached?(password) do
        true -> [password: "has appeared in a previous breach"]
        false -> []
      end
    end)
  end

  defp password_breached?(password) do
    case Mix.env() do
      :test -> false
      _any -> ExPwned.password_breached?(password)
    end
  end
end
Context-specific words, such as the name of the service, the username, and derivatives thereof
defmodule MyApp.Users.User do
  use Ecto.Schema
  use Pow.Ecto.Schema

  # ...

  def changeset(user_or_changeset, attrs) do
    user_or_changeset
    |> pow_changeset(attrs)
    |> validate_password_no_context()
  end

  @app_name "My Demo App"

  defp validate_password_no_context(changeset) do
    Ecto.Changeset.validate_change(changeset, :password, fn :password, password ->
      [
        @app_name,
        String.downcase(@app_name),
        Ecto.Changeset.get_field(changeset, :email),
        Ecto.Changeset.get_field(changeset, :username)
      ]
      |> Enum.reject(&is_nil/1)
      |> similar_to_context?(password)
      |> case do
        true -> [password: "is too similar to username, email or #{@app_name}"]
        false -> []
      end
    end)
  end

  def similar_to_context?(contexts, password) do
    Enum.any?(contexts, &(String.jaro_distance(&1, password) > 0.85))
  end
end
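The 0.85 cutoff above comes from String.jaro_distance/2, which scores string similarity between 0.0 and 1.0, where 1.0 means identical. A quick standalone check of how the threshold behaves (the example strings are my own, not from the validation code):

```elixir
# Jaro distance: 1.0 for identical strings, lower as they diverge
exact = String.jaro_distance("mydemoapp", "mydemoapp")
close = String.jaro_distance("mydemoapp", "mydemoapp1")
far = String.jaro_distance("mydemoapp", "zebra")

IO.inspect({exact, close, far})

# The validation would reject `close` (score > 0.85) but allow `far`
true = exact == 1.0
true = close > 0.85
true = far < 0.85
```

Tuning the threshold trades false positives (rejecting unrelated passwords) against false negatives (allowing near-copies of the username or app name).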
Repetitive or sequential characters
defmodule MyApp.Users.User do
  use Ecto.Schema
  use Pow.Ecto.Schema

  # ...

  def changeset(user_or_changeset, attrs) do
    user_or_changeset
    |> pow_changeset(attrs)
    |> validate_password()
  end

  defp validate_password(changeset) do
    changeset
    |> validate_no_repetitive_characters()
    |> validate_no_sequential_characters()
  end

  defp validate_no_repetitive_characters(changeset) do
    Ecto.Changeset.validate_change(changeset, :password, fn :password, password ->
      case repetitive_characters?(password) do
        true -> [password: "has repetitive characters"]
        false -> []
      end
    end)
  end

  defp repetitive_characters?(password) when is_binary(password) do
    password
    |> String.to_charlist()
    |> repetitive_characters?()
  end

  defp repetitive_characters?([c, c, c | _rest]), do: true
  defp repetitive_characters?([_c | rest]), do: repetitive_characters?(rest)
  defp repetitive_characters?([]), do: false

  defp validate_no_sequential_characters(changeset) do
    Ecto.Changeset.validate_change(changeset, :password, fn :password, password ->
      case sequential_characters?(password) do
        true -> [password: "has sequential characters"]
        false -> []
      end
    end)
  end

  @sequences ["01234567890", "abcdefghijklmnopqrstuvwxyz"]
  @max_sequential_chars 3

  defp sequential_characters?(password) do
    Enum.any?(@sequences, &sequential_characters?(password, &1))
  end

  defp sequential_characters?(password, sequence) do
    max = String.length(sequence) - 1 - @max_sequential_chars

    Enum.any?(0..max, fn x ->
      pattern = String.slice(sequence, x, @max_sequential_chars + 1)
      String.contains?(password, pattern)
    end)
  end
end
Dictionary words
defmodule MyApp.Users.User do
  use Ecto.Schema
  use Pow.Ecto.Schema

  # ...

  def changeset(user_or_changeset, attrs) do
    user_or_changeset
    |> pow_changeset(attrs)
    |> validate_password_dictionary()
  end

  defp validate_password_dictionary(changeset) do
    Ecto.Changeset.validate_change(changeset, :password, fn :password, password ->
      password
      |> String.downcase()
      |> password_in_dictionary?()
      |> case do
        true -> [password: "is too common"]
        false -> []
      end
    end)
  end

  :my_app
  |> :code.priv_dir()
  |> Path.join("dictionary.txt")
  |> File.stream!()
  |> Stream.map(&String.trim/1)
  |> Stream.each(fn password ->
    defp password_in_dictionary?(unquote(password)), do: true
  end)
  |> Stream.run()

  defp password_in_dictionary?(_password), do: false
end
Require users to change weak password upon sign in
def check_password(conn, _opts) do
  changeset = MyApp.Users.User.changeset(%MyApp.Users.User{}, conn.params["user"])

  case changeset.errors[:password] do
    nil ->
      conn

    error ->
      msg = MyAppWeb.ErrorHelpers.translate_error(error)

      conn
      |> put_flash(:error, "You have to reset your password because it #{msg}")
      |> redirect(to: Routes.pow_reset_password_reset_password_path(conn, :new))
      |> Plug.Conn.halt()
  end
end
Conclusion
Unit tests
defmodule MyApp.Users.UserTest do
  use MyApp.DataCase

  alias MyApp.Users.User

  test "changeset/2 validates context-specific words" do
    for invalid <- ["my demo app", "mydemo_app", "mydemoapp1"] do
      changeset = User.changeset(%User{}, %{"username" => "john.doe", "password" => invalid})
      assert changeset.errors[:password] == {"is too similar to username, email or My Demo App", []}
    end

    # The below is for email user id
    changeset = User.changeset(%User{}, %{"email" => "john.doe@example.com", "password" => "password12"})
    refute changeset.errors[:password]

    for invalid <- ["john.doe@example.com", "johndoeexamplecom"] do
      changeset = User.changeset(%User{}, %{"email" => "john.doe@example.com", "password" => invalid})
      assert changeset.errors[:password] == {"is too similar to username, email or My Demo App", []}
    end

    # The below is for username user id
    changeset = User.changeset(%User{}, %{"username" => "john.doe", "password" => "password12"})
    refute changeset.errors[:password]

    for invalid <- ["john.doe00", "johndoe", "johndoe1"] do
      changeset = User.changeset(%User{}, %{"username" => "john.doe", "password" => invalid})
      assert changeset.errors[:password] == {"is too similar to username, email or My Demo App", []}
    end
  end

  test "changeset/2 validates repetitive and sequential password" do
    changeset = User.changeset(%User{}, %{"password" => "secret1222"})
    assert changeset.errors[:password] == {"has repetitive characters", []}

    changeset = User.changeset(%User{}, %{"password" => "secret1223"})
    refute changeset.errors[:password]

    changeset = User.changeset(%User{}, %{"password" => "secret1234"})
    assert changeset.errors[:password] == {"has sequential characters", []}

    changeset = User.changeset(%User{}, %{"password" => "secret1235"})
    refute changeset.errors[:password]

    changeset = User.changeset(%User{}, %{"password" => "secretefgh"})
    assert changeset.errors[:password] == {"has sequential characters", []}

    changeset = User.changeset(%User{}, %{"password" => "secretafgh"})
    refute changeset.errors[:password]
  end
end
https://github.com/open-telemetry/opentelemetry-erlang-contrib
What is OpenTelemetry?
OpenTelemetry is an open-source observability framework that provides a vendor-agnostic way to instrument applications for monitoring and tracing purposes. The framework offers a range of components and libraries that can be used to collect and report metrics, logs, and traces from applications. By leveraging OpenTelemetry, developers can gain a deeper understanding of their application's behavior, identify performance bottlenecks, and make data-driven decisions to optimize their systems.
OpenTelemetry in Erlang and Elixir
The OpenTelemetry project provides a range of libraries and tools for Erlang and Elixir developers to instrument their applications. The open-telemetry-erlang-contrib repository, available on GitHub, offers a set of pre-built instrumentation libraries for Erlang and Elixir. These libraries provide a simple and easy-to-use interface for collecting and reporting metrics, logs, and traces from Erlang and Elixir applications.
Benefits of Using OpenTelemetry in Erlang and Elixir
There are several benefits to using OpenTelemetry in Erlang and Elixir applications. Some of the key advantages include:
- Improved Monitoring and Troubleshooting: developers gain a deeper understanding of their application's behavior and can identify performance bottlenecks more easily.
- Increased Visibility: a vendor-agnostic way to collect and report metrics, logs, and traces gives developers a more complete picture of system performance.
- Better Data-Driven Decision Making: the collected data supports informed decisions about where to optimize the system.
Getting Started with OpenTelemetry in Erlang and Elixir
To get started with OpenTelemetry in Erlang and Elixir, developers can follow these steps:
- Install the OpenTelemetry library: Install the OpenTelemetry library for Erlang and Elixir using the instructions provided in the open-telemetry-erlang-contrib repository.
- Instrument Your Application: Use the OpenTelemetry library to instrument your Erlang and Elixir applications and collect metrics, logs, and traces.
- Configure OpenTelemetry: Configure OpenTelemetry to report metrics, logs, and traces to a compatible backend, such as a monitoring system or logging platform.
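The three steps above can be sketched for a Phoenix app as follows. These are configuration fragments, not a complete setup: the package names (opentelemetry, opentelemetry_api, opentelemetry_exporter, opentelemetry_phoenix, opentelemetry_ecto) are real Hex packages from the contrib ecosystem, but the version requirements, app/repo names, and OTLP endpoint are illustrative assumptions — check each package's README for current instructions.

```
# mix.exs — step 1: install the libraries (versions illustrative)
defp deps do
  [
    {:opentelemetry_api, "~> 1.2"},
    {:opentelemetry, "~> 1.3"},
    {:opentelemetry_exporter, "~> 1.6"},
    {:opentelemetry_phoenix, "~> 1.1"},
    {:opentelemetry_ecto, "~> 1.1"}
  ]
end

# lib/my_app/application.ex — step 2: attach the pre-built
# instrumentation when the app boots (names assume an app called
# :my_app with repo MyApp.Repo)
def start(_type, _args) do
  :ok = OpentelemetryPhoenix.setup()
  :ok = OpentelemetryEcto.setup([:my_app, :repo])
  # ... start the supervision tree as usual
end

# config/config.exs — step 3: export traces to an OTLP-compatible
# backend (endpoint illustrative, e.g. a local OpenTelemetry Collector)
config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  otlp_endpoint: "http://localhost:4318"
```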
https://github.com/tompave/fun_with_flags
Introduction to Feature Flags in Elixir
Elixir is a dynamic, functional language designed for building scalable and maintainable applications. One of the key aspects of developing such applications is the ability to quickly and safely introduce new features. This is where feature flags come into play. Feature flags, also known as feature toggles, allow developers to enable or disable features in their application without requiring a full deployment. In this article, we'll explore how to use feature flags in Elixir with the fun_with_flags library.
The fun_with_flags library provides a simple and elegant way to manage feature flags in your Elixir application. With its easy-to-use API, you can quickly integrate feature flags into your development workflow. To get started, simply add fun_with_flags to your mix.exs file and run mix deps.get.
Using Fun With Flags
Using fun_with_flags is straightforward. Once you've added the library to your project, you can start defining your feature flags. For example, let's say you want to introduce a new feature that allows users to upload profile pictures. You can define a feature flag for this feature like so: FunWithFlags.enable(:profile_picture_uploads). You can then use this feature flag in your application code to conditionally enable or disable the feature.
For more information on how to use fun_with_flags, check out the library's GitHub page. The documentation provides a comprehensive guide on how to get started with feature flags in your Elixir application.
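A minimal sketch of that flow, assuming fun_with_flags is installed and one of its persistence backends (Redis by default, or Ecto) is configured. FunWithFlags.enable/1 and FunWithFlags.enabled?/1 are the library's real API; the function and flag names are from the example above:

```
# Turn the feature on globally, e.g. from an iex console or a release task
FunWithFlags.enable(:profile_picture_uploads)

# Gate the feature in application code
def maybe_show_upload_button(user) do
  if FunWithFlags.enabled?(:profile_picture_uploads) do
    # render the upload UI
  else
    # feature hidden
  end
end
```

The library also supports gradual rollout, e.g. enabling a flag only for specific actors with FunWithFlags.enable(:profile_picture_uploads, for_actor: user), which is how you'd test a feature on a subset of users before flipping it on globally.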
Best Practices for Feature Flags
When using feature flags, it's essential to follow best practices to ensure they are used effectively. One key best practice is to keep your feature flags organized. This can be achieved by using a consistent naming convention for your feature flags. Another best practice is to ensure that your feature flags are properly tested. This can be done by writing tests that cover the different scenarios in which the feature flag is used.
By following these best practices and using fun_with_flags, you can safely and quickly introduce new features into your Elixir application. Remember to always keep your feature flags organized and properly tested to ensure they are used effectively.
As we've seen, feature flags are a powerful tool for introducing new features into your Elixir application. By using fun_with_flags, you can quickly and safely manage feature flags in your application. To learn more about fun_with_flags and how to use it in your Elixir project, be sure to check out the library's GitHub page. With the right tools and best practices, you can take your Elixir development to the next level.
https://github.com/edgurgel/mimic
Mimic is a powerful mocking library for Elixir, designed to make testing easier and more efficient. This library provides a robust set of features for creating and managing mock objects, allowing developers to isolate dependencies and focus on testing specific components of their application.
Features of edgurgel/mimic
edgurgel/mimic offers a range of features that make it an ideal choice for Elixir developers. Some of the key features include:
1. Easy Mock Object Creation
edgurgel/mimic provides a simple and intuitive API for creating mock objects. With just a few lines of code, you can create a mock object that mimics the behavior of a real object, allowing you to test specific scenarios without affecting the rest of your application.
2. Flexible Mocking Options
edgurgel/mimic offers a range of mocking options, including stubbing, mocking, and spying. This flexibility allows you to choose the approach that best fits your testing needs, making it easier to write effective tests.
3. Support for Elixir's Built-in Testing Framework
edgurgel/mimic is designed to work seamlessly with Elixir's built-in testing framework, making it easy to integrate into your existing testing workflow.
Example Use Cases
edgurgel/mimic is a versatile library that can be used in a variety of scenarios. Here are a few example use cases to get you started:
1. Testing a Service Layer
Suppose you have a service layer that depends on a database connection. You can use edgurgel/mimic to create a mock database connection and test the service layer without affecting the real database.
2. Testing a Controller
Similarly, you can use edgurgel/mimic to create mock objects for controllers, allowing you to test the controller's behavior without affecting the underlying dependencies.
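A sketch of the service-layer case using Mimic's actual API — Mimic.copy/1 in test_helper.exs, then expect/3 (or stub/3) inside the test. The MyApp.DatabaseConnection module, its query/1 function, and MyApp.Service.list_users/0 are hypothetical stand-ins for your own code:

```
# test/test_helper.exs — modules must be copied before they can be mocked
Mimic.copy(MyApp.DatabaseConnection)
ExUnit.start()

# test/my_app/service_test.exs
defmodule MyApp.ServiceTest do
  use ExUnit.Case, async: true
  use Mimic

  test "returns users without touching the real database" do
    # expect/3 also verifies the mocked function is actually called
    expect(MyApp.DatabaseConnection, :query, fn "SELECT * FROM users" ->
      {:ok, [%{name: "Jane"}]}
    end)

    assert {:ok, [%{name: "Jane"}]} = MyApp.Service.list_users()
  end
end
```

Because Mimic works per-process, mocked modules stay isolated between concurrently running tests, which is why async: true is safe here.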
Conclusion
edgurgel/mimic is a powerful mocking library for Elixir that makes testing easier and more efficient. With its flexible features and seamless integration with Elixir's built-in testing framework, it's an ideal choice for developers looking to take their testing to the next level.
Visit the official edgurgel/mimic repository to learn more about this library and start using it in your projects today.
Action Points
- Add edgurgel/mimic to your project's dependencies
- Start using edgurgel/mimic to create mock objects and test your application
- Explore the official repository for more information and examples