Red Team testing is designed to measure and improve an organisation’s ability to detect and respond to realistic threats. In a recent Threat Intelligence-Based Ethical Red-Team (TIBER) test, the mnemonic Red Team built and deployed a custom web shell to emulate the operations of a specific threat actor.

This blog series has three goals:

Part 1/3 - Explain how web shells fit into Red Team and TIBER testing.
Part 2/3 - Show how advanced threat actor tradecraft can be replicated in a controlled test through a custom web shell implementation.
Part 3/3 - Provide practical guidance on how to prevent and detect web shells.

Target audience: Detection engineers, Purple and Red Team operators, security practitioners.

Figure – A web shell provides an operator with a way to execute commands (input) and receive their results (output) on a target system.

Prevalence in real incidents

Incident reporting from both government agencies and commercial security vendors consistently shows web shells as a recurring feature of cyber-intrusions. Notable examples include the Microsoft Exchange exploitation campaigns of 2021, where web shells were deployed at scale to maintain access after the initial compromise. Similar patterns have been seen in intrusions involving custom IIS modules used as persistent backdoors.

At mnemonic, we have handled incidents where Nation State Groups (NSGs) deployed multiple advanced web shells not previously seen in the wild. In one such incident, a well-known NSG had deployed several custom web shell components. Recovering these components gave us access to their source code, which allowed us to analyse the group’s tradecraft and defence evasion mechanisms and to understand how the components were used. The tradecraft was significantly more sophisticated than what we had seen in the past.

Why attackers still rely on web shells

Despite improvements in endpoint and network detection, web shells offer attackers several advantages:

Prerequisites: All a threat actor needs to deploy a web shell is a file upload vulnerability, which is typically easier to discover than a direct remote code execution (RCE) vulnerability. Web shells are therefore often used to gain RCE from a file upload vulnerability.
Low operational cost: If an attacker can upload files, they can drop a web shell as a single file. The web server executes it directly, so no extra components are required.
Stealth: They leverage the existing web server process and network ports, blending into legitimate HTTP/S traffic.
Resilience: A well-hidden web shell can remain operational after other malware, such as a command-and-control (C2) agent, is removed, and even after the underlying vulnerability is fixed. It may therefore allow a threat actor to re-establish themselves in the environment if they are detected elsewhere.
Flexibility: Custom web shells can execute a variety of tasks, like file manipulation, process execution, and tunnelling, with little change to their footprint.
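
Because a script-based web shell often arrives as a single new or changed file under the web root, one simple defensive angle is to compare the web root against a known-good baseline. The following is a minimal sketch of that idea, not a description of any tooling used in the test. The extension list and the function names `snapshot` and `diff_snapshots` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Assumption: file extensions the web server would execute. Adjust per platform.
SCRIPT_EXTENSIONS = {".aspx", ".ashx", ".asmx", ".php", ".jsp"}


def snapshot(web_root: Path) -> dict[str, str]:
    """Map each script file under web_root to the SHA-256 digest of its content."""
    digests = {}
    for path in web_root.rglob("*"):
        if path.is_file() and path.suffix.lower() in SCRIPT_EXTENSIONS:
            relative = path.relative_to(web_root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(
    baseline: dict[str, str], current: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (added, modified) script paths relative to the baseline."""
    added = sorted(set(current) - set(baseline))
    modified = sorted(
        path for path in current
        if path in baseline and current[path] != baseline[path]
    )
    return added, modified
```

A check like this only covers script files on disk. It would not, on its own, notice persistence that lives elsewhere, and a well-hidden web shell placed before the baseline was taken would be part of the baseline.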

Web shells in TIBER testing

What spawned this research for us was a TIBER test. One test scenario involved simulating an advanced nation-state threat actor. This threat actor was known for compromising Internet-facing servers and maintaining long-term access through advanced, custom web shells. Reproducing that level of tradecraft in a controlled test meant designing our own implementation, balancing realism with the need to maximise learning value for the defenders.

Limitations of common web shell approaches

Organisations and their defenders may think of web shells as a solved problem: run a scan, match a signature, delete the file. In practice, that approach only works against commodity web shells. Off-the-shelf web shells like China Chopper, Godzilla, or simple PHP/ASP scripts are noisy, predictable, and widely covered in both open-source and commercial detection rules. They are useful for training and basic detection validation, but they do not represent the tooling used by advanced threat actors.
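
To make the "run a scan, match a signature" approach concrete, here is a minimal sketch of a static signature check. The patterns are illustrative assumptions, loosely modelled on one-line eval-style shells and the cmd.exe-spawning pattern. They are not a real detection ruleset, and the name `match_signatures` is hypothetical.

```python
import re

# Illustrative patterns only (assumption): not taken from any real ruleset.
SIGNATURES = {
    "php_eval_request": re.compile(rb"eval\s*\(\s*\$_(POST|GET|REQUEST)\b", re.I),
    "asp_eval_request": re.compile(rb"eval\s*\(?\s*request\b", re.I),
    "spawns_cmd_exe": re.compile(rb"cmd\.exe", re.I),
}


def match_signatures(content: bytes) -> list[str]:
    """Return the names of all signatures found in the file content."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern.search(content)
    )
```

A file containing one of these known strings is flagged immediately. A custom implementation that contains none of them returns an empty list, which is the limitation described above.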

From a Red Team perspective, common web shell approaches have several drawbacks when the goal is to realistically emulate a capable threat actor:

Signature coverage: Commodity web shells have known static strings, predictable HTTP patterns, and consistent file structures. These attributes make them easy to detect using static signatures.
Operational noise: They execute commands in ways that leave obvious traces, such as command-line history, web server access log entries, or abnormal child processes. For example, the most common cmd.aspx-style web shell spawns cmd.exe for every command it runs, making it trivial to spot in process monitoring and parent-child process relationships.
Limited protocol flexibility: Most off-the-shelf web shells rely on simple HTTP POST/GET request structures that may stand out in network traffic analysis.
Lack of environment tailoring: Real threat actors often modify their web shells to match the target’s context and naming conventions, reducing the chance of detection. Commodity web shells do not address such concerns.
Minimal operational security controls: Few web shells implement authentication, request obfuscation or other features that make detection harder.
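
The operational noise point can be illustrated with a small parent-child process check. This is a hedged sketch that assumes process creation telemetry is already available as records. The field names loosely follow Sysmon process creation events, and both process lists are illustrative assumptions rather than a complete rule.

```python
from dataclasses import dataclass

# Assumptions: w3wp.exe is the IIS worker process; the other entries and the
# shell list are examples and would need tuning for a real environment.
WEB_SERVER_PROCESSES = {"w3wp.exe", "httpd.exe", "php-cgi.exe"}
SHELL_PROCESSES = {"cmd.exe", "powershell.exe"}


@dataclass(frozen=True)
class ProcessEvent:
    """One process creation record (hypothetical, simplified schema)."""
    parent_image: str
    image: str
    command_line: str


def basename(path: str) -> str:
    """Lower-cased file name of a Windows or POSIX style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def is_suspicious(event: ProcessEvent) -> bool:
    """True when a web server process spawns a command interpreter."""
    return (
        basename(event.parent_image) in WEB_SERVER_PROCESSES
        and basename(event.image) in SHELL_PROCESSES
    )


def suspicious_events(events: list[ProcessEvent]) -> list[ProcessEvent]:
    """Filter a batch of events down to the suspicious parent-child pairs."""
    return [event for event in events if is_suspicious(event)]
```

A cmd.aspx-style web shell would trigger this check on every command. A web shell that performs its tasks inside the worker process would produce no such events, so a clean result here says little about stealthier tooling.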

Consequently, for our TIBER test, deploying a well-known, easily fingerprinted web shell would risk giving the defenders an easy win that says little about their ability to detect and respond to a real, stealthy web shell foothold. Raising the bar required our web shell to behave and communicate like one designed by a determined, well-resourced threat actor.

Figure – The typical cmd.aspx web shell that simply spawns cmd.exe for each command the operator executes.

Why web shells matter in TIBER tests

TIBER testing is about more than proving that a firewall blocks a scan or that EDR can catch commodity malware. It is about assessing whether an organisation can withstand realistic, high-end threat actor behaviour. Since web shells are among the most common tools advanced threat actors use to maintain persistence on compromised servers, they fit naturally into a TIBER test scenario.

Advanced threat actors regularly deploy custom web shells for covert persistence. If we rely only on well-known tools, the test will measure whether defenders can catch noisy, outdated techniques, not whether they can detect the techniques actually used in modern intrusions. Integrating a stealthy, threat-intelligence-driven web shell into a TIBER engagement enabled us to:

Pressure-test the monitoring of Internet-facing infrastructure, which often receives less scrutiny than endpoints.
Expose blind spots in how defenders inspect HTTP(S) traffic, application logs, and server processes.
Provide IT leadership with evidence of how a single overlooked foothold could enable long-term persistence.

Leg-up vs. exploitation

In the TIBER test scenario that spawned this research, we had not yet breached the perimeter. The actor we were emulating typically compromises Internet-facing IIS servers and plants web shells. Consequently, that became our objective. After weeks of attempts on a very small external scope with only a few relevant IIS hosts, we had not found any viable vulnerability to exploit. The first leg-up we proposed and received included some information about backend file paths that let us explore previously undiscovered endpoints, but this also led nowhere.

Given the threat intelligence and prior incidents showing that threat actors breach web servers and deploy web shells, we chose to propose a second leg-up: having the Control Team deploy a web shell on our behalf. In TIBER tests, all leg-ups must be approved by the TIBER Cyber Team (TCT), which is responsible for overseeing the test. We argued that this leg-up aligned closely enough with what a real threat actor could achieve with more time and resources. On that reasoning, the TCT approved the leg-up and had the Control Team deploy a web shell for us to use.

Figure – Example of using a leg-up to deploy a web shell in order to progress a TIBER test scenario.

If your immediate reaction is, “but that is cheating,” understand that the goal of a TIBER test is not to prove zero-day hunting skills or to replicate an actor’s playbook line for line, but to maximise organisational learning. Exploring the consequences of a web shell foothold produced far more value than burning the remaining test time on unlikely vulnerabilities or restarting the intrusion process with another approach. Testing the organisation’s detection and response to a realistic web shell compromise of an Internet-facing asset is not only valuable, but often neglected in offensive security testing. We designed a novel, stealthy web shell with capabilities similar to those of nation-state groups, and used it to continue the test scenario. This highlights how the leg-up process can be used to enable meaningful test coverage, even at the cost of some realism.

Conclusion

Web shells remain one of the most effective ways for attackers to maintain long-term access in targeted intrusions. Given a vulnerability, they are easy to deploy, flexible to use, and capable of persisting even after other malware is removed. For defenders, incorporating realistic web shell scenarios into Red and Purple Team testing is a great way to address a common blind spot before it is exploited in a real incident.

In the TIBER attack scenario we tested, the web shell we built was installed through a leg-up rather than by exploiting a vulnerability. This demonstrated that leg-ups are not shortcuts, but a means of creating realistic conditions that defenders must be prepared for. By requesting a web shell installation as a leg-up, we were able to mirror a threat actor’s likely outcome even without a vulnerability, and test how the organisation responded to it. The key lesson is that TIBER tests do not need to involve zero-day vulnerabilities to be valuable; they just need to provide opportunities for detecting advanced threats in realistic threat scenarios.

In part two we will go deeper into how we built a web shell to mirror a specific threat actor’s tradecraft.

Web shells in TIBER testing: Part 1
