osquery Source Code Walkthrough: Analyzing shell_history

Overview

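The previous two articles focused on how to use osquery. This one analyzes its source code and covers two tables, shell_history and process_open_sockets. Walking through how these tables are implemented does two things. It shows how osquery exposes system information through SQL queries, and it deepens one's understanding of Linux itself.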

About the tables

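shell_history shows users' shell command history. process_open_sockets lists the sockets that processes currently have open, which reflects the host's current network activity. Example usage: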
shell_history

osquery> select * from shell_history limit 3;
+------+------+--------------------+------------------------------+
| uid  | time | command            | history_file                 |
+------+------+--------------------+------------------------------+
| 1000 | 0    | pwd                | /home/username/.bash_history |
| 1000 | 0    | ps -ef             | /home/username/.bash_history |
| 1000 | 0    | ps -ef | grep java | /home/username/.bash_history |
+------+------+--------------------+------------------------------+

The following process_open_sockets result shows a reverse-shell connection:

osquery> select * from process_open_sockets order by pid desc limit 1;
+--------+----+----------+--------+----------+---------------+----------------+------------+-------------+------+------------+---------------+
| pid    | fd | socket   | family | protocol | local_address | remote_address | local_port | remote_port | path | state      | net_namespace |
+--------+----+----------+--------+----------+---------------+----------------+------------+-------------+------+------------+---------------+
| 115567 | 3  | 16467630 | 2      | 6        | 192.168.2.142 | 192.168.2.143  | 46368      | 8888        |      | ESTABLISH  | 0             |
+--------+----+----------+--------+----------+---------------+----------------+------------+-------------+------+------------+---------------+

osquery's overall code structure is very clear. All table definitions live under specs, and all table implementations live under osquery/tables.

Take shell_history as an example. Its table definition is in specs/posix/shell_history.table:

table_name("shell_history")
description("A line-delimited (command) table of per-user .*_history data.")
schema([
    Column("uid", BIGINT, "Shell history owner", additional=True),
    Column("time", INTEGER, "Entry timestamp. It could be absent, default value is 0."),
    Column("command", TEXT, "Unparsed date/line/command history line"),
    Column("history_file", TEXT, "Path to the .*_history for this user"),
    ForeignKey(column="uid", table="users"),
])
attributes(user_data=True, no_pkey=True)
implementation("shell_history@genShellHistory")
examples([
    "select * from users join shell_history using (uid)",
])
fuzz_paths([
    "/home",
    "/Users",
])

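shell_history.table already holds all the relevant information. implementation("shell_history@genShellHistory") declares that the table is produced by the genShellHistory() function in shell_history.cpp. The spec even supplies an example query, select * from users join shell_history using (uid). shell_history.cpp itself is located at osquery/tables/system/posix/shell_history.cpp. Likewise, process_open_sockets is defined in specs/process_open_sockets.table and implemented in osquery/tables/networking/[linux|freebsd|windows]/process_open_sockets.cpp. process_open_sockets is available on several platforms, so each of linux, freebsd and windows has its own process_open_sockets.cpp. This article uses Linux as the example.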
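The following minimal sketch makes the spec-to-implementation binding concrete. It is not osquery's actual mechanism: osquery generates C++ glue code from the .table file at build time. Here, Row, QueryData, tableRegistry(), genShellHistoryStub(), splitImplementation() and selectAll() are hypothetical, simplified stand-ins. The sketch only shows two things. The implementation value names a source file and a generator function, and a query against the table ends up calling that function to produce rows.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for osquery's Row and QueryData types.
using Row = std::map<std::string, std::string>;
using QueryData = std::vector<Row>;
using TableGenerator = std::function<QueryData()>;

// Hypothetical registry mapping a table name to its generator function.
std::map<std::string, TableGenerator>& tableRegistry() {
  static std::map<std::string, TableGenerator> registry;
  return registry;
}

// Stub playing the role of genShellHistory(): returns one fixed row.
QueryData genShellHistoryStub() {
  Row r;
  r["uid"] = "1000";
  r["time"] = "0";
  r["command"] = "pwd";
  QueryData results;
  results.push_back(r);
  return results;
}

// Splits an implementation("file@function") value into file and function.
std::pair<std::string, std::string> splitImplementation(const std::string& impl) {
  auto at = impl.find('@');
  if (at == std::string::npos) {
    return {impl, ""};
  }
  return {impl.substr(0, at), impl.substr(at + 1)};
}

// In this toy model, "select * from <table>" just calls the registered generator.
QueryData selectAll(const std::string& table) {
  auto it = tableRegistry().find(table);
  if (it == tableRegistry().end()) {
    return {};
  }
  return it->second();
}
```

Calling splitImplementation("shell_history@genShellHistory") yields the pair ("shell_history", "genShellHistory"), the same file/function split described above.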

Implementation of shell_history

Background

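Before diving in, here are a few basic Linux concepts. There are many Unix shells, such as bash, zsh, tcsh and sh. bash is the most common today and ships with almost every Unix-like operating system, while zsh adds many features on top of what bash offers. Every command we type into a terminal is interpreted by one of these shells.
Running ls -al in a user's home directory reveals a .bash_history file, which records the commands previously entered at the terminal. Likewise, a zsh user will have a .zsh_history file that records their commands.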

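The home directory may also contain a .bash_sessions directory, which macOS Terminal creates for session restoration. According to this article: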

A new folder (~/.bash_sessions/) is used to store HISTFILE’s and .session files that are unique to sessions. If $BASH_SESSION or $TERM_SESSION_ID is set upon launching the shell (i.e. if Terminal is resuming from a saved state), the associated HISTFILE is merged into the current one, and the .session file is ran. Session saving is facilitated by means of an EXIT trap being set for a function bash_update_session_state.

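.bash_sessions stores session-specific HISTFILEs and .session files. If $BASH_SESSION or $TERM_SESSION_ID is set when the shell starts, Terminal is resuming a saved session, and the shell uses that variable to restore the session's previous state. So .bash_sessions also contains *.history files that record the command history of individual sessions.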

Analysis

The entry point of shell_history.cpp is genShellHistory():

QueryData genShellHistory(QueryContext& context) {
  QueryData results;
  // Iterate over each user
  QueryData users = usersFromContext(context);
  for (const auto& row : users) {
    auto uid = row.find("uid");
    auto gid = row.find("gid");
    auto dir = row.find("directory");
    if (uid != row.end() && gid != row.end() && dir != row.end()) {
      genShellHistoryForUser(uid->second, gid->second, dir->second, results);
      genShellHistoryFromBashSessions(uid->second, dir->second, results);
    }
  }

  return results;
}

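The function iterates over all users and reads each user's uid, gid and directory (home directory). It then calls genShellHistoryForUser() to collect that user's shell history. genShellHistoryFromBashSessions() does the same job for the history files under .bash_sessions.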

genShellHistoryForUser():

void genShellHistoryForUser(const std::string& uid, const std::string& gid, const std::string& directory, QueryData& results) {
  auto dropper = DropPrivileges::get();
  if (!dropper->dropTo(uid, gid)) {
    VLOG(1) << "Cannot drop privileges to UID " << uid;
    return;
  }

  for (const auto& hfile : kShellHistoryFiles) {
    boost::filesystem::path history_file = directory;
    history_file /= hfile;
    genShellHistoryFromFile(uid, history_file, results);
  }
}

Note that before reading any file, the function first calls:

auto dropper = DropPrivileges::get();
if (!dropper->dropTo(uid, gid)) {
  VLOG(1) << "Cannot drop privileges to UID " << uid;
  return;
}

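This drops the process's effective uid and gid to those of the user whose files are about to be read. Why do this? I asked about it online and received a detailed answer: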

Think about a scenario where you are a malicious user and you spotted a vulnerability(buffer overflow) which none of us has. In the code (osquery which is running usually with root permission) you also know that history files(controlled by you) are being read by code(osquery). Now you stored a shell code (a code which is capable of destroying anything in the system)such a way that it would overwrite the saved rip. So once the function returns program control is with the injected code(shell code) with root privilege. With dropping privilege you reduce the chance of putting entire system into danger.
There are other mitigation techniques (e.g. stack guard) to avoid above scenario but multiple defenses are required

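In short, osquery usually runs as root. Suppose its parsing code had a memory-safety bug such as a buffer overflow. An attacker could then plant crafted shellcode in their own .bash_history, and when osquery read and parsed that file, the injected code would run with root privileges. Dropping privileges to the file owner's uid and gid before reading limits the damage to what that user could already do.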
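The drop-then-restore idea maps naturally onto RAII. Below is a minimal sketch that assumes a POSIX system. ScopedPrivilegeDrop is a hypothetical class, not osquery's DropPrivileges. It takes numeric ids (osquery's dropTo() takes strings) and only switches the effective uid and gid. A production version would typically also need to handle supplementary groups and report restore failures.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical RAII guard: drops effective uid/gid, restores them on destruction.
// Simplified sketch; supplementary groups and thread-safety are ignored.
class ScopedPrivilegeDrop {
 public:
  ScopedPrivilegeDrop() : orig_euid_(geteuid()), orig_egid_(getegid()) {}

  // Drop the gid first: once the effective uid is unprivileged,
  // the process can no longer change its gid.
  bool dropTo(uid_t uid, gid_t gid) {
    if (setegid(gid) != 0) {
      return false;
    }
    if (seteuid(uid) != 0) {
      int rc = setegid(orig_egid_);  // roll back the partial drop
      (void)rc;
      return false;
    }
    dropped_ = true;
    return true;
  }

  // Restore in reverse order: regain the euid first so setegid is permitted.
  ~ScopedPrivilegeDrop() {
    if (!dropped_) {
      return;
    }
    // A destructor cannot propagate errors; a real implementation would log them.
    if (seteuid(orig_euid_) != 0) {
      return;
    }
    if (setegid(orig_egid_) != 0) {
      return;
    }
  }

  ScopedPrivilegeDrop(const ScopedPrivilegeDrop&) = delete;
  ScopedPrivilegeDrop& operator=(const ScopedPrivilegeDrop&) = delete;

 private:
  uid_t orig_euid_;
  gid_t orig_egid_;
  bool dropped_ = false;
};
```

The guard lives in the same scope as the file reads, so privileges come back automatically on every return path, including early returns.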

/**
 * @brief The privilege/permissions dropper deconstructor will restore
 * effective permissions.
 *
 * There should only be a single drop of privilege/permission active.
 */
virtual ~DropPrivileges();

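As the destructor's comment states, the process's effective permissions are restored when the DropPrivileges object is destroyed, that is, when dropper goes out of scope at the end of genShellHistoryForUser().
Next, the function loops over kShellHistoryFiles and calls genShellHistoryFromFile() for each entry. kShellHistoryFiles is defined earlier in the file as: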

const std::vector<std::string> kShellHistoryFiles = {
    ".bash_history", ".zsh_history", ".zhistory", ".history", ".sh_history",
};

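kShellHistoryFiles is simply the list of file names that common shells (bash, zsh and others) use to store their history. For each name, genShellHistoryFromFile() reads the corresponding file in the user's home directory and parses its contents.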
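The path construction in genShellHistoryForUser() can be reproduced with std::filesystem. It is used here instead of boost::filesystem, and for these inputs operator/= joins paths the same way. historyCandidates() is a hypothetical helper. Not every candidate exists on disk. In osquery, genShellHistoryFromFile() only emits rows when forensicReadFile() succeeds.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Same names as kShellHistoryFiles in shell_history.cpp.
const std::vector<std::string> kShellHistoryFiles = {
    ".bash_history", ".zsh_history", ".zhistory", ".history", ".sh_history",
};

// Hypothetical helper mirroring the loop in genShellHistoryForUser():
// join each known history file name onto the user's home directory.
std::vector<fs::path> historyCandidates(const std::string& home) {
  std::vector<fs::path> paths;
  for (const auto& name : kShellHistoryFiles) {
    fs::path p = home;
    p /= name;
    paths.push_back(p);
  }
  return paths;
}
```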

void genShellHistoryFromFile(const std::string& uid, const boost::filesystem::path& history_file, QueryData& results) {
  std::string history_content;
  if (forensicReadFile(history_file, history_content).ok()) {
    auto bash_timestamp_rx = xp::sregex::compile("^#(?P<timestamp>[0-9]+)$");
    auto zsh_timestamp_rx = xp::sregex::compile("^: {0,10}(?P<timestamp>[0-9]{1,11}):[0-9]+;(?P<command>.*)$");
    std::string prev_bash_timestamp;
    for (const auto& line : split(history_content, "\n")) {
      xp::smatch bash_timestamp_matches;
      xp::smatch zsh_timestamp_matches;

      if (prev_bash_timestamp.empty() &&
          xp::regex_search(line, bash_timestamp_matches, bash_timestamp_rx)) {
        prev_bash_timestamp = bash_timestamp_matches["timestamp"];
        continue;
      }

      Row r;

      if (!prev_bash_timestamp.empty()) {
        r["time"] = INTEGER(prev_bash_timestamp);
        r["command"] = line;
        prev_bash_timestamp.clear();
      } else if (xp::regex_search(
                     line, zsh_timestamp_matches, zsh_timestamp_rx)) {
        std::string timestamp = zsh_timestamp_matches["timestamp"];
        r["time"] = INTEGER(timestamp);
        r["command"] = zsh_timestamp_matches["command"];
      } else {
        r["time"] = INTEGER(0);
        r["command"] = line;
      }

      r["uid"] = uid;
      r["history_file"] = history_file.string();
      results.push_back(r);
    }
  }
}

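The logic is straightforward:

  1. forensicReadFile(history_file, history_content) reads the file contents.
  2. Two regular expressions are defined. bash_timestamp_rx matches the #<epoch> timestamp lines that bash writes when HISTTIMEFORMAT is set. zsh_timestamp_rx matches zsh's extended-history format, : <epoch>:<duration>;<command>.
  3. for (const auto& line : split(history_content, "\n")) walks over each line. A bash timestamp line is remembered and assigned to the line that follows it. Any other line is matched against zsh_timestamp_rx, and if neither format applies, the time is set to 0.
  4. Row r; ... r["history_file"] = history_file.string(); results.push_back(r); fills a Row with the parsed values and appends it to the results.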
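To experiment with both formats, the parsing loop can be reproduced with the standard library. This is a sketch, not osquery's code. std::regex (ECMAScript grammar) has no (?P<name>...) named groups, so numbered groups are used instead, and HistoryEntry and parseHistory() are hypothetical names. The time is kept as a digit string because osquery rows store strings. Empty lines are skipped on the assumption that osquery's split() drops empty elements.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one shell_history row.
struct HistoryEntry {
  std::string time;     // epoch seconds as a digit string, "0" if absent
  std::string command;
};

// Sketch of the genShellHistoryFromFile() loop using std::regex.
std::vector<HistoryEntry> parseHistory(const std::string& content) {
  // bash writes "#<epoch>" before a command when HISTTIMEFORMAT is set.
  static const std::regex bash_ts(R"(^#([0-9]+)$)");
  // zsh EXTENDED_HISTORY format: ": <epoch>:<duration>;<command>".
  static const std::regex zsh_ts(R"(^: {0,10}([0-9]{1,11}):[0-9]+;(.*)$)");

  std::vector<HistoryEntry> entries;
  std::string prev_bash_ts;
  std::istringstream in(content);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) {
      continue;  // assumption: mirrors split() dropping empty elements
    }
    std::smatch m;
    // Remember a bash timestamp and attach it to the next line.
    if (prev_bash_ts.empty() && std::regex_search(line, m, bash_ts)) {
      prev_bash_ts = m[1].str();
      continue;
    }
    if (!prev_bash_ts.empty()) {
      entries.push_back({prev_bash_ts, line});
      prev_bash_ts.clear();
    } else if (std::regex_search(line, m, zsh_ts)) {
      entries.push_back({m[1].str(), m[2].str()});
    } else {
      entries.push_back({"0", line});  // no timestamp available
    }
  }
  return entries;
}
```

For example, the bash input "#1600000000\nls -la\npwd\n" yields two entries. The first is ls -la with time 1600000000, and the second is pwd with time 0, because the timestamp is consumed by the line directly after it.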

That completes the parsing for shell_history. Running select * from shell_history returns every history entry produced by the flow above.

As for genShellHistoryFromBashSessions():

void genShellHistoryFromBashSessions(const std::string& uid, const std::string& directory, QueryData& results) {
  boost::filesystem::path bash_sessions = directory;
  bash_sessions /= ".bash_sessions";

  if (pathExists(bash_sessions)) {
    bash_sessions /= "*.history";
    std::vector<std::string> session_hist_files;
    resolveFilePattern(bash_sessions, session_hist_files);

    for (const auto& hfile : session_hist_files) {
      boost::filesystem::path history_file = hfile;
      genShellHistoryFromFile(uid, history_file, results);
    }
  }
}

genShellHistoryFromBashSessions() collects history in a simpler way:

  1. Resolve all files matching .bash_sessions/*.history.
  2. Call genShellHistoryFromFile(uid, history_file, results) on each of them to extract the commands.
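Resolving .bash_sessions/*.history can be approximated with std::filesystem. bashSessionHistoryFiles() is a hypothetical helper standing in for osquery's resolveFilePattern(). It keeps the regular files whose extension is .history, and it sorts the results because directory iteration order is unspecified.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: rough equivalent of resolving "<home>/.bash_sessions/*.history".
std::vector<fs::path> bashSessionHistoryFiles(const fs::path& home) {
  std::vector<fs::path> files;
  fs::path sessions = home / ".bash_sessions";
  std::error_code ec;
  // Mirrors the pathExists() check: no sessions directory, no files.
  if (!fs::is_directory(sessions, ec)) {
    return files;
  }
  for (const auto& entry : fs::directory_iterator(sessions, ec)) {
    if (entry.is_regular_file(ec) &&
        entry.path().extension().string() == ".history") {
      files.push_back(entry.path());
    }
  }
  std::sort(files.begin(), files.end());
  return files;
}
```

Each returned path would then go through the same history-file parser, just as genShellHistoryFromBashSessions() passes each match to genShellHistoryFromFile().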

Summary

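Reading the code of well-designed open-source software teaches you not only the subject matter but also some of its design philosophy.
To borrow the bucket metaphor, a white hat who learns quickly cannot afford a short stave. What they have is plenty of standard-length staves and a few long ones.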

That's all.