Fuzzing Like A Caveman

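  1. Fuzzing Like A Caveman: writes a simple fuzzer in Python with two mutation methods, bit flipping and magic number replacement
  2. Fuzzing Like A Caveman 2: Improving Performance: improves parts of the fuzzer, shows how to profile a Python program with [cProfile](<https://docs.python.org/zh-cn/3/library/profile.html#module-cProfile>), uses popen to improve how the target program is executed, and then rewrites the fuzzer in C++ and C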
  3. Fuzzing Like A Caveman 3: Trying to Somewhat Understand The Importance Code Coverage
  4. Fuzzing Like A Caveman 4: Snapshot/Code Coverage Fuzzer!
  5. Fuzzing Like A Caveman 5: A Code Coverage Tour for Cavepeople

The C++ fuzzer explained

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of file, creates a string 'data', resizes data to be the same
// size as the file, moves the file pointer back to the beginning of the file;
// reads the data from the file into the data string;
//
std::string get_bytes(std::string filename)
{
    std::ifstream fin(filename, std::ios::binary);

    if (fin.is_open())
    {
        fin.seekg(0, std::ios::end);
        std::string data;
        data.resize(fin.tellg());
        fin.seekg(0, std::ios::beg);
        fin.read(&data[0], data.size());

        return data;
    }

    else
    {
        std::cout << "Failed to open " << filename << ".\n";
        exit(1);
    }

}
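// seekg and seekp
// seekp sets the position for putting (writing) data into a file, while seekg sets the position for getting (reading) data from it
// file.seekg(2L, ios::beg); sets the read position to the third byte of the file (byte 2), counting from the beginning
// http://c.biancheng.net/view/1541.html

// tellg and tellp
// tellp returns the write position; tellg returns the read position

// resize changes the size of the container's valid data region. If the new size is smaller, the excess data is cut off. If it is larger, the new elements are filled with the function's second argument. Either way size() changes.

// read: the first argument is the array that receives the bytes; the second is how many bytes to read into it

PS: all of this just copies the contents of the file filename into data; it is surprisingly verbose.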
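For comparison, a shorter way to load a whole file into a std::string is to construct the string from a pair of std::istreambuf_iterator. The sketch below is not from the original fuzzer: the name read_file is made up for illustration, and unlike get_bytes it returns an empty string when the file cannot be opened instead of exiting.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of the original fuzzer): read a whole file
// into a string. Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& filename)
{
    std::ifstream fin(filename, std::ios::binary);
    if (!fin.is_open())
        return std::string();

    // The iterator pair walks the stream buffer byte by byte until EOF.
    return std::string(std::istreambuf_iterator<char>(fin),
                       std::istreambuf_iterator<char>());
}
```

A call such as read_file("Canon_40D.jpg") would then stand in for get_bytes. The seekg/tellg/resize version is usually faster for large files because it allocates once, which may be why the original author chose it.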

The bit_flip function

//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{

    int size = (data.length() - 4);
    int num_of_flips = (int)(size * .01);

    // get a vector full of 1% of random byte indexes
    std::vector<int> picked_indexes;
    for (int i = 0; i < num_of_flips; i++)
    {
        int picked_index = rand() % size;
        picked_indexes.push_back(picked_index);
    }

    // iterate through the data string at those indexes and flip a bit
    for (int i = 0; i < picked_indexes.size(); ++i)
    {
        int index = picked_indexes[i];
        char current = data.at(index);
        int decimal = ((int)current & 0xff);

        int bit_to_flip = rand() % 8;

        decimal ^= 1 << bit_to_flip;
        decimal &= 0xff;

        data[index] = (char)decimal;
    }

    return data;

}
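The core of bit_flip is the line decimal ^= 1 << bit_to_flip. A minimal sketch of that one step in isolation is shown here; flip_bit is an illustrative name, not a function from the original code.

```cpp
#include <cstdint>

// Flip one bit (0 = least significant, 7 = most significant) of a byte.
// XOR with a one-bit mask inverts that bit and leaves the other seven alone.
std::uint8_t flip_bit(std::uint8_t byte, int bit)
{
    return static_cast<std::uint8_t>(byte ^ (1u << bit));
}
```

Flipping bit 3 of 0x00 gives 0x08, and flipping the same bit twice restores the original byte, which is why XOR is the natural operator for this mutation.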

The replacement for the create_new function

//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
    std::ofstream fout("mutated.jpg", std::ios::binary);

    if (fout.is_open())
    {
        fout.seekp(0, std::ios::beg);
        fout.write(&mutated[0], mutated.size());
    }
    else
    {
        std::cout << "Failed to create mutated.jpg" << ".\n";
        exit(1);
    }

}

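get_output executes the string passed in as a shell command and returns its output.

The exif function calls get_output, checks whether the output contains a crash marker, and handles it accordingly.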

//
// function to run a system command and store the output as a string;
// <https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/>
//
std::string get_output(std::string cmd)
{
    std::string output;
    FILE * stream;
    char buffer[256];

    stream = popen(cmd.c_str(), "r");
    if (stream)
    {
        // fgets returns NULL at end of output, so it can drive the loop directly
        while (fgets(buffer, sizeof(buffer), stream) != NULL)
            output.append(buffer);
        pclose(stream);
    }

    return output;

}

//
// we actually run our exiv2 command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the outputs that result in a segfault or floating point exception;
//
void exif(std::string mutated, int counter)
{
    std::string command = "exif mutated.jpg -verbose 2>&1";

    std::string output = get_output(command);

    std::string segfault = "Segmentation";
    std::string floating_point = "Floating";

    std::size_t pos1 = output.find(segfault);
    std::size_t pos2 = output.find(floating_point);

    if (pos1 != std::string::npos)
    {
        std::cout << "Segfault!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);

        if (fout.is_open())
            {
                fout.seekp(0, std::ios::beg);
                fout.write(&mutated[0], mutated.size());
            }
        else
        {
            std::cout << "Failed to create " << filename << ".\n";
            exit(1);
        }
    }
    else if (pos2 != std::string::npos)
    {
        std::cout << "Floating Point!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);

        if (fout.is_open())
            {
                fout.seekp(0, std::ios::beg);
                fout.write(&mutated[0], mutated.size());
            }
        else
        {
            std::cout << "Failed to create " << filename << ".\n";
            exit(1);
        }
    }
}
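The two branches of exif differ only in the message they print, so the detection step can be folded into a single predicate. The following is a sketch under that assumption; looks_like_crash is a hypothetical name and does not appear in the original fuzzer. It compares against std::string::npos, which is the value find returns when nothing matches.

```cpp
#include <string>

// Hypothetical helper: true if the captured output mentions one of the two
// crash markers the fuzzer looks for ("Segmentation" or "Floating").
bool looks_like_crash(const std::string& output)
{
    return output.find("Segmentation") != std::string::npos
        || output.find("Floating") != std::string::npos;
}
```

With a helper like this, exif could write the crash file once when looks_like_crash(output) is true instead of repeating the file-writing code in both branches.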

Definition of the other mutation strategy

//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
    std::vector<std::string> magic;

    using namespace std::string_literals;

    magic.push_back("\xff");
    magic.push_back("\x7f");
    magic.push_back("\x00"s);
    magic.push_back("\xff\xff");
    magic.push_back("\x7f\xff");
    magic.push_back("\x00\x00"s);
    magic.push_back("\xff\xff\xff\xff");
    magic.push_back("\x80\x00\x00\x00"s);
    magic.push_back("\x40\x00\x00\x00"s);
    magic.push_back("\x7f\xff\xff\xff");

    return magic;
}

//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{

    int vector_size = magic.size();
    int picked_magic_index = rand() % vector_size;
    std::string picked_magic = magic[picked_magic_index];
    int size = (data.length() - 4);
    int picked_data_index = rand() % size;
    data.replace(picked_data_index, magic[picked_magic_index].length(), magic[picked_magic_index]);

    return data;

}

//
// returns 0 or 1;
//
int func_pick()
{
    int result = rand() % 2;

    return result;
}
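In vector_gen, only the magic values that contain a zero byte carry the s suffix. The reason is that a plain string literal converted to std::string is treated as a C string and cut at the first NUL byte, while the s suffix (from std::string_literals, C++14) keeps the literal's full length. A small sketch to make the difference visible; both function names are made up for illustration.

```cpp
#include <string>

// Built from a C string: construction stops at the first NUL byte.
std::string magic_from_c_string()
{
    return std::string("\x00\x00");
}

// Built with the s suffix: the literal's full length is kept.
std::string magic_from_literal()
{
    using namespace std::string_literals;
    return "\x00\x00"s;
}
```

The first function returns an empty string and the second a string of size 2. Without the suffix, a value such as "\x80\x00\x00\x00" would shrink to a single byte and the magic mutation would overwrite fewer bytes than intended.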

The main function

int main(int argc, char** argv)
{

    if (argc < 3)
    {
        std::cout << "Usage: ./cppfuzz <valid jpeg> <number_of_fuzzing_iterations>\n";
        std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
        return 1;
    }

    // start timer
    auto start = std::chrono::high_resolution_clock::now();

    // initialize our random seed
    srand((unsigned)time(NULL));

    // generate our vector of magic numbers
    std::vector<std::string> magic_vector = vector_gen();

    std::string filename = argv[1];
    int iterations = atoi(argv[2]);

    int counter = 0;
    while (counter < iterations)
    {

        std::string data = get_bytes(filename);

        int function = func_pick();
        function = 1; // overrides the random pick, so only the bit flip branch ever runs
        if (function == 0)
        {
            // utilize the magic mutation method; create new jpg; send to exiv2
            std::string mutated = magic(data, magic_vector);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
        else
        {
            // utilize the bit flip mutation; create new jpg; send to exiv2
            std::string mutated = bit_flip(data);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
    }

    // stop timer and print execution time
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout << "Execution Time: " << duration.count() << "ms\n";

    return 0;
}

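Summary: there are quite a few C++ development techniques to pick up here. I have mostly been using Python and had forgotten nearly all of my C++.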

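Optimizing the C fuzzer

The main change is to replace popen with direct calls such as fork and execvp.

One curious thing about fork is that it is called once but returns twice. It has three possible return values:

1) In the parent process, fork returns the process ID of the newly created child;

2) In the child process, fork returns 0;

3) On error, fork returns a negative value (-1) and no child is created;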
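The three return values map onto three code paths. Below is a minimal sketch of how fork, execvp and waitpid might fit together to run a target and detect a crash by signal instead of parsing text output. run_target is a hypothetical name, and this is not the code from the original C fuzzer.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run argv[0] with the given arguments and wait for it to finish.
// Returns the signal number that killed the child (e.g. SIGSEGV),
// 0 if it exited normally, or -1 if fork or waitpid failed.
int run_target(char* const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;               // case 3: fork failed, no child exists

    if (pid == 0)
    {
        // case 2: we are the child; replace this process with the target
        execvp(argv[0], argv);
        _exit(127);              // only reached if execvp itself failed
    }

    // case 1: we are the parent and pid is the child's process ID
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A caller would pass a NULL-terminated argument array, for example {"exif", "mutated.jpg", "-verbose", NULL}, and treat a return value equal to SIGSEGV or SIGFPE as a crash. Compared with popen, this avoids starting a shell and scanning its output for the word "Segmentation".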

Part 4 code walkthrough

long long unsigned start_addr = 0x555555554b41; //
long long unsigned end_addr = 0x5555555548c0;   //

vuln.bp_addresses[0] = 0x555555554b7d;
vuln.bp_addresses[1] = 0x555555554bb9;

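Two breakpoints are set, at start and end. When the first breakpoint is hit, the register values are read, the first breakpoint is removed, and rip is decremented by 1 and written back to the registers.

Dynamic breakpoints: placed at the start of check2 and check3.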
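The "rip minus 1" step makes sense if the breakpoints are software breakpoints, that is, the one-byte x86 int3 opcode 0xCC written over the first byte of the instruction. The notes above do not state this explicitly, so treat it as an assumption. Under that assumption the arithmetic can be sketched as follows; both function names are illustrative.

```cpp
#include <cstdint>

// Replace the lowest byte of an 8-byte word read from the target with int3
// (0xCC). On little-endian x86-64 the lowest byte of the word is the byte
// stored at the breakpoint address itself.
std::uint64_t with_breakpoint(std::uint64_t original_word)
{
    return (original_word & ~std::uint64_t{0xff}) | 0xCC;
}

// After the trap, rip points one byte past the int3, so it is stepped back
// by 1 to re-execute the original instruction once that has been restored.
std::uint64_t rewind_rip(std::uint64_t rip_after_trap)
{
    return rip_after_trap - 1;
}
```

For the start address above, the trap would leave rip at 0x555555554b42, and rewind_rip brings it back to 0x555555554b41.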

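The C lseek() function: move a file's read/write position

Header files:

#include <sys/types.h>
#include <unistd.h>

Prototype:

off_t lseek(int fildes, off_t offset, int whence);

Description: every open file has a read/write position. When a file is opened, the position normally points to the beginning of the file; if the file is opened in append mode (O_APPEND), the position is moved to the end of the file before each write. Each read() or write() advances the position, and lseek() is used to set it explicitly. The fildes argument is an open file descriptor, and offset is the distance to move the position, interpreted according to whence.

The whence argument is one of the following:

SEEK_SET: offset is the new read/write position.

SEEK_CUR: the position is moved offset bytes from the current position.

SEEK_END: the position is set to the end of the file plus offset.

With SEEK_CUR or SEEK_END, offset may be negative.

Some common special cases:

1) To move the read/write position to the beginning of the file: lseek(fildes, 0, SEEK_SET);

2) To move the read/write position to the end of the file: lseek(fildes, 0, SEEK_END);

3) To get the current file position: lseek(fildes, 0, SEEK_CUR);

Return value: on success, lseek() returns the resulting read/write position, that is, the number of bytes from the beginning of the file. On error it returns -1 and errno holds the error code.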
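Because lseek() returns the resulting position, seeking to offset 0 from SEEK_END is a common way to get a file's size. A minimal sketch follows; file_size is a made-up helper name.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Return the size of the file in bytes, or -1 on error.
off_t file_size(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);   // offset 0 from the end = file size
    close(fd);
    return end;
}
```

This is the file-descriptor counterpart of the seekg(0, std::ios::end) and tellg() pair used in get_bytes.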

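Related functions: readdir, write, fcntl, close, lseek, readlink, fread

Header file: #include <unistd.h>

Prototype: ssize_t read(int fd, void * buf, size_t count);

Description: read() transfers up to count bytes from the file referred to by fd into the memory pointed to by buf. If count is 0, read() has no effect and returns 0. The return value is the number of bytes actually read; a return value of 0 means end of file was reached or there is no data to read. The file position advances by the number of bytes read.

Additional notes:

On success read() returns the number of bytes actually read. It is best to compare the return value with count: if fewer bytes were returned than requested, the read may have reached end of file, may be reading from a pipe or a terminal, or may have been interrupted by a signal.

When an error occurs, read() returns -1, the error code is stored in errno, and the file position is unspecified.

Error codes:

EINTR: the call was interrupted by a signal.

EAGAIN: non-blocking I/O (O_NONBLOCK) is in use and no data is available to read.

EBADF: fd is not a valid file descriptor, or the file has already been closed.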
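The advice to compare the return value with count usually turns into a small retry loop. The sketch below shows one way to handle short reads and EINTR; read_full is a hypothetical helper name, not part of the fuzzer.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Keep calling read() until count bytes have arrived, EOF is reached, or a
// real error occurs. Returns the number of bytes read, or -1 on error.
ssize_t read_full(int fd, void* buf, std::size_t count)
{
    char* out = static_cast<char*>(buf);
    std::size_t total = 0;

    while (total < count)
    {
        ssize_t n = read(fd, out + total, count - total);
        if (n == 0)
            break;                 // end of file
        if (n < 0)
        {
            if (errno == EINTR)
                continue;          // interrupted by a signal: retry
            return -1;             // EBADF, EAGAIN, ...: report to caller
        }
        total += static_cast<std::size_t>(n);
    }
    return static_cast<ssize_t>(total);
}
```

A return value smaller than count then means end of file rather than a short read that happened to stop early.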

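Fuzzer development

Prerequisites

How does a C function return a string?

Four ways to return a string from a function in C

Casting to int

(type_name) expression

Creating a dynamically sized array

Implementing dynamic arrays in C to overcome the fixed size of static arrays (C语言中文网)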

Waitpid