Wireshark - Reassembling TCP streams with large XML in payload (multiple TCP packets)

asked 2020-07-07 12:18:24 +0000

ElrondMcBong

updated 2020-07-07 12:26:38 +0000

Hello community,

I am trying to write a custom dissector for the following purpose:

There are multiple TCP streams in the network that I want to analyse. Most of the traffic consists of TCP packets with a plain XML payload. Some of the XML documents are small enough to fit into a single TCP packet. Those can be dissected by Wireshark's TCP/XML dissectors, which works fine.

But I also have XML documents that are so long that they span multiple TCP packets. These packets have to be reassembled first to recover the whole XML. That is the main goal: I want the complete XML as output in a header field.

I started from the C-string example in the file reassemble.readme of the Wireshark sources and now have a more or less working dissector. It worked fine in my test scenarios, but when I open captured pcap files, Wireshark closes without an error message or any other hint. I have no clue why this happens.

Here is the code of the dissector.

#include "config.h"
#include "string.h"
#include <epan/packet.h>
#include <epan/tap.h>
#include <epan/prefs.h>
#include <epan/dissectors/packet-tcp.h>
#include <epan/tvbparse.h>
#include <epan/reassemble.h>
//#include "packet-tcp.h"

#define SOME_PORT 1337
#define debug TRUE

static const char *nicerange;
static int proto_rst_xml = -1;

static int tvb_old_length = 0;
static char *xml_header;
static int old_frame_number = -1;

// Definitions of Header-Fields (hf) for dissection 
static int hf_xml_end = -1;
static int hf_cstring = -1;
static gboolean search_for_xml_header = TRUE;


static int
dissect_rst_xml(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree _U_, void *data _U_)
{
    static char search_end[] = ""; 
    guint offset = 0;
    char search_begin[] = "<?xml";

    gboolean xml_finished = FALSE;
    const guint8 *tvb_string;

    #ifdef debug
    g_print("\n\nFRAME: %i\n", pinfo->fd->num);
    #endif

    char *search_end_ptr;
    tvb_string = tvb_get_string_enc(wmem_packet_scope(), tvb, 0, tvb_reported_length(tvb), ENC_UTF_8);



    if (strstr(tvb_string, search_begin) == NULL)
    {
        #ifdef debug
        g_print("%i no xmlbegin: \n", pinfo->fd->num);
        #endif
        tvb_old_length = 0;
        return tvb_captured_length(tvb);
    }

    #ifdef debug
    g_print("%i is xml: \n", pinfo->fd->num);
    g_print("search for xml header %i \n",search_for_xml_header);
    #endif

    if (search_for_xml_header==TRUE && old_frame_number != pinfo->fd->num){

        search_for_xml_header = FALSE;
        char tvb_string_copy[20000];
        memset(tvb_string_copy,0,strlen(tvb_string_copy));
        strcpy(tvb_string_copy,tvb_string);

        memset(search_end,0,strlen(search_end));
        strcpy(search_end,"</");
        char delimiter[] = "<>";

        // get first token
        search_end_ptr = strtok(tvb_string_copy, delimiter);
        guint i=0;

        //Opening XML-Tag is in the second token
        while((search_end_ptr != NULL) && (i<1)){    
            g_print("%s \n", search_end_ptr);
            search_end_ptr = strtok(NULL,delimiter);

            if(i == 0){

                strcat(search_end,search_end_ptr);
                strtok(search_end, " ");
                g_print("XML-ENDE-TAG: %s \n", search_end);
            }
            i++;
        }


        g_print("XML-END-TAG: %s \n", search_end);

        #ifdef debug
        g_print("copy of tvb_string: %s \n",tvb_string_copy);
        #endif
    }

    old_frame_number = pinfo->fd->num;


    while (offset < tvb_reported_length(tvb))
        {
            gint available = tvb_reported_length_remaining(tvb, offset);
            gint len = tvb_strnlen(tvb, offset, available);

            tvb_string = tvb_get_string_enc(wmem_packet_scope(), tvb, 0, available, ENC_UTF_8);

            //g_print(tvb_string);
            if (strstr(tvb_string, search_end) != NULL)
            {
                #ifdef debug
                g_print("xml finnsihed tag found %s \n", search_end);
                #endif
                xml_finished = TRUE;
            }


            if(xml_finished == FALSE)
            {
                /* we ran out of data: ask for more */
                pinfo->desegment_offset = offset;
                pinfo->desegment_len = DESEGMENT_ONE_MORE_SEGMENT;

                #ifdef debug
                g_print("Waypoint: xml not finnished\n");
                #endif

                return (offset + available);
            }

            col_set_str(pinfo->cinfo, COL_INFO, "RSTXML String ...

Comments

Edit1:

I found one issue:

On the dissector's second run, strcpy copied the entire tvb_string into tvb_string_copy, which overflowed the buffer.

I now use strncpy to limit the length of the copy.

Seems to work now.
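
One caveat with strncpy is that it does not write a terminating NUL when the source is at least as long as the limit, so later strlen or strtok calls can still run off the end of the buffer. A minimal sketch of a copy that always terminates is shown here; the helper name copy_bounded is hypothetical and not part of the dissector above. GLib's g_strlcpy offers similar semantics and may be the more natural choice inside Wireshark code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes) and
 * always NUL-terminate, truncating if src does not fit.
 * Returns the number of bytes copied, excluding the terminator. */
static size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;               /* no room even for the terminator */

    n = strlen(src);
    if (n >= dst_size)
        n = dst_size - 1;       /* leave one byte for the terminator */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a destination such as char tvb_string_copy[20000], the call would be copy_bounded(tvb_string_copy, sizeof tvb_string_copy, tvb_string).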

However, the dissector now only works properly if I split the different TCP streams into separate files and then open each file on its own.

Any advice on this behaviour?

If you have any other advice, let me know!
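
One possible explanation for the stream-separation behaviour: search_end, search_for_xml_header and old_frame_number are file-scope statics, so every TCP stream in the capture shares them, and packets from interleaved streams can overwrite each other's closing tag. The sketch below keeps one state record per stream instead. The table, its size limits and the integer stream_id key are all assumptions made for illustration; inside Wireshark the usual home for such state is the conversation API (find_or_create_conversation and conversation_add_proto_data), which this plain C sketch does not use.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STREAMS 64    /* hypothetical limit for this sketch */
#define END_TAG_MAX 128   /* hypothetical limit for this sketch */

/* Per-stream parser state, replacing the shared file-scope statics. */
typedef struct {
    int      in_use;
    unsigned stream_id;
    char     end_tag[END_TAG_MAX];
} stream_state_t;

static stream_state_t streams[MAX_STREAMS];

/* Find the state for stream_id, creating it on first use.
 * Returns NULL when the table is full. */
static stream_state_t *get_stream_state(unsigned stream_id)
{
    stream_state_t *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_STREAMS; i++) {
        if (streams[i].in_use && streams[i].stream_id == stream_id)
            return &streams[i];
        if (!streams[i].in_use && free_slot == NULL)
            free_slot = &streams[i];
    }
    if (free_slot != NULL) {
        free_slot->in_use = 1;
        free_slot->stream_id = stream_id;
        free_slot->end_tag[0] = '\0';
    }
    return free_slot;
}

/* Remember the closing tag for one stream only.
 * Returns 1 on success, 0 if the tag does not fit or the table is full. */
static int set_end_tag(unsigned stream_id, const char *tag)
{
    stream_state_t *st = get_stream_state(stream_id);

    if (st == NULL || strlen(tag) >= END_TAG_MAX)
        return 0;
    strcpy(st->end_tag, tag);   /* safe: length checked above */
    return 1;
}
```

With state keyed this way, a packet from one stream can no longer change the closing tag that another stream is still waiting for.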

Edit 2: There was another memory issue, caused by the line

static char search_end[] = "";

I ended up giving the char array an explicit size:

static char search_end[2000] = "";

This should be more than long enough to hold the closing tag of the XML.

It seems to work flawlessly now.
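
A fixed 2000-byte array removes the immediate overflow, but strcat into it is still unchecked. As a hedged alternative, the closing tag can be built with an explicit length check and without strtok, which modifies its input and keeps hidden global state. The function name build_closing_tag is hypothetical; like the original code it produces "</name" without the trailing ">", and it does not handle "<" characters inside XML comments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first element after an optional
 * <?xml ...?> prolog and write "</name" into out.
 * Returns 1 on success, 0 if no complete element name is found
 * or out is too small. */
static int build_closing_tag(const char *xml, char *out, size_t out_size)
{
    const char *p = xml;
    size_t len;

    /* Skip declarations (<?...), doctype/comment openers (<!...)
     * and closing tags (</...) until a real opening tag is found. */
    for (;;) {
        p = strchr(p, '<');
        if (p == NULL)
            return 0;
        if (p[1] != '?' && p[1] != '!' && p[1] != '/')
            break;
        p++;
    }
    p++;                                  /* first character of the name */

    len = strcspn(p, " \t\r\n/>");        /* name ends at space, '/' or '>' */
    if (len == 0 || p[len] == '\0')
        return 0;                         /* empty, or cut off mid-name */
    if (len + 3 > out_size)
        return 0;                         /* needs "</" + name + NUL */

    out[0] = '<';
    out[1] = '/';
    memcpy(out + 2, p, len);
    out[2 + len] = '\0';
    return 1;
}
```

Because the function refuses to write when the result does not fit, the size of search_end becomes a tuning choice rather than a safety margin.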

ElrondMcBong ( 2020-07-07 15:14:57 +0000 )