
Can CP1252 encoding support be added to the TCP Follow Stream window?

asked 2019-10-01 21:04:48 +0000

Dernyn

updated 2019-10-17 18:04:47 +0000

Can CP1252 encoding support be added to the Follow Stream window? If so, it would better support my new font, which displays bytes in single-byte (ASCII-8) character form using CP1252. More info at https://github.com/dernyn/256

Please understand what CP1252 encoding is before commenting on or denying my request.


1 Answer


answered 2019-10-01 22:38:35 +0000

Guy Harris

If you mean "interpret the characters in the byte stream as CP1252 characters, so that bytes with the 8th bit set are interpreted as being in CP1252", that could probably be added. File an enhancement request on the Wireshark Bugzilla.

Wireshark will, however, continue to use UTF-8 internally, with Qt using, I think, UTF-16 inside its strings. It's not ever going to be using CP1252 internally. If you want to use your font, it'd better be a font that Qt can use.
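As a rough illustration of the interpretation described in this answer (CP1252 bytes in, UTF-8 out), here is a minimal standard-C++ sketch. It is not Wireshark or Qt code. The names `kCp1252High`, `appendUtf8`, and `cp1252ToUtf8` are hypothetical, and mapping the five bytes CP1252 leaves undefined to C1 control code points is an assumption about the desired fallback.

```cpp
#include <string>

// Unicode code points for CP1252 bytes 0x80..0x9F. The five bytes that
// CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the
// C1 control code point of the same value (an assumed fallback).
static const char32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Append one code point as UTF-8. CP1252 only maps into the BMP, so one
// to three bytes are enough here.
static void appendUtf8(std::string &out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Interpret every input byte as one CP1252 character and return UTF-8.
std::string cp1252ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        // 0x00..0x7F is ASCII and 0xA0..0xFF matches ISO-8859-1, so only
        // 0x80..0x9F needs the lookup table.
        const char32_t cp = (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80]
                                                     : static_cast<char32_t>(b);
        appendUtf8(out, cp);
    }
    return out;
}
```

For example, `cp1252ToUtf8("caf\xE9")` yields the UTF-8 bytes `63 61 66 C3 A9`, and the single byte `0x80` becomes the three-byte euro sign `E2 82 AC`. Inside a Qt application the equivalent step would normally go through Qt's own codec support rather than a hand-written table.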


Comments

If you followed the link, you will see that Qt is the main example in my font preview!

Qt supports CP1252 just fine. It's all about an encoding API call.

Dernyn ( 2019-10-16 21:17:43 +0000 )

As it's apparently so simple, please submit your change as detailed on the Wiki page "Submitting patches".

grahamb ( 2019-10-17 10:08:43 +0000 )

if you followed the link,

I DID follow the link. The only thing it shows about Qt is a display inside Qt Creator. None of that indicates what you're asking for.

Your text first speaks of non-Unicode encodings:

This Font works best with character Region English - Western/Latin1 encoding providing single-byte character encoding with CP1252/Windows-1252/IBM819/CP819/iso-ir-100/csISO-Latin1/ibm-5348 and ISO-8859-1 depending on Text editor support.

This Font does not support UTF-8 encoding, Inherently Fails to fully encode in UTF-8 due to it's native lack of single-byte character support by the standard, which is limited to ASCII-7 (128).

but then says it's a Unicode-encoded font:

It is an ISO 10646-1(Unicode,BMP) encoded True-Type Font(TTF)

If it's a Unicode-encoded font, so that any software that handles Unicode can use it to display, at least, the subset of Unicode that includes the characters in ...(more)

Guy Harris ( 2019-10-17 16:26:30 +0000 )

@grahamb thanks for all your help. It seems I have angered you with my comments. I did open the enhancement request as indicated. I may just make the changes in the code myself and provide a patch, but I was hoping to avoid that since it's a new feature. I don't understand the pushback or the shaming for my suggestion that it can't be that hard, as if what I am requesting were impossible to do. It's a native API call in the Qt SDK; that it can't be that hard is all I'm indicating.

Qt Creator is no different from Qt as a GTK+ programming interface, with support for all these different encodings within itself. What I'm hinting at is that if it works in Qt Creator, it works in any Qt API call by other apps. Why does it feel ...(more)

Dernyn ( 2019-10-17 17:32:42 +0000 )

@Dernyn, no problem at all, but as you seem to be the only person requesting this change, it's unlikely to happen unless you step up to the plate and submit the required change. This isn't being awkward or pushing back, just stating reality.

I didn't see any mention in the question or comments that an item has been raised on Bugzilla.

The Wireshark devs are generally welcoming and grateful for all changes submitted and requests made, but as we are all volunteers (except Gerald) we don't have much spare time for investigating and implementing what seems to be an esoteric request.

grahamb ( 2019-10-17 17:51:56 +0000 )

I did Open the enhancement request as indicated.

Presumably that's bug 16137.

Guy Harris ( 2019-10-17 18:00:34 +0000 )

So, just because a font is Unicode capable, meaning it can allocate mappings in the Unicode regions (ISO 10646-1 (Unicode, BMP)), it does not mean it covers UTF-8. UTF-8 is not the de facto Unicode standard or mapping. CP1252/ISO-8859-1 is covered/supported in the Unicode mappings and that's why my font works the way it does.

If you tried to use my font in a UTF-8 encoding, it would not render correctly because it is not a UTF-8/16 font, although it is a Unicode font. Unicode does not just mean UTF-8 support; that's just a subset of the available Unicode mapping.

I also can't have UTF-8 and CP1252/ISO-8859-1 in one font; they are two different subset encodings of Unicode, supporting different sections of the Unicode map.

Unicode assigns "a unique number for every character, no matter what platform, device, application or language."

UTF-8 and UTF-16 are ...(more)

Guy Harris ( 2019-10-17 18:48:55 +0000 )
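The point in the comment above is that Unicode assigns one number per character, while UTF-8 and UTF-16 are different ways of writing that number as bytes or code units. The standard-C++ sketch below illustrates this. The function names `encodeUtf8` and `encodeUtf16` are hypothetical, and the code assumes a valid code point (no surrogate or range checks).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid scalar value; no validation is done.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode the same code point as UTF-16 (1 or 2 sixteen-bit code units).
std::vector<std::uint16_t> encodeUtf16(char32_t cp)
{
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Code points above the BMP are split into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    return out;
}
```

U+00E9 (é) is the single byte `E9` in CP1252 and ISO-8859-1, the two bytes `C3 A9` in UTF-8, and the single code unit `00E9` in UTF-16. It is the same code point in every case, so a font indexed by Unicode code point does not depend on which of these encodings the text arrived in.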

This is how Qt Creator does it.

QString utf16LineTextInUtf8Buffer(const QByteArray &utf8Buffer, int currentUtf8Offset)
{
    const int lineStartUtf8Offset = currentUtf8Offset
                                        ? (utf8Buffer.lastIndexOf('\n', currentUtf8Offset - 1) + 1)
                                        : 0;
    const int lineEndUtf8Offset = utf8Buffer.indexOf('\n', currentUtf8Offset);
    return QString::fromUtf8(
        utf8Buffer.mid(lineStartUtf8Offset, lineEndUtf8Offset - lineStartUtf8Offset));
}

static bool isByteOfMultiByteCodePoint(unsigned char byte)
{
    return byte & 0x80; // Check if most significant bit is set
}

bool utf8AdvanceCodePoint(const char *&current)
{
    if (Q_UNLIKELY(*current == '\0'))
        return false;

    // Process multi-byte UTF-8 code point (non-latin1)
    if (Q_UNLIKELY(isByteOfMultiByteCodePoint(*current))) {
        unsigned trailingBytesCurrentCodePoint = 1;
        for (unsigned char c = (*current) << 2; isByteOfMultiByteCodePoint(c); c <<= 1)
            ++trailingBytesCurrentCodePoint;
        current += trailingBytesCurrentCodePoint + 1;

    // Process single-byte UTF-8 code point (latin1)
    } else {
        ++current;
    }

    return true;
}

Dernyn ( 2019-10-17 18:58:25 +0000 )
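The Qt Creator helpers quoted above assume the buffer already holds UTF-8. They do not decode CP1252. The sketch below is a simplified, bounds-checked structural check showing why a CP1252 byte stream cannot simply be passed through UTF-8 code paths. The names `utf8SequenceLength` and `looksLikeUtf8` are hypothetical, and the check deliberately ignores overlong forms and surrogates, so it is not a full validator.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence introduced by lead byte b, or 0 if b cannot
// start a sequence (for example a stray continuation byte 10xxxxxx).
static std::size_t utf8SequenceLength(unsigned char b)
{
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Structural check only: every lead byte must be followed by the right
// number of continuation bytes. Overlong forms and surrogates are NOT
// rejected (simplification for this sketch).
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t len =
            utf8SequenceLength(static_cast<unsigned char>(bytes[i]));
        if (len == 0 || i + len > bytes.size())
            return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)  // continuation bytes are 10xxxxxx
                return false;
        }
        i += len;
    }
    return true;
}
```

With this check, `looksLikeUtf8("caf\xC3\xA9")` is true, while the CP1252 form `"caf\xE9"` is rejected because `0xE9` looks like the lead byte of a three-byte sequence that never arrives. This is why the display would need an explicit CP1252 option instead of relying on UTF-8 handling.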

