Skip to content

Corrections to base64 encoding function #1804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Mar 28, 2019
Merged

Conversation

apjarvis
Copy link
Contributor

As noted by Manuel there were deficiencies in the base64 encode routine which I believe this change fixes.

Added the ability to decode 'plist' data structures.

Additional code is in C (not C++) so really needs re-working
but only if of wider interest.
'Edit display format'. Compiles but gets stuck when 'plist'
is selcted, however selecting a different format clears the
issue.
both encode and decode of base64 is implemented only the decoding is
currently exported.
@mgrojo
Copy link
Member

mgrojo commented Mar 15, 2019

Hi, @apjarvis

It's working much better, but I'm still getting incorrect results for some values. I use the following query to catch them:

SELECT weekdays,toBase64(weekdays) FROM demodata WHERE unbase64(tobase64(weekdays)) <> weekdays;

Values having problems are:

  • Three letter weekdays in English: Mon, Tue... All of them have an invalid character at the end of the result (7F)
weekdays toBase64(weekdays)
Wed V2Vkg�
Thu VGh1g�
  • TEXT storing numbers in this format: 7791404.0091921333
  • Some long texts, but I couldn't find a pattern.

Values working right:

  • Dates in the format "YYYY-MM-DD"
  • Text storing "True" or "False".
  • INTEGER values

Hope this helps you to fix the remaining issues.

@apjarvis
Copy link
Contributor Author

apjarvis commented Mar 16, 2019 via email

@mgrojo
Copy link
Member

mgrojo commented Mar 16, 2019

You can try with the following sample data:

CREATE TABLE "demodata" (
	"clientid"	INTEGER,
	"date"	TEXT DEFAULT CURRENT_DATE,
	"weekdays"	TEXT,
	"gains"	TEXT,
	"prices"	TEXT,
	"up"	TEXT,
	"bindata"	BLOB
);
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('0', '2008-04-30', 'Wed', '-0.52458192906686452', '7791404.0091921333', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('1', '2008-05-01', 'Thu', '0.076191536201738269', '3167180.7366340165', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('2', '2008-05-02', 'Fri', '-0.86850970062880861', '9589766.9613829032', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('3', '2008-05-03', 'Sat', '-0.42701083852713395', '8949415.1867596991', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('4', '2008-05-04', 'Sun', '0.2532553652693274', '937163.44375252665', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('5', '2008-05-05', 'Mon', '-0.68151636911081892', '949579.88022264629', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('6', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('7', '2008-05-07', 'Wed', '0.67449747200412147', '7517014.782897247', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('8', '2008-05-08', 'Thu', '-1.1841008656818983', '1920959.5423492221', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('9', '2008-05-09', 'Fri', '-1.5803692595811152', '8456240.6198725495', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('10', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('11', '2008-05-07', 'Wed', '-0.86850970062880861', '9589766.9613829032', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('12', '2008-05-03', 'Sat', '-0.42701083852713395', '8949415.1867596991', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('13', '2008-05-05', 'Mon', '-0.68151636911081892', '949579.88022264629', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('14', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('15', '2008-05-07', 'Wed', '0.67449747200412147', '7517014.782897247', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('16', '2008-05-08', 'Thu', '-1.1841008656818983', '1920959.5423492221', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('17', '2008-05-09', 'Fri', '-1.5803692595811152', '8456240.6198725495', 'False');

In bindata, insert whatever file or data you want.

@justinclift
Copy link
Member

What's the state of this? Still being worked on, or ready to be merged, or ? 😄

@mgrojo
Copy link
Member

mgrojo commented Mar 24, 2019

According to my quick tests, it seems to work a lot better. I only had problems when converting strings containing non-US-ASCII characters, like: 'ñ¡á', etc. But I don't know why it isn't working, because at the plain eye, the result is correct. For example:

SELECT '¡Hola!',toBase64('¡Hola!'), unbase64(tobase64('¡Hola!')), '¡Hola!' <> unbase64(tobase64('¡Hola!'));

Returns:

'¡Hola!' toBase64('¡Hola!') unbase64(tobase64('¡Hola!')) '¡Hola!' <> unbase64(tobase64('¡Hola!'))
¡Hola! wqFIb2xhIQ== ¡Hola! 1

While

SELECT ' Hola!',toBase64(' Hola!'), unbase64(tobase64(' Hola!')), ' Hola!' <> unbase64(tobase64(' Hola!'));

returns:

' Hola!' toBase64(' Hola!') unbase64(tobase64(' Hola!')) ' Hola!' <> unbase64(tobase64(' Hola!'))
Hola! IEhvbGEh Hola! 0

Note that '¡' is replaced by ' ' and that the unequal test is true in the first case (not good) and false (good) in the second.

@apjarvis
Copy link
Contributor Author

The base64 decoder has a simple test to see if the decoded object is ASCII. If it is then it returns a 'TEXT' item otherwise it returns a 'BLOB'. For symmetry the encoder handles just 'BLOB's and 'TEXT's (8 bit). I can certainly add encoding 'TEXT16' however then I need a reliable way for the decoder to decide that something is valid 'TEXT16'. That is something I have not found how to do yet. I can certainly encode 'TEXT16' but currently it will be decoded into a 'BLOB'. I am happy to add that if wanted.

@mgrojo
Copy link
Member

mgrojo commented Mar 25, 2019

What I see is that UTF-8 is decoded as a BLOB, and that's why SQLite says the results are different, while the actual encoding is the same. I see this using the quote function:

SELECT quote(unbase64(tobase64('¡Hola!')))
returns
X'C2A1486F6C6121'

Compare with
SELECT quote('¡Hola!')
that returns
'¡Hola!'

I suppose TEXT16 is used for UTF-16 and not for UTF-8 as is the case for my DB, so it wouldn't change my case. In fact, if I don't understand wrongly, SQLite3 would convert the data between both encodings according to https://www.sqlite.org/version3.html so processing TEXT16 wouldn't change anything, I think. The only improvement might be to detect correct UTF-8 and return it as TEXT.

@apjarvis
Copy link
Contributor Author

After decoding base64 the result is checked against the UTF8 format, i.e. using the header bits to check how many bytes are in each character and verifying that the extension bytes also have the correct header bits. If all characters match the required format then the file is assumed to be UTF8 and returned as 'TEXT' otherwise a 'BLOB' is returned.
Previously the test for returning 'TEXT' was all characters were printable ASCII or white space so there is a slight change here as a file full of control characters (e.g. nul bytes) will now be encoded as TEXT rather than a BLOB. Apart from a file containing all zero bytes I think that a file consisting of only control characters is unlikely to occur.

@mgrojo mgrojo merged commit c44286b into sqlitebrowser:master Mar 28, 2019
@mgrojo
Copy link
Member

mgrojo commented Mar 28, 2019

Thanks @apjarvis. Everything seems to work now so I've merged your work again.

@justinclift
Copy link
Member

Whooo!

Should we add this to our win and macOS builds?

@Z4us
Copy link

Z4us commented Mar 29, 2019 via email

@justinclift
Copy link
Member

Good point. I'll have a go at add it to our win and macOS builds, and see how it turns out... 😄

@justinclift
Copy link
Member

justinclift commented Mar 29, 2019

k, todays macOS nightly builds have just been recreated, this time including the new "formats" extension in the .app "Extensions" directory:

Anyone around on macOS able to try them out?

I'll have a go at building on win next...

@justinclift
Copy link
Member

justinclift commented Mar 29, 2019

Building it is failing with MSVC 2015:

C:\dev\SQLite-Win32>cl /MD extension-formats.c -link -dll -def:extension-formats.def -out:formats.dll
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

extension-formats.c
extension-formats.c(772): error C2072: '_errno': initialization of a function
extension-formats.c(801): error C2072: '_errno': initialization of a function
extension-formats.c(914): error C2072: '_errno': initialization of a function

@apjarvis Any ideas on fixing that? 😄

@justinclift
Copy link
Member

Follow up note - this was fixed by @apjarvis in #1827. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants