Corrections to base64 encoding function #1804
Added the ability to decode 'plist' data structures. Additional code is in C (not C++) so really needs re-working but only if of wider interest.
'Edit display format'. Compiles but gets stuck when 'plist' is selected; however, selecting a different format clears the issue.
item to be parsed.
…ossible memory leak.
both encode and decode of base64 are implemented; only the decoding is currently exported.
a text object otherwise create a blob.
Updating with master branch
Hi @apjarvis. It's working much better, but I'm still getting incorrect results for some values. I use the following query to catch them:
SELECT weekdays,toBase64(weekdays) FROM demodata WHERE unbase64(tobase64(weekdays)) <> weekdays;
Values having problems are:
Values working right:
Hope this helps you to fix the remaining issues. |
Hi Manuel,
I put the three letter week days through my test and got the following:
'Mon': n = 4 - TW9u, m = 3, d = Mon
'Tue': n = 4 - VHVl, m = 3, d = Tue
'Wed': n = 4 - V2Vk, m = 3, d = Wed
'Thu': n = 4 - VGh1, m = 3, d = Thu
'Fri': n = 4 - RnJp, m = 3, d = Fri
'Sat': n = 4 - U2F0, m = 3, d = Sat
'Sun': n = 4 - U3Vu, m = 3, d = Sun
Format is: Input, length of encoded data, encoded data, length of decoded
data, decoded data.
What I notice is that your encoding for Wed has extra letters (V2Vkg rather
than V2Vk) which would imply that I am not getting the data correctly.
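For reference, here is a minimal sketch of a standard base64 encoder (RFC 4648 alphabet with `=` padding) that can be used to cross-check the values listed above. It is not the code from this pull request; the names `b64_encode` and `b64_encoded_len` are hypothetical and chosen only for illustration. The length is passed in explicitly, so the encoder never reads past the end of the input.

```c
#include <stddef.h>
#include <stdlib.h>

/* Standard base64 alphabet (RFC 4648). */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encoded length without the terminating NUL:
 * 4 output characters for every 3 input bytes, rounded up. */
static size_t b64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encode exactly n bytes from src. Returns a malloc'd, NUL-terminated
 * string that the caller must free, or NULL if allocation fails. */
static char *b64_encode(const unsigned char *src, size_t n)
{
    char *out = malloc(b64_encoded_len(n) + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    /* Full 3-byte groups map to 4 characters each. */
    for (i = 0; i + 2 < n; i += 3) {
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) |
                          (unsigned long)src[i + 2];
        *p++ = B64_ALPHABET[(v >> 18) & 0x3F];
        *p++ = B64_ALPHABET[(v >> 12) & 0x3F];
        *p++ = B64_ALPHABET[(v >> 6) & 0x3F];
        *p++ = B64_ALPHABET[v & 0x3F];
    }

    /* One or two trailing bytes: pad the group with '='. */
    if (i < n) {
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)src[i + 1] << 8;
        *p++ = B64_ALPHABET[(v >> 18) & 0x3F];
        *p++ = B64_ALPHABET[(v >> 12) & 0x3F];
        *p++ = (i + 1 < n) ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        *p++ = '=';
    }

    *p = '\0';
    return out;
}
```

With a 3-byte input such as 'Wed' there is exactly one full group and no padding, which is why every weekday above encodes to 4 characters.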
Currently I use sqlite3_value_text to get the data if it is either 'Text'
or 'Blob' (every other type is ignored) and sqlite3_value_bytes to get the
number of bytes of data - which I assume excludes any text delimiter.
I am not familiar with SQL at all so would you send me the commands I need
to enter to create a database with a sample of each data type in it. That
way if I view it in Hex I might see what I'm doing wrong.
Cheers
Paul
|
You can try with the following sample data:
CREATE TABLE "demodata" (
"clientid" INTEGER,
"date" TEXT DEFAULT CURRENT_DATE,
"weekdays" TEXT,
"gains" TEXT,
"prices" TEXT,
"up" TEXT,
"bindata" BLOB
);
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('0', '2008-04-30', 'Wed', '-0.52458192906686452', '7791404.0091921333', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('1', '2008-05-01', 'Thu', '0.076191536201738269', '3167180.7366340165', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('2', '2008-05-02', 'Fri', '-0.86850970062880861', '9589766.9613829032', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('3', '2008-05-03', 'Sat', '-0.42701083852713395', '8949415.1867596991', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('4', '2008-05-04', 'Sun', '0.2532553652693274', '937163.44375252665', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('5', '2008-05-05', 'Mon', '-0.68151636911081892', '949579.88022264629', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('6', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('7', '2008-05-07', 'Wed', '0.67449747200412147', '7517014.782897247', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('8', '2008-05-08', 'Thu', '-1.1841008656818983', '1920959.5423492221', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('9', '2008-05-09', 'Fri', '-1.5803692595811152', '8456240.6198725495', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('10', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('11', '2008-05-07', 'Wed', '-0.86850970062880861', '9589766.9613829032', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('12', '2008-05-03', 'Sat', '-0.42701083852713395', '8949415.1867596991', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('13', '2008-05-05', 'Mon', '-0.68151636911081892', '949579.88022264629', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('14', '2008-05-06', 'Tue', '0.0071911579626532168', '7268426.906552773', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('15', '2008-05-07', 'Wed', '0.67449747200412147', '7517014.782897247', 'True');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('16', '2008-05-08', 'Thu', '-1.1841008656818983', '1920959.5423492221', 'False');
INSERT INTO "main"."demodata" ("clientid", "date", "weekdays", "gains", "prices", "up") VALUES ('17', '2008-05-09', 'Fri', '-1.5803692595811152', '8456240.6198725495', 'False');
In bindata, insert whatever file or data you want. |
What's the state of this? Still being worked on, or ready to be merged, or ? 😄 |
According to my quick tests, it seems to work a lot better. I only had problems when converting strings containing non-US-ASCII characters, such as 'ñ¡á'. But I don't know why it isn't working, because to the naked eye the result is correct. For example:
Returns:
While
returns:
Note that '¡' is replaced by ' ' and that the inequality test is true in the first case (not good) and false (good) in the second. |
The base64 decoder has a simple test to see if the decoded object is ASCII. If it is, it returns a 'TEXT' item; otherwise it returns a 'BLOB'. For symmetry, the encoder handles just 'BLOB's and 'TEXT's (8 bit). I can certainly add encoding of 'TEXT16', but then I need a reliable way for the decoder to decide that something is valid 'TEXT16', and I have not yet found how to do that. For now, an encoded 'TEXT16' value would be decoded into a 'BLOB'. I am happy to add that if wanted. |
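To illustrate the kind of check described above, here is a minimal sketch of a 7-bit ASCII test. This is an assumption about how such a test might look, not the extension's actual code, and the name `is_ascii` is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if every byte in buf has its high bit clear (7-bit ASCII),
 * otherwise 0. An empty buffer counts as ASCII. */
static int is_ascii(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

A check like this rejects any UTF-8 string containing characters outside US-ASCII, because those characters are encoded with bytes whose high bit is set. That would be consistent with 'ñ¡á' coming back as a 'BLOB'.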
What I see is that UTF-8 is decoded as a BLOB, and that's why SQLite says the results are different, while the actual encoding is the same. I see this using the quote function:
Compare with:
I suppose TEXT16 is used for UTF-16 and not for UTF-8, which is what my DB uses, so it wouldn't change my case. In fact, if I understand correctly, SQLite3 converts the data between both encodings according to https://www.sqlite.org/version3.html, so processing TEXT16 wouldn't change anything, I think. The only improvement might be to detect valid UTF-8 and return it as TEXT. |
After decoding base64, the result is checked against the UTF-8 format, i.e. using the header bits to determine how many bytes are in each character and verifying that the extension bytes also have the correct header bits. If all characters match the required format, the data is assumed to be UTF-8 and returned as 'TEXT'; otherwise a 'BLOB' is returned. |
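The check described above can be sketched roughly as follows. This is only an illustration of the header-bit approach, not the code that was merged; the name `looks_like_utf8` is hypothetical. It checks structure only, so it assumes that overlong encodings and surrogate code points do not need to be rejected.

```c
#include <stddef.h>

/* Returns 1 if buf is structurally valid UTF-8: each lead byte announces
 * 0 to 3 extension bytes, and every extension byte has the form 10xxxxxx.
 * Returns 0 otherwise. Overlong forms and surrogates are NOT rejected. */
static int looks_like_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = buf[i];
        size_t extra;
        size_t k;

        if ((c & 0x80) == 0x00)
            extra = 0;              /* 0xxxxxxx: single byte */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;              /* 110xxxxx: two-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;              /* 1110xxxx: three-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;              /* 11110xxx: four-byte sequence */
        else
            return 0;               /* stray extension byte or invalid lead */

        if (extra > n - i - 1)
            return 0;               /* sequence is cut off at end of data */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;           /* extension byte lacks the 10 header */
        }
        i += extra + 1;
    }
    return 1;
}
```

Under this test the UTF-8 bytes of 'ñ¡á' pass and would be returned as 'TEXT', while the same characters in a single-byte encoding such as Latin-1 fail and would be returned as a 'BLOB'.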
Thanks @apjarvis. Everything seems to work now so I've merged your work again. |
Whooo! Should we add this to our win and macOS builds? |
Why not?
|
Good point. I'll have a go at adding it to our win and macOS builds, and see how it turns out... 😄 |
OK, today's macOS nightly builds have just been recreated, this time including the new "formats" extension in the .app "Extensions" directory:
Anyone around on macOS able to try them out? I'll have a go at building on win next... |
Building it is failing with MSVC 2015:
@apjarvis Any ideas on fixing that? 😄 |
As noted by Manuel, there were deficiencies in the base64 encode routine, which I believe this change fixes.