Weird Symbols in Old Posts

Knyght

The Collector
#1
This actually happened back during TFF's first move, but it just came to mind that older posts have certain characters turned into random symbols. Taking a quick look at some Preview threads, there are old posts where every apostrophe has become Æ, every opening quotation mark has become ô, and every closing quotation mark has become ö.

It never got fixed before, but I'm wondering if XenForo's software would make it more feasible.
 

NuitTombee

Immortal Capo
#2
Definitely should be possible with a find/replace.
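For illustration, here's a minimal sketch of what that find/replace could look like on a single post body. The mapping is taken straight from the symptoms in the first post; the names `MOJIBAKE_TO_INTENDED`, `repair` and `reencode` are made up for this example and aren't part of any forum software. As a side observation, the three pairs happen to line up with bytes 0x92 to 0x94 read as Windows-1252 versus code page 437, which `reencode` demonstrates; whether that is what actually happened to the database is only a guess.

```python
# Mapping taken from the symptoms described in the first post.
MOJIBAKE_TO_INTENDED = {
    "Æ": "\u2019",  # apostrophe / right single quotation mark
    "ô": "\u201c",  # opening (left) double quotation mark
    "ö": "\u201d",  # closing (right) double quotation mark
}


def repair(text: str) -> str:
    """Replace each garbled symbol with the character it probably stood for."""
    return text.translate(str.maketrans(MOJIBAKE_TO_INTENDED))


def reencode(ch: str) -> str:
    """Encode a character as Windows-1252, then read that byte as code page 437."""
    return ch.encode("cp1252").decode("cp437")


if __name__ == "__main__":
    print(repair("ôItÆs fine,ö she said."))
    for ch in "\u2019\u201c\u201d":
        print(repr(ch), "->", reencode(ch))
```

One caveat: a blind replace like this would also mangle any post that legitimately uses Æ, ô or ö (a word like "hôtel", say), so it would want restricting to posts from before the move.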
 

Lord Raa

Exporter of Juice Tins
#3
Might be worth contacting the author of the post/fic to get them to repost it. Would also have the benefit of reminding them that we're open for business.
 

Shirotsume

Not The Goddamn @dmin
#4
We already looked at this on icyboards. So the explanation of what happened:

MySQL's UTF-8 isn't actually UTF-8. Instead, they fucked it up and only store up to 3 bytes per character. This is wrong: a UTF-8 character can be up to 4 bytes. MySQL released a workaround in 2010 called 'utf8mb4', but they never told anyone about it, and even to this day most people don't know about it. It also doesn't fix what's already broken. icyboards used utf8, not utf8mb4, as did... pretty much everything.

That missing fourth byte is why things were fucked up: any character that used that fourth byte (like most 'directional' quotes that Word or TextEdit uses, many hyphens, etc.) would need to be converted from its broken 3-byte state (because MySQL discarded the fourth byte) to either its correct 4-byte state (assuming this forum doesn't have the same problem) or an 'equivalent' character of 3 bytes or fewer.
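To see which characters would actually hit the 3-byte limit, a quick check of UTF-8 byte lengths helps. This is only a sketch; `utf8_len` and `fits_in_three_bytes` are names invented for the example. Running it suggests the curly quotes themselves encode to 3 bytes while things like emoji need 4, so it may be worth verifying which characters in the affected posts really depend on the fourth byte.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character fits in a column limited to 3 bytes per character."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = {
        "ASCII letter": "A",
        "e with acute": "\u00e9",
        "curly apostrophe": "\u2019",
        "opening curly quote": "\u201c",
        "grinning face emoji": "\U0001F600",
    }
    for label, ch in samples.items():
        print(f"{label}: {utf8_len(ch)} byte(s)")
```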

We discussed doing this on icyboards; I know how to do that conversion.

It was decided, however, that it wasn't worth the risk of FUBARing the database (always a risk when doing multiple table-wide find-and-replaces), and that authors could manually fix their old stuff if they wished. The idea of having admins/mods fix the posts manually was also raised, and that was shot down because authors' works shouldn't be edited without their consent.

tl;dr: History lesson on the issue.
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#6
Whether or not XF2 supports utf8mb4 is not actually relevant.

The problem, as I understand it, is that it is too late as far as the post content goes... The old posts were already converted back when the forum was moved to icyboards (and I well remember the issue being discussed back then)... meaning the information was stripped and is now permanently gone. The switch to XF2 does not solve the issue, as it does not restore the stripped-out information in old posts; it would merely prevent future corruption/conversion.

If they can directly access the databases, they can do a general search and replace, but as was already pointed out, doing such a thing has the potential to make matters worse and corrupt the database.

While I've not worked much with forum software databases, I deal with databases a lot (I am a programmer by occupation), and I know for a fact that, unless you are very careful, it is fairly easy to cause such corruption... all it would take is mishandling one of the fields or clearing a field that should not be cleared, and you may well get some very odd access issues.
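As a rough sketch of what "being very careful" could mean in practice: count the affected rows first, run the replace inside a transaction, and roll back if the update touches a different number of rows than the dry run predicted. This uses Python's built-in `sqlite3` as a stand-in, since the forum's real database is MySQL; the `posts` table, its `body` column, and the function names are all hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (body) VALUES (?)",
        [("ItÆs here",), ("plain text",), ("DonÆt",)],
    )
    conn.commit()
    return conn


def replace_in_posts(conn: sqlite3.Connection, old: str, new: str) -> int:
    """Replace old with new in every post body, rolling back on any surprise."""
    cur = conn.cursor()
    # Dry run: how many rows contain the text we are about to replace?
    expected = cur.execute(
        "SELECT COUNT(*) FROM posts WHERE instr(body, ?) > 0", (old,)
    ).fetchone()[0]
    try:
        # The UPDATE opens a transaction that stays uncommitted until commit().
        cur.execute(
            "UPDATE posts SET body = replace(body, ?, ?) WHERE instr(body, ?) > 0",
            (old, new, old),
        )
        if cur.rowcount != expected:
            raise RuntimeError("update touched an unexpected number of rows")
        conn.commit()
    except Exception:
        conn.rollback()  # leave the table exactly as it was
        raise
    return expected


if __name__ == "__main__":
    db = make_demo_db()
    print(replace_in_posts(db, "Æ", "\u2019"), "rows changed")
```

None of this replaces taking a full backup before touching the live tables; it only narrows the window for a half-applied change.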
 

Shirotsume

Not The Goddamn @dmin
#7
That said, Wata and Ero now have direct access to the database, so making a backup isn't actually a big deal anymore. (In truth, we should probably talk about setting up encrypted off-site backups.)
 

Knyght

The Collector
#9
That reminds me; whatever happened to the TFF archive? Lost to history?
 