Question re: Software Search

This is the place to talk about anything not related to Transport Tycoon itself.

Moderator: General Forums Moderators

AntoninKyrene
Engineer
Posts: 74
Joined: 29 May 2011 17:32
Location: SW US

Question re: Software Search

Post by AntoninKyrene »

Hello All!

I need a piece of software that I can use to compare two versions of multiple files (using an A/B/X process) which are not quite the same .doc/.docx/.odt/.txt files as their names may indicate. I have consolidated a multitude of backups and recovered archives, leaving me with multiple examples of a 'document' that has 10-200 versions. Unfortunately, the 200th version is not always the most current, or the one with the most-current updates. In one case, the 7th version is considered the gospel, and everything beyond that is duplicates of older versions, clones, or cast-offs from the incremental process. And they are not necessarily identical: sometimes the only thing that changed was the metadata properties (usually the subversion numbers).

ExamDiff has been suggested. It would work, but it would be laborious. There are fewer than 100 documents, so it is doable. However, if anyone knows of any other software that might work, I would welcome the input. This is strictly for text comparison; metadata comparison is sufficiently workable from the embedded document properties.

EXAMPLE: Document 22 is 57 pages long. The 24th version is the most-current - it has four words removed in comparison to the 22nd version. The four words describe a historical event that is now considered 'non-Canon' after some changes in the historical timeline in other documents, so they were removed. The 23rd version is an incremental backup of the 22nd version that was triggered by a metadata change in the document itself (likely a migration from Word to Writer, but we're not sure yet). ExamDiff could have found the differences between 22 and 24, but not 22 and 23 - that was found in the document properties metadata.
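
The 22-versus-23 case above (same text, different metadata) can be screened for automatically. The following is only a minimal sketch, not a tested tool: it assumes the files are DOCX or ODT (both are zip archives with the body in a known XML member) or plain text, and it does not handle legacy binary .doc files. The names `extract_text` and `text_fingerprint` are made up for illustration. Two versions with the same fingerprint have the same body text, whatever their document properties say.

```python
import hashlib
import zipfile
from pathlib import Path
from xml.etree import ElementTree

# XML member that holds the body text: DOCX first, then ODT.
BODY_MEMBERS = ("word/document.xml", "content.xml")


def extract_text(path):
    """Return the raw body text of a DOCX, ODT, or plain-text file.

    The format is detected from the file contents, not the extension.
    Text nodes are concatenated as-is, so paragraph breaks are lost;
    the result is meant for fingerprinting, not for reading.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
            for member in BODY_MEMBERS:
                if member in names:
                    root = ElementTree.fromstring(archive.read(member))
                    # itertext() yields only text nodes, dropping all markup.
                    return "".join(root.itertext())
        raise ValueError(f"{path}: zip archive without a known body member")
    # Fallback: treat as plain text (binary .doc is NOT handled here).
    return path.read_text(encoding="utf-8", errors="replace")


def text_fingerprint(path):
    """Hash the body text with all whitespace removed, ignoring metadata."""
    squeezed = "".join(extract_text(path).split())
    return hashlib.sha256(squeezed.encode("utf-8")).hexdigest()
```

Removing all whitespace before hashing is a deliberate trade-off: it keeps the fingerprint stable when a Word-to-Writer migration reflows the text, at the cost of treating "a b" and "ab" as equal. Tracked changes and field codes stored in the body XML would also be counted as text.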

Why ask here? Smart people hang out here. This I have learned through the years... :D
Kuolema Tekee Taiteilijan
ChillCore
Tycoon
Posts: 2822
Joined: 04 Oct 2008 23:05
Location: Lost in spaces

Re: Question re: Software Search

Post by ChillCore »

Would Meld meet your needs? ... https://meldmerge.org/
I mean trying is free and using it also ... xD

^^^ there are similar progs around but never felt the need to search for one myself ...

AntoninKyrene: Unfortunately, the 200th version is not always the most-current, or had the most-current updates. In one case, the 7th version is considered the gospel, and everything beyond that is duplicates of older versions,
Unless you know what to compare and what not ... this may remain a prob regardless of whatevs prog you use ... even if it is 'just' using Git, applying all files in order one after another, and then comparing the patches pulled from it.
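
The "apply the versions in order and look at the patches" idea can be tried without Git. The sketch below is only an illustration using Python's standard `difflib`; it assumes the text of each version has already been pulled out as a plain string and that the versions are supplied in the order you want them compared. The function names `changes_between` and `walk_versions` are hypothetical.

```python
import difflib


def changes_between(old_text, new_text):
    """Return (removed, added) word lists that turn old_text into new_text."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_words[j1:j2])
    return removed, added


def walk_versions(texts):
    """Compare each version with the one before it, in the given order."""
    report = []
    for index in range(1, len(texts)):
        removed, added = changes_between(texts[index - 1], texts[index])
        report.append((index, removed, added))
    return report
```

An entry with two empty lists marks a version whose text is unchanged from the previous one. This does not solve the ordering problem: which version counts as current still has to be decided by hand.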
-- .- -.-- / - .... . / ..-. --- .-. -.-. . / -... . / .-- .. - .... / -.-- --- ..- .-.-.-
--- .... / -.-- . .- .... --..-- / .- -. -.. / .--. .-. .- .. ... . / - .... . / .-.. --- .-. -.. / ..-. --- .-. / .... . / --. .- ...- . / ..- ... / -.-. .... --- --- -.-. .... --- --- ... .-.-.- / ---... .--.

Playing with my patchpack? Ask questions on usage and report bugs in the correct thread first, please.
All included patches have been modified and are no longer 100% original.
odisseus
Director
Posts: 568
Joined: 01 Nov 2017 21:19

Re: Question re: Software Search

Post by odisseus »

I'm afraid there is no automatic way to solve this problem, especially if the "current" version is not necessarily the newest file. That said, there are a few tricks to make your work easier:
  • Each file has some metadata associated with it. Depending on the filesystem, this may be the date when it was created, last modification date, last access date etc. However, this metadata is unreliable: for example, when an old file is renamed, it gets a new creation date.
  • Advanced text formats (ODT, DOC and DOCX) also contain their own metadata. These may be somewhat more reliable, but they don't tell the whole story either.
  • You can verify that two given files have byte-for-byte identical contents by comparing their checksums (e.g. MD5 or SHA256).
  • Apparently LibreOffice is capable of comparing a file with its older version, or even two different files. MS Office likely has similar functionality.
  • For comparing plain text files, you need a diff tool.
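
The checksum check from the list above can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool: the function names `sha256_of` and `group_identical` are made up here, and the folder layout is assumed to be a single directory tree holding all the recovered versions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(folder):
    """Return lists of files under folder that are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Only groups with more than one member are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Every group returned is a set of exact clones, so all but one file in each group can be set aside before any manual comparison. Files that differ only in embedded metadata will not be grouped, because their bytes differ.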
Here's some advice which will help you avoid this problem in the future:
  • Start using a version control system. This is especially important when you collaborate with other authors.
  • Consider adopting a text-based format such as Markdown for the ease of comparing different versions.
  • Consider a collaborative note-taking system such as Obsidian.
  • For a completely non-technical community, Google Docs may be a more convenient option.
AntoninKyrene
Engineer
Posts: 74
Joined: 29 May 2011 17:32
Location: SW US

Re: Question re: Software Search

Post by AntoninKyrene »

Thank you!

ChillCore: unless you know what to compare and what not
That is actually the problem. I am the only person who could know, as the data represents the fictitious world that only exists in my mind's eye. Hence, the need to clean it all up and allocate the remaining print as archival (i.e., copyright protection).

odisseus: Here's some advice which will help you avoid this problem in the future:
Version Control makes the most sense to me.

Canon has been established and the timeline is now locked, so moving forward, it has more to do with making sure new writings and new ideas conform to Canon. And, of course, making sure you can cross-reference a new event backwards and forwards in time in a way it can exist without conflict.

There will be no "Chewbacca died first!" in the novels but "Han died first!" in the movies nonsense from this author. :!:
Kuolema Tekee Taiteilijan