Need an algorithm :)

Forum for technical discussions regarding development. If you have a general suggestion, problem or comment, please use one of the other forums.

Moderator: OpenTTD Developers

Post Reply
User avatar
ffpp
Engineer
Engineer
Posts: 125
Joined: 29 Jan 2010 12:56

Need an algorithm :)

Post by ffpp »

Hi,

does anybody have an efficient solution for this ?

Lets say 'Input' is an array of (positive) int of unknown length and unknown content, i.e. integers can be all over the place in Input, it is unsorted there may be duplicate elements.
Now I have two more arrays Given1, Given2 which also are of unknown length and contain ints (>=0).
I would like to find out efficiently which elements of Input are *not* to be found in either Given1 or Given2, with respect to uniqueness of elements.
Ex:
Input [ 9, 2, 3, 3, 3, 42, 1001]
Given1 [ 42, 3 ]
Given2 [ 3, 4, 5, 9 ]
then the output should be [2, 3, 1001].

I can think of several solutions but they all seem rather inefficient. So at least out of curiosity, can anybody think of anything good ? :D
Arie-
Director
Director
Posts: 593
Joined: 20 Jan 2009 16:07

Re: Need an algorithm :)

Post by Arie- »

I haven't had any courses on algorithms (only general OO knowledge from some Java courses) so I probably cannot help you, but my guess is there's a lot of people on Stack Exchange who probably love to help you.

Edit: btw, my approach would be:
- to sort the first list
- then iterate over the other two list and remove each element from the first list if an element exists.
Eddi
Tycoon
Tycoon
Posts: 8258
Joined: 17 Jan 2007 00:14

Re: Need an algorithm :)

Post by Eddi »

yes, sorting would probably be my approach as well, hashtables might work also. if you want to write this down as elegant as possible, you'd probably want to use a "multiset", if your language/library offers that.

so: output = multiset(input) - multiset(given1) - multiset(given2)
where "-" is a setminus
User avatar
ffpp
Engineer
Engineer
Posts: 125
Joined: 29 Jan 2010 12:56

Re: Need an algorithm :)

Post by ffpp »

I was thinking of hashtables too.
One thing I forgot to mention thatall the arrays are immutable (or better say I'm not allowed to modify them).
Creating a temporary array that is sorted would be possible though.
The library in use is the openttd source btw. That is if I don't want to use anything external, additionally.
J0anJosep
Traffic Manager
Traffic Manager
Posts: 139
Joined: 06 Aug 2011 15:51
Location: Spain

Re: Need an algorithm :)

Post by J0anJosep »

The core/smallvec_type.hpp and core/sort_func.hpp can be useful. I think they provide enough tools for your algorithm.

Can the output be returned sorted?
Ex:
Input: [1,2,3,4,4,4,7,6]
Given1: [1,4]
Given2: [3]
Output: [2,4,4,7,6] or [2,4,4,6,7]?

If the output can be sorted, then there's an elegant solution using "int FindIndex(...)" of SmallVector type. Not fast, but at least simple to write.
Formerly known as Juanjo
User avatar
FooBar
Tycoon
Tycoon
Posts: 6553
Joined: 21 May 2007 11:47
Location: The Netherlands
Contact:

Re: Need an algorithm :)

Post by FooBar »

Doesn't C++ have a function to do that?
I know PHP has the function array_diff() that does "Returns an array containing all the entries from array1 that are not present in any of the other arrays."

Otherwise something else that could work:
- combine both Given arrays (not caring about duplicate values or sort order);
- foreach value in Input, check if value exists in Given;
- if not in Given, add value to new array Output;
- at the end Output will only contain the values of Input that are not in Given.

But then again maybe C++ has better ways to do this.
Alberth
OpenTTD Developer
OpenTTD Developer
Posts: 4763
Joined: 09 Sep 2007 05:03
Location: home

Re: Need an algorithm :)

Post by Alberth »

The simplest solution I think is to make a std::map<int, int> (from number to count).
Add the Given* arras in it (add entry, or increment count)
Subtract Input from it (decrement count, unless it is 0).

The only problem I have with the above is whether map<> is acceptable in trunk. I would have to ponder about that one, and also have a closer look at what data structs exist in the course code atm.
Creat
Traffic Manager
Traffic Manager
Posts: 141
Joined: 26 Oct 2009 16:33

Re: Need an algorithm :)

Post by Creat »

To answer your question about efficiency, there's a bit of additional information needed: How large (or small) are the arrays usually? How often do you have to check which values in Given1/2 are not contained in Input? Can you keep (sorted) copies of the arrays? If so, can you keep them synchronized (because for example Input it only appended and nothing in the middle or beginning is removed)?

If the sizes of Given1/2 are k, l and size of Input is n, then trivially checking every element against the entire input array will take k*l*n comparisons. If k and l are typically quite small that isn't so bad (you can also check k and l simultaneously, then it becomes (k+l)*n). Sorting Input alone has a complexity of n*log(n), which can be significant for big n. If you constantly have to sort it, because you can't keep a sorted copy, that alone will likely take more cycles than searching with 'brute force'.

The question of size also goes to the efficiency of std::map and the like. They are usually implemented using a form of binary tree, specifically a red-black-tree if I recall correctly. But in order for them to be actually efficient you need lots of data (couple of hundred or thousand entries at least), otherwise the overhead of creating them isn't really worth it (they aren't much faster than simple arrays at small sizes). They do make for much better readable code though, and using them won't kill your performance compared to a manual approach. It would just most notably not be better (possibly slightly worse but nothing significant).

Now, what I'd do, if you can keep sorted copies (of all 3 arrays), is run a single loop, only incrementing the index of the array with the smallest value (or iterator, whichever you prefer). Basically like this:

Code: Select all

const_iterator inp = input.begin();
const_iterator giv1 = given1.begin();

while (inp != input.end()) {
	if (giv1 != given1.end() && (*inp) == (*giv1) ) {
		// same entry found
		DoWhateverItIsWhenYouFoundOne();
		++inp;
		++giv1;
	} else if (giv1 != given1.end() && (*inp) < (*giv1))  {
		// Current elemenit in input was not in given
		TakeElement(*inp);
		++inp;
	} else if (giv1 != given1.end() && (*inp) > (*giv1))  {
		// Current elemenit in given was not in input
		// do NOT take element, as given has to 'catch up' first, might still be in there
		++giv1;
	} else {
		// no more elements in given (giv1 == given1.end()), take all remaining from input
		TakeElement(*inp);
		++inp;
           }
}
You will of course need to adapt it to handle giv1 and giv2 simultaneously (possibly by just combining them into 1 array, easiest to handle)...
Yexo
Tycoon
Tycoon
Posts: 3663
Joined: 20 Dec 2007 12:49

Re: Need an algorithm :)

Post by Yexo »

Alberth wrote:The only problem I have with the above is whether map<> is acceptable in trunk.
std::map is already used in a few places.
User avatar
ffpp
Engineer
Engineer
Posts: 125
Joined: 29 Jan 2010 12:56

Re: Need an algorithm :)

Post by ffpp »

Okay, thanks all. Sorting the input is not an option in this case but I will refer to std::map for now because it makes the solution cleaner and easier to understand than to stitch something together.
SummerBulb
Engineer
Engineer
Posts: 24
Joined: 06 Feb 2010 22:12

Re: Need an algorithm :)

Post by SummerBulb »

do you know the maximum value of the given array?
Are you looking for time efficiency or memory efficiency too?
Post Reply

Return to “OpenTTD Development”

Who is online

Users browsing this forum: No registered users and 9 guests