Page 1 of 1

Need an algorithm :)

Posted: 27 Jan 2012 07:45
by ffpp
Hi,

does anybody have an efficient solution for this ?

Lets say 'Input' is an array of (positive) int of unknown length and unknown content, i.e. integers can be all over the place in Input, it is unsorted there may be duplicate elements.
Now I have two more arrays Given1, Given2 which also are of unknown length and contain ints (>=0).
I would like to find out efficiently which elements of Input are *not* to be found in either Given1 or Given2, with respect to uniqueness of elements.
Ex:
Input [ 9, 2, 3, 3, 3, 42, 1001]
Given1 [ 42, 3 ]
Given2 [ 3, 4, 5, 9 ]
then the output should be [2, 3, 1001].

I can think of several solutions but they all seem rather inefficient. So at least out of curiosity, can anybody think of anything good ? :D

Re: Need an algorithm :)

Posted: 27 Jan 2012 07:56
by Arie-
I haven't had any courses on algorithms (only general OO knowledge from some Java courses) so I probably cannot help you, but my guess is there's a lot of people on Stack Exchange who probably love to help you.

Edit: btw, my approach would be:
- to sort the first list
- then iterate over the other two list and remove each element from the first list if an element exists.

Re: Need an algorithm :)

Posted: 27 Jan 2012 08:06
by Eddi
yes, sorting would probably be my approach as well, hashtables might work also. if you want to write this down as elegant as possible, you'd probably want to use a "multiset", if your language/library offers that.

so: output = multiset(input) - multiset(given1) - multiset(given2)
where "-" is a setminus

Re: Need an algorithm :)

Posted: 27 Jan 2012 09:27
by ffpp
I was thinking of hashtables too.
One thing I forgot to mention thatall the arrays are immutable (or better say I'm not allowed to modify them).
Creating a temporary array that is sorted would be possible though.
The library in use is the openttd source btw. That is if I don't want to use anything external, additionally.

Re: Need an algorithm :)

Posted: 27 Jan 2012 10:20
by J0anJosep
The core/smallvec_type.hpp and core/sort_func.hpp can be useful. I think they provide enough tools for your algorithm.

Can the output be returned sorted?
Ex:
Input: [1,2,3,4,4,4,7,6]
Given1: [1,4]
Given2: [3]
Output: [2,4,4,7,6] or [2,4,4,6,7]?

If the output can be sorted, then there's an elegant solution using "int FindIndex(...)" of SmallVector type. Not fast, but at least simple to write.

Re: Need an algorithm :)

Posted: 27 Jan 2012 10:25
by FooBar
Doesn't C++ have a function to do that?
I know PHP has the function array_diff() that does "Returns an array containing all the entries from array1 that are not present in any of the other arrays."

Otherwise something else that could work:
- combine both Given arrays (not caring about duplicate values or sort order);
- foreach value in Input, check if value exists in Given;
- if not in Given, add value to new array Output;
- at the end Output will only contain the values of Input that are not in Given.

But then again maybe C++ has better ways to do this.

Re: Need an algorithm :)

Posted: 27 Jan 2012 11:44
by Alberth
The simplest solution I think is to make a std::map<int, int> (from number to count).
Add the Given* arras in it (add entry, or increment count)
Subtract Input from it (decrement count, unless it is 0).

The only problem I have with the above is whether map<> is acceptable in trunk. I would have to ponder about that one, and also have a closer look at what data structs exist in the course code atm.

Re: Need an algorithm :)

Posted: 27 Jan 2012 12:49
by Creat
To answer your question about efficiency, there's a bit of additional information needed: How large (or small) are the arrays usually? How often do you have to check which values in Given1/2 are not contained in Input? Can you keep (sorted) copies of the arrays? If so, can you keep them synchronized (because for example Input it only appended and nothing in the middle or beginning is removed)?

If the sizes of Given1/2 are k, l and size of Input is n, then trivially checking every element against the entire input array will take k*l*n comparisons. If k and l are typically quite small that isn't so bad (you can also check k and l simultaneously, then it becomes (k+l)*n). Sorting Input alone has a complexity of n*log(n), which can be significant for big n. If you constantly have to sort it, because you can't keep a sorted copy, that alone will likely take more cycles than searching with 'brute force'.

The question of size also goes to the efficiency of std::map and the like. They are usually implemented using a form of binary tree, specifically a red-black-tree if I recall correctly. But in order for them to be actually efficient you need lots of data (couple of hundred or thousand entries at least), otherwise the overhead of creating them isn't really worth it (they aren't much faster than simple arrays at small sizes). They do make for much better readable code though, and using them won't kill your performance compared to a manual approach. It would just most notably not be better (possibly slightly worse but nothing significant).

Now, what I'd do, if you can keep sorted copies (of all 3 arrays), is run a single loop, only incrementing the index of the array with the smallest value (or iterator, whichever you prefer). Basically like this:

Code: Select all

const_iterator inp = input.begin();
const_iterator giv1 = given1.begin();

while (inp != input.end()) {
	if (giv1 != given1.end() && (*inp) == (*giv1) ) {
		// same entry found
		DoWhateverItIsWhenYouFoundOne();
		++inp;
		++giv1;
	} else if (giv1 != given1.end() && (*inp) < (*giv1))  {
		// Current elemenit in input was not in given
		TakeElement(*inp);
		++inp;
	} else if (giv1 != given1.end() && (*inp) > (*giv1))  {
		// Current elemenit in given was not in input
		// do NOT take element, as given has to 'catch up' first, might still be in there
		++giv1;
	} else {
		// no more elements in given (giv1 == given1.end()), take all remaining from input
		TakeElement(*inp);
		++inp;
           }
}
You will of course need to adapt it to handle giv1 and giv2 simultaneously (possibly by just combining them into 1 array, easiest to handle)...

Re: Need an algorithm :)

Posted: 27 Jan 2012 18:09
by Yexo
Alberth wrote:The only problem I have with the above is whether map<> is acceptable in trunk.
std::map is already used in a few places.

Re: Need an algorithm :)

Posted: 27 Jan 2012 20:59
by ffpp
Okay, thanks all. Sorting the input is not an option in this case but I will refer to std::map for now because it makes the solution cleaner and easier to understand than to stitch something together.

Re: Need an algorithm :)

Posted: 30 Jan 2012 14:29
by SummerBulb
do you know the maximum value of the given array?
Are you looking for time efficiency or memory efficiency too?