*[Caml-list] Choosing a random element in a Map or a Set@ 2021-03-18 9:44 jean-marc.alliot2021-03-18 10:06 ` François Pottier ` (3 more replies) 0 siblings, 4 replies; 9+ messages in thread From: jean-marc.alliot @ 2021-03-18 9:44 UTC (permalink / raw) To: caml-list Dear list, I am trying to take a random element from a Map or a Set. Currently, I generate one random int between 1 et Card(map), and I iter until I reach this element. The solution is in O(n) which is not great... I was about to code the following solution: take the current Map code and add a function which follows the balanced binary tree from the root and takes randomly a left or right turn at each node until we reach a leaf (we only need to generate one random int and use its binary representation to choose the left or right direction at each node). This should be in O(log(n)) Before I start coding like an idiot: 1) Is there another, more intelligent, solution? 2) Is my solution biased? I think it is not, as long as the tree is properly balanced but... Thanks Jean-Marc ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot@ 2021-03-18 10:06 ` François Pottier2021-03-18 14:20 ` Simon Cruanes 2021-03-18 11:10 ` Nicolas Barnier ` (2 subsequent siblings) 3 siblings, 1 reply; 9+ messages in thread From: François Pottier @ 2021-03-18 10:06 UTC (permalink / raw) To: jean-marc.alliot, caml-list Le 18/03/2021 à 10:44, jean-marc.alliot@irit.fr a écrit : > 1) Is there another, more intelligent, solution? > 2) Is my solution biased? I think it is not, as long as the tree is > properly balanced but... The tree is only approximately balanced, so this approach should be OK if you are willing to accept some bias, but it is not perfectly uniform. If it is important to obtain a uniform distribution, then I would suggest: 1- picking a random index in the interval [0, n); and 2- using a data structure that allows efficient indexing. Finger trees come to mind (see the papers by Hinze and Paterson and by Sozeau), but there are probably others. -- François Pottier francois.pottier@inria.fr http://cambium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot 2021-03-18 10:06 ` François Pottier@ 2021-03-18 11:10 ` Nicolas Barnier2021-03-18 11:32 ` François Pottier 2021-03-19 1:32 ` Francois Berenger 2021-03-18 12:34 ` Ivan Gotovchits 2021-03-18 12:48 ` Ivan Gotovchits 3 siblings, 2 replies; 9+ messages in thread From: Nicolas Barnier @ 2021-03-18 11:10 UTC (permalink / raw) To: caml-list Hi Jean-Marc, Le 18/03/2021 à 10:44, jean-marc.alliot@irit.fr a écrit : > I am trying to take a random element from a Map or a Set. > > Currently, I generate one random int between 1 et Card(map), and I iter > until I reach this element. The solution is in O(n) which is not great... > > I was about to code the following solution: take the current Map code > and add a function which follows the balanced binary tree from the root > and takes randomly a left or right turn at each node until we reach a > leaf (we only need to generate one random int and use its binary > representation to choose the left or right direction at each node). This > should be in O(log(n)) > > Before I start coding like an idiot: > 1) Is there another, more intelligent, solution? If using a Map or a Set is not compulsory, you can design a data structure supporting insert, delete, search and random element in constant amortized time with an extensible array and a hash-table as explained here: https://www.geeksforgeeks.org/design-a-data-structure-that-supports-insert-delete-search-and-getrandom-in-constant-time/ I don't know if there are any good implementation of (Python-like) extensible arrays in a widespread OCaml library, but François Pottier published one as part of another superb data structure (union-find): https://gitlab.inria.fr/fpottier/unionfind/-/blob/master/src/StoreVector.ml You may want to adjust the ratio (2 here) by which the array is resized when necessary. Hope this helps, -- Nicolas ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 11:10 ` Nicolas Barnier@ 2021-03-18 11:32 ` François Pottier2021-03-19 1:32 ` Francois Berenger 1 sibling, 0 replies; 9+ messages in thread From: François Pottier @ 2021-03-18 11:32 UTC (permalink / raw) To: Nicolas Barnier, caml-list Le 18/03/2021 à 12:10, Nicolas Barnier a écrit : > I don't know if there are any good implementation of (Python-like) > extensible arrays in a widespread OCaml library, Jean-Christophe Filliâtre has proposed a Vector library for a long time (https://www.lri.fr/~filliatr/ftp/ocaml/misc/vector.ml.html), and there is also a CCVector module in the excellent Containers library (https://c-cube.github.io/ocaml-containers/3.2/containers/CCVector/index.html). -- François Pottier francois.pottier@inria.fr http://cambium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot 2021-03-18 10:06 ` François Pottier 2021-03-18 11:10 ` Nicolas Barnier@ 2021-03-18 12:34 ` Ivan Gotovchits2021-03-18 12:48 ` Ivan Gotovchits 3 siblings, 0 replies; 9+ messages in thread From: Ivan Gotovchits @ 2021-03-18 12:34 UTC (permalink / raw) To: jean-marc.alliot;+Cc:caml-list [-- Attachment #1: Type: text/plain, Size: 2455 bytes --] On Thu, Mar 18, 2021 at 5:41 AM <jean-marc.alliot@irit.fr> wrote: > Dear list, > > I am trying to take a random element from a Map or a Set. > > Currently, I generate one random int between 1 et Card(map), and I iter > until I reach this element. The solution is in O(n) which is not great... > > I was about to code the following solution: take the current Map code > and add a function which follows the balanced binary tree from the root > and takes randomly a left or right turn at each node until we reach a > leaf (we only need to generate one random int and use its binary > representation to choose the left or right direction at each node). This > should be in O(log(n)) > > Before I start coding like an idiot: > 1) Is there another, more intelligent, solution? > I would create an array of keys and then randomly choose a key from that array. If you want to be on the functional side, you can shuffle the array and turn it into a list. This solution involves some duplication but is easy to implement and is especially useful if you do not what repetitions in your selection. An alternative solution that doesn't suffer from an extra memory overhead would be writing a function that generates a random key (instead of a random integer). It is not always possible, especially when the domain of keys is infinite. However, if your set/map is total (i.e., maps all keys in the domain to values) then it is the perfect solution. Finally, as an amalgamation of the above approaches, you can hash-cons your keys (essentially turning keys into integers). It usually involves an extra data structure to maintain your index, but gives you the additional benefits of hash-consing, e.g., faster and tighter maps and sets. > 2) Is my solution biased? I think it is not, as long as the tree is > properly balanced but... > OCaml trees are relaxed AVL trees with the maximum difference between trees height equal to 2 (not 1 as in classical AVL). For small trees, it is quite a substantial difference, e.g., imagine a tree of 9 elements with the left tree having a height of 3 and the right of height 1. The right tree will have only one element and the left will have 7 elements. Therefore, the probability of selecting the right element will be 1/2 vs 1/9. Even for larger trees, it remains the problem as a large tree is made of small subtrees :) Therefore your probability distribution will be far from uniform. > > Thanks > > Jean-Marc > > > [-- Attachment #2: Type: text/html, Size: 3348 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread

*2021-03-18 9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot ` (2 preceding siblings ...) 2021-03-18 12:34 ` Ivan GotovchitsRe: [Caml-list] Choosing a random element in a Map or a Set@ 2021-03-18 12:48 ` Ivan Gotovchits3 siblings, 0 replies; 9+ messages in thread From: Ivan Gotovchits @ 2021-03-18 12:48 UTC (permalink / raw) To: jean-marc.alliot;+Cc:caml-list [-- Attachment #1: Type: text/plain, Size: 1014 bytes --] On Thu, Mar 18, 2021 at 5:41 AM <jean-marc.alliot@irit.fr> wrote: > Dear list, > > I am trying to take a random element from a Map or a Set. > > Currently, I generate one random int between 1 et Card(map), and I iter > until I reach this element. The solution is in O(n) which is not great... > > I was about to code the following solution: take the current Map code > As a side note, in case if you decide to pursue this idea. You don't really need to take the code as you can implement your solution using the public interface. For that, you need only these two public functions: `find_first` and `split`. The `find_first` function will help us to extract the top element (`let top t = fst@@find_first (fun _ -> true) t`), and `let destruct t = split (top t) t` will return you the left and the right subtrees, and the top element. The `destruct` function will not allocate or rebalance any trees and conceptually will just return pointers to the corresponding subtrees. So it is O(1) both in space and time. [-- Attachment #2: Type: text/html, Size: 1420 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 10:06 ` François Pottier@ 2021-03-18 14:20 ` Simon Cruanes2021-03-18 17:06 ` jean-marc.alliot 0 siblings, 1 reply; 9+ messages in thread From: Simon Cruanes @ 2021-03-18 14:20 UTC (permalink / raw) To: François Pottier;+Cc:jean-marc.alliot, caml-list [-- Attachment #1: Type: text/plain, Size: 614 bytes --] There are some balanced trees with a weight attached to each node. I have an implementation in containers-data [signature](https://c-cube.github.io/ocaml-containers/3.2/containers-data/CCWBTree/module-type-S/index.html) . There is a random_choose function that does the traversal and exploits the weight to pick left/right for each node. I implemented this years ago but the comments say: Most of this comes from "implementing sets efficiently in a functional language", Stephen Adams. The coefficients 5/2, 3/2 for balancing come from "balancing weight-balanced trees" -- Simon Cruanes [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 14:20 ` Simon Cruanes@ 2021-03-18 17:06 ` jean-marc.alliot0 siblings, 0 replies; 9+ messages in thread From: jean-marc.alliot @ 2021-03-18 17:06 UTC (permalink / raw) To: Simon Cruanes, caml-list Thanks Simon. This works beautifully and without having to write a single additional line of code. A few modifications were enough. That's the advantage of getting old: 30 years ago I would have jumped on my keyboard, now I realize that somebody smarter than me has probably already solved the problem, and I just ask and wait... 😁 Also thanks to all the other people who have answered my request. It's a pleasure, and an honor, to be still part of this community. Le 18/03/2021 à 15:20, Simon Cruanes a écrit : > There are some balanced trees with a weight attached to each node. > I have an implementation in containers-data > [signature](https://c-cube.github.io/ocaml-containers/3.2/containers-data/CCWBTree/module-type-S/index.html) . > > There is a random_choose function that does the traversal and exploits > the weight to pick left/right for each node. > > I implemented this years ago but the comments say: > > Most of this comes from "implementing sets efficiently in a functional language", Stephen Adams. > The coefficients 5/2, 3/2 for balancing come from "balancing weight-balanced trees" > ^ permalink raw reply [flat|nested] 9+ messages in thread

*Re: [Caml-list] Choosing a random element in a Map or a Set2021-03-18 11:10 ` Nicolas Barnier 2021-03-18 11:32 ` François Pottier@ 2021-03-19 1:32 ` Francois Berenger1 sibling, 0 replies; 9+ messages in thread From: Francois Berenger @ 2021-03-19 1:32 UTC (permalink / raw) To: caml-list On 18/03/2021 20:10, Nicolas Barnier wrote: > Hi Jean-Marc, > > Le 18/03/2021 à 10:44, jean-marc.alliot@irit.fr a écrit : >> I am trying to take a random element from a Map or a Set. >> >> Currently, I generate one random int between 1 et Card(map), and I >> iter >> until I reach this element. The solution is in O(n) which is not >> great... >> >> I was about to code the following solution: take the current Map code >> and add a function which follows the balanced binary tree from the >> root >> and takes randomly a left or right turn at each node until we reach a >> leaf (we only need to generate one random int and use its binary >> representation to choose the left or right direction at each node). >> This >> should be in O(log(n)) >> >> Before I start coding like an idiot: >> 1) Is there another, more intelligent, solution? > > If using a Map or a Set is not compulsory, you can design a data > structure > supporting insert, delete, search and random element in constant > amortized > time with an extensible array and a hash-table as explained here: Just FTR, a skip list would allow all the set operations, plus an efficient random access operation and element rank. Pugh, William. (1990). A skip list cookbook. I am not aware of an OCaml implementation with all the interesting operations though. > https://www.geeksforgeeks.org/design-a-data-structure-that-supports-insert-delete-search-and-getrandom-in-constant-time/ > > I don't know if there are any good implementation of (Python-like) > extensible arrays in a widespread OCaml library, but François Pottier > published one as part of another superb data structure (union-find): > > https://gitlab.inria.fr/fpottier/unionfind/-/blob/master/src/StoreVector.ml > > You may want to adjust the ratio (2 here) by which the array is resized > when > necessary. > > Hope this helps, > > -- Nicolas ^ permalink raw reply [flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-19 1:32 UTC | newest]Thread overview:9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-18 9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot 2021-03-18 10:06 ` François Pottier 2021-03-18 14:20 ` Simon Cruanes 2021-03-18 17:06 ` jean-marc.alliot 2021-03-18 11:10 ` Nicolas Barnier 2021-03-18 11:32 ` François Pottier 2021-03-19 1:32 ` Francois Berenger 2021-03-18 12:34 ` Ivan Gotovchits 2021-03-18 12:48 ` Ivan Gotovchits

caml-list - the Caml user's mailing list This inbox may be cloned and mirrored by anyone: git clone --mirror https://inbox.vuxu.org/caml-list git clone --mirror https://inbox.ocaml.org/caml-list # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 caml-list caml-list/ https://inbox.vuxu.org/caml-list \ caml-list@inria.fr public-inbox-index caml-list Example config snippet for mirrors. Newsgroup available over NNTP: nntp://inbox.vuxu.org/vuxu.archive.caml-list AGPL code for this site: git clone https://public-inbox.org/public-inbox.git