vignettes/konigsbergr.Rmd
konigsbergr.Rmd
In 1736, mathematician Leonard Euler proved it was impossible to walk through the German city of Königsberg crossing each of the city’s seven bridges exactly once. His work, famously dubbed the “Bridges of Königsberg” problem, laid the foundation for graph theory and network analysis, and foreshadowed the invention of topology.
kongisbergr
is a playful experiment that allows you to find pathways across the bridges in your own city or town, relying on the data from OpenStreetMap.
First, we need to get map data from OSM. kongisbergr
takes data extracts produced by the osmar package, which provides several options for calling out to APIs or loading files from disk.
An easy way to get an OSM data extract is to go to https://openstreetmap.org, navigate to the region whose data you want to download, and then use the “Export” tab to select a bounding box. Click the “Overpass API” link to begin downloading a file.
knitr::include_graphics("osm_export.jpg")
You may then read the file into R:
city_osm <- get_osm_file("path/to/my/file")
Konigsbergr also provides a wrapper function get_osm_bbox()
that lets you download small-sized extracts of data from OSM by specifying minimum and maximum latitudes and longitudes. If you are trying to do an entire city, however, you may want to download the OSM XML as a separate file so that you can persist it to disk. Then, you can read it in manually using osmar’s utility functions.
boston <- get_osm_bbox(xmin = -71.1383,
xmax = -71.1132,
ymin = 42.3592,
ymax = 42.3754)
We need to convert the OSM data into a network object and attach select attributes to that network from the original attributes and tags.
The top-level function konigsberg_graph()
takes an osmar
object like boston
and returns a tidygraph object. There are two options for modifying the way that this network is formed.
automobile_highways
versus pedestrian_highways
will alter the number and kinds of edges in the resulting graph.all_bridges
, will mark all of these as bridges that the pathway must cross. However, you may wish to exclude those minor bridges, in which case you may pass main_bridges
instead.boston_graph <- konigsberg_graph(boston, path_filter = automobile_highways, bridge_filter = main_bridges)
#> Creating base graph...complete!
#> Adding OSM attributes...complete!
#> Filtering graph to desired paths and bridges...complete!
boston_graph
#> # A tbl_graph: 2605 nodes and 3579 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Edge Data: 3,579 x 15 (active)
#> from to id access bicycle bridge foot highway label oneway
#> <int> <int> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 2 8.61e6 <NA> <NA> <NA> <NA> reside… Sout… yes
#> 2 3 4 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> 3 4 5 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> 4 5 6 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> 5 6 7 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> 6 7 8 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> # … with 3,573 more rows, and 5 more variables: bridge_relation <dbl>,
#> # relation_label <fct>, is_bridge <lgl>, bridge_id <dbl>, distance <dbl>
#> #
#> # Node Data: 2,605 x 4
#> id lat lon label
#> <dbl> <dbl> <dbl> <chr>
#> 1 61325553 42.4 -71.1 <NA>
#> 2 61323944 42.4 -71.1 <NA>
#> 3 61328597 42.4 -71.1 <NA>
#> # … with 2,602 more rows
Pathfinding is accomplished using the pathfinder package, which is designed to work on graphs of any kind. konigsbergr provides the cross_all_bridges()
function, which will start from the first node in the network or from an OSM node ID that you specify.
boston_path <- cross_all_bridges(boston_graph)
#> Pre-calculating distance matrices...full matrix...inter-bundle matrix...intra-bundle matrix...done.
#> Running greedy search...done.
#> Extrapolating full paths...
#> done.
#> Completed a path with 15 steps crossing 315 edges and 9 bundles.
view_konigsberg_path(boston_graph, boston_path)
Although relatively fast, the greedy_search()
function implemented by pathfinder is not particularly creative. Depending on how you have defined the limits of your map data, you may find that the path needs to cross back over a bridge, or gets trapped (e.g. behind a one-way street) and cannot complete the entire circuit. Try specifying alternate start points for the pathway to see if you can find a more efficient route:
# Pass an OSM node id to cross_all_bridges. For example, this node is at https://openstreetmap.org/node/2688967592
boston_path_2 <- cross_all_bridges(boston_graph, starting_node = 2688967592)
#> Pre-calculating distance matrices...full matrix...inter-bundle matrix...intra-bundle matrix...done.
#> Running greedy search...done.
#> Extrapolating full paths...
#> done.
#> Completed a path with 15 steps crossing 248 edges and 9 bundles.
view_konigsberg_path(boston_graph, boston_path_2)
Very large OSM exports can cause memory errors when using the original osmar package. The bigosm package can read in much larger exports (e.g. New York City is about 2 GB of XML!) and also proactively filter the R representation down to just roads and bridges. konigsbergr will automatically use this package if you call get_osm_bbox()
. Even with this filtering, though, be aware that trying to do a graph construction and traversal for a large city can still take tens of minutes for graphs of NYC-scale.
In addition to filtering and marking bridges for crossing, konigsberg_graph()
also wraps the core function that converts OSM data into a graph object. This is an expensive operation when your data are very large, and so Konigsbergr also exposes the underlying function create_base_konigsberg_graph()
. This will perform the initial conversion before applying the path_filter()
and bridge_filter()
functions, and you may wish to save that intermediate object so that it is easier to experiment with different filter combinations later on.
base_boston_graph <- base_konigsberg_graph(boston)
#> Creating base graph...complete!
#> Adding OSM attributes...complete!
base_boston_graph
#> # A tbl_graph: 9108 nodes and 11422 edges
#> #
#> # A directed multigraph with 1 component
#> #
#> # Node Data: 9,108 x 4 (active)
#> id lat lon label
#> <dbl> <dbl> <dbl> <chr>
#> 1 61325553 42.4 -71.1 <NA>
#> 2 61323944 42.4 -71.1 <NA>
#> 3 61328597 42.4 -71.1 <NA>
#> 4 6298024767 42.4 -71.1 <NA>
#> 5 379598392 42.4 -71.1 <NA>
#> 6 315559814 42.4 -71.1 <NA>
#> # … with 9,102 more rows
#> #
#> # Edge Data: 11,422 x 12
#> from to id access bicycle bridge foot highway label oneway
#> <int> <int> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 2 8.61e6 <NA> <NA> <NA> <NA> reside… Sout… yes
#> 2 3 4 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> 3 4 5 8.61e6 <NA> <NA> <NA> <NA> reside… Flag… <NA>
#> # … with 1.142e+04 more rows, and 2 more variables: bridge_relation <dbl>,
#> # relation_label <fct>
When trying to run this over a large geographic extent, I recommend using an orchestration library like drake that will save intermediate objects to disk.