Saving tileset id on every tile

Not sure if it’s been discussed before but, Tilesets should have ids (integer) too, just like it’s name, which currently can’t be null/empty. It could be as simple as its insertion index (and thus persisted in the tmx file). This would allow each tile to be linked against its tileset (which is [sort of] already linked through the gid value). By saving the tileset identifier too, the parsing process would be SO MUCH faster, in an O(1) speed. Currently, many parsers out there use O(n) to locate the tileset and the least being O(log n), when using binary search. Every millisecond DOES matter, mainly on big maps (even though there are alternatives for this, they can be very complex). Please kindly consider this.

I do not understand this. How would I be able to read it in O(1) speed? :open_mouth:
The larger the map the longer it takes to load, right?

O(1) would mean constant time no matter how many tiles / tilesets you have. I guess you do not mean the whole parsing, but only the tile id → tileset id mapping?

If you are into performance that much you will gain a larger speedup by using a custom (binary?) format. Overloading the TMX format for allowing some questionable optimisations at load time does not sound like a good idea to me.

Also if your maps so big, that this tileset lookup time starts to matter you should maybe look at splitting the map into smaller parts, load the map in a seperate thread or maybe a combination of both. But I fear you are overengineering here.

Regards,
Ablu

Hello, Ablu. Thanks for your time.
Tile-SET is a data structure (set) that contains a range of tile ids. Example:

[tileset first=‘1’ count=‘248’…]

The range of tiles of a tileset is (first, first + count - 1), being (1, 248) the example above (thanks to the new versions, we now don’t need to calculate the count value). Currently, tiles are ‘linked’ to tilesets through the ‘gid’, that is the tileset grid index. Mostly parsers out there simply loop over the tileset array and check if the tile id is in that tileset range (first <= gid <= last). This implies O(n) speed, where n is the number of tilesets, EVERY tile iteration. This is a very costly task. The fastest way to search the owner of a tile is by maintaining a LOOKUP array of tileset ranges and doing a binary search on it, example:
a[0] = 1
a[1] = 248
And so on (there are a few examples out there). What I’m asking for is to simply save the tileset id into the tile. This would be something as follows:

[tileset id=‘1’ first=‘1’ count=‘248’…]
[tileset id=‘2’ first=‘249’ count=‘780’…]
[tile gid=‘45’ tileset=‘1’]
[tile gid=‘578’ tileset=‘2’]
[tile gid=‘248’ tileset=‘1’]

The example above means that you no longer need to use the old O(n) loop to find which tileset the tile with gid 45 belongs to and that’s my point. If data is encoded/compressed, the tileset id could be embed just like the flip flags. To achieve the so wished O(1) speed (constant), we could do this:

// assume tileset id = tileset insertion order (index)
tile.tileSet = tileSets[tile.tileSetId - 1];

Done. We just saved A LOT of time. Simple as that. It’s just an extra information and it would not crash old applications (that use old tmx versions).
If it was to use a custom map data file, I wouldn’t be here requesting such a minimal task.

PS: I wrote this whole post on my phone and couldn’t use HTML tags (instead, I replaced <> by [])

Actually, I think the fastest would be to allocate a single tiles array for all your tilesets, indexed by gid. That way you don’t even need to care which tileset any tile is from, you just do a single array index after you have cleared the flipping flags from the gid.

Since your example is using XML elements and your suggestion would only work for the XML layer data format, I first of all have to point out that this format is so storage and processing intensive that it’s likely a much more significant factor in your loading time than the lookup of the matching tileset. Rather than worrying about that lookup, you should consider switching to zlib-compressed layer data first. In fact, in the next new feature release of Tiled the XML layer data format will be deprecated (it’s already done on the master branch), since there’s always some users who fall into this trap and end up with huge slow-loading TMX files.

In general I wonder if you actually measured/profiled the map loading and indeed found this lookup to be the slow part. You need to do measurements when optimizing or you’ll be very likely to waste your time.

Hey, Thorbjørn! Thanks for your time and all your efforts!

Actually, I think the fastest would be to allocate a single tiles array for all your tilesets, indexed by gid. That way you don’t even need to care which tileset any tile is from, you just do a single array index after you have cleared the flipping flags from the gid.

Well, honestly, this is another way to make a lookup table of tilesets. This is indeed a good choice, however, it is only good when you have few different tiles which means you wouldnt do too much allocations. The worst case is when you have lots of big tilesets on a small map (who knows). I can tell that both aproaches are ‘tied’ in time, so it’s an implementation decision (being your example the easiest). Yes, I profiled them both.

Since your example is using XML elements and your suggestion would only work for the XML layer data format, I first of all have to point out that this format is so storage and processing intensive that it’s likely a much more significant factor in your loading time than the lookup of the matching tileset. Rather than worrying about that lookup, you should consider switching to zlib-compressed layer data first.

Well, yeah, you’re right. My example would only work for XML decoded/encoded/compressed layers. I did not care about CSV when I first thought about it since I never even used it, but it doesn’t mean we couldnt think how to do that for the CSV type (from the top of my head right now, I thought about embedding into each value just like we would do for the encoded layers, like the flip flags). Having an extra attribute to be parsed on each tile, would indeed cause a kind of overhead if we have an decoded layer because it is the object with the highest number of repetitions in the whole file, but mostly parsers out there are really super fast and I dont really think this would be a problem for them. If it is an encoded layer, this would be much faster because theres no more string/parsing/validating involved but bit twiddling techniques. Allocating variables to hold that new attribute does influences the time, but its basically 0.

I finally got my first example using ZLIB to work yesterday and I haven’t profiled enough to talk about it, but I do believe it wouldnt be as fast, because thats just another algorithm to be executed (if not compressed, base64 decode and youre ready to go. Otherwise, base64 decode then zlib decompress and youre ready to go). Still, even if decompressing is faster, that doesnt mean anything about fetching the tilesets. It’s a sepparated operation.

In fact, in the next new feature release of Tiled the XML layer data format will be deprecated (it’s already done on the master branch), since there’s always some users who fall into this trap and end up with huge slow-loading TMX files.

Are there any quick links for the new documentation or do I have to find it myself browsing through the source code?

Last, but not least, why do images of image layers dont have their width and height, unlike tileset’s images? Is there any horrid thing behind it or could you consider adding them as another request of mine?

As a final note, I still highly hope we could test if it does optimizes the parsing/loading speed of a map and implement it in the future versions.

Ok, I had not understood from your example that you also meant it to apply to the binary layer data format. Unfortunately, changing this format while keeping compatibility (32-bit global tile IDs) will introduce some arbitrary lower limits on the number of possible tilesets and the number of possible tiles within a tileset. Possibly the following may be reasonable:

3 flipping flags
10 bits for tileset index (allowing up to 1024 tilesets)
19 bits for tile index (up half a million tiles per tileset)

Though for whatever reason any subdivision here seems arbitrary and uncomfortable to me. But compatibility could be kept by using the “firstgid” property to achieve these subdivisions. Though it would also lead to many huge numbers in the CSV format…

zlib decompress is pretty fast, and maybe you’re forgetting that compressed data means less data needs to be read from disk, less work needs to be done by the XML parser, and there will be much less data to base64-decode. The difference in loading time is probably small, but there can be huge difference in file size.

Dropping the XML layer data options will probably only mean dropping that from the TMX map format page (or marking it as deprecated there). No new documentation needs to be written about it.

Resizable images are already supported as tile objects. Maybe the image layer will support it as well eventually, but it’s not a priority for me. Essentially the image layer could just be removed since it offers no real benefit over using tile objects.

Please feel free to test it. I personally think if you’re seeing that loading time is becoming problematic, there’s probably better ways of making it faster, like changing to another map format altogether (maybe a custom binary one that can just be memory-mapped) or cutting you map into smaller pieces (because at this point your map must be huge).