PUI side should include an option for a sitemap.xml file

Description

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site.

It would make sense for the PUI to generate this file as an option.
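
For reference, here is a minimal sketch of the kind of file the Sitemaps protocol describes: a `urlset` element in the sitemaps.org namespace containing one `url` entry per page, each with a required `loc` and an optional `lastmod`. The function name `build_sitemap` and the example URLs are assumptions for illustration only; this is not the PUI's or any plugin's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string.

    `urls` is an iterable of (loc, lastmod) pairs; lastmod may be None.
    Hypothetical helper, not part of ArchivesSpace.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional in the protocol, so only emit it when known.
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Example URLs are made up for the demonstration.
    print(build_sitemap([
        ("https://example.org/repositories/2/resources/1", "2020-01-21"),
        ("https://example.org/repositories/2/resources/2", None),
    ]))
```

The protocol caps a single sitemap file at 50,000 URLs, so a large site would split its URLs across several files listed in a sitemap index.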

Activity

Mark Custer
January 21, 2020, 9:33 PM

Sorry I hadn’t followed up previously. I took the plugin for a spin in late December with a copy of our database on my laptop, and things worked as expected as long as I excluded “archival objects” from the sitemaps. When I added those, I got “out of memory” errors. That said, I never allotted more than the 1 GB that’s provided by default (I’m pretty sure). I haven’t had a chance to test again, and I definitely won’t this week. I don’t recall seeing any other glaring issues, though.

Joshua Shaw
January 21, 2020, 10:49 PM

That out-of-memory error makes sense. If I have time, I’ll look into chunking up the database read, which is what I suspect is causing the memory issue (right now it’s grabbing everything in one go), and I’ll add a note to the README.

I’m kinda surprised you can run a full index on your data set with just that standard 1 GB! I know I can’t run an index without at least 8 GB with our data (700k objects or so). I kinda assumed the indexer’s memory needs would be about on par with sitemap generation.
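
To illustrate the idea of chunking the database read, here is a rough sketch that pages through published records a fixed number of rows at a time instead of loading everything in one query. It is written in Python against SQLite purely for illustration. The table name `archival_object`, the `publish` column, and the function `iter_published_ids` are assumptions, not the plugin's actual schema or code.

```python
import sqlite3


def iter_published_ids(conn, chunk_size=1000):
    """Yield lists of published record ids, at most chunk_size per list.

    Uses keyset pagination (id > last seen id) so only one chunk is held
    in memory at a time. Table and column names are hypothetical.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM archival_object "
            "WHERE publish = 1 AND id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield [row[0] for row in rows]
        # Remember where this chunk ended so the next query starts after it.
        last_id = rows[-1][0]


if __name__ == "__main__":
    # Tiny in-memory database standing in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archival_object (id INTEGER PRIMARY KEY, publish INTEGER)"
    )
    conn.executemany(
        "INSERT INTO archival_object (id, publish) VALUES (?, ?)",
        [(i, i % 2) for i in range(1, 11)],
    )
    for chunk in iter_published_ids(conn, chunk_size=2):
        print(chunk)
```

Each chunk could be written out to the sitemap before the next one is fetched, which keeps peak memory tied to the chunk size and not to the total number of records.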

Joshua Shaw
January 22, 2020, 3:39 PM
Edited

I’m trying to reproduce the out-of-memory error, but I’m not having any luck with my test data (about 550k archival objects (AOs), with about 110k published). When you get a chance, can you:

  1. Give me an idea of how many objects you’ve got and how many of those are published?

  2. Try the sitemap plugin with increased memory?

PS. What version are you testing against?

Thanks!

Joshua Shaw
February 26, 2020, 1:36 PM

The latest beta changes sitemap storage to the standard data directory: https://github.com/dartmouth-dltg/aspace_sitemap/releases/tag/v1.0.0-beta-6

Joshua Shaw
March 24, 2020, 5:17 PM
Edited

Writing to the local filesystem is now the default option. I’ve updated and released v1.0.0 of the plugin: https://github.com/dartmouth-dltg/aspace_sitemap/releases/tag/v1.0.0

If all goes well, I’ll start working on incorporating this into the core code within the next couple of weeks.

Assignee

Joshua Shaw

Reporter

Blake Carver

Labels

None

Priority

Major