c7bb1608a1
Progress towards sensible-enough numpy-based querying using shared-memory views rather than duplicated data.
sam 2026-05-26 17:32:56 +01:00
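A minimal sketch of the view-based idea, assuming rows are sorted by schedule number so each schedule occupies one contiguous block. `schedule_views` and the sample arrays are hypothetical, not the project's code: basic slicing returns a view that shares the original buffer, while fancy indexing would copy.

```python
import numpy as np


def schedule_views(schedule_ids: np.ndarray, values: np.ndarray, wanted):
    """Return {schedule_id: view into values} for each wanted id.

    Assumes schedule_ids is sorted ascending, so every schedule's rows form a
    single contiguous block; slicing that block yields a view, not a copy.
    """
    out = {}
    for sid in wanted:
        # Binary search for the first and one-past-last row of this schedule.
        start = np.searchsorted(schedule_ids, sid, side="left")
        stop = np.searchsorted(schedule_ids, sid, side="right")
        out[sid] = values[start:stop]
    return out


# Hypothetical sample data: schedule number per row, plus one value column.
ids = np.array([1, 1, 1, 2, 2, 3])
vals = np.array([10, 11, 12, 20, 21, 30])
views = schedule_views(ids, vals, [1, 3])

# A slice shares memory with vals; integer-array indexing does not.
assert np.shares_memory(views[1], vals)
assert not np.shares_memory(vals[[0, 1, 2]], vals)
```

Writing through one of these views changes `vals` too, so results should be treated as read-only if the shared data must stay intact.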
b9ffbdee89
Created mca_stubs.py, which provides record entry details (and slice objects) for the .MCA record types.
sam 2026-05-26 11:06:28 +01:00
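A rough sketch of how per-record slice objects can cut a fixed-width line into named fields. `BS_FIELDS` and `parse_record` are hypothetical stand-ins for what mca_stubs.py might provide, and the offsets are illustrative only; the real ones come from the RSP specification.

```python
# Hypothetical stub: field name -> slice into one fixed-width .MCA line.
# Offsets are illustrative, not taken from the RSP specification.
BS_FIELDS = {
    "record_identity": slice(0, 2),
    "transaction_type": slice(2, 3),
    "train_uid": slice(3, 9),
    "date_runs_from": slice(9, 15),
    "date_runs_to": slice(15, 21),
}


def parse_record(line: str, fields: dict[str, slice]) -> dict[str, str]:
    """Cut one fixed-width line into named fields using precomputed slices."""
    return {name: line[s] for name, s in fields.items()}


# Made-up record, padded to 80 characters like a fixed-width line.
line = "BSNC12345250101251231".ljust(80)
record = parse_record(line, BS_FIELDS)
```

Precomputing the slices once per record type keeps the per-line work to plain string slicing.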
e887cc791e
Removal of the sqlite/sqlalchemy-based approach: it is too slow to combine the results, even with the database loaded in memory.
sam 2026-05-26 10:17:28 +01:00
0479f1e4a8
Improved a few things; querying for multiple services now runs at a tolerable speed. I would like to improve it further and will look at pre-merging tables in SQL rather than pandas.
sam 2026-05-25 21:21:53 +01:00
e723109a0a
Began creating some utility functions and noticed some limitations: fetching one schedule at a time is too slow, when we could easily fetch an aggregated result and split it.
sam 2026-05-25 17:46:13 +01:00
36aa23f464
Various minor updates and a basic Schedule class. Added a SixData class to manage conversions of YYMMDD dates to/from more Pythonic objects.
sam 2026-05-25 14:06:35 +01:00
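A minimal sketch of the YYMMDD round trip using only the standard library. These two functions are hypothetical helpers, not the SixData class itself. They rely on `strptime`'s `%y`, which maps 00-68 to 2000-2068 and 69-99 to 1969-1999.

```python
from datetime import date, datetime


def yymmdd_to_date(text: str) -> date:
    """Parse a six-character YYMMDD string into a date.

    %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
    """
    return datetime.strptime(text, "%y%m%d").date()


def date_to_yymmdd(value: date) -> str:
    """Format a date back into six-character YYMMDD."""
    return value.strftime("%y%m%d")


parsed = yymmdd_to_date("260525")
```

Any sentinel values the file format uses for open-ended dates would need explicit handling before calling `strptime`, since they would not parse as real dates.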
c2633952d3
Added mca_queries.py and its pre-generated result, mca_record_types.py. The latter is for type hinting and will make writing queries to solve for schedule numbers much easier. Next will be writing tools that make hunting for desired schedules easier.
sam 2026-05-25 13:26:11 +01:00
51c4f5030c
Updated the raw_mca_... table generation to include the line number from the file and the schedule number, although we may need to investigate how the last entry behaves with 'ZZ' records and any others. We don't want to inherit the technical debt of remembering this one special case every time.
sam 2026-05-23 10:38:03 +01:00
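A rough sketch of why the trailing record is awkward, under the assumption that each "BS" record opens a new schedule and every following line inherits its number. `number_lines` and the sample lines are hypothetical. With this rule the final "ZZ" line silently picks up the last schedule's number unless it is special-cased.

```python
def number_lines(lines):
    """Yield (line_number, schedule_number, record) for each line.

    Hypothetical rule: a "BS" record opens a new schedule; later lines keep
    that number until the next "BS". Lines before the first BS get None.
    """
    schedule = None
    for line_number, record in enumerate(lines, start=1):
        if record.startswith("BS"):
            schedule = 0 if schedule is None else schedule + 1
        yield line_number, schedule, record


# Made-up, heavily truncated records for illustration only.
sample = ["HD", "BSN", "LO", "BSN", "LT", "ZZ"]
rows = list(number_lines(sample))
```

Here `rows[-1]` is `(6, 1, "ZZ")`: the trailer is attributed to schedule 1, which is the kind of special case that is better handled once in table generation than remembered in every query.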
f35cda6f10
Finished the initial implementation of parsing.py, which now generates a sqlite database with >600MB of timetable records. Next will be generating sqlalchemy descriptors based on the automatically extracted specifications. If I can re-learn sqlalchemy, that is.
sam 2026-05-22 16:57:14 +01:00
d63f151c9b
Added a sqlite export of the .MCA file's record spec. This won't live in /data but in the user's cache, letting users choose how and when to update the timetable files and reducing redundancy.
sam 2026-05-22 11:59:45 +01:00
fc09eb775e
The parsed tables from RSP's specification are now cached to /data. This means we can ship the tables with the project rather than requiring the .pdf.
sam 2026-05-22 10:37:42 +01:00
14b17a22d7
Used pypdf to create extract_specification_document_tables in parsing.py. This should make indexing the various file types easier in future. It will need adapting for files other than .MCA, and I should look at formalising the output into a local database.
sam 2026-05-22 01:11:24 +01:00
51c6af9782
Tracked nr_requests.py and added fetch_nr_timetable_files.
sam 2026-05-21 20:23:43 +01:00
f454af8ab4
Added NRConfig and fetch_nr_token in nr_requests.py.
sam 2026-05-21 18:52:24 +01:00