TODO
----

Contact Vector about contract ending 15 Aug, not 11th
    [ done ]

UDW automation (see below)

Meeting with Freddie next Wed at 11am
    [ done ]

Rosie wants to know at end of July what days in July I will be invoicing
for (for their financial quarter end)
    [ done ]

At end of July, generate UDW reports for July and FTP them.
From 1 August, run UDW daily/weekly (automatically!).
Generate report for Conni.
    [ done ]

Check that May and June UDW reports generated OK, post-process and FTP them
    [ done ]

QA of July Usage Reporting stats

Investigate why huge jump in Van Nostrand PDF downloads
    [ done - On 5 July IP 130.220.79.98 (zim.city.unisa.edu.au) did 2860
      downloads! ]

Deal with Aventis/Sanofi "IP check" query - from Conni
    [ done ]

Deal with query about Virginia Tech AS tokens - from Eileen Koup
    [ Customer EAL0000165 OID 63000550 IP 128.173.127.127 ]
    [ done ]

Deal with query about AstraZeneca AS tokens - from Christian de Pay
    [ EAL0000061 OID 54500144 IP 63.100.108.21 ? ]
    [ "double-click" bug ]

-----------------------------------------------------------------------------

Zhiming's contact details:
    (020) 8133 8188 (Skype, so gets him anywhere in world)
    zedmcchen@hotmail.com

DW meetings:
    001 913 312 9357 access code 009582

-----------------------------------------------------------------------------

UDW automation TODO:
-------------------

Should send data weekly on Thu for data through Wed.
So DB extracts that are simply FTPed should only be done weekly.

Re-fetch & pre-process weblogs for July.
    [ done ]

runday.sh should become master control script for usage reporting and UDW.
It should do:
    Fetch weblogs (as at present)
    Pre-process weblogs (as at present, but with modified filtering that
        specifies what to exclude instead of what to include)
        [ DONE - modified filtering in runday.sh ]
    Do DB dumps for usage reporting (as at present)
    Do DB dumps for udwreport program
    Run usagereport (as at present)
    Run udwreport
    Do DB dumps for UDW FTP (weekly)
    Concatenate UDW output files for week & re-generated .h files (weekly)
    FTP files generated by udwreport (weekly)
    FTP files dumped from DB (weekly)
Anything that is specific to Usage Reporting or UDW should be pushed down
into called scripts.

DB dumping for UDW can be done by Zhiming's dumpExtracts.sh script.
Modify runday.sh to call it when doing other DB dumping.
    [ done - not tested ]

Running of udwreport can be done by Zhiming's runUDWPrep.sh script (though
would benefit from some tidying up). Run it, as background task, after
getting weblogs and doing DB dumps.
    [ done (in runday.sh) - not tested ]

FTPing can be done using ncftpput.
ncftpput requires a file containing host, user and password.
    [ created as Install/config/ftp.config ]

FTPing of files dumped from DB is probably best done in dumpExtracts.sh,
since all the config info is known there. There is already FTP code in
there, but commented out & non-functional - replace by ncftpput.
    [ done - not tested ]

FTPing of udwreport files is probably best done from runUDWPrep.sh, for
similar reasons.
    [ done - not tested ]

Need to put error-checking in all scripts.

Check for any changes that Zhiming may have made to scripts in
/ReportingThird/stats/UDW/Install/script.

Check for any changes that Zhiming may have made to udwreport source.

Incorporate /ReportingThird/stats/UDW/output/summary/extract_summary.sh
into post-processing of runUDWPrep.sh.
    [ done - not tested ]

runUDWPrep.sh should FTP summary files.
    [ done - not tested ]

Check naming of files generated by udwreport - what date is it appending?
Ensure that runUDWPrep.sh uses same names.
    [ Uses date that udwreport is run! See ExtractBase constructor ]

Modify udwreport ExtractBase constructor to use date from program arguments
in filenames instead of today's date. Modify runUDWPrep.sh to do same.

Modify udwreport to get start & end days for weblogs from program arguments
instead of config file. Modify runUDWPrep.sh accordingly.

Re-organize directory structure - see notes below.
    [ done ]

Modify source/Makefile to put udwreport in right place.
    [ done ]

Create Makefile with install target.
    [ done ]

convdate should be found from somewhere more sensible (shouldn't have to
manually copy to UDW directory).
    [ done - put copy in ~/bin ]

Output directory should have year & month sub-directories (?).
Modify udwreport and runUDWPrep.sh to do this. Filenames should still
include date, since they are FTPed to single directory.

Put scripts & config files into CVS.
    [ done - but not committed ]

dumpExtracts.sh should use yesterday's date for filenames - it's currently
using today's date.
    [ done - not tested ]

-----------------------------------------------------------------------------

Zhiming's notes
---------------

ftp to ftp server machine ftp.wiley.com
login wis_bmis
password jws&cbp$
[ can also ssh: user wisbmmod password jws&vba$ ]
working dir: development/udw/extract

C program creates 5 types of files:
    Access_Fact, Customer_Contract, Session, Usage_Fact and User

DB UDWExtract generates 3 types of files:
    Customer, License and Product

More notes from Zhiming
-----------------------

The script directory is
    /ReportingThird/stats/UDW/Install/script

The script to run (with parameters) is
    /ReportingThird/stats/UDW/Install/script/runUDWPrep.sh -d 2006-05-01 -M

The executable is
    /ReportingThird/stats/UDW/Install/bin/SunOS/udwreport
which was compiled in the source directory:
    /ReportingThird/stats/UDW/Source/Current
Please note that there is no make install. Executable must be copied by hand.
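
The hand copy described above might look like the following sketch. The
function name install_udwreport and its two optional arguments are made up
for illustration and are not part of the existing scripts; the default paths
are the source and install directories given in the notes.

```shell
#!/bin/sh
# Sketch only: copy a freshly built udwreport from the source directory to
# the install directory. install_udwreport is a hypothetical helper.
install_udwreport() {
    src_dir=${1:-/ReportingThird/stats/UDW/Source/Current}
    dest_dir=${2:-/ReportingThird/stats/UDW/Install/bin/SunOS}

    # Refuse to install if the build did not produce an executable.
    if [ ! -x "$src_dir/udwreport" ]; then
        echo "install_udwreport: no executable at $src_dir/udwreport" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    cp "$src_dir/udwreport" "$dest_dir/udwreport" || return 1
    chmod 755 "$dest_dir/udwreport"
}
```

Called with no arguments it uses the two directories above; passing other
directories makes it possible to try it out against a scratch area first.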
There are two versions of the master udw-conf.xml file:
udw-conf.xml.master_May_onwards and udw-conf.xml.master_Jan_Apr. The one for
Jan to Apr uses the historically parsed journal/cochrane/stasa logs whilst
the new one uses the newly re-parsed www3 logs.

Directory /ReportingThird/stats/UDW/Persistent contains the saved sequence
files as well as session counts. If a rerun is needed, the appropriate files
must be linked to by the *.previous links in that directory.

Directory /ReportingThird/stats/UDW/output contains the output from running
the C++ program. The following must be done before data can be ftp'ed.

1) Generate summary information using script
   /ReportingThird/stats/UDW/output/summary/extract_summary.sh.
   The appropriate UDW_WP_Usage_Fact file name must be passed to the script
   as the only parameter.
2) UDW_WP_Usage_Fact should be split into smaller files containing
   10,000,000 lines or fewer each.
3) Make sure that Customer_Contract has been post-processed. If not, there
   is a script, /UDW/stats/historicData/output/doCustomerContract.sh, to do
   so. Again, the correct file name must be passed as the first argument.

Send this output (or the post-processed data) to the ftp server. Email
Dinesh Pohane (cc Jason Beard) to inform him of that.

-----------------------------------------------------------------------------

DB dump scripts
---------------

In Install/DBExtract/bin:
    Dir_UDW_DB.config  - configuration file
    dumpExtracts.sh    - main script to do DB dumps
    bcpDB.sh           - used by dumpExtracts.sh and dumpUDWData.sh
    dumpUDWData.sh     - simple first attempt at script?
    queryDB.sh         - used by dumpExtracts.sh (maybe also for manual
                         querying of database - pass SQL statement as arg)

In Install/DBExtract/bin/DBDump:
    SQL scripts for dumping data for use by udwreport

In Install/DBExtract/bin/UDWExtract:
    SQL scripts for dumping data to be FTPed

Dumped DB data
--------------

In DBExtracts/2006/Jun etc.:
    DB data to be FTPed

In Data/2006/Jun etc.:
    DB data for use by udwreport
    [ now moved to same directory as Usage Reporting DB data ]

-----------------------------------------------------------------------------

Scripts for running udwreport
-----------------------------

In Install/script:
    Dir_UDW.config  - configuration of directories
    runUDWPrep.sh   - script to run udwreport
    utilities.sh    - used by runUDWPrep.sh

-----------------------------------------------------------------------------

Configuration
-------------

Configuration of files & directories is scattered:

Install/script/Dir_UDW.config:
    UDWROOTDIR="/UDW/stats/UDW/Frank"
    PERSISTENTSTOREDIR="${UDWROOTDIR}/Persistent"
    CONFIGDIR="${UDWROOTDIR}/Install/config"
    BINDIR="${UDWROOTDIR}/Install/bin"

Install/DBExtract/bin/Dir_UDW_DB.config:
    EXTRACTDATADIR="/UDW/stats/UDW/Frank/DBExtracts"
    DBDUMPDATADIR="/UDW/stats/UDW/Frank/Data"

Install/config/udw-conf.xml:
    DocumentFileRoot="/ReportingFourth/stats/Reporting/Data"
    ConsumerFileRoot="/ReportingFourth/stats/Reporting/Data"
    LicenseFileRoot="/ReportingFourth/stats/Reporting/Data"
    PersistentFileRoot="/ReportingThird/stats/UDW/Persistent"
    WebLogStreamRoot="/ReportingFourth/stats/Reporting/WebLogs/JournalLog"
    UrlOidMappingFileRoot="/Reporting/stats/Reporting/Data"
    ArticleSelectFileRoot="/ReportingFourth/stats/Reporting/Data"
    PPVFileRoot="/ReportingFourth/stats/Reporting/Data"
    ExternalGatewayFileRoot="/ReportingFourth/stats/Reporting/Data"
    MatrixFileRoot="/ReportingThird/stats/UDW/Install/config"
    OutputFileRoot="/ReportingThird/stats/UDW/output"
    ConsumerFileName="UDW_WP_User"
    SessionFileName="UDW_WP_Session"
    UsageFactFileName="UDW_WP_Usage_Fact"
    AccessFactFileName="UDW_WP_Access_Fact"
    CustomerContractFileName="UDW_WP_Customer_Contract"

Install/DBExtract/bin/dumpExtracts.sh:
    EXTRACTS="Customer License Product"
    EXTRACTFILEPREFIX="UDW_WP"

Install/script/runUDWPrep.sh:
    CONFIGFILE="udw-conf.xml"

-----------------------------------------------------------------------------

Directory structure
-------------------

DBExtracts      # this is DB data dumped by dumpExtracts.sh, to be FTPed
    2006
        Jul
            UDW_WP_Customer_20060719.dat.gz
            UDW_WP_Customer_20060719.h.gz
            UDW_WP_License_20060719.dat.gz
            UDW_WP_License_20060719.h.gz
            UDW_WP_Product_20060719.dat.gz
            UDW_WP_Product_20060719.h.gz
Data            # This is DB data dumped by dumpExtracts.sh, for use by
                # udwreport.
                # Might be sensible to put this in same directory as Usage
                # Reporting DB data.
    2006
        Jul
            Consumer.dbd
            ContentType.dbd
            CustomerAccount.dbd
            FreeArticles.dbd
            Hierarchy.bcp
            License.bcp
            LicenseProducts.bcp
            ProductCollections.bcp
            SuperUser.dbd
Install
    DBExtract
        bin     # Scripts for doing DB dumps.
                # Might be sensible to move these into the scripts directory.
            DBDump
                Consumer.sql
                ContentType.sql
                CustomerAccount.sql
                FreeArticles.sql
                SuperUser.sql
            UDWExtract
                Customer.sql
                License.sql
                Product.sql
            bcpDB.sh
            dumpExtracts.sh
            dumpUDWData.sh
            queryDB.sh
        data
    bin
        SunOS   # These have been manually copied here.
                # Note that convdate is from the Reporting source.
            convdate
            udwreport
    config
        counting-matrix.xml
        udw-conf.xml
        udw-conf.xml.master
    script
        Dir_UDW.config
        runUDWPrep.sh
        utilities.sh
Persistent
    SequenceMarker.dat.2006-03
    SessionCount.dat.2006-03
    tmp
        SequenceMarker.dat.2006-03
        SessionCount.dat.2006-03
Source
    Current
        CVS
        Makefile
        *.h
        *.cc
        udwreport
        bin
            SunOS
output          # Output from udwreport, to be FTPed.
                # Note that there is no hierarchy to this.
    UDW_WP_Access_Fact_20060720.dat
    UDW_WP_Customer_Contract_20060720.dat
    UDW_WP_Session_20060720.dat
    UDW_WP_Usage_Fact_20060720.dat
    UDW_WP_User_20060720.dat
    ppta.txt
    summary
        extract_summary.sh
        ppta_mar.txt

-----------------------------------------------------------------------------

Historic data
-------------

There is a README at /UDW/stats/historicData/README.txt.

Historic output data is in /UDW/stats/historicData/output - program is run
from /ReportingThird/stats/UDW.
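
Step 2 of Zhiming's pre-FTP checklist (splitting UDW_WP_Usage_Fact into
pieces of at most 10,000,000 lines) could be scripted roughly as below. This
is a sketch, not what the existing scripts do: split_usage_fact and the
".part_" naming of the pieces are invented here, and the second argument
exists only so the function can be tried on a small file.

```shell
#!/bin/sh
# Sketch only: split a Usage_Fact file into pieces of at most max_lines
# lines. split_usage_fact and the ".part_" suffix are hypothetical names.
split_usage_fact() {
    infile=$1
    max_lines=${2:-10000000}    # limit from Zhiming's notes

    if [ ! -f "$infile" ]; then
        echo "split_usage_fact: no such file: $infile" >&2
        return 1
    fi

    # POSIX split: -l sets lines per piece, -a the suffix length.
    # Pieces are written next to the input as <infile>.part_aa, _ab, ...
    split -l "$max_lines" -a 2 "$infile" "${infile}.part_"
}
```

For example, `split_usage_fact UDW_WP_Usage_Fact_20060720.dat` leaves the
original file in place and writes the pieces alongside it.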