[Office 2007 plugin error] Could not open OOXML (error reading from: C:\Users ...etc.


  • #16
    As a follow-up, and to conclude for the time being, I've included at the end of this message the results for two successive runs of the same offline configuration over the same data set, the only difference being in the number of threads used, nine for the first run, one for the second.

    For some reason, multiple-thread runs produce slightly better results: for some file types, but not all, about 3 to 9 percent more files were reported as indexed than in single-thread mode. The same 127 files were reported as "Errors" in both runs. It is not clear what happened to the other unindexed files. For example, 706 plain text files were indexed in multiple-thread mode, versus 704 in single-thread mode.

    Among the "hard-core set" of 127 problematic files:

    - About 40 Excel files from a particular sub-group of files stored on our GitHub server turned out not to be real Excel files at all, but symbolic links that when downloaded from our GitHub server appear on the local file system as normal Excel files, but only 1 KB in size.

    - About 50 Word files from a particular sub-group of files stored on our SVN server are old Word .doc files which can be opened locally, but not indexed. This may be related to their format, as they were all created with legacy versions of Word about 10 years ago.

    - All remaining files are various Excel, Word, or PowerPoint files, mostly unrelated, and stored in various places on our SharePoint server. Local copies of these can be opened, but not indexed, for reasons that remain unclear to us.
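
    A rough way to triage a set like this is to check whether each file's container matches its extension. The sketch below is only an illustration and is not part of Zoom: `classify_office_file` and `triage` are hypothetical helper names. It assumes that OOXML files (.docx, .xlsx, .pptx and their macro variants) are ZIP archives and that legacy .doc/.xls/.ppt files begin with the OLE2 compound-file signature. A 1 KB symbolic-link placeholder would show up as a mismatch; a file that passes the check is not necessarily indexable.

```python
import zipfile
from pathlib import Path

# Signature that opens every OLE2 compound file (legacy .doc/.xls/.ppt).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
OOXML_EXTS = {".docx", ".docm", ".xlsx", ".xlsm", ".pptx", ".pptm"}
OLE_EXTS = {".doc", ".xls", ".ppt"}


def classify_office_file(path):
    """Hypothetical helper: report whether a file's container matches its extension."""
    p = Path(path)
    ext = p.suffix.lower()
    with p.open("rb") as fh:
        is_ole = fh.read(8) == OLE_MAGIC
    is_zip = zipfile.is_zipfile(str(p))
    if ext in OOXML_EXTS:
        if is_zip:
            return "ooxml-zip"
        # An OLE container behind an OOXML extension may be an encrypted
        # document or a renamed legacy file.
        return "ole-in-ooxml-ext" if is_ole else "mismatch"
    if ext in OLE_EXTS:
        return "ole-compound" if is_ole else "mismatch"
    return "unknown-extension"


def triage(folder):
    """Print every Office file under folder that does not look like its extension."""
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file() and p.suffix.lower() in OOXML_EXTS | OLE_EXTS:
            label = classify_office_file(p)
            if label not in ("ooxml-zip", "ole-compound"):
                print(f"{label:18} {p.stat().st_size:>10} bytes  {p}")
```

    Calling `triage()` on the local copy of the data set would list only the suspicious files, with their sizes, which makes tiny placeholder files easy to spot.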

    Log Summary Report:
    20:13:42 - Start indexing (offline mode) at Tue Oct 1 20:13:42 2019
    20:48:32 - Indexing completed at Tue Oct 1 20:48:32 2019
    20:48:32 - INDEX SUMMARY
    20:48:32 - Files indexed: 27204
    20:48:32 - Files skipped: 79821
    20:48:32 - Files filtered: 1594
    20:48:32 - Emails indexed: 0
    20:48:32 - Unique words found: 779655
    20:48:32 - Variant words found: 584214
    20:48:32 - Total words found: 52207012
    20:48:32 - Avg. unique words per page: 28.66
    20:48:32 - Avg. words per page: 1919
    20:48:32 - Peak physical memory used: 1421 MB
    20:48:32 - Peak virtual memory used: 41013 MB
    20:48:32 - Errors: 127
    20:48:32 - Total bytes scanned/downloaded: 4068560597
    20:48:32 - File extensions:
    20:48:32 - .asp indexed: 0
    20:48:32 - .aspx indexed: 0
    20:48:32 - .cc indexed: 1587
    20:48:32 - .cenv indexed: 160
    20:48:32 - .cgi indexed: 0
    20:48:32 - .crdl indexed: 1144
    20:48:32 - .doc indexed: 436
    20:48:32 - .docm indexed: 39
    20:48:32 - .docx indexed: 125
    20:48:32 - .h indexed: 1428
    20:48:32 - .htm indexed: 8418
    20:48:32 - .html indexed: 6755
    20:48:32 - .one indexed: 346
    20:48:32 - .onetoc2 indexed: 132
    20:48:32 - .pdf indexed: 507
    20:48:32 - .php indexed: 0
    20:48:32 - .php3 indexed: 0
    20:48:32 - .php4 indexed: 0
    20:48:32 - .png indexed: 1651
    20:48:32 - .ppt indexed: 2
    20:48:32 - .pptm indexed: 14
    20:48:32 - .pptx indexed: 627
    20:48:32 - .py indexed: 2405
    20:48:32 - .txt indexed: 706
    20:48:32 - .vsd indexed: 161
    20:48:32 - .vsdx indexed: 2
    20:48:32 - .wmf indexed: 1
    20:48:32 - .xls indexed: 18
    20:48:32 - .xlsm indexed: 261
    20:48:32 - .xlsx indexed: 279
    22:12:54 - Start indexing (offline mode) at Tue Oct 1 22:12:54 2019
    23:03:12 - Indexing completed at Tue Oct 1 23:03:12 2019
    23:03:12 - INDEX SUMMARY
    23:03:12 - Files indexed: 26982
    23:03:12 - Files skipped: 79780
    23:03:12 - Files filtered: 1592
    23:03:12 - Emails indexed: 0
    23:03:12 - Unique words found: 779605
    23:03:12 - Variant words found: 584169
    23:03:12 - Total words found: 51634751
    23:03:12 - Avg. unique words per page: 28.89
    23:03:12 - Avg. words per page: 1913
    23:03:12 - Peak physical memory used: 569 MB
    23:03:12 - Peak virtual memory used: 9729 MB
    23:03:12 - Errors: 127
    23:03:12 - Total bytes scanned/downloaded: 4022275435
    23:03:12 - File extensions:
    23:03:12 - .asp indexed: 0
    23:03:12 - .aspx indexed: 0
    23:03:12 - .cc indexed: 1587
    23:03:12 - .cenv indexed: 160
    23:03:12 - .cgi indexed: 0
    23:03:12 - .crdl indexed: 1144
    23:03:12 - .doc indexed: 400
    23:03:12 - .docm indexed: 39
    23:03:12 - .docx indexed: 120
    23:03:12 - .h indexed: 1428
    23:03:12 - .htm indexed: 8418
    23:03:12 - .html indexed: 6755
    23:03:12 - .one indexed: 346
    23:03:12 - .onetoc2 indexed: 132
    23:03:12 - .pdf indexed: 500
    23:03:12 - .php indexed: 0
    23:03:12 - .php3 indexed: 0
    23:03:12 - .php4 indexed: 0
    23:03:12 - .png indexed: 1501
    23:03:12 - .ppt indexed: 2
    23:03:12 - .pptm indexed: 14
    23:03:12 - .pptx indexed: 611
    23:03:12 - .py indexed: 2405
    23:03:12 - .txt indexed: 704
    23:03:12 - .vsd indexed: 161
    23:03:12 - .vsdx indexed: 2
    23:03:12 - .wmf indexed: 1
    23:03:12 - .xls indexed: 18
    23:03:12 - .xlsm indexed: 255
    23:03:12 - .xlsx indexed: 279
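
    The two summaries above are easier to compare mechanically than by eye. The following is a minimal sketch, assuming each per-extension line has the form `HH:MM:SS - .ext indexed: N` as in the log above; `parse_extension_counts` and `compare_runs` are hypothetical names, not part of Zoom.

```python
import re

# Matches per-extension lines such as "20:48:32 - .doc indexed: 436".
# Summary lines like "Files indexed: 27204" have no leading dot and are skipped.
EXT_LINE = re.compile(r"-\s+(\.\w+) indexed:\s+(\d+)")


def parse_extension_counts(log_text):
    """Return {extension: indexed_count} for one run's summary text."""
    counts = {}
    for line in log_text.splitlines():
        m = EXT_LINE.search(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def compare_runs(first, second):
    """List (ext, first, second, percent) for extensions whose counts differ.

    percent is how much larger the first count is relative to the second.
    """
    rows = []
    for ext in sorted(set(first) | set(second)):
        a, b = first.get(ext, 0), second.get(ext, 0)
        if a != b:
            pct = (a - b) / b * 100 if b else float("inf")
            rows.append((ext, a, b, round(pct, 1)))
    return rows


if __name__ == "__main__":
    multi = parse_extension_counts(
        "20:48:32 - Files indexed: 27204\n"
        "20:48:32 - .doc indexed: 436\n"
        "20:48:32 - .py indexed: 2405\n"
        "20:48:32 - .txt indexed: 706\n"
    )
    single = parse_extension_counts(
        "23:03:12 - Files indexed: 26982\n"
        "23:03:12 - .doc indexed: 400\n"
        "23:03:12 - .py indexed: 2405\n"
        "23:03:12 - .txt indexed: 704\n"
    )
    for row in compare_runs(multi, single):
        print(row)
```

    Fed the full text of both runs, this lists only the extensions whose counts changed between the nine-thread and single-thread runs.
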
    Richard



    • #17
      Can you confirm you are using the latest build (V8.0 build 1007) as available here:
      https://www.zoomsearchengine.com/zoo...w.html#windows

      There have been a few related issues fixed since build 1006.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #18
        Yes, using v. 8 build 1107
        Richard



        • #19
          If you want us to investigate further, you can email us a few sample files from this group, along with your ZCFG settings:

          Originally posted by rkg82:
          All remaining files are various Excel, Word, or PowerPoint files, mostly unrelated, and stored in various places on our SharePoint server. Local copies of these can be opened, but not indexed, for reasons that remain unclear to us.
          --Ray



          • #20
            Unfortunately, I cannot provide the actual files, for reasons of confidentiality. I'd have to strip all content from them. I'll try to send one or two stripped ones next time I come across a new file that fails to index the first time, or after having been successfully indexed previously. I already provided one such stripped version of one of the files subsequent to our exchanges in the other thread I started.
            Richard



            • #21
              We reproduced the problem with the stripped file you sent us previously and fixed it, so it no longer occurs with the latest build. There may therefore be a different problem with these other files.
              --Ray
