Skip to content

Conversation

scanny
Copy link
Contributor

@scanny scanny commented Aug 25, 2018

This PR provides a place for me to comment on differences between merged commits and those submitted on PR #445.

Copy link
Contributor Author

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamin-T I've left some explanatory comments where the merged commits differ from those submitted with your PR.


* ? Do bookmarks need to be unique across all stories? (like headers, footers,
etc.)? This could be trouble for us because we don't yet have access to
those "stories".
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple new open questions here; this one in particular I think will be important for us to bear in mind as we go forward. I expect we should design with the prospect of Bookmarks being across all stories in the document rather than just the main document body, and I think this feature is going to be vulnerable to repair-errors for that reason.

@@ -10,6 +10,7 @@ Feature Analysis
.. toctree::
:titlesonly:

features/bookmarks
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally place new analysis documents at the top of the list rather than try to find a logical ordering. This makes them easy to find in the most likely case, which is when you're actively working on the feature.

@wip
Scenario: Document.bookmarks
Given a Document object as document
Then document.bookmarks is a Bookmarks object
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project's style for acceptance tests has evolved somewhat since many of these were written. This one reflects the current style, which adopts the perspective that the "user" in these user acceptance tests is a developer using the API. So the scenario name is usually terse like this, often {object}.{method or property}, sometimes with "getter" or "setter" appended when it's two aspects of a property.

The given statement in this style mentions the "variable" name used for the object in an "as" clause at the end.

This test is quite simplified from your original version. This reflects that there are three "levels" or "steps" that are tested, each individually:

  1. A document object has a .bookmarks property that returns (unconditionally and idempotently in this case) a Bookmarks object.

  2. A Bookmarks object has certain behaviors (len, indexed access, interable, etc.)

  3. An individual Bookmark object has certain behaviors.

These are each tested separately and not mixed in one test.

@@ -27,6 +27,11 @@ def given_a_blank_document(context):
context.document = Document(test_docx('doc-word-default-blank'))


@given('a Document object as document')
def given_a_Document_object_as_document(context):
context.document = Document(test_docx('doc-default'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a special document file yet, and better that we don't introduce one for this because bookmarks should work even when the document has none. We'll add your test file in a later commit.

bookmarks = context.document.bookmarks
actual = bookmarks.__class__.__name__
expected = 'Bookmarks'
assert actual == expected, 'bookmarks is a %s object' % actual
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it may seem a trivial test, all we want to do here is establish we got a Bookmarks object on this call. Any particular behaviors of that bookmarks object is for another test. These "..is an X object" tests are quite effective at driving (and defending) the top-level behaviors they're designed to test.

from docx.enum.section import WD_SECTION
from docx.enum.text import WD_BREAK
from docx.section import Section, Sections
from docx.shared import ElementProxy, Emu, lazyproperty
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I used relative imports (e.g. '.' and '..' prefixes) in this project, but have since determined this is not best practice. So whenever I have to add an import I fix the others in that module. Note they are sorted in alphabetical order at the top of the module and items within the line as well. This makes the position for a new import predictable and easy to review.

docx/document.py Outdated
Conflicts may arise if a bookmark is added with the same name as one
appearing in another story such as a header or footer etc.
"""
return self._body.bookmarks
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the multi-story scope of bookmarks, I'm inclined to delegate ownership of the actual Bookmarks object to _Body, like .paragraphs does with those. Eventually this Document-level bookmarks may become an "aggregate" object that contains bookmarks from all stories in the document; so this seam will allow us to add that later (and incrementally) with a minimum of disruption.

docx/document.py Outdated
@lazyproperty
def bookmarks(self):
"""|Bookmarks| object providing access to main story bookmarks."""
raise NotImplementedError
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When needed for mocking purposes, a new method/property is added as a "stub". It nevertheless always has a complete docstring (that is formatted according to PEP 257). Its only line is raise NotImplementedError which causes the acceptance test to fail at just the right spot when it's next in line for implementation.


bookmarks = document.bookmarks

assert bookmarks is bookmarks_
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past, I used to always have a fixture for each test. Now I only use a test fixture when there are multiple cases (i.e. fixture needs params=[]). It seems like it makes it easier to have all the fixture-making right there with the test when that's possible. This doesn't apply to fixture components though (single mocks signified by a trailing underscore ('_') in their name. Those are just used directly by the test rather that a fixture.

@@ -291,12 +306,12 @@ def body_(self, request):
return instance_mock(request, _Body)

@pytest.fixture
def _block_width_prop_(self, request):
return property_mock(request, Document, '_block_width')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maintain fixture components in strict alphabetical order by name. This one was out of order, which explains this change not directly related to this test. Keeping them alphabetical speeds finding the one you're after when scanning the source, especially when code-folding and search aren't handy.

@scanny scanny mentioned this pull request Aug 25, 2018
@scanny scanny force-pushed the feature/bookmarks branch 2 times, most recently from e092fee to e272749 Compare August 31, 2018 19:31
@scanny scanny force-pushed the feature/bookmarks branch 4 times, most recently from 2d6c913 to 52f4285 Compare December 28, 2018 05:59
@Benjamin-T
Copy link

@scanny #445 has the latest master as base, shouldn't we rebase this as well and see if we can get some new commits merged into this branch?
regards,

Ben

@Benjamin-T
Copy link

@scanny
What would it take to get the bookmarks feature added to the library? If you are very time constrained, perhaps in small steps. Or otherwise, contact me directly to see if it can be done via a sponsorship.
Regards,
Ben

@scanny scanny force-pushed the feature/bookmarks branch 2 times, most recently from f5f452b to 794af7d Compare July 13, 2019 23:44
@scanny
Copy link
Contributor Author

scanny commented Jul 15, 2019

@Benjamin-T Hi Benjamin, I did some work on this over the weekend:

  • I rebased my feature/bookmarks onto the latest master, which contains the work recently completed on adding headers and footers.

  • I added about seven commits from your branch, up to bmk: add Document.start_bookmark().

Please review this to understand the changes I made and why, then rebase your branch on my feature/bookmarks such that your remaining commits proceed from those that have been committed here (and do not also include commits already on my feature/bookmarks branch. You may prefer to do this by cherry-picking; any way that works for you is fine with me.

Then I'll take another look.

@Benjamin-T
Copy link

@scanny
Thanks for taking the time and adding some of my commits. Dit you also notice that I've updated the feature analysis document? I've tried to answer some of the remaining questions. If you find the answers satisfactory, perhaps that document can be added to this PR as well.

@Benjamin-T
Copy link

Benjamin-T commented Aug 31, 2019

Dear @scanny

I've rebased my branch and added some commits. I've also updated the branch with an additional acceptance test. If you could take a look and see if they can be merged that would be much appreciated. (Note that TravisCI says the branch is failing, but it really only is the python 3.4 test which is crashing)

  • The current implementation supports bookmarks for the header and the footer parts as well. Should we add these as acceptance tests?

  • In order to support bookmarks at paragraph level, a very similar interface is required as we currently have in the BlockItemContainer, are we going to copy paste this code or do you prefer the BookmarkMixin approach? (I've added a commit to show what I mean, see spike: BookmarkMixin at b714b84)

The last commit bmk: add BaseStoryPart.iter_story_parts() is a refactor proposal:
If we change the location of the 'iter_story_parts() method from the DocumentPart to the BaseStoryPart the header and footer parts can access the same method. (as well as any other storypart we might imlement in the future i.e: Comments and Endnotes.)

I haven't changed it in my current branch in order to make it more obvious what I mean.

Curious about your thoughts on the matter.

regards ,

Ben

@Benjamin-T
Copy link

Benjamin-T commented Sep 29, 2020

@scanny

There are some pending commits in my feature branch. Please let me know when you've looked at them so we can move forward.

thanks for time

Ben

@Benjamin-T
Copy link

Benjamin-T commented Feb 16, 2021

Hi Steve,

I somehow missed your new commits to this PR - Great stuff!

I've experimented a bit with it and discovered that even though the bookmarks added do show up in the word editor, they are somewhat different.

The main issue is that python-docx places them at the same level as the paragraph elements, whereas the word editor inserts the bookmark elements on a paragraph level. I've checked this for the body and header stories and assume that this holds for the other story-parts as well.

The current implementation results in this xml:

from docx import Document
doc = Document()
bmk = doc.start_bookmark('test')
doc.end_bookmark(bmk)
doc.save('test_bookmark.docx')
<w:document>
  <w:body>
    <w:bookmarkStart w:name="test" w:id="1" />
    <w:bookmarkEnd w:id="1" />
  </w:body>
</w:document>

Whereas the word editor creates this xml:
(Create document using python-docx --> click insert bookmark --> save file)

<w:document>
  <w:body>
    <w:p>
      <w:bookmarkStart w:name="test" w:id="0" />
      <w:bookmarkEnd w:id="0" />
    </w:p>
  </w:body>
</w:document>

Edit:
@scanny
On closer inspection, the bookmark elements are a bit different to 'regular' openxml elements. The xml which is generated by python-docx is definitely valid, but somehow gets updated by the word editor. A clue why this is, might be the fact that its not possible to create a field code reference to the body-level bookmark from within the same story part. Meaning, a main story bookmark can be referred to from a header or footer story-part but a reference from within the main story-part leads to an error message: "Error! Not a valid bookmark self-reference."

An even more peculiar thing happens when two body-level bookmarks are added: The first bookmark can be referred to whereas the latter leads to the same error message. My conclusion is that the word-editor tries to prevent these self-reference errors by raising the bookmark elements to the paragraph level.

What would your suggestion be? Continue with this compliant implementation which gets updated upon save by the word editor or only add bookmarks in paragraph elements. Or am I missing a point here?

My suggestion would be to only implement the bookmarks on paragraph level. Which would either require the user to first add a paragraph before being able to add a bookmark, or provide a convenience function on document level which adds a paragraph element before adding the bookmark to it.

For reference:
This is a very convenient webpage containing all the ooxml schema elements i.e. Paragraph: http://officeopenxml.com/WPparagraph.php)

Benjamin Toornstra and others added 26 commits May 15, 2021 15:28
Adding .__iter__() is not strictly required to enable iteration, but it
improves the performance of iteration significantly by avoiding the
default implementation which would repeatedly parse the bookmark pairs
in the document.
* Sequence is now in `collections.abc` and will be removed from
  `collections` in 3.8.
* zip() returns an iterator in Python 3 and must be wrapped with
  something like list() or tuple() to be realized.
* Remove support for 2.6.
* Add tox tests for 3.6 and 3.7.
* Update .travis.yml to match tox versions.
@scanny scanny force-pushed the feature/bookmarks branch from f7b5c42 to a53a6bb Compare May 15, 2021 22:34
@scanny
Copy link
Contributor Author

scanny commented Jun 8, 2021

@Benjamin-T I think the safest bet is to determine what the Microsoft API for Word does and reproduce that behavior. Because details like this are not documented (publicly at least), the only good assurance I've found is to duplicate the MS API behavior, which is pretty good assurance in my experience, although sometimes a mystery why it needs to be the way it is.

This is an example of why the analysis document is so important, in particular doing the thorough analysis work needed to properly create that document. I of course learned that the hard way, by reimplementing several features early on, but this is definitely a "teaching moment" opportunity :)

There are a couple ways to experiment with the Microsoft API, but they all require Windows as far as I know. In general I think using VBA right from within Word is the easiest method. I used to keep a Windows virtual machine around (on my Mac) to do that sort of thing but don't have it anymore.

@Benjamin-T
Copy link

Benjamin-T commented Jun 9, 2021

Dear @scanny

Thanks for the response, I've run a test using the Word API:
See Docs

From this I think we can conclude that the 'correct' place of the bookmark elements is paragraph level.

Sub Mark() 
 ActiveDocument.Bookmarks.Add Name:="mark" 
End Sub

Bookmark at start of document

Results in:

<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p w:rsidR="000E0575" w:rsidRDefault="000E0575" w14:paraId="6AE70610" w14:textId="77777777" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
    <w:bookmarkStart w:name="myplace" w:id="0" />
    <w:bookmarkEnd w:id="0" />
  </w:p>
  <w:sectPr w:rsidR="000E0575">
    <w:pgSz w:w="11906" w:h="16838" />
    <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
    <w:cols w:space="708" />
    <w:docGrid w:linePitch="360" />
  </w:sectPr>
</w:body>

Perhaps interesting, the corresponding generated code:

using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;

namespace GeneratedCode
{
    public class GeneratedClass
    {
        // Creates an Body instance and adds its children.
        public Body GenerateBody()
        {
            Body body1 = new Body();

            Paragraph paragraph1 = new Paragraph(){ RsidParagraphAddition = "000E0575", RsidRunAdditionDefault = "000E0575", ParagraphId = "6AE70610", TextId = "77777777" };
            BookmarkStart bookmarkStart1 = new BookmarkStart(){ Name = "myplace", Id = "0" };
            BookmarkEnd bookmarkEnd1 = new BookmarkEnd(){ Id = "0" };

            paragraph1.Append(bookmarkStart1);
            paragraph1.Append(bookmarkEnd1);
            body1.Append(paragraph1);
            return body1;
        }


    }
}

Bookmark inside a run

I've also placed a bookmark using the same code with the curser inside a run:

<w:document>
  <w:body>
    <w:p w:rsidR="000E0575" w:rsidRDefault="00390F75" w14:paraId="6AE70610" w14:textId="1AE1D158">
      <w:r>
        <w:t xml:space="preserve">This is a bookmark </w:t>
      </w:r>
      <w:bookmarkStart w:name="myplace" w:id="0" />
      <w:bookmarkEnd w:id="0" />
      <w:r>
        <w:t>inside a run</w:t>
      </w:r>
    </w:p>
    <w:p w:rsidR="00390F75" w:rsidRDefault="00390F75" w14:paraId="6428D04B" w14:textId="77777777" />
    <w:sectPr w:rsidR="00390F75">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
      <w:cols w:space="708" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;

namespace GeneratedCode
{
    public class GeneratedClass
    {
        // Creates an Body instance and adds its children.
        public Body GenerateBody()
        {
            Body body1 = new Body();

            Paragraph paragraph1 = new Paragraph(){ RsidParagraphAddition = "000E0575", RsidRunAdditionDefault = "00390F75", ParagraphId = "6AE70610", TextId = "1AE1D158" };

            Run run1 = new Run();
            Text text1 = new Text(){ Space = SpaceProcessingModeValues.Preserve };
            text1.Text = "This is a bookmark ";

            run1.Append(text1);
            BookmarkStart bookmarkStart1 = new BookmarkStart(){ Name = "myplace", Id = "0" };
            BookmarkEnd bookmarkEnd1 = new BookmarkEnd(){ Id = "0" };

            Run run2 = new Run();
            Text text2 = new Text();
            text2.Text = "inside a run";

            run2.Append(text2);

            paragraph1.Append(run1);
            paragraph1.Append(bookmarkStart1);
            paragraph1.Append(bookmarkEnd1);
            paragraph1.Append(run2);
            Paragraph paragraph2 = new Paragraph(){ RsidParagraphAddition = "00390F75", RsidRunAdditionDefault = "00390F75", ParagraphId = "6428D04B", TextId = "77777777" };

            body1.Append(paragraph1);
            body1.Append(paragraph2);
            return body1;
        }


    }
}

Bookmark around specific text in run

(And also added a field to test whether the reference is also working correctly)

<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p w:rsidR="000E0575" w:rsidRDefault="00390F75" w14:paraId="6AE70610" w14:textId="5ABFDEE5" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
    <w:r>
      <w:t xml:space="preserve">This is a </w:t>
    </w:r>
    <w:bookmarkStart w:name="myplace" w:id="0" />
    <w:r>
      <w:t>bookmark inside</w:t>
    </w:r>
    <w:bookmarkEnd w:id="0" />
    <w:r>
      <w:t xml:space="preserve"> a run</w:t>
    </w:r>
  </w:p>
  <w:p w:rsidR="004C3C5C" w:rsidRDefault="004C3C5C" w14:paraId="0DD6D36E" w14:textId="0579ED72" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" />
  <w:p w:rsidR="004C3C5C" w:rsidRDefault="004C3C5C" w14:paraId="3196D0A8" w14:textId="17C6E3EF" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
    <w:r>
      <w:fldChar w:fldCharType="begin" />
    </w:r>
    <w:r>
      <w:instrText xml:space="preserve"> REF myplace \h </w:instrText>
    </w:r>
    <w:r>
      <w:fldChar w:fldCharType="separate" />
    </w:r>
    <w:r>
      <w:t>bookmark inside</w:t>
    </w:r>
    <w:r>
      <w:fldChar w:fldCharType="end" />
    </w:r>
  </w:p>
  <w:p w:rsidR="00390F75" w:rsidRDefault="00390F75" w14:paraId="6428D04B" w14:textId="77777777" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" />
  <w:sectPr w:rsidR="00390F75">
    <w:pgSz w:w="11906" w:h="16838" />
    <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
    <w:cols w:space="708" />
    <w:docGrid w:linePitch="360" />
  </w:sectPr>
</w:body>

@macpoupou
Copy link

Have you plan to merge it soon ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants