-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Feature/bookmarks #539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feature/bookmarks #539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Benjamin-T I've left some explanatory comments where the merged commits differ from those submitted with your PR.
|
||
* ? Do bookmarks need to be unique across all stories? (like headers, footers, | ||
etc.)? This could be trouble for us because we don't yet have access to | ||
those "stories". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple new open questions here; this one in particular I think will be important for us to bear in mind as we go forward. I expect we should design with the prospect of Bookmarks being across all stories in the document rather than just the main document body, and I think this feature is going to be vulnerable to repair-errors for that reason.
@@ -10,6 +10,7 @@ Feature Analysis | |||
.. toctree:: | |||
:titlesonly: | |||
|
|||
features/bookmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally place new analysis documents at the top of the list rather than try to find a logical ordering. This makes them easy to find in the most likely case, which is when you're actively working on the feature.
@wip | ||
Scenario: Document.bookmarks | ||
Given a Document object as document | ||
Then document.bookmarks is a Bookmarks object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project's style for acceptance tests has evolved somewhat since many of these were written. This one reflects the current style, which adopts the perspective that the "user" in these user acceptance tests is a developer using the API. So the scenario name is usually terse like this, often {object}.{method or property}, sometimes with "getter" or "setter" appended when it's two aspects of a property.
The given statement in this style mentions the "variable" name used for the object in an "as" clause at the end.
This test is quite simplified from your original version. This reflects that there are three "levels" or "steps" that are tested, each individually:
-
A document object has a
.bookmarks
property that returns (unconditionally and idempotently in this case) aBookmarks
object. -
A Bookmarks object has certain behaviors (len, indexed access, interable, etc.)
-
An individual Bookmark object has certain behaviors.
These are each tested separately and not mixed in one test.
features/steps/document.py
Outdated
@@ -27,6 +27,11 @@ def given_a_blank_document(context): | |||
context.document = Document(test_docx('doc-word-default-blank')) | |||
|
|||
|
|||
@given('a Document object as document') | |||
def given_a_Document_object_as_document(context): | |||
context.document = Document(test_docx('doc-default')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a special document file yet, and better that we don't introduce one for this because bookmarks should work even when the document has none. We'll add your test file in a later commit.
features/steps/document.py
Outdated
bookmarks = context.document.bookmarks | ||
actual = bookmarks.__class__.__name__ | ||
expected = 'Bookmarks' | ||
assert actual == expected, 'bookmarks is a %s object' % actual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it may seem a trivial test, all we want to do here is establish we got a Bookmarks object on this call. Any particular behaviors of that bookmarks object is for another test. These "..is an X object" tests are quite effective at driving (and defending) the top-level behaviors they're designed to test.
from docx.enum.section import WD_SECTION | ||
from docx.enum.text import WD_BREAK | ||
from docx.section import Section, Sections | ||
from docx.shared import ElementProxy, Emu, lazyproperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past I used relative imports (e.g. '.' and '..' prefixes) in this project, but have since determined this is not best practice. So whenever I have to add an import I fix the others in that module. Note they are sorted in alphabetical order at the top of the module and items within the line as well. This makes the position for a new import predictable and easy to review.
docx/document.py
Outdated
Conflicts may arise if a bookmark is added with the same name as one | ||
appearing in another story such as a header or footer etc. | ||
""" | ||
return self._body.bookmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the multi-story scope of bookmarks, I'm inclined to delegate ownership of the actual Bookmarks object to _Body, like .paragraphs
does with those. Eventually this Document
-level bookmarks may become an "aggregate" object that contains bookmarks from all stories in the document; so this seam will allow us to add that later (and incrementally) with a minimum of disruption.
docx/document.py
Outdated
@lazyproperty | ||
def bookmarks(self): | ||
"""|Bookmarks| object providing access to main story bookmarks.""" | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When needed for mocking purposes, a new method/property is added as a "stub". It nevertheless always has a complete docstring (that is formatted according to PEP 257). Its only line is raise NotImplementedError
which causes the acceptance test to fail at just the right spot when it's next in line for implementation.
|
||
bookmarks = document.bookmarks | ||
|
||
assert bookmarks is bookmarks_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past, I used to always have a fixture for each test. Now I only use a test fixture when there are multiple cases (i.e. fixture needs params=[]
). It seems like it makes it easier to have all the fixture-making right there with the test when that's possible. This doesn't apply to fixture components though (single mocks signified by a trailing underscore ('_') in their name. Those are just used directly by the test rather that a fixture.
@@ -291,12 +306,12 @@ def body_(self, request): | |||
return instance_mock(request, _Body) | |||
|
|||
@pytest.fixture | |||
def _block_width_prop_(self, request): | |||
return property_mock(request, Document, '_block_width') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maintain fixture components in strict alphabetical order by name. This one was out of order, which explains this change not directly related to this test. Keeping them alphabetical speeds finding the one you're after when scanning the source, especially when code-folding and search aren't handy.
e092fee
to
e272749
Compare
e272749
to
cd41998
Compare
2d6c913
to
52f4285
Compare
@scanny |
f5f452b
to
794af7d
Compare
@Benjamin-T Hi Benjamin, I did some work on this over the weekend:
Please review this to understand the changes I made and why, then rebase your branch on my Then I'll take another look. |
@scanny |
Dear @scanny I've rebased my branch and added some commits. I've also updated the branch with an additional acceptance test. If you could take a look and see if they can be merged that would be much appreciated. (Note that TravisCI says the branch is failing, but it really only is the python 3.4 test which is crashing)
The last commit I haven't changed it in my current branch in order to make it more obvious what I mean. Curious about your thoughts on the matter. regards , Ben |
There are some pending commits in my feature branch. Please let me know when you've looked at them so we can move forward. thanks for time Ben |
e140108
to
f7b5c42
Compare
Hi Steve, I somehow missed your new commits to this PR - Great stuff! I've experimented a bit with it and discovered that even though the bookmarks added do show up in the word editor, they are somewhat different. The main issue is that python-docx places them at the same level as the paragraph elements, whereas the word editor inserts the bookmark elements on a paragraph level. I've checked this for the body and header stories and assume that this holds for the other story-parts as well. The current implementation results in this xml: from docx import Document
doc = Document()
bmk = doc.start_bookmark('test')
doc.end_bookmark(bmk)
doc.save('test_bookmark.docx') <w:document>
<w:body>
<w:bookmarkStart w:name="test" w:id="1" />
<w:bookmarkEnd w:id="1" />
</w:body>
</w:document> Whereas the word editor creates this xml: <w:document>
<w:body>
<w:p>
<w:bookmarkStart w:name="test" w:id="0" />
<w:bookmarkEnd w:id="0" />
</w:p>
</w:body>
</w:document> Edit: An even more peculiar thing happens when two body-level bookmarks are added: The first bookmark can be referred to whereas the latter leads to the same error message. My conclusion is that the word-editor tries to prevent these self-reference errors by raising the bookmark elements to the paragraph level. What would your suggestion be? Continue with this compliant implementation which gets updated upon save by the word editor or only add bookmarks in paragraph elements. Or am I missing a point here? My suggestion would be to only implement the bookmarks on paragraph level. Which would either require the user to first add a paragraph before being able to add a bookmark, or provide a convenience function on document level which adds a paragraph element before adding the bookmark to it. For reference: |
Adding .__iter__() is not strictly required to enable iteration, but it improves the performance of iteration significantly by avoiding the default implementation which would repeatedly parse the bookmark pairs in the document.
* Sequence is now in `collections.abc` and will be removed from `collections` in 3.8. * zip() returns an iterator in Python 3 and must be wrapped with something like list() or tuple() to be realized. * Remove support for 2.6. * Add tox tests for 3.6 and 3.7. * Update .travis.yml to match tox versions.
@Benjamin-T I think the safest bet is to determine what the Microsoft API for Word does and reproduce that behavior. Because details like this are not documented (publicly at least), the only good assurance I've found is to duplicate the MS API behavior, which is pretty good assurance in my experience, although sometimes a mystery why it needs to be the way it is. This is an example of why the analysis document is so important, in particular doing the thorough analysis work needed to properly create that document. I of course learned that the hard way, by reimplementing several features early on, but this is definitely a "teaching moment" opportunity :) There are a couple ways to experiment with the Microsoft API, but they all require Windows as far as I know. In general I think using VBA right from within Word is the easiest method. I used to keep a Windows virtual machine around (on my Mac) to do that sort of thing but don't have it anymore. |
Dear @scanny Thanks for the response, I've run a test using the Word API: From this I think we can conclude that the 'correct' place of the bookmark elements is paragraph level. Sub Mark()
ActiveDocument.Bookmarks.Add Name:="mark"
End Sub Bookmark at start of documentResults in: <w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p w:rsidR="000E0575" w:rsidRDefault="000E0575" w14:paraId="6AE70610" w14:textId="77777777" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
<w:bookmarkStart w:name="myplace" w:id="0" />
<w:bookmarkEnd w:id="0" />
</w:p>
<w:sectPr w:rsidR="000E0575">
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body> Perhaps interesting, the corresponding generated code: using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
namespace GeneratedCode
{
public class GeneratedClass
{
// Creates an Body instance and adds its children.
public Body GenerateBody()
{
Body body1 = new Body();
Paragraph paragraph1 = new Paragraph(){ RsidParagraphAddition = "000E0575", RsidRunAdditionDefault = "000E0575", ParagraphId = "6AE70610", TextId = "77777777" };
BookmarkStart bookmarkStart1 = new BookmarkStart(){ Name = "myplace", Id = "0" };
BookmarkEnd bookmarkEnd1 = new BookmarkEnd(){ Id = "0" };
paragraph1.Append(bookmarkStart1);
paragraph1.Append(bookmarkEnd1);
body1.Append(paragraph1);
return body1;
}
}
}
Bookmark inside a runI've also placed a bookmark using the same code with the curser inside a run: <w:document>
<w:body>
<w:p w:rsidR="000E0575" w:rsidRDefault="00390F75" w14:paraId="6AE70610" w14:textId="1AE1D158">
<w:r>
<w:t xml:space="preserve">This is a bookmark </w:t>
</w:r>
<w:bookmarkStart w:name="myplace" w:id="0" />
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t>inside a run</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00390F75" w:rsidRDefault="00390F75" w14:paraId="6428D04B" w14:textId="77777777" />
<w:sectPr w:rsidR="00390F75">
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
</w:document> using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
namespace GeneratedCode
{
public class GeneratedClass
{
// Creates an Body instance and adds its children.
public Body GenerateBody()
{
Body body1 = new Body();
Paragraph paragraph1 = new Paragraph(){ RsidParagraphAddition = "000E0575", RsidRunAdditionDefault = "00390F75", ParagraphId = "6AE70610", TextId = "1AE1D158" };
Run run1 = new Run();
Text text1 = new Text(){ Space = SpaceProcessingModeValues.Preserve };
text1.Text = "This is a bookmark ";
run1.Append(text1);
BookmarkStart bookmarkStart1 = new BookmarkStart(){ Name = "myplace", Id = "0" };
BookmarkEnd bookmarkEnd1 = new BookmarkEnd(){ Id = "0" };
Run run2 = new Run();
Text text2 = new Text();
text2.Text = "inside a run";
run2.Append(text2);
paragraph1.Append(run1);
paragraph1.Append(bookmarkStart1);
paragraph1.Append(bookmarkEnd1);
paragraph1.Append(run2);
Paragraph paragraph2 = new Paragraph(){ RsidParagraphAddition = "00390F75", RsidRunAdditionDefault = "00390F75", ParagraphId = "6428D04B", TextId = "77777777" };
body1.Append(paragraph1);
body1.Append(paragraph2);
return body1;
}
}
} Bookmark around specific text in run(And also added a field to test whether the reference is also working correctly) <w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p w:rsidR="000E0575" w:rsidRDefault="00390F75" w14:paraId="6AE70610" w14:textId="5ABFDEE5" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
<w:r>
<w:t xml:space="preserve">This is a </w:t>
</w:r>
<w:bookmarkStart w:name="myplace" w:id="0" />
<w:r>
<w:t>bookmark inside</w:t>
</w:r>
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t xml:space="preserve"> a run</w:t>
</w:r>
</w:p>
<w:p w:rsidR="004C3C5C" w:rsidRDefault="004C3C5C" w14:paraId="0DD6D36E" w14:textId="0579ED72" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" />
<w:p w:rsidR="004C3C5C" w:rsidRDefault="004C3C5C" w14:paraId="3196D0A8" w14:textId="17C6E3EF" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
<w:r>
<w:fldChar w:fldCharType="begin" />
</w:r>
<w:r>
<w:instrText xml:space="preserve"> REF myplace \h </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate" />
</w:r>
<w:r>
<w:t>bookmark inside</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end" />
</w:r>
</w:p>
<w:p w:rsidR="00390F75" w:rsidRDefault="00390F75" w14:paraId="6428D04B" w14:textId="77777777" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" />
<w:sectPr w:rsidR="00390F75">
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body> |
Have you plan to merge it soon ? |
This PR provides a place for me to comment on differences between merged commits and those submitted on PR #445.