User talk:Jayvdb
[edit] Kah-ke-wa-quo-na-by pagemove
- Thanks, I believe it's at the right spot now. WilyD 15:55, 16 August 2008 (UTC)
[edit] Work for your bot (again)
Hi John, Again, I have some work for your bot: could you import the text to the pages of fr:Livre:Marchez pendant que vous avez la lumiere.djvu. Thanks, Yann 14:07, 17 August 2008 (UTC)
And also fr:Livre:Tolstoï - Correspondance inédite.djvu. ;o) Maybe it is better to wait until you have the bot flag. I think there is enough proof that your bot is useful. It would be a long job. Yann 16:14, 17 August 2008 (UTC)
[edit] Navigasi ID
I'm at a good pause-point on the id issues and have just been nosing about here, again. I was peeking at
and see that a number of the 'paragraphs' are not really; they're divs or center or u-elements; all very non-semantic. I also looked at
which is built differently; an ordered list. I expect that the original document was formatted as the one from the 7th was, with indented paragraphs. There really is no reason wikisource documents such as these can't be maintained in a cleaner source form and render in a more semantically meaningful form.
CSS can format an ordered list with the number inline and indented, as the images of the doc from May 7 show. Similarly, all true paragraphs of docs should actually be emitted as p-elements by MediaWiki and weird tricks largely avoided. Those divs are awful: they're not even closed.
The other thing I keep noticing is initial br-elements being generated as the first thing in a paragraph. This messes with the text-indent thing and is driven by the number of blanks in the wiki-text.
So, where would you like me to focus? Our chats from a few weeks ago are archived, but I can find them. I could make further centering examples… I think I'd like to stay-off navboxes for a bit. Cheers, Jack Merridew 11:09, 18 August 2008 (UTC)
- Improvements to German Instrument of Surrender (7 May 1945) will be very appreciated, as it is being evaluated as a Featured text. The other featured text candidates could also use your skills.
- If I was to point you at any specific work, I would suggest you work on a few pages of our Proofreading project with an eye to review the HTML/style aspects. Correspondence should go on a central discussion page like Index talk:Equitation.djvu. --John Vandenberg (chat) 11:47, 18 August 2008 (UTC)
- I'll have-at; already did with {{underline}}, which I just created. Seems to me that raw html should be avoided in wiki-markup where something else can do the trick. Templates such as the ones I just created can encapsulate the implementation rather than folks using disparate techniques that are hard-coded out there. And a bot can always smack stuff about in a big way. I've seen a few things yours has done; you feed it regular expressions and it hits large numbers of pages, right? Cheers, Jack Merridew 12:02, 18 August 2008 (UTC)
- I also prefer templates over raw HTML, especially as a way of avoiding CSS styles being defined inline.
- The bot hits large numbers of pages, either by category or by prefix, and performs replacements. John Vandenberg (chat) 12:38, 18 August 2008 (UTC)
←See this example;
- User:Jack Merridew/German Instrument of Surrender
- User:Jack Merridew/monobook.css
(take the first dozen or so lines)
The idea here is to use a list to get the numbers but to get the indented look of the original doc. It would also avoid the p-element issues if it were in the page space. In a lot of xhtml documents, lists are formatted to match paragraphs well, so having body text in li-elements is fine. The live surrender instrument has point 5 on the second page and it still has a nasty div trick to dodge the text-indent rule. If the rule I used were the applicable one in the page-space and presentation space, we'd be good. One thing I'm not sure if we can do with wiki-markup is to establish a hook on the ol-element; basically I'd like to set a class on it such as <ol class="olIndented">. Maybe a wrapper div would be needed, or a more complex selector. Rules like this certainly can't be the ambient one as most pages will want vanilla behavior. You mentioned floating an idea for document specific style sheet and that would be worth pursuing. Is there any support for this sort of thing in place, yet?
Does your bot run on many wikis? Something I've seen tons of is code like width="140px"; the 'px' is invalid there: it is correct in css, but not in xhtml attributes. This mistake exists hundreds of thousands of times all over the small wikis. Someone made this mistake 4ish years ago on en:wp and hundreds of wikis copied it endlessly. Mostly they're in infoboxes; td and table elements.
I created about 6 formatting templates. I ran into some name conflicts; such as with {{lowercase}} vs {{lc}}. I expect people will prefer the short-form for typing ease. Cheers, Jack Merridew 13:30, 18 August 2008 (UTC)
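As an illustration of the width="140px" cleanup described above, here is a minimal Python sketch of the kind of regex replacement a bot could apply. The function name `strip_px_units` and the choice to cover only `width` and `height` are assumptions made for this example; it is not the code any bot mentioned here actually runs.

```python
import re

# Matches width/height attributes whose value is digits followed by "px",
# e.g. width="140px", width='140px' or width=140px. The optional quote is
# captured so the same quote style can be written back.
PX_ATTR = re.compile(r'\b(width|height)\s*=\s*(["\']?)(\d+)px\2', re.IGNORECASE)


def strip_px_units(wikitext):
    """Drop the invalid 'px' suffix from width/height attribute values.

    Inline CSS such as style="width: 140px" is left alone, because there
    the property name is followed by a colon rather than an equals sign.
    """
    return PX_ATTR.sub(r'\1=\2\3\2', wikitext)


if __name__ == "__main__":
    print(strip_px_units('<td width="140px" style="width: 140px">'))
```

A bot would presumably run something like this over each page in a category or prefix and save only pages where the text changed.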
- I can see how User:Jack Merridew/German Instrument of Surrender would work, but that approach depends on adding CSS to Common.css, which isn't feasible in the short term. Can German Instrument of Surrender (7 May 1945) be changed to a numbered list without changes to Common.css? Perhaps using the literal HTML OL element rather than "#"?
- At present we can't add CSS to Common.css for every work, otherwise it would become very big. Additions to the global CSS need to be generic, so that they are features that can be used by many works. Perhaps we need a big discussion about this on WS:S, to investigate current CSS inline usage and enhance the global CSS to minimise inline CSS.
- Document specific style sheets haven't been widely discussed, and they would depend on a software enhancement, or perhaps a MediaWiki extension already exists.
- I don't have time to run a bot elsewhere. John Vandenberg (chat) 14:11, 18 August 2008 (UTC)
- I've just hard-coded with inline css. There's an issue with the list being split on two transcluded pages. Can you move the closing </ol> on the first page into a footer and the opening <ol ...> into the header on the second page? The idea would be to glue the list into one list when transcluded together. If not, it needs to be reverted. Cheers, Jack Merridew 14:33, 18 August 2008 (UTC)
- I've moved the close and reopen of the OL into footer/header respectively. I do that quite often to keep multiple pages of a Table of contents in the one wikitable. German Instrument of Surrender (7 May 1945) looks good. John Vandenberg (chat) 14:47, 18 August 2008 (UTC)
- I saw you do something like that before, so I took a go. It does leave the list looking funny on page two; people see a "1." when they're trying to get a "5.", so word needs to get out about tricks such as this.
- This whole notion of splitting pages up per the original printed pages is issue-rich. I can see the pluses, too. I wouldn't be surprised if a TOC spanning 4-5 pages is not uncommon; and indices even longer. I also thought a bit about a template for this sort of thing, but there's the page-spanning issue, again.
- Certainly pages such as common.css need to be generic enough for wide use. See my query above (now with the nowiki fixed). On things like an ol, the ol-element is generated; there's nothing in the wiki-text at all. So how would I specify a class (whatever style sheet it came from)? Another thing I've long thought is that wiki-markup needs a div shorthand. Hard-coding works, but it's rather anti-wiki. Cheers, Jack Merridew 15:03, 18 August 2008 (UTC)
[edit] London Gazette
Jefferyseow 10:33, 26 August 2008 (UTC)
[edit] Thanks
thanks for the welcome! vandenberg may be the finest spaceport in the galaxy other than palmachim! :) ... 5768altalena 19:07, 20 August 2008 (UTC)
[edit] How I build up a dictionary of all words into a work
I'm pushing it.wikisource at the "edge of chaos", as you know, it's precisely where some new ideas find their appropriate environment to grow out (many disappear after a short life... no problem! It's evolution, baby!).
One of my projects is a python routine that takes a text as input and gives an ordered list of all the different words of the text (a dictionary). I posted the output of this routine, as first examples, here: it:Il cavallarizzo/Dialogo 1/Voc and it:Il cavallarizzo/Dialogo 2/Voc. If you like, take a look too at my it:Template:?, I'm very proud of it, even if it's so simple. it:Il cavallarizzo is a very difficult equestrian work of the 16th century, written in old Italian, with only page images online, and any OCR is perfectly useless for reading it.
If you think that the very primitive and rough python code I wrote could be of any interest here, ask me for it. I'll translate the comments and I'll post the code here where you like (I think, I'll post it into my user page, then any one could move it where most useful). Please use my it user account to reply if you like!--Alex brollo 08:28, 21 August 2008 (UTC)
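For readers wondering what such a word-list routine might look like, here is a minimal Python sketch. It is not Alex brollo's code, which is not shown on this page; the function name `build_wordlist` and the rule that a "word" is any unbroken run of letters are assumptions made for illustration.

```python
import re

# A "word" is assumed to be any run of Unicode letters; digits, underscores,
# punctuation and apostrophes all act as separators.
WORD = re.compile(r"[^\W\d_]+")


def build_wordlist(text):
    """Return the distinct words of `text`, lowercased and sorted."""
    words = {match.lower() for match in WORD.findall(text)}
    # sorted() orders by code point, so accented letters sort after "z";
    # a locale-aware key would be needed for true dictionary order.
    return sorted(words)


if __name__ == "__main__":
    for word in build_wordlist("Il cavallo, il Cavallo e la città."):
        print(word)
```

The resulting list could then be compared against a spell-checker dictionary to find the old spellings it does not yet know.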
[edit] a geegaw to the left of paragraphs
Hey! Look, a marker parked at the top-left of a paragraph and hanging into the margin. This paragraph, which is explicitly formatted with a p-element, an id, and style attributes, is just a demo.
I used the id to get an image; for the pagespace paragraph marker, we would use something else specified in the style sheet, probably with about 6 pixels of white-space on top to better align with the first line of text. I used this id because the selector is 'loose' — it doesn't care what element uses it (no element to the left of the dot). The id invokes the following from main.css (the name the css is served as, I don't know where this is in the MediaWiki:namespace).
.link-document {
    background: url("document.png") center right no-repeat;
    padding-right: 12px;
}
So, I've got a background and then use the style attribute to smack a few things, like the position and undo the right padding (which is now wrong as I want the mark on the left). I also set a negative left margin to extend the paragraphs block to the left and pull the text back the same amount with padding-left; but the background image uses the space I grabbed in the margin.
This will have all the same issues as the text-indent messing with centering, so the selector has to be a lot more specific than the thing that's in place for the indent. I'm thinking a basic rule that does all this and then a bunch of special cases where it's undone; like when we're not an immediate child of whatever the div around the content area was called.
Jack Merridew 12:18, 21 August 2008 (UTC)
Skip the bit about top-padding; we can just offset it as needed; probably best to use em units for the top offset
Jack Merridew 12:31, 21 August 2008 (UTC)
- It's not working in the retarded browser and I believe it is only because the selectors for the IDs I'm using are more complex than the snippet I gave above (full given below). IE doesn't understand attribute selectors and is ignoring the whole thing as a result. So, this is just an artifact of the demo. Cheers, Jack Merridew 13:09, 21 August 2008 (UTC)
#bodyContent a.external[href $=".pdf"], #bodyContent a.external[href $=".PDF"],
#bodyContent a.external[href *=".pdf#"], #bodyContent a.external[href *=".PDF#"],
#bodyContent a.external[href *=".pdf?"], #bodyContent a.external[href *=".PDF?"],
.link-document {
    background: url("document.png") center right no-repeat;
    padding-right: 12px;
}
- Looks like you are getting somewhere. I'm not worried about IE for the proofreading interface in the Page: namespace; we can tell proofreaders to use Firefox if they want to see the nice paragraph markers. If need be, a JavaScript gadget could be used to provide the paragraph markers in the right spot. John Vandenberg (chat) 13:18, 21 August 2008 (UTC)
So we're looking at replacing;
body.ns-104 p {
    text-indent: 2em;
}
body.ns-104 .poem p {
    text-indent: 0em;
}
with;
body.ns-104 p
{
    background: url("paragraph-mark.png") 0 .3em no-repeat;
    margin-left: -12px;
    padding-left: 12px;
}
body.ns-104 center p,
body.ns-104 *.center p
{
    background-image: none;
    margin-left: 0;
    padding-left: 0;
}
(or)
body.ns-104 * p,
body.ns-104 * * p,
body.ns-104 * * * p,
body.ns-104 * * * * p,
body.ns-104 * * * * * p,
body.ns-104 * * * * * * p
{
    background-image: none;
    margin-left: 0;
    padding-left: 0;
}
The paragraph-mark.png image would need to be chosen and maybe a few values tweaked if it's not the same size as document.png.
If we blow-off IE6, we go with;
body.ns-104 div.pagetext > p
{
    background: url("paragraph-mark.png") 0 .3em no-repeat;
    margin-left: -12px;
    padding-left: 12px;
}
If somebody gets cute with styling paragraph elements, they could mess things up, but, hey, let'em. Cheers, Jack Merridew 13:42, 21 August 2008 (UTC)
- Looks good. I think it needs to be proposed on WS:S, especially as there are options. If you have an image ready, the floor is yours. John Vandenberg (chat) 13:48, 21 August 2008 (UTC)
; tidy the license, please. How about you start the thread and I'll chip in; you know the terminology better. Cheers, Jack Merridew 13:57, 21 August 2008 (UTC)
- OK. Can you upload that to Commons? Other Wikisource sub-domains may want to follow suit. John Vandenberg (chat) 14:00, 21 August 2008 (UTC)
- Silly me; I uploaded it here because I was thinking it was wikisource specific and the other languages did not occur to me. It's on Commons now; same name, and tagged, so please delete the local. Cheers, Jack Merridew 14:10, 21 August 2008 (UTC)
I've added comments and tweaked the code to also key off of #bodyContent; this is necessary for the child selector form and good form for the others. I think someone should pop this into place for a test and see what the issues are (if any). Cheers, Jack Merridew 14:49, 21 August 2008 (UTC)
Ah, wrong div; it should be the div with class="pagetext". I've refactored here and at ws:s, but have not been able to get this working via my local monobook.css. Something silly; a big beer ain't helping. We'll see what overnight brings. Cheers, Jack Merridew 15:20, 21 August 2008 (UTC)
I've proposed rolling this out on ws:s — no one's chipped-in; time to get moving. I've also suggested a new image; something basic such as '¶'. Cheers, Jack Merridew 08:57, 27 August 2008 (UTC)
- Agreed. I was just thinking today I should put it into effect so we can see the next step, but I got side tracked by the Koran. John Vandenberg (chat) 09:43, 27 August 2008 (UTC)
- You added this;
body.ns-104 p
{
    text-indent: 0em;
}
- but that was only to override the prior code, which you removed. Unless there's something else going on, it's not needed. The other override for poems was also only to undo the now-gone rule; i.e. I agree with its removal. Cheers, Jack Merridew 10:31, 27 August 2008 (UTC)
- Also, it's 'No IE6 Support'; IE7+ should do it fine. Cheers, Jack Merridew 10:34, 27 August 2008 (UTC)
[edit] Wind in the Willows (1913)
See these two edits; [1], [2]. I collapsed sequences of two blank lines down to one in the page headers. These double blanks were causing a br-element to be generated at the beginning of the first p-element on the page; these are p-elements that are actually the continuation of paragraphs from the prior page. Prior (ha) to the new paragraph marker technique, these blanks/br-elements caused the paragraph fragment to not indent which made it 'look' right. Now, it makes our paragraph mark float up a line. Either way, this has no impact on the presentation level; this is basically a bit of baggage from the old indent system. Be nice if your bot could smack all of these. Cheers, Jack Merridew 14:36, 27 August 2008 (UTC)
Oh, these appear on The Wind in the Willows/Chapter 7. Cheers, Jack Merridew 14:37, 27 August 2008 (UTC)
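A rough Python sketch of the cleanup requested above, in case it helps whoever writes the bot job. It assumes the page header is the first `<noinclude>…</noinclude>` block at the very start of the page text, which is a guess about how Page: pages are laid out rather than something confirmed in this thread; the name `collapse_header_blanks` is likewise made up for the example.

```python
import re

# Assumption: the header is the leading <noinclude>...</noinclude> block.
HEADER = re.compile(r'\A<noinclude>.*?</noinclude>', re.DOTALL)


def collapse_header_blanks(page_text):
    """Collapse runs of two or more blank lines in the header to one."""

    def fix(match):
        # Three or more consecutive newlines mean at least two blank lines;
        # two newlines leave exactly one blank line behind.
        return re.sub(r'\n{3,}', '\n\n', match.group(0))

    return HEADER.sub(fix, page_text, count=1)


if __name__ == "__main__":
    sample = "<noinclude>{{header}}\n\n\n</noinclude>continued text\n\n\nNext paragraph"
    print(repr(collapse_header_blanks(sample)))
```

Only the header is touched; blank lines in the body of the page are left exactly as the proofreader wrote them.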
[edit] MediaWiki space
I've been nosing about and noticed a nit; in MediaWiki:Loginend, the <br clear="both" /> should be changed; <br clear="all" /> or, and better, <br style="clear: both;" />. or consider using {{clear}}. Cheers, Jack Merridew 12:50, 21 August 2008 (UTC)
Done John Vandenberg (chat) 13:18, 21 August 2008 (UTC)
[edit] template wikisorcery
Dear John,
I looked at Template:hyphenated word start and I asked myself: "what's its purpose?" Forgive my ignorance, but it employs more text, implies another twin template, and is more difficult to remember than <noinclude>-</noinclude>. What am I missing? - εΔω 15:54, 21 August 2008 (UTC)
- When two pages are pushed together, there is whitespace between them, so the word is broken. The two templates address this because only one of them puts the word into the transcluded output; the other is silent except in the "Page" namespace. Both templates also show the complete word when the user hovers over it in the Page namespace. John Vandenberg (chat) 09:18, 22 August 2008 (UTC)
[edit] proofreading
While proof reading, am I supposed to fix a minor problem, such as adding a missing dash or remove a space that shouldn't be there? - Epousesquecido 02:40, 22 August 2008 (UTC)
- Yes, please make any improvements you can. John Vandenberg (chat) 04:09, 22 August 2008 (UTC)
[edit] J'accuse
Hi. I just tweaked this and thought I'd let you know that this and w:en:Dreyfus affair are the backdrop to w:en:Papillon (autobiography); not the anti-semitic aspect, but a theme of injustice by the supposedly great nation of France and outing the truth of w:en:Devil's Island. Currently reading w:en:The World Is Flat. Cheers, Jack Merridew 08:30, 22 August 2008 (UTC)
- Vote for it here: WS:FTC#J'accuse, or just leave a comment, or improve the note on J'accuse. --John Vandenberg (chat) 08:41, 22 August 2008 (UTC)
- FTC is where I found it; I've not finished reading it yet! I was thinking of moving it to "J'accuse…!" and possibly converting the horizontal rules to something more like the ornament in the original.
- re the paragraph marker; if no one comments in another day or so, are we going go ahead? Cheers, Jack Merridew 08:45, 22 August 2008 (UTC)
- Go for it.
- Yes, incrementally probably. John Vandenberg (chat) 08:48, 22 August 2008 (UTC)
[edit] Presenting...
The as complete as I could get it based on the first source, which I've now exhausted up to 1922-works of Author:Banjo Paterson! :-) (dare to compare) —Giggy 13:47, 22 August 2008 (UTC)
[edit] Redundant page
The page in question is redundant to Westminster Confession of Faith, the deleted page was nothing but redlinks. Kathleen.wright5 05:37, 23 August 2008 (UTC)
- I've changed the header and its now on his Author page. Kathleen.wright5 06:05, 23 August 2008 (UTC)
[edit] User:CanadaCitizen
You do know who that is, right? Cheers, Jack Merridew 12:57, 23 August 2008 (UTC)
- Of course. John Vandenberg (chat) 12:58, 23 August 2008 (UTC)
- I saw that global blocking has been rolled-out, so the fun may end for the naughty child. Cheers, Jack Merridew 13:08, 23 August 2008 (UTC)
- too rich. Cheers, Jack Merridew 13:10, 23 August 2008 (UTC)
[edit] OCR, with your bot, on fr.wikisource
Hi John,
Please, could you make some OCR with your bot on fr.wikisource:
- from fr:Page:Adam - Le Serpent noir (1905).djvu/14
- to fr:Page:Adam - Le Serpent noir (1905).djvu/418
Thank you :)
--LaosLos 16:03, 23 August 2008 (UTC)
The ThomasBot works again, there is no need of the OCR now :) --LaosLos 22:45, 23 August 2008 (UTC)
[edit] On the Vital Principle
Hi John,
I have created an index and copy-pasted the OCR from the Internet Archive. What must I do next? Do I ask your bot to put the text into the pages, or do I go on by hand? Is this page the place to ask? I am new to this on en.ws and a bit lost. Thanks for your help! --Zyephyrus 16:53, 23 August 2008 (UTC)
- We do not need the raw OCR posted onto Wikisource, so I have deleted the OCR. In future, you should add it to WS:TP#DJVU files with a text layer, but I have started the text upload for this one; when it is finished, it should be added to WS:TP#Projects needing to be proofread. --John Vandenberg (chat) 01:14, 24 August 2008 (UTC)
[edit] Holly
There's an Italian girl/woman (I don't know which; I only know her from the web) who is translating de Bussigny's "Equitation" into Italian in a Google doc, so I can assure you that she knows English and that she is willing and careful. I invited her to come here and validate Equitation, but I was very surprised to see that validation is nearly done! I'm really happy!
No matter... I hope that wikisource would (perhaps) gain a good user. I hope that she would like this exciting environment. I presume, she will use the same nick that she uses into my forum: Holly.
About my "python wordlist generator": I just learned how to enter into Firefox dictionaries and to edit them. So, the first use of my routine will be, to produce lists of old words to edit those dictionaries. I guess, you know everything about, nevertheless if you don't I'll be happy to share with you (and with any other wiki user) my "discoveries". --Alex brollo 21:01, 23 August 2008 (UTC)
- I have welcomed her, and would love to see what you come up with for the Firefox dictionary. I do know how to create those dictionaries, but I don't scale, so it is great that you're discovering these things. John Vandenberg (chat) 01:32, 24 August 2008 (UTC)
[edit] OCR bot
Hi John, I heard you have a bot for massive OCR; is it possible to have the code and run it without asking, or is it a toolserver thing we don't have access to? Thank you very much for your kindness. Cheers --Aubrey 21:56, 23 August 2008 (UTC)
- My bot does not do OCR; it uploads the pre-existing text layer that is held in a djvu file. The code is open source, and can be obtained by downloading m:pywikipedia. It is called "djvutext.py".
- User:ThomasV has an OCR bot, User:ThomasBot, however that is a toolserver thing and can only be invoked from a Wikisource project.
- Cheers, John Vandenberg (chat) 01:20, 24 August 2008 (UTC)
- Ok, I think I confused two things. On Italian Wikisource we have the OCR button for the Page namespace, but I was wondering if there is some automatic procedure (like a bot) to push a whole book into the OCR queue without doing it by hand... Thank you anyway --Aubrey 19:57, 24 August 2008 (UTC)
[edit] Capitalization in Henley's "Invictus"
The Wikipedia source text of Henley's "Invictus" shows Circumstance and Chance capitalized - is this correct? I think capitalization often suggests a metaphorical reference to God, dramatically altering the interpretation of the poem. Most renderings show lowercase - which is faithful to the original?
Thanks, gary merkel
- This is a great question; thanks for coming over and asking on Wikisource.
- The Wikipedia page w:Invictus is wrong now, but it once was good (how often do you hear that?)
- The Wikisource page Invictus doesn't show Circumstance and Chance as capitalised, for good reason. Our edition was transcribed from Committed to Memory, and Google Books shows that circumstance and chance are not capitalised in that edition. The edition does show Pit as capitalised, and horror as lower case.
- I haven't quickly put my hands on scans of the original, but here it is in a PD anthology: Page:Oxford Book of English Verse 1250-1900.djvu/1045, and it appears unchanged in a new edition of the same: Page:Oxford Book of English Verse 1250-1918.djvu/1043 and Page:Oxford Book of English Verse 1250-1918.djvu/1044. (if you see red links here, you can still click them to see what I mean..)
- The Wikisource page Invictus now is accurate to one fixed published edition.
- I recommend removing the text of the poem from Wikipedia, as it is of unknown provenance, and many edits have messed up what was once good. If you replace it with another good version, it won't be long before someone thinks that the capitalisation which appears in a different published edition should be used.
- Thank you for the very interesting query. John Vandenberg (chat) 04:40, 24 August 2008 (UTC)
[edit] Equitation validation has been done!
I can't believe it... the validation of Index:Equitation.djvu has been finished! I'm so happy (and proud)! Thank you, and thanks to all the friends who did that work!
I hope Holly will find something to do here.... I gave her a couple of suggestions ;-). Let's see if she finds this strange wikisource environment as exciting as I did. --Alex brollo 19:22, 24 August 2008 (UTC)
[edit] ManOfAllah/Qassim
Searching Qassim on the DloI I come up with the two books you list, but trying to view the pagescans in HTML, TXT or TIFF (PDF and GIF don't seem to work) both bring up a Forestry book from the 1960s; I assume you had the same trouble with both books? Sherurcij Collaboration of the Week: Author:Charles Spurgeon 20:30, 25 August 2008 (UTC)
- I did see the Forestry book. While we are bitching about Indian resources, http://www.isical.ac.in/~library/ has a "Web based library catalog is now available" and the link is to http://192.168.54.38/ *cry*.
- I was able to see pagescans of one book, and where I have said in my COPYVIO blurb that there is a 1930s dated preface I gave a link to the pagescan. John Vandenberg (chat) 20:35, 25 August 2008 (UTC)
[edit] Re Statute of Anne
It was Eclecticology that noticed it was a duplicate, see - Wikisource:Scriptorium#Category:Copyright_law Kathleen.wright5 05:01, 26 August 2008 (UTC)
[edit] Cinderella
Hi, could you please help me with something? I found another short, nice book to work on, but I can't download the *.djvu for some reason. I can only download it as a Firefox Document. Could you please download it from here, and put it on Commons under the name Image:Cinderella (1865).djvu? diego_pmc 05:53, 26 August 2008 (UTC)
- To download, you need to click on "HTTP" on the left, find the djvu file in the list of files, right-click and download.
- If that doesn't work for you, keep the requests coming, as my bot does all the work for me: Index:Cinderella (1865).djvu --John Vandenberg (chat) 06:34, 26 August 2008 (UTC)
[edit] The Marriage of Heaven and Hell
- Moved to User talk:Jack Merridew#The Marriage of Heaven and Hell to keep the discussion together. John Vandenberg (chat) 08:21, 26 August 2008 (UTC)
- look now — 25, 26, 27, too. there has to be a more convenient way to edit the header and footer. Cheers, Jack Merridew 11:54, 27 August 2008 (UTC)
[edit] Template:Page
It seems that this template often doesn't get the 'num' arg passed to it; the Blake page, for example. This is causing the default id of pr_position being assigned to every little positioned page link along the left; and IDs are supposed to be unique on a page. Naughty, naughty. Seems to me that the first bit below (from first line of template code) should be changed to something along the lines of the second. In these circumstances I don't think the IDs are needed at all, so it would be best to omit them entirely.
- id="{{{num|pr_position}}}"
- {{#if:{{{num}}}|id="{{{num}}}"}}
Cheers, Jack Merridew 08:12, 26 August 2008 (UTC)
- Oh, id="pr_page" is being duplicated, too; not sure what it's for. Cheers, Jack Merridew 08:15, 26 August 2008 (UTC)
- They are needed by MediaWiki:Common.js, {{Option}}, and they are very brittle and could do with a ground up redesign. John Vandenberg (chat) 08:20, 26 August 2008 (UTC)
- That and the warning to not cavalierly edit the template is why I asked. When you get to that validator bot, hook it up to not allow saving edits that break validation. See these; [3] [4] [5]. Cheers, Jack Merridew 08:40, 26 August 2008 (UTC)
- Ha! I have all three installed, at work! :-) The bot will need to be requested at WS:BOTR, as I can't promise to be the one to tackle that. A dev mentioned that the sitenotice problem is unlikely to be fixed with a high priority, especially if a bug isn't raised on bugzilla:. --John Vandenberg (chat) 08:54, 26 August 2008 (UTC)
- mebbe you need to pass those three links along to a few devs. oh well. Cheers, Jack Merridew 09:04, 26 August 2008 (UTC)
- and saw this? Jack Merridew 09:07, 26 August 2008 (UTC)
- I'm not 100% sure about prefixing the anchors with "a" - it will break any existing incoming links from other wikis and the internet, but the problem it fixes is important. We need a guideline on anchor names. John Vandenberg (chat) 09:14, 26 August 2008 (UTC)
- Another job for the before-you-save validator. I used 'a' for 'anchor'. On the Queensland page, I made up nice names, but SA had 150ish. I do most of that sort of thing with regex s&r. Any inbound links will still land on the same page. I've used 'n' occasionally (number). People change section headings all the time without realizing they're breaking links; and I've seen it done deliberately in a few cases. Cheers, Jack Merridew 09:30, 26 August 2008 (UTC)
I don't think we can require validation on save; the tricks being used on The Marriage of Heaven and Hell require invalid HTML on one page in order to have valid HTML on another. We can have a bot to check the validity of pages that are "finished". John Vandenberg (chat) 12:04, 27 August 2008 (UTC)
- [butting in] Whatever tags you need to add to validate the page can be put inside noinclude tags, so that the transcluded pages validate too. Hesperian 14:14, 27 August 2008 (UTC)
- We're discussing a deprecated 'start' attribute for ol-elements; and they are in the noinclude at the top. The invalid nature is only on the individual page, not the presentation page. The W3C validator is not picking-up on the deprecated attribute; somewhere along the line we'll get a warning. See post below with same timestamp. Cheers, Jack Merridew 14:52, 27 August 2008 (UTC)
- I thought of that as I was using the evil start attribute on the ol-element; it means this is a hack, not a truly good technique. I expect that start was booted because it is viewed as an encapsulation violation; messing with the internals of the list. It's not failing outright;[6] the tool's not sensing the deprecated attribute (but it may someday). Local validator is not squawking either. Cheers, Jack Merridew 12:21, 27 August 2008 (UTC)
The page template is causing a ton of validation errors when there is a page number; it's creating ids that are the number. These will have to change at some point and the easiest/cleanest change is to prefix the number with 'p', which will break whatever inbound links there are to the page anchors. An icky option would be to change over to the 'old' a-elements with a 'name' attribute, which do allow the first character (or all) to be a digit. Cheers, (or tears) Jack Merridew 14:52, 27 August 2008 (UTC)
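To make the 'p' prefix idea concrete, here is a small Python sketch of the rewrite applied to already-rendered HTML. It only illustrates the transformation; the function name `prefix_numeric_ids` is hypothetical, and a real fix would presumably be made in the template itself rather than by post-processing output.

```python
import re

# Matches id attributes whose value starts with a digit, e.g. id="12".
# The lookbehind avoids touching attributes that merely end in "id",
# such as data-id or uuid.
NUMERIC_ID = re.compile(r'(?<![\w-])id="(\d[^"]*)"')


def prefix_numeric_ids(html, prefix="p"):
    """Prefix digit-leading id values so they begin with a letter."""
    return NUMERIC_ID.sub(lambda m: 'id="{}{}"'.format(prefix, m.group(1)), html)


if __name__ == "__main__":
    print(prefix_numeric_ids('<span id="12">12</span> <span id="pr_page">x</span>'))
```

Any link that pointed at the old anchor (for example #12) would then need the same prefix (#p12), which is the breakage discussed above.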
[edit] Template:Hyphenated word
I'm missing the point of this. For words broken across the page, I get it, but for ones within a page, why bother preserving this artifact of the original layout? There is such a hyphenated word on Page:The works of Horace - Christopher Smart.djvu/13, and when I worked on that page, I simply removed the hyphen. There are probably a few other pages I've done this on; sorry. Cheers, Jack Merridew 15:07, 27 August 2008 (UTC)
- Preserving the original line breaks makes a text much easier to proofread. The Page namespace should display these line breaks, so that proofreaders can quickly compare text and image in a line by line process. I thought I had raised this on WS:S but couldn't see it quickly just now. John Vandenberg (chat) 23:39, 27 August 2008 (UTC)
- Hmm… I can see that, sort of. It would seem to imply that you expect all of the original line-breaks to be preserved. Most of the pages I've seen don't do that; I've glued consecutive lines together by replacing newlines with a space. This is pretty standard stuff with wiki-text. Not doing this rather makes some assumptions about the user's screen width. If I edit Page:The works of Horace - Christopher Smart.djvu/11 (which has the original line-breaks intact) with the browser window at a less than large size, the line-breaks in the editbox look terrible; merging the lines and letting them wrap as appropriate makes the editbox a lot more readable. And none of this is respected when looking at the preview, which is what editors really should be checking. I've found proofreading tedious because I have to scroll up and down to compare the preview with the scan. From this perspective, the scan should be displayed alongside the preview, not the editbox. Cheers, Jack Merridew 06:44, 28 August 2008 (UTC)
- I don't expect much at the moment, except that we are starting to put our thinking caps on. The thread "getting my wikisource bearings" explains my experimentation so far. It sounds like you have come to the same conclusion as I have: it is the preview that needs to be different, and I threw some ideas about that in the WS:S(2008-08)#retain line breaks in the proofreading output thread I was looking for earlier. John Vandenberg (chat) 09:15, 28 August 2008 (UTC)
- I'm skeptical that bending over backward to preserve the original line-breaks is a good idea. The span wrapping you mentioned there is doable, but massively snots-up the wiki-text. I'm thinking you would want to have scripts rewrite the UI to hide them in the edit box? — I'm cringing at the thought. I make most non-trivial edits in an external editor; many do; you must. And all the gore will be right there in front of you. The whole page space is a tool to compare with the scan. Another tool, used late in the game, could strip such spans out and glue-up the lines, but that would rather preclude going back for further proofreading (unless tool2 forked the content, which has its own issues). I don't see the page space as really ever dispensable; that's where further edits will be made. See my query above at #Wind in the Willows (1913); if I'm right, thousands of pages will need to have the double line-breaks in the header removed. I've done some more; Horace/13, for example — stuck the {{hw}} in, too. Comment before I do too many more. I'm not sure where to begin with that gmane thread, but will scan it further. Cheers. Jack Merridew 10:29, 28 August 2008 (UTC)
- Ah! I hadn't thought of a bot creating and maintaining the mainspace pages. We could lock those pages to prevent people editing the main space. That would allow us to break free of the limitations that transclusion imposes, and add "Page:" semantics that the bot understands. John Vandenberg (chat) 10:39, 28 August 2008 (UTC)
- So you like the content forking idea. Would this be rather like what de:wp does with their vetted pages (not quite the term they use, but pages are 'draft' until some admin approves the edits)? Cheers, (before we ec again) Jack Merridew 10:53, 28 August 2008 (UTC)
See this, based on Page:The works of Horace - Christopher Smart.djvu/11;
In the present edition of Smart's Horace, the trans{{{3}}} has been revised wherever it seemed capable of being rendered closer and more accurate. Orelli's text has been generally followed, and a considerable number of useful annotations, selected from the best commentaries, ancient and modern, have been added. Several quotations from Hurd on the "Ars Poetica," though somewhat lengthy, have been introduced, as their admirable taste can not but render them accept{{{3}}} to readers of every class.
look at it in the editbox; assume that the p-elements came from MediaWiki and the style for the span from a style sheet, but that the spans really were in the wiki-text. 1) ick. 2) this has hopelessly broken the {{hw}}. Mebbe a pair of templates could be used. There's also the narrow screen user to consider. If things do go anywhere near this, avoid span; use some rare element to avoid selectors matching all the ones on things like {{sc}}. Cheers, Jack Merridew 10:47, 28 August 2008 (UTC)
- I think you are assuming that the span tags would be added before the templates are called. For starters, I am pretty sure that the templates are evaluated first, and then the HTML is added. But I don't care whether the templates or the HTML conversion happens first at the moment: if we are going to get a dev to build something specifically for our Page: namespace, they are going to have to do it in the way that works for our needs. The main issue is that we don't yet know our needs; we need to think it through and come up with something that works. When I posted that idea to WS:S, I was thinking that the templates must go first, and then the span tags, which means the first two lines are:
In the present edition of Smart's Horace, the trans-
lation has been revised wherever it seemed capable of
- That approach may have other problems. I'm not sure.
- Fixing our proofreading so that the interface is as usable as the w:Distributed Proofreaders interface is important enough that I'd develop it myself, or put a bounty on it for someone else to do it. John Vandenberg (chat) 11:07, 28 August 2008 (UTC)
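A rough sketch of the "templates first, then spans" order discussed above, assuming the template expansion has already happened and the input is plain text with the original line breaks intact. The function `wrap_lines` and the class name `line` are hypothetical; nothing here reflects an existing MediaWiki feature or extension.

```python
from html import escape


def wrap_lines(expanded_text: str, css_class: str = "line") -> str:
    """Wrap each non-empty line of already-expanded text in a span (hypothetical)."""
    spans = []
    for line in expanded_text.splitlines():
        stripped = line.strip()
        if stripped:
            # Escape &, < and > so the line text cannot break the markup.
            body = escape(stripped, quote=False)
            spans.append(f'<span class="{css_class}">{body}</span>')
    return "\n".join(spans)


first_two = "the trans-\nlation has been revised wherever it seemed capable of"
print(wrap_lines(first_two))
```

A style rule along the lines of `span.line { display: block; }`, applied only in the proofreading view, would then show the original line breaks without keeping any of the spans in the wiki-text.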
- Ya, I was assuming you meant that the spans would be permanently in the wiki-text. You're talking about a later phase; Steve Sanbeg made mention of this to me on ws:s and I'd like to read more about this; got a link? You mentioned this to me once, too. Something along the lines of MediaWiki massaging the wiki-text on its way to web pages. Obviously it's generating a lot of code, but the idea of it running transforms and doing replacements is interesting.
- re rare elements, even post template processing, there may be other spans present; hard-coded in the wiki-text, generated for some other reason. So using some weird tag might be better; a class would help assure correct selection of what to apply block to, but bulks-up the code. Possibly a normally-block tag could be used, which would obviate the need of a css rule that targeted the element.
- I just created The works of Horace, which needs a bit of TLC; a bit premature, I know.
- Cheers, Jack Merridew 11:44, 28 August 2008 (UTC)
- This email is where I was at when I was posting WS:S(2008-08)#retain line breaks in the proofreading output (and just earlier I realised I had sent the list the wrong URL; no wonder there was no follow up:/) The wiki text eventually comes out as HTML. We can build extensions to do this transformation any way we like. At some point on the gutvol-d list we discussed using our own tag (i.e. not an HTML tag) to mark the start and end of a line, and that has also been discussed on wikisource-l. No need to go looking; there wasn't much to those discussions, and I'll prolly remember where I filed them while I am asleep :-). Basically you are independently confirming what most of us have been thinking, which is not bad for someone without many mainspace edits :-) The more difficult part is doing it, esp. as that probably requires improvements to either core MediaWiki code, or an extension. John Vandenberg (chat) 13:55, 28 August 2008 (UTC)
- I considered mentioning a custom xmlish tag, but we've enough validator issues. Good to see you saw the narrow window issue (or that I saw it, too.) See w:Microformats; basically a span with a class; pick its name well. fyi, I ran the tool you used on Giggy's RfA and I didn't have edits to the 25 mainspace pages it cuts off at. I ran it for id:wp, too, and found out that 40% of my edits there are to templates. Cheers, Jack Merridew 14:21, 28 August 2008 (UTC)
History of Iowa OCR
It looks like the proofread text here was more incomplete than I thought. Could you please upload the OCR text of leafs 591–660 of Index:History of Iowa From the Earliest Times to the Beginning of the Twentieth Century Volume 3.djvu? Psychless 22:55, 28 August 2008 (UTC)
- This is in progress now. John Vandenberg (chat) 23:54, 28 August 2008 (UTC)
Links to Australian Copyright Act 1968 and Copyright Amendment (Digital Agenda) Act 2000
I've just posted links to these acts at Wikisource:Scriptorium#Announcements. Kathleen.wright5 02:05, 29 August 2008 (UTC)

