Wikipedia:WikiProject Molecular and Cellular Biology/Proposals 

Proposals
This is an appropriate place to make and discuss proposals with other project members.
Discuss proposals concerning the Molecular and Cellular Biology Wikiproject here.

Wikipedia, Pfam/Interpro and SMART

Most of you know about Pfam/Interpro, which provides brief but very systematic annotations (short summaries) for different protein families, and also about SMART, which does the same for different protein domains. These summaries are at the level of "stubs" or better. I understand that Pfam/Interpro and SMART operate under the same "open access" policy as Wikipedia, which means that everyone can copy and modify the content. It would be possible to identify a set of the most important protein families and domains that are missing in Wikipedia but present in Interpro and SMART, and copy their summaries as the initial Wikipedia "stubs" with a reference and link to the corresponding Interpro or SMART entries. We could also ask people from Interpro and SMART what they think about such an idea, and they might even be willing to help. Biophys 17:04, 16 November 2006 (UTC)

See the list of SMART domains: [1]. Few of them can be found in Wikipedia. I think the summaries can be downloaded to Wikipedia automatically, but it is important to have consent from the SMART authors. Of course, the idea is to improve these short summaries in the future. Biophys 17:43, 16 November 2006 (UTC)
SMART uses annotation from InterPro which contains copyrighted information such as PROSITE annotation. I have e-mailed Pfam and asked about the copyright status of their database. TimVickers 19:42, 16 November 2006 (UTC)
Their reply was as follows:

Hi Tim,

Pfam is distributed under the terms of the GNU GPL license. According to that license any derivatives should also be distributed under GNU GPL. However, we tend to take a pragmatic view for small parts of the data to make Pfam maximally useful. Do you have an example of the kind of info you would take from Pfam?

Pfam is really a database of protein family annotations rather than for individual proteins. We would certainly be interested in providing links etc and whatever information we can.

Yours sincerely Alex Bateman

Good. If I understand correctly, Wikipedia operates under a GNU license. What I mean is this. For example, Wikipedia has no article about C2 domains. I would go to the SMART C2 domain annotation: [2], copy the annotation, maybe modify this annotation (but maybe not), make internal Wikipedia references within the annotation, and provide this link to SMART [3]. That would be a stub about C2 domains. Someone could improve it in the future. Would that be fine? I can do this for a couple of domains as an experiment, and then ask Alex Bateman if he likes it. Of course, it would be much better if people from the SMART/PFAM team generated such Wikipedia stubs automatically (but one has to make sure that the corresponding article is not already in Wikipedia). Then, someone could look through these stubs and wikify them. Biophys 22:51, 11 December 2006 (UTC)
You can't do that with SMART, because as I said earlier, this contains copyrighted information from Prosite. However, you can do this with Pfam. TimVickers 23:21, 11 December 2006 (UTC)
Then I will use Pfam if needed. Actually, the annotation in SMART consists of two parts. One part is abstract from INTERPRO, and it is exactly the same as in Pfam. Another part is a kind of header ("Description"), which is not taken from PROSITE but can be found only in SMART. Biophys 00:56, 12 December 2006 (UTC)
I have created several new articles using this method. Pfam helps a lot, but some editing is usually required. Unfortunately, some Pfam entries are poorly annotated. Biophys 04:05, 12 December 2006 (UTC)
Pfam got back in touch this morning. TimVickers 16:58, 2 February 2007 (UTC)
Hi Tim,
I have spoken to several members of the Pfam consortium and there is unanimous support for you doing this. Please let me know if we can help with this.
Are you also interested in RNA families? I am also in charge of the Rfam database. One of our goals for the coming year is to make the annotation for Rfam into a community resource using a wiki. However if this were part of Wikipedia then so much the better. Do you think that is feasible?
Yours sincerely
Alex Bateman
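
The stub-seeding idea discussed in this thread (copy a database summary, link back to the source entry, and skip anything that already has an article) could be sketched roughly as follows. This is only an illustration: the record fields `title`, `summary` and `url`, the function names, and the layout of the generated wikitext are all assumptions made for the example, not a Pfam export format or an agreed stub layout.

```python
def build_stub(title, summary, source_url):
    """Return wikitext for a minimal stub that links back to its source entry."""
    return (
        f"{summary.strip()}\n\n"
        "==External links==\n"
        f"* [{source_url} {title} in the source database]\n"
    )


def stubs_for_missing(entries, existing_titles):
    """Build stubs only for entries that do not already have an article.

    Titles are compared case-insensitively, so 'C2 domain' and 'C2 Domain'
    count as the same page. 'entries' is a list of dicts with the
    hypothetical keys 'title', 'summary' and 'url'.
    """
    existing = {t.casefold() for t in existing_titles}
    return {
        e["title"]: build_stub(e["title"], e["summary"], e["url"])
        for e in entries
        if e["title"].casefold() not in existing
    }
```

The generated text would still need the manual editing and wikifying mentioned above; the sketch only covers the "do not overwrite an existing article" check and the link back to the source.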

Scientific citations

Would your WikiProject like to endorse Wikipedia:Scientific citation guidelines? If so, please let those editors at that guideline know. --ScienceApologist 19:07, 1 December 2006 (UTC)

I agree. Consistent referencing is important for all articles (although I can only be bothered to do Harvard and will leave it to some wikignome to convert Harvard to footnote references, which I do prefer over Harvard, just not enough to go through the bother). I predict support from the rest of MCB and am now on my way to let those editors know that we endorse the guidelines (if enough people disagree, which I think unlikely, we can always revoke our endorsement). --Username132 (talk) 22:20, 15 December 2006 (UTC)
Doesn't seem to be much activity on this issue; if there's an official stage of endorsement or whathaveyou, we can probably move on to that. Opabinia regalis 04:13, 20 December 2006 (UTC)
I'm not sure what would constitute an "official stage". These guidelines are already operational. The page currently starts off with:
This page is a guideline for Mathematics, Physics, and Chemistry.
It expresses the consensus of editors in those projects about specific details of inline citation. Editors in other scientific projects should follow the practice followed by those projects.
WikiProject Chemistry was just added today, following a "vote" of endorsement at Wikipedia talk:WikiProject Chemistry#Wikipedia:Scientific citation guidelines. The question here is: can Molecular and Cellular Biology be added to the projects explicitly listed on the guideline page? At the moment there is no indication of consensus, but only the absence of manifest opposition.  --LambiamTalk 08:37, 20 December 2006 (UTC)

Vote on proposal CLOSED I set up a vote on this. TimVickers 18:39, 20 December 2006 (UTC) Vote page

Proposal from Novartis/GNF

I got an interesting e-mail this morning.

Hi Tim,
Since it looks like you have some official (or at least very active) role in the MCB project at Wikipedia, I thought I'd try emailing you first. I'm wondering if there is a potential synergy between our two projects...
I lead the "Symatlas" project at GNF (http://symatlas.gnf.org/SymAtlas/). The goal of this application is two-fold. First, we want this to serve as a "gene portal" (with a mammalian bias) which collates all the relevant information in the public domain for all genes. Second, we use this application to release our data into the public domain. Right now, we primarily have gene expression data, centered around our "GeneAtlas" data set which measures expression across an anatomically diverse set of tissues. In the future, we will also post our data for large-scale siRNA screening.
Right now, we're in the process of rebuilding SymAtlas to improve the user interface, responsiveness, features -- pretty much everything. One of the things on our list of new features is a wiki. We were originally thinking of maintaining (and possibly coding) our own wiki for tighter integration with SymAtlas, but actually the MCB effort may be a good partner. You guys have seeded quite a bit of content and probably have a pretty broad audience. We have a bunch of custom data that we could contribute, and we also have a decent sized audience (3000 visitors and 50,000 pageviews per week).
Anyway, let me know if you think this might be mutually beneficial.
Cheers,
-andrew
Andrew Su, Ph.D.
Genomics Institute of the Novartis Research Foundation

I replied.

Hi Andrew
Thank you, this sounds like a good opportunity. We are always happy to co-ordinate with people who want to add content to Wikipedia. Obviously, anything that is added must be licensed under the GFDL and be verifiable (published elsewhere). The advantage to using Wikipedia for distributing data is that it has very high visibility and can be integrated with an unlimited number of other resources. The disadvantage is that, due to open editing, the data can be altered and is thus less reliable than information maintained on a third-party website.
With these advantages and disadvantages in mind, what information and what form of presentation were you considering? One possibility that comes to mind is the Protein Infobox
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Molecular_and_Cellular_Biology/Style_guidelines#Infoboxes
It might be possible to import data from your project to this standard format and produce a basic summary for each gene/protein in your database.
I will post this on the MCB website and canvass for other ideas. Please feel free to join and participate directly!
Thank you again for your interest.
Dr Tim Vickers

So does anybody else have other suggestions as to how we could coordinate? This could be a very valuable collaboration. TimVickers 16:56, 5 January 2007 (UTC)

Hi Tim!
It's a very tempting prospect, and I'd support it all; way to go, big N! :) I'm worried only that the uploading of new (albeit verifiable) data to Wikipedia would violate WP:NOR. Could we get a special dispensation for uploading data that anyone could verify, e.g., the results of publicly available web-servers for a given protein sequence and similar stuff? Alternatively, they could "publish" at their site and then copy the data over to us (that's probably what you were thinking?) but the unnecessary duplication pains me somewhat; Occam's razor and all that.
Perhaps they'd be willing to open up their wiki to members of the MCB WikiProject as a GFDL resource, so that we and others could add content with a good conscience? I'd devote hours to describing my favourite proteins, as would others here, I believe; at least I recall several people on our membership list who wanted to make a page for every known protein... Willow 18:05, 5 January 2007 (UTC)


Thanks Tim for putting this in the appropriate place for discussion. I thought I'd supplement my rather general email above with a few specific proposals/ideas.
First, two ideas on how we might be able to contribute content to Wikipedia that could be accomplished pretty easily (if they were deemed to be desirable).
1) For all the gene entities that are currently in Wikipedia, we could add the gene expression profile from our SymAtlas database to the Wikipedia gene page. Yes, these data have been published previously. I would need to check with our legal department to confirm that we can release it under GFDL, but I'm confident that could happen. In terms of whether the data is appropriate/interesting, we feel that information on where a gene is expressed in the human/mouse anatomy is a basic piece of gene annotation.
2) As mentioned above, one of the goals of SymAtlas is to be a gene portal, and as such we have a database of nonredundant mammalian genes, plus all the links to native data sources (Entrez Gene, Ensembl, Affymetrix chips, GO, etc.) In total, we have ~190K species-specific entries between mouse, human, and rat, which means just over 60k species-independent genes. (I know, counts are high, but so far we've been conservative with collapsing gene entities.) Anyway, we could do some sanity filtering, and then create stubs for a large number (thousands or tens of thousands) of genes with expression patterns and appropriate database links.
And now, two ideas on possible ways SymAtlas could utilize Wikipedia:
3) We'd like to embed a wiki section in a SymAtlas "gene report" (together with other gene annotation information). If the wiki were Wikipedia, then the link to edit a page would clearly just redirect to Wikipedia itself. But for display in the gene report, we'd need to figure out a way to capture the content without a lot of the surrounding elements. For example, we'd want to take off the left navigation bar and much of the top header. Is this a faux pas?
4) We'd like to create simple ways for people to add content in a structured way. For example, I'd like it if we had a simple text-box where people could enter a Pubmed ID or a URL, and hitting "submit" would trigger a bot to add that link to an appropriate area in the wiki. We get a lot of users at SymAtlas who don't have a lot of computer sophistication, so if we want them to contribute their biological info, we need a pretty darn low barrier of activation.
And finally, two ideas/thoughts on SymAtlas-specific needs that come to mind during this initial brainstorming process:
5) Although Wikipedia wants to catalog previously published findings, I think there's probably a use to have a wiki that allows contribution of less-substantiated results. (Perhaps this has already been discussed in this forum?) I think it would be cool if someone could, for example, take a list of 100 genes that were differentially expressed in their gene expression experiment, search for them in SymAtlas, create a tag that describes the preliminary finding, and post it to each of those 100 wiki pages (in a specific section, of course). This might be a decent way to foster new collaborations. However, given the less rigorous nature of these data, maybe that's an argument to set up a parallel wiki effort, and SymAtlas would combine content from them both.
6) And finally (to answer the question of what GNF/Novartis has to gain from all this), I'd really like an internal-only section for proprietary content. This would house data like "four small molecules targeting this gene all failed due to liver toxicity" with links to the appropriate internal reports. Perhaps this is an argument for a third parallel wiki (or building a custom wiki solution with a security model built-in).
Okay, I think that's it for now. Sorry for the long-winded reply, but we've been thinking about the possibilities for a while and are quite excited! AndrewGNF 18:27, 5 January 2007 (UTC)
A few quick notes. First, does the fact that these data have all been published satisfy the WP:NOR? Second, we don't yet have a wiki, but when we do have it set up, it will be open to the entire community. Whether we use wikipedia, the MediaWiki software, or build our own is still up for discussion. But we hope to have something in place in six months. Third, in case you want to see what a SymAtlas gene report currently looks like, this is the one for ITK (though as mentioned above, the user interface is currently undergoing a thorough refactoring, also targeted to be done in ~six months...) AndrewGNF 19:02, 5 January 2007 (UTC)
We have been discussing using Pfam summaries as the basis of short articles. If we could merge your expression data and links to other databases with the relevant Pfam description, this would be a good solid base for a gene page. TimVickers 19:46, 5 January 2007 (UTC)
That could be a great collaboration! But I just received a message from User:Where who said that Wikipedia is under the GFDL, and not the GPL, so he is not sure if we can use Pfam summaries. So, can we actually use them? Biophys 20:08, 5 January 2007 (UTC)
We do have links to protein domains, but actually through Interpro. Interpro is to protein domains what SymAtlas is to genes. Given many different data providers with different IDs for a single concept (genes/families), SymAtlas and Interpro seek to create a nonredundant index. Check out, for example, the SymAtlas page for CDK2. Half way down the annotation column on the right, you'll see a section for "Protein Family" and a link to "Protein Kinase". That InterPro ID links to Pfam, as well as the corresponding model for "Protein kinase" in Prodom and Prosite. But bottom line, our expression data is on a per-gene basis, so it wouldn't make sense to link on that level (since protein families contain multiple genes). These expression data (and all the other links to public databases we've assembled) I think would be most appropriately presented on a "gene" page (e.g., P53) AndrewGNF 20:20, 5 January 2007 (UTC)
This may be slightly off topic, but have you guys considered adding OMIM content, e.g., P53? In terms of having highly curated and referenced annotation, I can't think of a better source. Their terms of use also appear to be very permissive. AndrewGNF 21:28, 5 January 2007 (UTC)
Sadly we can't use that. All content in Wikipedia must be licensed by the GFDL licence, which allows unrestricted copy, modification and use. TimVickers 21:45, 5 January 2007 (UTC)

This collaboration looks great, but I do wonder how this will work with respect to copyright issues. I am unclear on what you mean, Andrew, by an internal-only section. If this is what you need then you will certainly have to have your own wiki. I wonder if your best bet is to use Wikipedia as a testing ground for what you want to do on your own wiki. This allows all scientists (actually anyone) to have some input with regard to the format and content. In other words, develop it here and then transfer the results to your own database. In that way you will be able to tap into opinions and expertise here without running afoul of the open access. It seems you gain more than us, but if you do a good job we will have access to a lot of great data and that is all that matters. David D. (Talk) 22:28, 5 January 2007 (UTC)

Obviously there are some issues to work out here, but this sounds like a great idea overall. To go through the list above..
  1. Is an excellent idea. We'll need to have a centralized list of all our gene articles (I'm sure there's some stubs in the wrong category/not in any category), but that can probably be arranged.
  2. Sounds like the MCB answer to Rambot's articles :) Andrew, have you given any specific thought to how this might be automated? (Maybe Rambot itself could be modified and repurposed.)
  3. You can reuse, redistribute, and modify Wikipedia content as much as you like as long as you stay within the terms of the GFDL. It sounds like you might want a Wikipedia mirror; note that you probably can't "remote-load" from Wikipedia's servers on-the-fly. There are instructions and information for obtaining regular static database dumps here and here if you want to have a look and see if this might suit your purposes. You can still link to Wikipedia for editing, but your local copy won't update until you load the next data dump containing any edits made by your users.
  4. I think this idea needs some fleshing out. What is the user intending to do with this PMID or URL? It's unlikely that people here will approve of users elsewhere being able to trigger a bot that systematically adds links to a large number of articles - that would be interpreted as (and has the potential for abuse as) spam, even if the links are well-meant. Adding links to appropriately titled external links sections of individual articles, however, is very easy, and possibly someone could write very simplified editing instructions just for this if you anticipate that it will be a common activity. (Bear in mind that, even if not bot-assisted, large numbers of users arriving from the same site to add related external links will set off some people's spam alerts anyway, so this may require some coordination.)
  5. I agree that this would be useful, but it doesn't sound like the sort of content that Wikipedia hosts, and doesn't sound like the sort of thing you'd want on a publicly editable wiki. (Vandalism to a list of genes could make a very big mess.)
  6. Internal data absolutely should be on your own internal wiki, not here. That's definitely not the sort of thing that Wikipedia hosts, and the nonprofit Wikimedia Foundation probably couldn't legally host that kind of information on its servers. Not to mention the fact that, from your perspective, it would be incredibly insecure to host internal data and documentation on external servers, maintained at least partially by volunteers, that also contain data routinely modified by the general public. Opabinia regalis 03:32, 6 January 2007 (UTC)
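
As a rough illustration of the "very simplified editing" idea in item 4 above, the sketch below turns a PubMed ID or URL typed by a user into a single line of wikitext that a person could then paste into an article's external links section by hand; it deliberately does not edit anything itself. The function name and the validation rules are assumptions made for this example, and the bare "PMID 12345" form relies on MediaWiki auto-linking that pattern, which should be checked before relying on it.

```python
import re

# Assumption for this sketch: a PubMed ID is a positive integer of at most nine digits.
PMID_PATTERN = re.compile(r"[1-9]\d{0,8}")
URL_PATTERN = re.compile(r"https?://\S+")


def format_external_link(user_input):
    """Turn a PMID or URL into one bullet line of wikitext.

    Raises ValueError for anything else, so junk input never
    reaches an article.
    """
    text = user_input.strip()
    if PMID_PATTERN.fullmatch(text):
        # Assumed to be auto-linked by MediaWiki when written as "PMID <number>".
        return f"* PMID {text}"
    if URL_PATTERN.fullmatch(text):
        return f"* [{text}]"
    raise ValueError(f"not a PubMed ID or http(s) URL: {user_input!r}")


print(format_external_link(" 12345 "))
print(format_external_link("http://example.org/paper"))
```

Keeping a human in the loop for the actual edit is a deliberate choice here, given the spam concerns raised about bot-triggered link insertion.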

A few clarifications based on questions/issues above. First, by "internal-only section", I was meaning a place for GNF/Novartis scientists to contribute to a wiki that would not be visible to the public domain. Clearly this would be independent of Wikipedia, maybe even a separate MediaWiki instance that we would host locally (and within our firewall). I'd be interested to hear if anyone has done any work creating a single user interface that combined content from two (or more) wikis.

I hadn't given any specific thought on how to automate the creation of stubs, but I assume we could create some sort of bot to do it. If anyone has any specific suggestions or knowledge (beyond the Rambot links above which we will check out), please point us in the right direction...

Based on the feedback above, the current tentative plan (for item #3 above) would be for SymAtlas to link up three separate wikis. One would be Wikipedia, which hosts well-substantiated findings. The second would be a wiki for more speculative content, and this would be where we may try to put the simple URL / PMID / keyword tagging feature described in #4 and #5 above. We would host this wiki but it would be publicly accessible. The third wiki would be for "internal GNF/Novartis content" (described in #6 above), and we would host this internally within our firewall. SymAtlas would then aggregate all content from these three wiki sources (taking into account the remote-loading policies) and display it integrated within our gene reports. Any comments on this plan are welcome...

Finally, it sounds like there is reasonable consensus and support for using our content to seed Wikipedia stubs. These would contain links to NCBI, Ensembl, Interpro, etc. and also a chart showing the gene expression pattern across anatomic regions (#1 and #2 above). I'll propose that we start with creating just 5-10 gene stubs to get feedback. Does anyone have any comments on this plan? Also, I don't want to commit to any timeline yet until our SymAtlas development plan is clearer. AndrewGNF 02:30, 11 January 2007 (UTC)

This sounds like a good plan. Thanks Andrew. TimVickers 03:35, 11 January 2007 (UTC)

I suggest consultation, because such a link could be misinterpreted. I think a good way to proceed if you want to use an internal wiki is to go boldly ahead and do so--MediaWiki is available for anyone, and many organizations have done so. One way to link WP to it would be a federated search engine, run by you, that merely searches WP (as mentioned). Alternatively, you could maintain a WP mirror, as many organizations do, and use it however you please. You could certainly make it available to the public, and it would be very good to do so, but I suspect using the WP name for it would not be liked. You could also use it in any combination you like privately--there is no restriction on commercial use of WP. You could use any template or infobox in article space. WP will gladly accept PD content from anywhere, especially from such a reliable source as you, and I think would make arrangements to load the material as it has many other PD sources. We can use outside software that is explicitly GFDL or PD.

Material that has been published in a peer-reviewed article is always acceptable. Information that is not, but has been taken from an authoritative web source that is known to screen material and maintains integrity can also be used--we use PubChem without concern for where exactly they obtained the data.

OK, you've got three similar views. I agree with the suggestion that the best step would be for your people to contribute to WP according to the usual WP standards. Perhaps this is already being done? DGG 04:25, 11 January 2007 (UTC)

PD = Public Domain? When you say "explicitly GFDL or PD", is there a similarly specific definition of PD? AndrewGNF 21:37, 11 January 2007 (UTC)

Migration To White Background Images

When I encounter an image on a black background, I find I have to turn up the brightness/contrast of my monitor, which is unnecessary when the background is white. Do you find it acceptable that the MCB project should support the swapping of protein representations with black backgrounds for ones with white backgrounds? I'm not saying that we should take on the task of changing all pictures with black backgrounds, but I think it is acceptable to do so (cf. it being unacceptable, in most cases, to swap a white-background image for a black-background version).

I've been told that most visualization programs default to black backgrounds because it's easier to make colors appear to blend with black than white when not using anti-aliasing. Observation bears this out, though I'm not sure why. Raytraced shadowing and depth cuing also look more realistic to me on black, although I agree that white looks nicer in articles (though I've never had any contrast issues) and that white should be preferred in most cases. I just don't always remember to switch ;)
More generally: the recommendations section of the pymol tutorial is currently way down the bottom; should it be further up and/or on a separate page? Opabinia regalis 01:08, 31 January 2007 (UTC)
I think it's best at the bottom, since people aren't ready for recommendations when they're still learning to use the program. I have my monitor brightness turned all the way down and contrast pretty low most of the time. I find it more comfortable (I think the monitor more closely resembles paper this way). --Seans Potato Business 21:43, 31 January 2007 (UTC)

Advice and guideline subpages

Why do we have both [the help page] and Wikipedia:WikiProject Molecular and Cellular Biology/Advice? The title sounds redundant and the current content doesn't really read like a place to get advice. The contents of the advice and external links subpages seem like they ought to be merged to something called "resources"; am I missing the point of these? Opabinia regalis 03:17, 1 February 2007 (UTC)

I think you're right. We need to consider the best way to deal with our advice pages:

Wikipedia:WikiProject Molecular and Cellular Biology/Advice
Wikipedia:WikiProject Molecular and Cellular Biology/Style guidelines
Wikipedia:WikiProject Molecular and Cellular Biology/External links
Wikipedia:WikiProject Molecular and Cellular Biology/References
Wikipedia:WikiProject Molecular and Cellular Biology/Pymol tutorial
Wikipedia:WikiProject Molecular and Cellular Biology/Diagram guide
They should be integrated in whatever manner is deemed suitable rather than allowed to develop independently of each other. --Seans Potato Business 17:18, 1 February 2007 (UTC)

Proposal: Delete Page: Articles Needing Attention

I propose that the needs of the articles needing attention page are met by the article worklist and that the page should be removed. Unless I'm missing something of course... --Seans Potato Business 01:41, 14 February 2007 (UTC)

I think the idea there was to trigger immediate work on particularly abominable articles, though plainly that hasn't panned out yet. Possibly because it's hard to keep track of category contents? Opabinia regalis 02:04, 14 February 2007 (UTC)
But since the work list combines a rating of how important an article is with how complete it is, do we really need to keep this page? If we do, shouldn't we have it update automatically using the info from the worklist (i.e. high-importance yet low-state-of-completeness articles)? --Seans Potato Business 06:07, 14 February 2007 (UTC)
I would suggest keeping the page. There may be articles that need attention for other reasons, e.g. NPOV on controversial topics, incorrect statements, articles tagged by other editors/users as too technical or needing expert attention etc. - tameeria 16:00, 19 February 2007 (UTC)
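
The automatic update suggested above (listing high-importance yet low-completeness articles from the worklist) might look something like the sketch below. The rating scales, the in-memory `worklist` mapping and the default thresholds are assumptions made for illustration; the real worklist's format and rating names may differ. It would also not capture the other reasons for needing attention mentioned in the last comment, such as NPOV disputes.

```python
# Assumed rating scales, ordered from lowest to highest.
IMPORTANCE_ORDER = ["Low", "Mid", "High", "Top"]
QUALITY_ORDER = ["Stub", "Start", "B", "GA", "A", "FA"]


def needs_attention(worklist, min_importance="High", max_quality="Start"):
    """Return sorted titles rated at least min_importance and at most max_quality.

    'worklist' maps an article title to an (importance, quality) pair.
    """
    importance_floor = IMPORTANCE_ORDER.index(min_importance)
    quality_ceiling = QUALITY_ORDER.index(max_quality)
    return sorted(
        title
        for title, (importance, quality) in worklist.items()
        if IMPORTANCE_ORDER.index(importance) >= importance_floor
        and QUALITY_ORDER.index(quality) <= quality_ceiling
    )


# Made-up ratings, purely for demonstration.
example = {
    "DNA": ("Top", "FA"),
    "Two-hybrid screening": ("High", "Start"),
    "Cell cycle": ("Top", "Stub"),
    "Obscure protein": ("Low", "Stub"),
}
print(needs_attention(example))  # ['Cell cycle', 'Two-hybrid screening']
```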

MCB Template Text - Comments Too Small

When someone makes comments that are presented on an MCB-supported article talkpage template, they are far too small. I have to struggle to read it. I'm talking about, for example, where I say: Needs more yeast-based coverage. Some sections over-represent bacterial-based methods, relative to yeast. on [Talk:Two-hybrid_screening] - could someone increase the size to normal? I had a go but couldn't produce an effect. Thanks. --Seans Potato Business 05:49, 14 February 2007 (UTC)

Does it look better now? I increased the font size to 90%. If it doesn't look any different, try refreshing your cache?
On a related note, do we want to keep the recent change that adds the collaboration of the month to the template? Announcing the collaboration article on 4000 pages that may have only a tangential relation to the collaboration topic seems a little spammy to me, but maybe it'd draw in more contributors. Opabinia regalis 17:20, 19 February 2007 (UTC)


Userboxes

I propose that a userbox for WikiProject Molecular and Cellular Biology should be created. This may spread publicity about this project, bringing more people to work on related articles. It may also look nice on a userpage. Sodaplayer talk contributions 01:05, 23 February 2007 (UTC)

{{user Mol Cell Bio}} is probably what you're looking for? Opabinia regalis 01:10, 23 February 2007 (UTC)

new orthologs template

It's taken a while to get around to it, but I've put together a first-draft proposal of a protein info box that we (GNF) could populate in an automated fashion for ~10K genes. (Recall the previous discussion here.) I put this draft on my user page out of ignorance of a better place to put it (moved a working example to ITK (gene) AndrewGNF 19:27, 13 March 2007 (UTC)). Also, I created/modified two templates (Template:GNF_Ortholog_box and Template:GNF_Protein_box) to create this example; the "GNF" prefix was to make sure I wouldn't muck up anything existing.

In the example, I tried to integrate the ortholog box into the main protein box here, but it obviously didn't work. Anyone have thoughts on how to accomplish this? Possibly a related question, how do I find out how the "drugInfoBox" works?

Finally, any other comments/questions/suggestions would be welcome. If people generally like this, then the next step on our end would be to write a bot to create 5-10 of these stubs for further comment. (And in case it's not clear, we're definitely Wikipedia newbies, so any and all suggestions are welcome...)

AndrewGNF 18:06, 7 March 2007 (UTC)

Excellent. I'm afraid I'm not a technical person, but I will try to help in any administrator or proofreading way I can. TimVickers 19:25, 7 March 2007 (UTC)

Having a look at Wikipedia:Bot policy might help, we could also request for somebody to write it at Wikipedia:Bot requests. TimVickers 01:42, 9 March 2007 (UTC)

Thanks, we've definitely had a look at the Bot policy. Looks like we have the expertise here to handle it. Bandwidth is a little more uncertain, but I'm trying to get a couple interns to consider this project. If not, then perhaps I will post it over there as a bot request. (Or, if there is anyone here who's interested in collaborating on writing the bot, let me know...)
BTW, I moved the example template to ITK (gene)... AndrewGNF 22:29, 9 March 2007 (UTC)
Andrew, did you get the templates to tile as you intended? Now that she's back, you might want to ask Willow if it's still not working; she's done some nice and fancy template work. Opabinia regalis 00:04, 14 March 2007 (UTC)
Nope, still haven't gotten it resolved. Thanks for the tip -- I will see if Willow can work her magic... AndrewGNF 00:53, 15 March 2007 (UTC)

FYI, I figured out the whole nested table issue and updated my example gene (ITK). Since my last post here, I've also added a bunch of other information that we have in our database. It's not pretty, but I think it has a lot of useful information. I think we're close to finding a student to take on the project of writing the bot (in collaboration with the bioinformatics program at SDSU). As always, I'd love to get any feedback... AndrewGNF 01:12, 30 March 2007 (UTC)

Oh, and in full disclosure, I also just approached the CZ folks with the same idea. Personally I'm pretty agnostic with respect to where we do this, and it's certainly not mutually exclusive either. Anyway... the CZ forum post AndrewGNF 01:17, 30 March 2007 (UTC)

Matching page titles with HUGO names

A colleague and I were looking at the entry for his favorite protein, initially called Zif268 and later renamed as Egr1. Egr1 is now the official name at HUGO (HGNC:3238), and HUGO is the "official" source for gene names. Right now, Egr1 redirects to Zif268, but I wonder if we should reverse this so that Zif268 redirects to Egr1. We'd also need to update ZENK and Early Growth Response Protein 1, other alternate symbols/names which redirect to Zif268, to avoid double redirects. Two questions -- is this the correct thing to do, and is it an important thing to do? Presumably there are many cases where the main page is not found under the HUGO title, and perhaps this is another candidate for a bot to fix (and certainly our stub creation bot should be aware of the best practice in this regard). AndrewGNF 02:26, 10 March 2007 (UTC)

I think the rule is that the gene/protein should be found under its most common name. Obviously, for most genes this is a moot point, as they have no real common name. Otherwise, if a gene has a protein product that has a famous name (such as trypsin or PRSS1) then the info should really be integrated into any existing content at the trypsin page. In practice, 99% of human genes will have no Wikipedia entry, so I think adding the genes under the HUGO names and not trying to fix redirects automatically would be safest. As you say, changing a page to a redirect is best done manually, as there are dependencies in other pages that may need to be altered. TimVickers 05:05, 10 March 2007 (UTC)

ProteinBoxBot specs

Now that our test gene, ITK, is nearing completion (thanks in no small part to recent efforts by David D.), I'm preparing to request approval for development and a trial run of the ProteinBoxBot. I've moved the proposed specs over to the ProteinBoxBot's user page. Comments and feedback are welcome. I'm tentatively going to put in the request for approval at the end of the week... Cheers, AndrewGNF 22:10, 3 April 2007 (UTC)

Hi Andrew,
ITK looks beautiful! Would you be so kind as to post the source code that you're using for the ProteinBoxBot? It might be helpful to me and probably others.
I'd also recommend making a special category for the ProteinBoxBot stubs, like Category:ProteinBoxBot stubs as a sub-category of Category:Protein stubs. There are stubs and there are stubs, and it'd be helpful to have a separate list of the stubs produced by the ProteinBoxBot. I'll do the same for Daisy's taxonomic stubs, once I finish proofreading the taxonomic files.
Well done and good luck; the fun is about to start! :) Willow 22:24, 3 April 2007 (UTC)
Hi Willow... I'd be happy to share the source code with whoever is interested. (Although Rambot posts in his FAQ that he won't give out source code, to discourage "script kiddies", so perhaps our code shouldn't be posted freely.) Also, to be clear, we've yet to write a single line of code, and we'll probably extensively use one of the available libraries. A ProteinBoxBot stub category should be simple enough to add. Thanks for the encouragement! AndrewGNF 01:21, 4 April 2007 (UTC)

Proposal from Rfam database

Hi everybody, I got an interesting e-mail from the Rfam database curator today.

Dear Tim

I work on the Rfam database run by Alex Bateman at the Sanger Institute in the UK. You contacted Alex recently regarding a proposal for Pfam annotations. One of our goals for Rfam this year is to make our family annotations a community resource. We would prefer if this annotation was implemented using Wikipedia and after some browsing we think the Molecular and Cellular Biology Project format would really suit our requirements. In addition we think MCB project would benefit from a daughter ncRNA project.

Would the MCB be interested in this contribution from Rfam?

I am not sure what is required for us to implement this within Wikipedia, but our intention for Rfam is to download the relevant Wikipedia entry for each family and display the Wikipedia information (clearly identified as Wikipedia) and to provide links back to Wikipedia to encourage the experts in our user community to contribute to these annotations. Currently there are only really generic Wikipedia entries for most of our families, such as 'ribozyme', but we would like to extend this to create entries/stubs specifically for all our families.

We could see the role of Rfam as coordinating the effort and championing the use of Wikipedia. We are planning to attend the annual RNA meeting at the end of May to drum up support for this effort.

Could you let us know if MCB would be interested in this contribution from us? And if so, could we discuss how we need to go about implementing this in Wikipedia?

I hope to hear from you soon

Regards

Jennifer Daub

I responded by saying that we were interested and putting her in touch with Andrew Su from GNF, who has already done some of the groundwork for this. TimVickers 16:31, 4 May 2007 (UTC)
It's so flattering that they want to collaborate with us! :)
I tried my hand at crafting an initial draft of Bicoid 3'-UTR regulatory element. Is this what Jennifer had in mind? Unfortunately, I have to travel soon to see my sister graduate from college, so I won't be able to reply for a while! Willow 17:26, 4 May 2007 (UTC)
PS. Everyone should know that X-ray crystallography is the Science Collaboration of the Month for May. I added a bit yesterday, but it could still use lots of work! Thanks, everyone! :)
From the perspective of trying to get a similar daughter "gene wiki" project off the ground, we'd be happy to provide feedback or collaborate on a sibling ncRNA project.  ;) Jennifer, this talk page is probably the best forum to talk over the ideas -- great way to get feedback from the MCB folks, and if you haven't edited in wikitext before, it's a gentle introduction (that's how I got started).
From the gene wiki side, we're pretty close to getting started on the 10-gene test (to expand on the initial ITK (gene) example). The bot is approved for trial, we're almost done extracting the data from our database, and we have a student who is going to take on the coding of the bot. More to come soon! Jennifer, if you have your data in a well-structured format, should be no problem to adjust our bot to populate rfam stubs too. Or, if you want to have a go at it yourself, the Wikipedia:Bot_policy is a good place to start. Cheers, AndrewGNF 20:51, 4 May 2007 (UTC)
Hi Andrew, your bot got approved, that is good news. Look forward to seeing what your intern can do. Willow, did daisy get approved? David D. (Talk) 21:13, 4 May 2007 (UTC)
I'm still proofreading the reference files (I'm on 4/19), so it may be a while yet before Daisy would be ready to apply. She should make the files only once, so I'd like to confirm that the PMID links are all correct, and that every comma, etc. is in the right place.
Could the Right People please sign off on Bicoid 3'-UTR regulatory element, or make recommendations for the Rfam pages? If I don't hear anything soon, I'll just make the full set of 574 families; the code to do so will likely be ready tomorrow sometime. Thanks, Willow 18:17, 8 May 2007 (UTC)
Superb work, thank you. On the bicoid page, that RF00551.jpg image doesn't appear to be uploaded, is that correct? TimVickers 19:12, 8 May 2007 (UTC)
Thanks, Tim! :) I just discovered that the images are themselves already in the public domain; I had been worried that the Rfam people might be mad if we uploaded their images without their permissions. Give me an hour or two and I'll upload them all to the Commons.
Consensus secondary structure for the U12 RNA
P.S. (to everyone here) It's awfully lonely over at X-ray crystallography. There's tons left to be done, and much that doesn't require any expertise, such as wiki-linking, making images, correcting and clarifying wording, etc. Serious bonus points for adding references! :D I'm just dashing stuff off, but it's embarrassingly colloquial and hardly encyclopedic in quality and would really benefit from all your inputs. Thank you, thank you, thank you! :) Willow 19:56, 8 May 2007 (UTC)
You're doing wonders, Tim! :) I've started uploading the Rfam images, which should be done in maybe half an hour or so. The problem is that I don't know how to specify that an image is in the public domain once the image has been uploaded. Does anyone know how to do that? We'll want to add categories, etc. for each page as well, but I think I can handle that tomorrow. Any other thoughts or ideas would be most welcome! :) See you around, I have to dash off to work soon! :) Willow 22:27, 8 May 2007 (UTC)
I figured out how to add the license, summary and category information to the Rfam images, which I did for the first 10 images. Please check out the image at the right; does its page seem OK? Is there any other information that we should add? Thanks for your time! :) Willow 05:26, 9 May 2007 (UTC)

Hi All at MCB. You have been busy! The images look great. Sorry for my slow reply; I had not anticipated things would kick off quite so quickly.

We are all really pleased to hear your positive response to receiving contributions from Rfam. We have been discussing how we want to implement this, given your previous comments about using the ProteinBoxBot and the example page made for Bicoid 3'-UTR by Willow. I think I am slightly unclear exactly how some of this is implemented, so please correct me as I go. Some of our comments:

(1) The amount and type of data we have for each of our families will vary greatly. For some of them there is very little literature, while for others there is a large body of published structural, phylogenetic or expression data. As a result, we do not currently envisage the use of a template such as the ITK (gene) example template. Initially we thought we would create stubs for a small sample set (10-20) of well known/loved ncRNA families and encourage some of our research community to provide annotation. Depending on the response and the type of data that was provided, we thought we would then reconsider a more structured template for our stubs. The use of the ProteinBoxBot would be hugely appreciated when it comes to this.

(2) The Bicoid 3'-UTR example created by Willow is exactly the type of entry we initially imagined we would create. There is other data in our database (GO annotations, database cross references) that can be slurped in later, but initially I think this is how we wanted to go. Alex has already added some more annotation for hammerhead (http://en.wikipedia.org/wiki/Hammerhead_ribozyme) and begun contacting researchers for contributions. We are excited to see how the community will respond.

(3) You have uploaded images for all our families already, yes? Was this in preparation for generating stubs for all our families? You seem to have this in hand already. Is this to create stubs from our existing annotations?

Again we wanted to say we think this is a really exciting project and we are pleased you are keen for Rfam contributions. Please bear in mind I am new to working in the Wikipedia community, so I may need to be pointed in the right direction on how this gets co-ordinated. I have further questions about adding ncRNA categories to the MCB pages but I will wait to hear back from you first. Thanks Jennifer Rfm 09:50, 9 May 2007 (UTC)

Hi Jennifer,
I did upload the Rfam images to the Wikipedia Commons, albeit without their "public domain" license, which was a Wiki-peccadillo that I'm trying to fix. I'm also writing a computer program that translates the Rfam flatfile from its Stockholm format into Wikipedia pages. Once generated, I'll upload the pages and perhaps tweak a few before saving them. The bicoid example was one such page, but now I'm slightly more ambitious. ;) Unfortunately, I had a lot of errands to run today, so the program isn't finished; please be patient! :) Willow 21:50, 9 May 2007 (UTC)
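For readers curious what such a Stockholm-to-wikitext translation might look like, here is a minimal sketch. It is not Willow's program, which is not shown in this discussion. It assumes only that each Stockholm record carries annotation lines of the form "#=GF TAG value" and ends with "//"; the function names, the stub wording, and the ID and CC values in the sample record are made up for illustration.

```python
def parse_stockholm_families(text):
    """Yield one dict of #=GF annotations per family record."""
    record = {}
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":                      # end-of-record marker
            if record:
                yield record
            record = {}
        elif line.startswith("#=GF "):
            parts = line.split(None, 2)       # "#=GF", tag, value
            if len(parts) == 3:
                tag, value = parts[1], parts[2]
                # Repeated tags (e.g. multi-line comments) are joined with spaces.
                record[tag] = record[tag] + " " + value if tag in record else value
    if record:                                # tolerate a missing final "//"
        yield record


def family_to_stub(record):
    """Render a very small wikitext stub; the layout is illustrative only."""
    title = record.get("DE", record.get("ID", "Unknown family"))
    lines = ["'''%s''' is an RNA family in the [[Rfam]] database (accession %s)."
             % (title, record.get("AC", "unknown"))]
    if "CC" in record:
        lines += ["", record["CC"]]
    lines += ["", "{{molecular-cell-biology-stub}}"]
    return "\n".join(lines)


# Hypothetical sample record: the ID and CC values are placeholders.
SAMPLE = """# STOCKHOLM 1.0
#=GF AC   RF00551
#=GF ID   example_id
#=GF DE   Bicoid 3'-UTR regulatory element
#=GF CC   First comment line
#=GF CC   second comment line
//
"""

for family in parse_stockholm_families(SAMPLE):
    print(family_to_stub(family))
```

Keeping the parsing and the rendering in separate functions means the stub wording can be changed after review without touching the flatfile reader.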
Hi Willow, please don't think my last comments were meant impatiently in any way. We are surprised and pleased that you are helping us set this in motion so quickly. I had fully expected I would have to deal with getting the images and pages uploaded myself. Given I am only just learning how this works, it would take me MUCH longer. Your help is really appreciated. Could you let me know what it is you are planning to do so we can contribute and not duplicate your efforts? Also, should I move this conversation to my/your talk page and off the proposals? Jennifer 10:41, 10 May 2007 (UTC)
The figure looks good. I have added a bit more text in the figure caption a la the Rfam website, rather than just the undescriptive RF00007. Alexbateman 12:39, 10 May 2007 (UTC)
Dear Jennifer,
Please don't worry at all; I never thought that you were being impatient. I was just being a little impatient with myself, that's all. :( I'm trying to get a lot done before tomorrow (when I leave for a friend's graduation) and my energy is at low ebb, so I'm not as cheery as usual.
I'll be happy to do (or not do) whatever you all at Rfam would like. Any advice or directions would be most welcome, especially since I really am a clueless Chloe about RNA — although I hope to learn more through our collaboration! :) My initial thought was to generate the 574 stubs semi-automatically, and then we all — including your larger RNA community — would gradually refine them by hand. Is that still an OK plan?
Why don't we move to your talk page, where anyone interested can join in the conversation? Talk to you soon, Willow 13:11, 10 May 2007 (UTC)

Recruitment drive and expert review

We need more editors. Could everybody please invite at least one person who they think would be a good contributor to come and join the project? You can also approach this by asking experts to look over a page in their field of interest. So far I've invited a crystallographer from Australia to look over the X-ray crystallography page and several enzymologists to review the enzymes pages. Let's use our contacts! TimVickers 02:38, 9 May 2007 (UTC)

I've always thought that the emeritus faculty might be interested. A good example is Kimball, who has been moving his book into an online format. It will be interesting to see how Fersht responds to the enzyme article. He must be close to retirement or even retired? David D. (Talk) 16:33, 9 May 2007 (UTC)

He's still listed as current staff on the Departmental webpage, maybe they are refusing to let him go! TimVickers 17:37, 9 May 2007 (UTC)

I just looked at his Wikipedia article and he is younger than I expected, although nearing retirement age. But, as we know, many scientists carry on like the energizer bunnies if health permits. David D. (Talk) 17:52, 9 May 2007 (UTC)
Does anyone have a good template letter to invite external experts to contribute to an article? I guess it needs to set the right tone, but also possibly introduce Wikipedia in a nutshell. I've sent invites to a couple of people so far; through the ncRNA project we might well ask a lot more. Alexbateman 15:38, 18 May 2007 (UTC)

Daisy's Decasmon ;)

Our musical friend Daisy

Hi all,

The beautiful spring weather inspired our musical friend Daisy to create Wikipedia stubs for the 574 families of Rfam today, ten of which I uploaded for her:

Would you be so kind as to look them over and make suggestions? No detail or suggestion is too small or too great, and all will be received gratefully. For example, should we use {{molecular-cell-biology-stub}} or {{molecular-biology-stub}}? A more important point is to come up with a set of keywords to be wiki-linked in the articles, e.g., RNA, gene regulation, intron, etc. Jennifer and the others at Rfam will surely have many ideas, but we should help them by being proactive collaborators, don't you agree? Thank you one and all for your ideas, Willow 02:29, 16 May 2007 (UTC)

Speaking as a quasi-lay person (it's been awhile since I did any biochemistry), these initial articles seem remarkably complicated, with little to no context for the subject at large. Will these pages cover only non-coding RNA families or will they also cover other structured RNA elements listed in the Rfam database? A quick glance reveals terms such as enzymatically active ribonucleoprotein, microRNA, small nucleolar RNA, Y RNA and amino acid operons - how do these differ from each other, and can they be grouped? An introductory sentence along the lines of "Page Name is one of the 574 known families of non-coding RNA..." would help, where "574 known families" is wikilinked to a complete list of known families and "non-coding RNA" is also wikilinked. The next sentence should explain what type of family it belongs to (if relevant), with the remaining article going into more detail, hopefully in a similar manner to the other Rfam pages.
I think also templates carefully designed will help link the Rfam articles together and put them in context with one another. Much like the "Nucleic acids" template does, rather than just having one giant list. CheekyMonkey 12:35, 16 May 2007 (UTC)

Those are excellent ideas, CheekyMonkey! :) If we could come up with "boilerplate" prose for various types of RNA families, that might help us explain them better to our readers, setting them in context and significance. Unfortunately, I'm not really the one to come up with those SNPets of prose, although I'll be happy to teach them to Daisy once we've agreed on them.

I will try to give some thought to producing navigational templates, although we might need more than one level (or maybe a show/hide thing) to cover all the relevant families. Willow 18:29, 16 May 2007 (UTC)

Hi Willow. This is really great, and the general response from our community about improving the ncRNA annotations in Wikipedia has so far been resoundingly positive. Thank you again for your efforts, and here's hoping we will get lots of keen annotators.
RE: comments from CheekyMonkey on providing context and organisation. Yes, this is still something we need to address and comments are really appreciated. To explain more: these 10 test stubs are purely meant as starter pages (for families for which there is nothing else relevant in Wikipedia) from which we really hope to direct users of our database to edit and provide wider context. For other families, e.g. hammerhead_ribozyme, where a relevant entry already exists, we have put some effort into improving this page and then linking it to our families.
We do hope to have a comprehensive representation for all of our families in Wikipedia. One of our aims is to help co-ordinate some sort of organisation and categorisation for existing RNA entries and the new pages we want to introduce. There have been some new entries to Category:RNA but we feel this needs work. This brings me to Willow's comments on Stubs and Categories. Currently the MCB project pages are solely focused on proteins and we would really like to increase the profile of ncRNA (a growing research field) on these pages. How does the rest of the MCB feel about this? We would at the very least like to introduce the Cat:RNA onto the MCB home page, but also more text or a section relating to RNA in order to help direct users to it.
As for a list of keywords to mark up our stubs, we can definitely put some effort into generating this. Jennifer 13:44, 16 May 2007 (UTC)
I realise it's early days and I'll be watching this project develop with interest. Good luck :o) CheekyMonkey 14:00, 16 May 2007 (UTC)
btw your comments are really appreciated Jennifer 16:18, 16 May 2007 (UTC)
Great job Willow, the pages look fantastic! In response to your queries: either stub template seems fine. I see that most are tagged with the molecular-cell-biology-stub template, so I would just keep them consistent. The wikilinks will prove to be very important here: 2° structure, Seed alignment, and Avg identity in the boxes should probably be linked; eukaryotes and Stem-loop (i.e. hairpin) were a few others, but obviously some terms will need to be wikilinked on a case by case basis.
A few other suggestions: I agree that CheekyMonkey's context and organizational suggestions need to be addressed. Also, is there a way to have the bot italicize and link the scientific names of creatures (Bacillus subtilis)? I would guess the bot probably can not do this, but it will need to be done. The references in their current format might be difficult to work with should the articles be expanded, is there any way to make them in the WP:FOOT style? All in all great job, and I look forward to seeing the project bloom!
--DO11.10 16:28, 16 May 2007 (UTC)

Daisy is clever enough to recognize taxonomic names, so it might be feasible for her to italicize and wiki-link them automatically. Daisy was lazy about the references, but she could probably do that as well; would you mind having both a "References" section for the inline citations and a "Bibliography" section where all the pertinent references are listed? Willow 18:29, 16 May 2007 (UTC)

That's okay, but if they are "references" shouldn't they ultimately be used in the article? Also where would you draw the line at "pertinent"? It just seems that a bibliography section might become just a long list of marginally applicable/useful references, how do you sort out the really good ones?--DO11.10 19:07, 16 May 2007 (UTC)
Bot-driven reference integration could possibly be done post-insertion as long as the refs are consistent - mixed/inconsistent formatting has been the roadblock with converting most existing articles automagically. As for keyword-based linking, the bulk of it seems like it could be possible by linking the first occurrence in the article, though the usual English language variations issue will cause some problems. I've made a pass through the first two articles and come up with the following keyword-link list:
I guess the question is what's worth having on a "master keyword list" for a bot versus hand-editing. -- MarcoTolo 17:29, 16 May 2007 (UTC)
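The "link the first occurrence" idea discussed above could be sketched as follows. This is an illustration only, not Daisy's code: the function name, the keyword dictionary, and the sample sentence are assumptions. The sketch leaves text inside existing [[links]] and {{templates}} untouched, and it does nothing about the English-language variations (plurals and so on) mentioned above.

```python
import re

# Existing links and templates are split out so that they are never re-linked.
# Nested templates are not handled by this simple pattern.
LINK_OR_TEMPLATE = re.compile(r"(\[\[.*?\]\]|\{\{.*?\}\})", re.DOTALL)


def link_first_occurrences(text, keywords):
    """Wikilink the first plain-text occurrence of each keyword.

    keywords maps a phrase to the title of the article it should link to.
    """
    # Longest phrases first, so "small nucleolar RNA" is claimed before "RNA".
    for phrase in sorted(keywords, key=len, reverse=True):
        target = keywords[phrase]
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        parts = LINK_OR_TEMPLATE.split(text)
        for i in range(0, len(parts), 2):     # even indices hold plain text
            match = pattern.search(parts[i])
            if match:
                found = match.group(0)
                if found == target:
                    link = "[[%s]]" % found
                else:
                    link = "[[%s|%s]]" % (target, found)
                parts[i] = parts[i][:match.start()] + link + parts[i][match.end():]
                break                          # only the first occurrence
        text = "".join(parts)
    return text


# Hypothetical keyword list and sentence.
keywords = {"RNA": "RNA", "small nucleolar RNA": "Small nucleolar RNA"}
print(link_first_occurrences(
    "A small nucleolar RNA is a kind of RNA. RNA is common.", keywords))
```

A master keyword list would simply be a larger dictionary of this shape; terms missing from it would still need linking by hand.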

Thank you, MarcoTolo! That's exactly the kind of list I was hoping for. I'll try to produce a few myself, following your example. Willow 18:29, 16 May 2007 (UTC)

Would it be useful to create a working page of these, i.e. a master keyword list for Daisy to work with? -- MarcoTolo 18:38, 16 May 2007 (UTC)

That's a great idea for coordinating all our efforts. Here's a start. :) Willow 18:45, 16 May 2007 (UTC)

Okay, I've added more keyword links to the list - that should be a rough cut for all ten of the samples you posted. -- MarcoTolo 19:32, 16 May 2007 (UTC)

Thanks, MarcoTolo, they look great! If everyone agrees, I'll make those terms automatic wikilinks in Daisy's files. Daisy will try hard not to overlink, despite her enthusiasm. ;) Willow 20:09, 16 May 2007 (UTC)

PS. I uploaded a not-too-redundant set of Rfam lines that we can all mine for good wiki-links. There seem to be too many for a master list; perhaps we should just identify the most common ones and fix the others by hand? But the more terms that people define now, the less hand-editing that we'll have later! Willow 20:09, 16 May 2007 (UTC)

A quick look at the types of RNA (I only looked at the place-your-letter-hereRNAs) at the not-too-redundant set of Rfam lines shows that Wikipedia is currently even lacking in this relatively simple department. I've added an alphabetical list at User:WillowW/Daisy Rfam wikilinks as a start but even this is lacking - does "anti sense RNA" have an acronym for instance? What is tmRNA? Hope this helps. CheekyMonkey 12:10, 17 May 2007 (UTC)

MCB daughter Project for ncRNA

Hi All, given our recent discussions about trying to provide some organisation and context for the ncRNA pages we want to introduce, we wondered if the MCB project would be open to having a daughter project for ncRNA (as there is for cell signalling etc). We think it might be a good structure for us to follow and perhaps a good way to ensure this effort doesn't come across as quite so 'Rfam' specific, as that really isn't our aim. For us the whole point of these recent efforts is to get the range/depth of RNA annotations expanded and encourage our RNA community. A daughter project front end would be a really useful entry site for other users and perhaps more accessible? Jennifer 14:01, 18 May 2007 (UTC)

Either way will probably work. There is not that much traffic here and you may catch more eyes. On the other hand I can see the attraction of having one page to focus the discussion. No one will stop you starting a new wikiproject, I would say go ahead. I'm not sure we could stop you even if we hated the idea ;) Be bold, as they say here, and we'll follow along. David D. (Talk)
OK we've moved in this direction and started a daughter project page. We started using the metabolic pathway sister project page as a template. Anyway for those that want to have a look at our humble beginnings please see Wikipedia:WikiProject_RNA. Thanks to everyone who has already made edits to our pages. So far we have 57 pages touched (about 10% of the total) by 23 different users. There is still a lot to be done!!! Alexbateman 16:24, 14 June 2007 (UTC)

RNA family stubs

Hey all,

I uploaded almost all of the RNA family stubs; I ran out of time to do the last 12 out of the 574. Unfortunately, and I'm really sorry about this, I had trouble doing the automatic wiki-linking and taxonomy recognition; it was more complicated than I expected and I was running out of time since I'm leaving soon for yet another graduation, and I have lots to get ready for. So we'll have to do it by hand. :( I'll add the remaining 12 families and also more re-direct pages once I get a chance to catch my breath. Thanks, all! :) Willow 10:46, 26 May 2007 (UTC)

PS. In addition to wiki-linking, you might want to replace the references with proper inline ones, and add any additional information you have about each family. :) Good luck!

Wow, thanks for all your hard work on this. Now that it's largely in Wikipedia it starts to make a lot of sense. We now have a pretty comprehensive list of RNA control elements and genes, and having the categories attached is really nice. Now all we have to do is persuade the RNA community out there to help us bring these into a better shape. We already have a few edits made :) Alexbateman 13:21, 29 May 2007 (UTC)

Thank you as well, Alex, for your faith in us and for the improvements to Wikipedia's coverage of RNA, both past and future. It's a fascinating area, as I'm sure you know, and many of us are eager to learn more. :) Please encourage your colleagues to contribute and assure them that they'll receive a warm welcome among us. :)

I've uploaded the last few families, as well as a few images that I had missed somehow. Right now, I'm in the middle of uploading the licensing information for the Rfam images, so they should be safe from the copyright policers on the Commons. Just in time, too, since I'm leaving early tomorrow morning for a few weeks to see another sister graduate. To my MCB friends, could you all please take on, say, 20 Rfam families apiece and spruce them up? We want to be looking our best when the new contributors arrive. ;) I'll tackle a bunch of families when I return as well. Willow 16:21, 29 May 2007 (UTC)

Trial run for the ProteinBoxBot complete

FYI, the previously discussed ProteinBoxBot completed its first trial run last night. Eight pages were created, logged here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run#Condensed_Log_-_Date:_00:39.2C_13_August_2007_.28UTC.29.

Note that only the eight pages under the "Created Protein Pages" were created. The entries under "Skipped Proteins" and "Redirected Proteins" hit existing articles in the main WP namespace and hence were flagged for manual inspection to merge the new protein box. Will do a couple of those this afternoon. More information on the flow and usage of the bot can be found at User:ProteinBoxBot.
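The triage described above (create a page only when the title is free, otherwise flag it for manual merging) could be sketched roughly as follows. This is not ProteinBoxBot's actual code, which is not shown in this discussion. The function name, the in-memory set and dict standing in for live page lookups, and the placeholder titles are all assumptions made for illustration.

```python
def triage_titles(gene_titles, existing_articles, redirects):
    """Sort candidate titles into the three log sections.

    existing_articles: set of titles that already exist as articles.
    redirects: dict mapping a redirecting title to its target article.
    Only titles in "created" would be written by a bot; the other two
    groups are merely reported for manual inspection.
    """
    log = {"created": [], "skipped": [], "redirected": []}
    for title in gene_titles:
        if title in redirects:
            log["redirected"].append((title, redirects[title]))
        elif title in existing_articles:
            log["skipped"].append(title)
        else:
            log["created"].append(title)
    return log


# Placeholder titles, not real log entries.
log = triage_titles(
    ["GENE_A", "GENE_B", "GENE_C"],
    existing_articles={"GENE_B"},
    redirects={"GENE_C": "Some existing protein article"},
)
print(log)
```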

Also note that one page for PPARG was speedily-deleted. Discussion ongoing over on the bot approval page.

Comments and suggestions are most welcome! Also, special acknowledgement of User:JonSDSUGrad, who actually wrote the bot. He'll of course also be actively involved in digesting any feedback and incorporating it into PBB for future runs. AndrewGNF 20:02, 13 August 2007 (UTC)

I've also added two semi-automated edits for Apolipoprotein_E and Amyloid_precursor_protein. These were existing pages that the ProteinBoxBot identified and flagged for manual integration with the new ProteinBox. Comments on these changes are also welcome (in particular, on the best way of dealing with the "Summary" section that the ProteinBoxBot adds). Cheers, AndrewGNF 21:54, 13 August 2007 (UTC)
Also, Tim reminded us that adding references would be very beneficial. Easily done, the only problem is selecting which references. For example, take PPARG. From the Bibliography section of the Entrez Gene page, you can click a link to Pubmed which retrieves 431 linked publications. Seems like too many to add to the WP page. If anyone has a suggestion on how to pick which we should add and how many we should add, we're very open to suggestions. Cheers, AndrewGNF 22:00, 13 August 2007 (UTC)

NOTE: For simplicity, let's move all subsequent discussions over to the bot approval page (Yes, I realize it's just been me discussing with myself over here...) Cheers, AndrewGNF 23:37, 13 August 2007 (UTC)

Daisy woke up after a long nap :)

Hi, you might recall my friend Daisy who wanted to make referenced stub articles for every taxon in the NCBI last March? Well, she woke up again and has improved herself slightly. If you have a free moment, please review the nine pages at Category:Archaea taxonomic classes. Daisy would appreciate any advice on her output before she starts making lots of articles. Her plan is to work "top-down", i.e., do all the known phyla, then all the known classes, etc. to keep the scope of her work modest.

She still has a few bugs in her throat, such as the page numbers at Bergey's Manual. Any other alerts to problems or suggestions would be most welcome. At User:David D.'s suggestion, her pages also have two templates, one for database links and the other for references from PubMed, etc. Any improvement in those templates would be most welcome as well! :) Willow 21:15, 14 August 2007 (UTC)

Daisy is to be commended - overall the results are impressive. A few suggestions:
1. Under Further reading, put the Scientific databases sub-section below the journals and books - with the idea that database results may be more-data-than-is-useful-to-readers in many cases.
2. When formatting references, I'd like to see us stick to the PubMed-like style, specifically authors listed as Lastname Initial (e.g. Wilson EO, Chandra SK, Fuji R).
3. If DOIs are available, include them in the ref.
4. Rather than using id = ISBN xxx, the recommendation is now to use isbn = xxx (note lower case); same is true for PMIDs (pmid = xxx).
5. Halobacteria appears to have a parse error when dealing with the [No abstract available] term (Cavalier-Smith T (1986) ref).
6. In Methanobacteria, the second Bergey's ref has an odd way of dealing with an unknown ISBN - perhaps just leave it undisplayed?
-- MarcoTolo 21:42, 14 August 2007 (UTC)
7. (continued from above) The Tree of Life links seem to be erroring out (or is it just me...). -- MarcoTolo 23:41, 14 August 2007 (UTC)
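Suggestion 2 above (authors as "Lastname Initial") is easy to automate. Here is a minimal sketch, assuming the author data arrives as (last name, given names) pairs; that input shape, the function name, and the spelled-out given names in the example are assumptions, not anything taken from Daisy's files.

```python
import re


def pubmed_style_authors(authors):
    """Format (last_name, given_names) pairs as 'Lastname Initials, ...'."""
    formatted = []
    for last, given in authors:
        # Take the first letter of each given-name part; dots, spaces and
        # hyphens all count as separators, so "S. K." becomes "SK".
        initials = "".join(
            part[0].upper() for part in re.split(r"[\s.\-]+", given) if part
        )
        formatted.append(("%s %s" % (last.strip(), initials)).strip())
    return ", ".join(formatted)


# Hypothetical input; only the surnames and initials come from the example above.
print(pubmed_style_authors([("Wilson", "Edward O."), ("Chandra", "S. K."), ("Fuji", "R")]))
```

The joined string has no "and" before the final author and no full stops after initials, which matches the style requested in the list.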

Thanks, Marco! I think I may have fixed all of those errors. For #2, I guess you were objecting to the "and" that preceded the final author's name? I just added four more pages at Category:Archaea taxonomic phyla; any more comments would be welcome. I noticed that the NCBI cites a lot of literature that may not be so germane, but I suppose that we can always delete them afterwards. Willow 17:57, 15 August 2007 (UTC)

These look very good indeed. Great job! Tim Vickers 18:03, 15 August 2007 (UTC)
The "and" is part of it - the rest is a function of using the last, first, and coauthor tags rather than a blanket author. <Beginning personal bias> For academic publications - especially scientific papers - the notion of "author" versus "coauthor" can get messy. Rather than sort it out, I find it better to use a blanket "author" tag, leaving the niceties of author position up to the individual disciplines. For example, rather than
{{ cite journal | last = Palys | first = T | coauthors = Nakamura LK, Cohan FM | date = 1997 | title = Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data | journal = Int. J. Syst. Bacteriol. | volume = 47 | pages = 1145–1156 | pmid = 9336922}}
which displays as
Palys, T; Nakamura LK, Cohan FM (1997). "Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data". Int. J. Syst. Bacteriol. 47: 1145–1156. PMID 9336922. 
I prefer something like this
{{cite journal |author=Palys T, Nakamura LK, Cohan FM |title=Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data |journal=Int. J. Syst. Bacteriol. |volume=47 |issue=4 |pages=1145–56 |year=1997 |pmid=9336922}}
which displays this way
Palys T, Nakamura LK, Cohan FM (1997). "Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data". Int. J. Syst. Bacteriol. 47 (4): 1145–56. PMID 9336922. 
In addition to avoiding extraneous "." after initials, this avoids the odd ";" versus "," usage that the cite template uses to separate authors and coauthors. </Beginning personal bias> I realize this is becoming rambling and probably much more than you wanted to hear - thanks for humoring me (regardless of whether you choose to use any of this). In any case, Daisy is doing great work - keep it up. -- MarcoTolo 18:20, 15 August 2007 (UTC)
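A minimal sketch of the blanket-author style described above, in Python. The function names (format_author, cite_journal) and the (last name, initials) input layout are hypothetical and are not taken from Daisy or from any Wikipedia tooling; only the template parameter names come from the example above.

```python
def format_author(last: str, initials: str) -> str:
    """Return 'Lastname Initials' with no periods, e.g. 'Palys T'."""
    return f"{last.strip()} {initials.replace('.', '').replace(' ', '')}"


def cite_journal(authors, title, journal, volume, pages, year, pmid, issue=None):
    """Build a {{cite journal}} call with one blanket author field.

    `authors` is a list of (last name, initials) pairs (assumed layout).
    """
    fields = [
        ("author", ", ".join(format_author(last, ini) for last, ini in authors)),
        ("title", title),
        ("journal", journal),
        ("volume", volume),
        ("issue", issue),
        ("pages", pages),
        ("year", year),
        ("pmid", pmid),
    ]
    # Skip any field that was not supplied (e.g. a missing issue number).
    body = " ".join(f"|{key}={value}" for key, value in fields if value is not None)
    return "{{cite journal " + body + "}}"
```

Called with the three authors of the Palys paper, this should reproduce the preferred template text shown above, with a single comma-separated author list and no periods after the initials.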

That sounds fine to me and Daisy both. She's a very fair-minded passerine and I've never liked that peculiar semicolon and the reversal of names. Daisy had a little trouble learning to replace the - with – but otherwise she's ready to sing her next audition. Does anyone have any preferences? Perhaps the phyla of the Bacteria? the classes of the Fungi? something coloratura from Mozart? I have to dash off to work, but I'll upload her songs tomorrow. :) Willow 21:51, 15 August 2007 (UTC)
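The hyphen-to-en-dash replacement mentioned above could look something like the following Python sketch. The function name normalize_page_range is hypothetical, and the rule it applies (only touch a hyphen that sits directly between two digits) is an assumption, not a description of what Daisy actually does.

```python
import re

# A hyphen, with optional surrounding spaces, that sits between two digits.
_PAGE_RANGE = re.compile(r"(?<=\d)\s*-\s*(?=\d)")


def normalize_page_range(pages: str) -> str:
    """Replace a hyphen between two digits with an en dash (U+2013)."""
    return _PAGE_RANGE.sub("\u2013", pages)
```

Ranges such as S12-S15, where the hyphen is followed by a letter, are deliberately left alone in this sketch; whether they should also be converted is a separate choice.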

OK, please start at the genera Pyrococcus, Sulfolobus and Vulcanisaeta and work your way up the taxonomic ladder, via the taxobox. Do all the references, etc. seem good to you? We probably won't go any lower than the genera, at least for now. Thanks, all! :) Willow 21:35, 17 August 2007 (UTC)
PS. What does the "sp." stand for in species? Do we need to list them, e.g., in Thermococcus? Willow 22:40, 17 August 2007 (UTC)
Wow, a lot of redlinks here. Rhodobacteraceae Tim Vickers 00:51, 20 August 2007 (UTC)
There's fewer now. ;) How do they look? I was worried about the proper categories for them; what do you think? Willow 19:54, 20 August 2007 (UTC)
Looks good! Tim Vickers 20:02, 20 August 2007 (UTC)
I finished all the Rhodobacterales families and genera and all the Archaea up to the families. Would someone please look them over and see if Daisy is making any consistent mistakes? For example, in the taxoboxes on the genus pages, should she abbreviate the genus in the species names, e.g., R. sphaeroides in Rhodobacter? Any guidance on whether the pages are good and whether this approach is worth pursuing further would be most welcome. She'll probably add the Archaea genera next, but we'll wait to hear back from you all re: the species names. Thanks! :) Willow 23:32, 21 August 2007 (UTC)

Wow. Amazing work, thank you. One worry: does the output list ALL the spp in a genus? That might get pretty unwieldy in bacteria. Tim Vickers 23:40, 21 August 2007 (UTC)

Ummm, Daisy lists all the species that the NCBI knows about. I'm sorry, but neither of us knows what a "spp" is; we're also confused about the meaning of all those sp. BLAH species, as in Thermococcus; do we need to include them? What do they mean? Are they just codes for laboratory species that haven't been named properly yet? or maybe not identified as one of the named species yet? Confused, Daisy and Willow 23:47, 21 August 2007 (UTC)
Couple of quick points: A) I think abbreviating the genus name in the taxobox is a Good Idea - saves a ton of space, if nothing else; B) the Rhodobacter article has an id = ISBN attribute in one of the cite templates - it should be isbn =. Will comb through the rest of Rhodobacterales when I get a chance. -- MarcoTolo 00:13, 22 August 2007 (UTC)
Hi all, Daisy may have finished all of the genera of the Archaea. Please check out some of the 98 genera at Category:Archaea genera and please, please let us know if we should be doing something different! Daisy did abbreviate the genus name to its first letter in the species list in the genus taxoboxes, and also kept all the "sp. XYZ-12" species as well. Chirpy and cheery, Daisy and Willow 19:44, 29 August 2007 (UTC)
Fantastic work with bacterial systematics including correct categories! Biophys (talk) 06:14, 19 November 2007 (UTC)
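For reference, the genus abbreviation discussed in this thread can be sketched in a few lines of Python. The function name abbreviate_species is hypothetical, and the sketch assumes each species name begins with the full genus name followed by a space; names that do not match are passed through unchanged, and "sp. XYZ-12" style entries are kept.

```python
def abbreviate_species(genus: str, species_names):
    """Shorten 'Genus epithet' to 'G. epithet' for a taxobox species list.

    Assumes `genus` is a non-empty string.
    """
    prefix = genus + " "
    short = genus[0] + ". "
    return [
        short + name[len(prefix):] if name.startswith(prefix) else name
        for name in species_names
    ]
```

For example, abbreviate_species("Rhodobacter", ["Rhodobacter sphaeroides"]) gives ["R. sphaeroides"], while a placeholder entry such as "Thermococcus sp. XYZ-12" under the genus Thermococcus becomes "T. sp. XYZ-12".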

Affiliated organisations

Hi there. I'm reading more and more about other Wiki-based biology/molecular biology/biochemistry organisations on the web. As a way to raise all of our profiles and share expertise, are there any objections to me creating a link to a new "Related projects and resources" page from our homepage that can trade links with these organisations and provide an overview of who is doing what? Tim Vickers 18:26, 15 August 2007 (UTC)

Go for it. I'd like to see what other WikiFolks are up to - and the MCB project page could use some new content.... -- MarcoTolo 18:37, 15 August 2007 (UTC)

Page created at Wikipedia:WikiProject Molecular and Cellular Biology/Related projects and resources. Feel free to add links! Tim Vickers 19:45, 16 August 2007 (UTC)

Cheers Tim. I was working on such a page on my own project here http://biodatabase.org/index.php/Related_projects_of_interest - I put that link here because that page of my project is directly trying to assess the same aim as this proposal (just in case you thought I was spamming a bit too hard!) I already found a mountain of related work, and I haven't even been looking very hard! I think this is a great proposal - A well organized well maintained and (near) comprehensive archive of 'Affiliated organisations' as you put it (or simply 'similar work') would be a really great resource. Now that I think about it, it is essential for any scientific project! Keep up the good work! --Dan|(talk) 11:37, 17 August 2007 (UTC)

Introduction to...

In response to a comment on the FAC review of the Oxidative phosphorylation article, I am wondering if it might be a good idea for us to produce an introductory article on biochemistry - similar to Introduction to genetics. We could link to it from the top of our main articles and have it give a broad overview of the subject. I've started a list of what would be included in my sandbox. (Link). Suggestions welcome! Tim Vickers 19:47, 26 August 2007 (UTC)

Assessment of Daisy's articles?

I was thinking of adding assessments to the Talk pages of Daisy's bacterial and archaeal taxonomic articles, but I was uncertain which Wikiproject(s) and which ratings would be best. Do you all think they'd be appropriate for the Microbiology WikiProject or maybe the Tree of Life one — or maybe this one? I was thinking of rating them as "stubs" and of "mid" importance; but maybe they should be of "low" importance, since only an expert would want to learn more about Thermococcus? Suggestions would be most welcome! :) Willow 22:17, 31 August 2007 (UTC)

Does anyone have a suggestion for me? My hands are flying blind, like little chiropters. ;) Willow 23:48, 6 September 2007 (UTC)

Link proposal for the "reporter gene" page

Hi all, I'm the editor of Reportergene. The site, which so far takes the form of a blog, aims to be a repository of up-to-date information about reporter genes, gathered from research highlights. To do this, I review up to 50 journals in addition to searching literature databases for appropriate keywords. I'm a molecular pharmacologist doing research in academia, and I need to do this work anyway to stay current (I'm not here to blindly promote my site). To the best of my knowledge, no open journals are devoted mainly to reporter genes, and new developments in reporter gene technology usually come from historically separate fields (chemistry, medicine, physics, optics... as well as, obviously, molecular biology). I strongly believe that my work could be very useful to people reading the Wikipedia "reporter gene" page, so last month I tried to add a Reportergene link as an external link, but the Evil Spartan rejected my insertion, suggesting that I discuss it with you. Here I am. I think the site I'm developing complies with several Wikipedia guidelines. Please check [4].

faithfully yours, 96well

Partnership with ACS

Hi there, we have a proposal from ACS Chemical Biology to form a partnership with our Wikiproject. This would involve at the beginning the MCB logo on this webpage and them featuring Wikipedia articles as links from their website (probably as "See also" links from relevant papers). Does anybody have any objection to this happening, or do you think this is OK? If this sounds good, does anybody have any other ideas as to how we can cooperate in the future? Tim Vickers (talk) 17:57, 5 January 2008 (UTC)

Do people think we